netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch net v2 1/2] net_sched: get rid of tcfa_rcu
@ 2017-09-07  4:26 Cong Wang
  2017-09-07  4:26 ` [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-09-07  4:26 UTC (permalink / raw)
  To: netdev; +Cc: jakub.kicinski, Cong Wang, Jiri Pirko, Eric Dumazet

gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller is no longer needed to wait for a grace period.
So this patch gets rid of it.

This also completely closes a race condition between action free
path and filter chain add/remove path for the following patch.
Because otherwise the nested RCU callback can't be caught by
rcu_barrier().

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/act_api.h |  2 --
 net/sched/act_api.c   | 12 +++---------
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 26ffd8333f50..68218a5f8e72 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -38,7 +38,6 @@ struct tc_action {
 	struct gnet_stats_queue		tcfa_qstats;
 	struct net_rate_estimator __rcu *tcfa_rate_est;
 	spinlock_t			tcfa_lock;
-	struct rcu_head			tcfa_rcu;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue __percpu *cpu_qstats;
 	struct tc_cookie	*act_cookie;
@@ -55,7 +54,6 @@ struct tc_action {
 #define tcf_qstats	common.tcfa_qstats
 #define tcf_rate_est	common.tcfa_rate_est
 #define tcf_lock	common.tcfa_lock
-#define tcf_rcu		common.tcfa_rcu
 
 static inline unsigned int tcf_hash(u32 index, unsigned int hmask)
 {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f2e9ed34a963..fc17d286a6a2 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -53,10 +53,8 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a,
 	res->goto_tp = rcu_dereference_bh(chain->filter_chain);
 }
 
-static void free_tcf(struct rcu_head *head)
+static void free_tcf(struct tc_action *p)
 {
-	struct tc_action *p = container_of(head, struct tc_action, tcfa_rcu);
-
 	free_percpu(p->cpu_bstats);
 	free_percpu(p->cpu_qstats);
 
@@ -76,11 +74,7 @@ static void tcf_hash_destroy(struct tcf_hashinfo *hinfo, struct tc_action *p)
 	hlist_del(&p->tcfa_head);
 	spin_unlock_bh(&hinfo->lock);
 	gen_kill_estimator(&p->tcfa_rate_est);
-	/*
-	 * gen_estimator est_timer() might access p->tcfa_lock
-	 * or bstats, wait a RCU grace period before freeing p
-	 */
-	call_rcu(&p->tcfa_rcu, free_tcf);
+	free_tcf(p);
 }
 
 int __tcf_hash_release(struct tc_action *p, bool bind, bool strict)
@@ -271,7 +265,7 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
 {
 	if (est)
 		gen_kill_estimator(&a->tcfa_rate_est);
-	call_rcu(&a->tcfa_rcu, free_tcf);
+	free_tcf(a);
 }
 EXPORT_SYMBOL(tcf_hash_cleanup);
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-07  4:26 [Patch net v2 1/2] net_sched: get rid of tcfa_rcu Cong Wang
@ 2017-09-07  4:26 ` Cong Wang
  2017-09-07  6:32   ` Jiri Pirko
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-09-07  4:26 UTC (permalink / raw)
  To: netdev; +Cc: jakub.kicinski, Cong Wang, Jiri Pirko

This patch fixes the following madness of tc filter chain:

1) tcf_chain_destroy() is called by both tcf_block_put() and
   tcf_chain_put().  tcf_chain_put() is correctly refcnt'ed and paired
   with tcf_chain_get(), but tcf_block_put() is not, it should be paired
   with tcf_block_get() which means we still need to decrease the refcnt.
   Think it in another way: if we call tcf_bock_put() immediately after
   tcf_block_get(), could we get effectively a nop? This causes a memory
   leak as reported by Jakub.

2) tp proto should hold a refcnt to the chain too. This significantly
   simplifies the logic:

2a) Chain 0 is no longer special, it is created and refcnted by tp
    like any other chains. All the ugliness in tcf_chain_put() can be
    gone!

2b) No need to handle the flushing oddly, because block still holds
    chain 0, it can not be released, this guarantees block is the last
    user.

2c) The race condition with RCU callbacks is easier to handle with just
    a rcu_barrier()! Much easier to understand, nothing to hide! Thanks
    to the previous patch. Please see also the comments in code.

2d) Make the code understandable by humans, much less error-prone.

Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_api.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6c5ea84d2682..e9060dc36519 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -209,21 +209,20 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 		RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
 	while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
+		tcf_chain_put(chain);
 		tcf_proto_destroy(tp);
 	}
 }
 
 static void tcf_chain_destroy(struct tcf_chain *chain)
 {
-	/* May be already removed from the list by the previous call. */
-	if (!list_empty(&chain->list))
-		list_del_init(&chain->list);
+	list_del(&chain->list);
+	kfree(chain);
+}
 
-	/* There might still be a reference held when we got here from
-	 * tcf_block_put. Wait for the user to drop reference before free.
-	 */
-	if (!chain->refcnt)
-		kfree(chain);
+static void tcf_chain_hold(struct tcf_chain *chain)
+{
+	++chain->refcnt;
 }
 
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
@@ -233,7 +232,7 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 
 	list_for_each_entry(chain, &block->chain_list, list) {
 		if (chain->index == chain_index) {
-			chain->refcnt++;
+			tcf_chain_hold(chain);
 			return chain;
 		}
 	}
@@ -246,10 +245,7 @@ EXPORT_SYMBOL(tcf_chain_get);
 
 void tcf_chain_put(struct tcf_chain *chain)
 {
-	/* Destroy unused chain, with exception of chain 0, which is the
-	 * default one and has to be always present.
-	 */
-	if (--chain->refcnt == 0 && !chain->filter_chain && chain->index != 0)
+	if (--chain->refcnt == 0)
 		tcf_chain_destroy(chain);
 }
 EXPORT_SYMBOL(tcf_chain_put);
@@ -294,10 +290,18 @@ void tcf_block_put(struct tcf_block *block)
 	if (!block)
 		return;
 
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+	/* Standalone actions are not allowed to jump to any chain, and
+	 * bound actions should be all removed after flushing. However,
+	 * filters are destroyed in RCU callbacks, we have to flush and wait
+	 * for them before releasing this refcnt, otherwise we race with RCU
+	 * callbacks!!!
+	 */
+	list_for_each_entry(chain, &block->chain_list, list)
 		tcf_chain_flush(chain);
-		tcf_chain_destroy(chain);
-	}
+	rcu_barrier();
+
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
+		tcf_chain_put(chain);
 	kfree(block);
 }
 EXPORT_SYMBOL(tcf_block_put);
@@ -375,6 +379,7 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 		rcu_assign_pointer(*chain->p_filter_chain, tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
+	tcf_chain_hold(chain);
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -386,6 +391,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 	if (chain->p_filter_chain && tp == chain->filter_chain)
 		RCU_INIT_POINTER(*chain->p_filter_chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
+	tcf_chain_put(chain);
 }
 
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-07  4:26 ` [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain Cong Wang
@ 2017-09-07  6:32   ` Jiri Pirko
  2017-09-07 17:45     ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Pirko @ 2017-09-07  6:32 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jakub.kicinski, Jiri Pirko

Thu, Sep 07, 2017 at 06:26:07AM CEST, xiyou.wangcong@gmail.com wrote:
>This patch fixes the following madness of tc filter chain:

Could you avoid expressive words like "madness" and such?
Please be technical.


>
>1) tcf_chain_destroy() is called by both tcf_block_put() and
>   tcf_chain_put().  tcf_chain_put() is correctly refcnt'ed and paired
>   with tcf_chain_get(), but tcf_block_put() is not, it should be paired
>   with tcf_block_get() which means we still need to decrease the refcnt.
>   Think it in another way: if we call tcf_bock_put() immediately after
>   tcf_block_get(), could we get effectively a nop? This causes a memory
>   leak as reported by Jakub.
>
>2) tp proto should hold a refcnt to the chain too. This significantly
>   simplifies the logic:
>
>2a) Chain 0 is no longer special, it is created and refcnted by tp
>    like any other chains. All the ugliness in tcf_chain_put() can be
>    gone!
>
>2b) No need to handle the flushing oddly, because block still holds
>    chain 0, it can not be released, this guarantees block is the last
>    user.
>
>2c) The race condition with RCU callbacks is easier to handle with just
>    a rcu_barrier()! Much easier to understand, nothing to hide! Thanks
>    to the previous patch. Please see also the comments in code.
>
>2d) Make the code understandable by humans, much less error-prone.
>
>Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>Cc: Jiri Pirko <jiri@mellanox.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>---
> net/sched/cls_api.c | 38 ++++++++++++++++++++++----------------
> 1 file changed, 22 insertions(+), 16 deletions(-)
>
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index 6c5ea84d2682..e9060dc36519 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -209,21 +209,20 @@ static void tcf_chain_flush(struct tcf_chain *chain)
> 		RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
> 	while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
> 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
>+		tcf_chain_put(chain);
> 		tcf_proto_destroy(tp);
> 	}
> }
> 
> static void tcf_chain_destroy(struct tcf_chain *chain)
> {
>-	/* May be already removed from the list by the previous call. */
>-	if (!list_empty(&chain->list))
>-		list_del_init(&chain->list);
>+	list_del(&chain->list);
>+	kfree(chain);
>+}
> 
>-	/* There might still be a reference held when we got here from
>-	 * tcf_block_put. Wait for the user to drop reference before free.
>-	 */
>-	if (!chain->refcnt)
>-		kfree(chain);
>+static void tcf_chain_hold(struct tcf_chain *chain)
>+{
>+	++chain->refcnt;
> }
> 
> struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>@@ -233,7 +232,7 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
> 
> 	list_for_each_entry(chain, &block->chain_list, list) {
> 		if (chain->index == chain_index) {
>-			chain->refcnt++;
>+			tcf_chain_hold(chain);
> 			return chain;
> 		}
> 	}
>@@ -246,10 +245,7 @@ EXPORT_SYMBOL(tcf_chain_get);
> 
> void tcf_chain_put(struct tcf_chain *chain)
> {
>-	/* Destroy unused chain, with exception of chain 0, which is the
>-	 * default one and has to be always present.
>-	 */
>-	if (--chain->refcnt == 0 && !chain->filter_chain && chain->index != 0)
>+	if (--chain->refcnt == 0)

Okay, so you take the reference for every goto_chain action and every
tp, right? Note that for chain 0, you hold one more reference (due to
the creation). That is probably ok as we need chain 0 not to go away
even if all tps and goto_chain actions are gone.


> 		tcf_chain_destroy(chain);
> }
> EXPORT_SYMBOL(tcf_chain_put);
>@@ -294,10 +290,18 @@ void tcf_block_put(struct tcf_block *block)
> 	if (!block)
> 		return;
> 
>-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
>+	/* Standalone actions are not allowed to jump to any chain, and
>+	 * bound actions should be all removed after flushing. However,
>+	 * filters are destroyed in RCU callbacks, we have to flush and wait
>+	 * for them before releasing this refcnt, otherwise we race with RCU
>+	 * callbacks!!!

Why the "!!!"? Please avoid that. Not necessary at all.


>+	 */
>+	list_for_each_entry(chain, &block->chain_list, list)
> 		tcf_chain_flush(chain);
>-		tcf_chain_destroy(chain);
>-	}
>+	rcu_barrier();

This actually tries to fix another bug I discovered yesterday. Good.


>+
>+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>+		tcf_chain_put(chain);

Which reference are you putting here? For chain 0, that is the original
reference due to creation from block_get. But how about the other
chains? If you do flush all in the previous list iteration, they are
removed there. Also note that they are removed from the list while
iterating it. 

I believe that you need to add tcf_chain_hold(chain) to the start of the
previous list iteration to ensure all existing chains will stay, then
you can put them here.

Did you test this? I believe we need some simple test script.


> 	kfree(block);
> }
> EXPORT_SYMBOL(tcf_block_put);
>@@ -375,6 +379,7 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
> 		rcu_assign_pointer(*chain->p_filter_chain, tp);
> 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
> 	rcu_assign_pointer(*chain_info->pprev, tp);
>+	tcf_chain_hold(chain);
> }
> 
> static void tcf_chain_tp_remove(struct tcf_chain *chain,
>@@ -386,6 +391,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
> 	if (chain->p_filter_chain && tp == chain->filter_chain)
> 		RCU_INIT_POINTER(*chain->p_filter_chain, next);
> 	RCU_INIT_POINTER(*chain_info->pprev, next);
>+	tcf_chain_put(chain);
> }
> 
> static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
>-- 
>2.13.0
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-07  6:32   ` Jiri Pirko
@ 2017-09-07 17:45     ` Cong Wang
  2017-09-07 20:52       ` Jiri Pirko
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-09-07 17:45 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Kernel Network Developers, Jakub Kicinski, Jiri Pirko

On Wed, Sep 6, 2017 at 11:32 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Sep 07, 2017 at 06:26:07AM CEST, xiyou.wangcong@gmail.com wrote:
>>This patch fixes the following madness of tc filter chain:
>
> Could you avoid expressive words like "madness" and such?
> Please be technical.
>

If the following 2a) 2b) 2c) 2d) are not enough to show the madness,
I don't know any other to show it. Madness is for code, not for you
or any other person, so 100% technical.

>
>>
>>1) tcf_chain_destroy() is called by both tcf_block_put() and
>>   tcf_chain_put().  tcf_chain_put() is correctly refcnt'ed and paired
>>   with tcf_chain_get(), but tcf_block_put() is not, it should be paired
>>   with tcf_block_get() which means we still need to decrease the refcnt.
>>   Think it in another way: if we call tcf_bock_put() immediately after
>>   tcf_block_get(), could we get effectively a nop? This causes a memory
>>   leak as reported by Jakub.
>>
>>2) tp proto should hold a refcnt to the chain too. This significantly
>>   simplifies the logic:
>>
>>2a) Chain 0 is no longer special, it is created and refcnted by tp
>>    like any other chains. All the ugliness in tcf_chain_put() can be
>>    gone!
>>
>>2b) No need to handle the flushing oddly, because block still holds
>>    chain 0, it can not be released, this guarantees block is the last
>>    user.
>>
>>2c) The race condition with RCU callbacks is easier to handle with just
>>    a rcu_barrier()! Much easier to understand, nothing to hide! Thanks
>>    to the previous patch. Please see also the comments in code.
>>
>>2d) Make the code understandable by humans, much less error-prone.
>>
>>Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
>>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>>Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>Cc: Jiri Pirko <jiri@mellanox.com>
>>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>---
>> net/sched/cls_api.c | 38 ++++++++++++++++++++++----------------
>> 1 file changed, 22 insertions(+), 16 deletions(-)
>>
>>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>index 6c5ea84d2682..e9060dc36519 100644
>>--- a/net/sched/cls_api.c
>>+++ b/net/sched/cls_api.c
>>@@ -209,21 +209,20 @@ static void tcf_chain_flush(struct tcf_chain *chain)
>>               RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
>>       while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
>>               RCU_INIT_POINTER(chain->filter_chain, tp->next);
>>+              tcf_chain_put(chain);
>>               tcf_proto_destroy(tp);
>>       }
>> }
>>
>> static void tcf_chain_destroy(struct tcf_chain *chain)
>> {
>>-      /* May be already removed from the list by the previous call. */
>>-      if (!list_empty(&chain->list))
>>-              list_del_init(&chain->list);
>>+      list_del(&chain->list);
>>+      kfree(chain);
>>+}
>>
>>-      /* There might still be a reference held when we got here from
>>-       * tcf_block_put. Wait for the user to drop reference before free.
>>-       */
>>-      if (!chain->refcnt)
>>-              kfree(chain);
>>+static void tcf_chain_hold(struct tcf_chain *chain)
>>+{
>>+      ++chain->refcnt;
>> }
>>
>> struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>@@ -233,7 +232,7 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>
>>       list_for_each_entry(chain, &block->chain_list, list) {
>>               if (chain->index == chain_index) {
>>-                      chain->refcnt++;
>>+                      tcf_chain_hold(chain);
>>                       return chain;
>>               }
>>       }
>>@@ -246,10 +245,7 @@ EXPORT_SYMBOL(tcf_chain_get);
>>
>> void tcf_chain_put(struct tcf_chain *chain)
>> {
>>-      /* Destroy unused chain, with exception of chain 0, which is the
>>-       * default one and has to be always present.
>>-       */
>>-      if (--chain->refcnt == 0 && !chain->filter_chain && chain->index != 0)
>>+      if (--chain->refcnt == 0)
>
> Okay, so you take the reference for every goto_chain action and every
> tp, right? Note that for chain 0, you hold one more reference (due to
> the creation). That is probably ok as we need chain 0 not to go away
> even if all tps and goto_chain actions are gone.

Yeah, this is the core of the patch.

>
>
>>               tcf_chain_destroy(chain);
>> }
>> EXPORT_SYMBOL(tcf_chain_put);
>>@@ -294,10 +290,18 @@ void tcf_block_put(struct tcf_block *block)
>>       if (!block)
>>               return;
>>
>>-      list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
>>+      /* Standalone actions are not allowed to jump to any chain, and
>>+       * bound actions should be all removed after flushing. However,
>>+       * filters are destroyed in RCU callbacks, we have to flush and wait
>>+       * for them before releasing this refcnt, otherwise we race with RCU
>>+       * callbacks!!!
>
> Why the "!!!"? Please avoid that. Not necessary at all.


Because we don't have lock here, we have to pay more extra
attention to it. Playing list without lock is like playing fire. I use "!!!" to
draw people's attention when they touch it, perhaps "XXX" is better?


>
>
>>+       */
>>+      list_for_each_entry(chain, &block->chain_list, list)
>>               tcf_chain_flush(chain);
>>-              tcf_chain_destroy(chain);
>>-      }
>>+      rcu_barrier();
>
> This actually tries to fix another bug I discovered yesterday. Good.
>

This on the other hand proves we should make the code clean
and understandable asap, not just wait for net-next.



>
>>+
>>+      list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>+              tcf_chain_put(chain);
>
> Which reference are you putting here? For chain 0, that is the original
> reference due to creation from block_get. But how about the other
> chains? If you do flush all in the previous list iteration, they are
> removed there. Also note that they are removed from the list while
> iterating it.


Yes it is for chain 0, because block holds a reference to chain 0 during
creation. Non-0 chains are created with refcnt==1 too but paired with
tcf_chain_put() rather than tcf_block_put(). This is what makes chain 0
not special w.r.t. refcnt.


>
> I believe that you need to add tcf_chain_hold(chain) to the start of the
> previous list iteration to ensure all existing chains will stay, then
> you can put them here.

We don't need it, it is already perfectly paired with tcf_block_get().
Think about the following:

1)
tcf_block_get(...); // create chain 0 with refcnt=1
tcf_block_put(); // no tp, only chain 0 in chain_list
// chain 0 is put, refcnt=0, it is gone

2)
tcf_block_get(); // create chain 0 with refcnt=1
...
tcf_chain_get(11); // create chain 11 with refcnt=1
// one tp is inserted to chain 11, now refcnt==2
// in tc_ctl_tfilter()
tcf_chain_put(11); // paired with above get, refcnt==1
...
tcf_block_put(); // flush chain 11, tp removed, refcnt==0
// put chain 0 too, paired with block get, refcnt == 0
// both chain 0 and chain11 are gone



>
> Did you test this? I believe we need some simple test script.

Of course I did. I verified memleak is gone and tested basic
filters and gact actions with chain 0 and chain 11, everything
works as expected, I also added a printk() to verify the chain
is really gone as soon as all references are gone.

I will contribute my test cases back after I figure out how.
Also net-next is already closed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-07 17:45     ` Cong Wang
@ 2017-09-07 20:52       ` Jiri Pirko
  2017-09-08 16:37         ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Pirko @ 2017-09-07 20:52 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Jakub Kicinski, Jiri Pirko

Thu, Sep 07, 2017 at 07:45:49PM CEST, xiyou.wangcong@gmail.com wrote:
>On Wed, Sep 6, 2017 at 11:32 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Sep 07, 2017 at 06:26:07AM CEST, xiyou.wangcong@gmail.com wrote:
>>>This patch fixes the following madness of tc filter chain:
>>
>> Could you avoid expressive words like "madness" and such?
>> Please be technical.
>>
>
>If the following 2a) 2b) 2c) 2d) are not enough to show the madness,
>I don't know any other to show it. Madness is for code, not for you
>or any other person, so 100% technical.

We hav eto agree to disagree. You want to be expressive, apparently
there is no way to talk you out of that. Sigh...


>
>>
>>>
>>>1) tcf_chain_destroy() is called by both tcf_block_put() and
>>>   tcf_chain_put().  tcf_chain_put() is correctly refcnt'ed and paired
>>>   with tcf_chain_get(), but tcf_block_put() is not, it should be paired
>>>   with tcf_block_get() which means we still need to decrease the refcnt.
>>>   Think it in another way: if we call tcf_bock_put() immediately after
>>>   tcf_block_get(), could we get effectively a nop? This causes a memory
>>>   leak as reported by Jakub.
>>>
>>>2) tp proto should hold a refcnt to the chain too. This significantly
>>>   simplifies the logic:
>>>
>>>2a) Chain 0 is no longer special, it is created and refcnted by tp
>>>    like any other chains. All the ugliness in tcf_chain_put() can be
>>>    gone!
>>>
>>>2b) No need to handle the flushing oddly, because block still holds
>>>    chain 0, it can not be released, this guarantees block is the last
>>>    user.
>>>
>>>2c) The race condition with RCU callbacks is easier to handle with just
>>>    a rcu_barrier()! Much easier to understand, nothing to hide! Thanks
>>>    to the previous patch. Please see also the comments in code.
>>>
>>>2d) Make the code understandable by humans, much less error-prone.
>>>
>>>Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
>>>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>>>Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>Cc: Jiri Pirko <jiri@mellanox.com>
>>>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>---
>>> net/sched/cls_api.c | 38 ++++++++++++++++++++++----------------
>>> 1 file changed, 22 insertions(+), 16 deletions(-)
>>>
>>>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>index 6c5ea84d2682..e9060dc36519 100644
>>>--- a/net/sched/cls_api.c
>>>+++ b/net/sched/cls_api.c
>>>@@ -209,21 +209,20 @@ static void tcf_chain_flush(struct tcf_chain *chain)
>>>               RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
>>>       while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
>>>               RCU_INIT_POINTER(chain->filter_chain, tp->next);
>>>+              tcf_chain_put(chain);
>>>               tcf_proto_destroy(tp);
>>>       }
>>> }
>>>
>>> static void tcf_chain_destroy(struct tcf_chain *chain)
>>> {
>>>-      /* May be already removed from the list by the previous call. */
>>>-      if (!list_empty(&chain->list))
>>>-              list_del_init(&chain->list);
>>>+      list_del(&chain->list);
>>>+      kfree(chain);
>>>+}
>>>
>>>-      /* There might still be a reference held when we got here from
>>>-       * tcf_block_put. Wait for the user to drop reference before free.
>>>-       */
>>>-      if (!chain->refcnt)
>>>-              kfree(chain);
>>>+static void tcf_chain_hold(struct tcf_chain *chain)
>>>+{
>>>+      ++chain->refcnt;
>>> }
>>>
>>> struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>>@@ -233,7 +232,7 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>>
>>>       list_for_each_entry(chain, &block->chain_list, list) {
>>>               if (chain->index == chain_index) {
>>>-                      chain->refcnt++;
>>>+                      tcf_chain_hold(chain);
>>>                       return chain;
>>>               }
>>>       }
>>>@@ -246,10 +245,7 @@ EXPORT_SYMBOL(tcf_chain_get);
>>>
>>> void tcf_chain_put(struct tcf_chain *chain)
>>> {
>>>-      /* Destroy unused chain, with exception of chain 0, which is the
>>>-       * default one and has to be always present.
>>>-       */
>>>-      if (--chain->refcnt == 0 && !chain->filter_chain && chain->index != 0)
>>>+      if (--chain->refcnt == 0)
>>
>> Okay, so you take the reference for every goto_chain action and every
>> tp, right? Note that for chain 0, you hold one more reference (due to
>> the creation). That is probably ok as we need chain 0 not to go away
>> even if all tps and goto_chain actions are gone.
>
>Yeah, this is the core of the patch.
>
>>
>>
>>>               tcf_chain_destroy(chain);
>>> }
>>> EXPORT_SYMBOL(tcf_chain_put);
>>>@@ -294,10 +290,18 @@ void tcf_block_put(struct tcf_block *block)
>>>       if (!block)
>>>               return;
>>>
>>>-      list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
>>>+      /* Standalone actions are not allowed to jump to any chain, and
>>>+       * bound actions should be all removed after flushing. However,
>>>+       * filters are destroyed in RCU callbacks, we have to flush and wait
>>>+       * for them before releasing this refcnt, otherwise we race with RCU
>>>+       * callbacks!!!
>>
>> Why the "!!!"? Please avoid that. Not necessary at all.
>
>
>Because we don't have lock here, we have to pay more extra
>attention to it. Playing list without lock is like playing fire. I use "!!!" to
>draw people's attention when they touch it, perhaps "XXX" is better?
>
>
>>
>>
>>>+       */
>>>+      list_for_each_entry(chain, &block->chain_list, list)
>>>               tcf_chain_flush(chain);
>>>-              tcf_chain_destroy(chain);
>>>-      }
>>>+      rcu_barrier();
>>
>> This actually tries to fix another bug I discovered yesterday. Good.
>>
>
>This on the other hand proves we should make the code clean
>and understandable asap, not just wait for net-next.
>
>
>
>>
>>>+
>>>+      list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>>+              tcf_chain_put(chain);
>>
>> Which reference are you putting here? For chain 0, that is the original
>> reference due to creation from block_get. But how about the other
>> chains? If you do flush all in the previous list iteration, they are
>> removed there. Also note that they are removed from the list while
>> iterating it.
>
>
>Yes it is for chain 0, because block holds a reference to chain 0 during
>creation. Non-0 chains are created with refcnt==1 too but paired with
>tcf_chain_put() rather than tcf_block_put(). This is what makes chain 0
>not special w.r.t. refcnt.

So you need to tcf_chain_put only chain 0 here, right? The rest of the
chains get destroyed by the previous list_for_each_entry iteration when
flush happens and actions destroy happens what decrements recnt to 0
there.

What do I miss, who would still hold reference for non-0 chains when all
tps and all goto_chain actions are gone?


>
>
>>
>> I believe that you need to add tcf_chain_hold(chain) to the start of the
>> previous list iteration to ensure all existing chains will stay, then
>> you can put them here.
>
>We don't need it, it is already perfectly paired with tcf_block_get().
>Think about the following:
>
>1)
>tcf_block_get(...); // create chain 0 with refcnt=1
>tcf_block_put(); // no tp, only chain 0 in chain_list
>// chain 0 is put, refcnt=0, it is gone
>
>2)
>tcf_block_get(); // create chain 0 with refcnt=1
>...
>tcf_chain_get(11); // create chain 11 with refcnt=1
>// one tp is inserted to chain 11, now refcnt==2
>// in tc_ctl_tfilter()
>tcf_chain_put(11); // paired with above get, refcnt==1
>...
>tcf_block_put(); // flush chain 11, tp removed, refcnt==0
>// put chain 0 too, paired with block get, refcnt == 0
>// both chain 0 and chain11 are gone
>
>
>
>>
>> Did you test this? I believe we need some simple test script.
>
>Of course I did. I verified memleak is gone and tested basic
>filters and gact actions with chain 0 and chain 11, everything
>works as expected, I also added a printk() to verify the chain
>is really gone as soon as all references are gone.
>
>I will contribute my test cases back after I figure out how.
>Also net-next is already closed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-07 20:52       ` Jiri Pirko
@ 2017-09-08 16:37         ` Cong Wang
  2017-09-10 14:33           ` Jiri Pirko
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-09-08 16:37 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Kernel Network Developers, Jakub Kicinski, Jiri Pirko

On Thu, Sep 7, 2017 at 1:52 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Sep 07, 2017 at 07:45:49PM CEST, xiyou.wangcong@gmail.com wrote:
>>Yes it is for chain 0, because block holds a reference to chain 0 during
>>creation. Non-0 chains are created with refcnt==1 too but paired with
>>tcf_chain_put() rather than tcf_block_put(). This is what makes chain 0
>>not special w.r.t. refcnt.
>
> So you need to tcf_chain_put only chain 0 here, right? The rest of the
> chains get destroyed by the previous list_for_each_entry iteration when
> flush happens and actions destroy happens what decrements recnt to 0
> there.


This is correct. And it should be only chain 0 after flush.

>
> What do I miss, who would still hold reference for non-0 chains when all
> tps and all goto_chain actions are gone?

No one. This is totally correct and is exactly what this patch intends to do.

Look, this is why we never need an object with refcnt==0 to exist. ;)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-08 16:37         ` Cong Wang
@ 2017-09-10 14:33           ` Jiri Pirko
  2017-09-11 21:14             ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Pirko @ 2017-09-10 14:33 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Jakub Kicinski, Jiri Pirko

Fri, Sep 08, 2017 at 06:37:55PM CEST, xiyou.wangcong@gmail.com wrote:
>On Thu, Sep 7, 2017 at 1:52 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Sep 07, 2017 at 07:45:49PM CEST, xiyou.wangcong@gmail.com wrote:
>>>Yes it is for chain 0, because block holds a reference to chain 0 during
>>>creation. Non-0 chains are created with refcnt==1 too but paired with
>>>tcf_chain_put() rather than tcf_block_put(). This is what makes chain 0
>>>not special w.r.t. refcnt.
>>
>> So you need to tcf_chain_put only chain 0 here, right? The rest of the
>> chains get destroyed by the previous list_for_each_entry iteration when
>> flush happens and actions destroy happens what decrements recnt to 0
>> there.
>
>
>This is correct. And it should be only chain 0 after flush.
>
>>
>> What do I miss, who would still hold reference for non-0 chains when all
>> tps and all goto_chain actions are gone?
>
>No one. This is totally correct and is exactly what this patch intends to do.
>
>Look, this is why we never need an object with refcnt==0 to exist. ;)

So, I understand that correctly, good. But this is a problem.

When you do:
       list_for_each_entry(chain, &block->chain_list, list)
                tcf_chain_flush(chain);

The reference may get dropped for chains to 0 (for those that does not
have a goto_chain action holding a ref), and therefore they get freed
within the loop. That is problematic when you do the traversing of the
list. You may use list_for_each_entry_safe, but there is another issue:

As a part of tcf_chain_flush destruction, act goto_chain destruction may
get scheduled by call_rcu. That may be the last reference held for the
chain. So you race between this loop and rcu callback.

Consider following example:

chain0  - has only one rule with goto_chain 22 action
chain22 - no rule (refcnt 1 because of the action mentioned above)

         CPU0                            CPU1

    tcf_chain_flush(0)
               -> call_rcu(free_tcf)
                                          free_tcf
                                             ->tcf_chain_put(22)
                                                 ->tcf_chain_destroy(22)
                                                     ->kfree(22)	
    tcf_chain_flush(22)...use-after-free


So what I suggest in order to prevent this is to change your code to
something like:

	/* To avoid race between getting reference in the next loop and
	 * rcu callbacks from deleleted actions freeing the chain.
	 */
	rcu_barrier();

	list_for_each_entry(chain, &block->chain_list, list)
		if (chain->index) /* we already hold ref to chain 0 */
			tcf_chain_hold(chain);

	list_for_each_entry(chain, &block->chain_list, list)
		tcf_chain_flush(chain);
	
	/* Wait for rcu callbacks from deleleted actions that were
	 * sheduled as a result of tcf_chain_flush in the previous loop.
	 * This is not absolutelly necessary, as the chain may live after
	 * the tcf_chain_put is called in the next iteration and would
	 * get freed on tcf_chain_put call from rcu callback later on.
	 */
	rcu_barrier();

	/* Now we are sure that we are the only one holding a reference
	 * to all chains, drop it and let them go.
	 */
	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
		tcf_chain_put(chain);
	kfree(block);

Does this make sense?

Thanks!

Jiri

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-10 14:33           ` Jiri Pirko
@ 2017-09-11 21:14             ` Cong Wang
  2017-09-11 21:58               ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-09-11 21:14 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Kernel Network Developers, Jakub Kicinski, Jiri Pirko

On Sun, Sep 10, 2017 at 7:33 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Fri, Sep 08, 2017 at 06:37:55PM CEST, xiyou.wangcong@gmail.com wrote:
>>On Thu, Sep 7, 2017 at 1:52 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Thu, Sep 07, 2017 at 07:45:49PM CEST, xiyou.wangcong@gmail.com wrote:
>>>>Yes it is for chain 0, because block holds a reference to chain 0 during
>>>>creation. Non-0 chains are created with refcnt==1 too but paired with
>>>>tcf_chain_put() rather than tcf_block_put(). This is what makes chain 0
>>>>not special w.r.t. refcnt.
>>>
>>> So you need to tcf_chain_put only chain 0 here, right? The rest of the
>>> chains get destroyed by the previous list_for_each_entry iteration when
>>> flush happens and actions destroy happens what decrements recnt to 0
>>> there.
>>
>>
>>This is correct. And it should be only chain 0 after flush.
>>
>>>
>>> What do I miss, who would still hold reference for non-0 chains when all
>>> tps and all goto_chain actions are gone?
>>
>>No one. This is totally correct and is exactly what this patch intends to do.
>>
>>Look, this is why we never need an object with refcnt==0 to exist. ;)
>
> So, I understand that correctly, good. But this is a problem.
>
> When you do:
>        list_for_each_entry(chain, &block->chain_list, list)
>                 tcf_chain_flush(chain);
>
> The reference may get dropped for chains to 0 (for those that does not
> have a goto_chain action holding a ref), and therefore they get freed
> within the loop. That is problematic when you do the traversing of the
> list. You may use list_for_each_entry_safe, but there is another issue:
>
> As a part of tcf_chain_flush destruction, act goto_chain destruction may
> get scheduled by call_rcu. That may be the last reference held for the
> chain. So you race between this loop and rcu callback.
>
> Consider following example:
>
> chain0  - has only one rule with goto_chain 22 action
> chain22 - no rule (refcnt 1 because of the action mentioned above)
>
>          CPU0                            CPU1
>
>     tcf_chain_flush(0)
>                -> call_rcu(free_tcf)
>                                           free_tcf
>                                              ->tcf_chain_put(22)
>                                                  ->tcf_chain_destroy(22)
>                                                      ->kfree(22)


Looks like all due to the lack of locking on block->chain_list.
I thought the rcu_barrier() could properly handle this,
but seems still not, probably I need to move it in the loop,
I am still not 100% sure if it is totally safe with
list_for_each_safe():


-       list_for_each_entry(chain, &block->chain_list, list)
+       list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
                tcf_chain_flush(chain);
-       rcu_barrier();
+               rcu_barrier(); // are we safe now???
+       }


>     tcf_chain_flush(22)...use-after-free
>

Same race could happen with your code too, right?
chain 22 still has refcnt==1 so chain_put() will destroy
it in flush() too. So this is not a regression.

I know you have list_for_each_safe() but you lack of
rcu_barrier(). This is why I said the lack of locking is
the cause, not your code or my code.


>
> So what I suggest in order to prevent this is to change your code to
> something like:
>
>         /* To avoid race between getting reference in the next loop and
>          * rcu callbacks from deleleted actions freeing the chain.
>          */
>         rcu_barrier();
>
>         list_for_each_entry(chain, &block->chain_list, list)
>                 if (chain->index) /* we already hold ref to chain 0 */
>                         tcf_chain_hold(chain);
>
>         list_for_each_entry(chain, &block->chain_list, list)
>                 tcf_chain_flush(chain);
>
>         /* Wait for rcu callbacks from deleleted actions that were
>          * sheduled as a result of tcf_chain_flush in the previous loop.
>          * This is not absolutelly necessary, as the chain may live after
>          * the tcf_chain_put is called in the next iteration and would
>          * get freed on tcf_chain_put call from rcu callback later on.
>          */
>         rcu_barrier();
>
>         /* Now we are sure that we are the only one holding a reference
>          * to all chains, drop it and let them go.
>          */
>         list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>                 tcf_chain_put(chain);
>         kfree(block);
>
> Does this make sense?

Yeah, but again this is all due to lack of locking. Do we really
have to to be such complex or just use either a) proper sync
with rcu_barrier() or b) proper locking on chain_list (with RCU
of course)?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain
  2017-09-11 21:14             ` Cong Wang
@ 2017-09-11 21:58               ` Cong Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2017-09-11 21:58 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Kernel Network Developers, Jakub Kicinski, Jiri Pirko

On Mon, Sep 11, 2017 at 2:14 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Looks like all due to the lack of locking on block->chain_list.
> I thought the rcu_barrier() could properly handle this,
> but seems still not, probably I need to move it in the loop,
> I am still not 100% sure if it is totally safe with
> list_for_each_safe():
>
>
> -       list_for_each_entry(chain, &block->chain_list, list)
> +       list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
>                 tcf_chain_flush(chain);
> -       rcu_barrier();
> +               rcu_barrier(); // are we safe now???
> +       }
>

Answer myself: No, this is not safe either, because we may
list_del() the next node, and apparently _safe() can't guarantee
that...

So either we have to use locking or use the trick you suggested.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-09-11 21:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-07  4:26 [Patch net v2 1/2] net_sched: get rid of tcfa_rcu Cong Wang
2017-09-07  4:26 ` [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain Cong Wang
2017-09-07  6:32   ` Jiri Pirko
2017-09-07 17:45     ` Cong Wang
2017-09-07 20:52       ` Jiri Pirko
2017-09-08 16:37         ` Cong Wang
2017-09-10 14:33           ` Jiri Pirko
2017-09-11 21:14             ` Cong Wang
2017-09-11 21:58               ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).