netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch net] net_sched: remove tcf_block_put_deferred()
@ 2017-10-30 18:10 Cong Wang
  2017-10-31  2:06 ` David Miller
  2017-10-31 10:40 ` Jiri Pirko
  0 siblings, 2 replies; 5+ messages in thread
From: Cong Wang @ 2017-10-30 18:10 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, Daniel Borkmann, Jiri Pirko, John Fastabend,
	Jamal Hadi Salim, Paul E. McKenney, Eric Dumazet

In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
because qdisc is already destroyed after we queue this work.

The tcf_block_put_deferred() is no longer necessary after we get RTNL
for each tc filter destroy work, no others could jump in at this point.
Same for tcf_chain_hold(), we are fully serialized now.

This also reduces one indirection therefore makes the code more
readable. Note this brings back a rcu_barrier(), however comparing
to the code prior to commit 7aa0045dadb6 we still reduced one
rcu_barrier(). For net-next, we can consider to refcnt tcf block to
avoid it.

Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_api.c | 37 ++++++++-----------------------------
 1 file changed, 8 insertions(+), 29 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 231181c602ed..b2d310745487 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -280,8 +280,8 @@ static void tcf_block_put_final(struct work_struct *work)
 	struct tcf_block *block = container_of(work, struct tcf_block, work);
 	struct tcf_chain *chain, *tmp;
 
-	/* At this point, all the chains should have refcnt == 1. */
 	rtnl_lock();
+	/* Only chain 0 should be still here. */
 	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
 		tcf_chain_put(chain);
 	rtnl_unlock();
@@ -289,23 +289,17 @@ static void tcf_block_put_final(struct work_struct *work)
 }
 
 /* XXX: Standalone actions are not allowed to jump to any chain, and bound
- * actions should be all removed after flushing. However, filters are destroyed
- * in RCU callbacks, we have to hold the chains first, otherwise we would
- * always race with RCU callbacks on this list without proper locking.
+ * actions should be all removed after flushing. However, filters are now
+ * destroyed in tc filter workqueue with RTNL lock, they can not race here.
  */
-static void tcf_block_put_deferred(struct work_struct *work)
+void tcf_block_put(struct tcf_block *block)
 {
-	struct tcf_block *block = container_of(work, struct tcf_block, work);
-	struct tcf_chain *chain;
+	struct tcf_chain *chain, *tmp;
 
-	rtnl_lock();
-	/* Hold a refcnt for all chains, except 0, in case they are gone. */
-	list_for_each_entry(chain, &block->chain_list, list)
-		if (chain->index)
-			tcf_chain_hold(chain);
+	if (!block)
+		return;
 
-	/* No race on the list, because no chain could be destroyed. */
-	list_for_each_entry(chain, &block->chain_list, list)
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
 		tcf_chain_flush(chain);
 
 	INIT_WORK(&block->work, tcf_block_put_final);
@@ -314,21 +308,6 @@ static void tcf_block_put_deferred(struct work_struct *work)
 	 */
 	rcu_barrier();
 	tcf_queue_work(&block->work);
-	rtnl_unlock();
-}
-
-void tcf_block_put(struct tcf_block *block)
-{
-	if (!block)
-		return;
-
-	INIT_WORK(&block->work, tcf_block_put_deferred);
-	/* Wait for existing RCU callbacks to cool down, make sure their works
-	 * have been queued before this. We can not flush pending works here
-	 * because we are holding the RTNL lock.
-	 */
-	rcu_barrier();
-	tcf_queue_work(&block->work);
 }
 EXPORT_SYMBOL(tcf_block_put);
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Patch net] net_sched: remove tcf_block_put_deferred()
  2017-10-30 18:10 [Patch net] net_sched: remove tcf_block_put_deferred() Cong Wang
@ 2017-10-31  2:06 ` David Miller
  2017-10-31 10:40 ` Jiri Pirko
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2017-10-31  2:06 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: netdev, daniel, jiri, john.fastabend, jhs, paulmck, edumazet

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 30 Oct 2017 11:10:09 -0700

> In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
> I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
> because qdisc is already destroyed after we queue this work.
> 
> The tcf_block_put_deferred() is no longer necessary after we get RTNL
> for each tc filter destroy work, no others could jump in at this point.
> Same for tcf_chain_hold(), we are fully serialized now.
> 
> This also reduces one indirection therefore makes the code more
> readable. Note this brings back a rcu_barrier(), however comparing
> to the code prior to commit 7aa0045dadb6 we still reduced one
> rcu_barrier(). For net-next, we can consider to refcnt tcf block to
> avoid it.
> 
> Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks for fixing this use-after-free so quickly.

This will be another fun merge into net-next :-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch net] net_sched: remove tcf_block_put_deferred()
  2017-10-30 18:10 [Patch net] net_sched: remove tcf_block_put_deferred() Cong Wang
  2017-10-31  2:06 ` David Miller
@ 2017-10-31 10:40 ` Jiri Pirko
  2017-10-31 21:47   ` Cong Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Jiri Pirko @ 2017-10-31 10:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, Daniel Borkmann, John Fastabend, Jamal Hadi Salim,
	Paul E. McKenney, Eric Dumazet

Mon, Oct 30, 2017 at 07:10:09PM CET, xiyou.wangcong@gmail.com wrote:
>In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
>I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
>because qdisc is already destroyed after we queue this work.
>
>The tcf_block_put_deferred() is no longer necessary after we get RTNL
>for each tc filter destroy work, no others could jump in at this point.
>Same for tcf_chain_hold(), we are fully serialized now.
>
>This also reduces one indirection therefore makes the code more
>readable. Note this brings back a rcu_barrier(), however comparing
>to the code prior to commit 7aa0045dadb6 we still reduced one
>rcu_barrier(). For net-next, we can consider to refcnt tcf block to
>avoid it.
>
>Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
>Cc: Daniel Borkmann <daniel@iogearbox.net>
>Cc: Jiri Pirko <jiri@resnulli.us>
>Cc: John Fastabend <john.fastabend@gmail.com>
>Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>Cc: Eric Dumazet <edumazet@google.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>---
> net/sched/cls_api.c | 37 ++++++++-----------------------------
> 1 file changed, 8 insertions(+), 29 deletions(-)
>
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index 231181c602ed..b2d310745487 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -280,8 +280,8 @@ static void tcf_block_put_final(struct work_struct *work)
> 	struct tcf_block *block = container_of(work, struct tcf_block, work);
> 	struct tcf_chain *chain, *tmp;
> 
>-	/* At this point, all the chains should have refcnt == 1. */
> 	rtnl_lock();
>+	/* Only chain 0 should be still here. */
> 	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
> 		tcf_chain_put(chain);
> 	rtnl_unlock();
>@@ -289,23 +289,17 @@ static void tcf_block_put_final(struct work_struct *work)
> }
> 
> /* XXX: Standalone actions are not allowed to jump to any chain, and bound
>- * actions should be all removed after flushing. However, filters are destroyed
>- * in RCU callbacks, we have to hold the chains first, otherwise we would
>- * always race with RCU callbacks on this list without proper locking.
>+ * actions should be all removed after flushing. However, filters are now
>+ * destroyed in tc filter workqueue with RTNL lock, they can not race here.
>  */
>-static void tcf_block_put_deferred(struct work_struct *work)
>+void tcf_block_put(struct tcf_block *block)
> {
>-	struct tcf_block *block = container_of(work, struct tcf_block, work);
>-	struct tcf_chain *chain;
>+	struct tcf_chain *chain, *tmp;
> 
>-	rtnl_lock();
>-	/* Hold a refcnt for all chains, except 0, in case they are gone. */
>-	list_for_each_entry(chain, &block->chain_list, list)
>-		if (chain->index)
>-			tcf_chain_hold(chain);
>+	if (!block)
>+		return;
> 
>-	/* No race on the list, because no chain could be destroyed. */
>-	list_for_each_entry(chain, &block->chain_list, list)
>+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
> 		tcf_chain_flush(chain);

The reason for the hold above was to avoid use after free in this loop.
Consider following example:

chain1
  1 filter with action goto_chain 2
chain2
  empty

Now in your list_for_each_entry_safe loop, chain1 is flushed, action is
removed and chain is put:
tcf_action_goto_chain_fini->tcf_chain_put(2)

Given the fact chain2 is empty, this put would lead to tcf_chain_destroy(2)

Then in another iteration of list_for_each_entry_safe you are using
already freed chain.

Am I missing something that prevents this?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch net] net_sched: remove tcf_block_put_deferred()
  2017-10-31 10:40 ` Jiri Pirko
@ 2017-10-31 21:47   ` Cong Wang
  2017-11-01  8:11     ` Jiri Pirko
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2017-10-31 21:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Kernel Network Developers, Daniel Borkmann, John Fastabend,
	Jamal Hadi Salim, Paul E. McKenney, Eric Dumazet

On Tue, Oct 31, 2017 at 3:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, Oct 30, 2017 at 07:10:09PM CET, xiyou.wangcong@gmail.com wrote:
>>In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
>>I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
>>because qdisc is already destroyed after we queue this work.
>>
>>The tcf_block_put_deferred() is no longer necessary after we get RTNL
>>for each tc filter destroy work, no others could jump in at this point.
>>Same for tcf_chain_hold(), we are fully serialized now.
>>
>>This also reduces one indirection therefore makes the code more
>>readable. Note this brings back a rcu_barrier(), however comparing
>>to the code prior to commit 7aa0045dadb6 we still reduced one
>>rcu_barrier(). For net-next, we can consider to refcnt tcf block to
>>avoid it.
>>
>>Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
>>Cc: Daniel Borkmann <daniel@iogearbox.net>
>>Cc: Jiri Pirko <jiri@resnulli.us>
>>Cc: John Fastabend <john.fastabend@gmail.com>
>>Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>>Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>>Cc: Eric Dumazet <edumazet@google.com>
>>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>---
>> net/sched/cls_api.c | 37 ++++++++-----------------------------
>> 1 file changed, 8 insertions(+), 29 deletions(-)
>>
>>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>index 231181c602ed..b2d310745487 100644
>>--- a/net/sched/cls_api.c
>>+++ b/net/sched/cls_api.c
>>@@ -280,8 +280,8 @@ static void tcf_block_put_final(struct work_struct *work)
>>       struct tcf_block *block = container_of(work, struct tcf_block, work);
>>       struct tcf_chain *chain, *tmp;
>>
>>-      /* At this point, all the chains should have refcnt == 1. */
>>       rtnl_lock();
>>+      /* Only chain 0 should be still here. */
>>       list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>               tcf_chain_put(chain);
>>       rtnl_unlock();
>>@@ -289,23 +289,17 @@ static void tcf_block_put_final(struct work_struct *work)
>> }
>>
>> /* XXX: Standalone actions are not allowed to jump to any chain, and bound
>>- * actions should be all removed after flushing. However, filters are destroyed
>>- * in RCU callbacks, we have to hold the chains first, otherwise we would
>>- * always race with RCU callbacks on this list without proper locking.
>>+ * actions should be all removed after flushing. However, filters are now
>>+ * destroyed in tc filter workqueue with RTNL lock, they can not race here.
>>  */
>>-static void tcf_block_put_deferred(struct work_struct *work)
>>+void tcf_block_put(struct tcf_block *block)
>> {
>>-      struct tcf_block *block = container_of(work, struct tcf_block, work);
>>-      struct tcf_chain *chain;
>>+      struct tcf_chain *chain, *tmp;
>>
>>-      rtnl_lock();
>>-      /* Hold a refcnt for all chains, except 0, in case they are gone. */
>>-      list_for_each_entry(chain, &block->chain_list, list)
>>-              if (chain->index)
>>-                      tcf_chain_hold(chain);
>>+      if (!block)
>>+              return;
>>
>>-      /* No race on the list, because no chain could be destroyed. */
>>-      list_for_each_entry(chain, &block->chain_list, list)
>>+      list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>               tcf_chain_flush(chain);
>
> The reason for the hold above was to avoid use after free in this loop.
> Consider following example:
>
> chain1
>   1 filter with action goto_chain 2
> chain2
>   empty
>
> Now in your list_for_each_entry_safe loop, chain1 is flushed, action is
> removed and chain is put:
> tcf_action_goto_chain_fini->tcf_chain_put(2)
>
> Given the fact chain2 is empty, this put would lead to tcf_chain_destroy(2)
>
> Then in another iteration of list_for_each_entry_safe you are using
> already freed chain.
>
> Am I missing something that prevents this?

Actions are destroyed in filter's ->destroy(), and all filters now
use RCU callback+workqueue to defer the tcf_action_goto_chain_fini()
and with RTNL, so this function won't be executed until we release
RTNL. This is what I mean by "fully serialized".

Or if there any other case I still miss?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch net] net_sched: remove tcf_block_put_deferred()
  2017-10-31 21:47   ` Cong Wang
@ 2017-11-01  8:11     ` Jiri Pirko
  0 siblings, 0 replies; 5+ messages in thread
From: Jiri Pirko @ 2017-11-01  8:11 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Daniel Borkmann, John Fastabend,
	Jamal Hadi Salim, Paul E. McKenney, Eric Dumazet

Tue, Oct 31, 2017 at 10:47:48PM CET, xiyou.wangcong@gmail.com wrote:
>On Tue, Oct 31, 2017 at 3:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Mon, Oct 30, 2017 at 07:10:09PM CET, xiyou.wangcong@gmail.com wrote:
>>>In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
>>>I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
>>>because qdisc is already destroyed after we queue this work.
>>>
>>>The tcf_block_put_deferred() is no longer necessary after we get RTNL
>>>for each tc filter destroy work, no others could jump in at this point.
>>>Same for tcf_chain_hold(), we are fully serialized now.
>>>
>>>This also reduces one indirection therefore makes the code more
>>>readable. Note this brings back a rcu_barrier(), however comparing
>>>to the code prior to commit 7aa0045dadb6 we still reduced one
>>>rcu_barrier(). For net-next, we can consider to refcnt tcf block to
>>>avoid it.
>>>
>>>Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
>>>Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>Cc: Jiri Pirko <jiri@resnulli.us>
>>>Cc: John Fastabend <john.fastabend@gmail.com>
>>>Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>>>Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>>>Cc: Eric Dumazet <edumazet@google.com>
>>>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>---
>>> net/sched/cls_api.c | 37 ++++++++-----------------------------
>>> 1 file changed, 8 insertions(+), 29 deletions(-)
>>>
>>>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>index 231181c602ed..b2d310745487 100644
>>>--- a/net/sched/cls_api.c
>>>+++ b/net/sched/cls_api.c
>>>@@ -280,8 +280,8 @@ static void tcf_block_put_final(struct work_struct *work)
>>>       struct tcf_block *block = container_of(work, struct tcf_block, work);
>>>       struct tcf_chain *chain, *tmp;
>>>
>>>-      /* At this point, all the chains should have refcnt == 1. */
>>>       rtnl_lock();
>>>+      /* Only chain 0 should be still here. */
>>>       list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>>               tcf_chain_put(chain);
>>>       rtnl_unlock();
>>>@@ -289,23 +289,17 @@ static void tcf_block_put_final(struct work_struct *work)
>>> }
>>>
>>> /* XXX: Standalone actions are not allowed to jump to any chain, and bound
>>>- * actions should be all removed after flushing. However, filters are destroyed
>>>- * in RCU callbacks, we have to hold the chains first, otherwise we would
>>>- * always race with RCU callbacks on this list without proper locking.
>>>+ * actions should be all removed after flushing. However, filters are now
>>>+ * destroyed in tc filter workqueue with RTNL lock, they can not race here.
>>>  */
>>>-static void tcf_block_put_deferred(struct work_struct *work)
>>>+void tcf_block_put(struct tcf_block *block)
>>> {
>>>-      struct tcf_block *block = container_of(work, struct tcf_block, work);
>>>-      struct tcf_chain *chain;
>>>+      struct tcf_chain *chain, *tmp;
>>>
>>>-      rtnl_lock();
>>>-      /* Hold a refcnt for all chains, except 0, in case they are gone. */
>>>-      list_for_each_entry(chain, &block->chain_list, list)
>>>-              if (chain->index)
>>>-                      tcf_chain_hold(chain);
>>>+      if (!block)
>>>+              return;
>>>
>>>-      /* No race on the list, because no chain could be destroyed. */
>>>-      list_for_each_entry(chain, &block->chain_list, list)
>>>+      list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>>               tcf_chain_flush(chain);
>>
>> The reason for the hold above was to avoid use after free in this loop.
>> Consider following example:
>>
>> chain1
>>   1 filter with action goto_chain 2
>> chain2
>>   empty
>>
>> Now in your list_for_each_entry_safe loop, chain1 is flushed, action is
>> removed and chain is put:
>> tcf_action_goto_chain_fini->tcf_chain_put(2)
>>
>> Given the fact chain2 is empty, this put would lead to tcf_chain_destroy(2)
>>
>> Then in another iteration of list_for_each_entry_safe you are using
>> already freed chain.
>>
>> Am I missing something that prevents this?
>
>Actions are destroyed in filter's ->destroy(), and all filters now
>use RCU callback+workqueue to defer the tcf_action_goto_chain_fini()
>and with RTNL, so this function won't be executed until we release
>RTNL. This is what I mean by "fully serialized".

Ah! I see it now. You did that in the previous patchset. It confused me
that you remove the hold now.

I don't see any issue now. Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-11-01  8:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 18:10 [Patch net] net_sched: remove tcf_block_put_deferred() Cong Wang
2017-10-31  2:06 ` David Miller
2017-10-31 10:40 ` Jiri Pirko
2017-10-31 21:47   ` Cong Wang
2017-11-01  8:11     ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).