All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
@ 2015-05-20 15:13 Daniel Borkmann
  2015-05-20 17:38 ` Cong Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Daniel Borkmann @ 2015-05-20 15:13 UTC (permalink / raw)
  To: davem
  Cc: subramanian.vijay, netdev, Daniel Borkmann, John Fastabend,
	Eric Dumazet, Thomas Graf, Jamal Hadi Salim, Alexei Starovoitov

Vijay reported that a loop as simple as ...

  while true; do
    tc qdisc add dev foo root handle 1: prio
    tc filter add dev foo parent 1: u32 match u32 0 0  flowid 1
    tc qdisc del dev foo root
    rmmod cls_u32
  done

... will panic the kernel. Moreover, he bisected the change
apparently introducing it to 78fd1d0ab072 ("netlink: Re-add
locking to netlink_lookup() and seq walker").

The removal of synchronize_net() from the netlink socket
triggering the qdisc to be removed, seems to have uncovered
an RCU resp. module reference count race from the tc API.
Given that RCU conversion was done after e341694e3eb5 ("netlink:
Convert netlink_lookup() to use RCU protected hash table")
which added the synchronize_net() originally, occasion of
hitting the bug was less likely (not impossible though):

When qdiscs that i) support attaching classifiers and,
ii) have at least one of them attached, get deleted, they
invoke tcf_destroy_chain(), and thus call into ->destroy()
handler from a classifier module.

After RCU conversion, all classifier that have an internal
prio list, unlink them and initiate freeing via call_rcu()
deferral.

Meanhile, tcf_destroy() releases already reference to the
tp->ops->owner module before the queued RCU callback handler
has been invoked.

Subsequent rmmod on the classifier module is then not prevented
since all module references are already dropped.

By the time, the kernel invokes the RCU callback handler from
the module, that function address is then invalid.

One way to fix it would be to add an rcu_barrier() to
unregister_tcf_proto_ops() to wait for all pending call_rcu()s
to complete.

synchronize_rcu() is not appropriate as under heavy RCU
callback load, registered call_rcu()s could be deferred
longer than a grace period. In case we don't have any pending
call_rcu()s, the barrier is allowed to return immediately.

Since we came here via unregister_tcf_proto_ops(), there
are no users of a given classifier anymore. Further nested
call_rcu()s pointing into the module space are not being
done anywhere.

Only cls_bpf_delete_prog() may schedule a work item, to
unlock pages eventually, but that is not in the range/context
of cls_bpf anymore.

Fixes: 25d8c0d55f24 ("net: rcu-ify tcf_proto")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
 (For Fixes tags, I've added the first ones that should be affected.)

 Vijay, can you give this a run on your side? This fixes the panic
 on my side, thanks!

 net/sched/cls_api.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b6ef9a0..a75864d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -81,6 +81,11 @@ int unregister_tcf_proto_ops(struct tcf_proto_ops *ops)
 	struct tcf_proto_ops *t;
 	int rc = -ENOENT;
 
+	/* Wait for outstanding call_rcu()s, if any, from a
+	 * tcf_proto_ops's destroy() handler.
+	 */
+	rcu_barrier();
+
 	write_lock(&cls_mod_lock);
 	list_for_each_entry(t, &tcf_proto_base, head) {
 		if (t == ops) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 15:13 [PATCH net] net: sched: fix call_rcu() race on classifier module unloads Daniel Borkmann
@ 2015-05-20 17:38 ` Cong Wang
  2015-05-20 18:12   ` Daniel Borkmann
  2015-05-20 18:01 ` Vijay Subramanian
  2015-05-21 22:48 ` David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2015-05-20 17:38 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, subramanian.vijay, netdev, John Fastabend,
	Eric Dumazet, Thomas Graf, Jamal Hadi Salim, Alexei Starovoitov

On Wed, May 20, 2015 at 8:13 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Vijay reported that a loop as simple as ...
>
>   while true; do
>     tc qdisc add dev foo root handle 1: prio
>     tc filter add dev foo parent 1: u32 match u32 0 0  flowid 1
>     tc qdisc del dev foo root
>     rmmod cls_u32
>   done
>
> ... will panic the kernel. Moreover, he bisected the change
> apparently introducing it to 78fd1d0ab072 ("netlink: Re-add
> locking to netlink_lookup() and seq walker").


Please include the stack trace whenever possible, it helps
a lots even just for reviewing the patch.

>
> The removal of synchronize_net() from the netlink socket
> triggering the qdisc to be removed, seems to have uncovered
> an RCU resp. module reference count race from the tc API.
> Given that RCU conversion was done after e341694e3eb5 ("netlink:
> Convert netlink_lookup() to use RCU protected hash table")
> which added the synchronize_net() originally, occasion of
> hitting the bug was less likely (not impossible though):
>
> When qdiscs that i) support attaching classifiers and,
> ii) have at least one of them attached, get deleted, they
> invoke tcf_destroy_chain(), and thus call into ->destroy()
> handler from a classifier module.
>
> After RCU conversion, all classifier that have an internal
> prio list, unlink them and initiate freeing via call_rcu()
> deferral.
>
> Meanhile, tcf_destroy() releases already reference to the
> tp->ops->owner module before the queued RCU callback handler
> has been invoked.
>
> Subsequent rmmod on the classifier module is then not prevented
> since all module references are already dropped.
>
> By the time, the kernel invokes the RCU callback handler from
> the module, that function address is then invalid.
>
> One way to fix it would be to add an rcu_barrier() to
> unregister_tcf_proto_ops() to wait for all pending call_rcu()s
> to complete.
>
> synchronize_rcu() is not appropriate as under heavy RCU
> callback load, registered call_rcu()s could be deferred
> longer than a grace period. In case we don't have any pending
> call_rcu()s, the barrier is allowed to return immediately.


Why synchronize_rcu() even matters here? It waits for
readers, not for RCU callbacks.

>
> Since we came here via unregister_tcf_proto_ops(), there
> are no users of a given classifier anymore. Further nested
> call_rcu()s pointing into the module space are not being
> done anywhere.


This doesn't look like the best way to fix it, since calling
call_rcu() is tc filter specific, so why not just move the
rcu_barrier() to each of the ->destroy() implementation?
Let each filter handle its own implementation bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 15:13 [PATCH net] net: sched: fix call_rcu() race on classifier module unloads Daniel Borkmann
  2015-05-20 17:38 ` Cong Wang
@ 2015-05-20 18:01 ` Vijay Subramanian
  2015-05-21 22:48 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Vijay Subramanian @ 2015-05-20 18:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, netdev, John Fastabend, Eric Dumazet, Thomas Graf,
	Jamal Hadi Salim, Alexei Starovoitov

>  Vijay, can you give this a run on your side? This fixes the panic
>  on my side, thanks!

Thanks for the explanation and patch Daniel !
Yes. This also fixes the crash for me. So,

Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>

Vijay

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 17:38 ` Cong Wang
@ 2015-05-20 18:12   ` Daniel Borkmann
  2015-05-20 18:30     ` Cong Wang
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Daniel Borkmann @ 2015-05-20 18:12 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, subramanian.vijay, netdev, John Fastabend,
	Eric Dumazet, Thomas Graf, Jamal Hadi Salim, Alexei Starovoitov

On 05/20/2015 07:38 PM, Cong Wang wrote:
...
> Why synchronize_rcu() even matters here? It waits for
> readers, not for RCU callbacks.

Hm, I am mentioning it here as it was related to 78fd1d0ab072
as explained in the commit message.

>> Since we came here via unregister_tcf_proto_ops(), there
>> are no users of a given classifier anymore. Further nested
>> call_rcu()s pointing into the module space are not being
>> done anywhere.
>
> This doesn't look like the best way to fix it, since calling
> call_rcu() is tc filter specific, so why not just move the
> rcu_barrier() to each of the ->destroy() implementation?
> Let each filter handle its own implementation bug.

Effectively, every in-tree classifier (rsvp is the only exception)
is making use of call_rcu(). Moreover, moving this into every
->destroy() handler would also be unnecessary overhead, imho, as
this is only relevant when we actually _unload_ a module.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 18:12   ` Daniel Borkmann
@ 2015-05-20 18:30     ` Cong Wang
  2015-05-20 18:32     ` Alexei Starovoitov
  2015-05-20 18:36     ` Eric Dumazet
  2 siblings, 0 replies; 8+ messages in thread
From: Cong Wang @ 2015-05-20 18:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, subramanian.vijay, netdev, John Fastabend,
	Eric Dumazet, Thomas Graf, Jamal Hadi Salim, Alexei Starovoitov

On Wed, May 20, 2015 at 11:12 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 05/20/2015 07:38 PM, Cong Wang wrote:
> ...
>>
>> Why synchronize_rcu() even matters here? It waits for
>> readers, not for RCU callbacks.
>
>
> Hm, I am mentioning it here as it was related to 78fd1d0ab072
> as explained in the commit message.


Move it to the right place or remove it, it is only confusing.


>
>>> Since we came here via unregister_tcf_proto_ops(), there
>>> are no users of a given classifier anymore. Further nested
>>> call_rcu()s pointing into the module space are not being
>>> done anywhere.
>>
>>
>> This doesn't look like the best way to fix it, since calling
>> call_rcu() is tc filter specific, so why not just move the
>> rcu_barrier() to each of the ->destroy() implementation?
>> Let each filter handle its own implementation bug.
>
>
> Effectively, every in-tree classifier (rsvp is the only exception)
> is making use of call_rcu(). Moreover, moving this into every
> ->destroy() handler would also be unnecessary overhead, imho, as
> this is only relevant when we actually _unload_ a module.

Well, ->destroy() is not a fast path, it holds rtnl lock, we don't want to
lose readability for little performance gain. The code is hard to
understand even with a comment there, move it to tcf_destroy() so
that we probably don't even need a comment there.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 18:12   ` Daniel Borkmann
  2015-05-20 18:30     ` Cong Wang
@ 2015-05-20 18:32     ` Alexei Starovoitov
  2015-05-20 18:36     ` Eric Dumazet
  2 siblings, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2015-05-20 18:32 UTC (permalink / raw)
  To: Daniel Borkmann, Cong Wang
  Cc: David Miller, subramanian.vijay, netdev, John Fastabend,
	Eric Dumazet, Thomas Graf, Jamal Hadi Salim

On 5/20/15 11:12 AM, Daniel Borkmann wrote:
>
> Effectively, every in-tree classifier (rsvp is the only exception)
> is making use of call_rcu(). Moreover, moving this into every
> ->destroy() handler would also be unnecessary overhead, imho, as
> this is only relevant when we actually _unload_ a module.

+1
indeed unregister_tcf_proto_ops() is only called at module unload time.
So there is no overhead from this rcu_barrier(). It's actually
mandatory. All modules with rcu callbacks should do that.

Acked-by: Alexei Starovoitov <ast@plumgrid.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 18:12   ` Daniel Borkmann
  2015-05-20 18:30     ` Cong Wang
  2015-05-20 18:32     ` Alexei Starovoitov
@ 2015-05-20 18:36     ` Eric Dumazet
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2015-05-20 18:36 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Cong Wang, David Miller, subramanian.vijay, netdev,
	John Fastabend, Eric Dumazet, Thomas Graf, Jamal Hadi Salim,
	Alexei Starovoitov

On Wed, 2015-05-20 at 20:12 +0200, Daniel Borkmann wrote:

> Effectively, every in-tree classifier (rsvp is the only exception)
> is making use of call_rcu(). Moreover, moving this into every
> ->destroy() handler would also be unnecessary overhead, imho, as
> this is only relevant when we actually _unload_ a module.

I kind of agree with you, not sure there is any gain from spreading this
fix all over the places.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: sched: fix call_rcu() race on classifier module unloads
  2015-05-20 15:13 [PATCH net] net: sched: fix call_rcu() race on classifier module unloads Daniel Borkmann
  2015-05-20 17:38 ` Cong Wang
  2015-05-20 18:01 ` Vijay Subramanian
@ 2015-05-21 22:48 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2015-05-21 22:48 UTC (permalink / raw)
  To: daniel
  Cc: subramanian.vijay, netdev, john.r.fastabend, edumazet, tgraf, jhs, ast

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 20 May 2015 17:13:33 +0200

> Vijay reported that a loop as simple as ...
 ...
> ... will panic the kernel. Moreover, he bisected the change
> apparently introducing it to 78fd1d0ab072 ("netlink: Re-add
> locking to netlink_lookup() and seq walker").
> 
> The removal of synchronize_net() from the netlink socket
> triggering the qdisc to be removed, seems to have uncovered
> an RCU resp. module reference count race from the tc API.
> Given that RCU conversion was done after e341694e3eb5 ("netlink:
> Convert netlink_lookup() to use RCU protected hash table")
> which added the synchronize_net() originally, occasion of
> hitting the bug was less likely (not impossible though):
> 
> When qdiscs that i) support attaching classifiers and,
> ii) have at least one of them attached, get deleted, they
> invoke tcf_destroy_chain(), and thus call into ->destroy()
> handler from a classifier module.
> 
> After RCU conversion, all classifier that have an internal
> prio list, unlink them and initiate freeing via call_rcu()
> deferral.
> 
> Meanhile, tcf_destroy() releases already reference to the
> tp->ops->owner module before the queued RCU callback handler
> has been invoked.
> 
> Subsequent rmmod on the classifier module is then not prevented
> since all module references are already dropped.
> 
> By the time, the kernel invokes the RCU callback handler from
> the module, that function address is then invalid.
> 
> One way to fix it would be to add an rcu_barrier() to
> unregister_tcf_proto_ops() to wait for all pending call_rcu()s
> to complete.
> 
> synchronize_rcu() is not appropriate as under heavy RCU
> callback load, registered call_rcu()s could be deferred
> longer than a grace period. In case we don't have any pending
> call_rcu()s, the barrier is allowed to return immediately.
> 
> Since we came here via unregister_tcf_proto_ops(), there
> are no users of a given classifier anymore. Further nested
> call_rcu()s pointing into the module space are not being
> done anywhere.
> 
> Only cls_bpf_delete_prog() may schedule a work item, to
> unlock pages eventually, but that is not in the range/context
> of cls_bpf anymore.
> 
> Fixes: 25d8c0d55f24 ("net: rcu-ify tcf_proto")
> Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
> Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Applied, thanks a lot Daniel.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-21 22:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-20 15:13 [PATCH net] net: sched: fix call_rcu() race on classifier module unloads Daniel Borkmann
2015-05-20 17:38 ` Cong Wang
2015-05-20 18:12   ` Daniel Borkmann
2015-05-20 18:30     ` Cong Wang
2015-05-20 18:32     ` Alexei Starovoitov
2015-05-20 18:36     ` Eric Dumazet
2015-05-20 18:01 ` Vijay Subramanian
2015-05-21 22:48 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.