[PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
@ 2016-11-22 14:25 Roi Dayan
  2016-11-22 14:48 ` Jiri Pirko
  0 siblings, 1 reply; 16+ messages in thread
From: Roi Dayan @ 2016-11-22 14:25 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Cong Wang, Or Gerlitz, Roi Dayan, Cong Wang

tp->root is being allocated in init() time and kfreed in destroy()
however it is being dereferenced in classify() path.

We could be in classify() path after destroy() was called and thus 
tp->root is null. Verifying if tp->root is null in classify() path 
is enough because it's being freed with kfree_rcu() and classify() 
path is under rcu_read_lock().

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Cc: Cong Wang <cwang@twopensource.com>
---

Hi Cong, all

As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: destroy 
proto tp when all filters are gone"). This patch provides a fix only for cls_flower where 
I succeeded in reproducing the issue. Cong, if you can/want to come up with a fix that
will be applicable for all the others classifiners, I am fine with that.

Thanks,
Roi

 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..88a26c4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	struct fl_flow_key skb_mkey;
 	struct ip_tunnel_info *info;

-	if (!atomic_read(&head->ht.nelems))
+	if (!head || !atomic_read(&head->ht.nelems))
 		return -1;

 	fl_clear_masked_range(&skb_key, &head->mask);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 14:25 [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it Roi Dayan
@ 2016-11-22 14:48 ` Jiri Pirko
  2016-11-22 15:37   ` David Miller
  2016-11-22 16:04   ` Daniel Borkmann
  0 siblings, 2 replies; 16+ messages in thread
From: Jiri Pirko @ 2016-11-22 14:48 UTC (permalink / raw)
  To: Roi Dayan
  Cc: David S. Miller, netdev, Jiri Pirko, Cong Wang, Or Gerlitz, Cong Wang

Tue, Nov 22, 2016 at 03:25:26PM CET, roid@mellanox.com wrote:
>tp->root is being allocated in init() time and kfreed in destroy()
>however it is being dereferenced in classify() path.
>
>We could be in classify() path after destroy() was called and thus 
>tp->root is null. Verifying if tp->root is null in classify() path 
>is enough because it's being freed with kfree_rcu() and classify() 
>path is under rcu_read_lock().
>
>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>Signed-off-by: Roi Dayan <roid@mellanox.com>
>Cc: Cong Wang <cwang@twopensource.com>

This is correct

Reviewed-by: Jiri Pirko <jiri@mellanox.com>

The other way to fix this would be to move tp->ops->destroy call to
call_rcu phase. That would require bigger changes though. net-next
perhaps?



>---
>
>Hi Cong, all
>
>As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: destroy 
>proto tp when all filters are gone"). This patch provides a fix only for cls_flower where 
>I succeeded in reproducing the issue. Cong, if you can/want to come up with a fix that
>will be applicable for all the others classifiners, I am fine with that.
>
>Thanks,
>Roi
>
>
> net/sched/cls_flower.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index e8dd09a..88a26c4 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> 	struct fl_flow_key skb_mkey;
> 	struct ip_tunnel_info *info;
> 
>-	if (!atomic_read(&head->ht.nelems))
>+	if (!head || !atomic_read(&head->ht.nelems))
> 		return -1;
> 
> 	fl_clear_masked_range(&skb_key, &head->mask);
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 14:48 ` Jiri Pirko
@ 2016-11-22 15:37   ` David Miller
  2016-11-22 16:13     ` Jiri Pirko
  2016-11-22 16:04   ` Daniel Borkmann
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2016-11-22 15:37 UTC (permalink / raw)
  To: jiri; +Cc: roid, netdev, jiri, xiyou.wangcong, ogerlitz, cwang

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 22 Nov 2016 15:48:44 +0100

> Tue, Nov 22, 2016 at 03:25:26PM CET, roid@mellanox.com wrote:
>>tp->root is being allocated in init() time and kfreed in destroy()
>>however it is being dereferenced in classify() path.
>>
>>We could be in classify() path after destroy() was called and thus 
>>tp->root is null. Verifying if tp->root is null in classify() path 
>>is enough because it's being freed with kfree_rcu() and classify() 
>>path is under rcu_read_lock().
>>
>>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>>Signed-off-by: Roi Dayan <roid@mellanox.com>
>>Cc: Cong Wang <cwang@twopensource.com>
> 
> This is correct
> 
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> 
> The other way to fix this would be to move tp->ops->destroy call to
> call_rcu phase. That would require bigger changes though. net-next
> perhaps?

This patch is targetted at net-next as per Subj.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 14:48 ` Jiri Pirko
  2016-11-22 15:37   ` David Miller
@ 2016-11-22 16:04   ` Daniel Borkmann
  2016-11-22 16:11     ` Jiri Pirko
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2016-11-22 16:04 UTC (permalink / raw)
  To: Jiri Pirko, Roi Dayan
  Cc: David S. Miller, netdev, Jiri Pirko, Cong Wang, Or Gerlitz,
	Cong Wang, john.fastabend

[ + John ]

On 11/22/2016 03:48 PM, Jiri Pirko wrote:
> Tue, Nov 22, 2016 at 03:25:26PM CET, roid@mellanox.com wrote:
>> tp->root is being allocated in init() time and kfreed in destroy()
>> however it is being dereferenced in classify() path.
>>
>> We could be in classify() path after destroy() was called and thus
>> tp->root is null. Verifying if tp->root is null in classify() path
>> is enough because it's being freed with kfree_rcu() and classify()
>> path is under rcu_read_lock().
>>
>> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>> Signed-off-by: Roi Dayan <roid@mellanox.com>
>> Cc: Cong Wang <cwang@twopensource.com>
>
> This is correct
>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>
> The other way to fix this would be to move tp->ops->destroy call to
> call_rcu phase. That would require bigger changes though. net-next
> perhaps?

Hmm, I don't think we want to have such an additional test in fast
path for each and every classifier. Can we think of ways to avoid that?

My question is, since we unlink individual instances from such tp-internal
lists through RCU and release the instance through call_rcu() as well as
the head (tp->root) via kfree_rcu() eventually, against what are we protecting
setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
not respecting grace period?

The only thing that actually checks if tp->root is NULL right now is the
get() callback. Is that the reason why tp->root is RCU'ified? John?

Thanks,
Daniel

>> Hi Cong, all
>>
>> As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: destroy
>> proto tp when all filters are gone"). This patch provides a fix only for cls_flower where
>> I succeeded in reproducing the issue. Cong, if you can/want to come up with a fix that
>> will be applicable for all the others classifiners, I am fine with that.
>>
>> Thanks,
>> Roi
>>
>>
>> net/sched/cls_flower.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index e8dd09a..88a26c4 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>> 	struct fl_flow_key skb_mkey;
>> 	struct ip_tunnel_info *info;
>>
>> -	if (!atomic_read(&head->ht.nelems))
>> +	if (!head || !atomic_read(&head->ht.nelems))
>> 		return -1;
>>
>> 	fl_clear_masked_range(&skb_key, &head->mask);
>> --
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 16:04   ` Daniel Borkmann
@ 2016-11-22 16:11     ` Jiri Pirko
  2016-11-22 19:28       ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Pirko @ 2016-11-22 16:11 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Roi Dayan, David S. Miller, netdev, Jiri Pirko, Cong Wang,
	Or Gerlitz, Cong Wang, john.fastabend

Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>[ + John ]
>
>On 11/22/2016 03:48 PM, Jiri Pirko wrote:
>> Tue, Nov 22, 2016 at 03:25:26PM CET, roid@mellanox.com wrote:
>> > tp->root is being allocated in init() time and kfreed in destroy()
>> > however it is being dereferenced in classify() path.
>> > 
>> > We could be in classify() path after destroy() was called and thus
>> > tp->root is null. Verifying if tp->root is null in classify() path
>> > is enough because it's being freed with kfree_rcu() and classify()
>> > path is under rcu_read_lock().
>> > 
>> > Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>> > Signed-off-by: Roi Dayan <roid@mellanox.com>
>> > Cc: Cong Wang <cwang@twopensource.com>
>> 
>> This is correct
>> 
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> 
>> The other way to fix this would be to move tp->ops->destroy call to
>> call_rcu phase. That would require bigger changes though. net-next
>> perhaps?
>
>Hmm, I don't think we want to have such an additional test in fast
>path for each and every classifier. Can we think of ways to avoid that?
>
>My question is, since we unlink individual instances from such tp-internal
>lists through RCU and release the instance through call_rcu() as well as
>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>not respecting grace period?

If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
to null.


>
>The only thing that actually checks if tp->root is NULL right now is the
>get() callback. Is that the reason why tp->root is RCU'ified? John?
>
>Thanks,
>Daniel
>
>> > Hi Cong, all
>> > 
>> > As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: destroy
>> > proto tp when all filters are gone"). This patch provides a fix only for cls_flower where
>> > I succeeded in reproducing the issue. Cong, if you can/want to come up with a fix that
>> > will be applicable for all the others classifiners, I am fine with that.
>> > 
>> > Thanks,
>> > Roi
>> > 
>> > 
>> > net/sched/cls_flower.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> > index e8dd09a..88a26c4 100644
>> > --- a/net/sched/cls_flower.c
>> > +++ b/net/sched/cls_flower.c
>> > @@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>> > 	struct fl_flow_key skb_mkey;
>> > 	struct ip_tunnel_info *info;
>> > 
>> > -	if (!atomic_read(&head->ht.nelems))
>> > +	if (!head || !atomic_read(&head->ht.nelems))
>> > 		return -1;
>> > 
>> > 	fl_clear_masked_range(&skb_key, &head->mask);
>> > --
>> > 2.7.4
>> > 
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 15:37   ` David Miller
@ 2016-11-22 16:13     ` Jiri Pirko
  0 siblings, 0 replies; 16+ messages in thread
From: Jiri Pirko @ 2016-11-22 16:13 UTC (permalink / raw)
  To: David Miller; +Cc: roid, netdev, jiri, xiyou.wangcong, ogerlitz, cwang

Tue, Nov 22, 2016 at 04:37:42PM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Tue, 22 Nov 2016 15:48:44 +0100
>
>> Tue, Nov 22, 2016 at 03:25:26PM CET, roid@mellanox.com wrote:
>>>tp->root is being allocated in init() time and kfreed in destroy()
>>>however it is being dereferenced in classify() path.
>>>
>>>We could be in classify() path after destroy() was called and thus 
>>>tp->root is null. Verifying if tp->root is null in classify() path 
>>>is enough because it's being freed with kfree_rcu() and classify() 
>>>path is under rcu_read_lock().
>>>
>>>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>>>Signed-off-by: Roi Dayan <roid@mellanox.com>
>>>Cc: Cong Wang <cwang@twopensource.com>
>> 
>> This is correct
>> 
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> 
>> The other way to fix this would be to move tp->ops->destroy call to
>> call_rcu phase. That would require bigger changes though. net-next
>> perhaps?
>
>This patch is targetted at net-next as per Subj.

Oh, right, then it should be fixed so the tp->head could be never null

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 16:11     ` Jiri Pirko
@ 2016-11-22 19:28       ` Cong Wang
  2016-11-22 20:41         ` Daniel Borkmann
  2016-11-24 20:25         ` David Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Cong Wang @ 2016-11-22 19:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Daniel Borkmann, Roi Dayan, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Or Gerlitz,
	Cong Wang, John Fastabend

On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>Hmm, I don't think we want to have such an additional test in fast
>>path for each and every classifier. Can we think of ways to avoid that?
>>
>>My question is, since we unlink individual instances from such tp-internal
>>lists through RCU and release the instance through call_rcu() as well as
>>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>not respecting grace period?
>
> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
> to null.

We do need to respect the grace period if we touch the globally visible
data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
right place.

Also I don't know why you blame my commit, this problem should already
exist prior to my commit, probably date back to John's RCU patches.

I am working on a patch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 19:28       ` Cong Wang
@ 2016-11-22 20:41         ` Daniel Borkmann
  2016-11-22 23:36           ` John Fastabend
  2016-11-24 20:25         ` David Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2016-11-22 20:41 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Roi Dayan, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Or Gerlitz, Cong Wang, John Fastabend

On 11/22/2016 08:28 PM, Cong Wang wrote:
> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>> Hmm, I don't think we want to have such an additional test in fast
>>> path for each and every classifier. Can we think of ways to avoid that?
>>>
>>> My question is, since we unlink individual instances from such tp-internal
>>> lists through RCU and release the instance through call_rcu() as well as
>>> the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>> not respecting grace period?
>>
>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>> to null.

But that's not really an answer to my question. ;)

> We do need to respect the grace period if we touch the globally visible
> data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
> right place.

I think there may be multiple issues actually.

At the time we go into tc_classify(), from ingress as well as egress side,
we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
we use kfree_rcu() on tp, although we iterate tps (and implicitly inner filters)
via rcu_dereference_bh() from reader side. Is there a reason why we don't
use call_rcu_bh() variant on destruction for all this instead?

Just looking at cls_bpf and others, what protects RCU_INIT_POINTER(tp->root,
NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
tcf_destroy() cases. Still active readers under RCU BH can race against this
(tp->root being NULL), as the commit identified. Only the get() callback checks
for head against NULL, but both are serialized under rtnl, and the only place
we call this is tc_ctl_tfilter(). Even if we create a new tp, head should not
be NULL there, if it was assigned during the init() cb, but contains an empty
list. (It's different for things like cls_cgroup, though.) So, I'm wondering
if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead (unless I'm
missing something obvious)?

> Also I don't know why you blame my commit, this problem should already
> exist prior to my commit, probably date back to John's RCU patches.

It seems so.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 20:41         ` Daniel Borkmann
@ 2016-11-22 23:36           ` John Fastabend
  2016-11-23  5:24             ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: John Fastabend @ 2016-11-22 23:36 UTC (permalink / raw)
  To: Daniel Borkmann, Cong Wang, Jiri Pirko
  Cc: Roi Dayan, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Or Gerlitz, Cong Wang

On 16-11-22 12:41 PM, Daniel Borkmann wrote:
> On 11/22/2016 08:28 PM, Cong Wang wrote:
>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>>> Hmm, I don't think we want to have such an additional test in fast
>>>> path for each and every classifier. Can we think of ways to avoid that?
>>>>
>>>> My question is, since we unlink individual instances from such
>>>> tp-internal
>>>> lists through RCU and release the instance through call_rcu() as
>>>> well as
>>>> the head (tp->root) via kfree_rcu() eventually, against what are we
>>>> protecting
>>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
>>>> Something
>>>> not respecting grace period?
>>>
>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>> to null.
> 
> But that's not really an answer to my question. ;)
> 
>> We do need to respect the grace period if we touch the globally visible
>> data structure tp in tcf_destroy(). Therefore Roi's patch is not
>> fixing the
>> right place.
> 
> I think there may be multiple issues actually.
> 
> At the time we go into tc_classify(), from ingress as well as egress side,
> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
> filters)
> via rcu_dereference_bh() from reader side. Is there a reason why we don't
> use call_rcu_bh() variant on destruction for all this instead?

I can't think of any if its all under _bh we can convert the call_rcu to
call_rcu_bh it just needs an audit.

> 
> Just looking at cls_bpf and others, what protects
> RCU_INIT_POINTER(tp->root,
> NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
> tcf_destroy() cases. Still active readers under RCU BH can race against
> this
> (tp->root being NULL), as the commit identified. Only the get() callback
> checks
> for head against NULL, but both are serialized under rtnl, and the only
> place
> we call this is tc_ctl_tfilter(). Even if we create a new tp, head
> should not
> be NULL there, if it was assigned during the init() cb, but contains an
> empty
> list. (It's different for things like cls_cgroup, though.) So, I'm
> wondering
> if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
> (unless I'm
> missing something obvious)?


Just took a look at this I think there are a couple possible solutions.
The easiest is likely to fix all the call sites so that 'tp' is unlinked
before calling the destroy() handlers AND not doing the NULL set. I only
see one such call site where destroy is called before unlinking at the
moment. This should enforce that after a grace period there is no path
to reach the classifiers because 'tp' is unlinked. Calling destroy
before unlinking 'tp' however could cause a small race between grace
period of 'tp' and grace period of the filter.

Another would be to only call the destroy path from the call_rcu path
of the 'tp' object so that destroy is only ever called after the object
is guaranteed to be unlinked from the tc_filter path.

I think both solutions would be fine.

Cong were you working on one of these? Or do you have another idea?


> 
>> Also I don't know why you blame my commit, this problem should already
>> exist prior to my commit, probably date back to John's RCU patches.
> 
> It seems so.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 23:36           ` John Fastabend
@ 2016-11-23  5:24             ` Cong Wang
  2016-11-23 11:29               ` Daniel Borkmann
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2016-11-23  5:24 UTC (permalink / raw)
  To: John Fastabend
  Cc: Daniel Borkmann, Jiri Pirko, Roi Dayan, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Or Gerlitz,
	Cong Wang

On Tue, Nov 22, 2016 at 3:36 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 16-11-22 12:41 PM, Daniel Borkmann wrote:
>> On 11/22/2016 08:28 PM, Cong Wang wrote:
>>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>>>> Hmm, I don't think we want to have such an additional test in fast
>>>>> path for each and every classifier. Can we think of ways to avoid that?
>>>>>
>>>>> My question is, since we unlink individual instances from such
>>>>> tp-internal
>>>>> lists through RCU and release the instance through call_rcu() as
>>>>> well as
>>>>> the head (tp->root) via kfree_rcu() eventually, against what are we
>>>>> protecting
>>>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
>>>>> Something
>>>>> not respecting grace period?
>>>>
>>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>>> to null.
>>
>> But that's not really an answer to my question. ;)
>>
>>> We do need to respect the grace period if we touch the globally visible
>>> data structure tp in tcf_destroy(). Therefore Roi's patch is not
>>> fixing the
>>> right place.
>>
>> I think there may be multiple issues actually.
>>
>> At the time we go into tc_classify(), from ingress as well as egress side,
>> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
>> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
>> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
>> filters)
>> via rcu_dereference_bh() from reader side. Is there a reason why we don't
>> use call_rcu_bh() variant on destruction for all this instead?
>
> I can't think of any if its all under _bh we can convert the call_rcu to
> call_rcu_bh it just needs an audit.
>
>>
>> Just looking at cls_bpf and others, what protects
>> RCU_INIT_POINTER(tp->root,
>> NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
>> tcf_destroy() cases. Still active readers under RCU BH can race against
>> this
>> (tp->root being NULL), as the commit identified. Only the get() callback
>> checks
>> for head against NULL, but both are serialized under rtnl, and the only
>> place
>> we call this is tc_ctl_tfilter(). Even if we create a new tp, head
>> should not
>> be NULL there, if it was assigned during the init() cb, but contains an
>> empty
>> list. (It's different for things like cls_cgroup, though.) So, I'm
>> wondering
>> if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
>> (unless I'm
>> missing something obvious)?
>
>
> Just took a look at this I think there are a couple possible solutions.
> The easiest is likely to fix all the call sites so that 'tp' is unlinked
> before calling the destroy() handlers AND not doing the NULL set. I only
> see one such call site where destroy is called before unlinking at the
> moment. This should enforce that after a grace period there is no path
> to reach the classifiers because 'tp' is unlinked. Calling destroy
> before unlinking 'tp' however could cause a small race between grace
> period of 'tp' and grace period of the filter.
>
> Another would be to only call the destroy path from the call_rcu path
> of the 'tp' object so that destroy is only ever called after the object
> is guaranteed to be unlinked from the tc_filter path.
>
> I think both solutions would be fine.
>
> Cong were you working on one of these? Or do you have another idea?

Yeah, this is basic what I think as well, however, both are hard.
On one hand, we can't detach the tp from the global singly-linked list
before tcf_destroy() since we rely on its return value to make this decision.
On the other hand, it is a singly-linked list, we have to pass in the address
of its previous pointer to rcu callback to remove it, it seems racy as well
since we modify a previous pointer which is still visible globally...

Hmm, perhaps we really have to switch to a doubly-linked list, that is
list_head. I need to double check. And also the semantic of ->destroy()
needs to revise too.

So yeah, my commit should be blamed. :-/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-23  5:24             ` Cong Wang
@ 2016-11-23 11:29               ` Daniel Borkmann
  2016-11-23 19:29                 ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2016-11-23 11:29 UTC (permalink / raw)
  To: Cong Wang, John Fastabend
  Cc: Jiri Pirko, Roi Dayan, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Or Gerlitz,
	Cong Wang

On 11/23/2016 06:24 AM, Cong Wang wrote:
> On Tue, Nov 22, 2016 at 3:36 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 16-11-22 12:41 PM, Daniel Borkmann wrote:
>>> On 11/22/2016 08:28 PM, Cong Wang wrote:
>>>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>>>>> Hmm, I don't think we want to have such an additional test in fast
>>>>>> path for each and every classifier. Can we think of ways to avoid that?
>>>>>>
>>>>>> My question is, since we unlink individual instances from such
>>>>>> tp-internal
>>>>>> lists through RCU and release the instance through call_rcu() as
>>>>>> well as
>>>>>> the head (tp->root) via kfree_rcu() eventually, against what are we
>>>>>> protecting
>>>>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
>>>>>> Something
>>>>>> not respecting grace period?
>>>>>
>>>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>>>> to null.
>>>
>>> But that's not really an answer to my question. ;)
>>>
>>>> We do need to respect the grace period if we touch the globally visible
>>>> data structure tp in tcf_destroy(). Therefore Roi's patch is not
>>>> fixing the
>>>> right place.
>>>
>>> I think there may be multiple issues actually.
>>>
>>> At the time we go into tc_classify(), from ingress as well as egress side,
>>> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
>>> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
>>> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
>>> filters)
>>> via rcu_dereference_bh() from reader side. Is there a reason why we don't
>>> use call_rcu_bh() variant on destruction for all this instead?
>>
>> I can't think of any if its all under _bh we can convert the call_rcu to
>> call_rcu_bh it just needs an audit.
>>
>>> Just looking at cls_bpf and others, what protects
>>> RCU_INIT_POINTER(tp->root,
>>> NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
>>> tcf_destroy() cases. Still active readers under RCU BH can race against
>>> this
>>> (tp->root being NULL), as the commit identified. Only the get() callback
>>> checks
>>> for head against NULL, but both are serialized under rtnl, and the only
>>> place
>>> we call this is tc_ctl_tfilter(). Even if we create a new tp, head
>>> should not
>>> be NULL there, if it was assigned during the init() cb, but contains an
>>> empty
>>> list. (It's different for things like cls_cgroup, though.) So, I'm
>>> wondering
>>> if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
>>> (unless I'm
>>> missing something obvious)?
>>
>> Just took a look at this I think there are a couple possible solutions.
>> The easiest is likely to fix all the call sites so that 'tp' is unlinked
>> before calling the destroy() handlers AND not doing the NULL set. I only
>> see one such call site where destroy is called before unlinking at the
>> moment. This should enforce that after a grace period there is no path
>> to reach the classifiers because 'tp' is unlinked. Calling destroy
>> before unlinking 'tp' however could cause a small race between grace
>> period of 'tp' and grace period of the filter.
>>
>> Another would be to only call the destroy path from the call_rcu path
>> of the 'tp' object so that destroy is only ever called after the object
>> is guaranteed to be unlinked from the tc_filter path.
>>
>> I think both solutions would be fine.
>>
>> Cong were you working on one of these? Or do you have another idea?
>
> Yeah, this is basic what I think as well, however, both are hard.
> On one hand, we can't detach the tp from the global singly-linked list
> before tcf_destroy() since we rely on its return value to make this decision.
> On the other hand, it is a singly-linked list, we have to pass in the address
> of its previous pointer to rcu callback to remove it, it seems racy as well
> since we modify a previous pointer which is still visible globally...

Can't we drop the 'force' parameter from tcf_destroy() and related cls
destroy() callbacks, and change the logic roughly like this:

[...]
         case RTM_DELTFILTER:
                 err = tp->ops->delete(tp, fh, &drop_tp);
                 if (err == 0) {
                         struct tcf_proto *next = rtnl_dereference(tp->next);

                         tfilter_notify(net, skb, n, tp,
                                        t->tcm_handle,
                                        RTM_DELTFILTER, false);
                         if (drop_tp) {
                                 RCU_INIT_POINTER(*back, next);
                                 tcf_destroy(tp);
                         }
                 }
                 goto errout;
[...]

This one was the only tcf_destroy() instance with force=false. Why can't
the prior delete() callback make the decision whether the tp now has no
further internal filters and thus can be dropped. Afaik, delete() and
destroy() are protected by RTNL anyway. Thus, we could unlink the tp from
the list before tcf_destroy(), which should then work with grace period
as well. Given we remove the setting of tp->root to NULL, any outstanding
readers for that grace period should either still execute the 'scheduled
for removal' filter we just dropped, or find an empty list of filters.

> Hmm, perhaps we really have to switch to a doubly-linked list, that is
> list_head. I need to double check. And also the semantic of ->destroy()
> needs to revise too.

Can you elaborate why double-linked list? Isn't the tp list always protected
from modifications via RTNL in control path, and walked via rcu_dereference_bh()
in data path?

> So yeah, my commit should be blamed. :-/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-23 11:29               ` Daniel Borkmann
@ 2016-11-23 19:29                 ` Cong Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Cong Wang @ 2016-11-23 19:29 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Jiri Pirko, Roi Dayan, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Or Gerlitz,
	Cong Wang

On Wed, Nov 23, 2016 at 3:29 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Can't we drop the 'force' parameter from tcf_destroy() and related cls
> destroy() callbacks, and change the logic roughly like this:
>
> [...]
>         case RTM_DELTFILTER:
>                 err = tp->ops->delete(tp, fh, &drop_tp);
>                 if (err == 0) {
>                         struct tcf_proto *next = rtnl_dereference(tp->next);
>
>                         tfilter_notify(net, skb, n, tp,
>                                        t->tcm_handle,
>                                        RTM_DELTFILTER, false);
>                         if (drop_tp) {
>                                 RCU_INIT_POINTER(*back, next);
>                                 tcf_destroy(tp);
>                         }
>                 }
>                 goto errout;
> [...]
>
> This one was the only tcf_destroy() instance with force=false. Why can't
> the prior delete() callback make the decision whether the tp now has no
> further internal filters and thus can be dropped. Afaik, delete() and
> destroy() are protected by RTNL anyway. Thus, we could unlink the tp from
> the list before tcf_destroy(), which should then work with grace period
> as well. Given we remove the setting of tp->root to NULL, any outstanding
> readers for that grace period should either still execute the 'scheduled
> for removal' filter we just dropped, or find an empty list of filters.

This is exactly why I said "the semantic of ->destroy() needs to revise too",
this is a reasonable revision of course, but the change is still large because
we need to move that logic from ->destroy() to ->delete(). I was trying to find
a relatively small fix for -net and -stable, for -net-next we could do
aggressive
change as long as it's necessary. This is why I am still thinking about it,
perhaps there is no quick fix for this bug.


>
>> Hmm, perhaps we really have to switch to a doubly-linked list, that is
>> list_head. I need to double check. And also the semantic of ->destroy()
>> needs to revise too.
>
>
> Can you elaborate why double-linked list? Isn't the tp list always protected
> from modifications via RTNL in control path, and walked via
> rcu_dereference_bh()
> in data path?

At least two benefits we can get from using doubly-linked list:

1) No need to pass a 'prev' pointer if we want to remove tp in a RCU callback,
list_del_rcu(&tp->head) is just enough.

2) No need to worry about RCU pointers because list_head has RCU API's
already, much more readable to me.

Of course, the size of struct tcf_proto will grow a bit, but it doesn't seem to
be a problem.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-22 19:28       ` Cong Wang
  2016-11-22 20:41         ` Daniel Borkmann
@ 2016-11-24 20:25         ` David Miller
  2016-11-25  0:17           ` Daniel Borkmann
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2016-11-24 20:25 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: jiri, daniel, roid, netdev, jiri, ogerlitz, cwang, john.fastabend

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 22 Nov 2016 11:28:37 -0800

> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>>Hmm, I don't think we want to have such an additional test in fast
>>>path for each and every classifier. Can we think of ways to avoid that?
>>>
>>>My question is, since we unlink individual instances from such tp-internal
>>>lists through RCU and release the instance through call_rcu() as well as
>>>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>>not respecting grace period?
>>
>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>> to null.
> 
> We do need to respect the grace period if we touch the globally visible
> data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
> right place.

Another idea is to assign tp->root to a dummy static cls_fl_head object,
instead of NULL, which we just make sure has an ht.elems value of zero.

This avoids having to touch the fast path and also avoids all of these
complicated changes being discussed wrt. doing things in call_rcu_bh()
or whatever.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-24 20:25         ` David Miller
@ 2016-11-25  0:17           ` Daniel Borkmann
  2016-11-25  6:23             ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2016-11-25  0:17 UTC (permalink / raw)
  To: David Miller, xiyou.wangcong
  Cc: jiri, roid, netdev, jiri, ogerlitz, cwang, john.fastabend,
	alexei.starovoitov

On 11/24/2016 09:25 PM, David Miller wrote:
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Tue, 22 Nov 2016 11:28:37 -0800
>
>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>>> Hmm, I don't think we want to have such an additional test in fast
>>>> path for each and every classifier. Can we think of ways to avoid that?
>>>>
>>>> My question is, since we unlink individual instances from such tp-internal
>>>> lists through RCU and release the instance through call_rcu() as well as
>>>> the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>>> not respecting grace period?
>>>
>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>> to null.
>>
>> We do need to respect the grace period if we touch the globally visible
>> data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
>> right place.
>
> Another idea is to assign tp->root to a dummy static cls_fl_head object,
> instead of NULL, which we just make sure has an ht.elems value of zero.
>
> This avoids having to touch the fast path and also avoids all of these
> complicated changes being discussed wrt. doing things in call_rcu_bh()
> or whatever.

I'm not sure if setting a dummy object for each affected classifier is
making things better. Are you having this in mind as a target for -net?

We do kfree_rcu() the head (tp->root) and likewise do we kfree_rcu() the
tp immediately after the callback. The head object's lifetime for such
classifiers is thus always bound to the tp lifetime. And besides that,
nothing apart from get() checks whether it's actually really NULL (and
that check in get() is odd anyway; some cls do it, some don't).

I took a deeper look into the history, and found that this was not always
the case. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic: fix NULL
pointer dereference") moved tp->root initialization into init() routine,
where before it was part of change(), so get() had to deal with head being
NULL back then, so that was indeed a valid case. Also some classify()
callbacks were checking for head being NULL, so destroy would set it to
NULL, f.e. 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg() in packet
classifiers").

For basic, bpf, flow, flower, matchall, I cooked the below diff which
should be usable and small enough for -net.

The remaining pieces from ->destroy() to ->delete() conversion patch from
Cong, we could later on do in -net-next.

Roi, could you check the below for flower with your setup?

(Btw, matchall is still broken besides this fix. mall_delete() sets the
  RCU_INIT_POINTER(head->filter, NULL), so that a mall_delete() plus
  mall_destroy() combo doesn't free head->filter twice, but doing that is
  racy with mall_classify() where head->filter is dereferenced unchecked.
  Requires additional fix.)

  net/sched/cls_basic.c    |  4 ----
  net/sched/cls_bpf.c      |  4 ----
  net/sched/cls_flow.c     |  1 -
  net/sched/cls_flower.c   | 15 +++++++++++----
  net/sched/cls_matchall.c |  1 -
  5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index eb219b7..5877f60 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -62,9 +62,6 @@ static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
  	struct basic_head *head = rtnl_dereference(tp->root);
  	struct basic_filter *f;

-	if (head == NULL)
-		return 0UL;
-
  	list_for_each_entry(f, &head->flist, link) {
  		if (f->handle == handle) {
  			l = (unsigned long) f;
@@ -109,7 +106,6 @@ static bool basic_destroy(struct tcf_proto *tp, bool force)
  		tcf_unbind_filter(tp, &f->res);
  		call_rcu(&f->rcu, basic_delete_filter);
  	}
-	RCU_INIT_POINTER(tp->root, NULL);
  	kfree_rcu(head, rcu);
  	return true;
  }
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index bb1d5a4..0a47ba5 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -292,7 +292,6 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
  		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
  	}

-	RCU_INIT_POINTER(tp->root, NULL);
  	kfree_rcu(head, rcu);
  	return true;
  }
@@ -303,9 +302,6 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
  	struct cls_bpf_prog *prog;
  	unsigned long ret = 0UL;

-	if (head == NULL)
-		return 0UL;
-
  	list_for_each_entry(prog, &head->plist, link) {
  		if (prog->handle == handle) {
  			ret = (unsigned long) prog;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index e396723..6575aba 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -596,7 +596,6 @@ static bool flow_destroy(struct tcf_proto *tp, bool force)
  		list_del_rcu(&f->list);
  		call_rcu(&f->rcu, flow_destroy_filter);
  	}
-	RCU_INIT_POINTER(tp->root, NULL);
  	kfree_rcu(head, rcu);
  	return true;
  }
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index f6f40fb..f6a7ae0 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -269,6 +269,15 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
  	dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, &tc);
  }

+static void fl_destroy_rcu(struct rcu_head *rcu)
+{
+	struct cls_fl_head *head = container_of(rcu, struct cls_fl_head, rcu);
+
+	if (head->mask_assigned)
+		rhashtable_destroy(&head->ht);
+	kfree(head);
+}
+
  static bool fl_destroy(struct tcf_proto *tp, bool force)
  {
  	struct cls_fl_head *head = rtnl_dereference(tp->root);
@@ -282,10 +291,8 @@ static bool fl_destroy(struct tcf_proto *tp, bool force)
  		list_del_rcu(&f->list);
  		call_rcu(&f->rcu, fl_destroy_filter);
  	}
-	RCU_INIT_POINTER(tp->root, NULL);
-	if (head->mask_assigned)
-		rhashtable_destroy(&head->ht);
-	kfree_rcu(head, rcu);
+
+	call_rcu(&head->rcu, fl_destroy_rcu);
  	return true;
  }

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 25927b6..f935429 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -114,7 +114,6 @@ static bool mall_destroy(struct tcf_proto *tp, bool force)

  		call_rcu(&f->rcu, mall_destroy_filter);
  	}
-	RCU_INIT_POINTER(tp->root, NULL);
  	kfree_rcu(head, rcu);
  	return true;
  }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
  2016-11-25  0:17           ` Daniel Borkmann
@ 2016-11-25  6:23             ` Cong Wang
  2016-11-25  9:43               ` matchall race (was: Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it) Daniel Borkmann
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2016-11-25  6:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Jiri Pirko, Roi Dayan,
	Linux Kernel Network Developers, Jiri Pirko, Or Gerlitz,
	Cong Wang, John Fastabend, Alexei Starovoitov

On Thu, Nov 24, 2016 at 4:17 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
>
> I'm not sure if setting a dummy object for each affected classifier is
> making things better. Are you having this in mind as a target for -net?
>
> We do kfree_rcu() the head (tp->root) and likewise do we kfree_rcu() the
> tp immediately after the callback. The head object's lifetime for such
> classifiers is thus always bound to the tp lifetime. And besides that,
> nothing apart from get() checks whether it's actually really NULL (and
> that check in get() is odd anyway; some cls do it, some don't).
>

Excellent point.

I thought we should exclude any parallel readers when we call destroy(),
you are taking a different approach by observing we only have to exclude
readers when we really free them, this seems fine to me after a second
thought, because the RCU API should take care of races with readers so
as long as we free everything in RCU callback we are good. Hmm...

But I may miss something since I am not an RCU expert.

[...]
>
> (Btw, matchall is still broken besides this fix. mall_delete() sets the
>  RCU_INIT_POINTER(head->filter, NULL), so that a mall_delete() plus
>  mall_destroy() combo doesn't free head->filter twice, but doing that is
>  racy with mall_classify() where head->filter is dereferenced unchecked.
>  Requires additional fix.)

This seems due to matchall only has one filter per tp. But you don't need
to worry since readers never read a freed pointer, right?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* matchall race (was: Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it)
  2016-11-25  6:23             ` Cong Wang
@ 2016-11-25  9:43               ` Daniel Borkmann
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Borkmann @ 2016-11-25  9:43 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Jiri Pirko, Roi Dayan,
	Linux Kernel Network Developers, Jiri Pirko, Or Gerlitz,
	Cong Wang, John Fastabend, Alexei Starovoitov

[ Making this a different thread, since unrelated to the other one. ]

On 11/25/2016 07:23 AM, Cong Wang wrote:
> On Thu, Nov 24, 2016 at 4:17 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
>>
>> (Btw, matchall is still broken besides this fix. mall_delete() sets the
>>   RCU_INIT_POINTER(head->filter, NULL), so that a mall_delete() plus
>>   mall_destroy() combo doesn't free head->filter twice, but doing that is
>>   racy with mall_classify() where head->filter is dereferenced unchecked.
>>   Requires additional fix.)
>
> This seems due to matchall only has one filter per tp. But you don't need
> to worry since readers never read a freed pointer, right?

The race would be that CPU0 is in tc_ctl_tfilter() with RTM_DELTFILTER and
tcm_handle = 1 request. matchall has only single handle and id of it is 1
(or a prior user-specified one). So in tc_ctl_tfilter() ->get() will find it,
returns pointer as fh. Then we call ->delete(). mall_delete() sets head->filter
to NULL with RCU_INIT_POINTER() (head->filter is not even an RCU pointer btw),
real filter destruction comes via call_rcu(). If we got here, that tp is still
visible and not unlinked from tp chain. So in CPU1 tc_classify() executes
mall_classify(), we fetch the head (tp->root) and the corresponding head->filter
from there, and dereference it next. Probably best would have been to just
return -EOPNOTSUPP for ->delete() like in cls_cgroup case. It probably makes
sense to just defer real destruction to mall_destroy() instead of mall_delete()
if one wants to avoid extra !f test in mall_classify().

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-11-25  9:43 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-22 14:25 [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it Roi Dayan
2016-11-22 14:48 ` Jiri Pirko
2016-11-22 15:37   ` David Miller
2016-11-22 16:13     ` Jiri Pirko
2016-11-22 16:04   ` Daniel Borkmann
2016-11-22 16:11     ` Jiri Pirko
2016-11-22 19:28       ` Cong Wang
2016-11-22 20:41         ` Daniel Borkmann
2016-11-22 23:36           ` John Fastabend
2016-11-23  5:24             ` Cong Wang
2016-11-23 11:29               ` Daniel Borkmann
2016-11-23 19:29                 ` Cong Wang
2016-11-24 20:25         ` David Miller
2016-11-25  0:17           ` Daniel Borkmann
2016-11-25  6:23             ` Cong Wang
2016-11-25  9:43               ` matchall race (was: Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it) Daniel Borkmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.