netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] net/sched: cls_u32: fix refcount leak
@ 2019-12-17 23:00 Davide Caratti
  2019-12-17 23:00 ` [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change() Davide Caratti
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Davide Caratti @ 2019-12-17 23:00 UTC (permalink / raw)
  To: netdev, David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Vlad Buslov, Roman Mashak

a refcount leak in the error path of u32_change() has been recently
introduced. It can be observed with the following commands:

  [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 97 \
  > u32 match ip src 127.0.0.1/32 indev notexist20 flowid 1:1 action drop
  RTNETLINK answers: Invalid argument
  We have an error talking to the kernel
  [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 98 \
  > handle 42:42 u32 divisor 256
  Error: cls_u32: Divisor can only be used on a hash table.
  We have an error talking to the kernel
  [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 99 \
  > u32 ht 47:47
  Error: cls_u32: Specified hash table not found.
  We have an error talking to the kernel

they all legitimately return -EINVAL; however, they leave semi-configured
filters at eth0 tc ingress:

 [root@f31 ~]# tc filter show dev eth0 ingress
 filter protocol ip pref 97 u32 chain 0
 filter protocol ip pref 97 u32 chain 0 fh 800: ht divisor 1
 filter protocol ip pref 98 u32 chain 0
 filter protocol ip pref 98 u32 chain 0 fh 801: ht divisor 1
 filter protocol ip pref 99 u32 chain 0
 filter protocol ip pref 99 u32 chain 0 fh 802: ht divisor 1

With older kernels, filters were unconditionally considered empty (and
thus de-refcounted) on the error path of ->change().
After commit 8b64678e0af8 ("net: sched: refactor tp insert/delete for
concurrent execution"), filters were considered empty when the walk()
function didn't set 'walker.stop' to 1.
Finally, with commit 6676d5e416ee ("net: sched: set dedicated tcf_walker
flag when tp is empty"), tc filters are considered empty unless the walker
function is called with a non-NULL handle. This last change doesn't fit
cls_u32 design, because at least the "root hnode" is (almost) always
non-NULL, as it's allocated in u32_init().

- patch 1/2 is a proposal to restore the original kernel behavior, where
  no filter was installed in the error path of u32_change().
- patch 2/2 adds tdc selftests that can be ued to verify the correct
  behavior of u32 in the error path of ->change().

Davide Caratti (2):
  net/sched: cls_u32: fix refcount leak in the error path of
    u32_change()
  tc-testing: initial tdc selftests for cls_u32

 net/sched/cls_u32.c                           |  25 +++
 .../tc-testing/tc-tests/filters/tests.json    |  22 --
 .../tc-testing/tc-tests/filters/u32.json      | 205 ++++++++++++++++++
 3 files changed, 230 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/filters/u32.json

-- 
2.23.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-17 23:00 [PATCH net 0/2] net/sched: cls_u32: fix refcount leak Davide Caratti
@ 2019-12-17 23:00 ` Davide Caratti
  2019-12-18 14:23   ` Jamal Hadi Salim
  2019-12-17 23:00 ` [PATCH net 2/2] tc-testing: initial tdc selftests for cls_u32 Davide Caratti
  2019-12-20  1:53 ` [PATCH net 0/2] net/sched: cls_u32: fix refcount leak David Miller
  2 siblings, 1 reply; 20+ messages in thread
From: Davide Caratti @ 2019-12-17 23:00 UTC (permalink / raw)
  To: netdev, David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Vlad Buslov, Roman Mashak

when users replace cls_u32 filters with new ones having wrong parameters,
so that u32_change() fails to validate them, the kernel doesn't roll-back
correctly, and leaves semi-configured rules.

Fix this in u32_walk(), avoiding a call to the walker function on filters
that don't have a match rule connected. The side effect is, these "empty"
filters are not even dumped when present; but that shouldn't be a problem
as long as we are restoring the original behaviour, where semi-configured
filters were not even added in the error path of u32_change().

Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/cls_u32.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index a0e6fac613de..66c6bcec16cb 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1108,10 +1108,33 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
+static bool u32_hnode_empty(struct tc_u_hnode *ht, bool *non_root_ht)
+{
+	int i;
+
+	if (!ht)
+		return true;
+	if (!ht->is_root) {
+		*non_root_ht = true;
+		return false;
+	}
+	if (*non_root_ht)
+		return false;
+	if (ht->refcnt < 2)
+		return true;
+
+	for (i = 0; i <= ht->divisor; i++) {
+		if (rtnl_dereference(ht->ht[i]))
+			return false;
+	}
+	return true;
+}
+
 static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 		     bool rtnl_held)
 {
 	struct tc_u_common *tp_c = tp->data;
+	bool non_root_ht = false;
 	struct tc_u_hnode *ht;
 	struct tc_u_knode *n;
 	unsigned int h;
@@ -1124,6 +1147,8 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	     ht = rtnl_dereference(ht->next)) {
 		if (ht->prio != tp->prio)
 			continue;
+		if (u32_hnode_empty(ht, &non_root_ht))
+			return;
 		if (arg->count >= arg->skip) {
 			if (arg->fn(tp, ht, arg) < 0) {
 				arg->stop = 1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net 2/2] tc-testing: initial tdc selftests for cls_u32
  2019-12-17 23:00 [PATCH net 0/2] net/sched: cls_u32: fix refcount leak Davide Caratti
  2019-12-17 23:00 ` [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change() Davide Caratti
@ 2019-12-17 23:00 ` Davide Caratti
  2019-12-20  1:53 ` [PATCH net 0/2] net/sched: cls_u32: fix refcount leak David Miller
  2 siblings, 0 replies; 20+ messages in thread
From: Davide Caratti @ 2019-12-17 23:00 UTC (permalink / raw)
  To: netdev, David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Vlad Buslov, Roman Mashak

- move test "e9a3 - Add u32 with source match" to u32.json, and change the
  match pattern to catch all hnodes
- add testcases for relevant error paths of cls_u32 module

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 .../tc-testing/tc-tests/filters/tests.json    |  22 --
 .../tc-testing/tc-tests/filters/u32.json      | 205 ++++++++++++++++++
 2 files changed, 205 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/filters/u32.json

diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json b/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
index 0f89cd50a94b..8877f7b2b809 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
@@ -1,26 +1,4 @@
 [
-    {
-        "id": "e9a3",
-        "name": "Add u32 with source match",
-        "category": [
-            "filter",
-            "u32"
-        ],
-        "plugins": {
-                "requires": "nsPlugin"
-        },
-        "setup": [
-            "$TC qdisc add dev $DEV1 ingress"
-        ],
-        "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 u32 match ip src 127.0.0.1/32 flowid 1:1 action ok",
-        "expExitCode": "0",
-        "verifyCmd": "$TC filter show dev $DEV1 parent ffff:",
-        "matchPattern": "match 7f000001/ffffffff at 12",
-        "matchCount": "1",
-        "teardown": [
-            "$TC qdisc del dev $DEV1 ingress"
-        ]
-    },
     {
         "id": "2638",
         "name": "Add matchall and try to get it",
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
new file mode 100644
index 000000000000..e09d3c0e307f
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
@@ -0,0 +1,205 @@
+[
+    {
+        "id": "afa9",
+        "name": "Add u32 with source match",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DEV1 ingress protocol ip prio 1 u32 match ip src 127.0.0.1/32 flowid 1:1 action ok",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "filter protocol ip pref 1 u32 chain (0[ ]+$|0 fh 800: ht divisor 1|0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1.*match 7f000001/ffffffff at 12)",
+        "matchCount": "3",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "6aa7",
+        "name": "Add/Replace u32 with source match and invalid indev",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress"
+        ],
+        "cmdUnderTest": "$TC filter replace dev $DEV1 ingress protocol ip prio 1 u32 match ip src 127.0.0.1/32 indev notexist20 flowid 1:1 action ok",
+        "expExitCode": "2",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "filter protocol ip pref 1 u32 chain 0",
+        "matchCount": "0",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "bc4d",
+        "name": "Replace valid u32 with source match and invalid indev",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress",
+            "$TC filter add dev $DEV1 ingress protocol ip prio 1 u32 match ip src 127.0.0.3/32 flowid 1:3 action ok"
+        ],
+        "cmdUnderTest": "$TC filter replace dev $DEV1 ingress protocol ip prio 1 u32 match ip src 127.0.0.2/32 indev notexist20 flowid 1:2 action ok",
+        "expExitCode": "2",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "filter protocol ip pref 1 u32 chain (0[ ]+$|0 fh 800: ht divisor 1|0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:3.*match 7f000003/ffffffff at 12)",
+        "matchCount": "3",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "648b",
+        "name": "Add u32 with custom hash table",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DEV1 ingress prio 99 handle 42: u32 divisor 256",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "pref 99 u32 chain (0[ ]+$|0 fh 42: ht divisor 256|0 fh 800: ht divisor 1)",
+        "matchCount": "3",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "6658",
+        "name": "Add/Replace u32 with custom hash table and invalid handle",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress"
+        ],
+        "cmdUnderTest": "$TC filter replace dev $DEV1 ingress prio 99 handle 42:42 u32 divisor 256",
+        "expExitCode": "2",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "pref 99 u32 chain 0",
+        "matchCount": "0",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "9d0a",
+        "name": "Replace valid u32 with custom hash table and invalid handle",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress",
+            "$TC filter add dev $DEV1 ingress prio 99 handle 42: u32 divisor 256"
+        ],
+        "cmdUnderTest": "$TC filter replace dev $DEV1 ingress prio 99 handle 42:42 u32 divisor 128",
+        "expExitCode": "2",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "pref 99 u32 chain (0[ ]+$|0 fh 42: ht divisor 256|0 fh 800: ht divisor 1)",
+        "matchCount": "3",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "1644",
+        "name": "Add u32 filter that links to a custom hash table",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress",
+            "$TC filter add dev $DEV1 ingress prio 99 handle 43: u32 divisor 256"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DEV1 ingress protocol ip prio 98 u32 link 43: hashkey mask 0x0000ff00 at 12 match ip src 192.168.0.0/16",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "filter protocol ip pref 98 u32 chain (0[ ]+$|0 fh 801: ht divisor 1|0 fh 801::800 order 2048 key ht 801 bkt 0 link 43:.*match c0a80000/ffff0000 at 12.*hash mask 0000ff00 at 12)",
+        "matchCount": "3",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "74c2",
+        "name": "Add/Replace u32 filter with invalid hash table id",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress"
+        ],
+        "cmdUnderTest": "$TC filter replace dev $DEV1 ingress protocol ip prio 20 u32 ht 47:47 action drop",
+        "expExitCode": "2",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "filter protocol ip pref 20 u32 chain 0",
+        "matchCount": "0",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    },
+    {
+        "id": "1fe6",
+        "name": "Replace valid u32 filter with invalid hash table id",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 ingress",
+            "$TC filter add dev $DEV1 ingress protocol ip prio 99 handle 43: u32 divisor 1",
+            "$TC filter add dev $DEV1 ingress protocol ip prio 98 u32 ht 43: match tcp src 22 FFFF classid 1:3"
+        ],
+        "cmdUnderTest": "$TC filter replace dev $DEV1 ingress protocol ip prio 98 u32 ht 43:1 match tcp src 23 FFFF classid 1:4",
+        "expExitCode": "2",
+        "verifyCmd": "$TC filter show dev $DEV1 ingress",
+        "matchPattern": "filter protocol ip pref 99 u32 chain (0[ ]+$|0 fh (43|800): ht divisor 1|0 fh 43::800 order 2048 key ht 43 bkt 0 flowid 1:3.*match 00160000/ffff0000 at nexthdr\\+0)",
+        "matchCount": "4",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 ingress"
+        ]
+    }
+]
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-17 23:00 ` [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change() Davide Caratti
@ 2019-12-18 14:23   ` Jamal Hadi Salim
  2019-12-19 13:32     ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-18 14:23 UTC (permalink / raw)
  To: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko
  Cc: Vlad Buslov, Roman Mashak


On 2019-12-17 6:00 p.m., Davide Caratti wrote:
> when users replace cls_u32 filters with new ones having wrong parameters,
> so that u32_change() fails to validate them, the kernel doesn't roll-back
> correctly, and leaves semi-configured rules.
> 
> Fix this in u32_walk(), avoiding a call to the walker function on filters
> that don't have a match rule connected. The side effect is, these "empty"
> filters are not even dumped when present; but that shouldn't be a problem
> as long as we are restoring the original behaviour, where semi-configured
> filters were not even added in the error path of u32_change().
> 
> Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Hi Davide,

Great catch (and good test case addition),
but I am not sure about the fix.

Unless I am  misunderstanding the flow is:
You enter bad rules, validation fails, partial state had already been
created (in this case root hts) before validation failure, you then
leave the partial state in the kernel but when someone dumps you hide
these bad tables?

It sounds like the root cause is there is a missing destroy()
invocation somewhere during the create/validation failure - which is
what needs fixing... Those dangling tables should not have been
inserted; maybe somewhere along the code path for tc_new_tfilter().
Note "replace" is essentially "create if it doesnt
exist" semantic therefore NLM_F_CREATE will be set.

Since this is a core cls issue - I would check all other classifiers
with similar aggrevation - make some attribute fail in the middle or
end.
Very likely only u32 is the victim.

cheers,
jamal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-18 14:23   ` Jamal Hadi Salim
@ 2019-12-19 13:32     ` Jamal Hadi Salim
  2019-12-19 16:15       ` Vlad Buslov
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-19 13:32 UTC (permalink / raw)
  To: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko
  Cc: Vlad Buslov, Roman Mashak

Hi Davide,

I ran your test on my laptop (4.19) and nothing dangled.
Looking at that area of the code difference 4.19 vs current net-next
there was a destroy() in there that migrated into the inner guts of
tcf_chain_tp_delete_empty() Vlad? My gut feeling is restoring the old
logic like this would work at least for u32:

------------
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6a0eacafdb19..34a1d4e7e6e3 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2135,8 +2135,10 @@ static int tc_new_tfilter(struct sk_buff *skb, 
struct nlmsghdr *n,
         }

  errout:
-       if (err && tp_created)
+       if (err && tp_created) {
+               tcf_proto_destroy(tp, rtnl_held, true, NULL);
                 tcf_chain_tp_delete_empty(chain, tp, rtnl_held, NULL);
+       }
  errout_tp:
         if (chain) {
                 if (tp && !IS_ERR(tp))
-----

Maybe even better tcf_proto_put(tp, rtnl_held, NULL) directly instead
and no need for tcf_chain_tp_delete_empty().

Of course above not even compile tested and may have consequences
for other classifiers (flower would be a good test) and concurency
intentions. Thoughts?

cheers,
jamal

On 2019-12-18 9:23 a.m., Jamal Hadi Salim wrote:
> 
> On 2019-12-17 6:00 p.m., Davide Caratti wrote:
>> when users replace cls_u32 filters with new ones having wrong parameters,
>> so that u32_change() fails to validate them, the kernel doesn't roll-back
>> correctly, and leaves semi-configured rules.
>>
>> Fix this in u32_walk(), avoiding a call to the walker function on filters
>> that don't have a match rule connected. The side effect is, these "empty"
>> filters are not even dumped when present; but that shouldn't be a problem
>> as long as we are restoring the original behaviour, where semi-configured
>> filters were not even added in the error path of u32_change().
>>
>> Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when 
>> tp is empty")
>> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> 
> Hi Davide,
> 
> Great catch (and good test case addition),
> but I am not sure about the fix.
> 
> Unless I am  misunderstanding the flow is:
> You enter bad rules, validation fails, partial state had already been
> created (in this case root hts) before validation failure, you then
> leave the partial state in the kernel but when someone dumps you hide
> these bad tables?
> 
> It sounds like the root cause is there is a missing destroy()
> invocation somewhere during the create/validation failure - which is
> what needs fixing... Those dangling tables should not have been
> inserted; maybe somewhere along the code path for tc_new_tfilter().
> Note "replace" is essentially "create if it doesnt
> exist" semantic therefore NLM_F_CREATE will be set.
> 
> Since this is a core cls issue - I would check all other classifiers
> with similar aggrevation - make some attribute fail in the middle or
> end.
> Very likely only u32 is the victim.
> 
> cheers,
> jamal
> 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-19 13:32     ` Jamal Hadi Salim
@ 2019-12-19 16:15       ` Vlad Buslov
  2019-12-19 16:33         ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Buslov @ 2019-12-19 16:15 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Vlad Buslov, Roman Mashak


On Thu 19 Dec 2019 at 15:32, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> Hi Davide,
>
> I ran your test on my laptop (4.19) and nothing dangled.
> Looking at that area of the code difference 4.19 vs current net-next
> there was a destroy() in there that migrated into the inner guts of
> tcf_chain_tp_delete_empty() Vlad? My gut feeling is restoring the old
> logic like this would work at least for u32:
>
> ------------
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 6a0eacafdb19..34a1d4e7e6e3 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -2135,8 +2135,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct
> nlmsghdr *n,
>         }
>
>  errout:
> -       if (err && tp_created)
> +       if (err && tp_created) {
> +               tcf_proto_destroy(tp, rtnl_held, true, NULL);
>                 tcf_chain_tp_delete_empty(chain, tp, rtnl_held, NULL);
> +       }
>  errout_tp:
>         if (chain) {
>                 if (tp && !IS_ERR(tp))
> -----
>
> Maybe even better tcf_proto_put(tp, rtnl_held, NULL) directly instead
> and no need for tcf_chain_tp_delete_empty().
>
> Of course above not even compile tested and may have consequences
> for other classifiers (flower would be a good test) and concurency
> intentions. Thoughts?
>
> cheers,
> jamal
>
> On 2019-12-18 9:23 a.m., Jamal Hadi Salim wrote:
>>
>> On 2019-12-17 6:00 p.m., Davide Caratti wrote:
>>> when users replace cls_u32 filters with new ones having wrong parameters,
>>> so that u32_change() fails to validate them, the kernel doesn't roll-back
>>> correctly, and leaves semi-configured rules.
>>>
>>> Fix this in u32_walk(), avoiding a call to the walker function on filters
>>> that don't have a match rule connected. The side effect is, these "empty"
>>> filters are not even dumped when present; but that shouldn't be a problem
>>> as long as we are restoring the original behaviour, where semi-configured
>>> filters were not even added in the error path of u32_change().
>>>
>>> Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is
>>> empty")
>>> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
>>
>> Hi Davide,
>>
>> Great catch (and good test case addition),
>> but I am not sure about the fix.
>>
>> Unless I am  misunderstanding the flow is:
>> You enter bad rules, validation fails, partial state had already been
>> created (in this case root hts) before validation failure, you then
>> leave the partial state in the kernel but when someone dumps you hide
>> these bad tables?
>>
>> It sounds like the root cause is there is a missing destroy()
>> invocation somewhere during the create/validation failure - which is
>> what needs fixing... Those dangling tables should not have been
>> inserted; maybe somewhere along the code path for tc_new_tfilter().
>> Note "replace" is essentially "create if it doesnt
>> exist" semantic therefore NLM_F_CREATE will be set.
>>
>> Since this is a core cls issue - I would check all other classifiers
>> with similar aggrevation - make some attribute fail in the middle or
>> end.
>> Very likely only u32 is the victim.
>>
>> cheers,
>> jamal
>>

Hi Jamal,

Just destroying tp unconditionally will break unlocked case (flower)
because of possibility of concurrent insertion of new filters to the
same tp instance.

The root cause here is precisely described by Davide in cover letter -
to accommodate concurrent insertions cls API verifies that tp instance
is empty before deleting it and since there is no cls ops to do it
directly, it relies on checking that walk() stopped without accessing
any filters instead. Unfortunately, somw classifier implementations
assumed that there is always at least one filter on classifier (I fixed
several of these) and now Davide also uncovered this leak in u32.

As a simpler solution to fix such issues once and for all I can propose
not to perform the walk() check at all and assume that any classifier
implementation that doesn't have TCF_PROTO_OPS_DOIT_UNLOCKED flag set is
empty in tcf_chain_tp_delete_empty() (there is no possibility of
concurrent insertion when synchronizing with rtnl).

WDYT?

Regards,
Vlad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-19 16:15       ` Vlad Buslov
@ 2019-12-19 16:33         ` Jamal Hadi Salim
  2019-12-19 16:51           ` Vlad Buslov
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-19 16:33 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Roman Mashak

On 2019-12-19 11:15 a.m., Vlad Buslov wrote:


> Hi Jamal,
> 
> Just destroying tp unconditionally will break unlocked case (flower)
> because of possibility of concurrent insertion of new filters to the
> same tp instance.
> 

I was worried about that. So rtnlheld doesnt help?

> The root cause here is precisely described by Davide in cover letter -
> to accommodate concurrent insertions cls API verifies that tp instance
> is empty before deleting it and since there is no cls ops to do it
> directly, it relies on checking that walk() stopped without accessing
> any filters instead. Unfortunately, somw classifier implementations
> assumed that there is always at least one filter on classifier (I fixed
> several of these) and now Davide also uncovered this leak in u32.
> 
> As a simpler solution to fix such issues once and for all I can propose
> not to perform the walk() check at all and assume that any classifier
> implementation that doesn't have TCF_PROTO_OPS_DOIT_UNLOCKED flag set is
> empty in tcf_chain_tp_delete_empty() (there is no possibility of
> concurrent insertion when synchronizing with rtnl).
> 
> WDYT?

IMO that would be a cleaner fix give walk() is used for other
operations and this is a core cls issue.
Also: documenting what it takes for a classifier to support
TCF_PROTO_OPS_DOIT_UNLOCKED is useful (you may have done this
in some commit already).

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-19 16:33         ` Jamal Hadi Salim
@ 2019-12-19 16:51           ` Vlad Buslov
  2019-12-19 17:01             ` Vlad Buslov
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Buslov @ 2019-12-19 16:51 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Vlad Buslov, Davide Caratti, netdev, David S. Miller, Cong Wang,
	Jiri Pirko, Roman Mashak


On Thu 19 Dec 2019 at 18:33, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 2019-12-19 11:15 a.m., Vlad Buslov wrote:
>
>
>> Hi Jamal,
>>
>> Just destroying tp unconditionally will break unlocked case (flower)
>> because of possibility of concurrent insertion of new filters to the
>> same tp instance.
>>
>
> I was worried about that. So rtnlheld doesnt help?

Rtnl_held flag can be set for multiple other reasons besides
locked/unlocked classifier implementation (parent Qdisc doesn't support
unlocked execution, retry in tc_new_tfilter(), etc.).

>
>> The root cause here is precisely described by Davide in cover letter -
>> to accommodate concurrent insertions cls API verifies that tp instance
>> is empty before deleting it and since there is no cls ops to do it
>> directly, it relies on checking that walk() stopped without accessing
>> any filters instead. Unfortunately, somw classifier implementations
>> assumed that there is always at least one filter on classifier (I fixed
>> several of these) and now Davide also uncovered this leak in u32.
>>
>> As a simpler solution to fix such issues once and for all I can propose
>> not to perform the walk() check at all and assume that any classifier
>> implementation that doesn't have TCF_PROTO_OPS_DOIT_UNLOCKED flag set is
>> empty in tcf_chain_tp_delete_empty() (there is no possibility of
>> concurrent insertion when synchronizing with rtnl).
>>
>> WDYT?
>
> IMO that would be a cleaner fix give walk() is used for other
> operations and this is a core cls issue.
> Also: documenting what it takes for a classifier to support
> TCF_PROTO_OPS_DOIT_UNLOCKED is useful (you may have done this
> in some commit already).

Well, idea was not to have classifier implement any specific API. If
classifier has TCF_PROTO_OPS_DOIT_UNLOCKED, then cls API doesn't obtain
rtnl and it is up to classifier to implement whatever locking necessary
internally. However, my approach to checking for empty classifier
instance uncovered multiple bugs and inconsistencies in classifier
implementations of ops->walk().

So I guess the requirement now is for unlocked classifier to have sane
implementation of ops->walk() that doesn't assume >1 filters and
correctly handles the case when insertion of first filter after
classifier instance is created fails.

>
> cheers,
> jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-19 16:51           ` Vlad Buslov
@ 2019-12-19 17:01             ` Vlad Buslov
  2019-12-20 12:11               ` Jamal Hadi Salim
  2019-12-20 13:21               ` Davide Caratti
  0 siblings, 2 replies; 20+ messages in thread
From: Vlad Buslov @ 2019-12-19 17:01 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jamal Hadi Salim, Davide Caratti, netdev, David S. Miller,
	Cong Wang, Jiri Pirko, Roman Mashak


On Thu 19 Dec 2019 at 18:51, Vlad Buslov <vladbu@mellanox.com> wrote:
> On Thu 19 Dec 2019 at 18:33, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 2019-12-19 11:15 a.m., Vlad Buslov wrote:
>>
>>
>>> Hi Jamal,
>>>
>>> Just destroying tp unconditionally will break unlocked case (flower)
>>> because of possibility of concurrent insertion of new filters to the
>>> same tp instance.
>>>
>>
>> I was worried about that. So rtnlheld doesnt help?
>
> Rtnl_held flag can be set for multiple other reasons besides
> locked/unlocked classifier implementation (parent Qdisc doesn't support
> unlocked execution, retry in tc_new_tfilter(), etc.).
>
>>
>>> The root cause here is precisely described by Davide in cover letter -
>>> to accommodate concurrent insertions cls API verifies that tp instance
>>> is empty before deleting it and since there is no cls ops to do it
>>> directly, it relies on checking that walk() stopped without accessing
>>> any filters instead. Unfortunately, somw classifier implementations
>>> assumed that there is always at least one filter on classifier (I fixed
>>> several of these) and now Davide also uncovered this leak in u32.
>>>
>>> As a simpler solution to fix such issues once and for all I can propose
>>> not to perform the walk() check at all and assume that any classifier
>>> implementation that doesn't have TCF_PROTO_OPS_DOIT_UNLOCKED flag set is
>>> empty in tcf_chain_tp_delete_empty() (there is no possibility of
>>> concurrent insertion when synchronizing with rtnl).
>>>
>>> WDYT?
>>
>> IMO that would be a cleaner fix give walk() is used for other
>> operations and this is a core cls issue.
>> Also: documenting what it takes for a classifier to support
>> TCF_PROTO_OPS_DOIT_UNLOCKED is useful (you may have done this
>> in some commit already).
>
> Well, idea was not to have classifier implement any specific API. If
> classifier has TCF_PROTO_OPS_DOIT_UNLOCKED, then cls API doesn't obtain
> rtnl and it is up to classifier to implement whatever locking necessary
> internally. However, my approach to checking for empty classifier
> instance uncovered multiple bugs and inconsistencies in classifier
> implementations of ops->walk().
>
> So I guess the requirement now is for unlocked classifier to have sane
> implementation of ops->walk() that doesn't assume >1 filters and
> correctly handles the case when insertion of first filter after
> classifier instance is created fails.

BTW another approach would be to extend ops with new callback
delete_empty(), require unlocked implementation to define it and move
functionality of tcf_proto_check_delete() there. Such approach would
remove the need for (ab)using ops->walk() for this since internally
in classifier implementation there is always a way to correctly verify
that classifier instance is empty. Don't know which approach is better
in this case.

>
>>
>> cheers,
>> jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 0/2] net/sched: cls_u32: fix refcount leak
  2019-12-17 23:00 [PATCH net 0/2] net/sched: cls_u32: fix refcount leak Davide Caratti
  2019-12-17 23:00 ` [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change() Davide Caratti
  2019-12-17 23:00 ` [PATCH net 2/2] tc-testing: initial tdc selftests for cls_u32 Davide Caratti
@ 2019-12-20  1:53 ` David Miller
  2019-12-20 12:14   ` Jamal Hadi Salim
  2 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2019-12-20  1:53 UTC (permalink / raw)
  To: dcaratti; +Cc: netdev, jhs, xiyou.wangcong, jiri, vladbu, mrv

From: Davide Caratti <dcaratti@redhat.com>
Date: Wed, 18 Dec 2019 00:00:03 +0100

> a refcount leak in the error path of u32_change() has been recently
> introduced. It can be observed with the following commands:
 ...
> they all legitimately return -EINVAL; however, they leave semi-configured
> filters at eth0 tc ingress:
 ...
> With older kernels, filters were unconditionally considered empty (and
> thus de-refcounted) on the error path of ->change().
> After commit 8b64678e0af8 ("net: sched: refactor tp insert/delete for
> concurrent execution"), filters were considered empty when the walk()
> function didn't set 'walker.stop' to 1.
> Finally, with commit 6676d5e416ee ("net: sched: set dedicated tcf_walker
> flag when tp is empty"), tc filters are considered empty unless the walker
> function is called with a non-NULL handle. This last change doesn't fit
> cls_u32 design, because at least the "root hnode" is (almost) always
> non-NULL, as it's allocated in u32_init().
> 
> - patch 1/2 is a proposal to restore the original kernel behavior, where
>   no filter was installed in the error path of u32_change().
> - patch 2/2 adds tdc selftests that can be ued to verify the correct
>   behavior of u32 in the error path of ->change().

Series applied, thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-19 17:01             ` Vlad Buslov
@ 2019-12-20 12:11               ` Jamal Hadi Salim
  2019-12-20 12:25                 ` Jamal Hadi Salim
  2019-12-20 13:21               ` Davide Caratti
  1 sibling, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 12:11 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Roman Mashak

On 2019-12-19 12:01 p.m., Vlad Buslov wrote:

> 
> BTW another approach would be to extend ops with new callback
> delete_empty(), require unlocked implementation to define it and move
> functionality of tcf_proto_check_delete() there. Such approach would
> remove the need for (ab)using ops->walk() for this since internally
> in classifier implementation there is always a way to correctly verify
> that classifier instance is empty. Don't know which approach is better
> in this case.

I see both as complementing each other. delete_empty()
could serves like guidance almost for someone who wants to implement
parallelization (and stops abuse of walk()) and
TCF_PROTO_OPS_DOIT_UNLOCKED is more of a shortcut. IOW, you
could at the top of tcf_proto_check_delete() return true
if TCF_PROTO_OPS_DOIT_UNLOCKED is set while still invoking
delete_empty() (instead of walk()) if you go past that.
Would that work?

cheers,
jamal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 0/2] net/sched: cls_u32: fix refcount leak
  2019-12-20  1:53 ` [PATCH net 0/2] net/sched: cls_u32: fix refcount leak David Miller
@ 2019-12-20 12:14   ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 12:14 UTC (permalink / raw)
  To: David Miller, dcaratti; +Cc: netdev, xiyou.wangcong, jiri, vladbu, mrv

On 2019-12-19 8:53 p.m., David Miller wrote:
> From: Davide Caratti <dcaratti@redhat.com>
> Date: Wed, 18 Dec 2019 00:00:03 +0100
> 
>> a refcount leak in the error path of u32_change() has been recently
>> introduced. It can be observed with the following commands:
>   ...
>> they all legitimately return -EINVAL; however, they leave semi-configured
>> filters at eth0 tc ingress:
>   ...
>> With older kernels, filters were unconditionally considered empty (and
>> thus de-refcounted) on the error path of ->change().
>> After commit 8b64678e0af8 ("net: sched: refactor tp insert/delete for
>> concurrent execution"), filters were considered empty when the walk()
>> function didn't set 'walker.stop' to 1.
>> Finally, with commit 6676d5e416ee ("net: sched: set dedicated tcf_walker
>> flag when tp is empty"), tc filters are considered empty unless the walker
>> function is called with a non-NULL handle. This last change doesn't fit
>> cls_u32 design, because at least the "root hnode" is (almost) always
>> non-NULL, as it's allocated in u32_init().
>>
>> - patch 1/2 is a proposal to restore the original kernel behavior, where
>>    no filter was installed in the error path of u32_change().
>> - patch 2/2 adds tdc selftests that can be ued to verify the correct
>>    behavior of u32 in the error path of ->change().
> 
> Series applied, thanks.
> 

There is still an ongoing discussion on this patch set...

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 12:11               ` Jamal Hadi Salim
@ 2019-12-20 12:25                 ` Jamal Hadi Salim
  2019-12-20 13:29                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 12:25 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Roman Mashak

[-- Attachment #1: Type: text/plain, Size: 454 bytes --]

On 2019-12-20 7:11 a.m., Jamal Hadi Salim wrote:

> I see both as complementing each other. delete_empty()
> could serves like guidance almost for someone who wants to implement
> parallelization (and stops abuse of walk()) and
> TCF_PROTO_OPS_DOIT_UNLOCKED is more of a shortcut. IOW, you
> could at the top of tcf_proto_check_delete() return true
> if TCF_PROTO_OPS_DOIT_UNLOCKED is set while still invoking


Something like attached...

cheers,
jamal

[-- Attachment #2: patchlet-doitunlocked --]
[-- Type: text/plain, Size: 516 bytes --]

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6a0eacafdb19..8f1e49ba65fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -331,6 +331,11 @@ static bool tcf_proto_is_empty(struct tcf_proto *tp, bool rtnl_held)
 
 static bool tcf_proto_check_delete(struct tcf_proto *tp, bool rtnl_held)
 {
+	if (!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED)) {
+		tp->deleting = true;
+		return tp->deleting;
+	}
+
 	spin_lock(&tp->lock);
 	if (tcf_proto_is_empty(tp, rtnl_held))
 		tp->deleting = true;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-19 17:01             ` Vlad Buslov
  2019-12-20 12:11               ` Jamal Hadi Salim
@ 2019-12-20 13:21               ` Davide Caratti
  2019-12-20 13:54                 ` Jamal Hadi Salim
  1 sibling, 1 reply; 20+ messages in thread
From: Davide Caratti @ 2019-12-20 13:21 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jamal Hadi Salim, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Roman Mashak

hi Jamal and Vlad,

thanks a lot for sharing your thoughts.

On Thu, 2019-12-19 at 17:01 +0000, Vlad Buslov wrote:
> > > IMO that would be a cleaner fix give walk() is used for other
> > > operations and this is a core cls issue.
> > > 

[...]

> > So I guess the requirement now is for unlocked classifier to have sane
> > implementation of ops->walk() that doesn't assume >1 filters and
> > correctly handles the case when insertion of first filter after
> > classifier instance is created fails.

I tried forcing an error in matchall, and didn't observe this problem:

[root@f31 ~]# perf record -e probe:mall_change__return -agR -- tc filter add dev dam0 parent root matchall skip_sw action drop
RTNETLINK answers: Operation not supported
We have an error talking to the kernel
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.225 MB perf.data (1 samples) ]
[root@f31 ~]# perf script
tc 115241 [002] 19665.372130: probe:mall_change__return: (ffffffffc115f930 <- ffffffffb98a7266) ret=0xffffffa1
        ffffffffb9066790 kretprobe_trampoline+0x0 (vmlinux)
            7fa64c16cb77 __libc_sendmsg+0x17 (/usr/lib64/libc-2.30.so)

[root@f31 ~]# tc filter show dev dam0 
[root@f31 ~]#

and similar test can be also carried for the other classifiers
(on unpatched kernel), so it should be easy to understand if there are
other filter that show the same problem. 

> BTW another approach would be to extend ops with new callback
> delete_empty(), require unlocked implementation to define it and move
> functionality of tcf_proto_check_delete() there. Such approach would
> remove the need for (ab)using ops->walk() for this since internally
> in classifier implementation there is always a way to correctly verify
> that classifier instance is empty. Don't know which approach is better
> in this case.

I like the idea of using walk() only when filters are dumped, and handling
the check for empty filters on deletion with a separate callback. That
would allow de-refcounting the filter in the error path unconditionally
for   all classifiers, like old kernel was doing, unless the classifier
implements its own "check empty" function.

how does it sound?
thanks!
-- 
davide




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 12:25                 ` Jamal Hadi Salim
@ 2019-12-20 13:29                   ` Jamal Hadi Salim
  2019-12-20 14:04                     ` Vlad Buslov
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 13:29 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Roman Mashak

On 2019-12-20 7:25 a.m., Jamal Hadi Salim wrote:
> On 2019-12-20 7:11 a.m., Jamal Hadi Salim wrote:
> 
>> I see both as complementing each other. delete_empty()
>> could serves like guidance almost for someone who wants to implement
>> parallelization (and stops abuse of walk()) and
>> TCF_PROTO_OPS_DOIT_UNLOCKED is more of a shortcut. IOW, you
>> could at the top of tcf_proto_check_delete() return true
>> if TCF_PROTO_OPS_DOIT_UNLOCKED is set while still invoking
> 
> 
> Something like attached...

Vlad,
I tested this and it seems to fix the issue. But there may be
other consequences...

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 13:21               ` Davide Caratti
@ 2019-12-20 13:54                 ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 13:54 UTC (permalink / raw)
  To: Davide Caratti, Vlad Buslov
  Cc: netdev, David S. Miller, Cong Wang, Jiri Pirko, Roman Mashak

On 2019-12-20 8:21 a.m., Davide Caratti wrote:
> hi Jamal and Vlad,
> 
> thanks a lot for sharing your thoughts.
> 
> On Thu, 2019-12-19 at 17:01 +0000, Vlad Buslov wrote:
>>>> IMO that would be a cleaner fix give walk() is used for other
>>>> operations and this is a core cls issue.

> I tried forcing an error in matchall, and didn't observe this problem:
> 
> [root@f31 ~]# perf record -e probe:mall_change__return -agR -- tc filter add dev dam0 parent root matchall skip_sw action drop
> RTNETLINK answers: Operation not supported
> We have an error talking to the kernel
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.225 MB perf.data (1 samples) ]
> [root@f31 ~]# perf script
> tc 115241 [002] 19665.372130: probe:mall_change__return: (ffffffffc115f930 <- ffffffffb98a7266) ret=0xffffffa1
>          ffffffffb9066790 kretprobe_trampoline+0x0 (vmlinux)
>              7fa64c16cb77 __libc_sendmsg+0x17 (/usr/lib64/libc-2.30.so)
> 
> [root@f31 ~]# tc filter show dev dam0
> [root@f31 ~]#
> 
> and similar test can be also carried for the other classifiers
> (on unpatched kernel), so it should be easy to understand if there are
> other filter that show the same problem.
> 

I think this one works because of the simpler walk().
In the corner cases you caught u32 is more complex. It would create
multiple tp and so the list would not appear empty.

>> BTW another approach would be to extend ops with new callback
>> delete_empty(), require unlocked implementation to define it and move
>> functionality of tcf_proto_check_delete() there. Such approach would
>> remove the need for (ab)using ops->walk() for this since internally
>> in classifier implementation there is always a way to correctly verify
>> that classifier instance is empty. Don't know which approach is better
>> in this case.
> 
> I like the idea of using walk() only when filters are dumped, and handling
> the check for empty filters on deletion with a separate callback. That
> would allow de-refcounting the filter in the error path unconditionally
> for   all classifiers, like old kernel was doing, unless the classifier
> implements its own "check empty" function.
> 
> how does it sound?

Agreed.
Note: my comment on other email was also to consider using
TCF_PROTO_OPS_DOIT_UNLOCKED for obvious cases.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 13:29                   ` Jamal Hadi Salim
@ 2019-12-20 14:04                     ` Vlad Buslov
  2019-12-20 14:57                       ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Buslov @ 2019-12-20 14:04 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Vlad Buslov, Davide Caratti, netdev, David S. Miller, Cong Wang,
	Jiri Pirko, Roman Mashak


On Fri 20 Dec 2019 at 15:29, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 2019-12-20 7:25 a.m., Jamal Hadi Salim wrote:
>> On 2019-12-20 7:11 a.m., Jamal Hadi Salim wrote:
>>
>>> I see both as complementing each other. delete_empty()
>>> could serves like guidance almost for someone who wants to implement
>>> parallelization (and stops abuse of walk()) and
>>> TCF_PROTO_OPS_DOIT_UNLOCKED is more of a shortcut. IOW, you
>>> could at the top of tcf_proto_check_delete() return true
>>> if TCF_PROTO_OPS_DOIT_UNLOCKED is set while still invoking
>>
>>
>> Something like attached...
>
> Vlad,
> I tested this and it seems to fix the issue. But there may be
> other consequences...
>
> cheers,
> jamal

Hi Jamal,

Yes, I think the patch would work. However, we don't really need the
flags check, if we are going to implement the new ops->delete_empty()
callback because it can work like this:

if (!tp->ops->delete_empty) {
   tp->deleting = true;
   return tp->deleting;
} else {
  return tp->ops->delete_empty(tp);
}

WDYT?

Regards,
Vlad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 14:04                     ` Vlad Buslov
@ 2019-12-20 14:57                       ` Jamal Hadi Salim
  2019-12-20 15:20                         ` Davide Caratti
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 14:57 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Davide Caratti, netdev, David S. Miller, Cong Wang, Jiri Pirko,
	Roman Mashak

On 2019-12-20 9:04 a.m., Vlad Buslov wrote:
> 

> Hi Jamal,
> 
> Yes, I think the patch would work. However, we don't really need the
> flags check, if we are going to implement the new ops->delete_empty()
> callback because it can work like this:
> 
> if (!tp->ops->delete_empty) {
>     tp->deleting = true;
>     return tp->deleting;
> } else {
>    return tp->ops->delete_empty(tp);
> }
> 
> WDYT?

Looks reasonable - we kill two birds with one stone.
We'd need to revert the patch David took in.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 14:57                       ` Jamal Hadi Salim
@ 2019-12-20 15:20                         ` Davide Caratti
  2019-12-20 15:23                           ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Davide Caratti @ 2019-12-20 15:20 UTC (permalink / raw)
  To: Jamal Hadi Salim, Vlad Buslov
  Cc: netdev, David S. Miller, Cong Wang, Jiri Pirko, Roman Mashak

On Fri, 2019-12-20 at 09:57 -0500, Jamal Hadi Salim wrote:
> On 2019-12-20 9:04 a.m., Vlad Buslov wrote:
> > Hi Jamal,
> > 
> > Yes, I think the patch would work. However, we don't really need the
> > flags check, if we are going to implement the new ops->delete_empty()
> > callback because it can work like this:
> > 
> > if (!tp->ops->delete_empty) {
> >     tp->deleting = true;
> >     return tp->deleting;
> > } else {
> >    return tp->ops->delete_empty(tp);
> > }
> > 
> > WDYT?
> 
> Looks reasonable - we kill two birds with one stone.
> We'd need to revert the patch David took in.
> 
> cheers,
> jamal

... ok, I can try to do a series that reverts the fix on cls_u32 and
implements the check_delete_empty for cls_flower. You might find it above
the christmas tree (*)?

-- 
davide 

(*) above, not under, because the tree is reversed.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change()
  2019-12-20 15:20                         ` Davide Caratti
@ 2019-12-20 15:23                           ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2019-12-20 15:23 UTC (permalink / raw)
  To: Davide Caratti, Vlad Buslov
  Cc: netdev, David S. Miller, Cong Wang, Jiri Pirko, Roman Mashak

On 2019-12-20 10:20 a.m., Davide Caratti wrote:

[..]
>
> ... ok, I can try to do a series that reverts the fix on cls_u32 and
> implements the check_delete_empty for cls_flower. You might find it above
> the christmas tree (*)?
> 

Thanks Davide. Please go for it..

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-12-20 15:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-17 23:00 [PATCH net 0/2] net/sched: cls_u32: fix refcount leak Davide Caratti
2019-12-17 23:00 ` [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the error path of u32_change() Davide Caratti
2019-12-18 14:23   ` Jamal Hadi Salim
2019-12-19 13:32     ` Jamal Hadi Salim
2019-12-19 16:15       ` Vlad Buslov
2019-12-19 16:33         ` Jamal Hadi Salim
2019-12-19 16:51           ` Vlad Buslov
2019-12-19 17:01             ` Vlad Buslov
2019-12-20 12:11               ` Jamal Hadi Salim
2019-12-20 12:25                 ` Jamal Hadi Salim
2019-12-20 13:29                   ` Jamal Hadi Salim
2019-12-20 14:04                     ` Vlad Buslov
2019-12-20 14:57                       ` Jamal Hadi Salim
2019-12-20 15:20                         ` Davide Caratti
2019-12-20 15:23                           ` Jamal Hadi Salim
2019-12-20 13:21               ` Davide Caratti
2019-12-20 13:54                 ` Jamal Hadi Salim
2019-12-17 23:00 ` [PATCH net 2/2] tc-testing: initial tdc selftests for cls_u32 Davide Caratti
2019-12-20  1:53 ` [PATCH net 0/2] net/sched: cls_u32: fix refcount leak David Miller
2019-12-20 12:14   ` Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).