netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
@ 2019-06-19 21:09 Davide Caratti
  2019-06-19 22:04 ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Davide Caratti @ 2019-06-19 21:09 UTC (permalink / raw)
  To: Vlad Buslov, David S. Miller, netdev; +Cc: Cong Wang, Lucas Bates

on some CPUs (e.g. i686), tcf_walker.cookie has the same size as the IDR.
In this situation, the following script:

 # tc filter add dev eth0 ingress handle 0xffffffff flower action ok
 # tc filter show dev eth0 ingress

results in an infinite loop. It happened also on other CPUs (e.g x86_64),
before commit 061775583e35 ("net: sched: flower: introduce reference
counting for filters"), because 'handle' + 1 made the u32 overflow before
it was assigned to 'cookie'; but that commit replaced the assignment with
a self-increment of 'cookie', so the problem was indirectly fixed.

Ensure not to call idr_get_next_ul() when 'cookie' contains an overflowed
value, and bail out of fl_walk() when its value is equal to ULONG_MAX.
While at it, add a TDC selftest that can be used to reproduce the problem.

 test results (on 5.2.0-0.rc5.git0.1.fc31.i686)

 unpatched (or affected) kernel:
 # ./tdc.py  -e 2ff3 -d dum0
 Test 2ff3: Add flower with max handle and then dump it
 All test results:
 1..1
 not ok 1 2ff3 - Add flower with max handle and then dump it
         Could not match regex pattern. Verify command output:
 Command "/sbin/tc filter show dev dum0 ingress" timed out

 patched (or unaffected) kernel:
 # ./tdc.py  -e 2ff3 -d dum0
 Test 2ff3: Add flower with max handle and then dump it
 All test results:
 1..1
 ok 1 2ff3 - Add flower with max handle and then dump it

Fixes: 01683a146999 ("net: sched: refactor flower walk to iterate over idr")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/cls_flower.c                        |  2 ++
 .../tc-testing/tc-tests/filters/tests.json    | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index eedd5786c084..acc86ae159f4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1702,6 +1702,8 @@ static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 			break;
 		}
 		__fl_put(f);
+		if (arg->cookie == ULONG_MAX)
+			break;
 		arg->cookie++;
 		arg->count++;
 	}
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json b/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
index e2f92cefb8d5..16559c436f21 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
@@ -38,6 +38,25 @@
             "$TC qdisc del dev $DEV1 clsact"
         ]
     },
+    {
+        "id": "2ff3",
+        "name": "Add flower with max handle and then dump it",
+        "category": [
+            "filter",
+            "flower"
+        ],
+        "setup": [
+            "$TC qdisc add dev $DEV2 ingress"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DEV2 protocol ip pref 1 parent ffff: handle 0xffffffff flower action ok",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show dev $DEV2 ingress",
+        "matchPattern": "filter protocol ip pref 1 flower.*handle 0xffffffff",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DEV2 ingress"
+        ]
+    },
     {
         "id": "d052",
         "name": "Add 1M filters with the same action",
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-19 21:09 [PATCH net] net/sched: flower: fix infinite loop in fl_walk() Davide Caratti
@ 2019-06-19 22:04 ` Cong Wang
  2019-06-20 12:52   ` Davide Caratti
  0 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2019-06-19 22:04 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Wed, Jun 19, 2019 at 2:10 PM Davide Caratti <dcaratti@redhat.com> wrote:
>
> on some CPUs (e.g. i686), tcf_walker.cookie has the same size as the IDR.
> In this situation, the following script:
>
>  # tc filter add dev eth0 ingress handle 0xffffffff flower action ok
>  # tc filter show dev eth0 ingress
>
> results in an infinite loop. It happened also on other CPUs (e.g x86_64),
> before commit 061775583e35 ("net: sched: flower: introduce reference
> counting for filters"), because 'handle' + 1 made the u32 overflow before
> it was assigned to 'cookie'; but that commit replaced the assignment with
> a self-increment of 'cookie', so the problem was indirectly fixed.

Interesting... Is this really specific to cls_flower? To me it looks like
a bug of idr_*_ul() API's, especially for idr_for_each_entry_ul().

Can you test if the following command has the same problem on i386?

tc actions add action ok index 4294967295

It is hard for me to find a 32bit CPU.


Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-19 22:04 ` Cong Wang
@ 2019-06-20 12:52   ` Davide Caratti
  2019-06-20 17:33     ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Davide Caratti @ 2019-06-20 12:52 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

hello Cong, thanks for reading.

On Wed, 2019-06-19 at 15:04 -0700, Cong Wang wrote:
> On Wed, Jun 19, 2019 at 2:10 PM Davide Caratti <dcaratti@redhat.com> wrote:
> > on some CPUs (e.g. i686), tcf_walker.cookie has the same size as the IDR.
> > In this situation, the following script:
> > 
> >  # tc filter add dev eth0 ingress handle 0xffffffff flower action ok
> >  # tc filter show dev eth0 ingress
> > 
> > results in an infinite loop. It happened also on other CPUs (e.g x86_64),
> > before commit 061775583e35 ("net: sched: flower: introduce reference
> > counting for filters"), because 'handle' + 1 made the u32 overflow before
> > it was assigned to 'cookie'; but that commit replaced the assignment with
> > a self-increment of 'cookie', so the problem was indirectly fixed.
> 
> Interesting... Is this really specific to cls_flower? To me it looks like
> a bug of idr_*_ul() API's, especially for idr_for_each_entry_ul().

good question, I have to investigate this better (idr_for_each_entry_ul()
expands in a iteration of idr_get_next_ul()). It surely got in cls_flower
when it was converted to use IDRs, but it's true that there might be other
points in TC where IDR are used and the same pattern is present (see
below).

> Can you test if the following command has the same problem on i386?
> 
> tc actions add action ok index 4294967295

the action is added, but then reading it back results in an infinite loop.
And again, the infinite loop happens on i686 and not on x86_64. I will try
to see where's the problem also here.

-- 
davide


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-20 12:52   ` Davide Caratti
@ 2019-06-20 17:33     ` Cong Wang
  2019-06-25 15:47       ` Davide Caratti
  0 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2019-06-20 17:33 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Thu, Jun 20, 2019 at 5:52 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> hello Cong, thanks for reading.
>
> On Wed, 2019-06-19 at 15:04 -0700, Cong Wang wrote:
> > On Wed, Jun 19, 2019 at 2:10 PM Davide Caratti <dcaratti@redhat.com> wrote:
> > > on some CPUs (e.g. i686), tcf_walker.cookie has the same size as the IDR.
> > > In this situation, the following script:
> > >
> > >  # tc filter add dev eth0 ingress handle 0xffffffff flower action ok
> > >  # tc filter show dev eth0 ingress
> > >
> > > results in an infinite loop. It happened also on other CPUs (e.g x86_64),
> > > before commit 061775583e35 ("net: sched: flower: introduce reference
> > > counting for filters"), because 'handle' + 1 made the u32 overflow before
> > > it was assigned to 'cookie'; but that commit replaced the assignment with
> > > a self-increment of 'cookie', so the problem was indirectly fixed.
> >
> > Interesting... Is this really specific to cls_flower? To me it looks like
> > a bug of idr_*_ul() API's, especially for idr_for_each_entry_ul().
>
> good question, I have to investigate this better (idr_for_each_entry_ul()
> expands in a iteration of idr_get_next_ul()). It surely got in cls_flower
> when it was converted to use IDRs, but it's true that there might be other
> points in TC where IDR are used and the same pattern is present (see
> below).


Yeah, this means we probably want to fix it in idr_get_next_ul() or its
callers like idr_for_each_entry_ul().

>
> > Can you test if the following command has the same problem on i386?
> >
> > tc actions add action ok index 4294967295
>
> the action is added, but then reading it back results in an infinite loop.
> And again, the infinite loop happens on i686 and not on x86_64. I will try
> to see where's the problem also here.

Right, this is what I expect, thanks for confirming it.

I am not sure it is better to handle this overflow inside idr_get_next_ul()
or just let its callers to handle it. According to the comments above
idr_get_next_ul() it sounds like it is not expected to overflow, so...

diff --git a/lib/idr.c b/lib/idr.c
index c34e256d2f01..a38f5e391cec 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -267,6 +267,9 @@ void *idr_get_next_ul(struct idr *idr, unsigned
long *nextid)
        if (!slot)
                return NULL;

+       /* overflow */
+       if (iter.index < id)
+               return NULL;
        *nextid = iter.index + base;
        return rcu_dereference_raw(*slot);
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-20 17:33     ` Cong Wang
@ 2019-06-25 15:47       ` Davide Caratti
  2019-06-25 16:23         ` Davide Caratti
  2019-06-25 18:07         ` Cong Wang
  0 siblings, 2 replies; 12+ messages in thread
From: Davide Caratti @ 2019-06-25 15:47 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Thu, 2019-06-20 at 10:33 -0700, Cong Wang wrote:
> On Thu, Jun 20, 2019 at 5:52 AM Davide Caratti <dcaratti@redhat.com> wrote:
> > hello Cong, thanks for reading.
> > 
> > On Wed, 2019-06-19 at 15:04 -0700, Cong Wang wrote:
> > > On Wed, Jun 19, 2019 at 2:10 PM Davide Caratti <dcaratti@redhat.com> wrote:
> > > > on some CPUs (e.g. i686), tcf_walker.cookie has the same size as the IDR.
> > > > In this situation, the following script:
> > > > 
> > > >  # tc filter add dev eth0 ingress handle 0xffffffff flower action ok
> > > >  # tc filter show dev eth0 ingress
> > > > 
> > > > results in an infinite loop.

[...]

> I am not sure it is better to handle this overflow inside idr_get_next_ul()
> or just let its callers to handle it. According to the comments above
> idr_get_next_ul() it sounds like it is not expected to overflow, so...
> 
> diff --git a/lib/idr.c b/lib/idr.c
> index c34e256d2f01..a38f5e391cec 100644
> --- a/lib/idr.c
> +++ b/lib/idr.c
> @@ -267,6 +267,9 @@ void *idr_get_next_ul(struct idr *idr, unsigned
> long *nextid)
>         if (!slot)
>                 return NULL;
> 
> +       /* overflow */
> +       if (iter.index < id)
> +               return NULL;
>         *nextid = iter.index + base;
>         return rcu_dereference_raw(*slot);
>  }

hello Cong,

I tested the above patch, but I still see the infinite loop on kernel
5.2.0-0.rc5.git0.1.fc31.i686 . 

idr_get_next_ul() returns the entry in the radix tree which is greater or 
equal to '*nextid' (which has the same value as 'id' in the above hunk). 
So, when the radix tree contains one slot with index equal to ULONG_MAX,
whatever can be the value of 'id', the condition in that if() will always 
be false (and the function will keep  returning non-NULL, hence the 
infinite loop).

I also tried this:

if (iter.index == id && id == ULONG_MAX) {
	return NULL;
}

it fixes the infinite loop, but it clearly breaks the function semantic
(and anyway, it's not sufficient to fix my test, at least with cls_flower
it still dumps the entry with id 0xffffffff several times).  I'm for
fixing the callers of idr_get_next_ul(), and in details:

- apply this patch for cls_flower
- change tcf_dump_walker() in act_api.c as follows, and add a TDC testcase
for 'gact'.

index 4e5d2e9ace5d..f34888c8a952 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -228,8 +228,11 @@ static int tcf_dump_walker(struct tcf_idrinfo
*idrinfo, struct sk_buff *skb,
 
        idr_for_each_entry_ul(idr, p, id) {
                index++;
-               if (index < s_i)
+               if (index < s_i) {
+                       if (id == ULONG_MAX)
+                               break;
                        continue;
+               }
 
                if (jiffy_since &&
                    time_after(jiffy_since,


WDYT?

thanks a lot,
-- 
davide


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-25 15:47       ` Davide Caratti
@ 2019-06-25 16:23         ` Davide Caratti
  2019-06-25 18:07         ` Cong Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Davide Caratti @ 2019-06-25 16:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Tue, 2019-06-25 at 17:47 +0200, Davide Caratti wrote:
> On Thu, 2019-06-20 at 10:33 -0700, Cong Wang wrote:
> > On Thu, Jun 20, 2019 at 5:52 AM Davide Caratti <dcaratti@redhat.com> wrote:
> > > hello Cong, thanks for reading.
> > > 
> > > On Wed, 2019-06-19 at 15:04 -0700, Cong Wang wrote:
> > > > On Wed, Jun 19, 2019 at 2:10 PM Davide Caratti <dcaratti@redhat.com> wrote:
> > > > > on some CPUs (e.g. i686), tcf_walker.cookie has the same size as the IDR.
> > > > > In this situation, the following script:
> > > > > 
> > > > >  # tc filter add dev eth0 ingress handle 0xffffffff flower action ok
> > > > >  # tc filter show dev eth0 ingress
> > > > > 
> > > > > results in an infinite loop.
> 
> So, when the radix tree contains one slot with index equal to ULONG_MAX,
> whatever can be the value of 'id',

oops, this phrase is of course wrong. the value of 'id' matters to
determine the condition of the if().

>  the condition in that if() will always 
> be false (and the function will keep  returning non-NULL, hence the 
> infinite loop).

what I wanted to say is, when the radix tree contains a single slot with
index equal to ULONG_MAX, whatever value I put in 'id' the function will
always return a pointer to that slot.

-- 
davide


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-25 15:47       ` Davide Caratti
  2019-06-25 16:23         ` Davide Caratti
@ 2019-06-25 18:07         ` Cong Wang
  2019-06-25 19:29           ` Cong Wang
  1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2019-06-25 18:07 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

Hello,

On Tue, Jun 25, 2019 at 8:47 AM Davide Caratti <dcaratti@redhat.com> wrote:
> hello Cong,
>
> I tested the above patch, but I still see the infinite loop on kernel
> 5.2.0-0.rc5.git0.1.fc31.i686 .
>
> idr_get_next_ul() returns the entry in the radix tree which is greater or
> equal to '*nextid' (which has the same value as 'id' in the above hunk).
> So, when the radix tree contains one slot with index equal to ULONG_MAX,
> whatever can be the value of 'id', the condition in that if() will always
> be false (and the function will keep  returning non-NULL, hence the
> infinite loop).
>
> I also tried this:
>
> if (iter.index == id && id == ULONG_MAX) {
>         return NULL;
> }
>
> it fixes the infinite loop, but it clearly breaks the function semantic
> (and anyway, it's not sufficient to fix my test, at least with cls_flower
> it still dumps the entry with id 0xffffffff several times).  I'm for
> fixing the callers of idr_get_next_ul(), and in details:

It now becomes more interesting.

On one hand, its callers should not need to worry about details
like overflow. On the other hand, in fact it does exactly what its
callers tell it to do, the problematic part is actually the
incremented id. On 64bit, it is fairly easy, we can just simply
know 'long' is longer than 32bit and leverage this to detect overflow,
but on 32bit this clearly doesn't work.

Let me think about it.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-25 18:07         ` Cong Wang
@ 2019-06-25 19:29           ` Cong Wang
  2019-06-26  0:05             ` Cong Wang
  2019-06-26 21:15             ` Cong Wang
  0 siblings, 2 replies; 12+ messages in thread
From: Cong Wang @ 2019-06-25 19:29 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

On Tue, Jun 25, 2019 at 11:07 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On one hand, its callers should not need to worry about details
> like overflow. On the other hand, in fact it does exactly what its
> callers tell it to do, the problematic part is actually the
> incremented id. On 64bit, it is fairly easy, we can just simply
> know 'long' is longer than 32bit and leverage this to detect overflow,
> but on 32bit this clearly doesn't work.
>
> Let me think about it.

Davide, do you mind to try the attached patch?

It should handle this overflow case more gracefully, I hope.

Thanks.

[-- Attachment #2: idr_get_next_ul.diff --]
[-- Type: application/octet-stream, Size: 2903 bytes --]

commit 685934f9eed9b50a46d33a9ec9671800397d20cc
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Tue Jun 25 12:23:18 2019 -0700

    idr: fix idr_get_next_ul() usage
    
    Reported-by: Li Shuang <shuali@redhat.com>
    Cc: Davide Caratti <dcaratti@redhat.com>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index c6c28f56aa29..c73f80bddde4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -105,10 +105,11 @@ static struct list_head *mlx5_fc_counters_lookup_next(struct mlx5_core_dev *dev,
 
 	rcu_read_lock();
 	/* skip counters that are in idr, but not yet in counters list */
-	while ((counter = idr_get_next_ul(&fc_stats->counters_idr,
-					  &next_id)) != NULL &&
-	       list_empty(&counter->list))
-		next_id++;
+	idr_for_each_entry_continue_ul(&fc_stats->counters_idr,
+				       counter, next_id) {
+		if (list_empty(&counter->list))
+			continue;
+	}
 	rcu_read_unlock();
 
 	return counter ? &counter->list : &fc_stats->counters;
diff --git a/include/linux/idr.h b/include/linux/idr.h
index ee7abae143d3..af7a67e65c1c 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -198,7 +198,21 @@ static inline void idr_preload_end(void)
  * is convenient for a "not found" value.
  */
 #define idr_for_each_entry_ul(idr, entry, id)			\
-	for (id = 0; ((entry) = idr_get_next_ul(idr, &(id))) != NULL; ++id)
+	for (id = 0;						\
+	     (u32)id + 1 > id && ((entry) = idr_get_next_ul(idr, &(id))) != NULL; \
+	     ++id)
+
+/**
+ * idr_for_each_entry_continue_ul() - Continue iteration over an IDR's elements of a given type
+ * @idr: IDR handle.
+ * @entry: The type * to use as a cursor.
+ * @id: Entry ID.
+ *
+ * Continue to iterate over entries, continuing after the current position.
+ */
+#define idr_for_each_entry_continue_ul(idr, entry, id)			\
+	for (; (u32)id + 1 > id && ((entry) = idr_get_next_ul(idr, &(id))) != NULL; \
+	     ++id)
 
 /**
  * idr_for_each_entry_continue() - Continue iteration over an IDR's elements of a given type
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index eedd5786c084..06338dadd5e4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -528,17 +528,18 @@ static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
 						unsigned long *handle)
 {
 	struct cls_fl_head *head = fl_head_dereference(tp);
+	unsigned long id = *handle;
 	struct cls_fl_filter *f;
 
 	rcu_read_lock();
-	while ((f = idr_get_next_ul(&head->handle_idr, handle))) {
+	idr_for_each_entry_continue_ul(&head->handle_idr, f, id) {
 		/* don't return filters that are being deleted */
 		if (refcount_inc_not_zero(&f->refcnt))
 			break;
-		++(*handle);
 	}
 	rcu_read_unlock();
 
+	*handle = id;
 	return f;
 }
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-25 19:29           ` Cong Wang
@ 2019-06-26  0:05             ` Cong Wang
  2019-06-26 21:15             ` Cong Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-06-26  0:05 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Tue, Jun 25, 2019 at 12:29 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Jun 25, 2019 at 11:07 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On one hand, its callers should not need to worry about details
> > like overflow. On the other hand, in fact it does exactly what its
> > callers tell it to do, the problematic part is actually the
> > incremented id. On 64bit, it is fairly easy, we can just simply
> > know 'long' is longer than 32bit and leverage this to detect overflow,
> > but on 32bit this clearly doesn't work.
> >
> > Let me think about it.
>
> Davide, do you mind to try the attached patch?
>
> It should handle this overflow case more gracefully, I hope.

Well, it looks like it would miss UINT_MAX... Let me see how this
can be fixed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-25 19:29           ` Cong Wang
  2019-06-26  0:05             ` Cong Wang
@ 2019-06-26 21:15             ` Cong Wang
  2019-06-27 22:10               ` Davide Caratti
  1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2019-06-26 21:15 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

[-- Attachment #1: Type: text/plain, Size: 253 bytes --]

Hi, Davide

On Tue, Jun 25, 2019 at 12:29 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> It should handle this overflow case more gracefully, I hope.
>

Please try this attached one and let me know if it works.
Hope I get it right this time.

Thanks!

[-- Attachment #2: idr_get_next_ul.patch --]
[-- Type: application/octet-stream, Size: 4956 bytes --]

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index c6c28f56aa29..b080e6f3488d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -102,13 +102,15 @@ static struct list_head *mlx5_fc_counters_lookup_next(struct mlx5_core_dev *dev,
 	struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
 	unsigned long next_id = (unsigned long)id + 1;
 	struct mlx5_fc *counter;
+	unsigned long tmp;
 
 	rcu_read_lock();
 	/* skip counters that are in idr, but not yet in counters list */
-	while ((counter = idr_get_next_ul(&fc_stats->counters_idr,
-					  &next_id)) != NULL &&
-	       list_empty(&counter->list))
-		next_id++;
+	idr_for_each_entry_continue_ul(&fc_stats->counters_idr,
+				       counter, tmp, next_id) {
+		if (list_empty(&counter->list))
+			continue;
+	}
 	rcu_read_unlock();
 
 	return counter ? &counter->list : &fc_stats->counters;
diff --git a/include/linux/idr.h b/include/linux/idr.h
index ee7abae143d3..4ec8986e5dfb 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -191,14 +191,17 @@ static inline void idr_preload_end(void)
  * idr_for_each_entry_ul() - Iterate over an IDR's elements of a given type.
  * @idr: IDR handle.
  * @entry: The type * to use as cursor.
+ * @tmp: A temporary placeholder for ID.
  * @id: Entry ID.
  *
  * @entry and @id do not need to be initialized before the loop, and
  * after normal termination @entry is left with the value NULL.  This
  * is convenient for a "not found" value.
  */
-#define idr_for_each_entry_ul(idr, entry, id)			\
-	for (id = 0; ((entry) = idr_get_next_ul(idr, &(id))) != NULL; ++id)
+#define idr_for_each_entry_ul(idr, entry, tmp, id)			\
+	for (tmp = 0, id = 0;						\
+	     tmp <= id && ((entry) = idr_get_next_ul(idr, &(id))) != NULL; \
+	     tmp = id, ++id)
 
 /**
  * idr_for_each_entry_continue() - Continue iteration over an IDR's elements of a given type
@@ -213,6 +216,20 @@ static inline void idr_preload_end(void)
 	     entry;							\
 	     ++id, (entry) = idr_get_next((idr), &(id)))
 
+/**
+ * idr_for_each_entry_continue_ul() - Continue iteration over an IDR's elements of a given type
+ * @idr: IDR handle.
+ * @entry: The type * to use as a cursor.
+ * @tmp: A temporary placeholder for ID.
+ * @id: Entry ID.
+ *
+ * Continue to iterate over entries, continuing after the current position.
+ */
+#define idr_for_each_entry_continue_ul(idr, entry, tmp, id)		\
+	for (tmp = id;							\
+	     tmp <= id && ((entry) = idr_get_next_ul(idr, &(id))) != NULL; \
+	     tmp = id, ++id)
+
 /*
  * IDA - ID Allocator, use when translation from id to pointer isn't necessary.
  */
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 5567af5d7cb5..835adde28a7e 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -221,12 +221,13 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	struct idr *idr = &idrinfo->action_idr;
 	struct tc_action *p;
 	unsigned long id = 1;
+	unsigned long tmp;
 
 	mutex_lock(&idrinfo->lock);
 
 	s_i = cb->args[0];
 
-	idr_for_each_entry_ul(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, tmp, id) {
 		index++;
 		if (index < s_i)
 			continue;
@@ -292,6 +293,7 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	struct idr *idr = &idrinfo->action_idr;
 	struct tc_action *p;
 	unsigned long id = 1;
+	unsigned long tmp;
 
 	nest = nla_nest_start_noflag(skb, 0);
 	if (nest == NULL)
@@ -300,7 +302,7 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 		goto nla_put_failure;
 
 	mutex_lock(&idrinfo->lock);
-	idr_for_each_entry_ul(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, tmp, id) {
 		ret = tcf_idr_release_unsafe(p);
 		if (ret == ACT_P_DELETED) {
 			module_put(ops->owner);
@@ -533,8 +535,9 @@ void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
 	struct tc_action *p;
 	int ret;
 	unsigned long id = 1;
+	unsigned long tmp;
 
-	idr_for_each_entry_ul(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, tmp, id) {
 		ret = __tcf_idr_release(p, false, true);
 		if (ret == ACT_P_DELETED)
 			module_put(ops->owner);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index eedd5786c084..01c361eb16a2 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -528,17 +528,19 @@ static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
 						unsigned long *handle)
 {
 	struct cls_fl_head *head = fl_head_dereference(tp);
+	unsigned long id = *handle;
 	struct cls_fl_filter *f;
+	unsigned long tmp;
 
 	rcu_read_lock();
-	while ((f = idr_get_next_ul(&head->handle_idr, handle))) {
+	idr_for_each_entry_continue_ul(&head->handle_idr, f, tmp, id) {
 		/* don't return filters that are being deleted */
 		if (refcount_inc_not_zero(&f->refcnt))
 			break;
-		++(*handle);
 	}
 	rcu_read_unlock();
 
+	*handle = id;
 	return f;
 }
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-26 21:15             ` Cong Wang
@ 2019-06-27 22:10               ` Davide Caratti
  2019-06-28  1:24                 ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Davide Caratti @ 2019-06-27 22:10 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Wed, 2019-06-26 at 14:15 -0700, Cong Wang wrote:
> Hi, Davide
> 
> On Tue, Jun 25, 2019 at 12:29 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > It should handle this overflow case more gracefully, I hope.
> > 
> 
> Please try this attached one and let me know if it works.
> Hope I get it right this time.
> 
> Thanks!

hello Cong, and thanks a lot for the patch!
I see it uses 

    (tmp <= id)

as the condition to detect the overflow, and at each iteration it does

    tmp = id, ++id

so that 'tmp' contains the last IDR found in the tree and 'id' is the next
tentative value to be searched for. When 'id' overflows, (tmp <= id)
becomes false, and the 'for' loop exits.
I tested it successfully with TC actions having the highest possible
index: 'tc actions show' doesn't loop anymore. But with cls_flower (that
uses idr_for_each_entry_continue_ul() ) I still see the infinite loop:
even when idr_for_each_entry_continue_ul() is used, fl_get_next_filter() 
never returns NULL, because 

    (tmp <= id) && (((entry) = idr_get_next_ul(idr, &(id))) != NULL)

calls idr_get_next_ul(idr, &(id)) at least once. So, even if
idr_for_each_entry_continue_ul() detected the overflow of 'id' after the
first iteration, and bailouts the for loop, fl_get_next_filter()
repeatedly returns a pointer to the idr slot with index equal to
0xffffffff. Because of that, the while() loop in fl_walk() keeps dumping
the same rule.
In my original patch I found easier to check for the overflow of
arg->cookie in fl_walk(), before the self-increment, so I was sure that 

    arg->fn(tp, f, arg)

was already called once when 'f' was the slot having the highest possible
IDR. Now, I didn't check it, but I guess 

    refcount_inc_not_zero(&f->refcnt))

in fl_get_next_filter() is always true during my test, so the inner
while() loop is not endless, even when the idr has a slot with id equal to
ULONG_MAX. Probably, to stay on the safe side, cls_flower needs both tests
to be in place, what do you think?  

-- 
davide




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net/sched: flower: fix infinite loop in fl_walk()
  2019-06-27 22:10               ` Davide Caratti
@ 2019-06-28  1:24                 ` Cong Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-06-28  1:24 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Vlad Buslov, David S. Miller, Linux Kernel Network Developers,
	Lucas Bates

On Thu, Jun 27, 2019 at 3:10 PM Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Wed, 2019-06-26 at 14:15 -0700, Cong Wang wrote:
> > Hi, Davide
> >
> > On Tue, Jun 25, 2019 at 12:29 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > It should handle this overflow case more gracefully, I hope.
> > >
> >
> > Please try this attached one and let me know if it works.
> > Hope I get it right this time.
> >
> > Thanks!
>
> hello Cong, and thanks a lot for the patch!
> I see it uses
>
>     (tmp <= id)
>
> as the condition to detect the overflow, and at each iteration it does
>
>     tmp = id, ++id
>
> so that 'tmp' contains the last IDR found in the tree and 'id' is the next
> tentative value to be searched for. When 'id' overflows, (tmp <= id)
> becomes false, and the 'for' loop exits.

Yes, thanks for testing it with tc actions.


> I tested it successfully with TC actions having the highest possible
> index: 'tc actions show' doesn't loop anymore. But with cls_flower (that
> uses idr_for_each_entry_continue_ul() ) I still see the infinite loop:
> even when idr_for_each_entry_continue_ul() is used, fl_get_next_filter()
> never returns NULL, because
>
>     (tmp <= id) && (((entry) = idr_get_next_ul(idr, &(id))) != NULL)
>
> calls idr_get_next_ul(idr, &(id)) at least once. So, even if
> idr_for_each_entry_continue_ul() detected the overflow of 'id' after the
> first iteration, and bailouts the for loop, fl_get_next_filter()
> repeatedly returns a pointer to the idr slot with index equal to
> 0xffffffff. Because of that, the while() loop in fl_walk() keeps dumping
> the same rule.


Good catch, it is actually the arg->cookie++ which causes the trouble
here.


> In my original patch I found easier to check for the overflow of
> arg->cookie in fl_walk(), before the self-increment, so I was sure that
>
>     arg->fn(tp, f, arg)
>
> was already called once when 'f' was the slot having the highest possible
> IDR. Now, I didn't check it, but I guess
>
>     refcount_inc_not_zero(&f->refcnt))
>
> in fl_get_next_filter() is always true during my test, so the inner
> while() loop is not endless, even when the idr has a slot with id equal to
> ULONG_MAX. Probably, to stay on the safe side, cls_flower needs both tests
> to be in place, what do you think?

I think we can just fold the nested loops into one for cls_flower and remove
the arg->cookie++.

What's more, arg->cookie could overflow too,seems we have to switch back to
arg->skip. I am not sure, if this is really a problem we can fix it separately.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-28  1:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-19 21:09 [PATCH net] net/sched: flower: fix infinite loop in fl_walk() Davide Caratti
2019-06-19 22:04 ` Cong Wang
2019-06-20 12:52   ` Davide Caratti
2019-06-20 17:33     ` Cong Wang
2019-06-25 15:47       ` Davide Caratti
2019-06-25 16:23         ` Davide Caratti
2019-06-25 18:07         ` Cong Wang
2019-06-25 19:29           ` Cong Wang
2019-06-26  0:05             ` Cong Wang
2019-06-26 21:15             ` Cong Wang
2019-06-27 22:10               ` Davide Caratti
2019-06-28  1:24                 ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).