linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH next-next 0/2] qdisc-hashtable fixes
@ 2016-08-16 21:52 Jiri Kosina
  2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jiri Kosina @ 2016-08-16 21:52 UTC (permalink / raw)
  To: David Miller, Daniel Borkmann, David Ahern
  Cc: Simon Horman, Cong Wang, Eric Dumazet, Jamal Hadi Salim,
	Phil Sutter, linux-kernel, netdev

The following two patches fix all the issues that have been reported 
against the conversion of qdisc linked list to hashtable (currently in 
net-next) so far.

First patch adjusts handling of singleton qdiscs to the new semantics, and 
is rather straightforward.

The second patch, which fixes "cosmetic" issue of duplicate entries in the 
qdisc dump for ingress qdiscs, is a little bit more hairy; I personally 
would love to see all the already existing "if (ingress)"-like hacks go 
away (by, let's say, introducing a general TCQ_F_? flag), but that's way 
out of scope of this patchset (but already on my todo).

Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues 
and testing the patches promptly.

Jiri Kosina (2):
      net: sched: fix handling of singleton qdiscs with qdisc_hash
      net: sched: avoid duplicates in qdisc dump

 net/sched/sch_api.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash
  2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
@ 2016-08-16 21:52 ` Jiri Kosina
  2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
  2016-08-19  4:19 ` [PATCH next-next 0/2] qdisc-hashtable fixes David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: Jiri Kosina @ 2016-08-16 21:52 UTC (permalink / raw)
  To: David Miller, Daniel Borkmann, David Ahern
  Cc: Simon Horman, Cong Wang, Eric Dumazet, Jamal Hadi Salim,
	Phil Sutter, linux-kernel, netdev

From: Jiri Kosina <jkosina@suse.cz>

qdisc_match_from_root() is now iterating over per-netdevice qdisc 
hashtable instead of going through a linked-list of qdiscs (independently 
on the actual underlying netdev), which was the case before the switch to 
hashtable for qdiscs.

For singleton qdiscs, there is no underlying netdev associated though, and 
therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will 
always be NULL.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
 IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
 PGD 1aceba067 PUD 1aceb7067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP
[ ... ]
 task: ffff8801ec996e00 task.stack: ffff8801ec934000
 RIP: 0010:[<ffffffff8167efac>]  [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
 RSP: 0018:ffff8801ec937ab0  EFLAGS: 00010203
 RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
 RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
 RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
 R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
 R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
 FS:  00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
 Stack:
  ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
  ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
  ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
 Call Trace:
  [<ffffffff81681210>] qdisc_lookup+0x40/0x50
  [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
  [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
  [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
  [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
  [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
  [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
  [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
  [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
  [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
  [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
  [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
  [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
  [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
  [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
  [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
  [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
  [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
  [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
  [<ffffffff810038e7>] do_syscall_64+0x57/0xb0

Fix this by special-casing singleton qdiscs (those that don't have 
underlying netdevice) and introduce immediate handling of those rather 
than trying to go over an underlying netdevice. We're in the same 
situation in tc_dump_qdisc_root() and tc_dump_tclass_root().

Ultimately, this will have to be slightly reworked so that we are actually 
able to show singleton qdiscs (noop) in the dump properly; but we're not 
currently doing that anyway, so no regression there, and better do this in 
a gradual manner.

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 net/sched/sch_api.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c093d32..2b75d68 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 {
 	struct Qdisc *q;
 
+	if (!qdisc_dev(root))
+		return (root->handle == handle ? root : NULL);
+
 	if (!(root->flags & TCQ_F_BUILTIN) &&
 	    root->handle == handle)
 		return root;
@@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
+
+	if (!qdisc_dev(root))
+		goto out;
+
 	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
@@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
+	if (!qdisc_dev(root))
+		return 0;
+
 	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
  2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
  2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
@ 2016-08-16 21:53 ` Jiri Kosina
  2016-08-19  1:39   ` Cong Wang
  2016-08-19  4:19 ` [PATCH next-next 0/2] qdisc-hashtable fixes David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Jiri Kosina @ 2016-08-16 21:53 UTC (permalink / raw)
  To: David Miller, Daniel Borkmann, David Ahern
  Cc: Simon Horman, Cong Wang, Eric Dumazet, Jamal Hadi Salim,
	Phil Sutter, linux-kernel, netdev

From: Jiri Kosina <jkosina@suse.cz>

tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases; 
first, the "standard" dev->qdisc is being dumped. Second, if there is/are 
ingress queue(s), they are being dumped as well.

After conversion of netdevice's qdisc linked-list into hashtable, these 
two sets are not in two disjunctive sets/lists any more, but are both 
"reachable" directly from netdevice's hashtable. As a consequence, the 
"full-depth" dump of the ingress qdiscs results in immediately hitting the 
netdevice hashtable again, and duplicating the dump that has already been 
performed for dev->qdisc. 
What in fact needs to be dumped in case of ingress queue is "just" the 
top-level ingress qdisc, as everything else has been dumped already.

Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
whether it should (while performing the "full" per-netdev qdisc dump) perform
the whole recursion, or just dump "additional" top-level (ingress) qdiscs
without performing any kind of recursion.

This fixes duplicate dumps such as

	qdisc mq 0: root
	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc clsact ffff: parent ffff:fff1
	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 net/sched/sch_api.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2b75d68..749811f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1440,7 +1440,7 @@ err_out:
 
 static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			      struct netlink_callback *cb,
-			      int *q_idx_p, int s_q_idx)
+			      int *q_idx_p, int s_q_idx, bool recur)
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
@@ -1460,7 +1460,13 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 		q_idx++;
 	}
 
-	if (!qdisc_dev(root))
+	/* If dumping singletons, there is no qdisc_dev(root) and the singleton
+	 * itself has already been dumped.
+	 *
+	 * If we've already dumped the top-level (ingress) qdisc above and the global
+	 * qdisc hashtable, we don't want to hit it again
+	 */
+	if (!qdisc_dev(root) || !recur)
 		goto out;
 
 	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
@@ -1504,13 +1510,13 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
 			s_q_idx = 0;
 		q_idx = 0;
 
-		if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
+		if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx, true) < 0)
 			goto done;
 
 		dev_queue = dev_ingress_queue(dev);
 		if (dev_queue &&
 		    tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
-				       &q_idx, s_q_idx) < 0)
+				       &q_idx, s_q_idx, false) < 0)
 			goto done;
 
 cont:
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
  2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
@ 2016-08-19  1:39   ` Cong Wang
  2016-08-19 21:48     ` Jiri Kosina
  0 siblings, 1 reply; 6+ messages in thread
From: Cong Wang @ 2016-08-19  1:39 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: David Miller, Daniel Borkmann, David Ahern, Simon Horman,
	Eric Dumazet, Jamal Hadi Salim, Phil Sutter, LKML,
	Linux Kernel Network Developers

On Tue, Aug 16, 2016 at 2:53 PM, Jiri Kosina <jikos@kernel.org> wrote:
> From: Jiri Kosina <jkosina@suse.cz>
>
> tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases;
> first, the "standard" dev->qdisc is being dumped. Second, if there is/are
> ingress queue(s), they are being dumped as well.
>
> After conversion of netdevice's qdisc linked-list into hashtable, these
> two sets are not in two disjunctive sets/lists any more, but are both
> "reachable" directly from netdevice's hashtable. As a consequence, the
> "full-depth" dump of the ingress qdiscs results in immediately hitting the
> netdevice hashtable again, and duplicating the dump that has already been
> performed for dev->qdisc.
> What in fact needs to be dumped in case of ingress queue is "just" the
> top-level ingress qdisc, as everything else has been dumped already.

Doesn't this mean we can now just remove the ingress case from
tc_dump_qdisc() and simply iterate the whole hash table?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH next-next 0/2] qdisc-hashtable fixes
  2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
  2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
  2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
@ 2016-08-19  4:19 ` David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2016-08-19  4:19 UTC (permalink / raw)
  To: jikos
  Cc: daniel, dsa, simon.horman, xiyou.wangcong, eric.dumazet, jhs,
	phil, linux-kernel, netdev

From: Jiri Kosina <jikos@kernel.org>
Date: Tue, 16 Aug 2016 23:52:08 +0200 (CEST)

> The following two patches fix all the issues that have been reported 
> against the conversion of qdisc linked list to hashtable (currently in 
> net-next) so far.
> 
> First patch adjusts handling of singleton qdiscs to the new semantics, and 
> is rather straightforward.
> 
> The second patch, which fixes "cosmetic" issue of duplicate entries in the 
> qdisc dump for ingress qdiscs, is a little bit more hairy; I personally 
> would love to see all the already existing "if (ingress)"-like hacks go 
> away (by, let's say, introducing a general TCQ_F_? flag), but that's way 
> out of scope of this patchset (but already on my todo).
> 
> Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues 
> and testing the patches promptly.

Series applied, thanks Jiri.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
  2016-08-19  1:39   ` Cong Wang
@ 2016-08-19 21:48     ` Jiri Kosina
  0 siblings, 0 replies; 6+ messages in thread
From: Jiri Kosina @ 2016-08-19 21:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Daniel Borkmann, David Ahern, Simon Horman,
	Eric Dumazet, Jamal Hadi Salim, Phil Sutter, LKML,
	Linux Kernel Network Developers

On Thu, 18 Aug 2016, Cong Wang wrote:

> Doesn't this mean we can now just remove the ingress case from 
> tc_dump_qdisc() and simply iterate the whole hash table?

That'd be indeed a nice cleanup, but for that we'll have to make sure that 
the top-level ingress entry makes it to the hashtable as well; that's not 
the case currently, and hence the special-casing (but I agree that it's in 
principle just a remnant of the old design and of a rather 1:1 
list->hashtable conversion, and there is no good reason for keeping it 
this way any more).

So patch that'd add ingress qdisc to the hashtable whenever it 
materializes should be enough to get rid of the "we have to walk two 
linked lists" heritage and the ingress special-casing for dumps.

Will look into it next week unless someone is faster.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-08-19 21:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
2016-08-19  1:39   ` Cong Wang
2016-08-19 21:48     ` Jiri Kosina
2016-08-19  4:19 ` [PATCH next-next 0/2] qdisc-hashtable fixes David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).