* [PATCH next-next 0/2] qdisc-hashtable fixes
@ 2016-08-16 21:52 Jiri Kosina
2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Jiri Kosina @ 2016-08-16 21:52 UTC (permalink / raw)
To: David Miller, Daniel Borkmann, David Ahern
Cc: Simon Horman, Cong Wang, Eric Dumazet, Jamal Hadi Salim,
Phil Sutter, linux-kernel, netdev
The following two patches fix all the issues that have been reported
against the conversion of qdisc linked list to hashtable (currently in
net-next) so far.
First patch adjusts handling of singleton qdiscs to the new semantics, and
is rather straightforward.
The second patch, which fixes "cosmetic" issue of duplicate entries in the
qdisc dump for ingress qdiscs, is a little bit more hairy; I personally
would love to see all the already existing "if (ingress)"-like hacks go
away (by, let's say, introducing a general TCQ_F_? flag), but that's way
out of scope of this patchset (but already on my todo).
Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues
and testing the patches promptly.
Jiri Kosina (2):
net: sched: fix handling of singleton qdiscs with qdisc_hash
net: sched: avoid duplicates in qdisc dump
net/sched/sch_api.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash
2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
@ 2016-08-16 21:52 ` Jiri Kosina
2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
2016-08-19 4:19 ` [PATCH next-next 0/2] qdisc-hashtable fixes David Miller
2 siblings, 0 replies; 6+ messages in thread
From: Jiri Kosina @ 2016-08-16 21:52 UTC (permalink / raw)
To: David Miller, Daniel Borkmann, David Ahern
Cc: Simon Horman, Cong Wang, Eric Dumazet, Jamal Hadi Salim,
Phil Sutter, linux-kernel, netdev
From: Jiri Kosina <jkosina@suse.cz>
qdisc_match_from_root() is now iterating over per-netdevice qdisc
hashtable instead of going through a linked-list of qdiscs (independently
on the actual underlying netdev), which was the case before the switch to
hashtable for qdiscs.
For singleton qdiscs, there is no underlying netdev associated though, and
therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will
always be NULL.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
PGD 1aceba067 PUD 1aceb7067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
[ ... ]
task: ffff8801ec996e00 task.stack: ffff8801ec934000
RIP: 0010:[<ffffffff8167efac>] [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
RSP: 0018:ffff8801ec937ab0 EFLAGS: 00010203
RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
FS: 00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
Stack:
ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
Call Trace:
[<ffffffff81681210>] qdisc_lookup+0x40/0x50
[<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
[<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
[<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
[<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
[<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
[<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
[<ffffffff8168919b>] netlink_unicast+0x15b/0x210
[<ffffffff81689569>] netlink_sendmsg+0x319/0x390
[<ffffffff816379f8>] sock_sendmsg+0x38/0x50
[<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
[<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
[<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
[<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
[<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
[<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
[<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
[<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
[<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
[<ffffffff810038e7>] do_syscall_64+0x57/0xb0
Fix this by special-casing singleton qdiscs (those that don't have
underlying netdevice) and introduce immediate handling of those rather
than trying to go over an underlying netdevice. We're in the same
situation in tc_dump_qdisc_root() and tc_dump_tclass_root().
Ultimately, this will have to be slightly reworked so that we are actually
able to show singleton qdiscs (noop) in the dump properly; but we're not
currently doing that anyway, so no regression there, and better do this in
a gradual manner.
Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
net/sched/sch_api.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c093d32..2b75d68 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
{
struct Qdisc *q;
+ if (!qdisc_dev(root))
+ return (root->handle == handle ? root : NULL);
+
if (!(root->flags & TCQ_F_BUILTIN) &&
root->handle == handle)
return root;
@@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
goto done;
q_idx++;
}
+
+ if (!qdisc_dev(root))
+ goto out;
+
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (q_idx < s_q_idx) {
q_idx++;
@@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
return -1;
+ if (!qdisc_dev(root))
+ return 0;
+
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;
--
1.9.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
@ 2016-08-16 21:53 ` Jiri Kosina
2016-08-19 1:39 ` Cong Wang
2016-08-19 4:19 ` [PATCH next-next 0/2] qdisc-hashtable fixes David Miller
2 siblings, 1 reply; 6+ messages in thread
From: Jiri Kosina @ 2016-08-16 21:53 UTC (permalink / raw)
To: David Miller, Daniel Borkmann, David Ahern
Cc: Simon Horman, Cong Wang, Eric Dumazet, Jamal Hadi Salim,
Phil Sutter, linux-kernel, netdev
From: Jiri Kosina <jkosina@suse.cz>
tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases;
first, the "standard" dev->qdisc is being dumped. Second, if there is/are
ingress queue(s), they are being dumped as well.
After conversion of netdevice's qdisc linked-list into hashtable, these
two sets are not in two disjunctive sets/lists any more, but are both
"reachable" directly from netdevice's hashtable. As a consequence, the
"full-depth" dump of the ingress qdiscs results in immediately hitting the
netdevice hashtable again, and duplicating the dump that has already been
performed for dev->qdisc.
What in fact needs to be dumped in case of ingress queue is "just" the
top-level ingress qdisc, as everything else has been dumped already.
Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
whether it should (while performing the "full" per-netdev qdisc dump) perform
the whole recursion, or just dump "additional" top-level (ingress) qdiscs
without performing any kind of recursion.
This fixes duplicate dumps such as
qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
net/sched/sch_api.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2b75d68..749811f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1440,7 +1440,7 @@ err_out:
static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
struct netlink_callback *cb,
- int *q_idx_p, int s_q_idx)
+ int *q_idx_p, int s_q_idx, bool recur)
{
int ret = 0, q_idx = *q_idx_p;
struct Qdisc *q;
@@ -1460,7 +1460,13 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
q_idx++;
}
- if (!qdisc_dev(root))
+ /* If dumping singletons, there is no qdisc_dev(root) and the singleton
+ * itself has already been dumped.
+ *
+ * If we've already dumped the top-level (ingress) qdisc above and the global
+ * qdisc hashtable, we don't want to hit it again
+ */
+ if (!qdisc_dev(root) || !recur)
goto out;
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
@@ -1504,13 +1510,13 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
s_q_idx = 0;
q_idx = 0;
- if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
+ if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx, true) < 0)
goto done;
dev_queue = dev_ingress_queue(dev);
if (dev_queue &&
tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
- &q_idx, s_q_idx) < 0)
+ &q_idx, s_q_idx, false) < 0)
goto done;
cont:
--
1.9.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
@ 2016-08-19 1:39 ` Cong Wang
2016-08-19 21:48 ` Jiri Kosina
0 siblings, 1 reply; 6+ messages in thread
From: Cong Wang @ 2016-08-19 1:39 UTC (permalink / raw)
To: Jiri Kosina
Cc: David Miller, Daniel Borkmann, David Ahern, Simon Horman,
Eric Dumazet, Jamal Hadi Salim, Phil Sutter, LKML,
Linux Kernel Network Developers
On Tue, Aug 16, 2016 at 2:53 PM, Jiri Kosina <jikos@kernel.org> wrote:
> From: Jiri Kosina <jkosina@suse.cz>
>
> tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases;
> first, the "standard" dev->qdisc is being dumped. Second, if there is/are
> ingress queue(s), they are being dumped as well.
>
> After conversion of netdevice's qdisc linked-list into hashtable, these
> two sets are not in two disjunctive sets/lists any more, but are both
> "reachable" directly from netdevice's hashtable. As a consequence, the
> "full-depth" dump of the ingress qdiscs results in immediately hitting the
> netdevice hashtable again, and duplicating the dump that has already been
> performed for dev->qdisc.
> What in fact needs to be dumped in case of ingress queue is "just" the
> top-level ingress qdisc, as everything else has been dumped already.
Doesn't this mean we can now just remove the ingress case from
tc_dump_qdisc() and simply iterate the whole hash table?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH next-next 0/2] qdisc-hashtable fixes
2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
@ 2016-08-19 4:19 ` David Miller
2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2016-08-19 4:19 UTC (permalink / raw)
To: jikos
Cc: daniel, dsa, simon.horman, xiyou.wangcong, eric.dumazet, jhs,
phil, linux-kernel, netdev
From: Jiri Kosina <jikos@kernel.org>
Date: Tue, 16 Aug 2016 23:52:08 +0200 (CEST)
> The following two patches fix all the issues that have been reported
> against the conversion of qdisc linked list to hashtable (currently in
> net-next) so far.
>
> First patch adjusts handling of singleton qdiscs to the new semantics, and
> is rather straightforward.
>
> The second patch, which fixes "cosmetic" issue of duplicate entries in the
> qdisc dump for ingress qdiscs, is a little bit more hairy; I personally
> would love to see all the already existing "if (ingress)"-like hacks go
> away (by, let's say, introducing a general TCQ_F_? flag), but that's way
> out of scope of this patchset (but already on my todo).
>
> Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues
> and testing the patches promptly.
Series applied, thanks Jiri.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
2016-08-19 1:39 ` Cong Wang
@ 2016-08-19 21:48 ` Jiri Kosina
0 siblings, 0 replies; 6+ messages in thread
From: Jiri Kosina @ 2016-08-19 21:48 UTC (permalink / raw)
To: Cong Wang
Cc: David Miller, Daniel Borkmann, David Ahern, Simon Horman,
Eric Dumazet, Jamal Hadi Salim, Phil Sutter, LKML,
Linux Kernel Network Developers
On Thu, 18 Aug 2016, Cong Wang wrote:
> Doesn't this mean we can now just remove the ingress case from
> tc_dump_qdisc() and simply iterate the whole hash table?
That'd be indeed a nice cleanup, but for that we'll have to make sure that
the top-level ingress entry makes it to the hashtable as well; that's not
the case currently, and hence the special-casing (but I agree that it's in
principle just a remnant of the old design and of a rather 1:1
list->hashtable conversion, and there is no good reason for keeping it
this way any more).
So patch that'd add ingress qdisc to the hashtable whenever it
materializes should be enough to get rid of the "we have to walk two
linked lists" heritage and the ingress special-casing for dumps.
Will look into it next week unless someone is faster.
Thanks,
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-08-19 21:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-16 21:52 [PATCH next-next 0/2] qdisc-hashtable fixes Jiri Kosina
2016-08-16 21:52 ` [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash Jiri Kosina
2016-08-16 21:53 ` [PATCH 2/2] net: sched: avoid duplicates in qdisc dump Jiri Kosina
2016-08-19 1:39 ` Cong Wang
2016-08-19 21:48 ` Jiri Kosina
2016-08-19 4:19 ` [PATCH next-next 0/2] qdisc-hashtable fixes David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).