All of lore.kernel.org
 help / color / mirror / Atom feed
* Deleting child qdisc doesn't reset parent to default qdisc?
@ 2016-04-14 14:44 Jiri Kosina
  2016-04-14 14:51 ` Jiri Kosina
  2016-04-14 15:01 ` Eric Dumazet
  0 siblings, 2 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-04-14 14:44 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, linux-kernel

Hi,

I've came across the behavior where adding a child qdisc and then deleting 
it again makes the networking dysfunctional (I guess that's because all of 
a sudden there is absolutely no working qdisc on the device, although 
there originally was a default one in the parent).

In a nutshell, is this expected behavior or bug?

=====
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms

jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024 

jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms

jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
	[ ... nothing happens ... ]
^C
jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.66 ms
=====

Thanks,

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 14:44 Deleting child qdisc doesn't reset parent to default qdisc? Jiri Kosina
@ 2016-04-14 14:51 ` Jiri Kosina
  2016-04-14 15:01 ` Eric Dumazet
  1 sibling, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-04-14 14:51 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, linux-kernel

On Thu, 14 Apr 2016, Jiri Kosina wrote:

> In a nutshell, is this expected behavior or bug?

Just to clarify what seems to suggest to me that this is rather a bug that 
needs to be fixed (but apparently one that has been there for quite a long 
time) can be demonstrated by this:

> 
> =====
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

The above configuration works.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms
> 
> jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024 
> 
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms
> 
> jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

The above configuration doesn't although it's identical to the working one 
at the beginning.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 	[ ... nothing happens ... ]
> ^C

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 14:44 Deleting child qdisc doesn't reset parent to default qdisc? Jiri Kosina
  2016-04-14 14:51 ` Jiri Kosina
@ 2016-04-14 15:01 ` Eric Dumazet
  2016-04-14 15:18   ` Phil Sutter
  1 sibling, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2016-04-14 15:01 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Jamal Hadi Salim, netdev, linux-kernel

On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> Hi,
> 
> I've came across the behavior where adding a child qdisc and then deleting 
> it again makes the networking dysfunctional (I guess that's because all of 
> a sudden there is absolutely no working qdisc on the device, although 
> there originally was a default one in the parent).
> 
> In a nutshell, is this expected behavior or bug?

This is the expected behavior.

If the kernel was suddenly doing a 'replace' when you ask a delete,
then the scripts doing a delete , than a add would break.

tc users are skilled admins ;)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 15:01 ` Eric Dumazet
@ 2016-04-14 15:18   ` Phil Sutter
  2016-04-14 15:34     ` Jiri Kosina
  2016-04-14 16:08     ` Jiri Kosina
  0 siblings, 2 replies; 43+ messages in thread
From: Phil Sutter @ 2016-04-14 15:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Kosina, Jamal Hadi Salim, netdev, linux-kernel

On Thu, Apr 14, 2016 at 08:01:39AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> > Hi,
> > 
> > I've came across the behavior where adding a child qdisc and then deleting 
> > it again makes the networking dysfunctional (I guess that's because all of 
> > a sudden there is absolutely no working qdisc on the device, although 
> > there originally was a default one in the parent).
> > 
> > In a nutshell, is this expected behavior or bug?
> 
> This is the expected behavior.

OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
one upon deletion instead of noop_qdisc, hence I would describe
the situation using the words 'inconsistent' and 'accident' rather than
'expected'. :)

Anyhow, the problem with skilled admins is they accept quirks too easily
and just build their scripts around them - the same scripts we have to
keep compatible to then.

Cheers, Phil

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 15:18   ` Phil Sutter
@ 2016-04-14 15:34     ` Jiri Kosina
  2016-04-14 15:44       ` Eric Dumazet
  2016-04-14 16:08     ` Jiri Kosina
  1 sibling, 1 reply; 43+ messages in thread
From: Jiri Kosina @ 2016-04-14 15:34 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Eric Dumazet, Jamal Hadi Salim, netdev, linux-kernel

On Thu, 14 Apr 2016, Phil Sutter wrote:

> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Exactly. I'd again like to stress the fact that this configuration works:

	jikos:~ # tc qdisc show
	qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

and this (after performing add/delete operation) doesn't:

	jikos:~ # tc qdisc show
	qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

It's hard to spot a difference (hint: there is none).

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 15:34     ` Jiri Kosina
@ 2016-04-14 15:44       ` Eric Dumazet
  2016-04-14 16:22         ` Phil Sutter
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2016-04-14 15:44 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Phil Sutter, Jamal Hadi Salim, netdev, linux-kernel

On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
> 
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
> 
> Exactly. I'd again like to stress the fact that this configuration works:
> 
> 	jikos:~ # tc qdisc show
> 	qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> 
> and this (after performing add/delete operation) doesn't:
> 
> 	jikos:~ # tc qdisc show
> 	qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> 
> It's hard to spot a difference (hint: there is none).

This is because some qdisc are not visible in the dump.


qdisc_list_add() uses a single list, so adding too much stuff in it
could slow down fast path (qdisc_lookup(), called from
qdisc_tree_reduce_backlog())

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 15:18   ` Phil Sutter
  2016-04-14 15:34     ` Jiri Kosina
@ 2016-04-14 16:08     ` Jiri Kosina
  2016-04-14 17:49       ` Eric Dumazet
  1 sibling, 1 reply; 43+ messages in thread
From: Jiri Kosina @ 2016-04-14 16:08 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Eric Dumazet, Jamal Hadi Salim, netdev, linux-kernel

On Thu, 14 Apr 2016, Phil Sutter wrote:

> > > I've came across the behavior where adding a child qdisc and then deleting 
> > > it again makes the networking dysfunctional (I guess that's because all of 
> > > a sudden there is absolutely no working qdisc on the device, although 
> > > there originally was a default one in the parent).
> > > 
> > > In a nutshell, is this expected behavior or bug?
> > 
> > This is the expected behavior.
> 
> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Would a patch that'd unify this in a sense that all qdiscs would assign 
the default one upon deletion acceptable?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 15:44       ` Eric Dumazet
@ 2016-04-14 16:22         ` Phil Sutter
  2016-04-14 16:40           ` Eric Dumazet
  0 siblings, 1 reply; 43+ messages in thread
From: Phil Sutter @ 2016-04-14 16:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Kosina, Jamal Hadi Salim, netdev, linux-kernel

On Thu, Apr 14, 2016 at 08:44:40AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> > On Thu, 14 Apr 2016, Phil Sutter wrote:
> > 
> > > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > > one upon deletion instead of noop_qdisc, hence I would describe
> > > the situation using the words 'inconsistent' and 'accident' rather than
> > > 'expected'. :)
> > 
> > Exactly. I'd again like to stress the fact that this configuration works:
> > 
> > 	jikos:~ # tc qdisc show
> > 	qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> > 
> > and this (after performing add/delete operation) doesn't:
> > 
> > 	jikos:~ # tc qdisc show
> > 	qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> > 
> > It's hard to spot a difference (hint: there is none).
> 
> This is because some qdisc are not visible in the dump.

And those being invisible can be overridden using 'tc qd add', right?
AFAIR they're not listed because they don't properly register, so the
system doesn't care to override them. In this case we could change all
classful qdiscs to restore the default qdisc if a leaf qdisc is being
deleted instead of noop (which is probably not what the user wants
anyway).

Cheers, Phil

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 16:22         ` Phil Sutter
@ 2016-04-14 16:40           ` Eric Dumazet
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Dumazet @ 2016-04-14 16:40 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Jiri Kosina, Jamal Hadi Salim, netdev, linux-kernel

On Thu, 2016-04-14 at 18:22 +0200, Phil Sutter wrote:

> And those being invisible can be overridden using 'tc qd add', right?
> AFAIR they're not listed because they don't properly register, so the
> system doesn't care to override them. In this case we could change all
> classful qdiscs to restore the default qdisc if a leaf qdisc is being
> deleted instead of noop (which is probably not what the user wants
> anyway).

Even if they properly register, they are not visible.

Take a look at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95dc19299f741c986227ec33e23cbf9b3321f812

for some context.

When a default pfifo is created on say a HTB class, you do not see it by
default in a dump.

If you have 100 HTB classes, HTB created 100 pfifo just fine, and it
works, unless an admin tries to delete them maybe ;)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 16:08     ` Jiri Kosina
@ 2016-04-14 17:49       ` Eric Dumazet
  2016-04-15 12:42         ` Jamal Hadi Salim
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2016-04-14 17:49 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Phil Sutter, Jamal Hadi Salim, netdev, linux-kernel

On Thu, 2016-04-14 at 18:08 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
> 
> > > > I've came across the behavior where adding a child qdisc and then deleting 
> > > > it again makes the networking dysfunctional (I guess that's because all of 
> > > > a sudden there is absolutely no working qdisc on the device, although 
> > > > there originally was a default one in the parent).
> > > > 
> > > > In a nutshell, is this expected behavior or bug?
> > > 
> > > This is the expected behavior.
> > 
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
> 
> Would a patch that'd unify this in a sense that all qdiscs would assign 
> the default one upon deletion acceptable?
> 

And what would be the chosen behavior ?

Relying on TBF installing a bfifo for you at delete would be hazardous.

For example CBQ got it differently than HFSC

If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
falls back to noop_qdisc, without warning the user :(

At least always using noop_qdisc is consistent. No magic there.

Doing 'unification' right now would break existing scripts.

This is too late, I am afraid.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-14 17:49       ` Eric Dumazet
@ 2016-04-15 12:42         ` Jamal Hadi Salim
  2016-04-15 14:58           ` Eric Dumazet
  0 siblings, 1 reply; 43+ messages in thread
From: Jamal Hadi Salim @ 2016-04-15 12:42 UTC (permalink / raw)
  To: Eric Dumazet, Jiri Kosina; +Cc: Phil Sutter, netdev, linux-kernel

On 16-04-14 01:49 PM, Eric Dumazet wrote:

> And what would be the chosen behavior ?
>

TBF is probably a bad example because it started life as
a classless qdisc. There was only one built-in fifo queue
that was shaped. Then someone made it classful and changed
this behavior. To me it sounds reasonable to have the
default behavior restored. At minimal consistency.

> Relying on TBF installing a bfifo for you at delete would be hazardous.
>
> For example CBQ got it differently than HFSC
>
> If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
> falls back to noop_qdisc, without warning the user :(
>
> At least always using noop_qdisc is consistent. No magic there.
>
> Doing 'unification' right now would break existing scripts.
>
> This is too late, I am afraid.


Sigh. So rant:
IMO, we should let any new APIs and API updates stay longer
in discussion. Or better mark them as unstable for sometime.
The excuse that "it is out in the wild therefore cant be changed"
is harmful because the timeline is "forever" whereas
patches are applied after a short period of posting
and discussions and sometimes not involving the right people.
It is like having a jury issuing a death sentence after 1 week of
deliberation. You cant take it back after execution.

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-15 12:42         ` Jamal Hadi Salim
@ 2016-04-15 14:58           ` Eric Dumazet
  2016-04-15 17:13             ` David Miller
                               ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Eric Dumazet @ 2016-04-15 14:58 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Jiri Kosina, Phil Sutter, netdev, linux-kernel

On Fri, 2016-04-15 at 08:42 -0400, Jamal Hadi Salim wrote:
> On 16-04-14 01:49 PM, Eric Dumazet wrote:
> 
> > And what would be the chosen behavior ?
> >
> 
> TBF is probably a bad example because it started life as
> a classless qdisc. There was only one built-in fifo queue
> that was shaped. Then someone made it classful and changed
> this behavior. To me it sounds reasonable to have the
> default behavior restored. At minimal consistency.


Then you need to save the initial qdisc (bfifo for TBF) in a special
place, to make sure the delete operation is guaranteed to succeed.

Or fail the delete if the bfifo can not be allocated.

I can tell that determinism if far more interesting than usability for
some users occasionally playing with tc.

Surely the silent fallback to noop_qdisc is wrong.

Anyway, we probably need to improve our ability to understand qdisc
hierarchies. Having some hidden qdiscs is the real problem here.

We need to add some hash table so that qdisc_match_from_root() does not
have to scan hundred of qdiscs.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-15 14:58           ` Eric Dumazet
@ 2016-04-15 17:13             ` David Miller
  2016-06-28 15:19             ` Jiri Kosina
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: David Miller @ 2016-04-15 17:13 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jhs, jikos, phil, netdev, linux-kernel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 15 Apr 2016 07:58:48 -0700

> Having some hidden qdiscs is the real problem here.

+1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-04-15 14:58           ` Eric Dumazet
  2016-04-15 17:13             ` David Miller
@ 2016-06-28 15:19             ` Jiri Kosina
  2016-06-28 17:28               ` Cong Wang
  2016-06-28 16:49             ` [RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?) Jiri Kosina
  2016-07-07  9:04             ` [RFC PATCH] net: sched: convert qdisc linked list to hashtable " Jiri Kosina
  3 siblings, 1 reply; 43+ messages in thread
From: Jiri Kosina @ 2016-06-28 15:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Fri, 15 Apr 2016, Eric Dumazet wrote:

> > TBF is probably a bad example because it started life as a classless 
> > qdisc. There was only one built-in fifo queue that was shaped. Then 
> > someone made it classful and changed this behavior. To me it sounds 
> > reasonable to have the default behavior restored. At minimal 
> > consistency.
> 
> 
> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
> 
> Or fail the delete if the bfifo can not be allocated.
> 
> I can tell that determinism if far more interesting than usability for
> some users occasionally playing with tc.

BTW, I've started to actually work on fixing this, and I've noticed that 
TBF behavior actually violates what's stated in pfifo_fast manpage:

==========
        Whenever  an  interface is created, the pfifo_fast qdisc is 
	automatically used as a queue. If another qdisc is
	attached, it preempts the default pfifo_fast, which automatically 
	returns to function when an  existing  qdisc is detached.

	In this sense this qdisc is magic, and unlike other qdiscs.
==========

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
  2016-04-15 14:58           ` Eric Dumazet
  2016-04-15 17:13             ` David Miller
  2016-06-28 15:19             ` Jiri Kosina
@ 2016-06-28 16:49             ` Jiri Kosina
  2016-07-07  9:04             ` [RFC PATCH] net: sched: convert qdisc linked list to hashtable " Jiri Kosina
  3 siblings, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-06-28 16:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
> 
> Or fail the delete if the bfifo can not be allocated.
> 
> I can tell that determinism if far more interesting than usability for
> some users occasionally playing with tc.
> 
> Surely the silent fallback to noop_qdisc is wrong.

So before we go further and fix the fact that we actually do have hidden 
qdiscs (by refactoring qdisc_match_from_root() and friends), I'd still 
like to bring the patch below up for consideration.

Thanks.




From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] sch_tbf: avoid silent fallback to noop_qdisc

TBF started its life as a classless qdisc with a single builtin FIFO queue
which was being shaped.

When it got later turned into classful qdisc, it was written in a way that
the fallback qdisc was noop_qdisc, which produces bad user experience (delete
of last manually added class doesn't reset it to initial default, but renders
networking unusable instead).

Switch the default fallback to bfifo; this also mimics how the other guys
(HTB, HFSC, CBQ, ...) are behaving.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 net/sched/sch_tbf.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 3161e49..b06dffe 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -508,8 +508,12 @@ static int tbf_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
 	struct tbf_sched_data *q = qdisc_priv(sch);
 
-	if (new == NULL)
-		new = &noop_qdisc;
+	if (new == NULL) {
+		/* reset to default qdisc */
+		new = qdisc_create_dflt(sch->dev_queue, &bfifo_qdisc_ops, sch->parent);
+		if (!new)
+			return -ENOBUFS;
+	}
 
 	*old = qdisc_replace(sch, new, &q->qdisc);
 	return 0;
-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-06-28 15:19             ` Jiri Kosina
@ 2016-06-28 17:28               ` Cong Wang
  2016-06-28 17:33                 ` Jiri Kosina
  0 siblings, 1 reply; 43+ messages in thread
From: Cong Wang @ 2016-06-28 17:28 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Eric Dumazet, Jamal Hadi Salim, Phil Sutter,
	Linux Kernel Network Developers, LKML

On Tue, Jun 28, 2016 at 8:19 AM, Jiri Kosina <jikos@kernel.org> wrote:
> BTW, I've started to actually work on fixing this, and I've noticed that
> TBF behavior actually violates what's stated in pfifo_fast manpage:
>
> ==========
>         Whenever  an  interface is created, the pfifo_fast qdisc is
>         automatically used as a queue. If another qdisc is
>         attached, it preempts the default pfifo_fast, which automatically
>         returns to function when an  existing  qdisc is detached.
>
>         In this sense this qdisc is magic, and unlike other qdiscs.
> ==========

It is out of date, now default qdisc can be set to any other qdisc
via /proc. Also, probably due to historical reasons, we don't have
a unified default default qdisc, some uses bfifo, some uses pfifo,
we may break some existing script if we change that.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Deleting child qdisc doesn't reset parent to default qdisc?
  2016-06-28 17:28               ` Cong Wang
@ 2016-06-28 17:33                 ` Jiri Kosina
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-06-28 17:33 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, Jamal Hadi Salim, Phil Sutter,
	Linux Kernel Network Developers, LKML

On Tue, 28 Jun 2016, Cong Wang wrote:

> > BTW, I've started to actually work on fixing this, and I've noticed that
> > TBF behavior actually violates what's stated in pfifo_fast manpage:
> >
> > ==========
> >         Whenever  an  interface is created, the pfifo_fast qdisc is
> >         automatically used as a queue. If another qdisc is
> >         attached, it preempts the default pfifo_fast, which automatically
> >         returns to function when an  existing  qdisc is detached.
> >
> >         In this sense this qdisc is magic, and unlike other qdiscs.
> > ==========
> 
> It is out of date, now default qdisc can be set to any other qdisc
> via /proc. Also, probably due to historical reasons, we don't have
> a unified default default qdisc, some uses bfifo, some uses pfifo,
> we may break some existing script if we change that.

While I do understand that reasoning, I'd argue that unpredictable and 
unexpected behavior of TBF causing systems with non-working networking is 
much more likely than any userspace having hard dependency on the fact 
that default (*) qdisc for TBF is noop.

(*) where 'default upon creation' != 'default when reset'

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
  2016-04-15 14:58           ` Eric Dumazet
                               ` (2 preceding siblings ...)
  2016-06-28 16:49             ` [RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?) Jiri Kosina
@ 2016-07-07  9:04             ` Jiri Kosina
  2016-07-07 13:51               ` Eric Dumazet
  2016-07-07 20:36               ` [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable Jiri Kosina
  3 siblings, 2 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-07  9:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Anyway, we probably need to improve our ability to understand qdisc 
> hierarchies. Having some hidden qdiscs is the real problem here.
> 
> We need to add some hash table so that qdisc_match_from_root() does not 
> have to scan hundred of qdiscs.

So how about something like the patch below? I already have preliminary 
patches on top which unhide the default qdiscs, but let's make this one 
step after the other.

Thanks.




From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] net: sched: convert qdisc linked list to hashtable

Convert the per-device linked list into a hashtable. The primary motivation
for this change is that currently, we're not tracking all the qdiscs in
hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
linked list by qdisc_match_from_root() is rather expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will bring
much more determinism in user experience.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  1 +
 net/sched/sch_api.c       | 23 +++++++++++++----------
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 8 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..630838e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
+	DECLARE_HASHTABLE	(qdisc_hash, 16);
 	unsigned long		tx_queue_len;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
-	struct list_head	list;
+	struct hlist_node       hash;
 	u32			handle;
 	u32			parent;
 	int			(*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+	hash_init(dev->qdisc_hash);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..82953cb 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include <linux/hrtimer.h>
 #include <linux/lockdep.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry_rcu(q, &root->list, list) {
+	hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
 		if (q->handle == handle)
 			return q;
 	}
 	return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
 		ASSERT_RTNL();
-		list_add_tail_rcu(&q->list, &root->list);
+		hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		ASSERT_RTNL();
-		list_del_rcu(&q->list);
+		hash_del_rcu(&q->hash);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
-		qdisc_list_add(sch);
+		qdisc_hash_add(sch);
 
 		return sch;
 	}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
 			continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 			       int *t_p, int s_t)
 {
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..7efc923 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,
 	.ops		=	&noop_qdisc_ops,
-	.list		=	LIST_HEAD_INIT(noop_qdisc.list),
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
-	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
 
 	spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 		return;
 
 #ifdef CONFIG_NET_SCHED
-	qdisc_list_del(qdisc);
+	qdisc_hash_del(qdisc);
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
@@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
 			qdisc->ops->attach(qdisc);
 		}
 	}
+	if (dev->qdisc)
+		qdisc_hash_add(dev->qdisc);
 }
 
 static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
 			qdisc_destroy(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 #endif
 
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		if (old)
 			qdisc_destroy(old);
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 	}
 	kfree(priv->qdiscs);
 	priv->qdiscs = NULL;

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
  2016-07-07  9:04             ` [RFC PATCH] net: sched: convert qdisc linked list to hashtable " Jiri Kosina
@ 2016-07-07 13:51               ` Eric Dumazet
  2016-07-07 16:32                 ` Jiri Kosina
  2016-07-07 20:36               ` [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable Jiri Kosina
  1 sibling, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2016-07-07 13:51 UTC (permalink / raw)
  To: Jiri Kosina, Craig Gallek
  Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Thu, 2016-07-07 at 11:04 +0200, Jiri Kosina wrote:

> 
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] net: sched: convert qdisc linked list to hashtable
> 
> Convert the per-device linked list into a hashtable. The primary motivation
> for this change is that currently, we're not tracking all the qdiscs in
> hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
> linked list by qdisc_match_from_root() is rather expensive.

...

>  	}
> @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
>  {
>  	int ret = 0, q_idx = *q_idx_p;
>  	struct Qdisc *q;
> +	int b;
>  
>  	if (!root)
>  		return 0;
> @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
>  			goto done;
>  		q_idx++;
>  	}
> -	list_for_each_entry(q, &root->list, list) {
> +	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
>  		if (q_idx < s_q_idx) {
>  			q_idx++;
>  			continue;
> @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
>  			       int *t_p, int s_t)
>  {
>  	struct Qdisc *q;
> +	int b;
>  
>  	if (!root)
>  		return 0;
> @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
>  	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
>  		return -1;
>  
> -	list_for_each_entry(q, &root->list, list) {
> +	hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
>  		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
>  			return -1;
>  	}


Not sure why you used the rcu version here, but the non rcu version in
tc_dump_qdisc_root()

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
  2016-07-07 13:51               ` Eric Dumazet
@ 2016-07-07 16:32                 ` Jiri Kosina
  2016-07-07 16:53                   ` Eric Dumazet
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Kosina @ 2016-07-07 16:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Craig Gallek, Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Thu, 7 Jul 2016, Eric Dumazet wrote:

> > @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
> >  {
> >  	int ret = 0, q_idx = *q_idx_p;
> >  	struct Qdisc *q;
> > +	int b;
> >  
> >  	if (!root)
> >  		return 0;
> > @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
> >  			goto done;
> >  		q_idx++;
> >  	}
> > -	list_for_each_entry(q, &root->list, list) {
> > +	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> >  		if (q_idx < s_q_idx) {
> >  			q_idx++;
> >  			continue;
> > @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
> >  			       int *t_p, int s_t)
> >  {
> >  	struct Qdisc *q;
> > +	int b;
> >  
> >  	if (!root)
> >  		return 0;
> > @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
> >  	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
> >  		return -1;
> >  
> > -	list_for_each_entry(q, &root->list, list) {
> > +	hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> >  		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
> >  			return -1;
> >  	}
> 
> 
> Not sure why you used the rcu version here, but the non rcu version in
> tc_dump_qdisc_root()

Good catch.

Actually even the current code is odd in this regard -- 
qdisc_match_from_root() uses RCU iterator, while tc_dump_*() use the 
non-RCU one; addition and deletion is performed using RCU primitives.

I haven't got my head around this yet; if it's correct at all, it'd at 
least deserve a comment somewhere.

I'll respin v2 of the patch (there is also a conflict on HASH_SIZE 
definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include 
in netdevice.h that needs to be resolved as well) that'd make RCU usage 
consistent.

Any other objections/comments? I was namely curious about any opinions 
regarding the hashtable size.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
  2016-07-07 16:32                 ` Jiri Kosina
@ 2016-07-07 16:53                   ` Eric Dumazet
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Dumazet @ 2016-07-07 16:53 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Craig Gallek, Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Thu, 2016-07-07 at 18:32 +0200, Jiri Kosina wrote:
> On Thu, 7 Jul 2016, Eric Dumazet wrote:
> 
> > > @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
> > >  {
> > >  	int ret = 0, q_idx = *q_idx_p;
> > >  	struct Qdisc *q;
> > > +	int b;
> > >  
> > >  	if (!root)
> > >  		return 0;
> > > @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
> > >  			goto done;
> > >  		q_idx++;
> > >  	}
> > > -	list_for_each_entry(q, &root->list, list) {
> > > +	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > >  		if (q_idx < s_q_idx) {
> > >  			q_idx++;
> > >  			continue;
> > > @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
> > >  			       int *t_p, int s_t)
> > >  {
> > >  	struct Qdisc *q;
> > > +	int b;
> > >  
> > >  	if (!root)
> > >  		return 0;
> > > @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
> > >  	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
> > >  		return -1;
> > >  
> > > -	list_for_each_entry(q, &root->list, list) {
> > > +	hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > >  		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
> > >  			return -1;
> > >  	}
> > 
> > 
> > Not sure why you used the rcu version here, but the non rcu version in
> > tc_dump_qdisc_root()
> 
> Good catch.
> 
> Actually even the current code is odd in this regard -- 
> qdisc_match_from_root() uses RCU iterator,

Because it can be run from qdisc enqueue() dequeue(), not holding RTNL.

>  while tc_dump_*() use the 
> non-RCU one; addition and deletion is performed using RCU primitives.

It really is protected by RTNL (qdiscs can not change during the dump)

> 
> I haven't got my head around this yet; if it's correct at all, it'd at 
> least deserve a comment somewhere.
> 
> I'll respin v2 of the patch (there is also a conflict on HASH_SIZE 
> definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include 
> in netdevice.h that needs to be resolved as well) that'd make RCU usage 
> consistent.
> 
> Any other objections/comments? I was namely curious about any opinions 
> regarding the hashtable size.

Well, this is the tricky part, but rhashtable would mean way more
changes...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
  2016-07-07  9:04             ` [RFC PATCH] net: sched: convert qdisc linked list to hashtable " Jiri Kosina
  2016-07-07 13:51               ` Eric Dumazet
@ 2016-07-07 20:36               ` Jiri Kosina
  2016-07-08  8:50                 ` Eric Dumazet
                                   ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-07 20:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

From: Jiri Kosina <jkosina@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to make
sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v1 -> v2: 	fix up RCU hastable usage wrt. rtnl
	   	fix compilation of .c files which define their own 
		 HASH_SIZE that now oncflicts with the one from 
		 hashtable.h (newly included via netdevice.h)

 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  1 +
 net/ipv6/ip6_gre.c        |  8 ++++----
 net/ipv6/ip6_tunnel.c     |  6 +++---
 net/ipv6/ip6_vti.c        |  6 +++---
 net/ipv6/sit.c            | 10 +++++-----
 net/sched/sch_api.c       | 23 +++++++++++++----------
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 12 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..630838e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
+	DECLARE_HASHTABLE	(qdisc_hash, 16);
 	unsigned long		tx_queue_len;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
-	struct list_head	list;
+	struct hlist_node       hash;
 	u32			handle;
 	u32			parent;
 	int			(*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+	hash_init(dev->qdisc_hash);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..0f70ecc 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 #define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-	struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+	struct ip6_tnl __rcu *tunnels[4][__HASH_SIZE];
 
 	struct net_device *fb_tunnel_dev;
 };
@@ -96,7 +96,7 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
    will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(__HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
 	u32 hash = ipv6_addr_hash(addr);
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)
 
 	for (prio = 0; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < __HASH_SIZE; h++) {
 			struct ip6_tnl *t;
 
 			t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..a9da620 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -65,7 +65,7 @@ MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");
 
 #define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
 
 static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
 	/* the IPv6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[__HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
 		if (dev->rtnl_link_ops == &ip6_link_ops)
 			unregister_netdevice_queue(dev, &list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < __HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..2d192af 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -51,7 +51,7 @@
 #include <net/netns/generic.h>
 
 #define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
 
 static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
@@ -69,7 +69,7 @@ struct vti6_net {
 	/* the vti6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[__HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
 	struct ip6_tnl *t;
 	LIST_HEAD(list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < __HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..9f776d7 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
    For comments look at net/ipv4/ip_gre.c --ANK
  */
 
-#define HASH_SIZE  16
+#define __HASH_SIZE  16
 #define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
 
 static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;
 
 static int sit_net_id __read_mostly;
 struct sit_net {
-	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r_l[__HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r[__HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_l[__HASH_SIZE];
 	struct ip_tunnel __rcu *tunnels_wc[1];
 	struct ip_tunnel __rcu **tunnels[4];
 
@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,
 
 	for (prio = 1; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < __HASH_SIZE; h++) {
 			struct ip_tunnel *t;
 
 			t = rtnl_dereference(sitn->tunnels[prio][h]);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include <linux/hrtimer.h>
 #include <linux/lockdep.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry_rcu(q, &root->list, list) {
+	hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
 		if (q->handle == handle)
 			return q;
 	}
 	return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
 		ASSERT_RTNL();
-		list_add_tail_rcu(&q->list, &root->list);
+		hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		ASSERT_RTNL();
-		list_del_rcu(&q->list);
+		hash_del_rcu(&q->hash);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
-		qdisc_list_add(sch);
+		qdisc_hash_add(sch);
 
 		return sch;
 	}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
 			continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 			       int *t_p, int s_t)
 {
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..7efc923 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,
 	.ops		=	&noop_qdisc_ops,
-	.list		=	LIST_HEAD_INIT(noop_qdisc.list),
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
-	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
 
 	spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 		return;
 
 #ifdef CONFIG_NET_SCHED
-	qdisc_list_del(qdisc);
+	qdisc_hash_del(qdisc);
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
@@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
 			qdisc->ops->attach(qdisc);
 		}
 	}
+	if (dev->qdisc)
+		qdisc_hash_add(dev->qdisc);
 }
 
 static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
 			qdisc_destroy(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 #endif
 
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		if (old)
 			qdisc_destroy(old);
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 	}
 	kfree(priv->qdiscs);
 	priv->qdiscs = NULL;
-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
  2016-07-07 20:36               ` [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable Jiri Kosina
@ 2016-07-08  8:50                 ` Eric Dumazet
  2016-07-08  9:02                   ` Jiri Kosina
  2016-07-08 11:07                 ` Thomas Graf
  2016-07-11 14:02                 ` [RFC PATCH v3] " Jiri Kosina
  2 siblings, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2016-07-08  8:50 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Thu, 2016-07-07 at 22:36 +0200, Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> 
> Convert the per-device linked list into a hashtable. The primary 
> motivation for this change is that currently, we're not tracking all the 
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
> performed over the linked list by qdisc_match_from_root() is rather 
> expensive.
> 
> The ultimate goal is to get rid of hidden qdiscs completely, which will 
> bring much more determinism in user experience.
> 
> As we're adding hashtable.h include into generic netdevice.h, we have to make
> sure HASH_SIZE macro is now non-conflicting with local definitions.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---


> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index fdc9de2..0f70ecc 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
>  MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
>  
>  #define HASH_SIZE_SHIFT  5
> -#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
> +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)

__ prefix is mostly used for functions having some kind of
shells/helpers.

I would rather use IP6_GRE_HASH_SIZE or something which has lower
chances of being used elsewhere.

Or maybe you could use new HASH_SIZE(name), providing proper 'name'

@@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
>  			qdisc->ops->attach(qdisc);
>  		}
>  	}
> +	if (dev->qdisc)
> +		qdisc_hash_add(dev->qdisc);
>  }
>  

I do not understand this addition, could you comment on it ?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
  2016-07-08  8:50                 ` Eric Dumazet
@ 2016-07-08  9:02                   ` Jiri Kosina
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-08  9:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Fri, 8 Jul 2016, Eric Dumazet wrote:

> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index fdc9de2..0f70ecc 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
> >  MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
> >  
> >  #define HASH_SIZE_SHIFT  5
> > -#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
> > +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
> 
> __ prefix is mostly used for functions having some kind of
> shells/helpers.
> 
> I would rather use IP6_GRE_HASH_SIZE or something which has lower
> chances of being used elsewhere.

Alright, makes sense, will do this in v3.

> @@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
> >  			qdisc->ops->attach(qdisc);
> >  		}
> >  	}
> > +	if (dev->qdisc)
> > +		qdisc_hash_add(dev->qdisc);
> >  }
> >  
> 
> I do not understand this addition, could you comment on it ?

With linked lists, assigning to struct net_device's Qdisc pointer is 
enough to "initialize" the linked list and have it contain one (root) 
item. With hashtable, this is not the case, it needs to be explicitly 
added.

Hmm, dev_init_scheduler() (and perhaps also dev_shutdown()) would possibly 
need similar treatment in order to have accurate data there 100% of the 
time even during initialization.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
  2016-07-07 20:36               ` [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable Jiri Kosina
  2016-07-08  8:50                 ` Eric Dumazet
@ 2016-07-08 11:07                 ` Thomas Graf
  2016-07-08 13:52                   ` Eric Dumazet
  2016-07-11 14:02                 ` [RFC PATCH v3] " Jiri Kosina
  2 siblings, 1 reply; 43+ messages in thread
From: Thomas Graf @ 2016-07-08 11:07 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On 07/07/16 at 10:36pm, Jiri Kosina wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f45929c..630838e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -52,6 +52,7 @@
>  #include <uapi/linux/netdevice.h>
>  #include <uapi/linux/if_bonding.h>
>  #include <uapi/linux/pkt_cls.h>
> +#include <linux/hashtable.h>
>  
>  struct netpoll_info;
>  struct device;
> @@ -1778,6 +1779,7 @@ struct net_device {
>  	unsigned int		num_tx_queues;
>  	unsigned int		real_num_tx_queues;
>  	struct Qdisc		*qdisc;
> +	DECLARE_HASHTABLE	(qdisc_hash, 16);

This blows up net_device to an insane size: 64K * sizeof(struct
hlist_head). Can we allocate this on demand for net_devices where
it is actually needed? The majority of virtual devices won't need
this. Doesn't have to be rhashtable, can still be fixed size but
at least allocate it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
  2016-07-08 11:07                 ` Thomas Graf
@ 2016-07-08 13:52                   ` Eric Dumazet
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Dumazet @ 2016-07-08 13:52 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Kosina, Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

On Fri, 2016-07-08 at 13:07 +0200, Thomas Graf wrote:
> On 07/07/16 at 10:36pm, Jiri Kosina wrote:
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index f45929c..630838e 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -52,6 +52,7 @@
> >  #include <uapi/linux/netdevice.h>
> >  #include <uapi/linux/if_bonding.h>
> >  #include <uapi/linux/pkt_cls.h>
> > +#include <linux/hashtable.h>
> >  
> >  struct netpoll_info;
> >  struct device;
> > @@ -1778,6 +1779,7 @@ struct net_device {
> >  	unsigned int		num_tx_queues;
> >  	unsigned int		real_num_tx_queues;
> >  	struct Qdisc		*qdisc;
> > +	DECLARE_HASHTABLE	(qdisc_hash, 16);
> 
> This blows up net_device to an insane size: 64K * sizeof(struct
> hlist_head). Can we allocate this on demand for net_devices where
> it is actually needed? The majority of virtual devices won't need
> this. Doesn't have to be rhashtable, can still be fixed size but
> at least allocate it.


Jiri probably misread the API and should have used :

DECLARE_HASHTABLE	(qdisc_hash, 4);

Google has a very similar patch with 16 buckets, and it is 'good enough',
although we do not hit the qdisc_tree_reduce_backlog() penalty.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable
  2016-07-07 20:36               ` [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable Jiri Kosina
  2016-07-08  8:50                 ` Eric Dumazet
  2016-07-08 11:07                 ` Thomas Graf
@ 2016-07-11 14:02                 ` Jiri Kosina
  2016-07-12 17:36                   ` Cong Wang
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
  2 siblings, 2 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-11 14:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel

From: Jiri Kosina <jkosina@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v1 -> v2:     fix up RCU hastable usage wrt. rtnl
              fix compilation of .c files which define their own
              HASH_SIZE that now oncflicts with the one from
              hashtable.h (newly included via netdevice.h)

v2 -> v3:     resolve HASH_SIZE identifier conflicts in a cleaner way
	      fix up the number of hash bucket bits (4 bits for 16 buckets)

 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  1 +
 net/ipv6/ip6_gre.c        | 12 ++++++------
 net/ipv6/ip6_tunnel.c     | 10 +++++-----
 net/ipv6/ip6_vti.c        | 10 +++++-----
 net/ipv6/sit.c            | 10 +++++-----
 net/sched/sch_api.c       | 23 +++++++++++++----------
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 12 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..0b5c172e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
+	DECLARE_HASHTABLE	(qdisc_hash, 4);
 	unsigned long		tx_queue_len;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
-	struct list_head	list;
+	struct hlist_node       hash;
 	u32			handle;
 	u32			parent;
 	int			(*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+	hash_init(dev->qdisc_hash);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-	struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+	struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
 	struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
    will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
 	u32 hash = ipv6_addr_hash(addr);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT);
 }
 
 #define tunnels_r_l	tunnels[3]
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)
 
 	for (prio = 0; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_GRE_HASH_SIZE; h++) {
 			struct ip6_tnl *t;
 
 			t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..2050217 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -64,8 +64,8 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_TUNNEL_HASH_SIZE_SHIFT  5
+#define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)
 
 static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
@@ -75,7 +75,7 @@ static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_TUNNEL_HASH_SIZE_SHIFT);
 }
 
 static int ip6_tnl_dev_init(struct net_device *dev);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
 	/* the IPv6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_TUNNEL_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
 		if (dev->rtnl_link_ops == &ip6_link_ops)
 			unregister_netdevice_queue(dev, &list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_TUNNEL_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..cc7e058 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -50,14 +50,14 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_VTI_HASH_SIZE_SHIFT  5
+#define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)
 
 static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
 }
 
 static int vti6_dev_init(struct net_device *dev);
@@ -69,7 +69,7 @@ struct vti6_net {
 	/* the vti6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_VTI_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
 	struct ip6_tnl *t;
 	LIST_HEAD(list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_VTI_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..94dd0f0 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
    For comments look at net/ipv4/ip_gre.c --ANK
  */
 
-#define HASH_SIZE  16
+#define IP6_SIT_HASH_SIZE  16
 #define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
 
 static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;
 
 static int sit_net_id __read_mostly;
 struct sit_net {
-	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r_l[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_l[IP6_SIT_HASH_SIZE];
 	struct ip_tunnel __rcu *tunnels_wc[1];
 	struct ip_tunnel __rcu **tunnels[4];
 
@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,
 
 	for (prio = 1; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_SIT_HASH_SIZE; h++) {
 			struct ip_tunnel *t;
 
 			t = rtnl_dereference(sitn->tunnels[prio][h]);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include <linux/hrtimer.h>
 #include <linux/lockdep.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry_rcu(q, &root->list, list) {
+	hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
 		if (q->handle == handle)
 			return q;
 	}
 	return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
 		ASSERT_RTNL();
-		list_add_tail_rcu(&q->list, &root->list);
+		hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		ASSERT_RTNL();
-		list_del_rcu(&q->list);
+		hash_del_rcu(&q->hash);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
-		qdisc_list_add(sch);
+		qdisc_hash_add(sch);
 
 		return sch;
 	}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
 			continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 			       int *t_p, int s_t)
 {
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..7efc923 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,
 	.ops		=	&noop_qdisc_ops,
-	.list		=	LIST_HEAD_INIT(noop_qdisc.list),
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
-	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
 
 	spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 		return;
 
 #ifdef CONFIG_NET_SCHED
-	qdisc_list_del(qdisc);
+	qdisc_hash_del(qdisc);
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
@@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
 			qdisc->ops->attach(qdisc);
 		}
 	}
+	if (dev->qdisc)
+		qdisc_hash_add(dev->qdisc);
 }
 
 static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
 			qdisc_destroy(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 #endif
 
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		if (old)
 			qdisc_destroy(old);
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 	}
 	kfree(priv->qdiscs);
 	priv->qdiscs = NULL;

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable
  2016-07-11 14:02                 ` [RFC PATCH v3] " Jiri Kosina
@ 2016-07-12 17:36                   ` Cong Wang
  2016-07-13 13:47                     ` Jiri Kosina
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
  1 sibling, 1 reply; 43+ messages in thread
From: Cong Wang @ 2016-07-12 17:36 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Eric Dumazet, Jamal Hadi Salim, Phil Sutter,
	Linux Kernel Network Developers, LKML

On Mon, Jul 11, 2016 at 7:02 AM, Jiri Kosina <jikos@kernel.org> wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f45929c..0b5c172e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -52,6 +52,7 @@
>  #include <uapi/linux/netdevice.h>
>  #include <uapi/linux/if_bonding.h>
>  #include <uapi/linux/pkt_cls.h>
> +#include <linux/hashtable.h>
>
>  struct netpoll_info;
>  struct device;
> @@ -1778,6 +1779,7 @@ struct net_device {
>         unsigned int            num_tx_queues;
>         unsigned int            real_num_tx_queues;
>         struct Qdisc            *qdisc;
> +       DECLARE_HASHTABLE       (qdisc_hash, 4);
>         unsigned long           tx_queue_len;
>         spinlock_t              tx_global_lock;
>         int                     watchdog_timeo;

Should it be surrounded by CONFIG_NET_SCHED?
To save several bytes for !CONFIG_NET_SCHED case.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable
  2016-07-12 17:36                   ` Cong Wang
@ 2016-07-13 13:47                     ` Jiri Kosina
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-13 13:47 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, Jamal Hadi Salim, Phil Sutter,
	Linux Kernel Network Developers, LKML

On Tue, 12 Jul 2016, Cong Wang wrote:

> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index f45929c..0b5c172e 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -52,6 +52,7 @@
> >  #include <uapi/linux/netdevice.h>
> >  #include <uapi/linux/if_bonding.h>
> >  #include <uapi/linux/pkt_cls.h>
> > +#include <linux/hashtable.h>
> >
> >  struct netpoll_info;
> >  struct device;
> > @@ -1778,6 +1779,7 @@ struct net_device {
> >         unsigned int            num_tx_queues;
> >         unsigned int            real_num_tx_queues;
> >         struct Qdisc            *qdisc;
> > +       DECLARE_HASHTABLE       (qdisc_hash, 4);
> >         unsigned long           tx_queue_len;
> >         spinlock_t              tx_global_lock;
> >         int                     watchdog_timeo;
> 
> Should it be surrounded by CONFIG_NET_SCHED?
> To save several bytes for !CONFIG_NET_SCHED case.

Makes sense. I'll wait a bit for more feedback (if there is any) before 
including this in potential v4.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-11 14:02                 ` [RFC PATCH v3] " Jiri Kosina
  2016-07-12 17:36                   ` Cong Wang
@ 2016-07-28  9:56                   ` Jiri Kosina
  2016-07-28 11:10                     ` kbuild test robot
                                       ` (4 more replies)
  1 sibling, 5 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-28  9:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel, Cong Wang

From: Jiri Kosina <jkosina@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
v1 -> v2:     fix up RCU hastable usage wrt. rtnl
              fix compilation of .c files which define their own
              HASH_SIZE that now oncflicts with the one from
              hashtable.h (newly included via netdevice.h)

v2 -> v3:     resolve HASH_SIZE identifier conflicts in a cleaner way
              fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4:     put the hastable into struct netdevice only if 
              CONFIG_NET_SCHED has been enabled

 include/linux/netdevice.h |  4 ++++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  3 +++
 net/ipv6/ip6_gre.c        | 12 ++++++------
 net/ipv6/ip6_tunnel.c     | 10 +++++-----
 net/ipv6/ip6_vti.c        | 10 +++++-----
 net/ipv6/sit.c            | 10 +++++-----
 net/sched/sch_api.c       | 23 +++++++++++++----------
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 12 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
+#ifdef CONFIG_NET_SCHED
+	DECLARE_HASHTABLE	(qdisc_hash, 4);
+#endif
 	unsigned long		tx_queue_len;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
-	struct list_head	list;
+	struct hlist_node       hash;
 	u32			handle;
 	u32			parent;
 	int			(*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+#ifdef CONFIG_NET_SCHED
+	hash_init(dev->qdisc_hash);
+#endif
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-	struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+	struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
 	struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
    will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
 	u32 hash = ipv6_addr_hash(addr);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT);
 }
 
 #define tunnels_r_l	tunnels[3]
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)
 
 	for (prio = 0; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_GRE_HASH_SIZE; h++) {
 			struct ip6_tnl *t;
 
 			t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..2050217 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -64,8 +64,8 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_TUNNEL_HASH_SIZE_SHIFT  5
+#define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)
 
 static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
@@ -75,7 +75,7 @@ static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_TUNNEL_HASH_SIZE_SHIFT);
 }
 
 static int ip6_tnl_dev_init(struct net_device *dev);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
 	/* the IPv6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_TUNNEL_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
 		if (dev->rtnl_link_ops == &ip6_link_ops)
 			unregister_netdevice_queue(dev, &list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_TUNNEL_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..cc7e058 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -50,14 +50,14 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_VTI_HASH_SIZE_SHIFT  5
+#define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)
 
 static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
 }
 
 static int vti6_dev_init(struct net_device *dev);
@@ -69,7 +69,7 @@ struct vti6_net {
 	/* the vti6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_VTI_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
 	struct ip6_tnl *t;
 	LIST_HEAD(list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_VTI_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..94dd0f0 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
    For comments look at net/ipv4/ip_gre.c --ANK
  */
 
-#define HASH_SIZE  16
+#define IP6_SIT_HASH_SIZE  16
 #define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
 
 static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;
 
 static int sit_net_id __read_mostly;
 struct sit_net {
-	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r_l[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_l[IP6_SIT_HASH_SIZE];
 	struct ip_tunnel __rcu *tunnels_wc[1];
 	struct ip_tunnel __rcu **tunnels[4];
 
@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,
 
 	for (prio = 1; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_SIT_HASH_SIZE; h++) {
 			struct ip_tunnel *t;
 
 			t = rtnl_dereference(sitn->tunnels[prio][h]);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include <linux/hrtimer.h>
 #include <linux/lockdep.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry_rcu(q, &root->list, list) {
+	hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
 		if (q->handle == handle)
 			return q;
 	}
 	return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
 		ASSERT_RTNL();
-		list_add_tail_rcu(&q->list, &root->list);
+		hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		ASSERT_RTNL();
-		list_del_rcu(&q->list);
+		hash_del_rcu(&q->hash);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
-		qdisc_list_add(sch);
+		qdisc_hash_add(sch);
 
 		return sch;
 	}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
 			continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 			       int *t_p, int s_t)
 {
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..7efc923 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,
 	.ops		=	&noop_qdisc_ops,
-	.list		=	LIST_HEAD_INIT(noop_qdisc.list),
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
-	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
 
 	spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 		return;
 
 #ifdef CONFIG_NET_SCHED
-	qdisc_list_del(qdisc);
+	qdisc_hash_del(qdisc);
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
@@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
 			qdisc->ops->attach(qdisc);
 		}
 	}
+	if (dev->qdisc)
+		qdisc_hash_add(dev->qdisc);
 }
 
 static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
 			qdisc_destroy(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 #endif
 
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		if (old)
 			qdisc_destroy(old);
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 	}
 	kfree(priv->qdiscs);
 	priv->qdiscs = NULL;

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
@ 2016-07-28 11:10                     ` kbuild test robot
  2016-07-28 11:18                       ` Jiri Kosina
  2016-07-28 11:22                     ` kbuild test robot
                                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: kbuild test robot @ 2016-07-28 11:10 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

[-- Attachment #1: Type: text/plain, Size: 874 bytes --]

Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: i386-randconfig-s0-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> (.text+0x37ccb): undefined reference to `qdisc_hash_add'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23936 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28 11:10                     ` kbuild test robot
@ 2016-07-28 11:18                       ` Jiri Kosina
  2016-07-28 12:53                         ` Fengguang Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Kosina @ 2016-07-28 11:18 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

On Thu, 28 Jul 2016, kbuild test robot wrote:

> [auto build test ERROR on v4.7-rc7]
> [also build test ERROR on next-20160728]
> [cannot apply to net/master net-next/master ipsec-next/master]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
> config: i386-randconfig-s0-201630 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>    net/built-in.o: In function `dev_activate':
> >> (.text+0x37ccb): undefined reference to `qdisc_hash_add'

Dear 0-day team,

could you please check my question regarding this very build failure here?

	lkml.kernel.org/r/alpine.LNX.2.00.1607141612560.24757@cbobk.fhfr.pm

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
  2016-07-28 11:10                     ` kbuild test robot
@ 2016-07-28 11:22                     ` kbuild test robot
  2016-07-28 11:52                     ` kbuild test robot
                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 43+ messages in thread
From: kbuild test robot @ 2016-07-28 11:22 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

[-- Attachment #1: Type: text/plain, Size: 1341 bytes --]

Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: sh-edosk7760_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> net/sched/sch_generic.c:733: undefined reference to `qdisc_hash_add'

vim +733 net/sched/sch_generic.c

   727			qdisc = qdisc_create_dflt(txq, &mq_qdisc_ops, TC_H_ROOT);
   728			if (qdisc) {
   729				dev->qdisc = qdisc;
   730				qdisc->ops->attach(qdisc);
   731			}
   732		}
 > 733		if (dev->qdisc)
   734			qdisc_hash_add(dev->qdisc);
   735	}
   736	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 12180 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
  2016-07-28 11:10                     ` kbuild test robot
  2016-07-28 11:22                     ` kbuild test robot
@ 2016-07-28 11:52                     ` kbuild test robot
  2016-07-28 16:52                     ` Cong Wang
  2016-07-29  7:49                     ` [PATCH v5] " Jiri Kosina
  4 siblings, 0 replies; 43+ messages in thread
From: kbuild test robot @ 2016-07-28 11:52 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

[-- Attachment #1: Type: text/plain, Size: 1053 bytes --]

Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: arm-sunxi_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> dns_query.c:(.text+0x39adc): undefined reference to `qdisc_hash_add'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 19385 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28 11:18                       ` Jiri Kosina
@ 2016-07-28 12:53                         ` Fengguang Wu
  2016-07-28 16:53                           ` Cong Wang
  2016-07-31 11:21                           ` Fengguang Wu
  0 siblings, 2 replies; 43+ messages in thread
From: Fengguang Wu @ 2016-07-28 12:53 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:
>On Thu, 28 Jul 2016, kbuild test robot wrote:
>
>> [auto build test ERROR on v4.7-rc7]
>> [also build test ERROR on next-20160728]
>> [cannot apply to net/master net-next/master ipsec-next/master]
>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>>
>> url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
>> config: i386-randconfig-s0-201630 (attached as .config)
>> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
>> reproduce:
>>         # save the attached .config to linux build tree
>>         make ARCH=i386
>>
>> All errors (new ones prefixed by >>):
>>
>>    net/built-in.o: In function `dev_activate':
>> >> (.text+0x37ccb): undefined reference to `qdisc_hash_add'
>
>Dear 0-day team,
>
>could you please check my question regarding this very build failure here?
>
>	lkml.kernel.org/r/alpine.LNX.2.00.1607141612560.24757@cbobk.fhfr.pm

Sorry I missed that. For your convenience, here is the answer to the
original email:

>This issue is be there even without my patch (but with qdisc_list_add
>instead), isn't it?

Yes it looks so, this number happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'

>The problem is that sch_generic.c (where dev_activate() is) is being
>compiled everytime CONFIG_NET is set, but sch_api.c (where
>qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
>is no stub for !CONFIG_NET_SCHED case).

So it looks like a more general problem than specific to this patch.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
                                       ` (2 preceding siblings ...)
  2016-07-28 11:52                     ` kbuild test robot
@ 2016-07-28 16:52                     ` Cong Wang
  2016-07-29  7:49                     ` [PATCH v5] " Jiri Kosina
  4 siblings, 0 replies; 43+ messages in thread
From: Cong Wang @ 2016-07-28 16:52 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Eric Dumazet, Jamal Hadi Salim, Phil Sutter,
	Linux Kernel Network Developers, LKML

On Thu, Jul 28, 2016 at 2:56 AM, Jiri Kosina <jikos@kernel.org> wrote:
> From: Jiri Kosina <jkosina@suse.cz>
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed over the linked list by qdisc_match_from_root() is rather
> expensive.
>
> The ultimate goal is to get rid of hidden qdiscs completely, which will
> bring much more determinism in user experience.
>
> As we're adding hashtable.h include into generic netdevice.h, we have to
> make sure HASH_SIZE macro is now non-conflicting with local definitions.
>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> v1 -> v2:     fix up RCU hastable usage wrt. rtnl
>               fix compilation of .c files which define their own
>               HASH_SIZE that now oncflicts with the one from
>               hashtable.h (newly included via netdevice.h)
>
> v2 -> v3:     resolve HASH_SIZE identifier conflicts in a cleaner way
>               fix up the number of hash bucket bits (4 bits for 16 buckets)
>
> v3 -> v4:     put the hastable into struct netdevice only if
>               CONFIG_NET_SCHED has been enabled

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28 12:53                         ` Fengguang Wu
@ 2016-07-28 16:53                           ` Cong Wang
  2016-07-31 11:21                           ` Fengguang Wu
  1 sibling, 0 replies; 43+ messages in thread
From: Cong Wang @ 2016-07-28 16:53 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Jiri Kosina, kbuild-all, Eric Dumazet, Jamal Hadi Salim,
	Phil Sutter, Linux Kernel Network Developers, LKML

On Thu, Jul 28, 2016 at 5:53 AM, Fengguang Wu <lkp@intel.com> wrote:
> On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:
>> This issue is be there even without my patch (but with qdisc_list_add
>> instead), isn't it?
>
>
> Yes it looks so, this number happens in a number of places:
>
> dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
> include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
> net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
> net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
> sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
> switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
> sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
> (.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
> (.text+0x37d0b): undefined reference to `qdisc_hash_add'
> wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'
>
>> The problem is that sch_generic.c (where dev_activate() is) is being
>> compiled everytime CONFIG_NET is set, but sch_api.c (where
>> qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
>> is no stub for !CONFIG_NET_SCHED case).
>
>
> So it looks like a more general problem than specific to this patch.

Agreed. I can send a patch if Jiri doesn't. ;)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v5] net: sched: convert qdisc linked list to hashtable
  2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
                                       ` (3 preceding siblings ...)
  2016-07-28 16:52                     ` Cong Wang
@ 2016-07-29  7:49                     ` Jiri Kosina
  2016-07-29 20:01                       ` kbuild test robot
                                         ` (2 more replies)
  4 siblings, 3 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-07-29  7:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel, Cong Wang

From: Jiri Kosina <jkosina@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v1 -> v2:     fix up RCU hastable usage wrt. rtnl
              fix compilation of .c files which define their own
              HASH_SIZE that now oncflicts with the one from
              hashtable.h (newly included via netdevice.h)

v2 -> v3:     resolve HASH_SIZE identifier conflicts in a cleaner way
              fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4:     put the hastable into struct netdevice only if 
              CONFIG_NET_SCHED has been enabled

v4 -> v5:     fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
	      add Cong Wang's reviewed-by

 include/linux/netdevice.h |  4 ++++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  3 +++
 net/ipv6/ip6_gre.c        | 12 ++++++------
 net/ipv6/ip6_tunnel.c     | 10 +++++-----
 net/ipv6/ip6_vti.c        | 10 +++++-----
 net/ipv6/sit.c            | 10 +++++-----
 net/sched/sch_api.c       | 23 +++++++++++++----------
 net/sched/sch_generic.c   |  8 +++++---
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 12 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
+#ifdef CONFIG_NET_SCHED
+	DECLARE_HASHTABLE	(qdisc_hash, 4);
+#endif
 	unsigned long		tx_queue_len;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
-	struct list_head	list;
+	struct hlist_node       hash;
 	u32			handle;
 	u32			parent;
 	int			(*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+#ifdef CONFIG_NET_SCHED
+	hash_init(dev->qdisc_hash);
+#endif
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-	struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+	struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
 	struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
    will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
 	u32 hash = ipv6_addr_hash(addr);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT);
 }
 
 #define tunnels_r_l	tunnels[3]
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)
 
 	for (prio = 0; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_GRE_HASH_SIZE; h++) {
 			struct ip6_tnl *t;
 
 			t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..2050217 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -64,8 +64,8 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_TUNNEL_HASH_SIZE_SHIFT  5
+#define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)
 
 static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
@@ -75,7 +75,7 @@ static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_TUNNEL_HASH_SIZE_SHIFT);
 }
 
 static int ip6_tnl_dev_init(struct net_device *dev);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
 	/* the IPv6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_TUNNEL_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
 		if (dev->rtnl_link_ops == &ip6_link_ops)
 			unregister_netdevice_queue(dev, &list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_TUNNEL_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..cc7e058 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -50,14 +50,14 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_VTI_HASH_SIZE_SHIFT  5
+#define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)
 
 static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
 }
 
 static int vti6_dev_init(struct net_device *dev);
@@ -69,7 +69,7 @@ struct vti6_net {
 	/* the vti6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_VTI_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
 	struct ip6_tnl *t;
 	LIST_HEAD(list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_VTI_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..94dd0f0 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
    For comments look at net/ipv4/ip_gre.c --ANK
  */
 
-#define HASH_SIZE  16
+#define IP6_SIT_HASH_SIZE  16
 #define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
 
 static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;
 
 static int sit_net_id __read_mostly;
 struct sit_net {
-	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r_l[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_l[IP6_SIT_HASH_SIZE];
 	struct ip_tunnel __rcu *tunnels_wc[1];
 	struct ip_tunnel __rcu **tunnels[4];
 
@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,
 
 	for (prio = 1; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_SIT_HASH_SIZE; h++) {
 			struct ip_tunnel *t;
 
 			t = rtnl_dereference(sitn->tunnels[prio][h]);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include <linux/hrtimer.h>
 #include <linux/lockdep.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry_rcu(q, &root->list, list) {
+	hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
 		if (q->handle == handle)
 			return q;
 	}
 	return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
 		ASSERT_RTNL();
-		list_add_tail_rcu(&q->list, &root->list);
+		hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		ASSERT_RTNL();
-		list_del_rcu(&q->list);
+		hash_del_rcu(&q->hash);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
-		qdisc_list_add(sch);
+		qdisc_hash_add(sch);
 
 		return sch;
 	}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
 			continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 			       int *t_p, int s_t)
 {
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..94d5999 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,
 	.ops		=	&noop_qdisc_ops,
-	.list		=	LIST_HEAD_INIT(noop_qdisc.list),
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
-	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
 
 	spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 		return;
 
 #ifdef CONFIG_NET_SCHED
-	qdisc_list_del(qdisc);
+	qdisc_hash_del(qdisc);
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
@@ -732,6 +730,10 @@ static void attach_default_qdiscs(struct net_device *dev)
 			qdisc->ops->attach(qdisc);
 		}
 	}
+#ifdef CONFIG_NET_SCHED
+	if (dev->qdisc)
+		qdisc_hash_add(dev->qdisc);
+#endif
 }
 
 static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
 			qdisc_destroy(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 #endif
 
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		if (old)
 			qdisc_destroy(old);
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 	}
 	kfree(priv->qdiscs);
 	priv->qdiscs = NULL;

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v5] net: sched: convert qdisc linked list to hashtable
  2016-07-29  7:49                     ` [PATCH v5] " Jiri Kosina
@ 2016-07-29 20:01                       ` kbuild test robot
  2016-07-31  0:11                       ` kbuild test robot
  2016-08-01 10:23                       ` [PATCH v6] " Jiri Kosina
  2 siblings, 0 replies; 43+ messages in thread
From: kbuild test robot @ 2016-07-29 20:01 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

[-- Attachment #1: Type: text/plain, Size: 3331 bytes --]

Hi,

[auto build test WARNING on v4.7-rc7]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160729-155412
config: x86_64-randconfig-s1-07292101 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from drivers/net/ethernet/dec/tulip/de4x5.c:480:
   drivers/net/ethernet/dec/tulip/de4x5.h:864:1: warning: "HASH_BITS" redefined
   In file included from include/linux/netdevice.h:55,
                    from drivers/net/ethernet/dec/tulip/de4x5.c:459:
>> include/linux/hashtable.h:27:1: warning: this is the location of the previous definition
   drivers/net/ethernet/dec/tulip/de4x5.o: warning: objtool: dma_free_coherent()+0x4b: function has unreachable instruction

vim +27 include/linux/hashtable.h

d9b482c8 Sasha Levin  2012-10-30  11  #include <linux/kernel.h>
d9b482c8 Sasha Levin  2012-10-30  12  #include <linux/hash.h>
d9b482c8 Sasha Levin  2012-10-30  13  #include <linux/rculist.h>
d9b482c8 Sasha Levin  2012-10-30  14  
d9b482c8 Sasha Levin  2012-10-30  15  #define DEFINE_HASHTABLE(name, bits)						\
d9b482c8 Sasha Levin  2012-10-30  16  	struct hlist_head name[1 << (bits)] =					\
d9b482c8 Sasha Levin  2012-10-30  17  			{ [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }
d9b482c8 Sasha Levin  2012-10-30  18  
6180d9de Eric Dumazet 2015-11-18  19  #define DEFINE_READ_MOSTLY_HASHTABLE(name, bits)				\
6180d9de Eric Dumazet 2015-11-18  20  	struct hlist_head name[1 << (bits)] __read_mostly =			\
6180d9de Eric Dumazet 2015-11-18  21  			{ [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }
6180d9de Eric Dumazet 2015-11-18  22  
d9b482c8 Sasha Levin  2012-10-30  23  #define DECLARE_HASHTABLE(name, bits)                                   	\
d9b482c8 Sasha Levin  2012-10-30  24  	struct hlist_head name[1 << (bits)]
d9b482c8 Sasha Levin  2012-10-30  25  
d9b482c8 Sasha Levin  2012-10-30  26  #define HASH_SIZE(name) (ARRAY_SIZE(name))
d9b482c8 Sasha Levin  2012-10-30 @27  #define HASH_BITS(name) ilog2(HASH_SIZE(name))
d9b482c8 Sasha Levin  2012-10-30  28  
d9b482c8 Sasha Levin  2012-10-30  29  /* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
d9b482c8 Sasha Levin  2012-10-30  30  #define hash_min(val, bits)							\
d9b482c8 Sasha Levin  2012-10-30  31  	(sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits))
d9b482c8 Sasha Levin  2012-10-30  32  
d9b482c8 Sasha Levin  2012-10-30  33  static inline void __hash_init(struct hlist_head *ht, unsigned int sz)
d9b482c8 Sasha Levin  2012-10-30  34  {
d9b482c8 Sasha Levin  2012-10-30  35  	unsigned int i;

:::::: The code at line 27 was first introduced by commit
:::::: d9b482c8ba1973a189f2d4c8175d405b87fbf2d7 hashtable: introduce a small and naive hashtable

:::::: TO: Sasha Levin <levinsasha928@gmail.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 25043 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v5] net: sched: convert qdisc linked list to hashtable
  2016-07-29  7:49                     ` [PATCH v5] " Jiri Kosina
  2016-07-29 20:01                       ` kbuild test robot
@ 2016-07-31  0:11                       ` kbuild test robot
  2016-08-01 10:23                       ` [PATCH v6] " Jiri Kosina
  2 siblings, 0 replies; 43+ messages in thread
From: kbuild test robot @ 2016-07-31  0:11 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

[-- Attachment #1: Type: text/plain, Size: 13361 bytes --]

Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160729]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160729-155412
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/ti/davinci_emac.c:736:57: error: macro "hash_add" requires 3 arguments, but only 2 given
    static int hash_add(struct emac_priv *priv, u8 *mac_addr)
                                                            ^
>> drivers/net/ethernet/ti/davinci_emac.c:737:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    {
    ^
>> drivers/net/ethernet/ti/davinci_emac.c:778:12: error: conflicting types for 'hash_del'
    static int hash_del(struct emac_priv *priv, u8 *mac_addr)
               ^
   In file included from include/linux/netdevice.h:55:0,
                    from drivers/net/ethernet/ti/davinci_emac.c:44:
   include/linux/hashtable.h:104:20: note: previous definition of 'hash_del' was here
    static inline void hash_del(struct hlist_node *node)
                       ^
   drivers/net/ethernet/ti/davinci_emac.c: In function 'emac_add_mcast':
   drivers/net/ethernet/ti/davinci_emac.c:828:35: error: macro "hash_add" requires 3 arguments, but only 2 given
      update = hash_add(priv, mac_addr);
                                      ^
>> drivers/net/ethernet/ti/davinci_emac.c:828:12: error: 'hash_add' undeclared (first use in this function)
      update = hash_add(priv, mac_addr);
               ^
   drivers/net/ethernet/ti/davinci_emac.c:828:12: note: each undeclared identifier is reported only once for each function it appears in

vim +/hash_add +736 drivers/net/ethernet/ti/davinci_emac.c

a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  730   * @priv: The DaVinci EMAC private adapter structure
49ce9c2c drivers/net/ethernet/ti/davinci_emac.c Ben Hutchings 2012-07-10  731   * @mac_addr: mac address to delete from hash table
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  732   *
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  733   * Adds mac address to the internal hash table
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  734   *
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  735   */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18 @736  static int hash_add(struct emac_priv *priv, u8 *mac_addr)
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18 @737  {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  738  	struct device *emac_dev = &priv->ndev->dev;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  739  	u32 rc = 0;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  740  	u32 hash_bit;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  741  	u32 hash_value = hash_get(mac_addr);
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  742  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  743  	if (hash_value >= EMAC_NUM_MULTICAST_BITS) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  744  		if (netif_msg_drv(priv)) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  745  			dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  746  				"Hash %08x, should not be greater than %08x",
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  747  				hash_value, (EMAC_NUM_MULTICAST_BITS - 1));
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  748  		}
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  749  		return -1;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  750  	}
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  751  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  752  	/* set the hash bit only if not previously set */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  753  	if (priv->multicast_hash_cnt[hash_value] == 0) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  754  		rc = 1; /* hash value changed */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  755  		if (hash_value < 32) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  756  			hash_bit = BIT(hash_value);
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  757  			priv->mac_hash1 |= hash_bit;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  758  		} else {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  759  			hash_bit = BIT((hash_value - 32));
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  760  			priv->mac_hash2 |= hash_bit;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  761  		}
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  762  	}
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  763  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  764  	/* incr counter for num of mcast addr's mapped to "this" hash bit */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  765  	++priv->multicast_hash_cnt[hash_value];
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  766  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  767  	return rc;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  768  }
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  769  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  770  /**
49ce9c2c drivers/net/ethernet/ti/davinci_emac.c Ben Hutchings 2012-07-10  771   * hash_del - Hash function to delete mac addr from hash table
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  772   * @priv: The DaVinci EMAC private adapter structure
49ce9c2c drivers/net/ethernet/ti/davinci_emac.c Ben Hutchings 2012-07-10  773   * @mac_addr: mac address to delete from hash table
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  774   *
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  775   * Removes mac address from the internal hash table
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  776   *
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  777   */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18 @778  static int hash_del(struct emac_priv *priv, u8 *mac_addr)
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  779  {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  780  	u32 hash_value;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  781  	u32 hash_bit;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  782  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  783  	hash_value = hash_get(mac_addr);
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  784  	if (priv->multicast_hash_cnt[hash_value] > 0) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  785  		/* dec cntr for num of mcast addr's mapped to this hash bit */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  786  		--priv->multicast_hash_cnt[hash_value];
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  787  	}
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  788  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  789  	/* if counter still > 0, at least one multicast address refers
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  790  	 * to this hash bit. so return 0 */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  791  	if (priv->multicast_hash_cnt[hash_value] > 0)
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  792  		return 0;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  793  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  794  	if (hash_value < 32) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  795  		hash_bit = BIT(hash_value);
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  796  		priv->mac_hash1 &= ~hash_bit;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  797  	} else {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  798  		hash_bit = BIT((hash_value - 32));
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  799  		priv->mac_hash2 &= ~hash_bit;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  800  	}
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  801  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  802  	/* return 1 to indicate change in mac_hash registers reqd */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  803  	return 1;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  804  }
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  805  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  806  /* EMAC multicast operation */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  807  #define EMAC_MULTICAST_ADD	0
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  808  #define EMAC_MULTICAST_DEL	1
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  809  #define EMAC_ALL_MULTI_SET	2
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  810  #define EMAC_ALL_MULTI_CLR	3
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  811  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  812  /**
49ce9c2c drivers/net/ethernet/ti/davinci_emac.c Ben Hutchings 2012-07-10  813   * emac_add_mcast - Set multicast address in the EMAC adapter (Internal)
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  814   * @priv: The DaVinci EMAC private adapter structure
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  815   * @action: multicast operation to perform
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  816   * mac_addr: mac address to set
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  817   *
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  818   * Set multicast addresses in EMAC adapter - internal function
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  819   *
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  820   */
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  821  static void emac_add_mcast(struct emac_priv *priv, u32 action, u8 *mac_addr)
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  822  {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  823  	struct device *emac_dev = &priv->ndev->dev;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  824  	int update = -1;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  825  
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  826  	switch (action) {
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  827  	case EMAC_MULTICAST_ADD:
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18 @828  		update = hash_add(priv, mac_addr);
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  829  		break;
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  830  	case EMAC_MULTICAST_DEL:
a6286ee6 drivers/net/davinci_emac.c             Anant Gole    2009-05-18  831  		update = hash_del(priv, mac_addr);

:::::: The code at line 736 was first introduced by commit
:::::: a6286ee630f6d95f8466e19d7f1ae38d677028ae net: Add TI DaVinci EMAC driver

:::::: TO: Anant Gole <anantgole@ti.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 57608 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-28 12:53                         ` Fengguang Wu
  2016-07-28 16:53                           ` Cong Wang
@ 2016-07-31 11:21                           ` Fengguang Wu
  2016-08-01 10:12                             ` Jiri Kosina
  1 sibling, 1 reply; 43+ messages in thread
From: Fengguang Wu @ 2016-07-31 11:21 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

On Thu, Jul 28, 2016 at 08:53:12PM +0800, Fengguang Wu wrote:
>On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:
>>On Thu, 28 Jul 2016, kbuild test robot wrote:
>>
>>> [auto build test ERROR on v4.7-rc7]
>>> [also build test ERROR on next-20160728]
>>> [cannot apply to net/master net-next/master ipsec-next/master]
>>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>>>
>>> url:    https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
>>> config: i386-randconfig-s0-201630 (attached as .config)
>>> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
>>> reproduce:
>>>         # save the attached .config to linux build tree
>>>         make ARCH=i386
>>>
>>> All errors (new ones prefixed by >>):
>>>
>>>    net/built-in.o: In function `dev_activate':
>>> >> (.text+0x37ccb): undefined reference to `qdisc_hash_add'
>>
>>Dear 0-day team,
>>
>>could you please check my question regarding this very build failure here?
>>
>>	lkml.kernel.org/r/alpine.LNX.2.00.1607141612560.24757@cbobk.fhfr.pm
>
>Sorry I missed that. For your convenience, here is the answer to the
>original email:
>
>>This issue is be there even without my patch (but with qdisc_list_add
>>instead), isn't it?
>
>Yes it looks so, this number happens in a number of places:
>
>dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
>include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
>net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
>net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
>sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
>switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
>sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
>(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
>(.text+0x37d0b): undefined reference to `qdisc_hash_add'
>wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'

Jiri, I just double checked and find no similar errors related to
qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock
in dccp_v4_ctl_send_reset") builds fine without error.

Thanks,
Fengguang

>>The problem is that sch_generic.c (where dev_activate() is) is being
>>compiled everytime CONFIG_NET is set, but sch_api.c (where
>>qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
>>is no stub for !CONFIG_NET_SCHED case).
>
>So it looks like a more general problem than specific to this patch.
>
>Thanks,
>Fengguang

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
  2016-07-31 11:21                           ` Fengguang Wu
@ 2016-08-01 10:12                             ` Jiri Kosina
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-08-01 10:12 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: kbuild-all, Eric Dumazet, Jamal Hadi Salim, Phil Sutter, netdev,
	linux-kernel, Cong Wang

On Sun, 31 Jul 2016, Fengguang Wu wrote:

> Jiri, I just double checked and find no similar errors related to
> qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock
> in dccp_v4_ctl_send_reset") builds fine without error.

You are right, I realized my mistake afterwards. This is fixed in v5 of 
the patch.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v6] net: sched: convert qdisc linked list to hashtable
  2016-07-29  7:49                     ` [PATCH v5] " Jiri Kosina
  2016-07-29 20:01                       ` kbuild test robot
  2016-07-31  0:11                       ` kbuild test robot
@ 2016-08-01 10:23                       ` Jiri Kosina
  2 siblings, 0 replies; 43+ messages in thread
From: Jiri Kosina @ 2016-08-01 10:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, Phil Sutter, netdev, linux-kernel, Cong Wang

From: Jiri Kosina <jkosina@suse.cz>
 
Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v1 -> v2:     fix up RCU hastable usage wrt. rtnl
              fix compilation of .c files which define their own
              HASH_SIZE that now oncflicts with the one from
              hashtable.h (newly included via netdevice.h)

v2 -> v3:     resolve HASH_SIZE identifier conflicts in a cleaner way
              fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4:     put the hastable into struct netdevice only if 
              CONFIG_NET_SCHED has been enabled

v4 -> v5:     fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
	      add Cong Wang's reviewed-by

v5 -> v6:     build fix for davinci_emac driver that got symbol conflict 
	      due to hashtable.h include, reported by 0day bot

 drivers/net/ethernet/ti/davinci_emac.c | 14 +++++++-------
 include/linux/netdevice.h              |  4 ++++
 include/net/pkt_sched.h                |  4 ++--
 include/net/sch_generic.h              |  2 +-
 net/core/dev.c                         |  3 +++
 net/ipv6/ip6_gre.c                     | 12 ++++++------
 net/ipv6/ip6_tunnel.c                  | 10 +++++-----
 net/ipv6/ip6_vti.c                     | 10 +++++-----
 net/ipv6/sit.c                         | 10 +++++-----
 net/sched/sch_api.c                    | 23 +++++++++++++----------
 net/sched/sch_generic.c                |  8 +++++---
 net/sched/sch_mq.c                     |  2 +-
 net/sched/sch_mqprio.c                 |  2 +-
 13 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index f56d66e..91ca2b2 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -726,14 +726,14 @@ static u32 hash_get(u8 *addr)
 }
 
 /**
- * hash_add - Hash function to add mac addr from hash table
+ * emac_hash_add - Hash function to add mac addr from hash table
  * @priv: The DaVinci EMAC private adapter structure
  * @mac_addr: mac address to delete from hash table
  *
  * Adds mac address to the internal hash table
  *
  */
-static int hash_add(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_add(struct emac_priv *priv, u8 *mac_addr)
 {
 	struct device *emac_dev = &priv->ndev->dev;
 	u32 rc = 0;
@@ -742,7 +742,7 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
 
 	if (hash_value >= EMAC_NUM_MULTICAST_BITS) {
 		if (netif_msg_drv(priv)) {
-			dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\
+			dev_err(emac_dev, "DaVinci EMAC: emac_hash_add(): Invalid "\
 				"Hash %08x, should not be greater than %08x",
 				hash_value, (EMAC_NUM_MULTICAST_BITS - 1));
 		}
@@ -768,14 +768,14 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
 }
 
 /**
- * hash_del - Hash function to delete mac addr from hash table
+ * emac_hash_del - Hash function to delete mac addr from hash table
  * @priv: The DaVinci EMAC private adapter structure
  * @mac_addr: mac address to delete from hash table
  *
  * Removes mac address from the internal hash table
  *
  */
-static int hash_del(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_del(struct emac_priv *priv, u8 *mac_addr)
 {
 	u32 hash_value;
 	u32 hash_bit;
@@ -825,10 +825,10 @@ static void emac_add_mcast(struct emac_priv *priv, u32 action, u8 *mac_addr)
 
 	switch (action) {
 	case EMAC_MULTICAST_ADD:
-		update = hash_add(priv, mac_addr);
+		update = emac_hash_add(priv, mac_addr);
 		break;
 	case EMAC_MULTICAST_DEL:
-		update = hash_del(priv, mac_addr);
+		update = emac_hash_del(priv, mac_addr);
 		break;
 	case EMAC_ALL_MULTI_SET:
 		update = 1;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
+#ifdef CONFIG_NET_SCHED
+	DECLARE_HASHTABLE	(qdisc_hash, 4);
+#endif
 	unsigned long		tx_queue_len;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
-	struct list_head	list;
+	struct hlist_node       hash;
 	u32			handle;
 	u32			parent;
 	int			(*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+#ifdef CONFIG_NET_SCHED
+	hash_init(dev->qdisc_hash);
+#endif
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-	struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+	struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
 	struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
    will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
 	u32 hash = ipv6_addr_hash(addr);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT);
 }
 
 #define tunnels_r_l	tunnels[3]
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)
 
 	for (prio = 0; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_GRE_HASH_SIZE; h++) {
 			struct ip6_tnl *t;
 
 			t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..2050217 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -64,8 +64,8 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_TUNNEL_HASH_SIZE_SHIFT  5
+#define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)
 
 static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
@@ -75,7 +75,7 @@ static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_TUNNEL_HASH_SIZE_SHIFT);
 }
 
 static int ip6_tnl_dev_init(struct net_device *dev);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
 	/* the IPv6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_TUNNEL_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
 		if (dev->rtnl_link_ops == &ip6_link_ops)
 			unregister_netdevice_queue(dev, &list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_TUNNEL_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..cc7e058 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -50,14 +50,14 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_VTI_HASH_SIZE_SHIFT  5
+#define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)
 
 static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
 {
 	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
 
-	return hash_32(hash, HASH_SIZE_SHIFT);
+	return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
 }
 
 static int vti6_dev_init(struct net_device *dev);
@@ -69,7 +69,7 @@ struct vti6_net {
 	/* the vti6 tunnel fallback device */
 	struct net_device *fb_tnl_dev;
 	/* lists for storing tunnels in use */
-	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_r_l[IP6_VTI_HASH_SIZE];
 	struct ip6_tnl __rcu *tnls_wc[1];
 	struct ip6_tnl __rcu **tnls[2];
 };
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
 	struct ip6_tnl *t;
 	LIST_HEAD(list);
 
-	for (h = 0; h < HASH_SIZE; h++) {
+	for (h = 0; h < IP6_VTI_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
 			unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..94dd0f0 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
    For comments look at net/ipv4/ip_gre.c --ANK
  */
 
-#define HASH_SIZE  16
+#define IP6_SIT_HASH_SIZE  16
 #define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
 
 static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;
 
 static int sit_net_id __read_mostly;
 struct sit_net {
-	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
-	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r_l[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_r[IP6_SIT_HASH_SIZE];
+	struct ip_tunnel __rcu *tunnels_l[IP6_SIT_HASH_SIZE];
 	struct ip_tunnel __rcu *tunnels_wc[1];
 	struct ip_tunnel __rcu **tunnels[4];
 
@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,
 
 	for (prio = 1; prio < 4; prio++) {
 		int h;
-		for (h = 0; h < HASH_SIZE; h++) {
+		for (h = 0; h < IP6_SIT_HASH_SIZE; h++) {
 			struct ip_tunnel *t;
 
 			t = rtnl_dereference(sitn->tunnels[prio][h]);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include <linux/hrtimer.h>
 #include <linux/lockdep.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry_rcu(q, &root->list, list) {
+	hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
 		if (q->handle == handle)
 			return q;
 	}
 	return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
 		ASSERT_RTNL();
-		list_add_tail_rcu(&q->list, &root->list);
+		hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
 		ASSERT_RTNL();
-		list_del_rcu(&q->list);
+		hash_del_rcu(&q->hash);
 	}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
-		qdisc_list_add(sch);
+		qdisc_hash_add(sch);
 
 		return sch;
 	}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 {
 	int ret = 0, q_idx = *q_idx_p;
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
 			continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 			       int *t_p, int s_t)
 {
 	struct Qdisc *q;
+	int b;
 
 	if (!root)
 		return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
 		return -1;
 
-	list_for_each_entry(q, &root->list, list) {
+	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..94d5999 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,
 	.ops		=	&noop_qdisc_ops,
-	.list		=	LIST_HEAD_INIT(noop_qdisc.list),
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
-	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
 
 	spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 		return;
 
 #ifdef CONFIG_NET_SCHED
-	qdisc_list_del(qdisc);
+	qdisc_hash_del(qdisc);
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
@@ -732,6 +730,10 @@ static void attach_default_qdiscs(struct net_device *dev)
 			qdisc->ops->attach(qdisc);
 		}
 	}
+#ifdef CONFIG_NET_SCHED
+	if (dev->qdisc)
+		qdisc_hash_add(dev->qdisc);
+#endif
 }
 
 static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
 			qdisc_destroy(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 #endif
 
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		if (old)
 			qdisc_destroy(old);
 		if (ntx < dev->real_num_tx_queues)
-			qdisc_list_add(qdisc);
+			qdisc_hash_add(qdisc);
 	}
 	kfree(priv->qdiscs);
 	priv->qdiscs = NULL;
-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2016-08-01 10:24 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-14 14:44 Deleting child qdisc doesn't reset parent to default qdisc? Jiri Kosina
2016-04-14 14:51 ` Jiri Kosina
2016-04-14 15:01 ` Eric Dumazet
2016-04-14 15:18   ` Phil Sutter
2016-04-14 15:34     ` Jiri Kosina
2016-04-14 15:44       ` Eric Dumazet
2016-04-14 16:22         ` Phil Sutter
2016-04-14 16:40           ` Eric Dumazet
2016-04-14 16:08     ` Jiri Kosina
2016-04-14 17:49       ` Eric Dumazet
2016-04-15 12:42         ` Jamal Hadi Salim
2016-04-15 14:58           ` Eric Dumazet
2016-04-15 17:13             ` David Miller
2016-06-28 15:19             ` Jiri Kosina
2016-06-28 17:28               ` Cong Wang
2016-06-28 17:33                 ` Jiri Kosina
2016-06-28 16:49             ` [RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?) Jiri Kosina
2016-07-07  9:04             ` [RFC PATCH] net: sched: convert qdisc linked list to hashtable " Jiri Kosina
2016-07-07 13:51               ` Eric Dumazet
2016-07-07 16:32                 ` Jiri Kosina
2016-07-07 16:53                   ` Eric Dumazet
2016-07-07 20:36               ` [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable Jiri Kosina
2016-07-08  8:50                 ` Eric Dumazet
2016-07-08  9:02                   ` Jiri Kosina
2016-07-08 11:07                 ` Thomas Graf
2016-07-08 13:52                   ` Eric Dumazet
2016-07-11 14:02                 ` [RFC PATCH v3] " Jiri Kosina
2016-07-12 17:36                   ` Cong Wang
2016-07-13 13:47                     ` Jiri Kosina
2016-07-28  9:56                   ` [PATCH v4] " Jiri Kosina
2016-07-28 11:10                     ` kbuild test robot
2016-07-28 11:18                       ` Jiri Kosina
2016-07-28 12:53                         ` Fengguang Wu
2016-07-28 16:53                           ` Cong Wang
2016-07-31 11:21                           ` Fengguang Wu
2016-08-01 10:12                             ` Jiri Kosina
2016-07-28 11:22                     ` kbuild test robot
2016-07-28 11:52                     ` kbuild test robot
2016-07-28 16:52                     ` Cong Wang
2016-07-29  7:49                     ` [PATCH v5] " Jiri Kosina
2016-07-29 20:01                       ` kbuild test robot
2016-07-31  0:11                       ` kbuild test robot
2016-08-01 10:23                       ` [PATCH v6] " Jiri Kosina

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.