All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
@ 2016-11-03 13:55 Jesper Dangaard Brouer
  2016-11-03 13:56 ` [net-next PATCH 1/3] net: make default TX queue length a defined constant Jesper Dangaard Brouer
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-03 13:55 UTC (permalink / raw)
  To: netdev
  Cc: Phil Sutter, Robert Olsson, Jamal Hadi Salim, Jesper Dangaard Brouer

This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
hopefully help userspace get a more consistent behavior when attaching
qdisc to such virtual devices.

---

Jesper Dangaard Brouer (3):
      net: make default TX queue length a defined constant
      net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
      qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device


 include/net/pkt_sched.h |    2 ++
 net/core/dev.c          |    2 +-
 net/ethernet/eth.c      |    3 ++-
 net/sched/sch_api.c     |   11 +++++++++++
 4 files changed, 16 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [net-next PATCH 1/3] net: make default TX queue length a defined constant
  2016-11-03 13:55 [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices Jesper Dangaard Brouer
@ 2016-11-03 13:56 ` Jesper Dangaard Brouer
  2016-11-03 13:56 ` [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-03 13:56 UTC (permalink / raw)
  To: netdev
  Cc: Phil Sutter, Robert Olsson, Jamal Hadi Salim, Jesper Dangaard Brouer

The default TX queue length of Ethernet devices have been a magic
constant of 1000, ever since the initial git import.

Looking back in historical trees[1][2] the value used to be 100,
with the same comment "Ethernet wants good queues". The commit[3]
that changed this from 100 to 1000 didn't describe why, but from
conversations with Robert Olsson it seems that it was changed
when Ethernet devices went from 100Mbit/s to 1Gbit/s, because the
link speed increased x10 the queue size were also adjusted.  This
value later caused much heartache for the bufferbloat community.

This patch merely moves the value into a defined constant.

[1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
[2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
[3] https://git.kernel.org/tglx/history/c/98921832c232

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/pkt_sched.h |    2 ++
 net/ethernet/eth.c      |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd334c9584e9..f1b76b8e6d2d 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -6,6 +6,8 @@
 #include <linux/if_vlan.h>
 #include <net/sch_generic.h>
 
+#define DEFAULT_TX_QUEUE_LEN	1000
+
 struct qdisc_walker {
 	int	stop;
 	int	skip;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index d9e2fe1da724..8c5a479681ca 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -62,6 +62,7 @@
 #include <net/dsa.h>
 #include <net/flow_dissector.h>
 #include <linux/uaccess.h>
+#include <net/pkt_sched.h>
 
 __setup("ether=", netdev_boot_setup);
 
@@ -359,7 +360,7 @@ void ether_setup(struct net_device *dev)
 	dev->min_mtu		= ETH_MIN_MTU;
 	dev->max_mtu		= ETH_DATA_LEN;
 	dev->addr_len		= ETH_ALEN;
-	dev->tx_queue_len	= 1000;	/* Ethernet wants good queues */
+	dev->tx_queue_len	= DEFAULT_TX_QUEUE_LEN;
 	dev->flags		= IFF_BROADCAST|IFF_MULTICAST;
 	dev->priv_flags		|= IFF_TX_SKB_SHARING;
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
  2016-11-03 13:55 [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices Jesper Dangaard Brouer
  2016-11-03 13:56 ` [net-next PATCH 1/3] net: make default TX queue length a defined constant Jesper Dangaard Brouer
@ 2016-11-03 13:56 ` Jesper Dangaard Brouer
  2016-11-03 20:54   ` Krister Johansen
  2016-11-03 13:56 ` [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-03 13:56 UTC (permalink / raw)
  To: netdev
  Cc: Phil Sutter, Robert Olsson, Jamal Hadi Salim, Jesper Dangaard Brouer

The flag IFF_NO_QUEUE marks virtual device drivers that doesn't need a
default qdisc attached, given they will be backed by physical device,
that already have a qdisc attached for pushback.

It is still supported to attach a qdisc to a IFF_NO_QUEUE device, as
this can be useful for difference policy reasons (e.g. bandwidth
limiting containers).  For this to work, the tx_queue_len need to have
a sane value, because some qdiscs inherit/copy the tx_queue_len
(namely, pfifo, bfifo, gred, htb, plug and sfb).

Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling
ether_setup()") caught situations where some drivers didn't initialize
tx_queue_len.  The problem with the commit was choosing 1 as the
fallback value.

A qdisc queue length of 1 causes more harm than good, because it
creates hard to debug situations for userspace. It gives userspace a
false sense of a working config after attaching a qdisc.  As low
volume traffic (that doesn't activate the qdisc policy) works,
like ping, while traffic that e.g. needs shaping cannot reach the
configured policy levels, given the queue length is too small.

This patch change the value to DEFAULT_TX_QUEUE_LEN, given other
IFF_NO_QUEUE devices (that call ether_setup()) also use this value.

Fixes: a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f23e28668f32..0260ad314506 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7651,7 +7651,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 
 	if (!dev->tx_queue_len) {
 		dev->priv_flags |= IFF_NO_QUEUE;
-		dev->tx_queue_len = 1;
+		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
 	}
 
 	dev->num_tx_queues = txqs;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-03 13:55 [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices Jesper Dangaard Brouer
  2016-11-03 13:56 ` [net-next PATCH 1/3] net: make default TX queue length a defined constant Jesper Dangaard Brouer
  2016-11-03 13:56 ` [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len Jesper Dangaard Brouer
@ 2016-11-03 13:56 ` Jesper Dangaard Brouer
  2016-11-04  9:35   ` Phil Sutter
                     ` (2 more replies)
  2016-11-07 18:13 ` [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices David Miller
  2016-11-08  1:16 ` David Miller
  4 siblings, 3 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-03 13:56 UTC (permalink / raw)
  To: netdev
  Cc: Phil Sutter, Robert Olsson, Jamal Hadi Salim, Jesper Dangaard Brouer

It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] https://github.com/docker/libcontainer/pull/193
[2] https://github.com/openshift/origin/pull/11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/sched/sch_api.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 206dc24add3a..f337f1bdd1d4 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 
 	sch->handle = handle;
 
+	/* This exist to keep backward compatible with a userspace
+	 * loophole, what allowed userspace to get IFF_NO_QUEUE
+	 * facility on older kernels by setting tx_queue_len=0 (prior
+	 * to qdisc init), and then forgot to reinit tx_queue_len
+	 * before again attaching a qdisc.
+	 */
+	if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
+		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
+		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
+	}
+
 	if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
 		if (qdisc_is_percpu_stats(sch)) {
 			sch->cpu_bstats =

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
  2016-11-03 13:56 ` [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len Jesper Dangaard Brouer
@ 2016-11-03 20:54   ` Krister Johansen
  2016-11-04 10:56     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 16+ messages in thread
From: Krister Johansen @ 2016-11-03 20:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Phil Sutter, Robert Olsson, Jamal Hadi Salim

On Thu, Nov 03, 2016 at 02:56:06PM +0100, Jesper Dangaard Brouer wrote:
> The flag IFF_NO_QUEUE marks virtual device drivers that doesn't need a
> default qdisc attached, given they will be backed by physical device,
> that already have a qdisc attached for pushback.
> 
> It is still supported to attach a qdisc to a IFF_NO_QUEUE device, as
> this can be useful for difference policy reasons (e.g. bandwidth
> limiting containers).  For this to work, the tx_queue_len need to have
> a sane value, because some qdiscs inherit/copy the tx_queue_len
> (namely, pfifo, bfifo, gred, htb, plug and sfb).
> 
> Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling
> ether_setup()") caught situations where some drivers didn't initialize
> tx_queue_len.  The problem with the commit was choosing 1 as the
> fallback value.
> 
> A qdisc queue length of 1 causes more harm than good, because it
> creates hard to debug situations for userspace. It gives userspace a
> false sense of a working config after attaching a qdisc.  As low
> volume traffic (that doesn't activate the qdisc policy) works,
> like ping, while traffic that e.g. needs shaping cannot reach the
> configured policy levels, given the queue length is too small.

Thanks for fixing this.  I've run into this in the exact scenario you
describe -- bandwith limiting containers.  I'm pretty sure my vote
doesn't count, but I'm in favor of this change.

-K

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-03 13:56 ` [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device Jesper Dangaard Brouer
@ 2016-11-04  9:35   ` Phil Sutter
  2016-11-04 10:10     ` Jesper Dangaard Brouer
  2016-11-04 12:53   ` Sergei Shtylyov
  2016-11-08  6:14   ` Maciej Żenczykowski
  2 siblings, 1 reply; 16+ messages in thread
From: Phil Sutter @ 2016-11-04  9:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, Robert Olsson, Jamal Hadi Salim

Hi,

On Thu, Nov 03, 2016 at 02:56:11PM +0100, Jesper Dangaard Brouer wrote:
[...]
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 206dc24add3a..f337f1bdd1d4 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
>  
>  	sch->handle = handle;
>  
> +	/* This exist to keep backward compatible with a userspace
> +	 * loophole, what allowed userspace to get IFF_NO_QUEUE
> +	 * facility on older kernels by setting tx_queue_len=0 (prior
> +	 * to qdisc init), and then forgot to reinit tx_queue_len
> +	 * before again attaching a qdisc.
> +	 */
> +	if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
> +		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> +		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
> +	}

I wonder why this is limited to IFF_NO_QUEUE devices. Do you think there
is a valid use case for physical ones?

Also, if we sanitize here, couldn't we then just get rid of the
sanitization you're fixing in patch 2?

Apart from that, ACK to all the patches. Thanks for cleaning up my mess!
:)

Cheers, Phil

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-04  9:35   ` Phil Sutter
@ 2016-11-04 10:10     ` Jesper Dangaard Brouer
  2016-11-04 10:59       ` Phil Sutter
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-04 10:10 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev, Robert Olsson, Jamal Hadi Salim, brouer


On Fri, 4 Nov 2016 10:35:26 +0100 Phil Sutter <phil@nwl.cc> wrote:

> Hi,
> 
> On Thu, Nov 03, 2016 at 02:56:11PM +0100, Jesper Dangaard Brouer wrote:
> [...]
> > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > index 206dc24add3a..f337f1bdd1d4 100644
> > --- a/net/sched/sch_api.c
> > +++ b/net/sched/sch_api.c
> > @@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
> >  
> >  	sch->handle = handle;
> >  
> > +	/* This exist to keep backward compatible with a userspace
> > +	 * loophole, what allowed userspace to get IFF_NO_QUEUE
> > +	 * facility on older kernels by setting tx_queue_len=0 (prior
> > +	 * to qdisc init), and then forgot to reinit tx_queue_len
> > +	 * before again attaching a qdisc.
> > +	 */
> > +	if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
> > +		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> > +		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
> > +	}  
> 
> I wonder why this is limited to IFF_NO_QUEUE devices. Do you think there
> is a valid use case for physical ones?

Hmmm, I cannot come up with a useful use-case for physical devices, but
I cannot see why we should save users that had used the loophole on
physical devices, as that is clearly a faulty config to begin with.
See net_crit_ratelimited warning here:
 https://github.com/torvalds/linux/blob/v4.9-rc3/net/core/dev.c#L3403

> Also, if we sanitize here, couldn't we then just get rid of the
> sanitization you're fixing in patch 2?

Without patch 2, then some IFF_NO_QUEUE devices would have a visible
tx_queue_len 0 (e.g. the ones not calling ether_setup()), and that
would be inconsistent (visible from userspace).

> Apart from that, ACK to all the patches. Thanks for cleaning up my mess!
> :)

Thanks for the ACKs

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
  2016-11-03 20:54   ` Krister Johansen
@ 2016-11-04 10:56     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-04 10:56 UTC (permalink / raw)
  To: Krister Johansen
  Cc: netdev, Phil Sutter, Robert Olsson, Jamal Hadi Salim, brouer

On Thu, 3 Nov 2016 13:54:40 -0700
Krister Johansen <kjlx@templeofstupid.com> wrote:

> On Thu, Nov 03, 2016 at 02:56:06PM +0100, Jesper Dangaard Brouer wrote:
> > The flag IFF_NO_QUEUE marks virtual device drivers that doesn't need a
> > default qdisc attached, given they will be backed by physical device,
> > that already have a qdisc attached for pushback.
> > 
> > It is still supported to attach a qdisc to a IFF_NO_QUEUE device, as
> > this can be useful for difference policy reasons (e.g. bandwidth
> > limiting containers).  For this to work, the tx_queue_len need to have
> > a sane value, because some qdiscs inherit/copy the tx_queue_len
> > (namely, pfifo, bfifo, gred, htb, plug and sfb).
> > 
> > Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling
> > ether_setup()") caught situations where some drivers didn't initialize
> > tx_queue_len.  The problem with the commit was choosing 1 as the
> > fallback value.
> > 
> > A qdisc queue length of 1 causes more harm than good, because it
> > creates hard to debug situations for userspace. It gives userspace a
> > false sense of a working config after attaching a qdisc.  As low
> > volume traffic (that doesn't activate the qdisc policy) works,
> > like ping, while traffic that e.g. needs shaping cannot reach the
> > configured policy levels, given the queue length is too small.  
> 
> Thanks for fixing this.  I've run into this in the exact scenario you
> describe -- bandwith limiting containers.  I'm pretty sure my vote
> doesn't count, but I'm in favor of this change.

Thanks for confirming the problem. You voice is actually very important
in matters like this. It is important to know if people were actually
hit by this.

My own story is that I was hit by this subtle queue length 1 problem
approx 11 years ago without noticing.  An ISP were doing qdisc shaping
(with HTB) on VLAN devices.  The original guy who developed the system
were fired because Internet customers were not getting the bandwidth
they paid for.  I were hired to fix the problem, and unknowingly fixed
it (and bufferbloat) by using SFQ instead of pfifo_fast as leaf qdisc.
I actually didn't realize the root-cause until Oct 2014, see[1].
(I also ended-up fixing other scalability issues in iptables[2])

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1152231
[2] http://people.netfilter.org/hawk/presentations/osd2008/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-04 10:10     ` Jesper Dangaard Brouer
@ 2016-11-04 10:59       ` Phil Sutter
  2016-11-04 12:09         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 16+ messages in thread
From: Phil Sutter @ 2016-11-04 10:59 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, Robert Olsson, Jamal Hadi Salim

On Fri, Nov 04, 2016 at 11:10:42AM +0100, Jesper Dangaard Brouer wrote:
> 
> On Fri, 4 Nov 2016 10:35:26 +0100 Phil Sutter <phil@nwl.cc> wrote:
> 
> > Hi,
> > 
> > On Thu, Nov 03, 2016 at 02:56:11PM +0100, Jesper Dangaard Brouer wrote:
> > [...]
> > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > > index 206dc24add3a..f337f1bdd1d4 100644
> > > --- a/net/sched/sch_api.c
> > > +++ b/net/sched/sch_api.c
> > > @@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
> > >  
> > >  	sch->handle = handle;
> > >  
> > > +	/* This exist to keep backward compatible with a userspace
> > > +	 * loophole, what allowed userspace to get IFF_NO_QUEUE
> > > +	 * facility on older kernels by setting tx_queue_len=0 (prior
> > > +	 * to qdisc init), and then forgot to reinit tx_queue_len
> > > +	 * before again attaching a qdisc.
> > > +	 */
> > > +	if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
> > > +		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> > > +		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
> > > +	}  
> > 
> > I wonder why this is limited to IFF_NO_QUEUE devices. Do you think there
> > is a valid use case for physical ones?
> 
> Hmmm, I cannot come up with a useful use-case for physical devices, but
> I cannot see why we should save users that had used the loophole on
> physical devices, as that is clearly a faulty config to begin with.
> See net_crit_ratelimited warning here:
>  https://github.com/torvalds/linux/blob/v4.9-rc3/net/core/dev.c#L3403

I really feel like nit-picking again, but what differs in between
loophole users of virtual devices (whose broken scripts stopped working)
and loophole users of physical devices (whose broken scripts stopped
working as well)?

I we really take exposing broken userspace scripts as kernel bugs, don't
we have to take this one for the same as well?

> > Also, if we sanitize here, couldn't we then just get rid of the
> > sanitization you're fixing in patch 2?
> 
> Without patch 2, then some IFF_NO_QUEUE devices would have a visible
> tx_queue_len 0 (e.g. the ones not calling ether_setup()), and that
> would be inconsistent (visible from userspace).

Ah, indeed. Although there's no functional difference, I guess it might
confuse people seeing an interface with 0 qlen performing properly.

Thanks, Phil

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-04 10:59       ` Phil Sutter
@ 2016-11-04 12:09         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-04 12:09 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev, Robert Olsson, Jamal Hadi Salim, brouer

On Fri, 4 Nov 2016 11:59:13 +0100
Phil Sutter <phil@nwl.cc> wrote:

> On Fri, Nov 04, 2016 at 11:10:42AM +0100, Jesper Dangaard Brouer wrote:
> > 
> > On Fri, 4 Nov 2016 10:35:26 +0100 Phil Sutter <phil@nwl.cc> wrote:
> >   
> > > Hi,
> > > 
> > > On Thu, Nov 03, 2016 at 02:56:11PM +0100, Jesper Dangaard Brouer wrote:
> > > [...]  
> > > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > > > index 206dc24add3a..f337f1bdd1d4 100644
> > > > --- a/net/sched/sch_api.c
> > > > +++ b/net/sched/sch_api.c
> > > > @@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
> > > >  
> > > >  	sch->handle = handle;
> > > >  
> > > > +	/* This exist to keep backward compatible with a userspace
> > > > +	 * loophole, what allowed userspace to get IFF_NO_QUEUE
> > > > +	 * facility on older kernels by setting tx_queue_len=0 (prior
> > > > +	 * to qdisc init), and then forgot to reinit tx_queue_len
> > > > +	 * before again attaching a qdisc.
> > > > +	 */
> > > > +	if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
> > > > +		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> > > > +		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
> > > > +	}    
> > > 
> > > I wonder why this is limited to IFF_NO_QUEUE devices. Do you think there
> > > is a valid use case for physical ones?  
> > 
> > Hmmm, I cannot come up with a useful use-case for physical devices, but
> > I cannot see why we should save users that had used the loophole on
> > physical devices, as that is clearly a faulty config to begin with.
> > See net_crit_ratelimited warning here:
> >  [1] https://github.com/torvalds/linux/blob/v4.9-rc3/net/core/dev.c#L3403  
> 
> I really feel like nit-picking again,

Perhaps a follow up patch is better?  This patch does solve a real
issue.

> but what differs in between
> loophole users of virtual devices (whose broken scripts stopped working)
> and loophole users of physical devices (whose broken scripts stopped
> working as well)?

There is a difference.  We basically closed the loophole config, but
fixed that qdisc can be attached to virtual (IFF_NO_QUEUE) devices,
without needing to adjusting tx_queue_len.

Thus, running a loophole-script have no-effect, but for IFF_NO_QUEUE
devices (veth specifically) it looks like it had the desired effect,
thus Docker will/can keep doing that, to work with older kernels, and
on newer kernels it just doesn't have any effect.

The remaining problem is that a "loophole-script" leaves the interface
in a broken state with tx_queue_len==0.  Which this patch address.

So, why only catch misconfig for IFF_NO_QUEUE devices?  Because a
loophole-script on veth brought it into a valid config, thus valid
use-case, while one a physical into a invalid config (hence the
critical warn[1]).

You could (in a followup patch, please) argue that it is a lot simpler,
just to always catch the misconfig of having tx_queue_len==0 when
attaching a qdisc.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-03 13:56 ` [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device Jesper Dangaard Brouer
  2016-11-04  9:35   ` Phil Sutter
@ 2016-11-04 12:53   ` Sergei Shtylyov
  2016-11-08  6:14   ` Maciej Żenczykowski
  2 siblings, 0 replies; 16+ messages in thread
From: Sergei Shtylyov @ 2016-11-04 12:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev
  Cc: Phil Sutter, Robert Olsson, Jamal Hadi Salim

Hello.

On 11/3/2016 4:56 PM, Jesper Dangaard Brouer wrote:

> It is a clear misconfiguration to attach a qdisc to a device with
> tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
> htb, plug and sfb) inherit/copy this value as their queue length.
>
> Why should the kernel catch such a misconfiguration?  Because prior to
> introducing the IFF_NO_QUEUE device flag, userspace found a loophole
> in the qdisc config system that allowed them to achieve the equivalent
> of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
> a device.  The loophole on older kernels is setting tx_queue_len=0,
> *prior* to device qdisc init (the config time is significant, simply
> setting tx_queue_len=0 doesn't trigger the loophole).
>
> This loophole is currently used by Docker[1] to get better performance
> and scalability out of the veth device.  The Docker developers were
> warned[1] that they needed to adjust the tx_queue_len if ever
> attaching a qdisc.  The OpenShift project didn't remember this warning
> and attached a qdisc, this were caught and fixed in[2].
>
> [1] https://github.com/docker/libcontainer/pull/193
> [2] https://github.com/openshift/origin/pull/11126
>
> Instead of fixing every userspace program that used this loophole, and
> forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
> catch the misconfiguration on the kernel side.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  net/sched/sch_api.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 206dc24add3a..f337f1bdd1d4 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
>
>  	sch->handle = handle;
>
> +	/* This exist to keep backward compatible with a userspace

    "Exists" and "compatibility".

> +	 * loophole, what allowed userspace to get IFF_NO_QUEUE
> +	 * facility on older kernels by setting tx_queue_len=0 (prior
> +	 * to qdisc init), and then forgot to reinit tx_queue_len
> +	 * before again attaching a qdisc.
> +	 */
> +	if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
> +		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> +		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
> +	}
> +
>  	if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
>  		if (qdisc_is_percpu_stats(sch)) {
>  			sch->cpu_bstats =

MBR, Sergei

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
  2016-11-03 13:55 [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2016-11-03 13:56 ` [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device Jesper Dangaard Brouer
@ 2016-11-07 18:13 ` David Miller
  2016-11-07 20:11   ` Jesper Dangaard Brouer
  2016-11-08  1:16 ` David Miller
  4 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2016-11-07 18:13 UTC (permalink / raw)
  To: brouer; +Cc: netdev, phil, robert, jhs

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 03 Nov 2016 14:55:56 +0100

> This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> hopefully help userspace get a more consistent behavior when attaching
> qdisc to such virtual devices.

I'm still thinking about this.

My reservation about this is basically since the one known offender in
userspace acknowledged that what it was doing wrong, and fixed it
quickly already, I see no reason to explicitly accomodate this.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
  2016-11-07 18:13 ` [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices David Miller
@ 2016-11-07 20:11   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-07 20:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, phil, robert, jhs, brouer

On Mon, 07 Nov 2016 13:13:44 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Thu, 03 Nov 2016 14:55:56 +0100
> 
> > This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> > hopefully help userspace get a more consistent behavior when attaching
> > qdisc to such virtual devices.  
> 
> I'm still thinking about this.
> 
> My reservation about this is basically since the one known offender in
> userspace acknowledged that what it was doing wrong, and fixed it
> quickly already, I see no reason to explicitly accomodate this.

The situation I worry about is that a sysadm cannot manually apply a tc
qdisc on a Docker container's veth without getting bitten.  Docker will
forever run a "loophole-script" to accommodate older kernels, and yes
the shiny management scripts will get fixed, but how should a mortal
sysadm know (to change tx_queue_len before applying a qdisc).

Besides the it was only fixed in OpenShift, which inherited the "bug"
from Docker.  Thus, it is per-say not fixed in Docker or other projects
that (like OpenShift) uses components from Docker.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
  2016-11-03 13:55 [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2016-11-07 18:13 ` [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices David Miller
@ 2016-11-08  1:16 ` David Miller
  4 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2016-11-08  1:16 UTC (permalink / raw)
  To: brouer; +Cc: netdev, phil, robert, jhs

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 03 Nov 2016 14:55:56 +0100

> This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> hopefully help userspace get a more consistent behavior when attaching
> qdisc to such virtual devices.

Series applied, thanks Jesper.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-03 13:56 ` [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device Jesper Dangaard Brouer
  2016-11-04  9:35   ` Phil Sutter
  2016-11-04 12:53   ` Sergei Shtylyov
@ 2016-11-08  6:14   ` Maciej Żenczykowski
  2016-11-08  7:46     ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 16+ messages in thread
From: Maciej Żenczykowski @ 2016-11-08  6:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Linux NetDev, Phil Sutter, Robert Olsson, Jamal Hadi Salim

Just FYI:

I'm tangentially aware of internal Google code that:
- expects a bonding device running HTB with non-zero txqueuelen
- wants to remove HTB and get a noqueue interface (the normal default
for bonding)

The code currently removes HTB, which gets us to mq, sets txqueuelen
to 0, adds a pfifo, removes the pfifo, which gets us to noqueue.

After this patch this would ?possibly? break (adding pfifo, would
change txqueuelen, so when we remove it we wouldn't end up with
noqueue).

>From what I fuzzily recall, HTB with txquelelen == 0 drops traffic
hard, while pfifo continues to function, hence the ordering...

Obviously our code can be fixed, but I'm worried there's a more
generic backwards compatibility problem here.

(note: this is mostly about 3.11 and 4.3 and might no longer be
relevant with 4.10... maybe the new kernel's default qdisc selection
logic doesn't depend on txqueuelen and checks the flag instead???)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
  2016-11-08  6:14   ` Maciej Żenczykowski
@ 2016-11-08  7:46     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-08  7:46 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Linux NetDev, Phil Sutter, Robert Olsson, Jamal Hadi Salim, brouer


On Mon, 7 Nov 2016 22:14:37 -0800 Maciej Żenczykowski <zenczykowski@gmail.com> wrote:

> Just FYI:
> 
> I'm tangentially aware of internal Google code that:
> - expects a bonding device running HTB with non-zero txqueuelen
> - wants to remove HTB and get a noqueue interface (the normal default
> for bonding)
> 
> The code currently removes HTB, which gets us to mq, sets txqueuelen
> to 0, adds a pfifo, removes the pfifo, which gets us to noqueue.

This clearly shows that the older userspace interface, of tx_queue_len
having double meaning, was a mess!

> After this patch this would ?possibly? break (adding pfifo, would
> change txqueuelen, so when we remove it we wouldn't end up with
> noqueue).

No, you will still end-up with "noqueue".  It is now the flag
IFF_NO_QUEUE that determine if a device gets "noqueue" when the default
qdisc is attached. The tx_queue_len no longer have any effect on
getting "noqueue".  The IFF_NO_QUEUE system removed this double meaning
of tx_queue_len.


> From what I fuzzily recall, HTB with txquelelen == 0 drops traffic
> hard, while pfifo continues to function, hence the ordering...
> 
> Obviously our code can be fixed, but I'm worried there's a more
> generic backwards compatibility problem here.

It is good you bring it up, but I don't see a backwards compatibility
problem with your usage after the patchset.
 
> (note: this is mostly about 3.11 and 4.3 and might no longer be
> relevant with 4.10... maybe the new kernel's default qdisc selection
> logic doesn't depend on txqueuelen and checks the flag instead???)

If I were you, I would now implement a validation check that reported
the problem if not getting into the expected "noqueue" state.  Then
when you eventually upgrade to a more recent kernel, you would get
alerted of improper state.

Something like:

noqueue=$(ip link show dev $DEV 2> /dev/null | grep -q "noqueue" && echo "noqueue" || echo "bad")
if [[ "$noqueue" != "noqueue" ]]; then
    echo "report-problem";
fi

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-11-08  7:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-03 13:55 [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices Jesper Dangaard Brouer
2016-11-03 13:56 ` [net-next PATCH 1/3] net: make default TX queue length a defined constant Jesper Dangaard Brouer
2016-11-03 13:56 ` [net-next PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len Jesper Dangaard Brouer
2016-11-03 20:54   ` Krister Johansen
2016-11-04 10:56     ` Jesper Dangaard Brouer
2016-11-03 13:56 ` [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device Jesper Dangaard Brouer
2016-11-04  9:35   ` Phil Sutter
2016-11-04 10:10     ` Jesper Dangaard Brouer
2016-11-04 10:59       ` Phil Sutter
2016-11-04 12:09         ` Jesper Dangaard Brouer
2016-11-04 12:53   ` Sergei Shtylyov
2016-11-08  6:14   ` Maciej Żenczykowski
2016-11-08  7:46     ` Jesper Dangaard Brouer
2016-11-07 18:13 ` [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices David Miller
2016-11-07 20:11   ` Jesper Dangaard Brouer
2016-11-08  1:16 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.