All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: orphan queued skbs if device tx can stall
@ 2012-04-08 17:13 Michael S. Tsirkin
  2012-04-08 23:49   ` Herbert Xu
  2012-04-10  7:55 ` Eric Dumazet
  0 siblings, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-08 17:13 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Jamal Hadi Salim, Michael S. Tsirkin,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings, Herbert Xu

commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5
tun: orphan an skb on tx
Fixed a configuration where skbs get queued
at the tun device forever, blocking senders.

However this fix isn't waterproof:
userspace can control whether the interface
is stopped, and if it is, packets
get queued in the qdisc, again potentially forever.

Complete the fix by setting a private flag and orphaning
at the qdisc level.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/tun.c       |    3 +++
 include/linux/if.h      |    1 +
 net/core/dev.c          |    5 +++++
 net/sched/sch_generic.c |    5 +++++
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..15c5bb8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -535,6 +535,9 @@ static void tun_net_init(struct net_device *dev)
 		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
 		break;
 	}
+	/* Once queue becomes full, we stop tx until userspace
+	 * dequeues some packets, that is potentially forever. */
+	dev->priv_flags |= IFF_TX_CAN_STALL;
 }
 
 /* Character device part */
diff --git a/include/linux/if.h b/include/linux/if.h
index f995c66..dd2c7f7 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -81,6 +81,7 @@
 #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
 #define IFF_TEAM_PORT	0x40000		/* device used as team port */
 #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
+#define IFF_TX_CAN_STALL 0x100000	/* Device can stop tx forever */
 
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
diff --git a/net/core/dev.c b/net/core/dev.c
index 5d59155..e812706 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2516,6 +2516,11 @@ int dev_queue_xmit(struct sk_buff *skb)
 	struct Qdisc *q;
 	int rc = -ENOMEM;
 
+	/* Orphan the skb - required if we might hang on to it
+	 * for indefinite time. */
+	if (dev->priv_flags & IFF_TX_CAN_STALL)
+		skb_orphan(skb);
+
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 67fc573..27883d1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -120,6 +120,11 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	/* And release qdisc */
 	spin_unlock(root_lock);
 
+	/* Orphan the skb - required if we might hang on to it
+	 * for indefinite time. */
+	if (dev->priv_flags & IFF_TX_CAN_STALL)
+		skb_orphan(skb);
+
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
 	if (!netif_xmit_frozen_or_stopped(txq))
 		ret = dev_hard_start_xmit(skb, dev, txq);
-- 
1.7.9.111.gf3fb0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-08 17:13 [PATCH] net: orphan queued skbs if device tx can stall Michael S. Tsirkin
@ 2012-04-08 23:49   ` Herbert Xu
  2012-04-10  7:55 ` Eric Dumazet
  1 sibling, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-08 23:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Sun, Apr 08, 2012 at 08:13:25PM +0300, Michael S. Tsirkin wrote:
> commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5
> tun: orphan an skb on tx
> Fixed a configuration where skbs get queued
> at the tun device forever, blocking senders.
> 
> However this fix isn't waterproof:
> userspace can control whether the interface
> is stopped, and if it is, packets
> get queued in the qdisc, again potentially forever.
> 
> Complete the fix by setting a private flag and orphaning
> at the qdisc level.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

1) Doesn't this break local UDP push-back?
2) Isn't the stall a bug in the backend and isn't this just
papering over that?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
@ 2012-04-08 23:49   ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-08 23:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Sun, Apr 08, 2012 at 08:13:25PM +0300, Michael S. Tsirkin wrote:
> commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5
> tun: orphan an skb on tx
> Fixed a configuration where skbs get queued
> at the tun device forever, blocking senders.
> 
> However this fix isn't waterproof:
> userspace can control whether the interface
> is stopped, and if it is, packets
> get queued in the qdisc, again potentially forever.
> 
> Complete the fix by setting a private flag and orphaning
> at the qdisc level.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

1) Doesn't this break local UDP push-back?
2) Isn't the stall a bug in the backend and isn't this just
papering over that?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-08 23:49   ` Herbert Xu
  (?)
@ 2012-04-09  7:28   ` Michael S. Tsirkin
  2012-04-09  7:33       ` Herbert Xu
  -1 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-09  7:28 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 07:49:51AM +0800, Herbert Xu wrote:
> On Sun, Apr 08, 2012 at 08:13:25PM +0300, Michael S. Tsirkin wrote:
> > commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5
> > tun: orphan an skb on tx
> > Fixed a configuration where skbs get queued
> > at the tun device forever, blocking senders.
> > 
> > However this fix isn't waterproof:
> > userspace can control whether the interface
> > is stopped, and if it is, packets
> > get queued in the qdisc, again potentially forever.
> > 
> > Complete the fix by setting a private flag and orphaning
> > at the qdisc level.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> 1) Doesn't this break local UDP push-back?

What is meant by UDP pushback here? Two tap
devices communicating by UDP packets locally?
This was always broken, see below.

> 2) Isn't the stall a bug in the backend and isn't this just
> papering over that?
> 
> Cheers,

What do you mean by the backend? userspace? Yes, the stall is a result of
userspace not consuming packets:

        if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
                if (!(tun->flags & TUN_ONE_QUEUE)) {
                        /* Normal queueing mode. */
                        /* Packet scheduler handles dropping of further packets. */
                        netif_stop_queue(dev);
                       /* We won't see all dropped packets individually, so overrun
                         * error is more appropriate. */
                        dev->stats.tx_fifo_errors++;

Thus we get this situation
    tap1 sends packets, some of them to tap2, tap2 does not consume them,
    as a result tap2 queue overflows, tap2 stops forever and
    packets get queued in the qdisc, now tap1
    send buffer gets full so it can not communicate to any destination.

So the problem is one VM can block all networking from another one.

As a solution this patch is always changing ownership if we're going
into a hostile device: it just does this early in qdisc instead of
at xmit time.



> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  7:28   ` Michael S. Tsirkin
@ 2012-04-09  7:33       ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-09  7:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 10:28:49AM +0300, Michael S. Tsirkin wrote:
>
> > 1) Doesn't this break local UDP push-back?
> 
> What is meant by UDP pushback here? Two tap
> devices communicating by UDP packets locally?
> This was always broken, see below.

I mean push-back from UDP transmission to the physical NIC.

Your patch breaks that as now the guest will have no push-back
whatsoever so anything that transmits UDP without protocol-level
congestion control will start dropping most of their packets.

Granted you can argue that these apps are broken, but they do
exist and we've always catered for them, both on baremetal and
under virtualisation.
 
> Thus we get this situation
>     tap1 sends packets, some of them to tap2, tap2 does not consume them,
>     as a result tap2 queue overflows, tap2 stops forever and
>     packets get queued in the qdisc, now tap1
>     send buffer gets full so it can not communicate to any destination.
> 
> So the problem is one VM can block all networking from another one.

This should be addressed in the backend, as it can distinguish
between packets going out to physical and packets stuck going to
a local VM.  In the latter case you can then duplicate and release
the sender's memory.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
@ 2012-04-09  7:33       ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-09  7:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 10:28:49AM +0300, Michael S. Tsirkin wrote:
>
> > 1) Doesn't this break local UDP push-back?
> 
> What is meant by UDP pushback here? Two tap
> devices communicating by UDP packets locally?
> This was always broken, see below.

I mean push-back from UDP transmission to the physical NIC.

Your patch breaks that as now the guest will have no push-back
whatsoever so anything that transmits UDP without protocol-level
congestion control will start dropping most of their packets.

Granted you can argue that these apps are broken, but they do
exist and we've always catered for them, both on baremetal and
under virtualisation.
 
> Thus we get this situation
>     tap1 sends packets, some of them to tap2, tap2 does not consume them,
>     as a result tap2 queue overflows, tap2 stops forever and
>     packets get queued in the qdisc, now tap1
>     send buffer gets full so it can not communicate to any destination.
> 
> So the problem is one VM can block all networking from another one.

This should be addressed in the backend, as it can distinguish
between packets going out to physical and packets stuck going to
a local VM.  In the latter case you can then duplicate and release
the sender's memory.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  7:33       ` Herbert Xu
  (?)
@ 2012-04-09  7:39       ` Michael S. Tsirkin
  2012-04-09  8:29           ` Herbert Xu
  -1 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-09  7:39 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 03:33:54PM +0800, Herbert Xu wrote:
> On Mon, Apr 09, 2012 at 10:28:49AM +0300, Michael S. Tsirkin wrote:
> >
> > > 1) Doesn't this break local UDP push-back?
> > 
> > What is meant by UDP pushback here? Two tap
> > devices communicating by UDP packets locally?
> > This was always broken, see below.
> 
> I mean push-back from UDP transmission to the physical NIC.
> 
> Your patch breaks that

I think there's some misunderstanding. pushback is only disabled
for destinations that set IFF_TX_CAN_STALL. I expect that
no physical NICs set this flag - only tun and possibly
other userspace-controlled devices in the future.

> as now the guest will have no push-back
> whatsoever so anything that transmits UDP without protocol-level
> congestion control will start dropping most of their packets.
> 
> Granted you can argue that these apps are broken, but they do
> exist and we've always catered for them, both on baremetal and
> under virtualisation.
> > Thus we get this situation
> >     tap1 sends packets, some of them to tap2, tap2 does not consume them,
> >     as a result tap2 queue overflows, tap2 stops forever and
> >     packets get queued in the qdisc, now tap1
> >     send buffer gets full so it can not communicate to any destination.
> > 
> > So the problem is one VM can block all networking from another one.
> 
> This should be addressed in the backend, as it can distinguish
> between packets going out to physical and packets stuck going to
> a local VM.

Sorry I still have no clue what you call a backend.
Could you clarify please? All packets from a tun device
go to userspace.

>  In the latter case you can then duplicate and release
> the sender's memory.
> 
> Cheers,

The problem this patch is trying to address is tun device
can get stopped forever, which causes packets to
accumulate in qdisc again forever. So tun does not
get a chance to release sender's memory for these packets.

> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  7:39       ` Michael S. Tsirkin
@ 2012-04-09  8:29           ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-09  8:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 10:39:54AM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 09, 2012 at 03:33:54PM +0800, Herbert Xu wrote:
> > On Mon, Apr 09, 2012 at 10:28:49AM +0300, Michael S. Tsirkin wrote:
> > >
> > > > 1) Doesn't this break local UDP push-back?
> > > 
> > > What is meant by UDP pushback here? Two tap
> > > devices communicating by UDP packets locally?
> > > This was always broken, see below.
> > 
> > I mean push-back from UDP transmission to the physical NIC.
> > 
> > Your patch breaks that
> 
> I think there's some misunderstanding. pushback is only disabled
> for destinations that set IFF_TX_CAN_STALL. I expect that
> no physical NICs set this flag - only tun and possibly
> other userspace-controlled devices in the future.

I'm talking about an app running in the guest transmitting UDP
to the physical NIC via virtio/vhost.  This will break with your
patch.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
@ 2012-04-09  8:29           ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-09  8:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 10:39:54AM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 09, 2012 at 03:33:54PM +0800, Herbert Xu wrote:
> > On Mon, Apr 09, 2012 at 10:28:49AM +0300, Michael S. Tsirkin wrote:
> > >
> > > > 1) Doesn't this break local UDP push-back?
> > > 
> > > What is meant by UDP pushback here? Two tap
> > > devices communicating by UDP packets locally?
> > > This was always broken, see below.
> > 
> > I mean push-back from UDP transmission to the physical NIC.
> > 
> > Your patch breaks that
> 
> I think there's some misunderstanding. pushback is only disabled
> for destinations that set IFF_TX_CAN_STALL. I expect that
> no physical NICs set this flag - only tun and possibly
> other userspace-controlled devices in the future.

I'm talking about an app running in the guest transmitting UDP
to the physical NIC via virtio/vhost.  This will break with your
patch.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  8:29           ` Herbert Xu
  (?)
@ 2012-04-09  8:34           ` Michael S. Tsirkin
  2012-04-09  8:39               ` Herbert Xu
  -1 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-09  8:34 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 04:29:20PM +0800, Herbert Xu wrote:
> On Mon, Apr 09, 2012 at 10:39:54AM +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 09, 2012 at 03:33:54PM +0800, Herbert Xu wrote:
> > > On Mon, Apr 09, 2012 at 10:28:49AM +0300, Michael S. Tsirkin wrote:
> > > >
> > > > > 1) Doesn't this break local UDP push-back?
> > > > 
> > > > What is meant by UDP pushback here? Two tap
> > > > devices communicating by UDP packets locally?
> > > > This was always broken, see below.
> > > 
> > > I mean push-back from UDP transmission to the physical NIC.
> > > 
> > > Your patch breaks that
> > 
> > I think there's some misunderstanding. pushback is only disabled
> > for destinations that set IFF_TX_CAN_STALL. I expect that
> > no physical NICs set this flag - only tun and possibly
> > other userspace-controlled devices in the future.
> 
> I'm talking about an app running in the guest transmitting UDP
> to the physical NIC via virtio/vhost.  This will break with your
> patch.
> 
> Cheers,

Puzzled. This patch orphans skbs only if the destination device sets
IFF_TX_CAN_STALL.  Since the physical NIC doesn't stall forever it never
sets this flag.

So it seems that this patch should not affect the configuration
you describe at all.

Could you please clarify? What did I miss?

> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  8:34           ` Michael S. Tsirkin
@ 2012-04-09  8:39               ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-09  8:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 11:34:02AM +0300, Michael S. Tsirkin wrote:
>
> Puzzled. This patch orphans skbs only if the destination device sets
> IFF_TX_CAN_STALL.  Since the physical NIC doesn't stall forever it never
> sets this flag.
> 
> So it seems that this patch should not affect the configuration
> you describe at all.
> 
> Could you please clarify? What did I miss?

Oops, you're right.  Somehow I was thinking you were patching
virtio-net instead of tun.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
@ 2012-04-09  8:39               ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2012-04-09  8:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 11:34:02AM +0300, Michael S. Tsirkin wrote:
>
> Puzzled. This patch orphans skbs only if the destination device sets
> IFF_TX_CAN_STALL.  Since the physical NIC doesn't stall forever it never
> sets this flag.
> 
> So it seems that this patch should not affect the configuration
> you describe at all.
> 
> Could you please clarify? What did I miss?

Oops, you're right.  Somehow I was thinking you were patching
virtio-net instead of tun.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  8:39               ` Herbert Xu
  (?)
@ 2012-04-09  8:42               ` Michael S. Tsirkin
  2012-04-09  9:13                 ` Eric Dumazet
  -1 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-09  8:42 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Eric Dumazet, Michał Mirosław,
	Ben Hutchings

On Mon, Apr 09, 2012 at 04:39:58PM +0800, Herbert Xu wrote:
> On Mon, Apr 09, 2012 at 11:34:02AM +0300, Michael S. Tsirkin wrote:
> >
> > Puzzled. This patch orphans skbs only if the destination device sets
> > IFF_TX_CAN_STALL.  Since the physical NIC doesn't stall forever it never
> > sets this flag.
> > 
> > So it seems that this patch should not affect the configuration
> > you describe at all.
> > 
> > Could you please clarify? What did I miss?
> 
> Oops, you're right.  Somehow I was thinking you were patching
> virtio-net instead of tun.
> 
> Cheers,

Cool. Want to ack the patch then just to make it clear you are happy?

> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-09  8:42               ` Michael S. Tsirkin
@ 2012-04-09  9:13                 ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2012-04-09  9:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Herbert Xu, netdev, linux-kernel, David S. Miller,
	Jamal Hadi Salim, Stephen Hemminger, Jason Wang, Neil Horman,
	Jiri Pirko, Jeff Kirsher, Michał Mirosław,
	Ben Hutchings

On Mon, 2012-04-09 at 11:42 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 09, 2012 at 04:39:58PM +0800, Herbert Xu wrote:
> > On Mon, Apr 09, 2012 at 11:34:02AM +0300, Michael S. Tsirkin wrote:
> > >
> > > Puzzled. This patch orphans skbs only if the destination device sets
> > > IFF_TX_CAN_STALL.  Since the physical NIC doesn't stall forever it never
> > > sets this flag.
> > > 
> > > So it seems that this patch should not affect the configuration
> > > you describe at all.
> > > 
> > > Could you please clarify? What did I miss?
> > 
> > Oops, you're right.  Somehow I was thinking you were patching
> > virtio-net instead of tun.
> > 
> > Cheers,
> 
> Cool. Want to ack the patch then just to make it clear you are happy?

I am not happy with this patch.

Since its Easter, I dont have much time to comment right now, I'll do
that tomorrow.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-08 17:13 [PATCH] net: orphan queued skbs if device tx can stall Michael S. Tsirkin
  2012-04-08 23:49   ` Herbert Xu
@ 2012-04-10  7:55 ` Eric Dumazet
  2012-04-10  8:41   ` Michael S. Tsirkin
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2012-04-10  7:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Sun, 2012-04-08 at 20:13 +0300, Michael S. Tsirkin wrote:
> commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5
> tun: orphan an skb on tx
> Fixed a configuration where skbs get queued
> at the tun device forever, blocking senders.
> 
> However this fix isn't waterproof:
> userspace can control whether the interface
> is stopped, and if it is, packets
> get queued in the qdisc, again potentially forever.
> 
> Complete the fix by setting a private flag and orphaning
> at the qdisc level.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/tun.c       |    3 +++
>  include/linux/if.h      |    1 +
>  net/core/dev.c          |    5 +++++
>  net/sched/sch_generic.c |    5 +++++
>  4 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..15c5bb8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -535,6 +535,9 @@ static void tun_net_init(struct net_device *dev)
>  		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
>  		break;
>  	}
> +	/* Once queue becomes full, we stop tx until userspace
> +	 * dequeues some packets, that is potentially forever. */
> +	dev->priv_flags |= IFF_TX_CAN_STALL;
>  }
>  
>  /* Character device part */
> diff --git a/include/linux/if.h b/include/linux/if.h
> index f995c66..dd2c7f7 100644
> --- a/include/linux/if.h
> +++ b/include/linux/if.h
> @@ -81,6 +81,7 @@
>  #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
>  #define IFF_TEAM_PORT	0x40000		/* device used as team port */
>  #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
> +#define IFF_TX_CAN_STALL 0x100000	/* Device can stop tx forever */
>  
> 
>  #define IF_GET_IFACE	0x0001		/* for querying only */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5d59155..e812706 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2516,6 +2516,11 @@ int dev_queue_xmit(struct sk_buff *skb)
>  	struct Qdisc *q;
>  	int rc = -ENOMEM;
>  
> +	/* Orphan the skb - required if we might hang on to it
> +	 * for indefinite time. */
> +	if (dev->priv_flags & IFF_TX_CAN_STALL)
> +		skb_orphan(skb);
> +
>  	/* Disable soft irqs for various locks below. Also
>  	 * stops preemption for RCU.
>  	 */
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 67fc573..27883d1 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -120,6 +120,11 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>  	/* And release qdisc */
>  	spin_unlock(root_lock);
>  
> +	/* Orphan the skb - required if we might hang on to it
> +	 * for indefinite time. */
> +	if (dev->priv_flags & IFF_TX_CAN_STALL)
> +		skb_orphan(skb);
> +
>  	HARD_TX_LOCK(dev, txq, smp_processor_id());
>  	if (!netif_xmit_frozen_or_stopped(txq))
>  		ret = dev_hard_start_xmit(skb, dev, txq);

This slows down the core fastpath for a very specific use.

In your case I would just not use qdisc at all, like other virtual
devices.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..fd8c7f0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -396,7 +396,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	    sk_filter(tun->socket.sk, skb))
 		goto drop;
 
-	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
+	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= TUN_READQ_SIZE) {
 		if (!(tun->flags & TUN_ONE_QUEUE)) {
 			/* Normal queueing mode. */
 			/* Packet scheduler handles dropping of further packets. */
@@ -521,7 +521,7 @@ static void tun_net_init(struct net_device *dev)
 		/* Zero header length */
 		dev->type = ARPHRD_NONE;
 		dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
-		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
+		dev->tx_queue_len = 0;
 		break;
 
 	case TUN_TAP_DEV:
@@ -532,7 +532,7 @@ static void tun_net_init(struct net_device *dev)
 
 		eth_hw_addr_random(dev);
 
-		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
+		dev->tx_queue_len = 0;
 		break;
 	}
 }



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10  7:55 ` Eric Dumazet
@ 2012-04-10  8:41   ` Michael S. Tsirkin
  2012-04-10  8:55     ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-10  8:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 09:55:58AM +0200, Eric Dumazet wrote:
> On Sun, 2012-04-08 at 20:13 +0300, Michael S. Tsirkin wrote:
> > commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5
> > tun: orphan an skb on tx
> > Fixed a configuration where skbs get queued
> > at the tun device forever, blocking senders.
> > 
> > However this fix isn't waterproof:
> > userspace can control whether the interface
> > is stopped, and if it is, packets
> > get queued in the qdisc, again potentially forever.
> > 
> > Complete the fix by setting a private flag and orphaning
> > at the qdisc level.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/net/tun.c       |    3 +++
> >  include/linux/if.h      |    1 +
> >  net/core/dev.c          |    5 +++++
> >  net/sched/sch_generic.c |    5 +++++
> >  4 files changed, 14 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index bb8c72c..15c5bb8 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -535,6 +535,9 @@ static void tun_net_init(struct net_device *dev)
> >  		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
> >  		break;
> >  	}
> > +	/* Once queue becomes full, we stop tx until userspace
> > +	 * dequeues some packets, that is potentially forever. */
> > +	dev->priv_flags |= IFF_TX_CAN_STALL;
> >  }
> >  
> >  /* Character device part */
> > diff --git a/include/linux/if.h b/include/linux/if.h
> > index f995c66..dd2c7f7 100644
> > --- a/include/linux/if.h
> > +++ b/include/linux/if.h
> > @@ -81,6 +81,7 @@
> >  #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
> >  #define IFF_TEAM_PORT	0x40000		/* device used as team port */
> >  #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
> > +#define IFF_TX_CAN_STALL 0x100000	/* Device can stop tx forever */
> >  
> > 
> >  #define IF_GET_IFACE	0x0001		/* for querying only */
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 5d59155..e812706 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2516,6 +2516,11 @@ int dev_queue_xmit(struct sk_buff *skb)
> >  	struct Qdisc *q;
> >  	int rc = -ENOMEM;
> >  
> > +	/* Orphan the skb - required if we might hang on to it
> > +	 * for indefinite time. */
> > +	if (dev->priv_flags & IFF_TX_CAN_STALL)
> > +		skb_orphan(skb);
> > +
> >  	/* Disable soft irqs for various locks below. Also
> >  	 * stops preemption for RCU.
> >  	 */
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 67fc573..27883d1 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -120,6 +120,11 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
> >  	/* And release qdisc */
> >  	spin_unlock(root_lock);
> >  
> > +	/* Orphan the skb - required if we might hang on to it
> > +	 * for indefinite time. */
> > +	if (dev->priv_flags & IFF_TX_CAN_STALL)
> > +		skb_orphan(skb);
> > +
> >  	HARD_TX_LOCK(dev, txq, smp_processor_id());
> >  	if (!netif_xmit_frozen_or_stopped(txq))
> >  		ret = dev_hard_start_xmit(skb, dev, txq);
> 
> This slows down the core fastpath for a very specific use.

You are right.
I presume the fastpath is when the device is *not* stalled,
correct? So maybe it's ok to add this logic on slow path
where the queue is stopped and we queue the packet?
When the queue is running tun can orphan the skb itself.

This is what the change at the bottom of this mail on top of my
patch does - untested yet and naturally needs to be combined
and resubmitted properly, assuming it makes sense.

Would this, in your opinion, address this concern adequately?
Thanks!

> In your case I would just not use qdisc at all, like other virtual
> devices.

I think that if we do this, this also disables gso
for the device, doesn't it?
If true that would be a problem as this would
hurt performance of virtualized setups a lot.

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..fd8c7f0 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -396,7 +396,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	    sk_filter(tun->socket.sk, skb))
>  		goto drop;
>  
> -	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
> +	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= TUN_READQ_SIZE) {
>  		if (!(tun->flags & TUN_ONE_QUEUE)) {
>  			/* Normal queueing mode. */
>  			/* Packet scheduler handles dropping of further packets. */

tx_queue_len is controllable by SIOCSIFTXQLEN
so we'll need to override SIOCSIFTXQLEN somehow
to avoid breaking userspace that actually uses SIOCSIFTXQLEN, right?

> @@ -521,7 +521,7 @@ static void tun_net_init(struct net_device *dev)
>  		/* Zero header length */
>  		dev->type = ARPHRD_NONE;
>  		dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
> -		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
> +		dev->tx_queue_len = 0;
>  		break;
>  
>  	case TUN_TAP_DEV:
> @@ -532,7 +532,7 @@ static void tun_net_init(struct net_device *dev)
>  
>  		eth_hw_addr_random(dev);
>  
> -		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
> +		dev->tx_queue_len = 0;
>  		break;
>  	}
>  }
> 

I presume the fastpath is when the device is *not* stalled,
correct? So maybe the following on top of my patch - untested
and naturally would need to be combined and resubmitted properly -
would address the concern?
Thanks for the review!


diff --git a/net/core/dev.c b/net/core/dev.c
index 9d713b8..e42529b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2455,6 +2455,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		rc = NET_XMIT_SUCCESS;
 	} else {
 		skb_dst_force(skb);
+		/* Orphan the skb - required if we might hang on to it
+		 * for indefinite time. */
+		if (unlikely(dev->priv_flags & IFF_TX_CAN_STALL))
+			skb_orphan(skb);
+
 		rc = q->enqueue(skb, q) & NET_XMIT_MASK;
 		if (qdisc_run_begin(q)) {
 			if (unlikely(contended)) {
@@ -2517,11 +2522,6 @@ int dev_queue_xmit(struct sk_buff *skb)
 	struct Qdisc *q;
 	int rc = -ENOMEM;
 
-	/* Orphan the skb - required if we might hang on to it
-	 * for indefinite time. */
-	if (dev->priv_flags & IFF_TX_CAN_STALL)
-		skb_orphan(skb);
-
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 27883d1..ccc6137 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -120,14 +120,11 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	/* And release qdisc */
 	spin_unlock(root_lock);
 
-	/* Orphan the skb - required if we might hang on to it
-	 * for indefinite time. */
-	if (dev->priv_flags & IFF_TX_CAN_STALL)
-		skb_orphan(skb);
-
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
-	if (!netif_xmit_frozen_or_stopped(txq))
+	if (likely(!netif_xmit_frozen_or_stopped(txq)))
 		ret = dev_hard_start_xmit(skb, dev, txq);
+	else if (dev->priv_flags & IFF_TX_CAN_STALL)
+		skb_orphan(skb);
 
 	HARD_TX_UNLOCK(dev, txq);
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10  8:41   ` Michael S. Tsirkin
@ 2012-04-10  8:55     ` Eric Dumazet
  2012-04-10  9:31       ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2012-04-10  8:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, 2012-04-10 at 11:41 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 10, 2012 at 09:55:58AM +0200, Eric Dumazet wrote:

> > In your case I would just not use qdisc at all, like other virtual
> > devices.
> 
> I think that if we do this, this also disables gso
> for the device, doesn't it?

Not at all, thats unrelated.

> If true that would be a problem as this would
> hurt performance of virtualized setups a lot.

In fact, removing qdisc layer will help a lot, removing a contention
point.

Anyway, with a 500 packet limit in TUN queue itself, qdisc layer should
be always empty. Whats the point storing more than 500 packets for a
device ? Thats a latency killer.

> 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index bb8c72c..fd8c7f0 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -396,7 +396,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	    sk_filter(tun->socket.sk, skb))
> >  		goto drop;
> >  
> > -	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
> > +	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= TUN_READQ_SIZE) {
> >  		if (!(tun->flags & TUN_ONE_QUEUE)) {
> >  			/* Normal queueing mode. */
> >  			/* Packet scheduler handles dropping of further packets. */
> 
> tx_queue_len is controllable by SIOCSIFTXQLEN
> so we'll need to override SIOCSIFTXQLEN somehow
> to avoid breaking userspace that actually uses SIOCSIFTXQLEN, right?

Right now, you control with this tx_queue_len both the qdisc limit (if
pfifo_fast default) and the receive_queue in TUN.

That doesnt seem right to me, and more a hack/side effect.

Maybe you want to introduce a new setting, only controling receive queue
limit, and use tx_queue_len for its original meaning.

Then, setting tx_queue_len to 0 permits to remove qdisc layer, as any
other netdevice.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10  8:55     ` Eric Dumazet
@ 2012-04-10  9:31       ` Michael S. Tsirkin
  2012-04-10 10:04         ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-10  9:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 10:55:00AM +0200, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 11:41 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 10, 2012 at 09:55:58AM +0200, Eric Dumazet wrote:
> 
> > > In your case I would just not use qdisc at all, like other virtual
> > > devices.
> > 
> > I think that if we do this, this also disables gso
> > for the device, doesn't it?
> 
> Not at all, thats unrelated.
> 
> > If true that would be a problem as this would
> > hurt performance of virtualized setups a lot.
> 
> In fact, removing qdisc layer will help a lot, removing a contention
> point.
>
> Anyway, with a 500 packet limit in TUN queue itself, qdisc layer should
> be always empty. Whats the point storing more than 500 packets for a
> device ? Thats a latency killer.

AKA bufferbloat :)
We could try and reduce the TUN queue so that qdisc can drop packets in
an intelligent manner (e.g. choke) but this would conflict with what you
propose, right?

> > 
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index bb8c72c..fd8c7f0 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -396,7 +396,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> > >  	    sk_filter(tun->socket.sk, skb))
> > >  		goto drop;
> > >  
> > > -	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
> > > +	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= TUN_READQ_SIZE) {
> > >  		if (!(tun->flags & TUN_ONE_QUEUE)) {
> > >  			/* Normal queueing mode. */
> > >  			/* Packet scheduler handles dropping of further packets. */
> > 
> > tx_queue_len is controllable by SIOCSIFTXQLEN
> > so we'll need to override SIOCSIFTXQLEN somehow
> > to avoid breaking userspace that actually uses SIOCSIFTXQLEN, right?
> 
> Right now, you control with this tx_queue_len both the qdisc limit (if
> pfifo_fast default) and the receive_queue in TUN.
> 
> That doesnt seem right to me, and more a hack/side effect.
> Maybe you want to introduce a new setting, only controling receive queue
> limit, and use tx_queue_len for its original meaning.
> 
> Then, setting tx_queue_len to 0 permits to remove qdisc layer, as any
> other netdevice.
> 
> 

True. Still this is the only interface we have for controlling
the internal queue length so it seems safe to assume someone
is using it for this purpose.

And if that happens we get the deadlock back since
tx_queue_len will get set to a non-0 value. Right?

-- 
MST

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10  9:31       ` Michael S. Tsirkin
@ 2012-04-10 10:04         ` Eric Dumazet
  2012-04-10 11:25           ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2012-04-10 10:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, 2012-04-10 at 12:31 +0300, Michael S. Tsirkin wrote:

> True. Still this is the only interface we have for controlling
> the internal queue length so it seems safe to assume someone
> is using it for this purpose.
> 

So to workaround a problem in tun, you want to hack net/core/dev.c :(

Packets in qdisc should not be orphaned.

If you think about it, why do we attach skb to socket in the first
place ?

If its not needed for tun, why should it be needed for other devices ?

If TUN has a problem being stopped forever, maybe it should take
appropriate action to flush all packets in qdisc queue after a while, as
this makes no sense to delay packets forever.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 10:04         ` Eric Dumazet
@ 2012-04-10 11:25           ` Michael S. Tsirkin
  2012-04-10 11:45             ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-10 11:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 12:04:19PM +0200, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 12:31 +0300, Michael S. Tsirkin wrote:
> 
> > True. Still this is the only interface we have for controlling
> > the internal queue length so it seems safe to assume someone
> > is using it for this purpose.
> > 
> 
> So to workaround a problem in tun, you want to hack net/core/dev.c :(

Sorry about being unclear, I'm just saying that your patch assumes
tx_queue_len == 0 since you set it that way at device init but we can't
rely on this as existing users might have changed that value.
One way to fix would be a patch at the bottom: then we
can leave tun to treat tx_queue_len like it always did.

> Packets in qdisc should not be orphaned.
>
> If you think about it, why do we attach skb to socket in the first
> place ?

Good point. The answer is to avoid skb drops for local sockets by
stopping them, right?

> If its not needed for tun, why should it be needed for other devices ?

Maybe needed but it's already broken for tun since the skbs in the
private queue are orphaned by skb_orphan_try?

> If TUN has a problem being stopped forever, maybe it should take
> appropriate action to flush all packets in qdisc queue after a while, as
> this makes no sense to delay packets forever.
> 

Well arbitrary timers aren't a solid protection, right?
We get out of stalling transmitters forever but a bad VM can still
degrade performance significantly for others...


----

We don't want a queue for tun since it can stall forever, but userspace
might tweak it's tx_queue_len as a way to control RX queue depth,
and we don't want to break userspace. Use a private flag to disable queue.

Warning: untested.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 27883d1..644ca53 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -695,7 +692,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
 {
 	struct Qdisc *qdisc = &noqueue_qdisc;
 
-	if (dev->tx_queue_len) {
+	if (dev->tx_queue_len && !(dev->priv_flags & IFF_TX_CAN_STALL)) {
 		qdisc = qdisc_create_dflt(dev_queue,
 					  &pfifo_fast_ops, TC_H_ROOT);
 		if (!qdisc) {
-- 
MST

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 11:25           ` Michael S. Tsirkin
@ 2012-04-10 11:45             ` Eric Dumazet
  2012-04-10 12:41               ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2012-04-10 11:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, 2012-04-10 at 14:25 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 10, 2012 at 12:04:19PM +0200, Eric Dumazet wrote:
> > On Tue, 2012-04-10 at 12:31 +0300, Michael S. Tsirkin wrote:
> > 
> > > True. Still this is the only interface we have for controlling
> > > the internal queue length so it seems safe to assume someone
> > > is using it for this purpose.
> > > 
> > 
> > So to workaround a problem in tun, you want to hack net/core/dev.c :(
> 
> Sorry about being unclear, I'm just saying that your patch assumes
> tx_queue_len == 0 since you set it that way at device init but we can't
> rely on this as existing users might have changed that value.
> One way to fix would be a patch at the bottom: then we
> can leave tun to treat tx_queue_len like it always did.


> ----
> 
> We don't want a queue for tun since it can stall forever, but userspace
> might tweak it's tx_queue_len as a way to control RX queue depth,
> and we don't want to break userspace. Use a private flag to disable queue.
> 
> Warning: untested.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 27883d1..644ca53 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -695,7 +692,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
>  {
>  	struct Qdisc *qdisc = &noqueue_qdisc;
>  
> -	if (dev->tx_queue_len) {
> +	if (dev->tx_queue_len && !(dev->priv_flags & IFF_TX_CAN_STALL)) {
>  		qdisc = qdisc_create_dflt(dev_queue,
>  					  &pfifo_fast_ops, TC_H_ROOT);
>  		if (!qdisc) {


Thing is this function is called before userspace can tweak tx_queue_len

So if you create a vlan device (this sets tx_queue_len to 0), no qdisc
is attached.

If later userspace changes tx_queue_len to this device, qdisc wont
automatically be created/attached.

Really, tx_queue_len is private to net/sched layer, it should not be
used by tun device to control a receive queue limit.

Please try to not hack net/sched or net/core for your needs.

Its not because tun abused tx_queue_len in the past we must keep this
hack forever.

In ethernet drivers, TX ring size is controlled by ethtool -g

Why tun driver would use another way ?




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 11:45             ` Eric Dumazet
@ 2012-04-10 12:41               ` Michael S. Tsirkin
  2012-04-10 13:52                 ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-10 12:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 01:45:00PM +0200, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 14:25 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 10, 2012 at 12:04:19PM +0200, Eric Dumazet wrote:
> > > On Tue, 2012-04-10 at 12:31 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > True. Still this is the only interface we have for controlling
> > > > the internal queue length so it seems safe to assume someone
> > > > is using it for this purpose.
> > > > 
> > > 
> > > So to workaround a problem in tun, you want to hack net/core/dev.c :(
> > 
> > Sorry about being unclear, I'm just saying that your patch assumes
> > tx_queue_len == 0 since you set it that way at device init but we can't
> > rely on this as existing users might have changed that value.
> > One way to fix would be a patch at the bottom: then we
> > can leave tun to treat tx_queue_len like it always did.
> 
> 
> > ----
> > 
> > We don't want a queue for tun since it can stall forever, but userspace
> > might tweak it's tx_queue_len as a way to control RX queue depth,
> > and we don't want to break userspace. Use a private flag to disable queue.
> > 
> > Warning: untested.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 27883d1..644ca53 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -695,7 +692,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
> >  {
> >  	struct Qdisc *qdisc = &noqueue_qdisc;
> >  
> > -	if (dev->tx_queue_len) {
> > +	if (dev->tx_queue_len && !(dev->priv_flags & IFF_TX_CAN_STALL)) {
> >  		qdisc = qdisc_create_dflt(dev_queue,
> >  					  &pfifo_fast_ops, TC_H_ROOT);
> >  		if (!qdisc) {
> 
> 
> Thing is this function is called before userspace can tweak tx_queue_len
> 
> So if you create a vlan device (this sets tx_queue_len to 0), no qdisc
> is attached.
>
> If later userspace changes tx_queue_len to this device, qdisc wont
> automatically be created/attached.

True. But there's another place where this can happen - after
dev_change_net_namespace, no?
This calls dev_shutdown.

> Really, tx_queue_len is private to net/sched layer, it should not be
> used by tun device to control a receive queue limit.
> 
> Please try to not hack net/sched or net/core for your needs.
> 
> Its not because tun abused tx_queue_len in the past we must keep this
> hack forever.
> 
> In ethernet drivers, TX ring size is controlled by ethtool -g
> 
> Why tun driver would use another way ?
> 

I think it's a bad interface too but it's in a userspace ABI
now so I suspect we are stuck with it for now. We can try deprecating
but we can't just drop it.

-- 
MST

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 12:41               ` Michael S. Tsirkin
@ 2012-04-10 13:52                 ` Eric Dumazet
  2012-04-10 14:10                   ` Michael S. Tsirkin
                                     ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Eric Dumazet @ 2012-04-10 13:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, 2012-04-10 at 15:41 +0300, Michael S. Tsirkin wrote:

> I think it's a bad interface too but it's in a userspace ABI
> now so I suspect we are stuck with it for now. We can try deprecating
> but we can't just drop it.
> 

By the way, skb orphaning should already be done in skb_orphan_try(),
not sure why its done again in tun_net_xmit(). Note we perform orphaning
right before giving skb to device on premise it'll be sent (and freed)
in a reasonable amount of time.


With following patch, no more qdisc on top of tun device, yet user can
change the limit (I would be curious to know if anybody changes tun
txqueuelen and why)




diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..c4a00cf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -379,6 +379,7 @@ static int tun_net_close(struct net_device *dev)
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct tun_struct *tun = netdev_priv(dev);
+	int limit;
 
 	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
 
@@ -396,7 +397,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	    sk_filter(tun->socket.sk, skb))
 		goto drop;
 
-	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
+	limit = dev->tx_queue_len ? : TUN_READQ_SIZE;
+	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= limit) {
 		if (!(tun->flags & TUN_ONE_QUEUE)) {
 			/* Normal queueing mode. */
 			/* Packet scheduler handles dropping of further packets. */
@@ -521,7 +523,7 @@ static void tun_net_init(struct net_device *dev)
 		/* Zero header length */
 		dev->type = ARPHRD_NONE;
 		dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
-		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
+		dev->tx_queue_len = 0;
 		break;
 
 	case TUN_TAP_DEV:
@@ -532,7 +534,7 @@ static void tun_net_init(struct net_device *dev)
 
 		eth_hw_addr_random(dev);
 
-		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
+		dev->tx_queue_len = 0;
 		break;
 	}
 }



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 13:52                 ` Eric Dumazet
@ 2012-04-10 14:10                   ` Michael S. Tsirkin
  2012-04-11 21:52                   ` Michael S. Tsirkin
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-10 14:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 03:52:09PM +0200, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 15:41 +0300, Michael S. Tsirkin wrote:
> 
> > I think it's a bad interface too but it's in a userspace ABI
> > now so I suspect we are stuck with it for now. We can try deprecating
> > but we can't just drop it.
> > 
> 
> By the way, skb orphaning should already be done in skb_orphan_try(),
> not sure why its done again in tun_net_xmit().

I think it's for when some tx flags are set.

> Note we perform orphaning right before giving skb to device on premise
> it'll be sent (and freed) in a reasonable amount of time.

Right. No way to ensure this with tun :(

> With following patch, no more qdisc on top of tun device,

I'll test this, thanks.

> yet user can
> change the limit

Question: if tun is moved between namespaces after tx queue len
is changed, will it get a queueing qdisc then?
If yes we need some other hack to address that ...

> (I would be curious to know if anybody changes tun
> txqueuelen and why)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..c4a00cf 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -379,6 +379,7 @@ static int tun_net_close(struct net_device *dev)
>  static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct tun_struct *tun = netdev_priv(dev);
> +	int limit;
>  
>  	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
>  
> @@ -396,7 +397,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	    sk_filter(tun->socket.sk, skb))
>  		goto drop;
>  
> -	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
> +	limit = dev->tx_queue_len ? : TUN_READQ_SIZE;
> +	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= limit) {
>  		if (!(tun->flags & TUN_ONE_QUEUE)) {
>  			/* Normal queueing mode. */
>  			/* Packet scheduler handles dropping of further packets. */
> @@ -521,7 +523,7 @@ static void tun_net_init(struct net_device *dev)
>  		/* Zero header length */
>  		dev->type = ARPHRD_NONE;
>  		dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
> -		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
> +		dev->tx_queue_len = 0;
>  		break;
>  
>  	case TUN_TAP_DEV:
> @@ -532,7 +534,7 @@ static void tun_net_init(struct net_device *dev)
>  
>  		eth_hw_addr_random(dev);
>  
> -		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
> +		dev->tx_queue_len = 0;
>  		break;
>  	}
>  }
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 13:52                 ` Eric Dumazet
  2012-04-10 14:10                   ` Michael S. Tsirkin
@ 2012-04-11 21:52                   ` Michael S. Tsirkin
  2012-05-08 19:35                   ` Michael S. Tsirkin
  2012-05-08 19:50                   ` Michael S. Tsirkin
  3 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-04-11 21:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 03:52:09PM +0200, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 15:41 +0300, Michael S. Tsirkin wrote:
> 
> > I think it's a bad interface too but it's in a userspace ABI
> > now so I suspect we are stuck with it for now. We can try deprecating
> > but we can't just drop it.
> > 
> 
> By the way, skb orphaning should already be done in skb_orphan_try(),
> not sure why its done again in tun_net_xmit(). Note we perform orphaning
> right before giving skb to device on premise it'll be sent (and freed)
> in a reasonable amount of time.
> 
> 
> With following patch, no more qdisc on top of tun device, yet user can
> change the limit (I would be curious to know if anybody changes tun
> txqueuelen and why)

Following would makes the default visible with SIOCGIFTXQLEN
same as it was + avoid branch on data path.
Didn't have time to test yet - it should achieve
same purpose, shouldn't it?
Or do I misunderstand how it's designed to work?

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..418abd1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -521,7 +521,6 @@ static void tun_net_init(struct net_device *dev)
 		/* Zero header length */
 		dev->type = ARPHRD_NONE;
 		dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
-		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
 		break;
 
 	case TUN_TAP_DEV:
@@ -531,10 +530,9 @@ static void tun_net_init(struct net_device *dev)
 		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 		eth_hw_addr_random(dev);
-
-		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
 		break;
 	}
+	dev->tx_queue_len = 0;  /* Disable qdisc queueing */
 }
 
 /* Character device part */
@@ -1143,6 +1141,12 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (err < 0)
 			goto err_free_sk;
 
+		/*
+		 * We (ab)use queue length to limit our amount of buffering.
+		 * Set our own default queue length.
+		 */
+		dev->tx_queue_len = TUN_READQ_SIZE;
+
 		if (device_create_file(&tun->dev->dev, &dev_attr_tun_flags) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_owner) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_group))

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 13:52                 ` Eric Dumazet
  2012-04-10 14:10                   ` Michael S. Tsirkin
  2012-04-11 21:52                   ` Michael S. Tsirkin
@ 2012-05-08 19:35                   ` Michael S. Tsirkin
  2012-05-08 19:50                   ` Michael S. Tsirkin
  3 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-05-08 19:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 03:52:09PM +0200, Eric Dumazet wrote:
> By the way, skb orphaning should already be done in skb_orphan_try(),
> not sure why its done again in tun_net_xmit().

Because we need to force it for skbs which have tx_flags set.

> Note we perform orphaning
> right before giving skb to device on premise it'll be sent (and freed)
> in a reasonable amount of time.

It is usually the case. I think this assumption holds for
tun normally.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] net: orphan queued skbs if device tx can stall
  2012-04-10 13:52                 ` Eric Dumazet
                                     ` (2 preceding siblings ...)
  2012-05-08 19:35                   ` Michael S. Tsirkin
@ 2012-05-08 19:50                   ` Michael S. Tsirkin
  3 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2012-05-08 19:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu

On Tue, Apr 10, 2012 at 03:52:09PM +0200, Eric Dumazet wrote:
> With following patch, no more qdisc on top of tun device,

I'm not sure killing qdisc is a right direction.
I think tun queue is currently oversized but I
also think we would benefit from a smart queue
management. As it is if tun user gets delayed,
stale packets accumulate in the queue and there's
no way to get rid of them without ploughing through
the backlog.

In a sense we don't really need the queue in tun at all,
the only reason we have it at all, is to make code simpler.
We could thinkably move skbs from qdisc directly to userspace.

-- 
MST

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-05-08 19:50 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-08 17:13 [PATCH] net: orphan queued skbs if device tx can stall Michael S. Tsirkin
2012-04-08 23:49 ` Herbert Xu
2012-04-08 23:49   ` Herbert Xu
2012-04-09  7:28   ` Michael S. Tsirkin
2012-04-09  7:33     ` Herbert Xu
2012-04-09  7:33       ` Herbert Xu
2012-04-09  7:39       ` Michael S. Tsirkin
2012-04-09  8:29         ` Herbert Xu
2012-04-09  8:29           ` Herbert Xu
2012-04-09  8:34           ` Michael S. Tsirkin
2012-04-09  8:39             ` Herbert Xu
2012-04-09  8:39               ` Herbert Xu
2012-04-09  8:42               ` Michael S. Tsirkin
2012-04-09  9:13                 ` Eric Dumazet
2012-04-10  7:55 ` Eric Dumazet
2012-04-10  8:41   ` Michael S. Tsirkin
2012-04-10  8:55     ` Eric Dumazet
2012-04-10  9:31       ` Michael S. Tsirkin
2012-04-10 10:04         ` Eric Dumazet
2012-04-10 11:25           ` Michael S. Tsirkin
2012-04-10 11:45             ` Eric Dumazet
2012-04-10 12:41               ` Michael S. Tsirkin
2012-04-10 13:52                 ` Eric Dumazet
2012-04-10 14:10                   ` Michael S. Tsirkin
2012-04-11 21:52                   ` Michael S. Tsirkin
2012-05-08 19:35                   ` Michael S. Tsirkin
2012-05-08 19:50                   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.