All of lore.kernel.org
 help / color / mirror / Atom feed
From: George Dunlap <George.Dunlap@eu.citrix.com>
To: David Laight <David.Laight@aculab.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Jonathan Davies <Jonathan.Davies@citrix.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	netdev <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Paul Durrant <paul.durrant@citrix.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	Felipe Franciosi <felipe.franciosi@citrix.com>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen
Date: Thu, 16 Apr 2015 11:57:24 +0100	[thread overview]
Message-ID: <CAFLBxZYn1t8huXrLyXPhRO=B4fthsQy9J+3bVTU5-n9AsRS43w@mail.gmail.com> (raw)
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1CB1F43C@AcuExch.aculab.com>

On Thu, Apr 16, 2015 at 10:22 AM, David Laight <David.Laight@aculab.com> wrote:
> ISTM that you are changing the wrong knob.
> You need to change something that affects the global amount of pending tx data,
> not the amount that can be buffered by a single connection.

Well it seems like the problem is that the global amount of pending tx
data is high enough, but that the per-stream amount is too low for
only a single stream.

> If you change tcp_limit_output_bytes and then have 1000 connections trying
> to send data you'll suffer 'bufferbloat'.

Right -- so are you worried about the buffers in the local device
here, or are you worried about buffers elsewhere in the network?

If you're worried about buffers on the local device, don't you have a
similar problem for physical NICs?  i.e., if a NIC has a big buffer
that you're trying to keep mostly empty, limiting a single TCP stream
may keep that buffer empty, but if you have 1000 connections,
1000*limit will still fill up the buffer.

Or am I missing something?

> If you call skb_orphan() in the tx setup path then the total number of
> buffers is limited, but a single connection can (and will) will the tx
> ring leading to incorrect RTT calculations and additional latency for
> other connections.
> This will give high single connection throughput but isn't ideal.
>
> One possibility might be to call skb_orphan() when enough time has
> elapsed since the packet was queued for transmit that it is very likely
> to have actually been transmitted - even though 'transmit done' has
> not yet been signalled.
> Not at all sure how this would fit in though...

Right -- so it sounds like the problem with skb_orphan() is making
sure that the tx ring is shared properly between different streams.
That would mean that ideally we wouldn't call it until the tx ring
actually had space to add more packets onto it.

The Xen project is having a sort of developer meeting in a few weeks;
if we can get a good picture of all the constraints, maybe we can hash
out a solution that works for everyone.

 -George

WARNING: multiple messages have this Message-ID (diff)
From: George.Dunlap@eu.citrix.com (George Dunlap)
To: linux-arm-kernel@lists.infradead.org
Subject: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen
Date: Thu, 16 Apr 2015 11:57:24 +0100	[thread overview]
Message-ID: <CAFLBxZYn1t8huXrLyXPhRO=B4fthsQy9J+3bVTU5-n9AsRS43w@mail.gmail.com> (raw)
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1CB1F43C@AcuExch.aculab.com>

On Thu, Apr 16, 2015 at 10:22 AM, David Laight <David.Laight@aculab.com> wrote:
> ISTM that you are changing the wrong knob.
> You need to change something that affects the global amount of pending tx data,
> not the amount that can be buffered by a single connection.

Well it seems like the problem is that the global amount of pending tx
data is high enough, but that the per-stream amount is too low for
only a single stream.

> If you change tcp_limit_output_bytes and then have 1000 connections trying
> to send data you'll suffer 'bufferbloat'.

Right -- so are you worried about the buffers in the local device
here, or are you worried about buffers elsewhere in the network?

If you're worried about buffers on the local device, don't you have a
similar problem for physical NICs?  i.e., if a NIC has a big buffer
that you're trying to keep mostly empty, limiting a single TCP stream
may keep that buffer empty, but if you have 1000 connections,
1000*limit will still fill up the buffer.

Or am I missing something?

> If you call skb_orphan() in the tx setup path then the total number of
> buffers is limited, but a single connection can (and will) will the tx
> ring leading to incorrect RTT calculations and additional latency for
> other connections.
> This will give high single connection throughput but isn't ideal.
>
> One possibility might be to call skb_orphan() when enough time has
> elapsed since the packet was queued for transmit that it is very likely
> to have actually been transmitted - even though 'transmit done' has
> not yet been signalled.
> Not at all sure how this would fit in though...

Right -- so it sounds like the problem with skb_orphan() is making
sure that the tx ring is shared properly between different streams.
That would mean that ideally we wouldn't call it until the tx ring
actually had space to add more packets onto it.

The Xen project is having a sort of developer meeting in a few weeks;
if we can get a good picture of all the constraints, maybe we can hash
out a solution that works for everyone.

 -George

  reply	other threads:[~2015-04-16 10:57 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-09 15:46 "tcp: refine TSO autosizing" causes performance regression on Xen Stefano Stabellini
2015-04-09 15:46 ` Stefano Stabellini
2015-04-09 15:46 ` Stefano Stabellini
2015-04-09 16:16 ` Eric Dumazet
2015-04-09 16:16   ` Eric Dumazet
2015-04-09 16:36   ` Stefano Stabellini
2015-04-09 16:36     ` Stefano Stabellini
2015-04-09 16:36     ` Stefano Stabellini
2015-04-09 17:07     ` Eric Dumazet
2015-04-09 17:07       ` Eric Dumazet
2015-04-13 10:56     ` [Xen-devel] " George Dunlap
2015-04-13 10:56       ` George Dunlap
2015-04-13 13:38       ` Jonathan Davies
2015-04-13 13:38         ` Jonathan Davies
2015-04-13 13:38         ` Jonathan Davies
2015-04-13 13:49       ` Eric Dumazet
2015-04-13 13:49         ` Eric Dumazet
2015-04-15 13:43         ` George Dunlap
2015-04-15 13:43           ` George Dunlap
2015-04-15 16:38           ` Eric Dumazet
2015-04-15 16:38             ` Eric Dumazet
2015-04-15 16:38             ` Eric Dumazet
2015-04-15 17:23             ` George Dunlap
2015-04-15 17:23               ` George Dunlap
2015-04-15 17:23               ` George Dunlap
2015-04-15 17:29               ` Eric Dumazet
2015-04-15 17:29                 ` Eric Dumazet
2015-04-15 17:41                 ` George Dunlap
2015-04-15 17:41                   ` George Dunlap
2015-04-15 17:41                   ` George Dunlap
2015-04-15 17:52                   ` Eric Dumazet
2015-04-15 17:52                     ` Eric Dumazet
2015-04-15 17:55                     ` Rick Jones
2015-04-15 17:55                       ` Rick Jones
2015-04-15 18:08                       ` Eric Dumazet
2015-04-15 18:08                         ` Eric Dumazet
2015-04-15 18:19                         ` Rick Jones
2015-04-15 18:19                           ` Rick Jones
2015-04-15 18:32                           ` Eric Dumazet
2015-04-15 18:32                             ` Eric Dumazet
2015-04-15 18:32                             ` Eric Dumazet
2015-04-15 20:08                             ` [Xen-devel] " Rick Jones
2015-04-15 20:08                               ` Rick Jones
2015-04-15 20:08                               ` Rick Jones
2015-04-15 18:04                     ` George Dunlap
2015-04-15 18:04                       ` George Dunlap
2015-04-15 18:04                       ` George Dunlap
2015-04-15 18:19                       ` Eric Dumazet
2015-04-15 18:19                         ` Eric Dumazet
2015-04-16  8:56                         ` George Dunlap
2015-04-16  8:56                           ` George Dunlap
2015-04-16  8:56                           ` George Dunlap
2015-04-16  9:20                           ` Daniel Borkmann
2015-04-16  9:20                             ` Daniel Borkmann
2015-04-16  9:20                             ` Daniel Borkmann
2015-04-16 10:01                             ` George Dunlap
2015-04-16 10:01                               ` George Dunlap
2015-04-16 10:01                               ` George Dunlap
2015-04-16 12:42                               ` Eric Dumazet
2015-04-16 12:42                                 ` Eric Dumazet
2015-04-20 11:03                                 ` George Dunlap
2015-04-20 11:03                                   ` George Dunlap
2015-06-02  9:52                                 ` Wei Liu
2015-06-02  9:52                                   ` Wei Liu
2015-06-02  9:52                                   ` Wei Liu
2015-06-02 16:16                                   ` Eric Dumazet
2015-06-02 16:16                                     ` Eric Dumazet
2015-04-16  9:22                           ` David Laight
2015-04-16  9:22                             ` David Laight
2015-04-16  9:22                             ` David Laight
2015-04-16 10:57                             ` George Dunlap [this message]
2015-04-16 10:57                               ` George Dunlap
2015-04-16 10:57                               ` George Dunlap
2015-04-15 17:41               ` Eric Dumazet
2015-04-15 17:41                 ` Eric Dumazet
2015-04-15 17:58                 ` Stefano Stabellini
2015-04-15 17:58                   ` Stefano Stabellini
2015-04-15 17:58                   ` Stefano Stabellini
2015-04-15 18:17                   ` Eric Dumazet
2015-04-15 18:17                     ` Eric Dumazet
2015-04-16  4:20                     ` Herbert Xu
2015-04-16  4:20                       ` Herbert Xu
2015-04-16  4:30                       ` Eric Dumazet
2015-04-16  4:30                         ` Eric Dumazet
2015-04-16 11:39                     ` George Dunlap
2015-04-16 11:39                       ` George Dunlap
2015-04-16 11:39                       ` George Dunlap
2015-04-16 12:16                       ` Eric Dumazet
2015-04-16 12:16                         ` Eric Dumazet
2015-04-16 13:00                       ` Tim Deegan
2015-04-16 13:00                         ` Tim Deegan
2015-04-16 13:00                         ` Tim Deegan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFLBxZYn1t8huXrLyXPhRO=B4fthsQy9J+3bVTU5-n9AsRS43w@mail.gmail.com' \
    --to=george.dunlap@eu.citrix.com \
    --cc=David.Laight@aculab.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Jonathan.Davies@citrix.com \
    --cc=christoffer.dall@linaro.org \
    --cc=david.vrabel@citrix.com \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=felipe.franciosi@citrix.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paul.durrant@citrix.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.