All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neal Cardwell <ncardwell@google.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Yuchung Cheng <ycheng@google.com>,
	Nandita Dukkipati <nanditad@google.com>,
	Van Jacobson <vanj@google.com>
Subject: Re: [PATCH v3 net-next] tcp: refine TSO splits
Date: Fri, 13 Dec 2013 20:59:41 -0500	[thread overview]
Message-ID: <CADVnQy=Am3eOuZcGf3swzWRMa-GfutyCA_8V-K0of0booZwzhg@mail.gmail.com> (raw)
In-Reply-To: <1386971483.19078.188.camel@edumazet-glaptop2.roam.corp.google.com>

On Fri, Dec 13, 2013 at 4:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While investigating performance problems on small RPC workloads,
> I noticed linux TCP stack was always splitting the last TSO skb
> into two parts (skbs). One being a multiple of MSS, and a small one
> with the Push flag. This split is done even if TCP_NODELAY is set,
> or if no small packet is in flight.
>
> Example with request/response of 4K/4K
>
> IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
> IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
> IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
> IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
> IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
> IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
> IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
> IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
> IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
>
> We can avoid this split by including the Nagle tests at the right place.
>
> Note : If some NIC had trouble sending TSO packets with a partial
> last segment, we would have hit the problem in GRO/forwarding workload already.
>
> tcp_minshall_update() is moved to tcp_output.c and is updated as we might
> feed a TSO packet with a partial last segment.
>
> This patch tremendously improves performance, as the traffic now looks
> like :
>
> IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
> IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
> IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
> IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
> IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
> IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
> IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
> IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
> IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
> IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
> IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
> IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
>
>
> Before :
>
> lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
> 280774
>
>  Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
>
>      205719.049006 task-clock                #    9.278 CPUs utilized
>          8,449,968 context-switches          #    0.041 M/sec
>          1,935,997 CPU-migrations            #    0.009 M/sec
>            160,541 page-faults               #    0.780 K/sec
>    548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
>    455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
>    272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
>    166,091,460,030 instructions              #    0.30  insns per cycle
>                                              #    2.74  stalled cycles per insn [83.39%]
>     29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
>      1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]
>
>       22.173517844 seconds time elapsed
>
> lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
> IpOutRequests                   16851063           0.0
> IpExtOutOctets                  23878580777        0.0
>
> After patch :
>
> lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
> 280877
>
>  Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
>
>      107496.071918 task-clock                #    4.847 CPUs utilized
>          5,635,458 context-switches          #    0.052 M/sec
>          1,374,707 CPU-migrations            #    0.013 M/sec
>            160,920 page-faults               #    0.001 M/sec
>    281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
>    228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
>    142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
>     95,227,712,566 instructions              #    0.34  insns per cycle
>                                              #    2.40  stalled cycles per insn [83.43%]
>     16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
>        874,252,952 branch-misses             #    5.39% of all branches         [83.37%]
>
>       22.175821286 seconds time elapsed
>
> lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
> IpOutRequests                   11239428           0.0
> IpExtOutOctets                  23595191035        0.0
>
> Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
> 2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
> scheduler.
>
> Many thanks to Neal for review and ideas.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Cc: Van Jacobson <vanj@google.com>
> ---
> v3: Add the Nagle tests to make sure we can send the whole skb
>     according to Nagle rules. Factorize the code.
>
> v2: changed tcp_minshall_update() as Neal pointed out.
>
>  include/net/tcp.h     |    7 ----
>  net/ipv4/tcp_output.c |   65 +++++++++++++++++++++++++---------------
>  2 files changed, 42 insertions(+), 30 deletions(-)

I love it! Thanks, Eric.

Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>


neal

  reply	other threads:[~2013-12-14  1:59 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-12 19:28 [PATCH net-next] tcp: remove a bogus TSO split Eric Dumazet
2013-12-13 14:15 ` Neal Cardwell
2013-12-13 14:54   ` Eric Dumazet
2013-12-13 16:22     ` Neal Cardwell
2013-12-13 17:54       ` Eric Dumazet
2013-12-13 16:58   ` David Laight
2013-12-13 17:56     ` Eric Dumazet
2013-12-13 18:13 ` [PATCH v2 " Eric Dumazet
2013-12-13 18:17   ` Neal Cardwell
2013-12-13 18:38     ` Eric Dumazet
2013-12-13 21:51   ` [PATCH v3 net-next] tcp: refine TSO splits Eric Dumazet
2013-12-14  1:59     ` Neal Cardwell [this message]
2013-12-14  2:05       ` Eric Dumazet
2013-12-17 20:15         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADVnQy=Am3eOuZcGf3swzWRMa-GfutyCA_8V-K0of0booZwzhg@mail.gmail.com' \
    --to=ncardwell@google.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=nanditad@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=vanj@google.com \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.