From: Willy Tarreau <w@1wt.eu>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Subject: Stable regression with 'tcp: allow splice() to build full TSO packets'
Date: Thu, 17 May 2012 14:18:00 +0200
Message-ID: <20120517121800.GA18052@1wt.eu>
Hi Eric,
I'm facing a regression in stable 3.2.17 and 3.0.31 which is
exhibited by your patch 'tcp: allow splice() to build full TSO
packets' which unfortunately I am very interested in !
What I'm observing is that TCP transmits using splice() stall
quite quickly if I'm using pipes larger than 64kB (even 65537
is enough to reliably observe the stall).
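For readers who want to reproduce the pipe sizing, here is a minimal sketch of resizing a pipe beyond the 64kB default. The helper name set_pipe_size() is hypothetical and is not taken from haproxy. The kernel may round the request up, so asking for 65537 bytes is expected to yield a larger capacity (typically 128kB with 4kB pages).

```c
#define _GNU_SOURCE          /* needed for F_SETPIPE_SZ / F_GETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from haproxy): ask the kernel to resize the
 * pipe behind `fd` to at least `want` bytes.
 * Returns the capacity the kernel reports afterwards, or -1 with errno set.
 */
static int set_pipe_size(int fd, int want)
{
    assert(want > 0);
    if (fcntl(fd, F_SETPIPE_SZ, want) == -1)
        return -1;
    /* Read back the real capacity, which may exceed the request. */
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Calling set_pipe_size(pipefd[1], 65537) on a fresh pipe should be enough to get a pipe larger than 64kB, which is the condition described above.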
I'm seeing this on haproxy running on a small ARM machine (a
dockstar), which exchanges data through a gig switch with my
development PC. The NIC (mv643xx) doesn't support TSO but has
GSO enabled. If I disable GSO, the problem remains. I can however
make the problem disappear by disabling SG or Tx checksumming.
BTW, using recv/send() instead of splice() also gets rid of the
problem.
I can also reduce the risk of seeing the problem by increasing
the default TCP buffer sizes in tcp_wmem. By default I'm running
at 16kB, but if I increase the output buffer size above the pipe
size, the problem *seems* to disappear though I can't be certain,
since larger buffers generally mean the problem takes longer to
appear, probably due to the fact that the buffers don't need to
be filled. Still I'm certain that with 64k TCP buffers and 128k
pipes I'm still seeing it.
With strace, I'm seeing data fill up the pipe with the splice()
call responsible for pushing the data to the output socket returning
-1 EAGAIN. During this time, the client receives no data.
Something bugs me, I have tested with a dummy server of mine,
httpterm, which uses tee+splice() to push data outside, and it
has no problem filling the gig pipe, and correctly recovers
from the EAGAIN :
send(13, "HTTP/1.1 200\r\nConnection: close\r"..., 160, MSG_DONTWAIT|MSG_NOSIGNAL) = 160
tee(0x3, 0x6, 0x10000, 0x2) = 42552
splice(0x5, 0, 0xd, 0, 0xa00000, 0x2) = 14440
tee(0x3, 0x6, 0x10000, 0x2) = 13880
splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2) = -1 EAGAIN (Resource temporarily unavailable)
...
tee(0x3, 0x6, 0x10000, 0x2) = 13880
splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2) = 51100
tee(0x3, 0x6, 0x10000, 0x2) = 50744
splice(0x5, 0, 0xd, 0, 0x9efffc, 0x2) = 32120
tee(0x3, 0x6, 0x10000, 0x2) = 30264
splice(0x5, 0, 0xd, 0, 0x9e8284, 0x2) = -1 EAGAIN (Resource temporarily unavailable)
etc...
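The tee()+splice() pattern in the trace above can be sketched roughly as follows. This is a simplified illustration, not httpterm's code: it assumes the response data sits in a "master" pipe that is duplicated with tee() into a temporary pipe, which is then spliced to the socket. The function name and parameters are hypothetical.

```c
#define _GNU_SOURCE          /* needed for tee() and splice() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical sketch: duplicate up to `max` bytes from the master pipe
 * into a temporary pipe without consuming them, then push the copy to
 * out_fd. Returns the number of bytes sent, 0 if the master pipe has no
 * writer and no data, or -1 with errno set (EAGAIN means "retry later").
 */
static ssize_t tee_and_send(int master_rd, int tmp_rd, int tmp_wr,
                            int out_fd, size_t max)
{
    ssize_t n;

    assert(max > 0);
    /* SPLICE_F_NONBLOCK is the 0x2 flag seen in the trace. */
    n = tee(master_rd, tmp_wr, max, SPLICE_F_NONBLOCK);
    if (n <= 0)
        return n;
    /* May send fewer than n bytes; the rest stays in the temporary pipe. */
    return splice(tmp_rd, NULL, out_fd, NULL, (size_t)n, SPLICE_F_NONBLOCK);
}
```

Because tee() does not consume the master pipe, the same data can be sent again on the next call, which matches the repeated tee() calls in the trace.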
It's only with haproxy which uses splice() to copy data between
two sockets that I'm getting the issue (data forwarded from fd 0xe
to fd 0x6) :
16:03:17.797144 pipe([36, 37]) = 0
16:03:17.797318 fcntl64(36, 0x407 /* F_??? */, 0x20000) = 131072 ## note: fcntl(F_SETPIPE_SZ, 128k)
16:03:17.797473 splice(0xe, 0, 0x25, 0, 0x9f2234, 0x3) = 10220
16:03:17.797706 splice(0x24, 0, 0x6, 0, 0x27ec, 0x3) = 10220
16:03:17.802036 gettimeofday({1324652597, 802093}, NULL) = 0
16:03:17.802200 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 7
16:03:17.802363 gettimeofday({1324652597, 802419}, NULL) = 0
16:03:17.802530 splice(0xe, 0, 0x25, 0, 0x9efa48, 0x3) = 16060
16:03:17.802789 splice(0x24, 0, 0x6, 0, 0x3ebc, 0x3) = 16060
16:03:17.806593 gettimeofday({1324652597, 806651}, NULL) = 0
16:03:17.806759 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 4
16:03:17.806919 gettimeofday({1324652597, 806974}, NULL) = 0
16:03:17.807087 splice(0xe, 0, 0x25, 0, 0x9ebb8c, 0x3) = 17520
16:03:17.807356 splice(0x24, 0, 0x6, 0, 0x4470, 0x3) = 17520
16:03:17.809565 gettimeofday({1324652597, 809620}, NULL) = 0
16:03:17.809726 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
16:03:17.809883 gettimeofday({1324652597, 809937}, NULL) = 0
16:03:17.810047 splice(0xe, 0, 0x25, 0, 0x9e771c, 0x3) = 36500
16:03:17.810399 splice(0x24, 0, 0x6, 0, 0x8e94, 0x3) = 23360
16:03:17.810629 epoll_ctl(0x3, 0x1, 0x6, 0x85378) = 0 ## note: epoll_ctl(ADD, fd=6, dir=OUT).
16:03:17.810792 gettimeofday({1324652597, 810848}, NULL) = 0
16:03:17.810954 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
16:03:17.811188 gettimeofday({1324652597, 811246}, NULL) = 0
16:03:17.811356 splice(0xe, 0, 0x25, 0, 0x9de888, 0x3) = 21900
16:03:17.811651 splice(0x24, 0, 0x6, 0, 0x88e0, 0x3) = -1 EAGAIN (Resource temporarily unavailable)
So output fd 6 hangs here and does not show up again until the
point below, where I pressed Ctrl-C to stop the test :
16:03:24.740985 gettimeofday({1324652604, 741042}, NULL) = 0
16:03:24.741148 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 7
16:03:24.951762 gettimeofday({1324652604, 951838}, NULL) = 0
16:03:24.951956 splice(0x24, 0, 0x6, 0, 0x88e0, 0x3) = -1 EPIPE (Broken pipe)
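The socket-to-pipe-to-socket forwarding seen in this trace can be sketched as below. This is a simplified illustration under stated assumptions, not haproxy's actual code: the struct and function names are hypothetical, and unlike the trace it drains the pipe completely before reading more input.

```c
#define _GNU_SOURCE          /* needed for splice() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical state for one forwarding direction. */
struct fwd {
    int pipe_rd;     /* read end of the intermediate pipe  */
    int pipe_wr;     /* write end of the intermediate pipe */
    size_t pending;  /* bytes in the pipe not yet sent     */
};

/*
 * One forwarding step from in_fd to out_fd through the pipe.
 * Returns the number of bytes delivered to out_fd, 0 on end of input,
 * or -1 with errno set. EAGAIN means the caller should wait for the
 * relevant fd (for example with epoll) and call again.
 */
static ssize_t fwd_step(struct fwd *f, int in_fd, int out_fd, size_t max)
{
    /* 0x3 in the trace: SPLICE_F_MOVE | SPLICE_F_NONBLOCK */
    const unsigned int flags = SPLICE_F_MOVE | SPLICE_F_NONBLOCK;
    ssize_t n;

    assert(max > 0);
    if (f->pending == 0) {
        n = splice(in_fd, NULL, f->pipe_wr, NULL, max, flags);
        if (n <= 0)
            return n;
        f->pending = (size_t)n;
    }
    /* A short write leaves data in the pipe; pending tracks it. */
    n = splice(f->pipe_rd, NULL, out_fd, NULL, f->pending, flags);
    if (n > 0)
        f->pending -= (size_t)n;
    return n;
}
```

In terms of this sketch, the stall reported here corresponds to the second splice() returning -1 with EAGAIN on every call while pending stays non-zero.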
I tried disabling LRO/GRO at the input interface (which happens to be
the same) to see if fragmentation of input data had any impact on this
but nothing changes.
Please note that I'm not even certain the patch is the culprit, I'm
suspecting that by improving splice() efficiency, it might make a
latent issue become more visible. I have no data to back this
feeling, but nothing strikes me in your patch.
I don't know what I can do to troubleshoot this issue. I don't want
to pollute the list with network captures nor strace outputs, but I
have them if you're interested in verifying a few things.
I have another platform available for a test (Atom+82574L supporting
TSO). I'll rebuild and boot on this one to see if I observe the same
behaviour.
If you have any suggestion about things to check or tweaks to change
in the code, I'm quite open to experiment.
Best regards,
Willy