From: "Bruno Prémont" <bonbons@sysophe.eu>
To: Eric Dumazet <edumazet@google.com>
Cc: richard.purdie@linuxfoundation.org,
	Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Alexander Kanavin <alex.kanavin@gmail.com>,
	Bruce Ashfield <bruce.ashfield@gmail.com>
Subject: Re: [PATCH net-next 2/3] tcp: implement coalescing on backlog queue
Date: Thu, 25 Apr 2019 09:55:31 +0200
Message-ID: <20190425095531.4fd8c53f@pluto.restena.lu>
In-Reply-To: <CANn89i+afn_7D+dpk1dd_pjN=OFT=_8xJrr8iybC+oORt2QUoA@mail.gmail.com>

Hi Eric,

On Wed, 24 Apr 2019 08:47:27 -0700 Eric Dumazet wrote:
> On Wed, Apr 24, 2019 at 7:51 AM Bruno Prémont wrote:
> >
> > Hi Eric,
> >
> > I'm seeing issues with this patch as well, not as regularly as Richard
> > does, but still frequently (up to about one in 30-50 TCP sessions).
> >
> > In my case I have a virtual machine (on VMware) with this patch where
> > NGINX, acting as a reverse proxy, misses part (the end) of the payload
> > from its upstream and times out on the upstream connection (while,
> > according to tcpdump, all packets including the upstream's FIN were sent
> > and the upstream did get ACKs from the VM).
> >
> > From what browsers get back from NGINX, it feels as if at some point
> > reading from the socket or waiting for data using select() stopped
> > returning data that had arrived, since more than just the EOF is missing.
> >
> > The upstream is a hardware machine in the same subnet.
> >
> > My VM is using a VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
> > as its network adapter, which lists the following features:
> >  
> 
> Hi Bruno.
> 
> I suspect an EPOLLIN notification is being lost by the application.
> 
> The fact that the TCP backlog contains 1 packet instead of 2+ should not
> change stack behavior; this packet should land in the socket receive queue
> eventually.
> 
> Are you using epoll() in edge-triggered mode? You mention select(), but
> select() is a rather old and inefficient API.

nginx is using epoll (cf. http://nginx.org/en/docs/events.html).

For the source, see
https://trac.nginx.org/nginx/browser/nginx/src/event/modules/ngx_epoll_module.c?rev=ebf8c9686b8ce7428f975d8a567935ea3722da70
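
For context: with epoll in edge-triggered mode the application only learns
about new data through EPOLLIN wakeups and has to drain the socket until
read() returns EAGAIN. The sketch below is not nginx's code, just a minimal
illustration of that pattern and of why a missed or merged wakeup would leave
data sitting unread in the receive queue:

/* Minimal edge-triggered epoll sketch (illustrative only, not nginx code). */
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

int watch(int epfd, int fd)
{
	struct epoll_event ev = {
		.events  = EPOLLIN | EPOLLRDHUP | EPOLLET,
		.data.fd = fd,
	};

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

int drain(int fd)
{
	char buf[4096];

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0)
			continue;	/* consume everything available */
		if (n == 0)
			return 0;	/* EOF: the peer's FIN reached us */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 1;	/* drained; wait for the next wakeup */
		return -1;		/* real error */
	}
}

Draining until EAGAIN is what makes edge-triggered mode safe even when several
arrivals are merged into a single wakeup; a notification that never fires at
all, as suspected above, is the one case this pattern cannot recover from.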

> Could you watch/report the output of "ss -temoi" for the frozen TCP flow?

Here it is, from three distinct reproduction attempts:
State      Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB      0       0       158.64.2.228:44248  158.64.2.217:webcache              uid:83 ino:13245 sk:87 <->
         skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) ts sack cubic wscale:7,7 rto:210 rtt:0.24/0.118 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:949 bytes_received:28381 segs_out:12 segs_in:12 data_segs_out:1 data_segs_in:10 send 482.7Mbps lastsnd:46810 lastrcv:46790 lastack:46790 pacing_rate 965.3Mbps delivery_rate 74.3Mbps app_limited rcv_rtt:1 rcv_space:14480 minrtt:0.156


ESTAB      0       0       2001:a18:1:6::228:33572  2001:a18:1:6::217:webcache  uid:83 ino:16699 sk:e1 <->
         skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) ts sack cubic wscale:7,7 rto:210 rtt:0.231/0.11 ato:40 mss:1428 rcvmss:1428 advmss:1428 cwnd:10 bytes_acked:948 bytes_received:28474 segs_out:12 segs_in:12 data_segs_out:1 data_segs_in:10 send 494.5Mbps lastsnd:8380 lastrcv:8360 lastack:8360 pacing_rate 989.1Mbps delivery_rate 71.0Mbps app_limited rcv_rtt:1.109 rcv_space:14280 minrtt:0.161


ESTAB      0       0       158.64.2.228:44578  158.64.2.217:webcache              uid:83 ino:17628 sk:12c <->
         skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) ts sack cubic wscale:7,7 rto:210 rtt:0.279/0.136 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:949 bytes_received:28481 segs_out:12 segs_in:12 data_segs_out:1 data_segs_in:10 send 415.2Mbps lastsnd:11360 lastrcv:11330 lastack:11340 pacing_rate 828.2Mbps delivery_rate 61.9Mbps app_limited rcv_rtt:1 rcv_space:14480 minrtt:0.187


From nginx debug logging I don't get a real clue, though it seems that for
working connections the last event obtained is 2005 (EPOLLMSG | EPOLLWRBAND |
EPOLLWRNORM | EPOLLRDBAND | EPOLLRDNORM | EPOLLHUP | EPOLLIN | EPOLLOUT), the
previous ones being 5, while for failing connections the last event seen is 5
(EPOLLIN | EPOLLOUT).
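
To double-check those numbers, a small throwaway helper along the following
lines can decode a mask into flag names. This is my own sketch, not anything
from nginx; it parses the value with strtoul() base 0, so it can be fed either
"0x2005" or "2005" depending on how the logged number is meant to be read:

/* Sketch: print the epoll flag names set in a given event mask. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>

int main(int argc, char **argv)
{
	static const struct { uint32_t bit; const char *name; } flags[] = {
		{ EPOLLIN, "EPOLLIN" },         { EPOLLPRI, "EPOLLPRI" },
		{ EPOLLOUT, "EPOLLOUT" },       { EPOLLRDNORM, "EPOLLRDNORM" },
		{ EPOLLRDBAND, "EPOLLRDBAND" }, { EPOLLWRNORM, "EPOLLWRNORM" },
		{ EPOLLWRBAND, "EPOLLWRBAND" }, { EPOLLMSG, "EPOLLMSG" },
		{ EPOLLERR, "EPOLLERR" },       { EPOLLHUP, "EPOLLHUP" },
		{ EPOLLRDHUP, "EPOLLRDHUP" },
	};
	uint32_t mask = argc > 1 ? (uint32_t)strtoul(argv[1], NULL, 0) : 0;
	size_t i;

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
		if (mask & flags[i].bit)
			printf("%s ", flags[i].name);
	printf("\n");
	return 0;
}

(Hypothetical usage, assuming the above is built as "epoll-decode":
"./epoll-decode 2005" versus "./epoll-decode 0x2005".)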

> This might give us a clue about packets being dropped, say if the
> accumulated packet became too big.


The following minor patch (which might be white-space mangled) does prevent
the issue from happening for me:

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4904250a9aac..c102cd367c79 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1667,7 +1667,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
        if (TCP_SKB_CB(tail)->end_seq != TCP_SKB_CB(skb)->seq ||
            TCP_SKB_CB(tail)->ip_dsfield != TCP_SKB_CB(skb)->ip_dsfield ||
            ((TCP_SKB_CB(tail)->tcp_flags |
-             TCP_SKB_CB(skb)->tcp_flags) & TCPHDR_URG) ||
+             TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_URG | TCPHDR_FIN)) ||
            ((TCP_SKB_CB(tail)->tcp_flags ^
              TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR)) ||
 #ifdef CONFIG_TLS_DEVICE
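
To spell out what the changed check does, here is an annotated rendering of
the hunk above; the comments are mine and the remaining parts of the full
condition in tcp_add_backlog() are elided:

	/* Skip coalescing into the backlog tail (goto no_coalesce) when: */
	if (TCP_SKB_CB(tail)->end_seq != TCP_SKB_CB(skb)->seq ||	/* not contiguous with the tail */
	    TCP_SKB_CB(tail)->ip_dsfield != TCP_SKB_CB(skb)->ip_dsfield ||	/* TOS/ECN byte differs */
	    ((TCP_SKB_CB(tail)->tcp_flags |
	      TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_URG | TCPHDR_FIN)) ||	/* URG or, with this patch, FIN on either skb */
	    ((TCP_SKB_CB(tail)->tcp_flags ^
	      TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR)) ||	/* ECE/CWR flags differ */
	    ...)
		goto no_coalesce;

In other words, with the extra TCPHDR_FIN bit a segment carrying a FIN (or one
following a FIN-carrying tail) is always kept as its own backlog entry instead
of being merged into the previous skb.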

Cheers,
Bruno


> > rx-checksumming: on
> > tx-checksumming: on
> >         tx-checksum-ipv4: off [fixed]
> >         tx-checksum-ip-generic: on
> >         tx-checksum-ipv6: off [fixed]
> >         tx-checksum-fcoe-crc: off [fixed]
> >         tx-checksum-sctp: off [fixed]
> > scatter-gather: on
> >         tx-scatter-gather: on
> >         tx-scatter-gather-fraglist: off [fixed]
> > tcp-segmentation-offload: on
> >         tx-tcp-segmentation: on
> >         tx-tcp-ecn-segmentation: off [fixed]
> >         tx-tcp-mangleid-segmentation: off
> >         tx-tcp6-segmentation: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: on
> > large-receive-offload: on
> > rx-vlan-offload: on
> > tx-vlan-offload: on
> > ntuple-filters: off [fixed]
> > receive-hashing: off [fixed]
> > highdma: on
> > rx-vlan-filter: on [fixed]
> > vlan-challenged: off [fixed]
> > tx-lockless: off [fixed]
> > netns-local: off [fixed]
> > tx-gso-robust: off [fixed]
> > tx-fcoe-segmentation: off [fixed]
> > tx-gre-segmentation: off [fixed]
> > tx-gre-csum-segmentation: off [fixed]
> > tx-ipxip4-segmentation: off [fixed]
> > tx-ipxip6-segmentation: off [fixed]
> > tx-udp_tnl-segmentation: off [fixed]
> > tx-udp_tnl-csum-segmentation: off [fixed]
> > tx-gso-partial: off [fixed]
> > tx-sctp-segmentation: off [fixed]
> > tx-esp-segmentation: off [fixed]
> > tx-udp-segmentation: off [fixed]
> > fcoe-mtu: off [fixed]
> > tx-nocache-copy: off
> > loopback: off [fixed]
> > rx-fcs: off [fixed]
> > rx-all: off [fixed]
> > tx-vlan-stag-hw-insert: off [fixed]
> > rx-vlan-stag-hw-parse: off [fixed]
> > rx-vlan-stag-filter: off [fixed]
> > l2-fwd-offload: off [fixed]
> > hw-tc-offload: off [fixed]
> > esp-hw-offload: off [fixed]
> > esp-tx-csum-hw-offload: off [fixed]
> > rx-udp_tunnel-port-offload: off [fixed]
> > tls-hw-tx-offload: off [fixed]
> > tls-hw-rx-offload: off [fixed]
> > rx-gro-hw: off [fixed]
> > tls-hw-record: off [fixed]
> >
> >
> > I can reproduce the issue with kernels 5.0.x and as recent as 5.1-rc6.
> >
> > Cheers,
> > Bruno
> >
> > On Sunday, April 7, 2019 11:28:30 PM CEST, richard.purdie@linuxfoundation.org wrote:  
> > > Hi,
> > >
> > > I've been chasing down why a python test from the python3 testsuite
> > > started failing and it seems to point to this kernel change in the
> > > networking stack.
> > >
> > > In kernels beyond commit 4f693b55c3d2d2239b8a0094b518a1e533cf75d5 the
> > > test hangs about 90% of the time (I've reproduced with 5.1-rc3, 5.0.7,
> > > 5.0-rc1 but not 4.18, 4.19 or 4.20). The reproducer is:
> > >
> > > $ python3 -m test test_httplib -v
> > > == CPython 3.7.2 (default, Apr 5 2019, 15:17:15) [GCC 8.3.0]
> > > == Linux-5.0.0-yocto-standard-x86_64-with-glibc2.2.5 little-endian
> > > == cwd: /var/volatile/tmp/test_python_288
> > > == CPU count: 1
> > > == encodings: locale=UTF-8, FS=utf-8
> > > [...]
> > > test_response_fileno (test.test_httplib.BasicTest) ...
> > >
> > > and it hangs in test_response_fileno.
> > >
> > > The test in question comes from Lib/test/test_httplib.py in the python
> > > source tree and the code is:
> > >
> > >     def test_response_fileno(self):
> > >         # Make sure fd returned by fileno is valid.
> > >         serv = socket.socket(
> > >             socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
> > >         self.addCleanup(serv.close)
> > >         serv.bind((HOST, 0))
> > >         serv.listen()
> > >
> > >         result = None
> > >         def run_server():
> > >             [conn, address] = serv.accept()
> > >             with conn, conn.makefile("rb") as reader:
> > >                 # Read the request header until a blank line
> > >                 while True:
> > >                     line = reader.readline()
> > >                     if not line.rstrip(b"\r\n"):
> > >                         break
> > >                 conn.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
> > >                 nonlocal result
> > >                 result = reader.read()
> > >
> > >         thread = threading.Thread(target=run_server)
> > >         thread.start()
> > >         self.addCleanup(thread.join, float(1))
> > >         conn = client.HTTPConnection(*serv.getsockname())
> > >         conn.request("CONNECT", "dummy:1234")
> > >         response = conn.getresponse()
> > >         try:
> > >             self.assertEqual(response.status, client.OK)
> > >             s = socket.socket(fileno=response.fileno())
> > >             try:
> > >                 s.sendall(b"proxied data\n")
> > >             finally:
> > >                 s.detach()
> > >         finally:
> > >             response.close()
> > >             conn.close()
> > >         thread.join()
> > >         self.assertEqual(result, b"proxied data\n")
> > >
> > > I was hoping someone with more understanding of the networking stack
> > > could look at this and tell whether it's a bug in the python test, the
> > > kernel change, or otherwise give a pointer to where the problem might
> > > be? I'll freely admit this is not an area I know much about.
> > >
> > > Cheers,
> > >
> > > Richard
> > >
> > >
> > >  

Thread overview: 14+ messages
2019-04-07 21:28 [PATCH net-next 2/3] tcp: implement coalescing on backlog queue richard.purdie
2019-04-24 14:51 ` Bruno Prémont
2019-04-24 15:47   ` Eric Dumazet
2019-04-25  7:55     ` Bruno Prémont [this message]
2019-04-25 13:13       ` Bruno Prémont
2019-04-25 13:30         ` Eric Dumazet
2019-04-25 14:16           ` Bruno Prémont
  -- strict thread matches above, loose matches on Subject: below --
2018-11-21 17:52 [PATCH net-next 0/3] tcp: take a bit more care of backlog stress Eric Dumazet
2018-11-21 17:52 ` [PATCH net-next 2/3] tcp: implement coalescing on backlog queue Eric Dumazet
2018-11-21 22:31   ` Yuchung Cheng
2018-11-21 22:40     ` Eric Dumazet
2018-11-22 16:34       ` Yuchung Cheng
2018-11-22 18:01   ` Neal Cardwell
2018-11-22 18:16     ` Eric Dumazet
2018-11-22 18:21       ` Eric Dumazet
