From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=fYmv=S3=vger.kernel.org=netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-3.7 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED,
	FROM_EXCESS_BASE64,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9DDB7C10F03
	for <netdev@archiver.kernel.org>; Thu, 25 Apr 2019 07:55:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 53DE2208E4
	for <netdev@archiver.kernel.org>; Thu, 25 Apr 2019 07:55:39 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=fail reason="signature verification failed" (1024-bit key) header.d=sysophe.eu header.i=@sysophe.eu header.b="BYM7GW1f"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1727751AbfDYHzh (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Thu, 25 Apr 2019 03:55:37 -0400
Received: from hygieia.sysophe.eu ([138.201.91.14]:39929 "EHLO
        hygieia.sysophe.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726380AbfDYHzh (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 25 Apr 2019 03:55:37 -0400
Received: from pluto.restena.lu (pluto.restena.lu [IPv6:2001:a18:1:10::156])
        by smtp.sysophe.eu (Postfix) with ESMTPSA id 2A1EC106BE07B;
        Thu, 25 Apr 2019 09:53:43 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=sysophe.eu; s=201205;
        t=1556178823; x=1556265223;
        bh=ZU7Akr1U6Tutw04T40SjEi4OAcI385onT3CeaRTf2PM=;
        h=Date:From:To:Cc:Subject:In-Reply-To:References;
        b=BYM7GW1f6kmN4IpIo4QBY3owB6caho7PzTPAmyGo8T3SaA3vGD8JsT7o37F8K+0bb
         762+Dvb2kFP0so4Q26qO9NxW0A/3pH9cFJW6lg/gFkjM+V402V2DehlZ7+3EnFptEd
         SBq4clE4tA/cpXE641TeMJgFr0R3P4QZWpiCjwy4=
Date:   Thu, 25 Apr 2019 09:55:31 +0200
From:   Bruno =?UTF-8?B?UHLDqW1vbnQ=?= <bonbons@sysophe.eu>
To:     Eric Dumazet <edumazet@google.com>
Cc:     richard.purdie@linuxfoundation.org,
        Neal Cardwell <ncardwell@google.com>,
        Yuchung Cheng <ycheng@google.com>,
        "David S. Miller" <davem@davemloft.net>,
        netdev <netdev@vger.kernel.org>,
        Alexander Kanavin <alex.kanavin@gmail.com>,
        Bruce Ashfield <bruce.ashfield@gmail.com>
Subject: Re: [PATCH net-next 2/3] tcp: implement coalescing on backlog queue
Message-ID: <20190425095531.4fd8c53f@pluto.restena.lu>
In-Reply-To: <CANn89i+afn_7D+dpk1dd_pjN=OFT=_8xJrr8iybC+oORt2QUoA@mail.gmail.com>
References: <85aabf9d4f41b6c57629e736993233f80a037e59.camel@linuxfoundation.org>
        <20190424165150.1420b046@pluto.restena.lu>
        <CANn89i+afn_7D+dpk1dd_pjN=OFT=_8xJrr8iybC+oORt2QUoA@mail.gmail.com>
X-Mailer: Claws Mail 3.17.3 (GTK+ 2.24.32; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

Hi Eric,

On Wed, 24 Apr 2019 08:47:27 -0700 Eric Dumazet wrote:
> On Wed, Apr 24, 2019 at 7:51 AM Bruno Pr=C3=A9mont wrote:
> >
> > Hi Eric,
> >
> > I'm seeing issues with this patch as well, not as regular as for
> > Richard but still (about up to one in 30-50 TCP sessions).
> >
> > In my case I have a virtual machine (on VMWare) with this patch where
> > NGINX as reverse proxy misses part (end) of payload from its upstream
> > and times out on the upstream connection (while according to tcpdump all
> > packets including upstream's FIN were sent and the upstream did get
> > ACKs from the VM).
> >
> > From when browsers get from NGINX it feels as if at some point reading
> > from the socket or waiting for data using select() never returned data
> > that arrived as more than just EOF is missing.
> >
> > The upstream is a hardware machine in the same subnet.
> >
> > My VM is using VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
> > as network adapter which lists the following features:
> > =20
>=20
> Hi Bruno.
>=20
> I suspect a EPOLLIN notification being lost by the application.
>=20
> Fact that TCP backlog contains 1 instead of 2+ packets should not
> change stack behavior,
> this packet should land into socket receive queue eventually.
>=20
> Are you using epoll() in Edge Trigger mode. You mention select() but
> select() is a rather old and inefficient API.

nginx is using epoll (c.f. http://nginx.org/en/docs/events.html)

For source, see here
https://trac.nginx.org/nginx/browser/nginx/src/event/modules/ngx_epoll_modu=
le.c?rev=3Debf8c9686b8ce7428f975d8a567935ea3722da70

> Could you watch/report the output of " ss -temoi "  for the frozen TCP fl=
ow ?

Here it is, three distinct reproducing attempts:
State      Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB      0       0       158.64.2.228:44248  158.64.2.217:webcache       =
       uid:83 ino:13245 sk:87 <->
         skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) ts sack cubic wscal=
e:7,7 rto:210 rtt:0.24/0.118 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:1=
0 bytes_acked:949 bytes_received:28381 segs_out:12 segs_in:12 data_segs_out=
:1 data_segs_in:10 send 482.7Mbps lastsnd:46810 lastrcv:46790 lastack:46790=
 pacing_rate 965.3Mbps delivery_rate 74.3Mbps app_limited rcv_rtt:1 rcv_spa=
ce:14480 minrtt:0.156


ESTAB      0      0                                                        =
           2001:a18:1:6::228:33572                                         =
                                    2001:a18:1:6::217:webcache             =
 uid:83 ino:16699 sk:e1 <->
         skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) ts sack cubic wscal=
e:7,7 rto:210 rtt:0.231/0.11 ato:40 mss:1428 rcvmss:1428 advmss:1428 cwnd:1=
0 bytes_acked:948 bytes_received:28474 segs_out:12 segs_in:12 data_segs_out=
:1 data_segs_in:10 send 494.5Mbps lastsnd:8380 lastrcv:8360 lastack:8360 pa=
cing_rate 989.1Mbps delivery_rate 71.0Mbps app_limited rcv_rtt:1.109 rcv_sp=
ace:14280 minrtt:0.161


ESTAB      0      0                                                        =
                158.64.2.228:44578                                         =
                                         158.64.2.217:webcache             =
 uid:83 ino:17628 sk:12c <->
         skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) ts sack cubic wscal=
e:7,7 rto:210 rtt:0.279/0.136 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:=
10 bytes_acked:949 bytes_received:28481 segs_out:12 segs_in:12 data_segs_ou=
t:1 data_segs_in:10 send 415.2Mbps lastsnd:11360 lastrcv:11330 lastack:1134=
0 pacing_rate 828.2Mbps delivery_rate 61.9Mbps app_limited rcv_rtt:1 rcv_sp=
ace:14480 minrtt:0.187


=46rom nginx debug logging I don't get a real clue though it seems for workin=
g connections
the last event obtained is 2005 (EPOLLMSG | EPOLLWRBAND | EPOLLWRNORM |
EPOLLRDBAND | EPOLLRDNORM | EPOLLHUP | EPOLLIN | EPOLLOUT) - previous ones =
are 5
while for failing connections it looks like last event seen is 5 (EPOLLIN |=
 EPOLLOUT).

> This migtht give us a clue about packets being dropped, say the the
> accumulated packet became too big.


The following minor patch (might be white-space mangled) does prevent the i=
ssue
happening for me:

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4904250a9aac..c102cd367c79 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1667,7 +1667,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff =
*skb)
        if (TCP_SKB_CB(tail)->end_seq !=3D TCP_SKB_CB(skb)->seq ||
            TCP_SKB_CB(tail)->ip_dsfield !=3D TCP_SKB_CB(skb)->ip_dsfield ||
            ((TCP_SKB_CB(tail)->tcp_flags |
-             TCP_SKB_CB(skb)->tcp_flags) & TCPHDR_URG) ||
+             TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_URG | TCPHDR_FIN)) ||
            ((TCP_SKB_CB(tail)->tcp_flags ^
              TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR)) ||
 #ifdef CONFIG_TLS_DEVICE

Cheers,
Bruno


> > rx-checksumming: on
> > tx-checksumming: on
> >         tx-checksum-ipv4: off [fixed]
> >         tx-checksum-ip-generic: on
> >         tx-checksum-ipv6: off [fixed]
> >         tx-checksum-fcoe-crc: off [fixed]
> >         tx-checksum-sctp: off [fixed]
> > scatter-gather: on
> >         tx-scatter-gather: on
> >         tx-scatter-gather-fraglist: off [fixed]
> > tcp-segmentation-offload: on
> >         tx-tcp-segmentation: on
> >         tx-tcp-ecn-segmentation: off [fixed]
> >         tx-tcp-mangleid-segmentation: off
> >         tx-tcp6-segmentation: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: on
> > large-receive-offload: on
> > rx-vlan-offload: on
> > tx-vlan-offload: on
> > ntuple-filters: off [fixed]
> > receive-hashing: off [fixed]
> > highdma: on
> > rx-vlan-filter: on [fixed]
> > vlan-challenged: off [fixed]
> > tx-lockless: off [fixed]
> > netns-local: off [fixed]
> > tx-gso-robust: off [fixed]
> > tx-fcoe-segmentation: off [fixed]
> > tx-gre-segmentation: off [fixed]
> > tx-gre-csum-segmentation: off [fixed]
> > tx-ipxip4-segmentation: off [fixed]
> > tx-ipxip6-segmentation: off [fixed]
> > tx-udp_tnl-segmentation: off [fixed]
> > tx-udp_tnl-csum-segmentation: off [fixed]
> > tx-gso-partial: off [fixed]
> > tx-sctp-segmentation: off [fixed]
> > tx-esp-segmentation: off [fixed]
> > tx-udp-segmentation: off [fixed]
> > fcoe-mtu: off [fixed]
> > tx-nocache-copy: off
> > loopback: off [fixed]
> > rx-fcs: off [fixed]
> > rx-all: off [fixed]
> > tx-vlan-stag-hw-insert: off [fixed]
> > rx-vlan-stag-hw-parse: off [fixed]
> > rx-vlan-stag-filter: off [fixed]
> > l2-fwd-offload: off [fixed]
> > hw-tc-offload: off [fixed]
> > esp-hw-offload: off [fixed]
> > esp-tx-csum-hw-offload: off [fixed]
> > rx-udp_tunnel-port-offload: off [fixed]
> > tls-hw-tx-offload: off [fixed]
> > tls-hw-rx-offload: off [fixed]
> > rx-gro-hw: off [fixed]
> > tls-hw-record: off [fixed]
> >
> >
> > I can reproduce the issue with kernels 5.0.x and as recent as 5.1-rc6.
> >
> > Cheers,
> > Bruno
> >
> > On Sunday, April 7, 2019 11:28:30 PM CEST, richard.purdie@linuxfoundati=
on.org wrote: =20
> > > Hi,
> > >
> > > I've been chasing down why a python test from the python3 testsuite
> > > started failing and it seems to point to this kernel change in the
> > > networking stack.
> > >
> > > In kernels beyond commit 4f693b55c3d2d2239b8a0094b518a1e533cf75d5 the
> > > test hangs about 90% of the time (I've reproduced with 5.1-rc3, 5.0.7,
> > > 5.0-rc1 but not 4.18, 4.19 or 4.20). The reproducer is:
> > >
> > > $ python3 -m test test_httplib -v
> > > =3D=3D CPython 3.7.2 (default, Apr 5 2019, 15:17:15) [GCC 8.3.0]
> > > =3D=3D Linux-5.0.0-yocto-standard-x86_64-with-glibc2.2.5 little-endian
> > > =3D=3D cwd: /var/volatile/tmp/test_python_288
> > > =3D=3D CPU count: 1
> > > =3D=3D encodings: locale=3DUTF-8, FS=3Dutf-8
> > > [...]
> > > test_response_fileno (test.test_httplib.BasicTest) ...
> > >
> > > and it hangs in test_response_fileno.
> > >
> > > The test in question comes from Lib/test/test_httplib.py in the python
> > > source tree and the code is:
> > >
> > >     def test_response_fileno(self):
> > >         # Make sure fd returned by fileno is valid.
> > >         serv =3D socket.socket(
> > >             socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
> > >         self.addCleanup(serv.close)
> > >         serv.bind((HOST, 0))
> > >         serv.listen()
> > >
> > >         result =3D None
> > >         def run_server():
> > >             [conn, address] =3D serv.accept()
> > >             with conn, conn.makefile("rb") as reader:
> > >                 # Read the request header until a blank line
> > >                 while True:
> > >                     line =3D reader.readline()
> > >                     if not line.rstrip(b"\r\n"):
> > >                         break
> > >                 conn.sendall(b"HTTP/1.1 200 Connection established\r\=
n\r\n")
> > >                 nonlocal result
> > >                 result =3D reader.read()
> > >
> > >         thread =3D threading.Thread(target=3Drun_server)
> > >         thread.start()
> > >         self.addCleanup(thread.join, float(1))
> > >         conn =3D client.HTTPConnection(*serv.getsockname())
> > >         conn.request("CONNECT", "dummy:1234")
> > >         response =3D conn.getresponse()
> > >         try:
> > >             self.assertEqual(response.status, client.OK)
> > >             s =3D socket.socket(fileno=3Dresponse.fileno())
> > >             try:
> > >                 s.sendall(b"proxied data\n")
> > >             finally:
> > >                 s.detach()
> > >         finally:
> > >             response.close()
> > >             conn.close()
> > >         thread.join()
> > >         self.assertEqual(result, b"proxied data\n")
> > >
> > > I was hoping someone with more understanding of the networking stack
> > > could look at this and tell whether its a bug in the python test, the
> > > kernel change or otherwise give a pointer to where the problem might
> > > be? I'll freely admit this is not an area I know much about.
> > >
> > > Cheers,
> > >
> > > Richard
> > >
> > >
> > > =20