From: "Bruno Prémont" <bonbons@sysophe.eu>
To: Eric Dumazet <edumazet@google.com>
Cc: richard.purdie@linuxfoundation.org,
	Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org,
	Alexander Kanavin <alex.kanavin@gmail.com>,
	Bruce Ashfield <bruce.ashfield@gmail.com>
Subject: Re: [PATCH net-next 2/3] tcp: implement coalescing on backlog queue
Date: Wed, 24 Apr 2019 16:51:50 +0200
Message-ID: <20190424165150.1420b046@pluto.restena.lu>
In-Reply-To: <85aabf9d4f41b6c57629e736993233f80a037e59.camel@linuxfoundation.org>

Hi Eric,

I'm seeing issues with this patch as well, though less frequently than
Richard (roughly one in 30-50 TCP sessions).

In my case I have a virtual machine (on VMware) running a kernel with
this patch, where NGINX acting as a reverse proxy misses the end of the
payload from its upstream and times out on the upstream connection
(while according to tcpdump all packets, including the upstream's FIN,
were sent and the upstream did get ACKs from the VM).

Judging from what browsers receive from NGINX, it looks as if at some
point reading from the socket, or waiting for data using select(), never
returned data that had already arrived, since more than just the EOF is
missing.

The upstream is a hardware machine in the same subnet.

My VM is using VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
as network adapter which lists the following features:

rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]


I can reproduce the issue with kernels 5.0.x and as recent as 5.1-rc6.
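
For anyone who wants to check for the symptom without NGINX or the full
python3 testsuite, here is a minimal sketch modelled on the
test_response_fileno test quoted below. It runs the same exchange over
loopback with plain sockets and counts the rounds in which the server
side does not see the payload followed by EOF. The loopback address, the
5 second timeout and the names run_once and count_failures are
assumptions for illustration only, and I have not confirmed that this
triggers the problem on an affected kernel.

```python
import socket
import threading

PAYLOAD = b"proxied data\n"


def run_once(timeout=5.0):
    """Run one CONNECT-style exchange over loopback.

    Returns True if the server side read the payload followed by EOF
    before the (assumed) timeout expired, False otherwise.
    """
    serv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serv.bind(("127.0.0.1", 0))
    serv.listen()
    result = {}

    def server():
        conn, _ = serv.accept()
        conn.settimeout(timeout)
        with conn, conn.makefile("rb") as reader:
            try:
                # Read the request header until a blank line (or EOF).
                while reader.readline().rstrip(b"\r\n"):
                    pass
                conn.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
                # Blocks until the peer's FIN is seen or the timeout hits.
                result["data"] = reader.read()
            except OSError:
                # Includes socket.timeout: data or EOF never showed up.
                result["data"] = None

    thread = threading.Thread(target=server)
    thread.start()
    try:
        client = socket.create_connection(serv.getsockname(), timeout=timeout)
        try:
            client.sendall(b"CONNECT dummy:1234 HTTP/1.1\r\n"
                           b"Host: dummy:1234\r\n\r\n")
            # Wait for the end of the response header.
            buf = b""
            while b"\r\n\r\n" not in buf:
                chunk = client.recv(4096)
                if not chunk:
                    break
                buf += chunk
            # Send the payload and close right away, as the test does.
            client.sendall(PAYLOAD)
        finally:
            client.close()
    except OSError:
        pass
    thread.join(timeout + 1)
    serv.close()
    return result.get("data") == PAYLOAD


def count_failures(rounds=50):
    """Return how many of `rounds` exchanges lost the payload or the EOF."""
    return sum(0 if run_once() else 1 for _ in range(rounds))


if __name__ == "__main__":
    print("failed rounds:", count_failures())
```

A non-zero count would match what I see with NGINX; a zero count only
means this particular sketch did not hit the problem.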

Cheers,
Bruno

On Sunday, April 7, 2019 11:28:30 PM CEST, richard.purdie@linuxfoundation.org wrote:
> Hi,
>
> I've been chasing down why a python test from the python3 testsuite
> started failing and it seems to point to this kernel change in the
> networking stack.
>
> In kernels beyond commit 4f693b55c3d2d2239b8a0094b518a1e533cf75d5 the
> test hangs about 90% of the time (I've reproduced with 5.1-rc3, 5.0.7,
> 5.0-rc1 but not 4.18, 4.19 or 4.20). The reproducer is:
>
> $ python3 -m test test_httplib -v
> == CPython 3.7.2 (default, Apr 5 2019, 15:17:15) [GCC 8.3.0]
> == Linux-5.0.0-yocto-standard-x86_64-with-glibc2.2.5 little-endian
> == cwd: /var/volatile/tmp/test_python_288
> == CPU count: 1
> == encodings: locale=UTF-8, FS=utf-8
> [...]
> test_response_fileno (test.test_httplib.BasicTest) ... 
>
> and it hangs in test_response_fileno.
>
> The test in question comes from Lib/test/test_httplib.py in the python
> source tree and the code is:
>
>     def test_response_fileno(self):
>         # Make sure fd returned by fileno is valid.
>         serv = socket.socket(
>             socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
>         self.addCleanup(serv.close)
>         serv.bind((HOST, 0))
>         serv.listen()
>
>         result = None
>         def run_server():
>             [conn, address] = serv.accept()
>             with conn, conn.makefile("rb") as reader:
>                 # Read the request header until a blank line
>                 while True:
>                     line = reader.readline()
>                     if not line.rstrip(b"\r\n"):
>                         break
>                 conn.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
>                 nonlocal result
>                 result = reader.read()
>
>         thread = threading.Thread(target=run_server)
>         thread.start()
>         self.addCleanup(thread.join, float(1))
>         conn = client.HTTPConnection(*serv.getsockname())
>         conn.request("CONNECT", "dummy:1234")
>         response = conn.getresponse()
>         try:
>             self.assertEqual(response.status, client.OK)
>             s = socket.socket(fileno=response.fileno())
>             try:
>                 s.sendall(b"proxied data\n")
>             finally:
>                 s.detach()
>         finally:
>             response.close()
>             conn.close()
>         thread.join()
>         self.assertEqual(result, b"proxied data\n")
>
> I was hoping someone with more understanding of the networking stack
> could look at this and tell whether it's a bug in the python test, the
> kernel change or otherwise give a pointer to where the problem might
> be? I'll freely admit this is not an area I know much about.
>
> Cheers,
>
> Richard
>
>
>

