regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Matthias Treydte <mt@waldheinz.de>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: David Ahern <dsahern@gmail.com>,
	stable@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, regressions@lists.linux.dev,
	davem@davemloft.net, yoshfuji@linux-ipv6.org, dsahern@kernel.org
Subject: Re: [regression] UDP recv data corruption
Date: Fri, 02 Jul 2021 14:36:42 +0200	[thread overview]
Message-ID: <20210702143642.Horde.PFbG3LFNTZ3wp0TYiBRGsCM@mail.your-server.de> (raw)
In-Reply-To: <CA+FuTSc3POcZo0En3JBqRwq2+eF645_Cs4U-4nBmTs9FvjoVkg@mail.gmail.com>


Quoting Willem de Bruijn <willemdebruijn.kernel@gmail.com>:

> That library does not enable UDP_GRO. You do not have any UDP based
> tunnel devices (besides vxlan) configured, either, right?

The configuration is really minimal by now, I also took the bonding
out of the equation. We have systemd configure "en*" with mDNS and DHCP
enabled and that's it. The problem remains.

I also found new hardware on my desk today (some Intel SoC), showing
exactly the same symptoms. So it's really nothing to do with the
hardware.

> It is also unlikely that the device has either of NETIF_F_GRO_FRAGLIST
> or NETIF_F_GRO_UDP_FWD configured. This can be checked with `ethtool
> -K $DEV`, shown as "rx-gro-list" and "rx-udp-gro-forwarding",
> respectively.

The full output of "ethtool -k enp5s0" from that SoC:

Features for enp5s0:
rx-checksumming: on
tx-checksumming: on
         tx-checksum-ipv4: off [fixed]
         tx-checksum-ip-generic: on
         tx-checksum-ipv6: off [fixed]
         tx-checksum-fcoe-crc: off [fixed]
         tx-checksum-sctp: on
scatter-gather: on
         tx-scatter-gather: on
         tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
         tx-tcp-segmentation: on
         tx-tcp-ecn-segmentation: off [fixed]
         tx-tcp-mangleid-segmentation: off
         tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

That's the only NIC on this board:

# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN  
mode DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state  
UP mode DEFAULT group default qlen 1000
     link/ether 00:30:d6:24:99:67 brd ff:ff:ff:ff:ff:ff

> One possible short-term workaround is to disable GRO.

Indeed, "ethtool -K enp5s0 gro off" fixes the problem, and calling it with
"gro on" brings it back.


And to answer Paolo's questions from his mail to the list (@Paolo: I'm  
not subscribed,
please also send to me directly so I don't miss your mail)

> Could you please:
> - tell how frequent is the pkt corruption, even a rough estimate of the
> frequency.

# journalctl --since "5min ago" | grep "Packet corrupt" | wc -l
167

So there are 167 detected failures in 5 minutes, while the system is receiving
at a moderate rate of about 900 pkts/s (according to Prometheus' node exporter
at least, but seems about right)

Next I'll try to capture some broken packets and reply in a separate mail,
I'll have to figure out a good way to do this first.


Thanks for your help,
-Matthias




  parent reply	other threads:[~2021-07-02 12:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01 10:47 [regression] UDP recv data corruption Matthias Treydte
2021-07-01 15:39 ` David Ahern
2021-07-02  0:31   ` Willem de Bruijn
2021-07-02 11:42     ` Paolo Abeni
2021-07-02 14:31       ` Matthias Treydte
2021-07-02 12:36     ` Matthias Treydte [this message]
2021-07-02 14:06       ` Paolo Abeni
2021-07-02 14:21         ` Paolo Abeni
2021-07-02 15:23           ` Matthias Treydte
2021-07-02 15:32             ` Paolo Abeni
2021-07-02 16:07               ` Matthias Treydte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210702143642.Horde.PFbG3LFNTZ3wp0TYiBRGsCM@mail.your-server.de \
    --to=mt@waldheinz.de \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=dsahern@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=regressions@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).