* 4.19 - tons of hw csum failure errors
@ 2018-10-27 19:18 Nikola Ciprich
  2018-10-30  2:51 ` Cong Wang
  0 siblings, 1 reply; 2+ messages in thread
From: Nikola Ciprich @ 2018-10-27 19:18 UTC
  To: netdev; +Cc: nik

Hi,

just wanted to report that after switching to 4.19 (from 4.14.x, so the
problem may have appeared somewhere in between), I'm getting tons of
similar messages:

Oct 27 09:06:27 xxx kernel: br501: hw csum failure
Oct 27 09:06:27 xxx kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E     4.19.0lb7.00_01_PRE04 #1
Oct 27 09:06:27 xxx kernel: Hardware name: Supermicro Super Server/X11DDW-NT, BIOS 2.0b 03/07/2018
Oct 27 09:06:27 xxx kernel: Call Trace:
Oct 27 09:06:27 xxx kernel:  <IRQ>
Oct 27 09:06:27 xxx kernel:  dump_stack+0x5a/0x73
Oct 27 09:06:27 xxx kernel:  __skb_checksum_complete+0xba/0xc0
Oct 27 09:06:27 xxx kernel:  tcp_error+0x108/0x180 [nf_conntrack]
Oct 27 09:06:27 xxx kernel:  nf_conntrack_in+0xd2/0x4b0 [nf_conntrack]
Oct 27 09:06:27 xxx kernel:  ? csum_partial+0xd/0x20
Oct 27 09:06:27 xxx kernel:  nf_hook_slow+0x3d/0xb0
Oct 27 09:06:27 xxx kernel:  ip_rcv+0xb5/0xd0
Oct 27 09:06:27 xxx kernel:  ? ip_rcv_finish_core.isra.12+0x370/0x370
Oct 27 09:06:27 xxx kernel:  __netif_receive_skb_one_core+0x52/0x70
Oct 27 09:06:27 xxx kernel:  process_backlog+0xa3/0x150
Oct 27 09:06:27 xxx kernel:  net_rx_action+0x2af/0x3f0
Oct 27 09:06:27 xxx kernel:  __do_softirq+0xd1/0x28c
Oct 27 09:06:27 xxx kernel:  irq_exit+0xde/0xf0
Oct 27 09:06:27 xxx kernel:  do_IRQ+0x54/0xe0
Oct 27 09:06:27 xxx kernel:  common_interrupt+0xf/0xf
Oct 27 09:06:27 xxx kernel:  </IRQ>
Oct 27 09:06:27 xxx kernel: RIP: 0010:cpuidle_enter_state+0xb6/0x2e0
Oct 27 09:06:27 xxx kernel: Code: 7e e8 ee 84 b2 ff 8b 5d 04 49 89 c6 0f 1f 44 00 00 31 ff e8 bc 95 b2 ff 80 7c 24 03 00 0f 85 93 01 00 00 fb 66 0f 1f 44 00 00 <4d> 29 fe 48 ba cf f7 53 e3 a5 9b c4 20 4c 89 f0 49 c1 fe 3f 48 f7
Oct 27 09:06:27 xxx kernel: RSP: 0018:ffffc90018b17e88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
Oct 27 09:06:27 xxx kernel: RAX: ffff888faf822600 RBX: 0000000000000008 RCX: 000000000000001f
Oct 27 09:06:27 xxx kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
Oct 27 09:06:27 xxx kernel: RBP: ffffe8ffffa029a8 R08: 0000000000000002 R09: ffe9afdaa39e3efa
Oct 27 09:06:27 xxx kernel: R10: 0000000000000377 R11: 0000000000000008 R12: 0000000000000008
Oct 27 09:06:27 xxx kernel: R13: 0000000000000003 R14: 0000000d6670b7ca R15: 0000000d564db458
Oct 27 09:06:27 xxx kernel:  ? cpuidle_enter_state+0xa4/0x2e0
Oct 27 09:06:27 xxx kernel:  do_idle+0x1e4/0x290
Oct 27 09:06:27 xxx kernel:  cpu_startup_entry+0x6f/0x80
Oct 27 09:06:27 xxx kernel:  start_secondary+0x1aa/0x200
Oct 27 09:06:27 xxx kernel:  secondary_startup_64+0xa4/0xb0

it's being reported for various kernel threads (swapper, ksoftirqd, ...)
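To see how the reports are spread across devices and tasks, the log can be tallied with a short script. This is only a sketch: it assumes the journal format shown above ("kernel: <dev>: hw csum failure" followed by a "CPU: ... Comm: <task>" line), and the function name tally_csum_failures is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "... kernel: br501: hw csum failure"
CSUM_RE = re.compile(r"kernel: (\S+): hw csum failure")
# Matches the "Comm: swapper/8" field of the "CPU: ..." line that follows
COMM_RE = re.compile(r"Comm: (\S+)")


def tally_csum_failures(lines):
    """Count hw csum failures per device and per task name.

    Assumes each failure line is followed (not necessarily immediately)
    by a "CPU: ... Comm: <task>" line belonging to the same report.
    """
    per_dev = Counter()
    per_comm = Counter()
    pending = False  # True after a failure line until its Comm line is seen
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            per_dev[m.group(1)] += 1
            pending = True
            continue
        if pending:
            c = COMM_RE.search(line)
            if c:
                # Strip the per-CPU suffix: "swapper/8" -> "swapper"
                per_comm[c.group(1).split("/")[0]] += 1
                pending = False
    return per_dev, per_comm
```

Feeding it an open log file (any iterable of lines works) returns two Counter objects, one keyed by device name and one keyed by task name.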


I tried applying

commit db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
Author: Sean Tranchetti <stranche@codeaurora.org>
Date:   Tue Oct 23 16:04:31 2018 -0600

    net: udp: fix handling of CHECKSUM_COMPLETE packets

but to no avail..

The system is running virtual machines and uses openvswitch with the
following simple topology:

[root@xxx tmp]# ovs-vsctl show
22519243-4f9e-47dc-ac8c-3635f6595c4d
    Bridge brovs
        Port brovs
            Interface brovs
                type: internal
        Port "bond0"
            Interface "eth2"
            Interface "eth3"
        Port "vnet0"
            tag: 502
            Interface "vnet0"
        Port brdef
            tag: 0
            Interface brdef
                type: internal
        Port "br51"
            tag: 51
            Interface "br51"
                type: internal
        Port "br50"
            tag: 50
            Interface "br50"
                type: internal
        Port "br501"
            tag: 501
            Interface "br501"
                type: internal
    ovs_version: "2.5.0"

Is this a known problem? Can I provide any additional info?

BR

nik


-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------


* Re: 4.19 - tons of hw csum failure errors
  2018-10-27 19:18 4.19 - tons of hw csum failure errors Nikola Ciprich
@ 2018-10-30  2:51 ` Cong Wang
  0 siblings, 0 replies; 2+ messages in thread
From: Cong Wang @ 2018-10-30  2:51 UTC
  To: nikola.ciprich; +Cc: Linux Kernel Network Developers, nik, Eric Dumazet

(Cc'ing Eric)

On Sat, Oct 27, 2018 at 12:47 PM Nikola Ciprich
<nikola.ciprich@linuxbox.cz> wrote:
>
> Hi,
>
> just wanted to report that after switching to 4.19 (from 4.14.x, so the
> problem may have appeared somewhere in between), I'm getting tons of
> similar messages:
>
> Oct 27 09:06:27 xxx kernel: br501: hw csum failure
> Oct 27 09:06:27 xxx kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E     4.19.0lb7.00_01_PRE04 #1
> Oct 27 09:06:27 xxx kernel: Hardware name: Supermicro Super Server/X11DDW-NT, BIOS 2.0b 03/07/2018
> Oct 27 09:06:27 xxx kernel: Call Trace:
> Oct 27 09:06:27 xxx kernel:  <IRQ>
> Oct 27 09:06:27 xxx kernel:  dump_stack+0x5a/0x73
> Oct 27 09:06:27 xxx kernel:  __skb_checksum_complete+0xba/0xc0
> Oct 27 09:06:27 xxx kernel:  tcp_error+0x108/0x180 [nf_conntrack]
> Oct 27 09:06:27 xxx kernel:  nf_conntrack_in+0xd2/0x4b0 [nf_conntrack]
> Oct 27 09:06:27 xxx kernel:  ? csum_partial+0xd/0x20
> Oct 27 09:06:27 xxx kernel:  nf_hook_slow+0x3d/0xb0
> Oct 27 09:06:27 xxx kernel:  ip_rcv+0xb5/0xd0
> Oct 27 09:06:27 xxx kernel:  ? ip_rcv_finish_core.isra.12+0x370/0x370
> Oct 27 09:06:27 xxx kernel:  __netif_receive_skb_one_core+0x52/0x70
> Oct 27 09:06:27 xxx kernel:  process_backlog+0xa3/0x150
> Oct 27 09:06:27 xxx kernel:  net_rx_action+0x2af/0x3f0
> Oct 27 09:06:27 xxx kernel:  __do_softirq+0xd1/0x28c
> Oct 27 09:06:27 xxx kernel:  irq_exit+0xde/0xf0
> Oct 27 09:06:27 xxx kernel:  do_IRQ+0x54/0xe0
> Oct 27 09:06:27 xxx kernel:  common_interrupt+0xf/0xf
> Oct 27 09:06:27 xxx kernel:  </IRQ>


We got the same warning (but a different backtrace) with the mlx5 driver.

It seems you are using a different driver. Do you have any idea how to
reproduce it?

If you do, try to tcpdump the packets triggering this warning; that
could be useful for debugging.
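Once a packet has been captured, one thing worth checking is whether its TCP checksum is valid on the wire. Below is a minimal sketch of that check using the standard 16-bit one's-complement Internet checksum over the IPv4 pseudo-header and the TCP segment. It assumes a raw IPv4 packet (no Ethernet or VLAN header) that is not fragmented; the function names are made up for illustration, and this is not the kernel's code path.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ok(ip_packet: bytes) -> bool:
    """Return True if the TCP checksum of a raw IPv4 packet verifies."""
    ihl = (ip_packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    if ip_packet[9] != 6:  # protocol field, 6 = TCP
        raise ValueError("not a TCP packet")
    segment = ip_packet[ihl:total_len]
    # Pseudo-header: source address, destination address, zero, protocol,
    # TCP length
    pseudo = ip_packet[12:20] + struct.pack("!BBH", 0, 6, len(segment))
    # With the transmitted checksum included, a valid segment sums to 0xFFFF
    return ones_complement_sum(pseudo + segment) == 0xFFFF
```

If the checksum verifies in software for a packet that triggered the warning, that would suggest the problem is in the checksum value carried along with the packet in the receive path rather than corruption on the wire, but treat that as a working hypothesis to confirm.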

As I explained in another thread, it is likely that commit 88078d98d1bb
introduced more problems than the one fixed by d55bef5059dd057bd.
You can experiment with these two commits to see whether you reach the
same conclusion.

BTW, the offending commit has been backported to 4.14 too. ;)

Thanks!
