* Fw: [Bug 201423] New: eth0: hw csum failure
From: Stephen Hemminger @ 2018-10-15 15:15 UTC
  To: Eric Dumazet; +Cc: netdev



Begin forwarded message:

Date: Sun, 14 Oct 2018 10:42:48 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 201423] New: eth0: hw csum failure


https://bugzilla.kernel.org/show_bug.cgi?id=201423

            Bug ID: 201423
           Summary: eth0: hw csum failure
           Product: Networking
           Version: 2.5
    Kernel Version: 4.19.0-rc7
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: rossi.f@inwind.it
        Regression: No

I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the Ethernet
ports. I get the following error message:

[  433.727397] eth0: hw csum failure
[  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
[  433.727406] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202    12/22/2010
[  433.727407] Call Trace:
[  433.727409]  <IRQ>
[  433.727415]  dump_stack+0x46/0x5b
[  433.727419]  __skb_checksum_complete+0xb0/0xc0
[  433.727423]  tcp_v4_rcv+0x528/0xb60
[  433.727426]  ? ipt_do_table+0x2d0/0x400
[  433.727429]  ip_local_deliver_finish+0x5a/0x110
[  433.727430]  ip_local_deliver+0xe1/0xf0
[  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
[  433.727432]  ip_rcv+0xca/0xe0
[  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
[  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
[  433.727438]  netif_receive_skb_internal+0x4e/0x130
[  433.727439]  napi_gro_receive+0x6a/0x80
[  433.727442]  sky2_poll+0x707/0xd20
[  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
[  433.727447]  net_rx_action+0x237/0x380
[  433.727449]  __do_softirq+0xdc/0x1e0
[  433.727452]  irq_exit+0xa9/0xb0
[  433.727453]  do_IRQ+0x45/0xc0
[  433.727455]  common_interrupt+0xf/0xf
[  433.727456]  </IRQ>
[  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
[  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
[  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffde
[  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX: 000000000000001f
[  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI: 0000000000000000
[  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09: 000000650512105d
[  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12: 00000064fc2a8b1c
[  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15: ffffffff8204af20
[  433.727468]  ? cpuidle_enter_state+0x119/0x200
[  433.727471]  do_idle+0x1bf/0x200
[  433.727473]  cpu_startup_entry+0x6a/0x70
[  433.727475]  start_secondary+0x17f/0x1c0
[  433.727476]  secondary_startup_64+0xa4/0xb0
[  441.662954] eth0: hw csum failure
[  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
[  441.662960] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202    12/22/2010
[  441.662960] Call Trace:
[  441.662963]  <IRQ>
[  441.662968]  dump_stack+0x46/0x5b
[  441.662972]  __skb_checksum_complete+0xb0/0xc0
[  441.662975]  tcp_v4_rcv+0x528/0xb60
[  441.662979]  ? ipt_do_table+0x2d0/0x400
[  441.662981]  ip_local_deliver_finish+0x5a/0x110
[  441.662983]  ip_local_deliver+0xe1/0xf0
[  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
[  441.662986]  ip_rcv+0xca/0xe0
[  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
[  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
[  441.662993]  netif_receive_skb_internal+0x4e/0x130
[  441.662994]  napi_gro_receive+0x6a/0x80
[  441.662998]  sky2_poll+0x707/0xd20
[  441.663000]  net_rx_action+0x237/0x380
[  441.663002]  __do_softirq+0xdc/0x1e0
[  441.663005]  irq_exit+0xa9/0xb0
[  441.663007]  do_IRQ+0x45/0xc0
[  441.663009]  common_interrupt+0xf/0xf
[  441.663010]  </IRQ>
[  441.663012] RIP: 0010:merge+0x22/0xb0
[  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 c9 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
[  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX: ffff88021ab2d408
[  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI: 0000000000000000
[  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09: 0000000000008500
[  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12: ffffffffa021c440
[  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15: ffffc9000090b9e0
[  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
[  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
[  441.663065]  ? merge+0x57/0xb0
[  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
[  441.663082]  list_sort+0x8b/0x230
[  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
[  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
[  441.663113]  ? __switch_to_asm+0x34/0x70
[  441.663114]  ? __switch_to_asm+0x40/0x70
[  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
[  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
[  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
[  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
[  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
[  441.663171]  do_vfs_ioctl+0x9a/0x600
[  441.663173]  ksys_ioctl+0x35/0x60
[  441.663175]  __x64_sys_ioctl+0x11/0x20
[  441.663177]  do_syscall_64+0x3d/0xf0
[  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  441.663180] RIP: 0033:0x7f9377377f37
[  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 ad db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
[  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX: 00007f9377377f37
[  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI: 0000000000000010
[  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09: 0000000000000000
[  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0206466
[  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15: 0000564497a38120
[  462.833418] eth0: hw csum failure
[  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
[  462.833429] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202    12/22/2010
[  462.833429] Call Trace:
[  462.833432]  <IRQ>
[  462.833438]  dump_stack+0x46/0x5b
[  462.833442]  __skb_checksum_complete+0xb0/0xc0
[  462.833446]  tcp_v4_rcv+0x528/0xb60
[  462.833449]  ? ipt_do_table+0x2d0/0x400
[  462.833452]  ip_local_deliver_finish+0x5a/0x110
[  462.833454]  ip_local_deliver+0xe1/0xf0
[  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
[  462.833457]  ip_rcv+0xca/0xe0
[  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
[  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
[  462.833464]  netif_receive_skb_internal+0x4e/0x130
[  462.833466]  napi_gro_receive+0x6a/0x80
[  462.833469]  sky2_poll+0x707/0xd20
[  462.833471]  net_rx_action+0x237/0x380
[  462.833474]  __do_softirq+0xdc/0x1e0
[  462.833477]  irq_exit+0xa9/0xb0
[  462.833479]  do_IRQ+0x45/0xc0
[  462.833481]  common_interrupt+0xf/0xf
[  462.833482]  </IRQ>
[  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
[  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
[  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffde
[  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX: 000000000000001f
[  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI: 0000000000000000
[  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09: 0000000000000000
[  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12: 0000006bc3052131
[  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15: ffffffff8204af20
[  462.833498]  ? cpuidle_enter_state+0x119/0x200
[  462.833503]  do_idle+0x1bf/0x200
[  462.833506]  cpu_startup_entry+0x6a/0x70
[  462.833510]  start_secondary+0x17f/0x1c0
[  462.833513]  secondary_startup_64+0xa4/0xb0

Something changed between 4.17.12 and 4.18. After bisecting the problem, I
got the following first bad commit:

commit 88078d98d1bb085d72af8437707279e203524fa5
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Apr 18 11:43:15 2018 -0700

    net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

    After working on IP defragmentation lately, I found that some large
    packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
    zero paddings on the last (small) fragment.

    While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
    to CHECKSUM_NONE, forcing a full csum validation, even if all prior
    fragments had CHECKSUM_COMPLETE set.

    We can instead compute the checksum of the part we are trimming,
    usually smaller than the part we keep.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
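
The idea in the last paragraph of that commit message (subtract the checksum of the trimmed part instead of recomputing the part that is kept) can be sketched in a few lines. This is only an illustration of the one's-complement arithmetic, not the kernel code: fold, csum_partial, csum_sub and trim_complete_csum below are hypothetical Python helpers (two of them merely borrow the names of kernel helpers), and the 60-byte frame is made-up data.

```python
def fold(total: int) -> int:
    """Fold carries back into 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_partial(data: bytes) -> int:
    """One's-complement sum of data taken as big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is padded with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def csum_sub(total: int, part: int) -> int:
    """Subtract part from total by adding its one's complement."""
    return fold(total + (~part & 0xFFFF))


def trim_complete_csum(frame: bytes, hw_csum: int, keep: int) -> int:
    """Checksum of frame[:keep], derived from a checksum over the whole frame."""
    tail = csum_partial(frame[keep:])
    if keep % 2:
        # The tail starts on an odd byte, so its bytes sit in the opposite
        # halves of the 16-bit words; swap them before subtracting.
        tail = ((tail << 8) | (tail >> 8)) & 0xFFFF
    return csum_sub(hw_csum, tail)


frame = bytes(range(1, 61))      # made-up 60-byte frame
hw_csum = csum_partial(frame)    # stands in for the value a NIC would report
trimmed = trim_complete_csum(frame, hw_csum, 46)
print(trimmed == csum_partial(frame[:46]))
```

Only the 14 trimmed bytes are summed here, rather than the 46 that are kept. The flip side is that the result is only as good as hw_csum: if the value reported by the hardware does not cover exactly the bytes the stack assumes, the subtraction carries that error along, whereas (per the commit message) the trim used to fall back to CHECKSUM_NONE and a full software validation.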

-- 
You are receiving this mail because:
You are the assignee for the bug.

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Eric Dumazet @ 2018-10-15 15:41 UTC
  To: Stephen Hemminger; +Cc: netdev, rossi.f

On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
>
>
> Begin forwarded message:
>
> Date: Sun, 14 Oct 2018 10:42:48 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 201423] New: eth0: hw csum failure
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201423
>
>             Bug ID: 201423
>            Summary: eth0: hw csum failure
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.19.0-rc7
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: stephen@networkplumber.org
>           Reporter: rossi.f@inwind.it
>         Regression: No
>
> I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the Ethernet
> ports. I get the following error message:
>
> [...]
>
> Something changed between 4.17.12 and 4.18. After bisecting the problem, I
> got the following first bad commit:
>
> commit 88078d98d1bb085d72af8437707279e203524fa5
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Apr 18 11:43:15 2018 -0700
>
>     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>
>     After working on IP defragmentation lately, I found that some large
>     packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>     zero paddings on the last (small) fragment.
>
>     While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>     to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>     fragments had CHECKSUM_COMPLETE set.
>
>     We can instead compute the checksum of the part we are trimming,
>     usually smaller than the part we keep.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>

Thanks for bisecting!

This commit is known to expose some NIC/driver bugs.

Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
("net: sungem: fix rx checksum support") for one driver needing a fix.

I assume SKY2_HW_NEW_LE is not set on your NIC?

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Dave Stevenson @ 2018-10-15 16:12 UTC
  To: edumazet
  Cc: stephen, netdev, rossi.f, Woojung Huh,
	Microchip Linux Driver Support, Steve Glendinning

Hi Eric.

On Mon, 15 Oct 2018 at 16:42, Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: Sun, 14 Oct 2018 10:42:48 +0000
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: stephen@networkplumber.org
> > Subject: [Bug 201423] New: eth0: hw csum failure
> >
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=201423
> >
> >             Bug ID: 201423
> >            Summary: eth0: hw csum failure
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 4.19.0-rc7
> >           Hardware: Intel
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: stephen@networkplumber.org
> >           Reporter: rossi.f@inwind.it
> >         Regression: No
> >
> > I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the Ethernet
> > ports. I get the following error message:
> >
> > [...]
> >
> > Something changed between 4.17.12 and 4.18. After bisecting the problem, I
> > got the following first bad commit:
> >
> > commit 88078d98d1bb085d72af8437707279e203524fa5
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Wed Apr 18 11:43:15 2018 -0700
> >
> >     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >
> >     After working on IP defragmentation lately, I found that some large
> >     packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >     zero paddings on the last (small) fragment.
> >
> >     While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >     to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >     fragments had CHECKSUM_COMPLETE set.
> >
> >     We can instead compute the checksum of the part we are trimming,
> >     usually smaller than the part we keep.
> >
> >     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
>
> Thanks for bisecting!
>
> This commit is known to expose some NIC/driver bugs.
>
> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> ("net: sungem: fix rx checksum support") for one driver needing a fix.
>
> I assume SKY2_HW_NEW_LE is not set on your NIC?

Just to say that we've also just hit this with both the LAN78xx and
SMSC9514 drivers, i.e. all Raspberry Pis with onboard Ethernet. Likewise,
that commit had been pinpointed as the cause, or at least as exposing an
underlying issue.
As the patch has been backported to 4.14.71, it's hitting LTS users too.

Thanks for the pointer on sungem. I'll have a look into what's going
on and see if we can sort it, although I have cc'ed the maintainers
of those chips in case they are already on the case.

Cheers.
  Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Bug 201423] New: eth0: hw csum failure
  2018-10-15 15:41 ` Eric Dumazet
  2018-10-15 16:12   ` Dave Stevenson
@ 2018-10-15 16:21   ` Stephen Hemminger
  2018-10-15 22:28   ` Fw: " Fabio Rossi
  2018-10-16  6:30   ` Andre Tomt
  3 siblings, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2018-10-15 16:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, rossi.f

On Mon, 15 Oct 2018 08:41:47 -0700
Eric Dumazet <edumazet@google.com> wrote:

> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: Sun, 14 Oct 2018 10:42:48 +0000
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: stephen@networkplumber.org
> > Subject: [Bug 201423] New: eth0: hw csum failure
> >
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=201423
> >
> >             Bug ID: 201423
> >            Summary: eth0: hw csum failure
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 4.19.0-rc7
> >           Hardware: Intel
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: stephen@networkplumber.org
> >           Reporter: rossi.f@inwind.it
> >         Regression: No
> >
> > I have a P6T DELUXE V2 motherboard and using the sky2 driver for the ethernet
> > ports. I get the following error message:
> >
> > [  433.727397] eth0: hw csum failure
> > [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> > [  433.727406] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  433.727407] Call Trace:
> > [  433.727409]  <IRQ>
> > [  433.727415]  dump_stack+0x46/0x5b
> > [  433.727419]  __skb_checksum_complete+0xb0/0xc0
> > [  433.727423]  tcp_v4_rcv+0x528/0xb60
> > [  433.727426]  ? ipt_do_table+0x2d0/0x400
> > [  433.727429]  ip_local_deliver_finish+0x5a/0x110
> > [  433.727430]  ip_local_deliver+0xe1/0xf0
> > [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  433.727432]  ip_rcv+0xca/0xe0
> > [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
> > [  433.727438]  netif_receive_skb_internal+0x4e/0x130
> > [  433.727439]  napi_gro_receive+0x6a/0x80
> > [  433.727442]  sky2_poll+0x707/0xd20
> > [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
> > [  433.727447]  net_rx_action+0x237/0x380
> > [  433.727449]  __do_softirq+0xdc/0x1e0
> > [  433.727452]  irq_exit+0xa9/0xb0
> > [  433.727453]  do_IRQ+0x45/0xc0
> > [  433.727455]  common_interrupt+0xf/0xf
> > [  433.727456]  </IRQ>
> > [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
> > [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
> > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
> > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> > [  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> > ffffffffffffffde
> > [  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> > 000000000000001f
> > [  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> > 0000000000000000
> > [  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
> > 000000650512105d
> > [  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
> > 00000064fc2a8b1c
> > [  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
> > ffffffff8204af20
> > [  433.727468]  ? cpuidle_enter_state+0x119/0x200
> > [  433.727471]  do_idle+0x1bf/0x200
> > [  433.727473]  cpu_startup_entry+0x6a/0x70
> > [  433.727475]  start_secondary+0x17f/0x1c0
> > [  433.727476]  secondary_startup_64+0xa4/0xb0
> > [  441.662954] eth0: hw csum failure
> > [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
> > [  441.662960] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  441.662960] Call Trace:
> > [  441.662963]  <IRQ>
> > [  441.662968]  dump_stack+0x46/0x5b
> > [  441.662972]  __skb_checksum_complete+0xb0/0xc0
> > [  441.662975]  tcp_v4_rcv+0x528/0xb60
> > [  441.662979]  ? ipt_do_table+0x2d0/0x400
> > [  441.662981]  ip_local_deliver_finish+0x5a/0x110
> > [  441.662983]  ip_local_deliver+0xe1/0xf0
> > [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  441.662986]  ip_rcv+0xca/0xe0
> > [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
> > [  441.662993]  netif_receive_skb_internal+0x4e/0x130
> > [  441.662994]  napi_gro_receive+0x6a/0x80
> > [  441.662998]  sky2_poll+0x707/0xd20
> > [  441.663000]  net_rx_action+0x237/0x380
> > [  441.663002]  __do_softirq+0xdc/0x1e0
> > [  441.663005]  irq_exit+0xa9/0xb0
> > [  441.663007]  do_IRQ+0x45/0xc0
> > [  441.663009]  common_interrupt+0xf/0xf
> > [  441.663010]  </IRQ>
> > [  441.663012] RIP: 0010:merge+0x22/0xb0
> > [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53
> > 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 c9
> > 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
> > [  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffffde
> > [  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
> > ffff88021ab2d408
> > [  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
> > 0000000000000000
> > [  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
> > 0000000000008500
> > [  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
> > ffffffffa021c440
> > [  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
> > ffffc9000090b9e0
> > [  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663065]  ? merge+0x57/0xb0
> > [  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663082]  list_sort+0x8b/0x230
> > [  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
> > [  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
> > [  441.663113]  ? __switch_to_asm+0x34/0x70
> > [  441.663114]  ? __switch_to_asm+0x40/0x70
> > [  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> > [  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
> > [  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
> > [  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> > [  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
> > [  441.663171]  do_vfs_ioctl+0x9a/0x600
> > [  441.663173]  ksys_ioctl+0x35/0x60
> > [  441.663175]  __x64_sys_ioctl+0x11/0x20
> > [  441.663177]  do_syscall_64+0x3d/0xf0
> > [  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  441.663180] RIP: 0033:0x7f9377377f37
> > [  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 ad
> > db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01
> > f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
> > [  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000010
> > [  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
> > 00007f9377377f37
> > [  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
> > 0000000000000010
> > [  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
> > 0000000000000000
> > [  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
> > 00000000c0206466
> > [  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
> > 0000564497a38120
> > [  462.833418] eth0: hw csum failure
> > [  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> > [  462.833429] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  462.833429] Call Trace:
> > [  462.833432]  <IRQ>
> > [  462.833438]  dump_stack+0x46/0x5b
> > [  462.833442]  __skb_checksum_complete+0xb0/0xc0
> > [  462.833446]  tcp_v4_rcv+0x528/0xb60
> > [  462.833449]  ? ipt_do_table+0x2d0/0x400
> > [  462.833452]  ip_local_deliver_finish+0x5a/0x110
> > [  462.833454]  ip_local_deliver+0xe1/0xf0
> > [  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  462.833457]  ip_rcv+0xca/0xe0
> > [  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
> > [  462.833464]  netif_receive_skb_internal+0x4e/0x130
> > [  462.833466]  napi_gro_receive+0x6a/0x80
> > [  462.833469]  sky2_poll+0x707/0xd20
> > [  462.833471]  net_rx_action+0x237/0x380
> > [  462.833474]  __do_softirq+0xdc/0x1e0
> > [  462.833477]  irq_exit+0xa9/0xb0
> > [  462.833479]  do_IRQ+0x45/0xc0
> > [  462.833481]  common_interrupt+0xf/0xf
> > [  462.833482]  </IRQ>
> > [  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
> > [  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
> > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
> > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> > [  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> > ffffffffffffffde
> > [  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> > 000000000000001f
> > [  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> > 0000000000000000
> > [  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
> > 0000000000000000
> > [  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
> > 0000006bc3052131
> > [  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
> > ffffffff8204af20
> > [  462.833498]  ? cpuidle_enter_state+0x119/0x200
> > [  462.833503]  do_idle+0x1bf/0x200
> > [  462.833506]  cpu_startup_entry+0x6a/0x70
> > [  462.833510]  start_secondary+0x17f/0x1c0
> > [  462.833513]  secondary_startup_64+0xa4/0xb0
> >
> > Something is changed between 4.17.12 and 4.18, after bisecting the problem I
> > got the following first bad commit:
> >
> > commit 88078d98d1bb085d72af8437707279e203524fa5
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Wed Apr 18 11:43:15 2018 -0700
> >
> >     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >
> >     After working on IP defragmentation lately, I found that some large
> >     packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >     zero paddings on the last (small) fragment.
> >
> >     While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >     to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >     fragments had CHECKSUM_COMPLETE set.
> >
> >     We can instead compute the checksum of the part we are trimming,
> >     usually smaller than the part we keep.
> >
> >     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >  
> 
> Thanks for bisecting !
> 
> This commit is known to expose some NIC/driver bugs.
> 
> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> 
> I assume SKY2_HW_NEW_LE is not set on your NIC ?

There are two variants of this chip: one does a 1's complement checksum, and
the other one does a TCP checksum. Maybe the 1's complement version is
incorrectly including the CRC.

Side note: I'm not sure why, but the driver only calls GRO for checksummed
packets. Is that necessary?
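
For readers following along, here is a rough userspace sketch of the
arithmetic being discussed: if the hardware's 1's complement sum also covers
trailing bytes (padding or a CRC), the sum over those bytes has to be
subtracted before the result can be trusted. This is not the sky2 driver
code or the kernel's pskb_trim_rcsum(); the helper names (csum_fold16,
csum_partial16, csum_sub16, csum_after_trim) are made up for illustration,
and the sketch assumes big-endian 16-bit words and a trim point at an even
offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries back in until the value fits in 16 bits (1's complement add). */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * 1's complement sum of len bytes, read as big-endian 16-bit words.
 * An odd trailing byte is treated as if padded with a zero byte.
 */
static uint16_t csum_partial16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = csum_fold16(sum + (((uint32_t)buf[i] << 8) | buf[i + 1]));
	if (len & 1)
		sum = csum_fold16(sum + ((uint32_t)buf[len - 1] << 8));
	return (uint16_t)sum;
}

/* 1's complement subtraction: a - b is a + ~b with end-around carry. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
	return csum_fold16((uint32_t)a + (uint16_t)~b);
}

/*
 * hw_sum is assumed to cover buf[0..full_len). Return the sum that covers
 * only buf[0..keep_len) by subtracting the sum of the trimmed tail.
 * Hypothetical helper: keep_len must be even, otherwise the tail words
 * would be misaligned and need a byte swap. Note that 0x0000 and 0xffff
 * both represent zero in 1's complement arithmetic.
 */
static uint16_t csum_after_trim(uint16_t hw_sum, const uint8_t *buf,
				size_t full_len, size_t keep_len)
{
	assert(keep_len <= full_len && keep_len % 2 == 0);
	return csum_sub16(hw_sum,
			  csum_partial16(buf + keep_len, full_len - keep_len));
}
```

With a buffer whose last four bytes stand in for a CRC, passing the sum over
the whole buffer to csum_after_trim() should give the same value as summing
only the leading bytes directly. If the driver reports the untrimmed sum as
if it covered just the payload, the stack's later validation would be
expected to fail in the way the traces show.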

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-15 15:41 ` Eric Dumazet
  2018-10-15 16:12   ` Dave Stevenson
  2018-10-15 16:21   ` Stephen Hemminger
@ 2018-10-15 22:28   ` Fabio Rossi
  2018-10-16  6:30   ` Andre Tomt
  3 siblings, 0 replies; 22+ messages in thread
From: Fabio Rossi @ 2018-10-15 22:28 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger; +Cc: netdev



On 15 October 2018 17:41:47 CEST, Eric Dumazet <edumazet@google.com> wrote:
>On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
><stephen@networkplumber.org> wrote:
>>
>>
>>
>> Begin forwarded message:
>>
>> Date: Sun, 14 Oct 2018 10:42:48 +0000
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: stephen@networkplumber.org
>> Subject: [Bug 201423] New: eth0: hw csum failure
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=201423
>>
>>             Bug ID: 201423
>>            Summary: eth0: hw csum failure
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 4.19.0-rc7
>>           Hardware: Intel
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>           Assignee: stephen@networkplumber.org
>>           Reporter: rossi.f@inwind.it
>>         Regression: No
>>
>> I have a P6T DELUXE V2 motherboard and using the sky2 driver for the
>ethernet
>> ports. I get the following error message:
>>
>> [  433.727397] eth0: hw csum failure
>> [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7
>#19
>> [  433.727406] Hardware name: System manufacturer System Product
>Name/P6T
>> DELUXE V2, BIOS 1202    12/22/2010
>> [  433.727407] Call Trace:
>> [  433.727409]  <IRQ>
>> [  433.727415]  dump_stack+0x46/0x5b
>> [  433.727419]  __skb_checksum_complete+0xb0/0xc0
>> [  433.727423]  tcp_v4_rcv+0x528/0xb60
>> [  433.727426]  ? ipt_do_table+0x2d0/0x400
>> [  433.727429]  ip_local_deliver_finish+0x5a/0x110
>> [  433.727430]  ip_local_deliver+0xe1/0xf0
>> [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  433.727432]  ip_rcv+0xca/0xe0
>> [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
>> [  433.727438]  netif_receive_skb_internal+0x4e/0x130
>> [  433.727439]  napi_gro_receive+0x6a/0x80
>> [  433.727442]  sky2_poll+0x707/0xd20
>> [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
>> [  433.727447]  net_rx_action+0x237/0x380
>> [  433.727449]  __do_softirq+0xdc/0x1e0
>> [  433.727452]  irq_exit+0xa9/0xb0
>> [  433.727453]  do_IRQ+0x45/0xc0
>> [  433.727455]  common_interrupt+0xf/0xf
>> [  433.727456]  </IRQ>
>> [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
>> [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e
>e8 d1 8f
>> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20
><4c> 89 e1
>> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
>> [  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
>> ffffffffffffffde
>> [  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
>> 000000000000001f
>> [  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
>> 0000000000000000
>> [  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
>> 000000650512105d
>> [  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
>> 00000064fc2a8b1c
>> [  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
>> ffffffff8204af20
>> [  433.727468]  ? cpuidle_enter_state+0x119/0x200
>> [  433.727471]  do_idle+0x1bf/0x200
>> [  433.727473]  cpu_startup_entry+0x6a/0x70
>> [  433.727475]  start_secondary+0x17f/0x1c0
>> [  433.727476]  secondary_startup_64+0xa4/0xb0
>> [  441.662954] eth0: hw csum failure
>> [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted
>4.19.0-rc7 #19
>> [  441.662960] Hardware name: System manufacturer System Product
>Name/P6T
>> DELUXE V2, BIOS 1202    12/22/2010
>> [  441.662960] Call Trace:
>> [  441.662963]  <IRQ>
>> [  441.662968]  dump_stack+0x46/0x5b
>> [  441.662972]  __skb_checksum_complete+0xb0/0xc0
>> [  441.662975]  tcp_v4_rcv+0x528/0xb60
>> [  441.662979]  ? ipt_do_table+0x2d0/0x400
>> [  441.662981]  ip_local_deliver_finish+0x5a/0x110
>> [  441.662983]  ip_local_deliver+0xe1/0xf0
>> [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  441.662986]  ip_rcv+0xca/0xe0
>> [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
>> [  441.662993]  netif_receive_skb_internal+0x4e/0x130
>> [  441.662994]  napi_gro_receive+0x6a/0x80
>> [  441.662998]  sky2_poll+0x707/0xd20
>> [  441.663000]  net_rx_action+0x237/0x380
>> [  441.663002]  __do_softirq+0xdc/0x1e0
>> [  441.663005]  irq_exit+0xa9/0xb0
>> [  441.663007]  do_IRQ+0x45/0xc0
>> [  441.663009]  common_interrupt+0xf/0xf
>> [  441.663010]  </IRQ>
>> [  441.663012] RIP: 0010:merge+0x22/0xb0
>> [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48
>89 d5 53
>> 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
><48> 85 c9
>> 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
>> [  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
>> ffffffffffffffde
>> [  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
>> ffff88021ab2d408
>> [  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
>> 0000000000000000
>> [  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
>> 0000000000008500
>> [  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
>> ffffffffa021c440
>> [  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
>> ffffc9000090b9e0
>> [  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120
>[radeon]
>> [  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120
>[radeon]
>> [  441.663065]  ? merge+0x57/0xb0
>> [  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120
>[radeon]
>> [  441.663082]  list_sort+0x8b/0x230
>> [  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
>> [  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
>> [  441.663113]  ? __switch_to_asm+0x34/0x70
>> [  441.663114]  ? __switch_to_asm+0x40/0x70
>> [  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
>> [  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
>> [  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
>> [  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
>> [  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
>> [  441.663171]  do_vfs_ioctl+0x9a/0x600
>> [  441.663173]  ksys_ioctl+0x35/0x60
>> [  441.663175]  __x64_sys_ioctl+0x11/0x20
>> [  441.663177]  do_syscall_64+0x3d/0xf0
>> [  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  441.663180] RIP: 0033:0x7f9377377f37
>> [  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18
>c3 e8 ad
>> db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05
><48> 3d 01
>> f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
>> [  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
>> 0000000000000010
>> [  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
>> 00007f9377377f37
>> [  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
>> 0000000000000010
>> [  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
>> 0000000000000000
>> [  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
>> 00000000c0206466
>> [  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
>> 0000564497a38120
>> [  462.833418] eth0: hw csum failure
>> [  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7
>#19
>> [  462.833429] Hardware name: System manufacturer System Product
>Name/P6T
>> DELUXE V2, BIOS 1202    12/22/2010
>> [  462.833429] Call Trace:
>> [  462.833432]  <IRQ>
>> [  462.833438]  dump_stack+0x46/0x5b
>> [  462.833442]  __skb_checksum_complete+0xb0/0xc0
>> [  462.833446]  tcp_v4_rcv+0x528/0xb60
>> [  462.833449]  ? ipt_do_table+0x2d0/0x400
>> [  462.833452]  ip_local_deliver_finish+0x5a/0x110
>> [  462.833454]  ip_local_deliver+0xe1/0xf0
>> [  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  462.833457]  ip_rcv+0xca/0xe0
>> [  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
>> [  462.833464]  netif_receive_skb_internal+0x4e/0x130
>> [  462.833466]  napi_gro_receive+0x6a/0x80
>> [  462.833469]  sky2_poll+0x707/0xd20
>> [  462.833471]  net_rx_action+0x237/0x380
>> [  462.833474]  __do_softirq+0xdc/0x1e0
>> [  462.833477]  irq_exit+0xa9/0xb0
>> [  462.833479]  do_IRQ+0x45/0xc0
>> [  462.833481]  common_interrupt+0xf/0xf
>> [  462.833482]  </IRQ>
>> [  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
>> [  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e
>e8 d1 8f
>> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20
><4c> 89 e1
>> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
>> [  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
>> ffffffffffffffde
>> [  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
>> 000000000000001f
>> [  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
>> 0000000000000000
>> [  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
>> 0000000000000000
>> [  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
>> 0000006bc3052131
>> [  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
>> ffffffff8204af20
>> [  462.833498]  ? cpuidle_enter_state+0x119/0x200
>> [  462.833503]  do_idle+0x1bf/0x200
>> [  462.833506]  cpu_startup_entry+0x6a/0x70
>> [  462.833510]  start_secondary+0x17f/0x1c0
>> [  462.833513]  secondary_startup_64+0xa4/0xb0
>>
>> Something is changed between 4.17.12 and 4.18, after bisecting the
>problem I
>> got the following first bad commit:
>>
>> commit 88078d98d1bb085d72af8437707279e203524fa5
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>
>>     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>
>>     After working on IP defragmentation lately, I found that some
>large
>>     packets defeat CHECKSUM_COMPLETE optimization because of NIC
>adding
>>     zero paddings on the last (small) fragment.
>>
>>     While removing the padding with pskb_trim_rcsum(), we set
>skb->ip_summed
>>     to CHECKSUM_NONE, forcing a full csum validation, even if all
>prior
>>     fragments had CHECKSUM_COMPLETE set.
>>
>>     We can instead compute the checksum of the part we are trimming,
>>     usually smaller than the part we keep.
>>
>>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>
>Thanks for bisecting !
>
>This commit is known to expose some NIC/driver bugs.
>
>Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
>("net: sungem: fix rx checksum support")  for one driver needing a fix.
>
>I assume SKY2_HW_NEW_LE is not set on your NIC ?

Here is the chip found on my motherboard, so it seems that flag is not set:

# dmesg | grep sky2

[    0.545661] sky2: driver version 1.30
[    0.545781] sky2 0000:06:00.0: Yukon-2 EC Ultra chip revision 3
[    0.546067] sky2 0000:06:00.0 eth0: addr 00:24:8c:xx:xx:xx
[    0.546188] sky2 0000:04:00.0: Yukon-2 EC Ultra chip revision 3
[    0.546484] sky2 0000:04:00.0 eth1: addr 00:24:8c:xx:xx:xx
[   38.043074] sky2 0000:06:00.0 eth0: enabling interface
[   39.842161] sky2 0000:06:00.0 eth0: Link is up at 100 Mbps, full duplex, flow control rx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-15 15:41 ` Eric Dumazet
                     ` (2 preceding siblings ...)
  2018-10-15 22:28   ` Fw: " Fabio Rossi
@ 2018-10-16  6:30   ` Andre Tomt
  2018-10-16 13:00     ` Eric Dumazet
  3 siblings, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-16  6:30 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger; +Cc: netdev, rossi.f

On 15.10.2018 17:41, Eric Dumazet wrote:
> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
>> Something is changed between 4.17.12 and 4.18, after bisecting the problem I
>> got the following first bad commit:
>>
>> commit 88078d98d1bb085d72af8437707279e203524fa5
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>
>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>
>>      After working on IP defragmentation lately, I found that some large
>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>>      zero paddings on the last (small) fragment.
>>
>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>>      fragments had CHECKSUM_COMPLETE set.
>>
>>      We can instead compute the checksum of the part we are trimming,
>>      usually smaller than the part we keep.
>>
>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>
> 
> Thanks for bisecting !
> 
> This commit is known to expose some NIC/driver bugs.
> 
> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> 
> I assume SKY2_HW_NEW_LE is not set on your NIC ?
> 

I've seen something similar on several systems with mlx4 cards when using
4.18.x, that is, a hw csum failure followed by a backtrace.

Only seems to happen on systems dealing with quite a bit of UDP.

Example from 4.18.10:
> [635607.740574] p0xe0: hw csum failure
> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [635607.740599] Call Trace:
> [635607.740602]  <IRQ>
> [635607.740611]  dump_stack+0x5c/0x7b
> [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
> [635607.740621]  udp6_gro_receive+0x211/0x290
> [635607.740624]  ipv6_gro_receive+0x1a8/0x390
> [635607.740627]  dev_gro_receive+0x33e/0x550
> [635607.740628]  napi_gro_frags+0xa2/0x210
> [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [635607.740658]  net_rx_action+0xe0/0x2e0
> [635607.740662]  __do_softirq+0xd8/0x2e5
> [635607.740666]  irq_exit+0xb4/0xc0
> [635607.740667]  do_IRQ+0x85/0xd0
> [635607.740670]  common_interrupt+0xf/0xf
> [635607.740671]  </IRQ>
> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 
> [635607.740701] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> [635607.740703] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> [635607.740703] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> [635607.740704] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> [635607.740705] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> [635607.740706] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
> [635607.740712]  do_idle+0x1d0/0x240
> [635607.740715]  cpu_startup_entry+0x5f/0x70
> [635607.740719]  start_secondary+0x185/0x1a0
> [635607.740722]  secondary_startup_64+0xa5/0xb0
> [635607.740731] p0xe0: hw csum failure
> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [635607.740746] Call Trace:
> [635607.740747]  <IRQ>
> [635607.740750]  dump_stack+0x5c/0x7b
> [635607.740755]  __skb_checksum_complete+0xb8/0xd0
> [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
> [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> [635607.740774]  ip6_input_finish+0xc0/0x460
> [635607.740776]  ip6_input+0x2b/0x90
> [635607.740778]  ? ip6_rcv_finish+0x110/0x110
> [635607.740780]  ipv6_rcv+0x2cd/0x4b0
> [635607.740783]  ? udp6_lib_lookup_skb+0x59/0x80
> [635607.740785]  __netif_receive_skb_core+0x455/0xb30
> [635607.740788]  ? ipv6_gro_receive+0x1a8/0x390
> [635607.740790]  ? netif_receive_skb_internal+0x24/0xb0
> [635607.740792]  netif_receive_skb_internal+0x24/0xb0
> [635607.740793]  napi_gro_frags+0x165/0x210
> [635607.740796]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [635607.740802]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [635607.740807]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [635607.740810]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [635607.740811]  net_rx_action+0xe0/0x2e0
> [635607.740813]  __do_softirq+0xd8/0x2e5
> [635607.740816]  irq_exit+0xb4/0xc0
> [635607.740817]  do_IRQ+0x85/0xd0
> [635607.740820]  common_interrupt+0xf/0xf
> [635607.740821]  </IRQ>
> [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 
> [635607.740848] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> [635607.740849] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> [635607.740850] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> [635607.740851] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> [635607.740852] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> [635607.740853] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> [635607.740855]  ? cpuidle_enter_state+0x91/0x2a0
> [635607.740857]  do_idle+0x1d0/0x240
> [635607.740859]  cpu_startup_entry+0x5f/0x70
> [635607.740861]  start_secondary+0x185/0x1a0
> [635607.740863]  secondary_startup_64+0xa5/0xb0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-16  6:30   ` Andre Tomt
@ 2018-10-16 13:00     ` Eric Dumazet
  2018-10-19 21:58       ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2018-10-16 13:00 UTC (permalink / raw)
  To: andre; +Cc: Stephen Hemminger, netdev, rossi.f

On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>
> On 15.10.2018 17:41, Eric Dumazet wrote:
> > On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> >> Something is changed between 4.17.12 and 4.18, after bisecting the problem I
> >> got the following first bad commit:
> >>
> >> commit 88078d98d1bb085d72af8437707279e203524fa5
> >> Author: Eric Dumazet <edumazet@google.com>
> >> Date:   Wed Apr 18 11:43:15 2018 -0700
> >>
> >>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >>
> >>      After working on IP defragmentation lately, I found that some large
> >>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >>      zero paddings on the last (small) fragment.
> >>
> >>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >>      fragments had CHECKSUM_COMPLETE set.
> >>
> >>      We can instead compute the checksum of the part we are trimming,
> >>      usually smaller than the part we keep.
> >>
> >>      Signed-off-by: Eric Dumazet <edumazet@google.com>
> >>      Signed-off-by: David S. Miller <davem@davemloft.net>
> >>
> >
> > Thanks for bisecting !
> >
> > This commit is known to expose some NIC/driver bugs.
> >
> > Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> > ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> >
> > I assume SKY2_HW_NEW_LE is not set on your NIC ?
> >
>
> I've seen similar on several systems with mlx4 cards when using 4.18.x -
> that is hw csum failure followed by some backtrace.
>
> Only seems to happen on systems dealing with quite a bit of UDP.
>

Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
but CHECKSUM_UNNECESSARY.

It would be nice to track this a bit further, maybe by providing the
full packet content.

> Example from 4.18.10:
> > [635607.740574] p0xe0: hw csum failure
> > [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> > [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> > [635607.740599] Call Trace:
> > [635607.740602]  <IRQ>
> > [635607.740611]  dump_stack+0x5c/0x7b
> > [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
> > [635607.740621]  udp6_gro_receive+0x211/0x290
> > [635607.740624]  ipv6_gro_receive+0x1a8/0x390
> > [635607.740627]  dev_gro_receive+0x33e/0x550
> > [635607.740628]  napi_gro_frags+0xa2/0x210
> > [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> > [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> > [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> > [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> > [635607.740658]  net_rx_action+0xe0/0x2e0
> > [635607.740662]  __do_softirq+0xd8/0x2e5
> > [635607.740666]  irq_exit+0xb4/0xc0
> > [635607.740667]  do_IRQ+0x85/0xd0
> > [635607.740670]  common_interrupt+0xf/0xf
> > [635607.740671]  </IRQ>
> > [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> > [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> > [635607.740701] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> > [635607.740703] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> > [635607.740703] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> > [635607.740704] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> > [635607.740705] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> > [635607.740706] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> > [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
> > [635607.740712]  do_idle+0x1d0/0x240
> > [635607.740715]  cpu_startup_entry+0x5f/0x70
> > [635607.740719]  start_secondary+0x185/0x1a0
> > [635607.740722]  secondary_startup_64+0xa5/0xb0
> > [635607.740731] p0xe0: hw csum failure
> > [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> > [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> > [635607.740746] Call Trace:
> > [635607.740747]  <IRQ>
> > [635607.740750]  dump_stack+0x5c/0x7b
> > [635607.740755]  __skb_checksum_complete+0xb8/0xd0
> > [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
> > [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> > [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> > [635607.740774]  ip6_input_finish+0xc0/0x460
> > [635607.740776]  ip6_input+0x2b/0x90
> > [635607.740778]  ? ip6_rcv_finish+0x110/0x110
> > [635607.740780]  ipv6_rcv+0x2cd/0x4b0
> > [635607.740783]  ? udp6_lib_lookup_skb+0x59/0x80
> > [635607.740785]  __netif_receive_skb_core+0x455/0xb30
> > [635607.740788]  ? ipv6_gro_receive+0x1a8/0x390
> > [635607.740790]  ? netif_receive_skb_internal+0x24/0xb0
> > [635607.740792]  netif_receive_skb_internal+0x24/0xb0
> > [635607.740793]  napi_gro_frags+0x165/0x210
> > [635607.740796]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> > [635607.740802]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> > [635607.740807]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> > [635607.740810]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> > [635607.740811]  net_rx_action+0xe0/0x2e0
> > [635607.740813]  __do_softirq+0xd8/0x2e5
> > [635607.740816]  irq_exit+0xb4/0xc0
> > [635607.740817]  do_IRQ+0x85/0xd0
> > [635607.740820]  common_interrupt+0xf/0xf
> > [635607.740821]  </IRQ>
> > [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> > [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> > [635607.740848] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> > [635607.740849] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> > [635607.740850] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> > [635607.740851] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> > [635607.740852] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> > [635607.740853] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> > [635607.740855]  ? cpuidle_enter_state+0x91/0x2a0
> > [635607.740857]  do_idle+0x1d0/0x240
> > [635607.740859]  cpu_startup_entry+0x5f/0x70
> > [635607.740861]  start_secondary+0x185/0x1a0
> > [635607.740863]  secondary_startup_64+0xa5/0xb0


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-16 13:00     ` Eric Dumazet
@ 2018-10-19 21:58       ` Eric Dumazet
  2018-10-19 22:25         ` Eric Dumazet
  2018-10-31  0:25         ` Fabio Rossi
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2018-10-19 21:58 UTC (permalink / raw)
  To: Eric Dumazet, andre
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis



On 10/16/2018 06:00 AM, Eric Dumazet wrote:
> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>>
>> On 15.10.2018 17:41, Eric Dumazet wrote:
>>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
>>>> Something changed between 4.17.12 and 4.18. After bisecting the problem, I
>>>> got the following first bad commit:
>>>>
>>>> commit 88078d98d1bb085d72af8437707279e203524fa5
>>>> Author: Eric Dumazet <edumazet@google.com>
>>>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>>>
>>>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>>>
>>>>      After working on IP defragmentation lately, I found that some large
>>>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>>>>      zero paddings on the last (small) fragment.
>>>>
>>>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>>>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>>>>      fragments had CHECKSUM_COMPLETE set.
>>>>
>>>>      We can instead compute the checksum of the part we are trimming,
>>>>      usually smaller than the part we keep.
>>>>
>>>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>
>>> Thanks for bisecting!
>>>
>>> This commit is known to expose some NIC/driver bugs.
>>>
>>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
>>> ("net: sungem: fix rx checksum support") for one driver needing a fix.
>>>
>>> I assume SKY2_HW_NEW_LE is not set on your NIC?
>>>
>>
>> I've seen similar on several systems with mlx4 cards when using 4.18.x -
>> that is hw csum failure followed by some backtrace.
>>
>> Only seems to happen on systems dealing with quite a bit of UDP.
>>
> 
> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
> but CHECKSUM_UNNECESSARY.
>
> It would be nice to track this a bit further, maybe by providing the
> full packet content.
> 
>> Example from 4.18.10:
>>> <snip>

As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub().

The problem comes from trimming an odd number of bytes.


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-19 21:58       ` Eric Dumazet
@ 2018-10-19 22:25         ` Eric Dumazet
  2018-10-21 13:34           ` Andre Tomt
  2018-10-31  0:25         ` Fabio Rossi
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2018-10-19 22:25 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet, andre
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis



On 10/19/2018 02:58 PM, Eric Dumazet wrote:
> 
> 
> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
>> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>>>
>>> On 15.10.2018 17:41, Eric Dumazet wrote:
>>>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
>>>>> Something changed between 4.17.12 and 4.18. After bisecting the problem, I
>>>>> got the following first bad commit:
>>>>>
>>>>> commit 88078d98d1bb085d72af8437707279e203524fa5
>>>>> Author: Eric Dumazet <edumazet@google.com>
>>>>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>>>>
>>>>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>>>>
>>>>>      After working on IP defragmentation lately, I found that some large
>>>>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>>>>>      zero paddings on the last (small) fragment.
>>>>>
>>>>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>>>>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>>>>>      fragments had CHECKSUM_COMPLETE set.
>>>>>
>>>>>      We can instead compute the checksum of the part we are trimming,
>>>>>      usually smaller than the part we keep.
>>>>>
>>>>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>>
>>>>
>>>> Thanks for bisecting!
>>>>
>>>> This commit is known to expose some NIC/driver bugs.
>>>>
>>>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
>>>> ("net: sungem: fix rx checksum support") for one driver needing a fix.
>>>>
>>>> I assume SKY2_HW_NEW_LE is not set on your NIC?
>>>>
>>>
>>> I've seen similar on several systems with mlx4 cards when using 4.18.x -
>>> that is hw csum failure followed by some backtrace.
>>>
>>> Only seems to happen on systems dealing with quite a bit of UDP.
>>>
>>
>> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
>> but CHECKSUM_UNNECESSARY.
>>
>> It would be nice to track this a bit further, maybe by providing the
>> full packet content.
>>
>>> Example from 4.18.10:
>>>> <snip>
> 
> As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub().
>
> The problem comes from trimming an odd number of bytes.

More exactly, trimming bytes starting at an odd offset.
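
To illustrate why the starting offset matters, here is a minimal user-space sketch. It is not the kernel code and not the actual fix. It assumes a simplified 16-bit one's-complement checksum over big-endian words, and the names csum_sketch(), csum_sub_naive() and csum_block_sub_sketch() are hypothetical helpers made up for this illustration; only csum_block_sub() itself is mentioned in this thread. The idea: a block that starts at an odd offset has its bytes in the opposite halves of the packet's 16-bit words, so its separately computed sum has to be byte-swapped before it is subtracted from the full-packet sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator into 16 bits with end-around carry. */
static uint16_t fold_sketch(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * One's-complement sum of buf, read as big-endian 16-bit words.
 * Word boundaries are relative to buf itself; a trailing odd byte
 * is treated as the high byte of a zero-padded word.
 */
static uint16_t csum_sketch(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return fold_sketch(sum);
}

/* Naive subtraction: ignores where the block sat inside the packet. */
static uint16_t csum_sub_naive(uint16_t total, uint16_t part)
{
	return fold_sketch((uint64_t)total + (uint16_t)~part);
}

/*
 * Offset-aware subtraction (hypothetical helper): if the block began
 * at an odd offset, swap the bytes of its sum first so that they line
 * up with the word boundaries of the full packet.
 */
static uint16_t csum_block_sub_sketch(uint16_t total, uint16_t part,
				      size_t offset)
{
	if (offset & 1)
		part = (uint16_t)(part << 8 | part >> 8);
	return csum_sub_naive(total, part);
}
```

For the 9-byte buffer 45 00 00 1c ab 01 02 03 04, trimming the 4 bytes that start at offset 5 should leave the checksum of the first 5 bytes, 0xf01c. csum_block_sub_sketch() gives exactly that, while csum_sub_naive() gives 0xf21a. Trimming from the even offset 4 works with either helper. As far as I understand it, this alignment correction is what the offset argument of csum_block_sub() is for.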


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-19 22:25         ` Eric Dumazet
@ 2018-10-21 13:34           ` Andre Tomt
  2018-10-24 19:41             ` Andre Tomt
  0 siblings, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-21 13:34 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 20.10.2018 00:25, Eric Dumazet wrote:
> On 10/19/2018 02:58 PM, Eric Dumazet wrote:
>> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
>>> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>>>> I've seen similar on several systems with mlx4 cards when using 4.18.x -
>>>> that is hw csum failure followed by some backtrace.
>>>>
>>>> Only seems to happen on systems dealing with quite a bit of UDP.
>>>>
>>>
>>> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
>>> but CHECKSUM_UNNECESSARY.
>>>
>>> It would be nice to track this a bit further, maybe by providing the
>>> full packet content.
>>>
<snip>
>>
>> As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub().
>>
>> The problem comes from trimming an odd number of bytes.
> 
> More exactly, trimming bytes starting at an odd offset.

No hw csum failures here since I deployed Dimitris' fix on top of 4.18.16
32 hours ago.

Thanks


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-21 13:34           ` Andre Tomt
@ 2018-10-24 19:41             ` Andre Tomt
  2018-10-25 17:38               ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-24 19:41 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 21.10.2018 15:34, Andre Tomt wrote:
> On 20.10.2018 00:25, Eric Dumazet wrote:
>> On 10/19/2018 02:58 PM, Eric Dumazet wrote:
>>> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
>>>> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>>>>> I've seen similar on several systems with mlx4 cards when using 
>>>>> 4.18.x -
>>>>> that is hw csum failure followed by some backtrace.
>>>>>
>>>>> Only seems to happen on systems dealing with quite a bit of UDP.
>>>>>
>>>>
>>>> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
>>>> but CHECKSUM_UNNECESSARY.
>>>>
>>>> It would be nice to track this a bit further, maybe by providing the
>>>> full packet content.
>>>>
> <snip>
>>>
>>> As a matter of fact, Dimitris found the issue in the patch and is
>>> working on a fix involving csum_block_sub().
>>>
>>> The problem comes from trimming an odd number of bytes.
>>
>> More exactly, trimming bytes starting at an odd offset.
> 
> No hw csum failures here since I deployed Dimitris' fix on top of 4.18.16
> 32 hours ago.
> 
> Thanks

It eventually showed up again with mlx4, on 4.18.16 + fix and also on 
4.19. I still do not have a useful packet capture.

It is running a torrent client serving up various Linux distributions.

> [116116.994519] p0xe0: hw csum failure
> [116116.994550] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.0-1 #1
> [116116.994551] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [116116.994555] Call Trace:
> [116116.994558]  <IRQ>
> [116116.994567]  dump_stack+0x5c/0x7b
> [116116.994574]  __skb_gro_checksum_complete+0x9a/0xa0
> [116116.994580]  udp6_gro_receive+0x211/0x290
> [116116.994585]  ipv6_gro_receive+0x1b1/0x3a0
> [116116.994588]  dev_gro_receive+0x3a0/0x620
> [116116.994590]  ? __build_skb+0x25/0xe0
> [116116.994592]  napi_gro_frags+0xa8/0x220
> [116116.994598]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [116116.994611]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [116116.994621]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [116116.994629]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [116116.994635]  net_rx_action+0xe0/0x2e0
> [116116.994641]  __do_softirq+0xd8/0x2ff
> [116116.994646]  irq_exit+0xbd/0xd0
> [116116.994650]  do_IRQ+0x85/0xd0
> [116116.994656]  common_interrupt+0xf/0xf
> [116116.994659]  </IRQ>
> [116116.994665] RIP: 0010:cpuidle_enter_state+0xb3/0x310
> [116116.994668] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> [116116.994669] RSP: 0018:ffff924a0635bea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
> [116116.994671] RAX: ffff9016ffb60fc0 RBX: 0000699b9835d616 RCX: 000000000000001f
> [116116.994673] RDX: 0000699b9835d616 RSI: 00000000229837f7 RDI: 0000000000000000
> [116116.994674] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000020840
> [116116.994675] R10: ffff924a0635be88 R11: 0000000000000367 R12: ffff9016ffb69aa8
> [116116.994676] R13: ffffffffa50ac638 R14: 0000000000000000 R15: 0000699b981c63b9
> [116116.994680]  ? cpuidle_enter_state+0x90/0x310
> [116116.994685]  do_idle+0x1d0/0x240
> [116116.994687]  cpu_startup_entry+0x5f/0x70
> [116116.994690]  start_secondary+0x185/0x1a0
> [116116.994693]  secondary_startup_64+0xa4/0xb0
> [116116.994709] p0xe0: hw csum failure
> [116116.994739] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.0-1 #1
> [116116.994740] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [116116.994741] Call Trace:
> [116116.994743]  <IRQ>
> [116116.994746]  dump_stack+0x5c/0x7b
> [116116.994751]  __skb_checksum_complete+0xb8/0xd0
> [116116.994755]  __udp6_lib_rcv+0xa0e/0xa20
> [116116.994764]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> [116116.994768]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> [116116.994771]  ip6_input_finish+0xc0/0x460
> [116116.994774]  ip6_input+0x2b/0x90
> [116116.994776]  ? ip6_make_skb+0x1b0/0x1b0
> [116116.994778]  ipv6_rcv+0x54/0xb0
> [116116.994781]  __netif_receive_skb_one_core+0x42/0x50
> [116116.994784]  netif_receive_skb_internal+0x24/0xb0
> [116116.994786]  napi_gro_frags+0x171/0x220
> [116116.994790]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [116116.994798]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [116116.994803]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [116116.994806]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [116116.994808]  net_rx_action+0xe0/0x2e0
> [116116.994810]  __do_softirq+0xd8/0x2ff
> [116116.994812]  irq_exit+0xbd/0xd0
> [116116.994814]  do_IRQ+0x85/0xd0
> [116116.994816]  common_interrupt+0xf/0xf
> [116116.994818]  </IRQ>
> [116116.994821] RIP: 0010:cpuidle_enter_state+0xb3/0x310
> [116116.994823] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> [116116.994824] RSP: 0018:ffff924a0635bea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
> [116116.994825] RAX: ffff9016ffb60fc0 RBX: 0000699b9835d616 RCX: 000000000000001f
> [116116.994826] RDX: 0000699b9835d616 RSI: 00000000229837f7 RDI: 0000000000000000
> [116116.994827] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000020840
> [116116.994828] R10: ffff924a0635be88 R11: 0000000000000367 R12: ffff9016ffb69aa8
> [116116.994829] R13: ffffffffa50ac638 R14: 0000000000000000 R15: 0000699b981c63b9
> [116116.994832]  ? cpuidle_enter_state+0x90/0x310
> [116116.994835]  do_idle+0x1d0/0x240
> [116116.994837]  cpu_startup_entry+0x5f/0x70
> [116116.994838]  start_secondary+0x185/0x1a0
> [116116.994840]  secondary_startup_64+0xa4/0xb0


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-24 19:41             ` Andre Tomt
@ 2018-10-25 17:38               ` Eric Dumazet
  2018-10-26 11:45                 ` Andre Tomt
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2018-10-25 17:38 UTC (permalink / raw)
  To: Andre Tomt, Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis



On 10/24/2018 12:41 PM, Andre Tomt wrote:
> 
> It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture.
> 
> It is running a torrent client serving up various Linux distributions.
>

Have you also applied this fix?

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-25 17:38               ` Eric Dumazet
@ 2018-10-26 11:45                 ` Andre Tomt
  2018-10-26 12:38                   ` Andre Tomt
  2018-10-27 21:41                   ` Andre Tomt
  0 siblings, 2 replies; 22+ messages in thread
From: Andre Tomt @ 2018-10-26 11:45 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 25.10.2018 19:38, Eric Dumazet wrote:
> 
> 
> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>
>> It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture.
>>
>> It is running a torrent client serving up various Linux distributions.
>>
> 
> Have you also applied this fix?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
> 

No. I've applied it now to 4.19 and will report back if anything shows up.


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-26 11:45                 ` Andre Tomt
@ 2018-10-26 12:38                   ` Andre Tomt
  2018-10-26 12:59                     ` Eric Dumazet
  2018-10-27 21:41                   ` Andre Tomt
  1 sibling, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-26 12:38 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 26.10.2018 13:45, Andre Tomt wrote:
> On 25.10.2018 19:38, Eric Dumazet wrote:
>>
>>
>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>
>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also on 
>>> 4.19. I still do not have a useful packet capture.
>>>
>>> It is running a torrent client serving up various Linux distributions.
>>>
>>
>> Have you also applied this fix?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>
>>
> 
> No. I've applied it now to 4.19 and will report back if anything shows up.

And it tripped again with that commit applied, though on another box with a
much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4
tunnel, VF device on mlx4):

> [ 8197.348260] wanib: hw csum failure
> [ 8197.348288] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-1 #1
> [ 8197.348289] Hardware name: Supermicro SYS-5018D-FN8T/X10SDV-TP8F, BIOS 1.3 03/19/2018
> [ 8197.348290] Call Trace:
> [ 8197.348296]  <IRQ>
> [ 8197.348304]  dump_stack+0x5c/0x80
> [ 8197.348308]  __skb_checksum_complete+0xac/0xc0
> [ 8197.348318]  icmp_error+0x1c8/0x1f0 [nf_conntrack]
> [ 8197.348325]  ? ip_output+0x61/0xc0
> [ 8197.348328]  ? skb_copy_bits+0x13d/0x220
> [ 8197.348334]  nf_conntrack_in+0xd8/0x390 [nf_conntrack]
> [ 8197.348339]  ? ___pskb_trim+0x192/0x330
> [ 8197.348343]  nf_hook_slow+0x43/0xc0
> [ 8197.348346]  ip_rcv+0x90/0xb0
> [ 8197.348349]  ? ip_rcv_finish_core.isra.0+0x310/0x310
> [ 8197.348354]  __netif_receive_skb_one_core+0x42/0x50
> [ 8197.348357]  netif_receive_skb_internal+0x24/0xb0
> [ 8197.348361]  ifb_ri_tasklet+0x167/0x260 [ifb]
> [ 8197.348365]  tasklet_action_common.isra.3+0x49/0xb0
> [ 8197.348369]  __do_softirq+0xe7/0x2d3
> [ 8197.348372]  irq_exit+0x96/0xd0
> [ 8197.348375]  do_IRQ+0x85/0xd0
> [ 8197.348378]  common_interrupt+0xf/0xf
> [ 8197.348379]  </IRQ>
> [ 8197.348382] RIP: 0010:cpuidle_enter_state+0xb9/0x320
> [ 8197.348384] Code: e8 1c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 3e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
> [ 8197.348386] RSP: 0018:ffff9f0441953ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd5
> [ 8197.348388] RAX: ffff9759efae0fc0 RBX: 000007749807d911 RCX: 000000000000001f
> [ 8197.348390] RDX: 000007749807d911 RSI: 000000003a2e8670 RDI: 0000000000000000
> [ 8197.348393] RBP: ffff9759efae98a8 R08: 0000000000000002 R09: 0000000000020840
> [ 8197.348396] R10: 00626b4810384abc R11: ffff9759efae01e8 R12: 0000000000000001
> [ 8197.348398] R13: ffffffff8d0ac638 R14: 0000000000000001 R15: 0000000000000000
> [ 8197.348402]  ? cpuidle_enter_state+0x94/0x320
> [ 8197.348407]  do_idle+0x1e4/0x220
> [ 8197.348411]  cpu_startup_entry+0x5f/0x70
> [ 8197.348415]  start_secondary+0x185/0x1a0
> [ 8197.348417]  secondary_startup_64+0xa4/0xb0

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-26 12:38                   ` Andre Tomt
@ 2018-10-26 12:59                     ` Eric Dumazet
  2018-10-26 13:17                       ` Andre Tomt
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2018-10-26 12:59 UTC (permalink / raw)
  To: andre
  Cc: Eric Dumazet, Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On Fri, Oct 26, 2018 at 5:38 AM Andre Tomt <andre@tomt.net> wrote:
>
> On 26.10.2018 13:45, Andre Tomt wrote:
> > On 25.10.2018 19:38, Eric Dumazet wrote:
> >>
> >>
> >> On 10/24/2018 12:41 PM, Andre Tomt wrote:
> >>>
> >>> It eventually showed up again with mlx4, on 4.18.16 + fix and also on
> >>> 4.19. I still do not have a useful packet capture.
> >>>
> >>> It is running a torrent client serving up various linux distributions.
> >>>
> >>
> >> Have you also applied this fix ?
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
> >>
> >>
> >
> > No. I've applied it now to 4.19 and will report back if anything shows up.
>
> And it tripped again with that commit; however on another box with a
> much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4
> tunnel, VF device on mlx4)
>
> > [ 8197.348260] wanib: hw csum failure
> > [ 8197.348288] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-1 #1
> > [ 8197.348289] Hardware name: Supermicro SYS-5018D-FN8T/X10SDV-TP8F, BIOS 1.3 03/19/2018
> > [ 8197.348290] Call Trace:
> > [ 8197.348296]  <IRQ>
> > [ 8197.348304]  dump_stack+0x5c/0x80
> > [ 8197.348308]  __skb_checksum_complete+0xac/0xc0
> > [ 8197.348318]  icmp_error+0x1c8/0x1f0 [nf_conntrack]
> > [ 8197.348325]  ? ip_output+0x61/0xc0
> > [ 8197.348328]  ? skb_copy_bits+0x13d/0x220
> > [ 8197.348334]  nf_conntrack_in+0xd8/0x390 [nf_conntrack]
> > [ 8197.348339]  ? ___pskb_trim+0x192/0x330
> > [ 8197.348343]  nf_hook_slow+0x43/0xc0
> > [ 8197.348346]  ip_rcv+0x90/0xb0
> > [ 8197.348349]  ? ip_rcv_finish_core.isra.0+0x310/0x310
> > [ 8197.348354]  __netif_receive_skb_one_core+0x42/0x50
> > [ 8197.348357]  netif_receive_skb_internal+0x24/0xb0
> > [ 8197.348361]  ifb_ri_tasklet+0x167/0x260 [ifb]
> > [ 8197.348365]  tasklet_action_common.isra.3+0x49/0xb0
> > [ 8197.348369]  __do_softirq+0xe7/0x2d3
> > [ 8197.348372]  irq_exit+0x96/0xd0
> > [ 8197.348375]  do_IRQ+0x85/0xd0
> > [ 8197.348378]  common_interrupt+0xf/0xf
> > [ 8197.348379]  </IRQ>
> > [ 8197.348382] RIP: 0010:cpuidle_enter_state+0xb9/0x320
> > [ 8197.348384] Code: e8 1c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 3e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
> > [ 8197.348386] RSP: 0018:ffff9f0441953ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd5
> > [ 8197.348388] RAX: ffff9759efae0fc0 RBX: 000007749807d911 RCX: 000000000000001f
> > [ 8197.348390] RDX: 000007749807d911 RSI: 000000003a2e8670 RDI: 0000000000000000
> > [ 8197.348393] RBP: ffff9759efae98a8 R08: 0000000000000002 R09: 0000000000020840
> > [ 8197.348396] R10: 00626b4810384abc R11: ffff9759efae01e8 R12: 0000000000000001
> > [ 8197.348398] R13: ffffffff8d0ac638 R14: 0000000000000001 R15: 0000000000000000
> > [ 8197.348402]  ? cpuidle_enter_state+0x94/0x320
> > [ 8197.348407]  do_idle+0x1e4/0x220
> > [ 8197.348411]  cpu_startup_entry+0x5f/0x70
> > [ 8197.348415]  start_secondary+0x185/0x1a0
> > [ 8197.348417]  secondary_startup_64+0xa4/0xb0



Very different trace, yet another bug to track.

If you can, try to remove some components from this setup.

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-26 12:59                     ` Eric Dumazet
@ 2018-10-26 13:17                       ` Andre Tomt
  0 siblings, 0 replies; 22+ messages in thread
From: Andre Tomt @ 2018-10-26 13:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 26.10.2018 14:59, Eric Dumazet wrote:
> On Fri, Oct 26, 2018 at 5:38 AM Andre Tomt <andre@tomt.net> wrote:
>> And it tripped again with that commit; however on another box with a
>> much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4
>> tunnel, VF device on mlx4)
>>
>>> [ 8197.348260] wanib: hw csum failure
>>> [ 8197.348288] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-1 #1
>>> [ 8197.348289] Hardware name: Supermicro SYS-5018D-FN8T/X10SDV-TP8F, BIOS 1.3 03/19/2018
>>> [ 8197.348290] Call Trace:
>>> [ 8197.348296]  <IRQ>
>>> [ 8197.348304]  dump_stack+0x5c/0x80
>>> [ 8197.348308]  __skb_checksum_complete+0xac/0xc0
>>> [ 8197.348318]  icmp_error+0x1c8/0x1f0 [nf_conntrack]
>>> [ 8197.348325]  ? ip_output+0x61/0xc0
>>> [ 8197.348328]  ? skb_copy_bits+0x13d/0x220
>>> [ 8197.348334]  nf_conntrack_in+0xd8/0x390 [nf_conntrack]
>>> [ 8197.348339]  ? ___pskb_trim+0x192/0x330
>>> [ 8197.348343]  nf_hook_slow+0x43/0xc0
>>> [ 8197.348346]  ip_rcv+0x90/0xb0
>>> [ 8197.348349]  ? ip_rcv_finish_core.isra.0+0x310/0x310
>>> [ 8197.348354]  __netif_receive_skb_one_core+0x42/0x50
>>> [ 8197.348357]  netif_receive_skb_internal+0x24/0xb0
>>> [ 8197.348361]  ifb_ri_tasklet+0x167/0x260 [ifb]
>>> [ 8197.348365]  tasklet_action_common.isra.3+0x49/0xb0
>>> [ 8197.348369]  __do_softirq+0xe7/0x2d3
>>> [ 8197.348372]  irq_exit+0x96/0xd0
>>> [ 8197.348375]  do_IRQ+0x85/0xd0
>>> [ 8197.348378]  common_interrupt+0xf/0xf
>>> [ 8197.348379]  </IRQ>
>>> [ 8197.348382] RIP: 0010:cpuidle_enter_state+0xb9/0x320
>>> [ 8197.348384] Code: e8 1c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 3e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
>>> [ 8197.348386] RSP: 0018:ffff9f0441953ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd5
>>> [ 8197.348388] RAX: ffff9759efae0fc0 RBX: 000007749807d911 RCX: 000000000000001f
>>> [ 8197.348390] RDX: 000007749807d911 RSI: 000000003a2e8670 RDI: 0000000000000000
>>> [ 8197.348393] RBP: ffff9759efae98a8 R08: 0000000000000002 R09: 0000000000020840
>>> [ 8197.348396] R10: 00626b4810384abc R11: ffff9759efae01e8 R12: 0000000000000001
>>> [ 8197.348398] R13: ffffffff8d0ac638 R14: 0000000000000001 R15: 0000000000000000
>>> [ 8197.348402]  ? cpuidle_enter_state+0x94/0x320
>>> [ 8197.348407]  do_idle+0x1e4/0x220
>>> [ 8197.348411]  cpu_startup_entry+0x5f/0x70
>>> [ 8197.348415]  start_secondary+0x185/0x1a0
>>> [ 8197.348417]  secondary_startup_64+0xa4/0xb0
> 
> 
> Very different trace, yet another bug to track.
> 
> If you can, try to remove some components from this setup.
> 

Will do. Just remembered I took out the VF stuff a few days ago and that 
netdev is just a normal vlan device now. Going to eliminate VRF and 
cake/ifb as well.

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-26 11:45                 ` Andre Tomt
  2018-10-26 12:38                   ` Andre Tomt
@ 2018-10-27 21:41                   ` Andre Tomt
  2018-10-30 10:58                     ` Andre Tomt
  1 sibling, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-27 21:41 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 26.10.2018 13:45, Andre Tomt wrote:
> On 25.10.2018 19:38, Eric Dumazet wrote:
>>
>>
>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>
>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also on 
>>> 4.19. I still do not have a useful packet capture.
>>>
>>> It is running a torrent client serving up various linux distributions.
>>>
>>
>> Have you also applied this fix ?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>
>>
> 
> No. I've applied it now to 4.19 and will report back if anything shows up.

Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. 
Only a basic stateless nftables ruleset and a vlan netdev (unlikely to 
be the one triggering this I guess; it has only v4 traffic).

On 4.19 + above commit:
> [158269.360271] p0xe0: hw csum failure
> [158269.360286] CPU: 3 PID: 0 Comm: swapper/3 Tainted: P           O      4.19.0-1 #1
> [158269.360287] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [158269.360288] Call Trace:
> [158269.360290]  <IRQ>
> [158269.360295]  dump_stack+0x5c/0x7b
> [158269.360299]  __skb_gro_checksum_complete+0x9a/0xa0
> [158269.360301]  udp6_gro_receive+0x211/0x290
> [158269.360303]  ipv6_gro_receive+0x1b1/0x3a0
> [158269.360306]  ? ip_sublist_rcv_finish+0x70/0x70
> [158269.360307]  dev_gro_receive+0x3a0/0x620
> [158269.360309]  ? __build_skb+0x25/0xe0
> [158269.360310]  napi_gro_frags+0xa8/0x220
> [158269.360314]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [158269.360322]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [158269.360325]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [158269.360327]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [158269.360329]  net_rx_action+0xe0/0x2e0
> [158269.360330]  __do_softirq+0xd8/0x2ff
> [158269.360333]  irq_exit+0xbd/0xd0
> [158269.360334]  do_IRQ+0x85/0xd0
> [158269.360336]  common_interrupt+0xf/0xf
> [158269.360337]  </IRQ>
> [158269.360339] RIP: 0010:cpuidle_enter_state+0xb3/0x310
> [158269.360340] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> [158269.360341] RSP: 0018:ffffaf28c634bea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> [158269.360342] RAX: ffff9a9f7fae0fc0 RBX: 00008ff1f4ff622a RCX: 000000000000001f
> [158269.360343] RDX: 00008ff1f4ff622a RSI: 0000000022983893 RDI: 0000000000000000
> [158269.360343] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000020840
> [158269.360344] R10: ffffaf28c634be88 R11: 0000000000000036 R12: ffff9a9f7fae9aa8
> [158269.360344] R13: ffffffffaa0ac638 R14: 0000000000000000 R15: 00008ff1f4f09d43
> [158269.360347]  ? cpuidle_enter_state+0x90/0x310
> [158269.360349]  do_idle+0x1d0/0x240
> [158269.360351]  cpu_startup_entry+0x5f/0x70
> [158269.360352]  start_secondary+0x185/0x1a0
> [158269.360354]  secondary_startup_64+0xa4/0xb0

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-27 21:41                   ` Andre Tomt
@ 2018-10-30 10:58                     ` Andre Tomt
  2018-10-30 11:04                       ` Andre Tomt
  0 siblings, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-30 10:58 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 27.10.2018 23:41, Andre Tomt wrote:
> On 26.10.2018 13:45, Andre Tomt wrote:
>> On 25.10.2018 19:38, Eric Dumazet wrote:
>>>
>>>
>>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>>
>>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also 
>>>> on 4.19. I still do not have a useful packet capture.
>>>>
>>>> It is running a torrent client serving up various linux distributions.
>>>>
>>>
>>> Have you also applied this fix ?
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>>
>>>
>>
>> No. I've applied it now to 4.19 and will report back if anything shows 
>> up.
> 
> Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. 
> Only a basic stateless nftables ruleset and a vlan netdev (unlikely to 
> be the one triggering this I guess; it has only v4 traffic).

I'm currently testing 4.19 with the recommended commit added, plus these
to sort out some GRO issues (on a hunch, unsure if related):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894

and I *think* it is behaving better now? It's not conclusive, as it could
take a while to trip in this environment, but some of the test servers
have not shown anything bad in almost 24h.

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-30 10:58                     ` Andre Tomt
@ 2018-10-30 11:04                       ` Andre Tomt
  2018-10-31  4:08                         ` Andre Tomt
  0 siblings, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-30 11:04 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 30.10.2018 11:58, Andre Tomt wrote:
> On 27.10.2018 23:41, Andre Tomt wrote:
>> On 26.10.2018 13:45, Andre Tomt wrote:
>>> On 25.10.2018 19:38, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>>>
>>>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also 
>>>>> on 4.19. I still do not have a useful packet capture.
>>>>>
>>>>> It is running a torrent client serving up various linux distributions.
>>>>>
>>>>
>>>> Have you also applied this fix ?
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>>>
>>>>
>>>
>>> No. I've applied it now to 4.19 and will report back if anything 
>>> shows up.
>>
>> Just hit it on the simpler server; no VRF, no tunnels, no 
>> nat/conntrack. Only a basic stateless nftables ruleset and a vlan 
>> netdev (unlikely to be the one triggering this I guess; it has only v4 
>> traffic).
> 
> I'm currently testing 4.19 with the recommended commit added, plus these
> to sort out some GRO issues (on a hunch, unsure if related):
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894 
> 
> 
> and I *think* it is behaving better now? it's not conclusive as it could 
> take a while to trip in this environment but some of the test servers 
> have not shown anything bad in almost 24h.

Sorry, s/some of the/none of the

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-19 21:58       ` Eric Dumazet
  2018-10-19 22:25         ` Eric Dumazet
@ 2018-10-31  0:25         ` Fabio Rossi
  1 sibling, 0 replies; 22+ messages in thread
From: Fabio Rossi @ 2018-10-31  0:25 UTC (permalink / raw)
  To: andre, Eric Dumazet; +Cc: Stephen Hemminger, Dimitris Michailidis, netdev

> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
> > On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
> >>
> >> On 15.10.2018 17:41, Eric Dumazet wrote:
> >>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> >>>> Something is changed between 4.17.12 and 4.18, after bisecting the problem I
> >>>> got the following first bad commit:
> >>>>
> >>>> commit 88078d98d1bb085d72af8437707279e203524fa5
> >>>> Author: Eric Dumazet <edumazet@google.com>
> >>>> Date:   Wed Apr 18 11:43:15 2018 -0700
> >>>>
> >>>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >>>>
> >>>>      After working on IP defragmentation lately, I found that some large
> >>>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >>>>      zero paddings on the last (small) fragment.
> >>>>
> >>>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >>>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >>>>      fragments had CHECKSUM_COMPLETE set.
> >>>>
> >>>>      We can instead compute the checksum of the part we are trimming,
> >>>>      usually smaller than the part we keep.
> >>>>
> >>>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
> >>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
> >>>>
> >>>
> >>> Thanks for bisecting !
> >>>
> >>> This commit is known to expose some NIC/driver bugs.
> >>>
> >>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> >>> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> >>>
> >>> I assume SKY2_HW_NEW_LE is not set on your NIC ?
> >>>
> >>
> >> I've seen similar on several systems with mlx4 cards when using 4.18.x -
> >> that is hw csum failure followed by some backtrace.
> >>
> >> Only seems to happen on systems dealing with quite a bit of UDP.
> >>
> > 
> > Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
> > but CHECKSUM_UNNECESSARY
> > 
> > It would be nice to track this a bit further, maybe by providing the
> > full packet content.
> > 
> >> Example from 4.18.10:
> >>> [635607.740574] p0xe0: hw csum failure
> >>> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> >>> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> >>> [635607.740599] Call Trace:
> >>> [635607.740602]  <IRQ>
> >>> [635607.740611]  dump_stack+0x5c/0x7b
> >>> [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
> >>> [635607.740621]  udp6_gro_receive+0x211/0x290
> >>> [635607.740624]  ipv6_gro_receive+0x1a8/0x390
> >>> [635607.740627]  dev_gro_receive+0x33e/0x550
> >>> [635607.740628]  napi_gro_frags+0xa2/0x210
> >>> [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> >>> [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> >>> [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> >>> [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> >>> [635607.740658]  net_rx_action+0xe0/0x2e0
> >>> [635607.740662]  __do_softirq+0xd8/0x2e5
> >>> [635607.740666]  irq_exit+0xb4/0xc0
> >>> [635607.740667]  do_IRQ+0x85/0xd0
> >>> [635607.740670]  common_interrupt+0xf/0xf
> >>> [635607.740671]  </IRQ>
> >>> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> >>> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> >>> [635607.740701] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> >>> [635607.740703] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> >>> [635607.740703] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> >>> [635607.740704] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> >>> [635607.740705] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> >>> [635607.740706] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> >>> [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
> >>> [635607.740712]  do_idle+0x1d0/0x240
> >>> [635607.740715]  cpu_startup_entry+0x5f/0x70
> >>> [635607.740719]  start_secondary+0x185/0x1a0
> >>> [635607.740722]  secondary_startup_64+0xa5/0xb0
> >>> [635607.740731] p0xe0: hw csum failure
> >>> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> >>> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> >>> [635607.740746] Call Trace:
> >>> [635607.740747]  <IRQ>
> >>> [635607.740750]  dump_stack+0x5c/0x7b
> >>> [635607.740755]  __skb_checksum_complete+0xb8/0xd0
> >>> [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
> >>> [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> >>> [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> >>> [635607.740774]  ip6_input_finish+0xc0/0x460
> >>> [635607.740776]  ip6_input+0x2b/0x90
> >>> [635607.740778]  ? ip6_rcv_finish+0x110/0x110
> >>> [635607.740780]  ipv6_rcv+0x2cd/0x4b0
> >>> [635607.740783]  ? udp6_lib_lookup_skb+0x59/0x80
> >>> [635607.740785]  __netif_receive_skb_core+0x455/0xb30
> >>> [635607.740788]  ? ipv6_gro_receive+0x1a8/0x390
> >>> [635607.740790]  ? netif_receive_skb_internal+0x24/0xb0
> >>> [635607.740792]  netif_receive_skb_internal+0x24/0xb0
> >>> [635607.740793]  napi_gro_frags+0x165/0x210
> >>> [635607.740796]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> >>> [635607.740802]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> >>> [635607.740807]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> >>> [635607.740810]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> >>> [635607.740811]  net_rx_action+0xe0/0x2e0
> >>> [635607.740813]  __do_softirq+0xd8/0x2e5
> >>> [635607.740816]  irq_exit+0xb4/0xc0
> >>> [635607.740817]  do_IRQ+0x85/0xd0
> >>> [635607.740820]  common_interrupt+0xf/0xf
> >>> [635607.740821]  </IRQ>
> >>> [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> >>> [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> >>> [635607.740848] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> >>> [635607.740849] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> >>> [635607.740850] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> >>> [635607.740851] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> >>> [635607.740852] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> >>> [635607.740853] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> >>> [635607.740855]  ? cpuidle_enter_state+0x91/0x2a0
> >>> [635607.740857]  do_idle+0x1d0/0x240
> >>> [635607.740859]  cpu_startup_entry+0x5f/0x70
> >>> [635607.740861]  start_secondary+0x185/0x1a0
> >>> [635607.740863]  secondary_startup_64+0xa5/0xb0
> 
> As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub().
>
> The problem comes from trimming an odd number of bytes.
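
For readers following along, the odd-offset problem can be illustrated outside the kernel. What follows is a minimal Python sketch of the one's-complement arithmetic only, not the kernel implementation. The names csum_partial, csum_sub and csum_block_sub are borrowed from the kernel helpers purely for readability; the packet bytes and the trim offset are made up for this example, and big-endian 16-bit words are assumed. A block that starts at an odd offset contributes to the full sum with its bytes in swapped positions, so its standalone sum has to be byte-swapped before it is subtracted.

```python
def fold(total: int) -> int:
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_partial(data: bytes) -> int:
    """Unfinalized 16-bit one's-complement sum over big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def csum_sub(csum: int, other: int) -> int:
    """One's-complement subtraction: add the bitwise complement."""
    return fold(csum + (~other & 0xFFFF))


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | (value >> 8)


def csum_block_sub(csum: int, block: int, offset: int) -> int:
    """Subtract the sum of a block that starts at `offset` in the packet."""
    if offset % 2:
        block = swap16(block)  # odd offset: the block's bytes sit swapped
    return csum_sub(csum, block)


# Hypothetical 11-byte packet, trimmed to an odd length of 7 bytes.
packet = bytes([0x45, 0x00, 0x00, 0x1C, 0xAB, 0xCD, 0x11, 0x22, 0x33, 0x44, 0x55])
keep = 7
head, tail = packet[:keep], packet[keep:]

full = csum_partial(packet)      # what the NIC would report for the whole frame
expected = csum_partial(head)    # what the stack needs after trimming

naive = csum_sub(full, csum_partial(tail))              # ignores the odd offset
fixed = csum_block_sub(full, csum_partial(tail), keep)  # accounts for it

print(hex(expected), hex(naive), hex(fixed))
```

With an even trim offset the swap is skipped and the plain subtraction already gives the right result, which would be consistent with the failures showing up only on some packets.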

I have applied commit db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 on top of kernel 4.19.0 and the problem seems to be gone. The same commit also applies to kernel 4.18.x, but there it does not fix the problem. Thanks!

Fabio

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-30 11:04                       ` Andre Tomt
@ 2018-10-31  4:08                         ` Andre Tomt
  2018-11-04  5:43                           ` Andre Tomt
  0 siblings, 1 reply; 22+ messages in thread
From: Andre Tomt @ 2018-10-31  4:08 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 30.10.2018 12:04, Andre Tomt wrote:
> On 30.10.2018 11:58, Andre Tomt wrote:
>> On 27.10.2018 23:41, Andre Tomt wrote:
>>> On 26.10.2018 13:45, Andre Tomt wrote:
>>>> On 25.10.2018 19:38, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>>>>
>>>>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also 
>>>>>> on 4.19. I still do not have a useful packet capture.
>>>>>>
>>>>>> It is running a torrent client serving up various linux 
>>>>>> distributions.
>>>>>>
>>>>>
>>>>> Have you also applied this fix ?
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>>>>
>>>>>
>>>>
>>>> No. I've applied it now to 4.19 and will report back if anything 
>>>> shows up.
>>>
>>> Just hit it on the simpler server; no VRF, no tunnels, no 
>>> nat/conntrack. Only a basic stateless nftables ruleset and a vlan 
>>> netdev (unlikely to be the one triggering this I guess; it has only 
>>> v4 traffic).
>>
>> I'm currently testing 4.19 with the recommended commit added, plus
>> these to sort out some GRO issues (on a hunch, unsure if related):
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218 
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7 
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894 
>>
>>
>> and I *think* it is behaving better now? it's not conclusive as it 
>> could take a while to trip in this environment but some of the test 
>> servers have not shown anything bad in almost 24h.
> 
> Sorry, s/some of the/none of the

I think it is fairly safe to say 4.19 + mlx4 + these 4 commits is OK. At 
least for my workload. Servers are now 51-61 hours in, no splats. I also 
added ntp pool traffic to one of them to make things a little more exciting.

Not sure what is needed for 4.18; I don't have the mental bandwidth to
test that right now. Also no idea about the similar-looking mlx5 splats
reported elsewhere.

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
  2018-10-31  4:08                         ` Andre Tomt
@ 2018-11-04  5:43                           ` Andre Tomt
  0 siblings, 0 replies; 22+ messages in thread
From: Andre Tomt @ 2018-11-04  5:43 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis

On 31.10.2018 05:08, Andre Tomt wrote:
> On 30.10.2018 12:04, Andre Tomt wrote:
>> On 30.10.2018 11:58, Andre Tomt wrote:
>>> On 27.10.2018 23:41, Andre Tomt wrote:
>>>> On 26.10.2018 13:45, Andre Tomt wrote:
>>>>> On 25.10.2018 19:38, Eric Dumazet wrote:
>>>>>>
>>>>>>
>>>>>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>>>>>
>>>>>>> It eventually showed up again with mlx4, on 4.18.16 + fix and 
>>>>>>> also on 4.19. I still do not have a useful packet capture.
>>>>>>>
>>>>>>> It is running a torrent client serving up various linux 
>>>>>>> distributions.
>>>>>>>
>>>>>>
>>>>>> Have you also applied this fix ?
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>>>>>
>>>>>>
>>>>>
>>>>> No. I've applied it now to 4.19 and will report back if anything 
>>>>> shows up.
>>>>
>>>> Just hit it on the simpler server; no VRF, no tunnels, no 
>>>> nat/conntrack. Only a basic stateless nftables ruleset and a vlan 
>>>> netdev (unlikely to be the one triggering this I guess; it has only 
>>>> v4 traffic).
>>>
>>> I'm currently testing 4.19 with the recommended commit added, plus
>>> these to sort out some GRO issues (on a hunch, unsure if related):
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218 
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7 
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894 
>>>
>>>
>>> and I *think* it is behaving better now? it's not conclusive as it 
>>> could take a while to trip in this environment but some of the test 
>>> servers have not shown anything bad in almost 24h.
>>
>> Sorry, s/some of the/none of the
> 
> I think it is fairly safe to say 4.19 + mlx4 + these 4 commits is OK. At 
> least for my workload. Servers are now 51-61 hours in, no splats. I also 
> added ntp pool traffic to one of them to make things a little more 
> exciting.
> 
> Not sure what is needed for 4.18; I don't have the mental bandwidth to
> test that right now. Also no idea about the similar-looking mlx5 splats
> reported elsewhere.

As expected, conntrack/nat + vlan + forwarding still splats.
sch_cake, IFB and VRF were removed from this setup.

Here is a conntrack splat without IFB/VRF/cake interference:
> [34458.506346] wanib: hw csum failure
> [34458.506371] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-1 #1
> [34458.506374] Hardware name: Supermicro Super Server/X10SDV-4C-TLN2F, BIOS 2.0 06/13/2018
> [34458.506377] Call Trace:
> [34458.506381]  <IRQ>
> [34458.506388]  dump_stack+0x5c/0x80
> [34458.506392]  __skb_checksum_complete+0xac/0xc0
> [34458.506402]  icmp_error+0x1c8/0x1f0 [nf_conntrack]
> [34458.506406]  ? skb_copy_bits+0x13d/0x220
> [34458.506411]  nf_conntrack_in+0xd8/0x390 [nf_conntrack]
> [34458.506416]  ? ___pskb_trim+0x192/0x330
> [34458.506421]  nf_hook_slow+0x43/0xc0
> [34458.506426]  ip_rcv+0x90/0xb0
> [34458.506430]  ? ip_rcv_finish_core.isra.0+0x310/0x310
> [34458.506435]  __netif_receive_skb_one_core+0x42/0x50
> [34458.506438]  netif_receive_skb_internal+0x24/0xb0
> [34458.506441]  napi_gro_frags+0x177/0x210
> [34458.506446]  mlx4_en_process_rx_cq+0x8df/0xb50 [mlx4_en]
> [34458.506459]  ? mlx4_eq_int+0x38f/0xcb0 [mlx4_core]
> [34458.506463]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [34458.506466]  net_rx_action+0xe1/0x2c0
> [34458.506469]  __do_softirq+0xe7/0x2d3
> [34458.506475]  irq_exit+0x96/0xd0
> [34458.506478]  do_IRQ+0x85/0xd0
> [34458.506483]  common_interrupt+0xf/0xf
> [34458.506486]  </IRQ>
> [34458.506491] RIP: 0010:cpuidle_enter_state+0xb9/0x320
> [34458.506495] Code: e8 3c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 5e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
> [34458.506497] RSP: 0018:ffff978d41943ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
> [34458.506500] RAX: ffff8d8f6fa60fc0 RBX: 00001f56ff07af28 RCX: 000000000000001f
> [34458.506501] RDX: 00001f56ff07af28 RSI: 000000003a2e90d6 RDI: 0000000000000000
> [34458.506503] RBP: ffff8d8f6fa698c0 R08: 0000000000000002 R09: 0000000000020840
> [34458.506504] R10: 0004ea58f2899595 R11: ffff8d8f6fa601e8 R12: 0000000000000001
> [34458.506505] R13: ffffffff8a0ac638 R14: 0000000000000001 R15: 0000000000000000
> [34458.506509]  ? cpuidle_enter_state+0x94/0x320
> [34458.506512]  do_idle+0x1e4/0x220
> [34458.506515]  cpu_startup_entry+0x5f/0x70
> [34458.506519]  start_secondary+0x185/0x1a0
> [34458.506521]  secondary_startup_64+0xa4/0xb0

The stateless-filtered, non-forwarding host still looks like it has been
fixed (the udp6_gro_* splats are still all gone). It also seems fine when
moving the traffic over a vlan device. These fixes went into 4.19.1-rc1
(the CHECKSUM_COMPLETE and "unlink GRO packets on overflow" fixes).

Thread overview: 22+ messages
2018-10-15 15:15 Fw: [Bug 201423] New: eth0: hw csum failure Stephen Hemminger
2018-10-15 15:41 ` Eric Dumazet
2018-10-15 16:12   ` Dave Stevenson
2018-10-15 16:21   ` Stephen Hemminger
2018-10-15 22:28   ` Fw: " Fabio Rossi
2018-10-16  6:30   ` Andre Tomt
2018-10-16 13:00     ` Eric Dumazet
2018-10-19 21:58       ` Eric Dumazet
2018-10-19 22:25         ` Eric Dumazet
2018-10-21 13:34           ` Andre Tomt
2018-10-24 19:41             ` Andre Tomt
2018-10-25 17:38               ` Eric Dumazet
2018-10-26 11:45                 ` Andre Tomt
2018-10-26 12:38                   ` Andre Tomt
2018-10-26 12:59                     ` Eric Dumazet
2018-10-26 13:17                       ` Andre Tomt
2018-10-27 21:41                   ` Andre Tomt
2018-10-30 10:58                     ` Andre Tomt
2018-10-30 11:04                       ` Andre Tomt
2018-10-31  4:08                         ` Andre Tomt
2018-11-04  5:43                           ` Andre Tomt
2018-10-31  0:25         ` Fabio Rossi