* Fwd: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  [not found] <bug-209423-201211-atteo0d1ZY@https.bugzilla.kernel.org/>
@ 2020-10-01 20:34 ` Heiner Kallweit
  2020-10-02 8:26 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-01 20:34 UTC (permalink / raw)
  To: Eric Dumazet, netdev

I have a problem with the following code in ndo_start_xmit() of
the r8169 driver. A user reported the WARN being triggered due
to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
The driver is widely used, therefore I'd expect much more such
reports if it should be a common problem. Not sure what's special.
My primary question: Is it a valid use case that gso_size is
greater than 0, and no SKB_GSO_ flag is set?
Any hint would be appreciated.


	u32 mss = shinfo->gso_size;

	if (mss) {
		if (shinfo->gso_type & SKB_GSO_TCPV4) {
			opts[0] |= TD1_GTSENV4;
		} else if (shinfo->gso_type & SKB_GSO_TCPV6) {
			if (skb_cow_head(skb, 0))
				return false;

			tcp_v6_gso_csum_prep(skb);
			opts[0] |= TD1_GTSENV6;
		} else {
			WARN_ON_ONCE(1);
		}


-------- Forwarded Message --------
Subject: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
Date: Thu, 01 Oct 2020 19:19:24 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: hkallweit1@gmail.com

https://bugzilla.kernel.org/show_bug.cgi?id=209423

--- Comment #7 from Damian Wrobel (dwrobel@ertelnet.rybnik.pl) ---
Here it comes:

[86678.377120] ------------[ cut here ]------------
[86678.377155] gso_size = 1448, gso_type = 0x00000000
[86678.377381] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377393] Modules linked in: tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm_amd snd_hda_intel snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core snd_hwdep snd_pcm hp_wmi snd_timer wmi_bmof sparse_keymap irqbypass snd sp5100_tco i2c_piix4 soundcore k10temp fam15h_power rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ax88179_178a usbnet serio_raw r8169 mii
[86678.377442] wmi video
[86678.377486] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-201.fc32.x86_64 #1
[86678.377495] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[86678.377511] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377521] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 72 36 c0 c6 05 a4 20 01 00 01 e8 0d 33 d8 e1 <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[86678.377533] RSP: 0018:ffffa8f280003c80 EFLAGS: 00010282
[86678.377542] RAX: 0000000000000026 RBX: ffff8d331abc6000 RCX: 0000000000000000
[86678.377551] RDX: ffff8d331b427060 RSI: ffff8d331b418d00 RDI: 0000000000000300
[86678.377559] RBP: ffff8d32b5bb8200 R08: 00000000000003d0 R09: 000000000000000d
[86678.377576] R10: 0000000000000730 R11: ffffa8f280003b15 R12: 00000000000001c0
[86678.377596] R13: 00000000000005a8 R14: 0000000000000022 R15: 000000000000001c
[86678.377606] FS: 0000000000000000(0000) GS:ffff8d331b400000(0000) knlGS:0000000000000000
[86678.377617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[86678.377624] CR2: 00007fa516f64520 CR3: 00000000b6de6000 CR4: 00000000001406f0
[86678.377632] Call Trace:
[86678.377641] <IRQ>
[86678.377657] dev_hard_start_xmit+0x8d/0x1d0
[86678.377676] sch_direct_xmit+0xeb/0x2f0
[86678.377687] __dev_queue_xmit+0x710/0x8a0
[86678.377713] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[86678.377725] ? nf_hook_slow+0x3f/0xb0
[86678.377735] ip_finish_output2+0x2ad/0x560
[86678.377746] __netif_receive_skb_core+0x4f0/0xf40
[86678.377758] ? packet_rcv+0x44/0x490
[86678.377770] __netif_receive_skb_one_core+0x2d/0x70
[86678.377779] process_backlog+0x96/0x160
[86678.377789] net_rx_action+0x13c/0x3e0
[86678.377804] ? usbnet_bh+0x24/0x2b0 [usbnet]
[86678.377815] __do_softirq+0xd9/0x2c4
[86678.377825] asm_call_on_stack+0x12/0x20
[86678.377835] </IRQ>
[86678.377845] do_softirq_own_stack+0x39/0x50
[86678.377855] irq_exit_rcu+0xc2/0x100
[86678.377865] common_interrupt+0x75/0x140
[86678.377875] asm_common_interrupt+0x1e/0x40
[86678.377885] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
[86678.377894] Code: e0 ab 6b 5d e8 ab c4 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 7c dd 7b ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
[86678.377907] RSP: 0018:ffffffffa3a03e58 EFLAGS: 00000246
[86678.377915] RAX: ffff8d331b42a2c0 RBX: ffff8d3312f3e400 RCX: 000000000000001f
[86678.377923] RDX: 0000000000000000 RSI: 00000000401ec2e2 RDI: 0000000000000000
[86678.377931] RBP: ffffffffa3b78960 R08: 00004ed561df8e36 R09: 0000000000000006
[86678.377939] R10: 000000000000001d R11: 000000000000000e R12: 0000000000000002
[86678.377956] R13: ffff8d3312f3e400 R14: 0000000000000002 R15: 00004ed561df8e36
[86678.377970] ? cpuidle_enter_state+0xa4/0x3f0
[86678.377980] cpuidle_enter+0x29/0x40
[86678.377990] do_idle+0x1d5/0x2a0
[86678.377999] cpu_startup_entry+0x19/0x20
[86678.378009] start_kernel+0x7f4/0x804
[86678.378022] secondary_startup_64+0xb6/0xc0
[86678.378032] ---[ end trace 263bcddb7119c953 ]---

--
You are receiving this mail because:
You are on the CC list for the bug.
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2020-10-01 20:34 ` Fwd: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() Heiner Kallweit @ 2020-10-02 8:26 ` Eric Dumazet 2020-10-02 8:32 ` Eric Dumazet 0 siblings, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2020-10-02 8:26 UTC (permalink / raw) To: Heiner Kallweit; +Cc: netdev On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > I have a problem with the following code in ndo_start_xmit() of > the r8169 driver. A user reported the WARN being triggered due > to gso_size > 0 and gso_type = 0. The chip supports TSO(6). > The driver is widely used, therefore I'd expect much more such > reports if it should be a common problem. Not sure what's special. > My primary question: Is it a valid use case that gso_size is > greater than 0, and no SKB_GSO_ flag is set? > Any hint would be appreciated. > > Maybe this is not a TCP packet ? But in this case GSO should have taken place. You might add a pr_err_once("gso_type=%x\n", shinfo->gso_type); > > u32 mss = shinfo->gso_size; > > if (mss) { > if (shinfo->gso_type & SKB_GSO_TCPV4) { > opts[0] |= TD1_GTSENV4; > } else if (shinfo->gso_type & SKB_GSO_TCPV6) { > if (skb_cow_head(skb, 0)) > return false; > > tcp_v6_gso_csum_prep(skb); > opts[0] |= TD1_GTSENV6; > } else { > WARN_ON_ONCE(1); > } > > > > > -------- Forwarded Message -------- > Subject: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() > Date: Thu, 01 Oct 2020 19:19:24 +0000 > From: bugzilla-daemon@bugzilla.kernel.org > To: hkallweit1@gmail.com > > https://bugzilla.kernel.org/show_bug.cgi?id=209423 > > --- Comment #7 from Damian Wrobel (dwrobel@ertelnet.rybnik.pl) --- > Here it comes: > > [86678.377120] ------------[ cut here ]------------ > [86678.377155] gso_size = 1448, gso_type = 0x00000000 > [86678.377381] WARNING: CPU: 0 PID: 0 at > drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 > [r8169] > [86678.377393] Modules linked in: tun 
nft_nat nft_masq nft_objref > nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 > nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject > nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw > ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 > iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink > ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek > edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm_amd > snd_hda_intel snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core snd_hwdep > snd_pcm hp_wmi snd_timer wmi_bmof sparse_keymap irqbypass snd sp5100_tco > i2c_piix4 soundcore k10temp fam15h_power rfkill_gpio rfkill acpi_cpufreq > ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec drm > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ax88179_178a > usbnet serio_raw r8169 mii > [86678.377442] wmi video > [86678.377486] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-201.fc32.x86_64 > #1 > [86678.377495] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 > 02/04/2020 > [86678.377511] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169] > [86678.377521] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff > ff 44 89 ee 48 c7 c7 b0 72 36 c0 c6 05 a4 20 01 00 01 e8 0d 33 d8 e1 <0f> 0b 44 > 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff > [86678.377533] RSP: 0018:ffffa8f280003c80 EFLAGS: 00010282 > [86678.377542] RAX: 0000000000000026 RBX: ffff8d331abc6000 RCX: > 0000000000000000 > [86678.377551] RDX: ffff8d331b427060 RSI: ffff8d331b418d00 RDI: > 0000000000000300 > [86678.377559] RBP: ffff8d32b5bb8200 R08: 00000000000003d0 R09: > 000000000000000d > [86678.377576] R10: 0000000000000730 R11: ffffa8f280003b15 R12: > 00000000000001c0 > [86678.377596] R13: 00000000000005a8 R14: 0000000000000022 R15: > 000000000000001c > [86678.377606] FS: 
0000000000000000(0000) GS:ffff8d331b400000(0000) > knlGS:0000000000000000 > [86678.377617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [86678.377624] CR2: 00007fa516f64520 CR3: 00000000b6de6000 CR4: > 00000000001406f0 > [86678.377632] Call Trace: > [86678.377641] <IRQ> > [86678.377657] dev_hard_start_xmit+0x8d/0x1d0 > [86678.377676] sch_direct_xmit+0xeb/0x2f0 > [86678.377687] __dev_queue_xmit+0x710/0x8a0 > [86678.377713] ? nf_confirm+0xcb/0xf0 [nf_conntrack] > [86678.377725] ? nf_hook_slow+0x3f/0xb0 > [86678.377735] ip_finish_output2+0x2ad/0x560 > [86678.377746] __netif_receive_skb_core+0x4f0/0xf40 > [86678.377758] ? packet_rcv+0x44/0x490 > [86678.377770] __netif_receive_skb_one_core+0x2d/0x70 > [86678.377779] process_backlog+0x96/0x160 > [86678.377789] net_rx_action+0x13c/0x3e0 > [86678.377804] ? usbnet_bh+0x24/0x2b0 [usbnet] > [86678.377815] __do_softirq+0xd9/0x2c4 > [86678.377825] asm_call_on_stack+0x12/0x20 > [86678.377835] </IRQ> > [86678.377845] do_softirq_own_stack+0x39/0x50 > [86678.377855] irq_exit_rcu+0xc2/0x100 > [86678.377865] common_interrupt+0x75/0x140 > [86678.377875] asm_common_interrupt+0x1e/0x40 > [86678.377885] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0 > [86678.377894] Code: e0 ab 6b 5d e8 ab c4 7b ff 49 89 c7 0f 1f 44 00 00 31 ff > e8 7c dd 7b ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 > 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48 > [86678.377907] RSP: 0018:ffffffffa3a03e58 EFLAGS: 00000246 > [86678.377915] RAX: ffff8d331b42a2c0 RBX: ffff8d3312f3e400 RCX: > 000000000000001f > [86678.377923] RDX: 0000000000000000 RSI: 00000000401ec2e2 RDI: > 0000000000000000 > [86678.377931] RBP: ffffffffa3b78960 R08: 00004ed561df8e36 R09: > 0000000000000006 > [86678.377939] R10: 000000000000001d R11: 000000000000000e R12: > 0000000000000002 > [86678.377956] R13: ffff8d3312f3e400 R14: 0000000000000002 R15: > 00004ed561df8e36 > [86678.377970] ? 
cpuidle_enter_state+0xa4/0x3f0 > [86678.377980] cpuidle_enter+0x29/0x40 > [86678.377990] do_idle+0x1d5/0x2a0 > [86678.377999] cpu_startup_entry+0x19/0x20 > [86678.378009] start_kernel+0x7f4/0x804 > [86678.378022] secondary_startup_64+0xb6/0xc0 > [86678.378032] ---[ end trace 263bcddb7119c953 ]--- > > -- > You are receiving this mail because: > You are on the CC list for the bug. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 8:26 ` Eric Dumazet
@ 2020-10-02 8:32 ` Eric Dumazet
  2020-10-02 8:46 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02 8:32 UTC (permalink / raw)
  To: Eric Dumazet, Heiner Kallweit; +Cc: netdev

On 10/2/20 10:26 AM, Eric Dumazet wrote:
> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> I have a problem with the following code in ndo_start_xmit() of
>> the r8169 driver. A user reported the WARN being triggered due
>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
>> The driver is widely used, therefore I'd expect much more such
>> reports if it should be a common problem. Not sure what's special.
>> My primary question: Is it a valid use case that gso_size is
>> greater than 0, and no SKB_GSO_ flag is set?
>> Any hint would be appreciated.
>>
>>
>
> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
>
> You might add a
> pr_err_once("gso_type=%x\n", shinfo->gso_type);
>
>>
>> 	u32 mss = shinfo->gso_size;
>>
>> 	if (mss) {
>
>
>
>> 		if (shinfo->gso_type & SKB_GSO_TCPV4) {
>> 			opts[0] |= TD1_GTSENV4;
>> 		} else if (shinfo->gso_type & SKB_GSO_TCPV6) {
>> 			if (skb_cow_head(skb, 0))
>> 				return false;
>>
>> 			tcp_v6_gso_csum_prep(skb);
>> 			opts[0] |= TD1_GTSENV6;
>> 		} else {
>> 			WARN_ON_ONCE(1);
>> 		}
>>
>>
>>
>>
>> -------- Forwarded Message --------
>> Subject: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
>> Date: Thu, 01 Oct 2020 19:19:24 +0000
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: hkallweit1@gmail.com
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=209423
>>
>> --- Comment #7 from Damian Wrobel (dwrobel@ertelnet.rybnik.pl) ---
>> Here it comes:
>>
>> [86678.377120] ------------[ cut here ]------------
>> [86678.377155] gso_size = 1448, gso_type = 0x00000000

Ah, sorry I see you already printed gso_type

Must then be a bug somewhere :/
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 8:32 ` Eric Dumazet
@ 2020-10-02 8:46 ` Eric Dumazet
  2020-10-02 11:09 ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02 8:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Heiner Kallweit, netdev

On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/2/20 10:26 AM, Eric Dumazet wrote:
> > On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> I have a problem with the following code in ndo_start_xmit() of
> >> the r8169 driver. A user reported the WARN being triggered due
> >> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
> >> The driver is widely used, therefore I'd expect much more such
> >> reports if it should be a common problem. Not sure what's special.
> >> My primary question: Is it a valid use case that gso_size is
> >> greater than 0, and no SKB_GSO_ flag is set?
> >> Any hint would be appreciated.
> >>
> >>
> >
> > Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> >
> > You might add a
> > pr_err_once("gso_type=%x\n", shinfo->gso_type);
> >

>
> Ah, sorry I see you already printed gso_type
>
> Must then be a bug somewhere :/


napi_reuse_skb() does :

skb_shinfo(skb)->gso_type = 0;

It does _not_ clear gso_size.

I wonder if in some cases we could reuse an skb while gso_size is not zero.

Normally, we set it only from dev_gro_receive() when the skb is queued
into GRO engine (status being GRO_HELD)
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 8:46 ` Eric Dumazet
@ 2020-10-02 11:09 ` Heiner Kallweit
  2020-10-02 11:48 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-02 11:09 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet; +Cc: netdev

On 02.10.2020 10:46, Eric Dumazet wrote:
> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 10/2/20 10:26 AM, Eric Dumazet wrote:
>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> I have a problem with the following code in ndo_start_xmit() of
>>>> the r8169 driver. A user reported the WARN being triggered due
>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
>>>> The driver is widely used, therefore I'd expect much more such
>>>> reports if it should be a common problem. Not sure what's special.
>>>> My primary question: Is it a valid use case that gso_size is
>>>> greater than 0, and no SKB_GSO_ flag is set?
>>>> Any hint would be appreciated.
>>>>
>>>>
>>>
>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
>>>
>>> You might add a
>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
>>>
>
>>
>> Ah, sorry I see you already printed gso_type
>>
>> Must then be a bug somewhere :/
>
>
> napi_reuse_skb() does :
>
> skb_shinfo(skb)->gso_type = 0;
>
> It does _not_ clear gso_size.
>
> I wonder if in some cases we could reuse an skb while gso_size is not zero.
>
> Normally, we set it only from dev_gro_receive() when the skb is queued
> into GRO engine (status being GRO_HELD)
>
Thanks Eric. I'm no expert that deep in the network stack and just wonder
why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb().
The latter one does a
memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));

What I can do is letting the affected user test the following.

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..8e75399cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 
 	skb->encapsulation = 0;
 	skb_shinfo(skb)->gso_type = 0;
+	skb_shinfo(skb)->gso_size = 0;
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 	skb_ext_reset(skb);
 
-- 
2.28.0
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 11:09 ` Heiner Kallweit
@ 2020-10-02 11:48 ` Eric Dumazet
  2020-10-08 16:37 ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02 11:48 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Eric Dumazet, netdev

On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 02.10.2020 10:46, Eric Dumazet wrote:
> > On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >>
> >> On 10/2/20 10:26 AM, Eric Dumazet wrote:
> >>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> I have a problem with the following code in ndo_start_xmit() of
> >>>> the r8169 driver. A user reported the WARN being triggered due
> >>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
> >>>> The driver is widely used, therefore I'd expect much more such
> >>>> reports if it should be a common problem. Not sure what's special.
> >>>> My primary question: Is it a valid use case that gso_size is
> >>>> greater than 0, and no SKB_GSO_ flag is set?
> >>>> Any hint would be appreciated.
> >>>>
> >>>>
> >>>
> >>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> >>>
> >>> You might add a
> >>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
> >>>
> >
> >>
> >> Ah, sorry I see you already printed gso_type
> >>
> >> Must then be a bug somewhere :/
> >
> >
> > napi_reuse_skb() does :
> >
> > skb_shinfo(skb)->gso_type = 0;
> >
> > It does _not_ clear gso_size.
> >
> > I wonder if in some cases we could reuse an skb while gso_size is not zero.
> >
> > Normally, we set it only from dev_gro_receive() when the skb is queued
> > into GRO engine (status being GRO_HELD)
> >
> Thanks Eric. I'm no expert that deep in the network stack and just wonder
> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb().
> The latter one does a
> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
>

memset() over the whole thing is more expensive.

Here we know the prior state of some fields, while __alloc_skb() just
got a piece of memory with random content.

> What I can do is letting the affected user test the following.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 62b06523b..8e75399cc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>
>  	skb->encapsulation = 0;
>  	skb_shinfo(skb)->gso_type = 0;
> +	skb_shinfo(skb)->gso_size = 0;
>  	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>  	skb_ext_reset(skb);
>

As I hinted, this should not be needed.

For debugging purposes, I would rather do :

BUG_ON(skb_shinfo(skb)->gso_size);


Nothing in GRO stack will change gso_size, unless the packet is queued
by GRO layer (after this, napi_reuse_skb() wont be called)

napi_reuse_skb() is only used when a packet has been aggregated to
another, and at this point gso_size should be still 0.
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 11:48 ` Eric Dumazet
@ 2020-10-08 16:37 ` Heiner Kallweit
  2020-10-08 17:15 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-08 16:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, netdev

On 02.10.2020 13:48, Eric Dumazet wrote:
> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 02.10.2020 10:46, Eric Dumazet wrote:
>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote:
>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> I have a problem with the following code in ndo_start_xmit() of
>>>>>> the r8169 driver. A user reported the WARN being triggered due
>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
>>>>>> The driver is widely used, therefore I'd expect much more such
>>>>>> reports if it should be a common problem. Not sure what's special.
>>>>>> My primary question: Is it a valid use case that gso_size is
>>>>>> greater than 0, and no SKB_GSO_ flag is set?
>>>>>> Any hint would be appreciated.
>>>>>>
>>>>>>
>>>>>
>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
>>>>>
>>>>> You might add a
>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
>>>>>
>>>
>>>>
>>>> Ah, sorry I see you already printed gso_type
>>>>
>>>> Must then be a bug somewhere :/
>>>
>>>
>>> napi_reuse_skb() does :
>>>
>>> skb_shinfo(skb)->gso_type = 0;
>>>
>>> It does _not_ clear gso_size.
>>>
>>> I wonder if in some cases we could reuse an skb while gso_size is not zero.
>>>
>>> Normally, we set it only from dev_gro_receive() when the skb is queued
>>> into GRO engine (status being GRO_HELD)
>>>
>> Thanks Eric. I'm no expert that deep in the network stack and just wonder
>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb().
>> The latter one does a
>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
>>
>
> memset() over the whole thing is more expensive.
>
> Here we know the prior state of some fields, while __alloc_skb() just
> got a piece of memory with random content.
>
>> What I can do is letting the affected user test the following.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 62b06523b..8e75399cc 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>>
>>  	skb->encapsulation = 0;
>>  	skb_shinfo(skb)->gso_type = 0;
>> +	skb_shinfo(skb)->gso_size = 0;
>>  	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>>  	skb_ext_reset(skb);
>>
>
> As I hinted, this should not be needed.
>
> For debugging purposes, I would rather do :
>
> BUG_ON(skb_shinfo(skb)->gso_size);
>

We did the following for debugging:

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..4c943b774 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
 {
 	u16 gso_segs = skb_shinfo(skb)->gso_segs;
 
+	if (!skb_shinfo(skb)->gso_type)
+		skb_warn_bad_offload(skb);
+
 	if (gso_segs > dev->gso_max_segs)
 		return features & ~NETIF_F_GSO_MASK;
 
Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps
to find out where in the network stack something goes wrong.


[236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
                mac=(778,14) net=(792,20) trans=812
                shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
                csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
[236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
[236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
[236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
[236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
[236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
[236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
[236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
[236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
[236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
[236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d

>
> Nothing in GRO stack will change gso_size, unless the packet is queued
> by GRO layer (after this, napi_reuse_skb() wont be called)
>
> napi_reuse_skb() is only used when a packet has been aggregated to
> another, and at this point gso_size should be still 0.
>
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2020-10-08 16:37 ` Heiner Kallweit @ 2020-10-08 17:15 ` Eric Dumazet 2020-10-08 18:41 ` Heiner Kallweit 0 siblings, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2020-10-08 17:15 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Eric Dumazet, netdev On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > On 02.10.2020 13:48, Eric Dumazet wrote: > > On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > >> > >> On 02.10.2020 10:46, Eric Dumazet wrote: > >>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: > >>>> > >>>> > >>>> > >>>> On 10/2/20 10:26 AM, Eric Dumazet wrote: > >>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > >>>>>> > >>>>>> I have a problem with the following code in ndo_start_xmit() of > >>>>>> the r8169 driver. A user reported the WARN being triggered due > >>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6). > >>>>>> The driver is widely used, therefore I'd expect much more such > >>>>>> reports if it should be a common problem. Not sure what's special. > >>>>>> My primary question: Is it a valid use case that gso_size is > >>>>>> greater than 0, and no SKB_GSO_ flag is set? > >>>>>> Any hint would be appreciated. > >>>>>> > >>>>>> > >>>>> > >>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place. > >>>>> > >>>>> You might add a > >>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type); > >>>>> > >>> > >>>> > >>>> Ah, sorry I see you already printed gso_type > >>>> > >>>> Must then be a bug somewhere :/ > >>> > >>> > >>> napi_reuse_skb() does : > >>> > >>> skb_shinfo(skb)->gso_type = 0; > >>> > >>> It does _not_ clear gso_size. > >>> > >>> I wonder if in some cases we could reuse an skb while gso_size is not zero. 
> >>> > >>> Normally, we set it only from dev_gro_receive() when the skb is queued > >>> into GRO engine (status being GRO_HELD) > >>> > >> Thanks Eric. I'm no expert that deep in the network stack and just wonder > >> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb(). > >> The latter one does a > >> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > >> > > > > memset() over the whole thing is more expensive. > > > > Here we know the prior state of some fields, while __alloc_skb() just > > got a piece of memory with random content. > > > >> What I can do is letting the affected user test the following. > >> > >> diff --git a/net/core/dev.c b/net/core/dev.c > >> index 62b06523b..8e75399cc 100644 > >> --- a/net/core/dev.c > >> +++ b/net/core/dev.c > >> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) > >> > >> skb->encapsulation = 0; > >> skb_shinfo(skb)->gso_type = 0; > >> + skb_shinfo(skb)->gso_size = 0; > >> skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); > >> skb_ext_reset(skb); > >> > > > > As I hinted, this should not be needed. > > > > For debugging purposes, I would rather do : > > > > BUG_ON(skb_shinfo(skb)->gso_size); > > > > We did the following for debugging: > > diff --git a/net/core/dev.c b/net/core/dev.c > index 62b06523b..4c943b774 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb, > { > u16 gso_segs = skb_shinfo(skb)->gso_segs; > > + if (!skb_shinfo(skb)->gso_type) > + skb_warn_bad_offload(skb); You also want to get a stack trace here, to give us the call graph. > + > if (gso_segs > dev->gso_max_segs) > return features & ~NETIF_F_GSO_MASK; > > Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps > to find out where in the network stack something goes wrong. 
> > > [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536 > mac=(778,14) net=(792,20) trans=812 > shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1)) > csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0) > hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4 > [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2 > [236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00 > [236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8 > [236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18 > [236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea > [236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a > [236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77 > [236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8 > [236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63 > [236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d > > > > > > > Nothing in GRO stack will change gso_size, unless the packet is queued > > by GRO layer (after this, napi_reuse_skb() wont be called) > > > > napi_reuse_skb() is only used when a packet has been aggregated to > > another, and at this point gso_size should be still 0. > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2020-10-08 17:15 ` Eric Dumazet @ 2020-10-08 18:41 ` Heiner Kallweit 2020-10-08 18:50 ` Eric Dumazet 0 siblings, 1 reply; 19+ messages in thread From: Heiner Kallweit @ 2020-10-08 18:41 UTC (permalink / raw) To: Eric Dumazet; +Cc: Eric Dumazet, netdev On 08.10.2020 19:15, Eric Dumazet wrote: > On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >> >> On 02.10.2020 13:48, Eric Dumazet wrote: >>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>>> >>>> On 02.10.2020 10:46, Eric Dumazet wrote: >>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote: >>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>>>>>>> >>>>>>>> I have a problem with the following code in ndo_start_xmit() of >>>>>>>> the r8169 driver. A user reported the WARN being triggered due >>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6). >>>>>>>> The driver is widely used, therefore I'd expect much more such >>>>>>>> reports if it should be a common problem. Not sure what's special. >>>>>>>> My primary question: Is it a valid use case that gso_size is >>>>>>>> greater than 0, and no SKB_GSO_ flag is set? >>>>>>>> Any hint would be appreciated. >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place. >>>>>>> >>>>>>> You might add a >>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type); >>>>>>> >>>>> >>>>>> >>>>>> Ah, sorry I see you already printed gso_type >>>>>> >>>>>> Must then be a bug somewhere :/ >>>>> >>>>> >>>>> napi_reuse_skb() does : >>>>> >>>>> skb_shinfo(skb)->gso_type = 0; >>>>> >>>>> It does _not_ clear gso_size. >>>>> >>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero. 
>>>>> >>>>> Normally, we set it only from dev_gro_receive() when the skb is queued >>>>> into GRO engine (status being GRO_HELD) >>>>> >>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder >>>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb(). >>>> The latter one does a >>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); >>>> >>> >>> memset() over the whole thing is more expensive. >>> >>> Here we know the prior state of some fields, while __alloc_skb() just >>> got a piece of memory with random content. >>> >>>> What I can do is letting the affected user test the following. >>>> >>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>> index 62b06523b..8e75399cc 100644 >>>> --- a/net/core/dev.c >>>> +++ b/net/core/dev.c >>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) >>>> >>>> skb->encapsulation = 0; >>>> skb_shinfo(skb)->gso_type = 0; >>>> + skb_shinfo(skb)->gso_size = 0; >>>> skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); >>>> skb_ext_reset(skb); >>>> >>> >>> As I hinted, this should not be needed. >>> >>> For debugging purposes, I would rather do : >>> >>> BUG_ON(skb_shinfo(skb)->gso_size); >>> >> >> We did the following for debugging: >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 62b06523b..4c943b774 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb, >> { >> u16 gso_segs = skb_shinfo(skb)->gso_segs; >> >> + if (!skb_shinfo(skb)->gso_type) >> + skb_warn_bad_offload(skb); > > You also want to get a stack trace here, to give us the call graph. 
> Here it comes, full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423 [236222.967498] ------------[ cut here ]------------ [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000) [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0 [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm [236222.967776] ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1 [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020 [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0 [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25 [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282 [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000 [236222.967951] RDX: 
ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300 [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720 [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729 [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac [236222.968000] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000 [236222.968013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0 [236222.968035] Call Trace: [236222.968047] <IRQ> [236222.968064] netif_skb_features+0x25e/0x2c0 [236222.968084] ? ipt_do_table+0x333/0x600 [ip_tables] [236222.968098] validate_xmit_skb+0x1d/0x300 [236222.968111] validate_xmit_skb_list+0x48/0x70 [236222.968126] sch_direct_xmit+0x129/0x2f0 [236222.968140] __dev_queue_xmit+0x710/0x8a0 [236222.968184] ? nf_confirm+0xcb/0xf0 [nf_conntrack] [236222.968200] ? nf_hook_slow+0x3f/0xb0 [236222.968214] ip_finish_output2+0x2ad/0x560 [236222.968229] __netif_receive_skb_core+0x4f0/0xf40 [236222.968244] ? packet_rcv+0x44/0x490 [236222.968257] __netif_receive_skb_one_core+0x2d/0x70 [236222.968277] process_backlog+0x96/0x160 [236222.968290] net_rx_action+0x13c/0x3e0 [236222.968312] ? 
usbnet_bh+0x24/0x2b0 [usbnet] [236222.968327] __do_softirq+0xd9/0x2c4 [236222.968340] asm_call_on_stack+0x12/0x20 [236222.968350] </IRQ> [236222.968362] do_softirq_own_stack+0x39/0x50 [236222.968376] irq_exit_rcu+0xc2/0x100 [236222.968389] common_interrupt+0x75/0x140 [236222.968405] asm_common_interrupt+0x1e/0x40 [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10 [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00 [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246 [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00 [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006 [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064 [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 [236222.968535] acpi_safe_halt+0x1b/0x30 [236222.968549] acpi_idle_enter+0x27e/0x2e0 [236222.968566] cpuidle_enter_state+0x81/0x3f0 [236222.968589] cpuidle_enter+0x29/0x40 [236222.968602] do_idle+0x1d5/0x2a0 [236222.968615] cpu_startup_entry+0x19/0x20 [236222.968628] start_kernel+0x7f4/0x804 [236222.968645] secondary_startup_64+0xb6/0xc0 [236222.968659] ---[ end trace 8a4d7f639ad88505 ]--- > >> + >> if (gso_segs > dev->gso_max_segs) >> return features & ~NETIF_F_GSO_MASK; >> >> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps >> to find out where in the network stack something goes wrong. 
>> >> >> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536 >> mac=(778,14) net=(792,20) trans=812 >> shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1)) >> csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0) >> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4 >> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2 >> [236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00 >> [236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8 >> [236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18 >> [236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea >> [236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a >> [236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77 >> [236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8 >> [236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63 >> [236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d >> >> >> >>> >>> Nothing in GRO stack will change gso_size, unless the packet is queued >>> by GRO layer (after this, napi_reuse_skb() wont be called) >>> >>> napi_reuse_skb() is only used when a packet has been aggregated to >>> another, and at this point gso_size should be still 0. >>> >> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2020-10-08 18:41 ` Heiner Kallweit @ 2020-10-08 18:50 ` Eric Dumazet 2020-10-08 19:07 ` Eric Dumazet 2021-01-19 12:40 ` Juerg Haefliger 0 siblings, 2 replies; 19+ messages in thread From: Eric Dumazet @ 2020-10-08 18:50 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Eric Dumazet, netdev On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > On 08.10.2020 19:15, Eric Dumazet wrote: > > On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > >> > >> On 02.10.2020 13:48, Eric Dumazet wrote: > >>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > >>>> > >>>> On 02.10.2020 10:46, Eric Dumazet wrote: > >>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote: > >>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > >>>>>>>> > >>>>>>>> I have a problem with the following code in ndo_start_xmit() of > >>>>>>>> the r8169 driver. A user reported the WARN being triggered due > >>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6). > >>>>>>>> The driver is widely used, therefore I'd expect much more such > >>>>>>>> reports if it should be a common problem. Not sure what's special. > >>>>>>>> My primary question: Is it a valid use case that gso_size is > >>>>>>>> greater than 0, and no SKB_GSO_ flag is set? > >>>>>>>> Any hint would be appreciated. > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place. 
> >>>>>>> > >>>>>>> You might add a > >>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type); > >>>>>>> > >>>>> > >>>>>> > >>>>>> Ah, sorry I see you already printed gso_type > >>>>>> > >>>>>> Must then be a bug somewhere :/ > >>>>> > >>>>> > >>>>> napi_reuse_skb() does : > >>>>> > >>>>> skb_shinfo(skb)->gso_type = 0; > >>>>> > >>>>> It does _not_ clear gso_size. > >>>>> > >>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero. > >>>>> > >>>>> Normally, we set it only from dev_gro_receive() when the skb is queued > >>>>> into GRO engine (status being GRO_HELD) > >>>>> > >>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder > >>>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb(). > >>>> The latter one does a > >>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > >>>> > >>> > >>> memset() over the whole thing is more expensive. > >>> > >>> Here we know the prior state of some fields, while __alloc_skb() just > >>> got a piece of memory with random content. > >>> > >>>> What I can do is letting the affected user test the following. > >>>> > >>>> diff --git a/net/core/dev.c b/net/core/dev.c > >>>> index 62b06523b..8e75399cc 100644 > >>>> --- a/net/core/dev.c > >>>> +++ b/net/core/dev.c > >>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) > >>>> > >>>> skb->encapsulation = 0; > >>>> skb_shinfo(skb)->gso_type = 0; > >>>> + skb_shinfo(skb)->gso_size = 0; > >>>> skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); > >>>> skb_ext_reset(skb); > >>>> > >>> > >>> As I hinted, this should not be needed. 
> >>> > >>> For debugging purposes, I would rather do : > >>> > >>> BUG_ON(skb_shinfo(skb)->gso_size); > >>> > >> > >> We did the following for debugging: > >> > >> diff --git a/net/core/dev.c b/net/core/dev.c > >> index 62b06523b..4c943b774 100644 > >> --- a/net/core/dev.c > >> +++ b/net/core/dev.c > >> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb, > >> { > >> u16 gso_segs = skb_shinfo(skb)->gso_segs; > >> > >> + if (!skb_shinfo(skb)->gso_type) > >> + skb_warn_bad_offload(skb); > > > > You also want to get a stack trace here, to give us the call graph. > > > > Here it comes, full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423 > > > [236222.967498] ------------[ cut here ]------------ > [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000) > [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0 > [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm > [236222.967776] ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 
wmi video > [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1 > [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020 > [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0 > [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25 > [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282 > [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000 > [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300 > [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720 > [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729 > [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac > [236222.968000] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000 > [236222.968013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0 > [236222.968035] Call Trace: > [236222.968047] <IRQ> > [236222.968064] netif_skb_features+0x25e/0x2c0 > [236222.968084] ? ipt_do_table+0x333/0x600 [ip_tables] > [236222.968098] validate_xmit_skb+0x1d/0x300 > [236222.968111] validate_xmit_skb_list+0x48/0x70 > [236222.968126] sch_direct_xmit+0x129/0x2f0 > [236222.968140] __dev_queue_xmit+0x710/0x8a0 > [236222.968184] ? nf_confirm+0xcb/0xf0 [nf_conntrack] > [236222.968200] ? nf_hook_slow+0x3f/0xb0 > [236222.968214] ip_finish_output2+0x2ad/0x560 > [236222.968229] __netif_receive_skb_core+0x4f0/0xf40 > [236222.968244] ? packet_rcv+0x44/0x490 > [236222.968257] __netif_receive_skb_one_core+0x2d/0x70 > [236222.968277] process_backlog+0x96/0x160 > [236222.968290] net_rx_action+0x13c/0x3e0 > [236222.968312] ? 
usbnet_bh+0x24/0x2b0 [usbnet] > [236222.968327] __do_softirq+0xd9/0x2c4 > [236222.968340] asm_call_on_stack+0x12/0x20 > [236222.968350] </IRQ> > [236222.968362] do_softirq_own_stack+0x39/0x50 > [236222.968376] irq_exit_rcu+0xc2/0x100 > [236222.968389] common_interrupt+0x75/0x140 > [236222.968405] asm_common_interrupt+0x1e/0x40 > [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10 > [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00 > [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246 > [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f > [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00 > [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006 > [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064 > [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 > [236222.968535] acpi_safe_halt+0x1b/0x30 > [236222.968549] acpi_idle_enter+0x27e/0x2e0 > [236222.968566] cpuidle_enter_state+0x81/0x3f0 > [236222.968589] cpuidle_enter+0x29/0x40 > [236222.968602] do_idle+0x1d5/0x2a0 > [236222.968615] cpu_startup_entry+0x19/0x20 > [236222.968628] start_kernel+0x7f4/0x804 > [236222.968645] secondary_startup_64+0xb6/0xc0 > [236222.968659] ---[ end trace 8a4d7f639ad88505 ]--- > > OK, it would be nice to know what is the input interface if4 -> look at "ip link | grep 4:" Then identifying the driver that built such a strange packet (32000 bytes allocated in skb->head) ethtool -i ifname > > > >> + > >> if (gso_segs > dev->gso_max_segs) > >> return features & ~NETIF_F_GSO_MASK; > >> > >> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps > >> to find out where in the network stack something goes wrong. 
> >> > >> > >> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536 > >> mac=(778,14) net=(792,20) trans=812 > >> shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1)) > >> csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0) > >> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4 > >> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2 > >> [236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00 > >> [236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8 > >> [236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18 > >> [236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea > >> [236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a > >> [236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77 > >> [236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8 > >> [236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63 > >> [236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d > >> > >> > >> > >>> > >>> Nothing in GRO stack will change gso_size, unless the packet is queued > >>> by GRO layer (after this, napi_reuse_skb() wont be called) > >>> > >>> napi_reuse_skb() is only used when a packet has been aggregated to > >>> another, and at this point gso_size should be still 0. > >>> > >> > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 18:50 ` Eric Dumazet
@ 2020-10-08 19:07 ` Eric Dumazet
  2020-10-08 20:54 ` Heiner Kallweit
  2021-01-19 12:40 ` Juerg Haefliger
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-08 19:07 UTC (permalink / raw)
  To: Eric Dumazet, Heiner Kallweit; +Cc: netdev

On 10/8/20 8:50 PM, Eric Dumazet wrote:
>
>
> OK, it would be nice to know what is the input interface
>
> if4 -> look at "ip link | grep 4:"
>
> Then identifying the driver that built such a strange packet (32000
> bytes allocated in skb->head)
>
> ethtool -i ifname
>

According to https://bugzilla.kernel.org/show_bug.cgi?id=209423

iif4 is the tun200 interface used by openvpn.

So this might be a tun bug, or lack of proper SKB_GSO_DODGY validation
in our stack for buggy/malicious packets.

^ permalink raw reply [flat|nested] 19+ messages in thread
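The consistency rule this thread keeps returning to (a non-zero gso_size should come with a gso_type, and a real GSO packet should carry more payload than one gso_size) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct gso_model, enum gso_verdict and check_gso_model() are hypothetical names made up for illustration, not kernel APIs, and this is not the validation the stack actually performs for SKB_GSO_DODGY packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the few skb fields discussed here. */
struct gso_model {
	size_t   len;       /* total packet length, like skb->len       */
	size_t   hdr_len;   /* bytes of L2 + L3 + L4 headers            */
	uint16_t gso_size;  /* like skb_shinfo(skb)->gso_size           */
	uint32_t gso_type;  /* like skb_shinfo(skb)->gso_type (bitmask) */
};

enum gso_verdict {
	GSO_OK,
	GSO_SIZE_WITHOUT_TYPE,   /* gso_size > 0 but no SKB_GSO_ flag set */
	GSO_PAYLOAD_TOO_SMALL,   /* payload fits in one segment anyway    */
};

/* Classify a packet description; purely illustrative. */
enum gso_verdict check_gso_model(const struct gso_model *m)
{
	assert(m != NULL);

	/* Not marked as GSO at all: nothing to check. */
	if (m->gso_size == 0)
		return GSO_OK;

	/* The case the r8169 WARN_ON_ONCE() tripped over. */
	if (m->gso_type == 0)
		return GSO_SIZE_WITHOUT_TYPE;

	/* A payload no larger than gso_size needs no segmentation. */
	if (m->len < m->hdr_len || m->len - m->hdr_len <= m->gso_size)
		return GSO_PAYLOAD_TOO_SMALL;

	return GSO_OK;
}
```

Fed with the values from the reported skb (len=134, gso size=568, type=0, and 66 bytes of headers read off the hex dump as 14 Ethernet + 20 IPv4 + 32 TCP), the model returns GSO_SIZE_WITHOUT_TYPE. Even with a type set, the 68-byte payload would be far too small for the packet to be a genuine GSO packet.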
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 19:07 ` Eric Dumazet
@ 2020-10-08 20:54 ` Heiner Kallweit
  2020-10-09  8:29 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-08 20:54 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet; +Cc: netdev

On 08.10.2020 21:07, Eric Dumazet wrote:
>
>
> On 10/8/20 8:50 PM, Eric Dumazet wrote:
>>
>>
>> OK, it would be nice to know what is the input interface
>>
>> if4 -> look at "ip link | grep 4:"
>>
>> Then identifying the driver that built such a strange packet (32000
>> bytes allocated in skb->head)
>>
>> ethtool -i ifname
>>
>
> According to https://bugzilla.kernel.org/show_bug.cgi?id=209423
>
> iif4 is the tun200 interface used by openvpn.
>
> So this might be a tun bug, or lack of proper SKB_GSO_DODGY validation
> in our stack for buggy/malicious packets.
>
>

The following old commit sounds like it might be related:
622e0ca1cd4d ("gro: Fix bogus gso_size on the first fraglist entry")

This code, however, was later removed in 58025e46ea2d ("net: gro: remove
obsolete code from skb_gro_receive()")

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 20:54 ` Heiner Kallweit
@ 2020-10-09  8:29 ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2020-10-09 8:29 UTC (permalink / raw)
  To: Heiner Kallweit, Eric Dumazet, Eric Dumazet; +Cc: netdev

On 10/8/20 10:54 PM, Heiner Kallweit wrote:
> On 08.10.2020 21:07, Eric Dumazet wrote:
>>
>>
>> On 10/8/20 8:50 PM, Eric Dumazet wrote:
>>>
>>>
>>> OK, it would be nice to know what is the input interface
>>>
>>> if4 -> look at "ip link | grep 4:"
>>>
>>> Then identifying the driver that built such a strange packet (32000
>>> bytes allocated in skb->head)
>>>
>>> ethtool -i ifname
>>>
>>
>> According to https://bugzilla.kernel.org/show_bug.cgi?id=209423
>>
>> iif4 is the tun200 interface used by openvpn.
>>
>> So this might be a tun bug, or lack of proper SKB_GSO_DODGY validation
>> in our stack for buggy/malicious packets.
>>
>>
>
> Following old commit sounds like it might be related:
> 622e0ca1cd4d ("gro: Fix bogus gso_size on the first fraglist entry")
>
> This code however was removed later in 58025e46ea2d ("net: gro: remove
> obsolete code from skb_gro_receive()")
>

GRO won't keep a GSO packet in its queues:

dev_gro_receive()
...
	NAPI_GRO_CB(skb)->flush = skb_is_gso(skb) || skb_has_frag_list(skb);
...

Also note that tun can no longer inject a packet with a length of 134 bytes
pretending to have gso_size == 568.

Look at virtio_net_hdr_to_skb() and commits

6dd912f82680 ("net: check untrusted gso_size at kernel entry")
7c6d2ecbda83 ("net: be more gentle about silly gso requests coming from user")

Really, looking at the skb layout, I suspect some usbnet bug and a use-after-free.

A KASAN build might help.

^ permalink raw reply [flat|nested] 19+ messages in thread
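For readers following the "skb layout" argument, the dump quoted earlier in the thread can be decoded by hand. headroom + headlen + tailroom is 778 + 134 + 31536 = 32448 bytes, which matches the "32000 bytes allocated in skb->head" remark. The first 54 bytes of the "skb linear" dump describe one ordinary IPv4/TCP segment. The following is a minimal userspace sketch of that decoding; struct frame_summary and summarize_frame() are hypothetical helper names made up for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 54 bytes (0x00..0x35) of the "skb linear" dump from the report. */
static const uint8_t frame_head[] = {
	0x00, 0x13, 0x3b, 0xa0, 0x01, 0xe8, 0x7c, 0xd3,
	0x0a, 0x2d, 0x1b, 0x3b, 0x08, 0x00, 0x45, 0x00,
	0x00, 0x78, 0xe2, 0xe6, 0x00, 0x00, 0x7b, 0x06,
	0x52, 0xe1, 0xd8, 0x3a, 0xd0, 0xce, 0xc0, 0xa8,
	0xa0, 0x06, 0x01, 0xbb, 0x8b, 0xc6, 0x53, 0x91,
	0xbe, 0x5e, 0x6e, 0x60, 0xbd, 0xe2, 0x80, 0x18,
	0x01, 0x13, 0x5c, 0xf6, 0x00, 0x00,
};

/* Hypothetical summary of the fields that matter for the discussion. */
struct frame_summary {
	uint16_t ethertype;     /* 0x0800 for IPv4                   */
	uint16_t ip_total_len;  /* IPv4 total length field           */
	uint8_t  ip_proto;      /* 6 for TCP                         */
	uint16_t src_port;
	uint16_t dst_port;
	size_t   tcp_hdr_len;   /* data offset * 4, includes options */
	size_t   payload_len;   /* TCP payload bytes                 */
};

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse Ethernet + IPv4 + TCP headers; returns 0 on success, -1 otherwise. */
int summarize_frame(const uint8_t *buf, size_t len, struct frame_summary *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 14 + 20 + 20)
		return -1;

	out->ethertype = be16(buf + 12);
	if (out->ethertype != 0x0800)
		return -1;

	const uint8_t *ip = buf + 14;
	size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
	if ((ip[0] >> 4) != 4 || ihl < 20 || len < 14 + ihl + 20)
		return -1;

	out->ip_total_len = be16(ip + 2);
	out->ip_proto = ip[9];
	if (out->ip_proto != 6)
		return -1;

	const uint8_t *tcp = ip + ihl;
	out->src_port = be16(tcp);
	out->dst_port = be16(tcp + 2);
	out->tcp_hdr_len = (size_t)(tcp[12] >> 4) * 4;
	if (out->tcp_hdr_len < 20 || out->ip_total_len < ihl + out->tcp_hdr_len)
		return -1;

	out->payload_len = out->ip_total_len - ihl - out->tcp_hdr_len;
	return 0;
}
```

Called as summarize_frame(frame_head, sizeof(frame_head), &s), this yields an IPv4 total length of 120 (plus the 14-byte Ethernet header, exactly the reported len=134), source port 443, a 32-byte TCP header and 68 bytes of payload. A frame like that fits in a single segment, so the gso(size=568 type=0) values in shinfo cannot describe this packet.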
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2020-10-08 18:50 ` Eric Dumazet 2020-10-08 19:07 ` Eric Dumazet @ 2021-01-19 12:40 ` Juerg Haefliger 2021-01-19 13:47 ` Heiner Kallweit 2021-01-19 13:54 ` Eric Dumazet 1 sibling, 2 replies; 19+ messages in thread From: Juerg Haefliger @ 2021-01-19 12:40 UTC (permalink / raw) To: Eric Dumazet Cc: Heiner Kallweit, Eric Dumazet, netdev, UNGLinuxDriver, Woojung Huh [-- Attachment #1: Type: text/plain, Size: 32213 bytes --] On Thu, 8 Oct 2020 20:50:28 +0200 Eric Dumazet <edumazet@google.com> wrote: > On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > > > On 08.10.2020 19:15, Eric Dumazet wrote: > > > On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > >> > > >> On 02.10.2020 13:48, Eric Dumazet wrote: > > >>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > >>>> > > >>>> On 02.10.2020 10:46, Eric Dumazet wrote: > > >>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote: > > >>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > >>>>>>>> > > >>>>>>>> I have a problem with the following code in ndo_start_xmit() of > > >>>>>>>> the r8169 driver. A user reported the WARN being triggered due > > >>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6). > > >>>>>>>> The driver is widely used, therefore I'd expect much more such > > >>>>>>>> reports if it should be a common problem. Not sure what's special. > > >>>>>>>> My primary question: Is it a valid use case that gso_size is > > >>>>>>>> greater than 0, and no SKB_GSO_ flag is set? > > >>>>>>>> Any hint would be appreciated. > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place. 
> > >>>>>>> > > >>>>>>> You might add a > > >>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type); > > >>>>>>> > > >>>>> > > >>>>>> > > >>>>>> Ah, sorry I see you already printed gso_type > > >>>>>> > > >>>>>> Must then be a bug somewhere :/ > > >>>>> > > >>>>> > > >>>>> napi_reuse_skb() does : > > >>>>> > > >>>>> skb_shinfo(skb)->gso_type = 0; > > >>>>> > > >>>>> It does _not_ clear gso_size. > > >>>>> > > >>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero. > > >>>>> > > >>>>> Normally, we set it only from dev_gro_receive() when the skb is queued > > >>>>> into GRO engine (status being GRO_HELD) > > >>>>> > > >>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder > > >>>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb(). > > >>>> The latter one does a > > >>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); > > >>>> > > >>> > > >>> memset() over the whole thing is more expensive. > > >>> > > >>> Here we know the prior state of some fields, while __alloc_skb() just > > >>> got a piece of memory with random content. > > >>> > > >>>> What I can do is letting the affected user test the following. > > >>>> > > >>>> diff --git a/net/core/dev.c b/net/core/dev.c > > >>>> index 62b06523b..8e75399cc 100644 > > >>>> --- a/net/core/dev.c > > >>>> +++ b/net/core/dev.c > > >>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) > > >>>> > > >>>> skb->encapsulation = 0; > > >>>> skb_shinfo(skb)->gso_type = 0; > > >>>> + skb_shinfo(skb)->gso_size = 0; > > >>>> skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); > > >>>> skb_ext_reset(skb); > > >>>> > > >>> > > >>> As I hinted, this should not be needed. 
> > >>> > > >>> For debugging purposes, I would rather do : > > >>> > > >>> BUG_ON(skb_shinfo(skb)->gso_size); > > >>> > > >> > > >> We did the following for debugging: > > >> > > >> diff --git a/net/core/dev.c b/net/core/dev.c > > >> index 62b06523b..4c943b774 100644 > > >> --- a/net/core/dev.c > > >> +++ b/net/core/dev.c > > >> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb, > > >> { > > >> u16 gso_segs = skb_shinfo(skb)->gso_segs; > > >> > > >> + if (!skb_shinfo(skb)->gso_type) > > >> + skb_warn_bad_offload(skb); > > > > > > You also want to get a stack trace here, to give us the call graph. > > > > > > > Here it comes, full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423 > > > > > > [236222.967498] ------------[ cut here ]------------ > > [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000) > > [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0 > > [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm > > [236222.967776] 
ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video > > [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1 > > [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020 > > [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0 > > [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25 > > [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282 > > [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000 > > [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300 > > [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720 > > [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729 > > [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac > > [236222.968000] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000 > > [236222.968013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0 > > [236222.968035] Call Trace: > > [236222.968047] <IRQ> > > [236222.968064] netif_skb_features+0x25e/0x2c0 > > [236222.968084] ? ipt_do_table+0x333/0x600 [ip_tables] > > [236222.968098] validate_xmit_skb+0x1d/0x300 > > [236222.968111] validate_xmit_skb_list+0x48/0x70 > > [236222.968126] sch_direct_xmit+0x129/0x2f0 > > [236222.968140] __dev_queue_xmit+0x710/0x8a0 > > [236222.968184] ? nf_confirm+0xcb/0xf0 [nf_conntrack] > > [236222.968200] ? nf_hook_slow+0x3f/0xb0 > > [236222.968214] ip_finish_output2+0x2ad/0x560 > > [236222.968229] __netif_receive_skb_core+0x4f0/0xf40 > > [236222.968244] ? 
packet_rcv+0x44/0x490 > > [236222.968257] __netif_receive_skb_one_core+0x2d/0x70 > > [236222.968277] process_backlog+0x96/0x160 > > [236222.968290] net_rx_action+0x13c/0x3e0 > > [236222.968312] ? usbnet_bh+0x24/0x2b0 [usbnet] > > [236222.968327] __do_softirq+0xd9/0x2c4 > > [236222.968340] asm_call_on_stack+0x12/0x20 > > [236222.968350] </IRQ> > > [236222.968362] do_softirq_own_stack+0x39/0x50 > > [236222.968376] irq_exit_rcu+0xc2/0x100 > > [236222.968389] common_interrupt+0x75/0x140 > > [236222.968405] asm_common_interrupt+0x1e/0x40 > > [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10 > > [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00 > > [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246 > > [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f > > [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00 > > [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006 > > [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064 > > [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 > > [236222.968535] acpi_safe_halt+0x1b/0x30 > > [236222.968549] acpi_idle_enter+0x27e/0x2e0 > > [236222.968566] cpuidle_enter_state+0x81/0x3f0 > > [236222.968589] cpuidle_enter+0x29/0x40 > > [236222.968602] do_idle+0x1d5/0x2a0 > > [236222.968615] cpu_startup_entry+0x19/0x20 > > [236222.968628] start_kernel+0x7f4/0x804 > > [236222.968645] secondary_startup_64+0xb6/0xc0 > > [236222.968659] ---[ end trace 8a4d7f639ad88505 ]--- > > > > > > OK, it would be nice to know what is the input interface > > if4 -> look at "ip link | grep 4:" > > Then identifying the driver that built such a strange packet (32000 > bytes allocated in skb->head) > > ethtool -i ifname > > > > > > > > >> + > > >> 
if (gso_segs > dev->gso_max_segs) > > >> return features & ~NETIF_F_GSO_MASK; > > >> > > >> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps > > >> to find out where in the network stack something goes wrong. > > >> > > >> > > >> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536 > > >> mac=(778,14) net=(792,20) trans=812 > > >> shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1)) > > >> csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0) > > >> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4 > > >> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2 > > >> [236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00 > > >> [236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8 > > >> [236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18 > > >> [236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea > > >> [236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a > > >> [236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77 > > >> [236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8 > > >> [236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63 > > >> [236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d > > >> > > >> > > >> > > >>> > > >>> Nothing in GRO stack will change gso_size, unless the packet is queued > > >>> by GRO layer (after this, napi_reuse_skb() wont be called) > > >>> > > >>> napi_reuse_skb() is only used when a packet has been aggregated to > > >>> another, and at this point gso_size should be still 0. > > >>> > > >> I seem to have stumbled over the same or a similar issue with a Raspberry Pi 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used as a gateway. 
If I enable IP forwarding on the Pi and pound on eth0 [1], I get tons of the warnings below after a couple of seconds:

Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0) Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear: 00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear: 00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear: 00000030: 00 00 6a 58 00 00 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------ Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in: Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) Jan 19 07:55:22
rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000 Jan 19 
07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace: Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599] skb_warn_bad_offload+0x84/0x100 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020] netif_skb_features+0x218/0x2a0 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350] validate_xmit_skb.isra.0+0x28/0x2c8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125] validate_xmit_skb_list+0x44/0x98 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631] sch_direct_xmit+0xf0/0x3a8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599] __qdisc_run+0x140/0x668 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297] __dev_queue_xmit+0x59c/0x980 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446] dev_queue_xmit+0x1c/0x28 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237] ip_finish_output2+0x30c/0x558 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476] __ip_finish_output+0xe4/0x260 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715] ip_finish_output+0x3c/0xd8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683] ip_output+0xb4/0x148 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116] ip_forward_finish+0x7c/0xc0 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174] ip_forward+0x42c/0x4f0 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783] ip_rcv_finish+0x98/0xb8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481] ip_rcv+0xe0/0xf0 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552] __netif_receive_skb_one_core+0x5c/0x88 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597] __netif_receive_skb+0x20/0x70 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834] process_backlog+0xc0/0x1d0 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802] net_rx_action+0x134/0x478 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682] __do_softirq+0x130/0x378 Jan 19 07:55:22 
rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472] irq_exit+0xc0/0xe8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725] __handle_domain_irq+0x70/0xc8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185] el1_irq+0xb4/0x140 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377] arch_cpu_idle+0x18/0x28 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981] default_idle_call+0x44/0x178 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009] do_idle+0x224/0x270 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147] cpu_startup_entry+0x30/0x98 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916] rest_init+0xc8/0xd8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937] arch_call_rest_init+0x18/0x24 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829] start_kernel+0x57c/0x5b8 Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]---

If I also add the following rule:

$ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

I get a single warning followed by a TX timeout:

Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] mac=(5194,14) net=(5208,20) trans=5228 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0) Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.546872] dev name=eth0 feat=0x0x0000010000114b09 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.552060] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 Jan 19
08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.560090] skb linear: 00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.568019] skb linear: 00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.575921] skb linear: 00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.583918] skb linear: 00000040: 9f fc Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.588105] ------------[ cut here ]------------ Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.607900] Modules linked in: Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.630396] pc : skb_warn_bad_offload+0x84/0x100 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.635175] lr : skb_warn_bad_offload+0x84/0x100 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.639953] sp : ffff800010003810 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.643374] x29: ffff800010003810 x28: ffff50043b196290 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.648870] x27: ffff500407371600 x26: 0000000000000001 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb 
kernel: [ 81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.670847] x19: ffff500407371600 x18: 0000000000000010 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.676340] x17: 0000000000000000 x16: 0000000000000000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.681833] x15: 000000000000ad55 x14: 0000000000000010 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.703806] x7 : c0000000ffffefff x6 : 0000000000000003 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.709300] x5 : 0000000000000000 x4 : 0000000000000000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.714791] x3 : 0000000000000100 x2 : 0000000000001000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.720283] x1 : 0000000000000000 x0 : 0000000000000000 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.725778] Call trace: Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.728306] skb_warn_bad_offload+0x84/0x100 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.732728] netif_skb_features+0x218/0x2a0 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.737057] validate_xmit_skb.isra.0+0x28/0x2c8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.741833] validate_xmit_skb_list+0x44/0x98 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.746339] sch_direct_xmit+0xf0/0x3a8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.750309] __qdisc_run+0x140/0x668 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.754008] __dev_queue_xmit+0x59c/0x980 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.758156] dev_queue_xmit+0x1c/0x28 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.761945] 
neigh_resolve_output+0x108/0x230 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.766450] ip_finish_output2+0x180/0x558 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.770690] __ip_finish_output+0xe4/0x260 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.774928] ip_finish_output+0x3c/0xd8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.778896] ip_output+0xb4/0x148 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.782328] ip_forward_finish+0x7c/0xc0 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.786385] ip_forward+0x42c/0x4f0 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.789995] ip_rcv_finish+0x98/0xb8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.793694] ip_rcv+0xe0/0xf0 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.796765] __netif_receive_skb_one_core+0x5c/0x88 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.801810] __netif_receive_skb+0x20/0x70 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.806047] process_backlog+0xc0/0x1d0 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.810016] net_rx_action+0x134/0x478 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.813897] __do_softirq+0x130/0x378 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.817686] irq_exit+0xc0/0xe8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.820940] __handle_domain_irq+0x70/0xc8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.829099] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.838223] el1_irq+0xb4/0x140 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.845371] arch_cpu_idle+0x18/0x28 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.852882] default_idle_call+0x44/0x178 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.860756] do_idle+0x224/0x270 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.867794] cpu_startup_entry+0x30/0x98 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.875516] rest_init+0xc8/0xd8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: 
[ 81.882496] arch_call_rest_init+0x18/0x24 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.890352] start_kernel+0x57c/0x5b8 Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.897706] ---[ end trace a5789410f231a10b ]--- Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.046337] ------------[ cut here ]------------ Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.077534] Modules linked in: Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc4 #103 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.116085] pc : dev_watchdog+0x384/0x390 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.123857] lr : dev_watchdog+0x384/0x390 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.131558] sp : ffff800010013d90 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.138497] x29: ffff800010013d90 x28: 0000000000000140 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.156489] x25: 0000000000000002 x24: 0000000000000000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.165496] x23: 0000000000000001 x22: ffff50042e96b000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.183490] x19: 0000000000000000 x18: 0000000000000010 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.192493] x17: 
0000000000000000 x16: 0000000000000000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.201473] x15: 000000000000ad55 x14: 0000000000000010 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.237362] x7 : c0000000ffffefff x6 : 0000000000000003 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.246353] x5 : 0000000000000000 x4 : 0000000000000000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.255328] x3 : 0000000000000100 x2 : 0000000000001000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.264273] x1 : 0000000000000000 x0 : 0000000000000000 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.273192] Call trace: Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.279183] dev_watchdog+0x384/0x390 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.286461] call_timer_fn+0x38/0x188 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.293762] run_timer_softirq+0x494/0x688 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.301489] __do_softirq+0x130/0x378 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.308767] irq_exit+0xc0/0xe8 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.315500] __handle_domain_irq+0x70/0xc8 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.323214] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.331940] el1_irq+0xb4/0x140 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.338706] arch_cpu_idle+0x18/0x28 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.345916] default_idle_call+0x44/0x178 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.353577] do_idle+0x224/0x270 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.360433] 
cpu_startup_entry+0x2c/0x98 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.368000] secondary_start_kernel+0x148/0x180 Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.376199] ---[ end trace a5789410f231a10c ]---

I did some bisecting and found commit [2] to be problematic. Reverting that commit plus the two follow-on fixes [3] and [4] prevents the warnings and the timeout. I'm no networking expert, so I can't determine whether [2] is broken or merely exposes a different underlying issue.

I failed to reproduce the problem using a dedicated Realtek-based USB NIC plugged into the Pi, which points towards the lan78xx driver/HW being the culprit.

Enabling KASAN didn't trigger any error reports.

Let me know if there's anything else I can try to narrow this down.

...Juerg

[1] On the Pi, I run:

$ nc -l 1234 | dd status=progress >/dev/null

And on another machine that is configured to use the Pi as its gateway, I run:

$ nc 192.168.99.115 1234 < /dev/urandom

plus a couple of Firefox instances that keep opening public URLs.

[2] Author: Eric Dumazet <edumazet@google.com>
    Date: Tue Nov 27 14:42:03 2018 -0800

    tcp: implement coalescing on backlog queue

    In case GRO is not as efficient as it should be or disabled, we might have a user thread trapped in __release_sock() while softirq handler flood packets up to the point we have to drop.

    This patch balances work done from user thread and softirq, to give more chances to __release_sock() to complete its work before new packets are added the the backlog.

    This also helps if we receive many ACK packets, since GRO does not aggregate them.

    This patch brings ~60% throughput increase on a receiver without GRO, but the spectacular gain is really on 1000x release_sock() latency reduction I have measured.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Neal Cardwell <ncardwell@google.com>
    Cc: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

[3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()

[4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()
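Editorial aside: the "skb linear" dumps in the lan78xx report can be decoded by hand, and the result may help when reading the gso(size=1448 type=0) line next to them. The following is a minimal userspace sketch in C, not kernel code. The names gso_state, gso_state_consistent, frame_summary and summarize_frame are hypothetical helpers made up for this illustration, and the byte array is copied from the first dump (len=54) in the report. The consistency predicate only encodes the expectation voiced in the thread (a non-zero gso_size should come with some SKB_GSO_* bit set); it is an assumption, not a statement of what the stack guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two shinfo fields under discussion;
 * this is NOT the kernel's struct skb_shared_info. */
struct gso_state {
    uint16_t gso_size;
    uint32_t gso_type;
};

/* Assumed invariant from the thread: gso_size != 0 implies gso_type != 0. */
static bool gso_state_consistent(const struct gso_state *s)
{
    assert(s != NULL);
    return s->gso_size == 0 || s->gso_type != 0;
}

/* Fields extracted from an Ethernet + IPv4 + TCP frame. */
struct frame_summary {
    uint16_t ethertype;
    uint16_t ip_total_len;
    uint8_t  ip_proto;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  tcp_flags;
    uint16_t payload_len;
};

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the buffer is not Ethernet/IPv4/TCP. */
static int summarize_frame(const uint8_t *buf, size_t len,
                           struct frame_summary *out)
{
    if (len < 14 + 20)
        return -1;
    out->ethertype = be16(buf + 12);
    if (out->ethertype != 0x0800)           /* IPv4 only */
        return -1;

    const uint8_t *ip = buf + 14;           /* skip Ethernet header */
    if ((ip[0] >> 4) != 4)
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < 20 || len < 14 + ihl + 20)
        return -1;
    out->ip_total_len = be16(ip + 2);
    out->ip_proto = ip[9];
    if (out->ip_proto != 6)                 /* TCP only */
        return -1;

    const uint8_t *tcp = ip + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4;
    if (doff < 20 || out->ip_total_len < ihl + doff)
        return -1;
    out->src_port = be16(tcp);
    out->dst_port = be16(tcp + 2);
    out->tcp_flags = tcp[13];
    out->payload_len = (uint16_t)(out->ip_total_len - ihl - doff);
    return 0;
}

/* First "skb linear" dump from the lan78xx report (54 bytes). */
static const uint8_t lan78xx_dump[54] = {
    0xe0, 0x28, 0x6d, 0x9e, 0xb9, 0x22, 0xb8, 0x27,
    0xeb, 0x3e, 0xab, 0xfb, 0x08, 0x00, 0x45, 0x00,
    0x00, 0x28, 0x00, 0x00, 0x40, 0x00, 0x3f, 0x06,
    0x41, 0xd0, 0xc0, 0xa8, 0x63, 0x84, 0x02, 0x14,
    0xd3, 0xbf, 0xed, 0x3e, 0x01, 0xbb, 0xd4, 0x0f,
    0x88, 0x7e, 0x00, 0x00, 0x00, 0x00, 0x50, 0x04,
    0x00, 0x00, 0x6a, 0x58, 0x00, 0x00,
};
```

Calling summarize_frame(lan78xx_dump, sizeof(lan78xx_dump), &s) from a small main() and feeding { .gso_size = 1448, .gso_type = 0 } to gso_state_consistent() is enough to try it out. Decoded this way, the frame appears to be a 40-byte IPv4 packet carrying a TCP segment to port 443 with only the RST flag set and no payload, so the gso_size of 1448 attached to it does not look like something this packet could have needed for segmentation.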
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2021-01-19 12:40 ` Juerg Haefliger @ 2021-01-19 13:47 ` Heiner Kallweit 2021-01-19 13:58 ` Eric Dumazet 2021-01-19 13:54 ` Eric Dumazet 1 sibling, 1 reply; 19+ messages in thread From: Heiner Kallweit @ 2021-01-19 13:47 UTC (permalink / raw) To: Juerg Haefliger, Eric Dumazet Cc: Eric Dumazet, netdev, UNGLinuxDriver, Woojung Huh On 19.01.2021 13:40, Juerg Haefliger wrote: > On Thu, 8 Oct 2020 20:50:28 +0200 > Eric Dumazet <edumazet@google.com> wrote: > >> On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>> >>> On 08.10.2020 19:15, Eric Dumazet wrote: >>>> On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>>>> >>>>> On 02.10.2020 13:48, Eric Dumazet wrote: >>>>>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>>>>>> >>>>>>> On 02.10.2020 10:46, Eric Dumazet wrote: >>>>>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote: >>>>>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> I have a problem with the following code in ndo_start_xmit() of >>>>>>>>>>> the r8169 driver. A user reported the WARN being triggered due >>>>>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6). >>>>>>>>>>> The driver is widely used, therefore I'd expect much more such >>>>>>>>>>> reports if it should be a common problem. Not sure what's special. >>>>>>>>>>> My primary question: Is it a valid use case that gso_size is >>>>>>>>>>> greater than 0, and no SKB_GSO_ flag is set? >>>>>>>>>>> Any hint would be appreciated. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place. 
>>>>>>>>>> >>>>>>>>>> You might add a >>>>>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type); >>>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Ah, sorry I see you already printed gso_type >>>>>>>>> >>>>>>>>> Must then be a bug somewhere :/ >>>>>>>> >>>>>>>> >>>>>>>> napi_reuse_skb() does : >>>>>>>> >>>>>>>> skb_shinfo(skb)->gso_type = 0; >>>>>>>> >>>>>>>> It does _not_ clear gso_size. >>>>>>>> >>>>>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero. >>>>>>>> >>>>>>>> Normally, we set it only from dev_gro_receive() when the skb is queued >>>>>>>> into GRO engine (status being GRO_HELD) >>>>>>>> >>>>>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder >>>>>>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb(). >>>>>>> The latter one does a >>>>>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); >>>>>>> >>>>>> >>>>>> memset() over the whole thing is more expensive. >>>>>> >>>>>> Here we know the prior state of some fields, while __alloc_skb() just >>>>>> got a piece of memory with random content. >>>>>> >>>>>>> What I can do is letting the affected user test the following. >>>>>>> >>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>>>> index 62b06523b..8e75399cc 100644 >>>>>>> --- a/net/core/dev.c >>>>>>> +++ b/net/core/dev.c >>>>>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) >>>>>>> >>>>>>> skb->encapsulation = 0; >>>>>>> skb_shinfo(skb)->gso_type = 0; >>>>>>> + skb_shinfo(skb)->gso_size = 0; >>>>>>> skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); >>>>>>> skb_ext_reset(skb); >>>>>>> >>>>>> >>>>>> As I hinted, this should not be needed. 
>>>>>> >>>>>> For debugging purposes, I would rather do : >>>>>> >>>>>> BUG_ON(skb_shinfo(skb)->gso_size); >>>>>> >>>>> >>>>> We did the following for debugging: >>>>> >>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>> index 62b06523b..4c943b774 100644 >>>>> --- a/net/core/dev.c >>>>> +++ b/net/core/dev.c >>>>> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb, >>>>> { >>>>> u16 gso_segs = skb_shinfo(skb)->gso_segs; >>>>> >>>>> + if (!skb_shinfo(skb)->gso_type) >>>>> + skb_warn_bad_offload(skb); >>>> >>>> You also want to get a stack trace here, to give us the call graph. >>>> >>> >>> Here it comes, full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423 >>> >>> >>> [236222.967498] ------------[ cut here ]------------ >>> [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000) >>> [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0 >>> [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm >>> [236222.967776] ghash_clmulni_intel 
ax88179_178a serio_raw usbnet mii r8169 wmi video >>> [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1 >>> [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020 >>> [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0 >>> [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25 >>> [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282 >>> [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000 >>> [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300 >>> [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720 >>> [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729 >>> [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac >>> [236222.968000] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000 >>> [236222.968013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0 >>> [236222.968035] Call Trace: >>> [236222.968047] <IRQ> >>> [236222.968064] netif_skb_features+0x25e/0x2c0 >>> [236222.968084] ? ipt_do_table+0x333/0x600 [ip_tables] >>> [236222.968098] validate_xmit_skb+0x1d/0x300 >>> [236222.968111] validate_xmit_skb_list+0x48/0x70 >>> [236222.968126] sch_direct_xmit+0x129/0x2f0 >>> [236222.968140] __dev_queue_xmit+0x710/0x8a0 >>> [236222.968184] ? nf_confirm+0xcb/0xf0 [nf_conntrack] >>> [236222.968200] ? nf_hook_slow+0x3f/0xb0 >>> [236222.968214] ip_finish_output2+0x2ad/0x560 >>> [236222.968229] __netif_receive_skb_core+0x4f0/0xf40 >>> [236222.968244] ? 
packet_rcv+0x44/0x490 >>> [236222.968257] __netif_receive_skb_one_core+0x2d/0x70 >>> [236222.968277] process_backlog+0x96/0x160 >>> [236222.968290] net_rx_action+0x13c/0x3e0 >>> [236222.968312] ? usbnet_bh+0x24/0x2b0 [usbnet] >>> [236222.968327] __do_softirq+0xd9/0x2c4 >>> [236222.968340] asm_call_on_stack+0x12/0x20 >>> [236222.968350] </IRQ> >>> [236222.968362] do_softirq_own_stack+0x39/0x50 >>> [236222.968376] irq_exit_rcu+0xc2/0x100 >>> [236222.968389] common_interrupt+0x75/0x140 >>> [236222.968405] asm_common_interrupt+0x1e/0x40 >>> [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10 >>> [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00 >>> [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246 >>> [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f >>> [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00 >>> [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006 >>> [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064 >>> [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 >>> [236222.968535] acpi_safe_halt+0x1b/0x30 >>> [236222.968549] acpi_idle_enter+0x27e/0x2e0 >>> [236222.968566] cpuidle_enter_state+0x81/0x3f0 >>> [236222.968589] cpuidle_enter+0x29/0x40 >>> [236222.968602] do_idle+0x1d5/0x2a0 >>> [236222.968615] cpu_startup_entry+0x19/0x20 >>> [236222.968628] start_kernel+0x7f4/0x804 >>> [236222.968645] secondary_startup_64+0xb6/0xc0 >>> [236222.968659] ---[ end trace 8a4d7f639ad88505 ]--- >>> >>> >> >> OK, it would be nice to know what is the input interface >> >> if4 -> look at "ip link | grep 4:" >> >> Then identifying the driver that built such a strange packet (32000 >> bytes allocated in skb->head) >> >> ethtool -i ifname >> >> >> >>>> 
>>>>> + >>>>> if (gso_segs > dev->gso_max_segs) >>>>> return features & ~NETIF_F_GSO_MASK; >>>>> >>>>> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps >>>>> to find out where in the network stack something goes wrong. >>>>> >>>>> >>>>> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536 >>>>> mac=(778,14) net=(792,20) trans=812 >>>>> shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1)) >>>>> csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0) >>>>> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4 >>>>> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2 >>>>> [236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00 >>>>> [236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8 >>>>> [236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18 >>>>> [236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea >>>>> [236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a >>>>> [236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77 >>>>> [236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8 >>>>> [236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63 >>>>> [236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d >>>>> >>>>> >>>>> >>>>>> >>>>>> Nothing in GRO stack will change gso_size, unless the packet is queued >>>>>> by GRO layer (after this, napi_reuse_skb() wont be called) >>>>>> >>>>>> napi_reuse_skb() is only used when a packet has been aggregated to >>>>>> another, and at this point gso_size should be still 0. >>>>>> >>>>> > > I seem to have stumbled over the same or a similar issue with a Raspberry Pi > 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used > as a gateway. 
If I enable IP forwarding on the Pi and pound on eth0 [1], I > get tons of the below warnings after a couple of seconds: > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear: 00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear: 00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear: 00000030: 00 00 6a 58 00 00 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------ > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in: > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 
1.3 (DT) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 
1914.940575] x1 : 0000000000000000 x0 : 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace: > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599] skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020] netif_skb_features+0x218/0x2a0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350] validate_xmit_skb.isra.0+0x28/0x2c8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125] validate_xmit_skb_list+0x44/0x98 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631] sch_direct_xmit+0xf0/0x3a8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599] __qdisc_run+0x140/0x668 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297] __dev_queue_xmit+0x59c/0x980 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446] dev_queue_xmit+0x1c/0x28 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237] ip_finish_output2+0x30c/0x558 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476] __ip_finish_output+0xe4/0x260 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715] ip_finish_output+0x3c/0xd8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683] ip_output+0xb4/0x148 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116] ip_forward_finish+0x7c/0xc0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174] ip_forward+0x42c/0x4f0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783] ip_rcv_finish+0x98/0xb8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481] ip_rcv+0xe0/0xf0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552] __netif_receive_skb_one_core+0x5c/0x88 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597] __netif_receive_skb+0x20/0x70 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834] process_backlog+0xc0/0x1d0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802] net_rx_action+0x134/0x478 > Jan 
19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682] __do_softirq+0x130/0x378 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472] irq_exit+0xc0/0xe8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725] __handle_domain_irq+0x70/0xc8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185] el1_irq+0xb4/0x140 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377] arch_cpu_idle+0x18/0x28 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981] default_idle_call+0x44/0x178 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009] do_idle+0x224/0x270 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147] cpu_startup_entry+0x30/0x98 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916] rest_init+0xc8/0xd8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937] arch_call_rest_init+0x18/0x24 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829] start_kernel+0x57c/0x5b8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]--- > > If I also add the following rule: > $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > I get a single warning followed by a TX timeout: > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] mac=(5194,14) net=(5208,20) trans=5228 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.546872] dev name=eth0 
feat=0x0x0000010000114b09 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.552060] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.560090] skb linear: 00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.568019] skb linear: 00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.575921] skb linear: 00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.583918] skb linear: 00000040: 9f fc > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.588105] ------------[ cut here ]------------ > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.607900] Modules linked in: > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.630396] pc : skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.635175] lr : skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.639953] sp : ffff800010003810 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.643374] x29: ffff800010003810 x28: ffff50043b196290 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.648870] x27: ffff500407371600 x26: 0000000000000001 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 
81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.670847] x19: ffff500407371600 x18: 0000000000000010 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.676340] x17: 0000000000000000 x16: 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.681833] x15: 000000000000ad55 x14: 0000000000000010 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.703806] x7 : c0000000ffffefff x6 : 0000000000000003 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.709300] x5 : 0000000000000000 x4 : 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.714791] x3 : 0000000000000100 x2 : 0000000000001000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.720283] x1 : 0000000000000000 x0 : 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.725778] Call trace: > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.728306] skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.732728] netif_skb_features+0x218/0x2a0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.737057] validate_xmit_skb.isra.0+0x28/0x2c8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.741833] validate_xmit_skb_list+0x44/0x98 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.746339] sch_direct_xmit+0xf0/0x3a8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.750309] __qdisc_run+0x140/0x668 > Jan 19 08:15:47 
rpi-3b-plus-rev1d3-abfb kernel: [ 81.754008] __dev_queue_xmit+0x59c/0x980 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.758156] dev_queue_xmit+0x1c/0x28 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.761945] neigh_resolve_output+0x108/0x230 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.766450] ip_finish_output2+0x180/0x558 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.770690] __ip_finish_output+0xe4/0x260 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.774928] ip_finish_output+0x3c/0xd8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.778896] ip_output+0xb4/0x148 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.782328] ip_forward_finish+0x7c/0xc0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.786385] ip_forward+0x42c/0x4f0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.789995] ip_rcv_finish+0x98/0xb8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.793694] ip_rcv+0xe0/0xf0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.796765] __netif_receive_skb_one_core+0x5c/0x88 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.801810] __netif_receive_skb+0x20/0x70 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.806047] process_backlog+0xc0/0x1d0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.810016] net_rx_action+0x134/0x478 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.813897] __do_softirq+0x130/0x378 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.817686] irq_exit+0xc0/0xe8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.820940] __handle_domain_irq+0x70/0xc8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.829099] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.838223] el1_irq+0xb4/0x140 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.845371] arch_cpu_idle+0x18/0x28 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.852882] default_idle_call+0x44/0x178 > Jan 19 08:15:47 
rpi-3b-plus-rev1d3-abfb kernel: [ 81.860756] do_idle+0x224/0x270 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.867794] cpu_startup_entry+0x30/0x98 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.875516] rest_init+0xc8/0xd8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.882496] arch_call_rest_init+0x18/0x24 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.890352] start_kernel+0x57c/0x5b8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.897706] ---[ end trace a5789410f231a10b ]--- > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.046337] ------------[ cut here ]------------ > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.077534] Modules linked in: > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc4 #103 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.116085] pc : dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.123857] lr : dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.131558] sp : ffff800010013d90 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.138497] x29: ffff800010013d90 x28: 0000000000000140 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.156489] x25: 0000000000000002 x24: 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.165496] x23: 
0000000000000001 x22: ffff50042e96b000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.183490] x19: 0000000000000000 x18: 0000000000000010 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.192493] x17: 0000000000000000 x16: 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.201473] x15: 000000000000ad55 x14: 0000000000000010 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.237362] x7 : c0000000ffffefff x6 : 0000000000000003 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.246353] x5 : 0000000000000000 x4 : 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.255328] x3 : 0000000000000100 x2 : 0000000000001000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.264273] x1 : 0000000000000000 x0 : 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.273192] Call trace: > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.279183] dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.286461] call_timer_fn+0x38/0x188 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.293762] run_timer_softirq+0x494/0x688 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.301489] __do_softirq+0x130/0x378 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.308767] irq_exit+0xc0/0xe8 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.315500] __handle_domain_irq+0x70/0xc8 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.323214] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 
110.331940] el1_irq+0xb4/0x140
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.338706] arch_cpu_idle+0x18/0x28
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.345916] default_idle_call+0x44/0x178
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.353577] do_idle+0x224/0x270
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.360433] cpu_startup_entry+0x2c/0x98
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.368000] secondary_start_kernel+0x148/0x180
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.376199] ---[ end trace a5789410f231a10c ]---
>
> I did some bisecting and found commit [2] to be problematic. Reverting that
> commit plus the two follow-on fixes [3] and [4] prevents the warnings and
> timeout. I'm no networking expert so can't determine if [2] is broken or
> merely exposes a different underlying issue. I failed to reproduce the problem
> using a dedicated Realtek-based USB NIC plugged into the Pi, which points
> towards the lan78xx driver/HW being the culprit.
>
> Enabling KASAN didn't trigger any error reports.
>
> Let me know if there's anything else I can try to narrow this down.
>
> ...Juerg
>
> [1]
> On the Pi, I run:
> $ nc -l 1234 | dd status=progress >/dev/null
>
> And on another machine that is configured to use the Pi as the gateway:
> $ nc 192.168.99.115 1234 < /dev/urandom
> and a couple of Firefox instances that keep opening public URLs.
>
> [2]
> Author: Eric Dumazet <edumazet@google.com>
> Date: Tue Nov 27 14:42:03 2018 -0800
>
>     tcp: implement coalescing on backlog queue
>
>     In case GRO is not as efficient as it should be or disabled,
>     we might have a user thread trapped in __release_sock() while
>     softirq handler flood packets up to the point we have to drop.
>
>     This patch balances work done from user thread and softirq,
>     to give more chances to __release_sock() to complete its work
>     before new packets are added to the backlog.
>
>     This also helps if we receive many ACK packets, since GRO
>     does not aggregate them.
>
>     This patch brings ~60% throughput increase on a receiver
>     without GRO, but the spectacular gain is really on
>     1000x release_sock() latency reduction I have measured.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Neal Cardwell <ncardwell@google.com>
>     Cc: Yuchung Cheng <ycheng@google.com>
>     Acked-by: Neal Cardwell <ncardwell@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> [3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
> [4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()
>

In tcp_add_backlog() we have the following, which looks like it could be
related to the problem. gso_type doesn't get set; I'm not sure, however,
whether this is a bug or intentional (because we expect gso_type to be set
already, or because it's supposed to be set somewhere else).
Maybe Eric can comment on this.

	if (!shinfo->gso_size)
		shinfo->gso_size = skb->len - hdrlen;

	if (!shinfo->gso_segs)
		shinfo->gso_segs = 1;

^ permalink raw reply [flat|nested] 19+ messages in thread
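The interaction described in the message above can be modeled in user space. This is only a sketch, not kernel code: `struct shinfo_model`, `backlog_fixup()` and `classify_tso()` are hypothetical names, the flag values are illustrative stand-ins, and only the two `if` statements in `backlog_fixup()` mirror the quoted tcp_add_backlog() snippet. It shows how a packet that never was GSO can end up with gso_size > 0 and gso_type == 0, the combination reported in the warnings.

```c
#include <stdint.h>

/* Illustrative flag values; stand-ins for SKB_GSO_TCPV4 / SKB_GSO_TCPV6. */
#define GSO_TCPV4 (1u << 0)
#define GSO_TCPV6 (1u << 4)

/* Hypothetical user-space model of the shared-info fields in question. */
struct shinfo_model {
	uint16_t gso_size;
	uint16_t gso_segs;
	uint32_t gso_type;
};

enum tso_class { TSO_NONE, TSO_V4, TSO_V6, TSO_BAD };

/* Mirrors the two quoted assignments: gso_type is left untouched. */
static void backlog_fixup(struct shinfo_model *shinfo,
			  unsigned int len, unsigned int hdrlen)
{
	if (!shinfo->gso_size)
		shinfo->gso_size = len - hdrlen;
	if (!shinfo->gso_segs)
		shinfo->gso_segs = 1;
}

/* Same shape as the driver-side check: size set but no known type is "bad". */
static enum tso_class classify_tso(const struct shinfo_model *shinfo)
{
	if (!shinfo->gso_size)
		return TSO_NONE;
	if (shinfo->gso_type & GSO_TCPV4)
		return TSO_V4;
	if (shinfo->gso_type & GSO_TCPV6)
		return TSO_V6;
	return TSO_BAD;
}
```

Calling `backlog_fixup()` on a zeroed model with a 1500-byte length and a 52-byte header yields gso_size 1448 with gso_type still 0, so `classify_tso()` falls through to `TSO_BAD`, which corresponds to the branch where the driver warns.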
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 13:47 ` Heiner Kallweit
@ 2021-01-19 13:58 ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2021-01-19 13:58 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Juerg Haefliger, Eric Dumazet, netdev,
      Microchip Linux Driver Support, Woojung Huh

On Tue, Jan 19, 2021 at 2:47 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> In tcp_add_backlog() we have the following that looks like it could
> be related to the problem. gso_type doesn't get set, not sure however
> whether this is a bug or intentional (because we expect gso_type
> to be set already or because it's supposed to be set somewhere else).
> Maybe Eric can comment on this.
>
>         if (!shinfo->gso_size)
>                 shinfo->gso_size = skb->len - hdrlen;
>
>         if (!shinfo->gso_segs)
>                 shinfo->gso_segs = 1;
>

Yes, at this point TCP is supposed to own the skb, which is partially true.
Check for skb_cloned() in places like skb_try_coalesce().

I think that calling skb_unclone() would be terribly expensive for all
these USB drivers having fake skbs (all clones from a giant one), and thus
a very big headroom that would be copied from generic expand head.

^ permalink raw reply [flat|nested] 19+ messages in thread
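The ownership point in the reply above can be illustrated with a small user-space model. Everything here is an assumption made for illustration: `struct buf_model`, `struct skb_model`, `clone_model()`, `is_cloned()` and `set_gso_size_if_owned()` are hypothetical names, and the guard is not a description of any upstream fix. The sketch only shows that clones share one shared-info area, so a write made through one clone is visible through the other.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: shared info belongs to the data buffer, not to a head. */
struct buf_model {
	int dataref;        /* number of heads pointing at this buffer */
	uint16_t gso_size;  /* stands in for the skb_shared_info fields */
	uint32_t gso_type;
};

struct skb_model {
	struct buf_model *buf;
	bool cloned;
};

/* Model of cloning: a second head that shares buffer and shared info. */
static struct skb_model clone_model(struct skb_model *skb)
{
	struct skb_model c = { .buf = skb->buf, .cloned = true };

	skb->cloned = true;
	skb->buf->dataref++;
	return c;
}

/* Same idea as skb_cloned(): another head also references the buffer. */
static bool is_cloned(const struct skb_model *skb)
{
	return skb->cloned && skb->buf->dataref != 1;
}

/* Hypothetical guard: write shared info only if this head owns the buffer.
 * Returns true when the head is the sole owner, false when it skipped. */
static bool set_gso_size_if_owned(struct skb_model *skb, uint16_t size)
{
	if (is_cloned(skb))
		return false;
	if (!skb->buf->gso_size)
		skb->buf->gso_size = size;
	return true;
}
```

In this model an unguarded `tap.buf->gso_size = 1448` on one clone changes what the forwarded clone sees, while gso_type stays 0. The guarded helper leaves the shared fields alone whenever the buffer is shared, which avoids the copy that unsharing the buffer would require.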
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2021-01-19 12:40 ` Juerg Haefliger 2021-01-19 13:47 ` Heiner Kallweit @ 2021-01-19 13:54 ` Eric Dumazet 2021-01-19 15:38 ` Juerg Haefliger 1 sibling, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2021-01-19 13:54 UTC (permalink / raw) To: Juerg Haefliger Cc: Heiner Kallweit, Eric Dumazet, netdev, Microchip Linux Driver Support, Woojung Huh On Tue, Jan 19, 2021 at 1:40 PM Juerg Haefliger <juerg.haefliger@canonical.com> wrote: > > I seem to have stumbled over the same or a similar issue with a Raspberry Pi > 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used > as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I > get tons of the below warnings after a couple of seconds: > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear: 00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear: 00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear: 00000030: 00 00 6a 58 00 00 > Jan 19 
07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------ > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in: > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010 > Jan 19 07:55:22 
rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace: > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599] skb_warn_bad_offload+0x84/0x100 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020] netif_skb_features+0x218/0x2a0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350] validate_xmit_skb.isra.0+0x28/0x2c8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125] validate_xmit_skb_list+0x44/0x98 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631] sch_direct_xmit+0xf0/0x3a8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599] __qdisc_run+0x140/0x668 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297] __dev_queue_xmit+0x59c/0x980 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446] dev_queue_xmit+0x1c/0x28 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237] ip_finish_output2+0x30c/0x558 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476] __ip_finish_output+0xe4/0x260 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715] ip_finish_output+0x3c/0xd8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683] ip_output+0xb4/0x148 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 
1914.998116] ip_forward_finish+0x7c/0xc0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174] ip_forward+0x42c/0x4f0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783] ip_rcv_finish+0x98/0xb8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481] ip_rcv+0xe0/0xf0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552] __netif_receive_skb_one_core+0x5c/0x88 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597] __netif_receive_skb+0x20/0x70 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834] process_backlog+0xc0/0x1d0 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802] net_rx_action+0x134/0x478 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682] __do_softirq+0x130/0x378 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472] irq_exit+0xc0/0xe8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725] __handle_domain_irq+0x70/0xc8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185] el1_irq+0xb4/0x140 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377] arch_cpu_idle+0x18/0x28 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981] default_idle_call+0x44/0x178 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009] do_idle+0x224/0x270 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147] cpu_startup_entry+0x30/0x98 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916] rest_init+0xc8/0xd8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937] arch_call_rest_init+0x18/0x24 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829] start_kernel+0x57c/0x5b8 > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]--- > > If I also add the following rule: > $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > I get a single warning 
followed by a TX timeout: > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] mac=(5194,14) net=(5208,20) trans=5228 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.546872] dev name=eth0 feat=0x0x0000010000114b09 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.552060] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.560090] skb linear: 00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.568019] skb linear: 00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.575921] skb linear: 00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.583918] skb linear: 00000040: 9f fc > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.588105] ------------[ cut here ]------------ > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.607900] Modules linked in: > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 
(DT) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.630396] pc : skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.635175] lr : skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.639953] sp : ffff800010003810 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.643374] x29: ffff800010003810 x28: ffff50043b196290 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.648870] x27: ffff500407371600 x26: 0000000000000001 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.670847] x19: ffff500407371600 x18: 0000000000000010 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.676340] x17: 0000000000000000 x16: 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.681833] x15: 000000000000ad55 x14: 0000000000000010 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.703806] x7 : c0000000ffffefff x6 : 0000000000000003 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.709300] x5 : 0000000000000000 x4 : 0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.714791] x3 : 0000000000000100 x2 : 0000000000001000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.720283] x1 : 0000000000000000 x0 : 
0000000000000000 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.725778] Call trace: > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.728306] skb_warn_bad_offload+0x84/0x100 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.732728] netif_skb_features+0x218/0x2a0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.737057] validate_xmit_skb.isra.0+0x28/0x2c8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.741833] validate_xmit_skb_list+0x44/0x98 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.746339] sch_direct_xmit+0xf0/0x3a8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.750309] __qdisc_run+0x140/0x668 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.754008] __dev_queue_xmit+0x59c/0x980 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.758156] dev_queue_xmit+0x1c/0x28 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.761945] neigh_resolve_output+0x108/0x230 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.766450] ip_finish_output2+0x180/0x558 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.770690] __ip_finish_output+0xe4/0x260 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.774928] ip_finish_output+0x3c/0xd8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.778896] ip_output+0xb4/0x148 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.782328] ip_forward_finish+0x7c/0xc0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.786385] ip_forward+0x42c/0x4f0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.789995] ip_rcv_finish+0x98/0xb8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.793694] ip_rcv+0xe0/0xf0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.796765] __netif_receive_skb_one_core+0x5c/0x88 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.801810] __netif_receive_skb+0x20/0x70 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.806047] process_backlog+0xc0/0x1d0 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.810016] 
net_rx_action+0x134/0x478 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.813897] __do_softirq+0x130/0x378 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.817686] irq_exit+0xc0/0xe8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.820940] __handle_domain_irq+0x70/0xc8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.829099] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.838223] el1_irq+0xb4/0x140 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.845371] arch_cpu_idle+0x18/0x28 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.852882] default_idle_call+0x44/0x178 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.860756] do_idle+0x224/0x270 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.867794] cpu_startup_entry+0x30/0x98 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.875516] rest_init+0xc8/0xd8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.882496] arch_call_rest_init+0x18/0x24 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.890352] start_kernel+0x57c/0x5b8 > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.897706] ---[ end trace a5789410f231a10b ]--- > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.046337] ------------[ cut here ]------------ > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.077534] Modules linked in: > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc4 #103 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO 
BTYPE=--) > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.116085] pc : dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.123857] lr : dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.131558] sp : ffff800010013d90 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.138497] x29: ffff800010013d90 x28: 0000000000000140 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.156489] x25: 0000000000000002 x24: 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.165496] x23: 0000000000000001 x22: ffff50042e96b000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.183490] x19: 0000000000000000 x18: 0000000000000010 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.192493] x17: 0000000000000000 x16: 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.201473] x15: 000000000000ad55 x14: 0000000000000010 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.237362] x7 : c0000000ffffefff x6 : 0000000000000003 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.246353] x5 : 0000000000000000 x4 : 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.255328] x3 : 0000000000000100 x2 : 0000000000001000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.264273] x1 : 0000000000000000 x0 : 0000000000000000 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.273192] Call trace: > Jan 19 08:16:15 
rpi-3b-plus-rev1d3-abfb kernel: [ 110.279183] dev_watchdog+0x384/0x390 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.286461] call_timer_fn+0x38/0x188 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.293762] run_timer_softirq+0x494/0x688 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.301489] __do_softirq+0x130/0x378 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.308767] irq_exit+0xc0/0xe8 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.315500] __handle_domain_irq+0x70/0xc8 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.323214] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.331940] el1_irq+0xb4/0x140 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.338706] arch_cpu_idle+0x18/0x28 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.345916] default_idle_call+0x44/0x178 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.353577] do_idle+0x224/0x270 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.360433] cpu_startup_entry+0x2c/0x98 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.368000] secondary_start_kernel+0x148/0x180 > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.376199] ---[ end trace a5789410f231a10c ]--- > > I did some bisecting and found commit [2] to be problematic. Reverting that > commit plus the two follow-on fixes [3] and [4] prevents the warnings and > timeout. I'm no networking expert so can't determine if [2] is broken or > merely exposes a different underlying issue. I failed to reproduce the problem > using a dedicated Realtek-based USB NIC plugged into the Pi, which points > towards the lan78xx driver/HW being the culprit. > > Enabling KASAN didn't trigger any error reports. > > Let me know if there's anything else I can try to narrow this down. 
>
> ...Juerg
>
> [1]
> On the Pi, I run:
> $ nc -l 1234 | dd status=progress >/dev/null
>
> And on another machine, that is configured to use the Pi as the gateway:
> $ nc 192.168.99.115 1234 < /dev/urandom
> and a couple of firefox instances that keep opening public URLs.
>
> [2]
> Author: Eric Dumazet <edumazet@google.com>
> Date: Tue Nov 27 14:42:03 2018 -0800
>
> tcp: implement coalescing on backlog queue
>
> In case GRO is not as efficient as it should be or disabled,
> we might have a user thread trapped in __release_sock() while
> softirq handler flood packets up to the point we have to drop.
>
> This patch balances work done from user thread and softirq,
> to give more chances to __release_sock() to complete its work
> before new packets are added the the backlog.
>
> This also helps if we receive many ACK packets, since GRO
> does not aggregate them.
>
> This patch brings ~60% throughput increase on a receiver
> without GRO, but the spectacular gain is really on
> 1000x release_sock() latency reduction I have measured.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> [3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
> [4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()

Oops. Very nice detective work :)

It is true that the skb_clone() done in lan78xx (and some other usb
drivers) is probably triggering this issue.
(lan78xx is also lying about skb->truesize)

skb_try_coalesce() bails if the target skb is cloned, but not if the source is.

Can you try the following patch ?

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 58207c7769d05693b650e3c93e4ef405a5d4b23a..4e82745d336fc3fb0d9ce8c92aaeb39702f64b8a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1760,6 +1760,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	u32 limit = READ_ONCE(sk->sk_rcvbuf) + READ_ONCE(sk->sk_sndbuf);
+	u32 tail_gso_size, tail_gso_segs;
 	struct skb_shared_info *shinfo;
 	const struct tcphdr *th;
 	struct tcphdr *thtail;
@@ -1767,6 +1768,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	unsigned int hdrlen;
 	bool fragstolen;
 	u32 gso_segs;
+	u32 gso_size;
 	int delta;
 
 	/* In case all data was pulled from skb frags (in __pskb_pull_tail()),
@@ -1792,13 +1794,6 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	 */
 	th = (const struct tcphdr *)skb->data;
 	hdrlen = th->doff * 4;
-	shinfo = skb_shinfo(skb);
-
-	if (!shinfo->gso_size)
-		shinfo->gso_size = skb->len - hdrlen;
-
-	if (!shinfo->gso_segs)
-		shinfo->gso_segs = 1;
 
 	tail = sk->sk_backlog.tail;
 	if (!tail)
@@ -1821,6 +1816,15 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 		goto no_coalesce;
 
 	__skb_pull(skb, hdrlen);
+
+	shinfo = skb_shinfo(skb);
+	gso_size = shinfo->gso_size ?: skb->len;
+	gso_segs = shinfo->gso_segs ?: 1;
+
+	shinfo = skb_shinfo(tail);
+	tail_gso_size = shinfo->gso_size ?: (tail->len - hdrlen);
+	tail_gso_segs = shinfo->gso_segs ?: 1;
+
 	if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
 		TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;
 
@@ -1847,11 +1851,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 		}
 
 		/* Not as strict as GRO. We only need to carry mss max value */
-		skb_shinfo(tail)->gso_size = max(shinfo->gso_size,
-						 skb_shinfo(tail)->gso_size);
-
-		gso_segs = skb_shinfo(tail)->gso_segs + shinfo->gso_segs;
-		skb_shinfo(tail)->gso_segs = min_t(u32, gso_segs, 0xFFFF);
+		shinfo->gso_size = max(gso_size, tail_gso_size);
+		shinfo->gso_segs = min_t(u32, gso_segs + tail_gso_segs, 0xFFFF);
 
 		sk->sk_backlog.len += delta;
 		__NET_INC_STATS(sock_net(sk),

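
For readers following along, here is a rough userspace sketch of why filling in gso_size on an skb whose shared info is also used by a clone could produce the "gso(size=1448 type=0 ...)" packets seen in the dumps. This is not kernel code and not taken from the patch: toy_shinfo, toy_skb and all helper functions are hypothetical stand-ins, and the sketch assumes the clone explanation suggested in this thread is the right one. The only kernel behaviour it leans on is that skb_is_gso() tests for a non-zero gso_size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct skb_shared_info (GSO fields only). */
struct toy_shinfo {
	uint16_t gso_size;
	uint16_t gso_segs;
	uint32_t gso_type;
};

/* Hypothetical stand-in for struct sk_buff; clones share one toy_shinfo. */
struct toy_skb {
	unsigned int len;          /* payload length, TCP header already pulled */
	struct toy_shinfo *shinfo; /* shared between an skb and its clones */
};

/* Same test as the kernel's skb_is_gso(): non-zero gso_size means GSO. */
static bool toy_is_gso(const struct toy_skb *skb)
{
	return skb->shinfo->gso_size != 0;
}

/* Pre-patch style: defaults are written into the possibly shared shinfo. */
static void fill_defaults_in_place(struct toy_skb *skb)
{
	if (!skb->shinfo->gso_size)
		skb->shinfo->gso_size = (uint16_t)skb->len;
	if (!skb->shinfo->gso_segs)
		skb->shinfo->gso_segs = 1;
}

/* Patched style: defaults live in locals, the shared shinfo is only read. */
static void read_with_defaults(const struct toy_skb *skb,
			       uint32_t *gso_size, uint32_t *gso_segs)
{
	*gso_size = skb->shinfo->gso_size ? skb->shinfo->gso_size : skb->len;
	*gso_segs = skb->shinfo->gso_segs ? skb->shinfo->gso_segs : 1;
}

/*
 * Returns true if a clone sharing shinfo with the processed skb ends up
 * looking like a GSO packet (gso_size != 0) while its gso_type is still 0.
 */
bool clone_looks_gso(bool in_place)
{
	struct toy_shinfo shared = { 0, 0, 0 };
	struct toy_skb skb = { 1448, &shared };
	struct toy_skb clone = { 1448, &shared };
	uint32_t size, segs;

	if (in_place)
		fill_defaults_in_place(&skb);
	else
		read_with_defaults(&skb, &size, &segs);

	return toy_is_gso(&clone) && clone.shinfo->gso_type == 0;
}
```

In this toy model, clone_looks_gso(true) leaves the clone with a non-zero gso_size and gso_type 0, while clone_looks_gso(false) leaves the clone untouched. Whether the real lan78xx receive path behaves this way is exactly what the patch above is meant to test.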
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() 2021-01-19 13:54 ` Eric Dumazet @ 2021-01-19 15:38 ` Juerg Haefliger 2021-01-19 15:50 ` Eric Dumazet 0 siblings, 1 reply; 19+ messages in thread From: Juerg Haefliger @ 2021-01-19 15:38 UTC (permalink / raw) To: Eric Dumazet Cc: Juerg Haefliger, Heiner Kallweit, Eric Dumazet, netdev, Microchip Linux Driver Support, Woojung Huh [-- Attachment #1: Type: text/plain, Size: 25217 bytes --] On Tue, 19 Jan 2021 14:54:31 +0100 Eric Dumazet <edumazet@google.com> wrote: > On Tue, Jan 19, 2021 at 1:40 PM Juerg Haefliger > <juerg.haefliger@canonical.com> wrote: > > > > > I seem to have stumbled over the same or a similar issue with a Raspberry Pi > > 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used > > as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I > > get tons of the below warnings after a couple of seconds: > > > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0) > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear: 00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear: 00000020: d3 bf ed 3e 01 
bb d4 0f 88 7e 00 00 00 00 50 04 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear: 00000030: 00 00 6a 58 00 00 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------ > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in: > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 
0000000000000000 x16: 0000000000000000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace: > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599] skb_warn_bad_offload+0x84/0x100 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020] netif_skb_features+0x218/0x2a0 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350] validate_xmit_skb.isra.0+0x28/0x2c8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125] validate_xmit_skb_list+0x44/0x98 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631] sch_direct_xmit+0xf0/0x3a8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599] __qdisc_run+0x140/0x668 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297] __dev_queue_xmit+0x59c/0x980 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446] dev_queue_xmit+0x1c/0x28 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237] ip_finish_output2+0x30c/0x558 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476] __ip_finish_output+0xe4/0x260 > > Jan 19 07:55:22 
rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715] ip_finish_output+0x3c/0xd8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683] ip_output+0xb4/0x148 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116] ip_forward_finish+0x7c/0xc0 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174] ip_forward+0x42c/0x4f0 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783] ip_rcv_finish+0x98/0xb8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481] ip_rcv+0xe0/0xf0 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552] __netif_receive_skb_one_core+0x5c/0x88 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597] __netif_receive_skb+0x20/0x70 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834] process_backlog+0xc0/0x1d0 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802] net_rx_action+0x134/0x478 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682] __do_softirq+0x130/0x378 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472] irq_exit+0xc0/0xe8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725] __handle_domain_irq+0x70/0xc8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185] el1_irq+0xb4/0x140 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377] arch_cpu_idle+0x18/0x28 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981] default_idle_call+0x44/0x178 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009] do_idle+0x224/0x270 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147] cpu_startup_entry+0x30/0x98 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916] rest_init+0xc8/0xd8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937] arch_call_rest_init+0x18/0x24 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 
1915.098829] start_kernel+0x57c/0x5b8 > > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]--- > > > > If I also add the following rule: > > $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > > I get a single warning followed by a TX timeout: > > > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] mac=(5194,14) net=(5208,20) trans=5228 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1)) > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0) > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.546872] dev name=eth0 feat=0x0x0000010000114b09 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.552060] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.560090] skb linear: 00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.568019] skb linear: 00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.575921] skb linear: 00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.583918] skb linear: 00000040: 9f fc > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.588105] ------------[ cut here ]------------ > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000) > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100 > > Jan 19 08:15:47 
rpi-3b-plus-rev1d3-abfb kernel: [ 81.607900] Modules linked in: > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.630396] pc : skb_warn_bad_offload+0x84/0x100 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.635175] lr : skb_warn_bad_offload+0x84/0x100 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.639953] sp : ffff800010003810 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.643374] x29: ffff800010003810 x28: ffff50043b196290 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.648870] x27: ffff500407371600 x26: 0000000000000001 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.670847] x19: ffff500407371600 x18: 0000000000000010 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.676340] x17: 0000000000000000 x16: 0000000000000000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.681833] x15: 000000000000ad55 x14: 0000000000000010 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.703806] x7 : 
c0000000ffffefff x6 : 0000000000000003 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.709300] x5 : 0000000000000000 x4 : 0000000000000000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.714791] x3 : 0000000000000100 x2 : 0000000000001000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.720283] x1 : 0000000000000000 x0 : 0000000000000000 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.725778] Call trace: > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.728306] skb_warn_bad_offload+0x84/0x100 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.732728] netif_skb_features+0x218/0x2a0 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.737057] validate_xmit_skb.isra.0+0x28/0x2c8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.741833] validate_xmit_skb_list+0x44/0x98 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.746339] sch_direct_xmit+0xf0/0x3a8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.750309] __qdisc_run+0x140/0x668 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.754008] __dev_queue_xmit+0x59c/0x980 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.758156] dev_queue_xmit+0x1c/0x28 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.761945] neigh_resolve_output+0x108/0x230 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.766450] ip_finish_output2+0x180/0x558 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.770690] __ip_finish_output+0xe4/0x260 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.774928] ip_finish_output+0x3c/0xd8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.778896] ip_output+0xb4/0x148 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.782328] ip_forward_finish+0x7c/0xc0 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.786385] ip_forward+0x42c/0x4f0 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.789995] ip_rcv_finish+0x98/0xb8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 
81.793694] ip_rcv+0xe0/0xf0 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.796765] __netif_receive_skb_one_core+0x5c/0x88 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.801810] __netif_receive_skb+0x20/0x70 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.806047] process_backlog+0xc0/0x1d0 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.810016] net_rx_action+0x134/0x478 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.813897] __do_softirq+0x130/0x378 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.817686] irq_exit+0xc0/0xe8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.820940] __handle_domain_irq+0x70/0xc8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.829099] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.838223] el1_irq+0xb4/0x140 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.845371] arch_cpu_idle+0x18/0x28 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.852882] default_idle_call+0x44/0x178 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.860756] do_idle+0x224/0x270 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.867794] cpu_startup_entry+0x30/0x98 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.875516] rest_init+0xc8/0xd8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.882496] arch_call_rest_init+0x18/0x24 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.890352] start_kernel+0x57c/0x5b8 > > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.897706] ---[ end trace a5789410f231a10b ]--- > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.046337] ------------[ cut here ]------------ > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390 > > Jan 19 08:16:15 
rpi-3b-plus-rev1d3-abfb kernel: [ 110.077534] Modules linked in: > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc4 #103 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.116085] pc : dev_watchdog+0x384/0x390 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.123857] lr : dev_watchdog+0x384/0x390 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.131558] sp : ffff800010013d90 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.138497] x29: ffff800010013d90 x28: 0000000000000140 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.156489] x25: 0000000000000002 x24: 0000000000000000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.165496] x23: 0000000000000001 x22: ffff50042e96b000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.183490] x19: 0000000000000000 x18: 0000000000000010 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.192493] x17: 0000000000000000 x16: 0000000000000000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.201473] x15: 000000000000ad55 x14: 0000000000000010 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.237362] x7 : 
c0000000ffffefff x6 : 0000000000000003 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.246353] x5 : 0000000000000000 x4 : 0000000000000000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.255328] x3 : 0000000000000100 x2 : 0000000000001000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.264273] x1 : 0000000000000000 x0 : 0000000000000000 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.273192] Call trace: > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.279183] dev_watchdog+0x384/0x390 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.286461] call_timer_fn+0x38/0x188 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.293762] run_timer_softirq+0x494/0x688 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.301489] __do_softirq+0x130/0x378 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.308767] irq_exit+0xc0/0xe8 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.315500] __handle_domain_irq+0x70/0xc8 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.323214] bcm2836_arm_irqchip_handle_irq+0x6c/0x80 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.331940] el1_irq+0xb4/0x140 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.338706] arch_cpu_idle+0x18/0x28 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.345916] default_idle_call+0x44/0x178 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.353577] do_idle+0x224/0x270 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.360433] cpu_startup_entry+0x2c/0x98 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.368000] secondary_start_kernel+0x148/0x180 > > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.376199] ---[ end trace a5789410f231a10c ]--- > > > > I did some bisecting and found commit [2] to be problematic. Reverting that > > commit plus the two follow-on fixes [3] and [4] prevents the warnings and > > timeout. 
> > I'm no networking expert so can't determine if [2] is broken or
> > merely exposes a different underlying issue. I failed to reproduce the problem
> > using a dedicated Realtek-based USB NIC plugged into the Pi, which points
> > towards the lan78xx driver/HW being the culprit.
> >
> > Enabling KASAN didn't trigger any error reports.
> >
> > Let me know if there's anything else I can try to narrow this down.
> >
> > ...Juerg
> >
> > [1]
> > On the Pi, I run:
> > $ nc -l 1234 | dd status=progress >/dev/null
> >
> > And on another machine, that is configured to use the Pi as the gateway:
> > $ nc 192.168.99.115 1234 < /dev/urandom
> > and a couple of firefox instances that keep opening public URls.
> >
> > [2]
> > Author: Eric Dumazet <edumazet@google.com>
> > Date: Tue Nov 27 14:42:03 2018 -0800
> >
> > tcp: implement coalescing on backlog queue
> >
> > In case GRO is not as efficient as it should be or disabled,
> > we might have a user thread trapped in __release_sock() while
> > softirq handler flood packets up to the point we have to drop.
> >
> > This patch balances work done from user thread and softirq,
> > to give more chances to __release_sock() to complete its work
> > before new packets are added the the backlog.
> >
> > This also helps if we receive many ACK packets, since GRO
> > does not aggregate them.
> >
> > This patch brings ~60% throughput increase on a receiver
> > without GRO, but the spectacular gain is really on
> > 1000x release_sock() latency reduction I have measured.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > Cc: Yuchung Cheng <ycheng@google.com>
> > Acked-by: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > [3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
> > [4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()
> >
> Oops.
> Very nice detective work :)
>
> It is true that the skb_clone() done in lan78xx (and some other usb
> drivers) is probably triggering this issue.
> (lan78xx is also lying about skb->truesize)
>
> skb_try_coalesce() bails if the target skb is cloned, but not if the source is.
>
>
> Can you try the following patch ?

Works. Nice :-)

If you submit this and care you can add:

Tested-by: Juerg Haefliger <juergh@canonical.com>

Thanks a lot for the quick turnaround!

...Juerg

> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 58207c7769d05693b650e3c93e4ef405a5d4b23a..4e82745d336fc3fb0d9ce8c92aaeb39702f64b8a
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1760,6 +1760,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
>  bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
>         u32 limit = READ_ONCE(sk->sk_rcvbuf) + READ_ONCE(sk->sk_sndbuf);
> +       u32 tail_gso_size, tail_gso_segs;
>         struct skb_shared_info *shinfo;
>         const struct tcphdr *th;
>         struct tcphdr *thtail;
> @@ -1767,6 +1768,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>         unsigned int hdrlen;
>         bool fragstolen;
>         u32 gso_segs;
> +       u32 gso_size;
>         int delta;
>
>         /* In case all data was pulled from skb frags (in __pskb_pull_tail()),
> @@ -1792,13 +1794,6 @@ bool tcp_add_backlog(struct sock *sk, struct
> sk_buff *skb)
>          */
>         th = (const struct tcphdr *)skb->data;
>         hdrlen = th->doff * 4;
> -       shinfo = skb_shinfo(skb);
> -
> -       if (!shinfo->gso_size)
> -               shinfo->gso_size = skb->len - hdrlen;
> -
> -       if (!shinfo->gso_segs)
> -               shinfo->gso_segs = 1;
>
>         tail = sk->sk_backlog.tail;
>         if (!tail)
> @@ -1821,6 +1816,15 @@ bool tcp_add_backlog(struct sock *sk, struct
> sk_buff *skb)
>                 goto no_coalesce;
>
>         __skb_pull(skb, hdrlen);
> +
> +       shinfo = skb_shinfo(skb);
> +       gso_size = shinfo->gso_size ?: skb->len;
> +       gso_segs = shinfo->gso_segs ?: 1;
> +
> +       shinfo = skb_shinfo(tail);
> +       tail_gso_size = shinfo->gso_size ?: (tail->len - hdrlen);
> +       tail_gso_segs = shinfo->gso_segs ?: 1;
> +
>         if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
>                 TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;
>
> @@ -1847,11 +1851,8 @@ bool tcp_add_backlog(struct sock *sk, struct
> sk_buff *skb)
>                 }
>
>                 /* Not as strict as GRO. We only need to carry mss max value */
> -               skb_shinfo(tail)->gso_size = max(shinfo->gso_size,
> -                                                skb_shinfo(tail)->gso_size);
> -
> -               gso_segs = skb_shinfo(tail)->gso_segs + shinfo->gso_segs;
> -               skb_shinfo(tail)->gso_segs = min_t(u32, gso_segs, 0xFFFF);
> +               shinfo->gso_size = max(gso_size, tail_gso_size);
> +               shinfo->gso_segs = min_t(u32, gso_segs + tail_gso_segs, 0xFFFF);
>
>                 sk->sk_backlog.len += delta;
>                 __NET_INC_STATS(sock_net(sk),

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 19+ messages in thread
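[Editorial illustration, not part of the original thread.] The effect of the lines removed by the patch can be sketched in user space. This is a simplified model under stated assumptions, not kernel code: `model_skb`, `model_shinfo`, `model_clone()`, `fill_defaults_in_place()`, `effective_gso()` and `merge_into_tail()` are hypothetical names invented for the sketch. The only kernel behaviour it relies on is that skb_clone() leaves both skbs pointing at the same skb_shared_info. Whether this is exactly the path behind the r8169 report (gso_size = 1448, gso_type = 0) is not confirmed in the thread beyond Eric's "probably".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct skb_shared_info (GSO fields only). */
struct model_shinfo {
    uint16_t gso_size;
    uint16_t gso_segs;
    uint32_t gso_type;
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
    uint32_t len;
    struct model_shinfo *shinfo;
};

/* Models skb_clone(): a new header that shares the same shared info. */
static struct model_skb model_clone(const struct model_skb *skb)
{
    return *skb;
}

/* Pre-patch behaviour: defaults are written into the shared info,
 * so every clone of the skb sees them. gso_type is left untouched. */
static void fill_defaults_in_place(struct model_skb *skb, uint32_t hdrlen)
{
    assert(skb->len >= hdrlen);
    if (!skb->shinfo->gso_size)
        skb->shinfo->gso_size = (uint16_t)(skb->len - hdrlen);
    if (!skb->shinfo->gso_segs)
        skb->shinfo->gso_segs = 1;
}

struct gso_vals {
    uint32_t size;
    uint32_t segs;
};

/* Patched behaviour: the same defaults, kept in locals (read-only access). */
static struct gso_vals effective_gso(const struct model_skb *skb,
                                     uint32_t payload_len)
{
    struct gso_vals v = {
        .size = skb->shinfo->gso_size ? skb->shinfo->gso_size : payload_len,
        .segs = skb->shinfo->gso_segs ? skb->shinfo->gso_segs : 1,
    };
    return v;
}

/* After a successful coalesce only the tail is written: the larger
 * gso_size is kept and the segment count is summed, capped at 0xFFFF. */
static void merge_into_tail(struct model_skb *tail, struct gso_vals in,
                            struct gso_vals tl)
{
    uint32_t size = in.size > tl.size ? in.size : tl.size;
    uint32_t segs = in.segs + tl.segs;

    tail->shinfo->gso_size = (uint16_t)size;
    tail->shinfo->gso_segs = (uint16_t)(segs > 0xFFFF ? 0xFFFF : segs);
}
```

In this model, calling fill_defaults_in_place() on a 1480-byte skb with a 32-byte TCP header leaves its clone with gso_size 1448 and gso_type 0, the combination the driver warned about. With effective_gso() plus merge_into_tail(), the clone's shared info stays untouched and only the tail is updated, which is safe in the real code because skb_try_coalesce() bails when the target is cloned.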
* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 15:38 ` Juerg Haefliger
@ 2021-01-19 15:50 ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2021-01-19 15:50 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: Heiner Kallweit, Eric Dumazet, netdev, Microchip Linux Driver Support, Woojung Huh

On Tue, Jan 19, 2021 at 4:39 PM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:
>
> On Tue, 19 Jan 2021 14:54:31 +0100
> Eric Dumazet <edumazet@google.com> wrote:
>
> >
> > Oops. Very nice detective work :)
> >
> > It is true that the skb_clone() done in lan78xx (and some other usb
> > drivers) is probably triggering this issue.
> > (lan78xx is also lying about skb->truesize)
> >
> > skb_try_coalesce() bails if the target skb is cloned, but not if the source is.
> >
> >
> > Can you try the following patch ?
>
> Works. Nice :-)
>

Excellent !

> If you submit this and care you can add:
>
> Tested-by: Juerg Haefliger <juergh@canonical.com>

Sure, I will also add a :

Bisected-by: Juerg Haefliger <juergh@canonical.com>

Because you did quite a lot of work narrowing the problem !

>
> Thanks a lot for the quick turnaround!
>
> ...Juerg
>
>

^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2021-01-20 0:16 UTC | newest]

Thread overview: 19+ messages
[not found] <bug-209423-201211-atteo0d1ZY@https.bugzilla.kernel.org/>
2020-10-01 20:34 ` Fwd: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() Heiner Kallweit
2020-10-02 8:26 ` Eric Dumazet
2020-10-02 8:32 ` Eric Dumazet
2020-10-02 8:46 ` Eric Dumazet
2020-10-02 11:09 ` Heiner Kallweit
2020-10-02 11:48 ` Eric Dumazet
2020-10-08 16:37 ` Heiner Kallweit
2020-10-08 17:15 ` Eric Dumazet
2020-10-08 18:41 ` Heiner Kallweit
2020-10-08 18:50 ` Eric Dumazet
2020-10-08 19:07 ` Eric Dumazet
2020-10-08 20:54 ` Heiner Kallweit
2020-10-09 8:29 ` Eric Dumazet
2021-01-19 12:40 ` Juerg Haefliger
2021-01-19 13:47 ` Heiner Kallweit
2021-01-19 13:58 ` Eric Dumazet
2021-01-19 13:54 ` Eric Dumazet
2021-01-19 15:38 ` Juerg Haefliger
2021-01-19 15:50 ` Eric Dumazet