* Fwd: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
@ 2020-10-01 20:34 ` Heiner Kallweit
  2020-10-02  8:26   ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-01 20:34 UTC (permalink / raw)
  To: Eric Dumazet, netdev

I have a problem with the following code in ndo_start_xmit() of
the r8169 driver. A user reported the WARN being triggered due
to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
The driver is widely used, therefore I'd expect much more such
reports if it should be a common problem. Not sure what's special.
My primary question: Is it a valid use case that gso_size is
greater than 0, and no SKB_GSO_ flag is set?
Any hint would be appreciated.



	u32 mss = shinfo->gso_size;

	if (mss) {
		if (shinfo->gso_type & SKB_GSO_TCPV4) {
			opts[0] |= TD1_GTSENV4;
		} else if (shinfo->gso_type & SKB_GSO_TCPV6) {
			if (skb_cow_head(skb, 0))
				return false;

			tcp_v6_gso_csum_prep(skb);
			opts[0] |= TD1_GTSENV6;
		} else {
			WARN_ON_ONCE(1);
		}
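For illustration only, the decision in the excerpt can be modelled as a small userspace function that makes the failing combination explicit. This is a sketch, not driver code: MODEL_GSO_TCPV4 and MODEL_GSO_TCPV6 are hypothetical stand-ins for SKB_GSO_TCPV4 and SKB_GSO_TCPV6 (their bit values here are placeholders), and classify_tso() is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical stand-ins for SKB_GSO_TCPV4 / SKB_GSO_TCPV6; the bit
 * positions are placeholders chosen for this sketch only. */
#define MODEL_GSO_TCPV4 (1u << 0)
#define MODEL_GSO_TCPV6 (1u << 4)

enum tso_path {
	TSO_NONE,         /* gso_size == 0: plain packet, no segmentation */
	TSO_V4,           /* driver would set TD1_GTSENV4 */
	TSO_V6,           /* driver would set TD1_GTSENV6 after the csum prep */
	TSO_INCONSISTENT, /* gso_size != 0 but no TCP GSO type: the WARN case */
};

/* Same decision order as the r8169 excerpt: a non-zero gso_size (the MSS)
 * says "segment me", and gso_type must then say which kind of segmentation. */
static enum tso_path classify_tso(uint16_t gso_size, uint32_t gso_type)
{
	if (!gso_size)
		return TSO_NONE;
	if (gso_type & MODEL_GSO_TCPV4)
		return TSO_V4;
	if (gso_type & MODEL_GSO_TCPV6)
		return TSO_V6;
	return TSO_INCONSISTENT;
}
```

With the values from the report (gso_size = 1448, gso_type = 0), classify_tso() lands in TSO_INCONSISTENT, which is the branch guarded by WARN_ON_ONCE(1).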




-------- Forwarded Message --------
Subject: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
Date: Thu, 01 Oct 2020 19:19:24 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: hkallweit1@gmail.com

https://bugzilla.kernel.org/show_bug.cgi?id=209423

--- Comment #7 from Damian Wrobel (dwrobel@ertelnet.rybnik.pl) ---
Here it comes:

[86678.377120] ------------[ cut here ]------------
[86678.377155] gso_size = 1448, gso_type = 0x00000000
[86678.377381] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377393] Modules linked in: tun nft_nat nft_masq nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject
nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink
ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek
edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm_amd
snd_hda_intel snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core snd_hwdep
snd_pcm hp_wmi snd_timer wmi_bmof sparse_keymap irqbypass snd sp5100_tco
i2c_piix4 soundcore k10temp fam15h_power rfkill_gpio rfkill acpi_cpufreq
ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec drm
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ax88179_178a
usbnet serio_raw r8169 mii
[86678.377442]  wmi video
[86678.377486] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-201.fc32.x86_64 #1
[86678.377495] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[86678.377511] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377521] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 72 36 c0 c6 05 a4 20 01 00 01 e8 0d 33 d8 e1 <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[86678.377533] RSP: 0018:ffffa8f280003c80 EFLAGS: 00010282
[86678.377542] RAX: 0000000000000026 RBX: ffff8d331abc6000 RCX: 0000000000000000
[86678.377551] RDX: ffff8d331b427060 RSI: ffff8d331b418d00 RDI: 0000000000000300
[86678.377559] RBP: ffff8d32b5bb8200 R08: 00000000000003d0 R09: 000000000000000d
[86678.377576] R10: 0000000000000730 R11: ffffa8f280003b15 R12: 00000000000001c0
[86678.377596] R13: 00000000000005a8 R14: 0000000000000022 R15: 000000000000001c
[86678.377606] FS:  0000000000000000(0000) GS:ffff8d331b400000(0000) knlGS:0000000000000000
[86678.377617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[86678.377624] CR2: 00007fa516f64520 CR3: 00000000b6de6000 CR4: 00000000001406f0
[86678.377632] Call Trace:
[86678.377641]  <IRQ>
[86678.377657]  dev_hard_start_xmit+0x8d/0x1d0
[86678.377676]  sch_direct_xmit+0xeb/0x2f0
[86678.377687]  __dev_queue_xmit+0x710/0x8a0
[86678.377713]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[86678.377725]  ? nf_hook_slow+0x3f/0xb0
[86678.377735]  ip_finish_output2+0x2ad/0x560
[86678.377746]  __netif_receive_skb_core+0x4f0/0xf40
[86678.377758]  ? packet_rcv+0x44/0x490
[86678.377770]  __netif_receive_skb_one_core+0x2d/0x70
[86678.377779]  process_backlog+0x96/0x160
[86678.377789]  net_rx_action+0x13c/0x3e0
[86678.377804]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[86678.377815]  __do_softirq+0xd9/0x2c4
[86678.377825]  asm_call_on_stack+0x12/0x20
[86678.377835]  </IRQ>
[86678.377845]  do_softirq_own_stack+0x39/0x50
[86678.377855]  irq_exit_rcu+0xc2/0x100
[86678.377865]  common_interrupt+0x75/0x140
[86678.377875]  asm_common_interrupt+0x1e/0x40
[86678.377885] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
[86678.377894] Code: e0 ab 6b 5d e8 ab c4 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 7c dd 7b ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
[86678.377907] RSP: 0018:ffffffffa3a03e58 EFLAGS: 00000246
[86678.377915] RAX: ffff8d331b42a2c0 RBX: ffff8d3312f3e400 RCX: 000000000000001f
[86678.377923] RDX: 0000000000000000 RSI: 00000000401ec2e2 RDI: 0000000000000000
[86678.377931] RBP: ffffffffa3b78960 R08: 00004ed561df8e36 R09: 0000000000000006
[86678.377939] R10: 000000000000001d R11: 000000000000000e R12: 0000000000000002
[86678.377956] R13: ffff8d3312f3e400 R14: 0000000000000002 R15: 00004ed561df8e36
[86678.377970]  ? cpuidle_enter_state+0xa4/0x3f0
[86678.377980]  cpuidle_enter+0x29/0x40
[86678.377990]  do_idle+0x1d5/0x2a0
[86678.377999]  cpu_startup_entry+0x19/0x20
[86678.378009]  start_kernel+0x7f4/0x804
[86678.378022]  secondary_startup_64+0xb6/0xc0
[86678.378032] ---[ end trace 263bcddb7119c953 ]---

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-01 20:34 ` Fwd: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() Heiner Kallweit
@ 2020-10-02  8:26   ` Eric Dumazet
  2020-10-02  8:32     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02  8:26 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev

On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> I have a problem with the following code in ndo_start_xmit() of
> the r8169 driver. A user reported the WARN being triggered due
> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
> The driver is widely used, therefore I'd expect much more such
> reports if it should be a common problem. Not sure what's special.
> My primary question: Is it a valid use case that gso_size is
> greater than 0, and no SKB_GSO_ flag is set?
> Any hint would be appreciated.
>
>

Maybe this is not a TCP packet ? But in this case GSO should have taken place.

You might add a
pr_err_once("gso_type=%x\n", shinfo->gso_type);

>
> u32 mss = shinfo->gso_size;
>
>         if (mss) {



>                 if (shinfo->gso_type & SKB_GSO_TCPV4) {
>                         opts[0] |= TD1_GTSENV4;
>                 } else if (shinfo->gso_type & SKB_GSO_TCPV6) {
>                         if (skb_cow_head(skb, 0))
>                                 return false;
>
>                         tcp_v6_gso_csum_prep(skb);
>                         opts[0] |= TD1_GTSENV6;
>                 } else {
>                         WARN_ON_ONCE(1);
>                 }


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02  8:26   ` Eric Dumazet
@ 2020-10-02  8:32     ` Eric Dumazet
  2020-10-02  8:46       ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02  8:32 UTC (permalink / raw)
  To: Eric Dumazet, Heiner Kallweit; +Cc: netdev



On 10/2/20 10:26 AM, Eric Dumazet wrote:
> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> I have a problem with the following code in ndo_start_xmit() of
>> the r8169 driver. A user reported the WARN being triggered due
>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
>> The driver is widely used, therefore I'd expect much more such
>> reports if it should be a common problem. Not sure what's special.
>> My primary question: Is it a valid use case that gso_size is
>> greater than 0, and no SKB_GSO_ flag is set?
>> Any hint would be appreciated.
>>
>>
> 
> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> 
> You might add a
> pr_err_once("gso_type=%x\n", shinfo->gso_type);
> 
>> [86678.377120] ------------[ cut here ]------------
>> [86678.377155] gso_size = 1448, gso_type = 0x00000000

Ah, sorry I see you already printed gso_type

Must then be a bug somewhere :/



* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02  8:32     ` Eric Dumazet
@ 2020-10-02  8:46       ` Eric Dumazet
  2020-10-02 11:09         ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02  8:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Heiner Kallweit, netdev

On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/2/20 10:26 AM, Eric Dumazet wrote:
> > Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> >
> > You might add a
> > pr_err_once("gso_type=%x\n", shinfo->gso_type);
> >

>
> Ah, sorry I see you already printed gso_type
>
> Must then be a bug somewhere :/


napi_reuse_skb() does :

skb_shinfo(skb)->gso_type = 0;

It does _not_ clear gso_size.

I wonder if in some cases we could reuse an skb while gso_size is not zero.

Normally, we set it only from dev_gro_receive() when the skb is queued
into GRO engine (status being GRO_HELD)
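To make this hypothesis concrete, here is a userspace sketch. struct shinfo_model and both helpers are hypothetical; they mirror only the two fields being discussed, not the kernel's skb_shared_info layout or the full body of napi_reuse_skb(). It shows that if a recycled buffer ever still carried a non-zero gso_size, a reset that clears only gso_type would yield exactly the gso_size > 0, gso_type == 0 combination from the report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model holding only the two fields under discussion. */
struct shinfo_model {
	uint16_t gso_size;
	uint32_t gso_type;
};

/* Mirrors the reset described above: gso_type is cleared, gso_size is not. */
static void reuse_reset_type_only(struct shinfo_model *sh)
{
	sh->gso_type = 0;
}

/* True when the state matches what the r8169 WARN_ON_ONCE() trips on. */
static bool looks_inconsistent(const struct shinfo_model *sh)
{
	return sh->gso_size != 0 && sh->gso_type == 0;
}
```

If the invariant holds (gso_size already 0 on the reuse path), the partial reset is harmless; the model only shows what a violation of that invariant would look like.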


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02  8:46       ` Eric Dumazet
@ 2020-10-02 11:09         ` Heiner Kallweit
  2020-10-02 11:48           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-02 11:09 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet; +Cc: netdev

On 02.10.2020 10:46, Eric Dumazet wrote:
> 
> napi_reuse_skb() does :
> 
> skb_shinfo(skb)->gso_type = 0;
> 
> It does _not_ clear gso_size.
> 
> I wonder if in some cases we could reuse an skb while gso_size is not zero.
> 
> Normally, we set it only from dev_gro_receive() when the skb is queued
> into GRO engine (status being GRO_HELD)
> 
Thanks Eric. I'm no expert that deep in the network stack and just wonder
why napi_reuse_skb() re-initializes fewer fields in shinfo than __alloc_skb().
The latter one does a
memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));

What I can do is let the affected user test the following.

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..8e75399cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 
 	skb->encapsulation = 0;
 	skb_shinfo(skb)->gso_type = 0;
+	skb_shinfo(skb)->gso_size = 0;
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 	skb_ext_reset(skb);
 
-- 
2.28.0




* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 11:09         ` Heiner Kallweit
@ 2020-10-02 11:48           ` Eric Dumazet
  2020-10-08 16:37             ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-02 11:48 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Eric Dumazet, netdev

On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> Thanks Eric. I'm no expert that deep in the network stack and just wonder
> why napi_reuse_skb() re-initializes fewer fields in shinfo than __alloc_skb().
> The latter one does a
> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
>

A memset() over the whole structure is more expensive.

Here we know the prior state of some fields, while __alloc_skb() just
got a piece of memory with random content.
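The trade-off can be sketched with a hypothetical structure whose field order loosely imitates skb_shared_info: per-packet fields first, followed by a dataref member that must survive. init_fresh() and init_reuse() are hypothetical names. The first clears everything before dataref with one memset() bounded by offsetof(), as __alloc_skb() is quoted as doing; the second touches only the field assumed to be dirty, which is cheaper but only correct if the other fields are already known to be zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: per-packet state first, then a member that must
 * be preserved across (re)initialization. Not the kernel's real struct. */
struct shared_model {
	uint8_t  flags;
	uint8_t  nr_frags;
	uint16_t gso_size;
	uint16_t gso_segs;
	uint32_t gso_type;
	int      dataref;   /* must not be cleared by either helper */
};

/* Fresh memory has unknown contents: wipe every byte that precedes
 * dataref, including any padding, in one call. */
static void init_fresh(struct shared_model *sh)
{
	memset(sh, 0, offsetof(struct shared_model, dataref));
}

/* Recycled memory: reset only what is known to have changed and rely on
 * the invariant that every other per-packet field is already zero. */
static void init_reuse(struct shared_model *sh)
{
	sh->gso_type = 0;
}
```

The cheaper path is only as safe as its invariant: a stale gso_size passes through init_reuse() untouched.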

> What I can do is let the affected user test the following.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 62b06523b..8e75399cc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>
>         skb->encapsulation = 0;
>         skb_shinfo(skb)->gso_type = 0;
> +       skb_shinfo(skb)->gso_size = 0;
>         skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>         skb_ext_reset(skb);
>

As I hinted, this should not be needed.

For debugging purposes, I would rather do :

BUG_ON(skb_shinfo(skb)->gso_size);


Nothing in the GRO stack will change gso_size unless the packet is queued
by the GRO layer (after which napi_reuse_skb() won't be called).

napi_reuse_skb() is only used when a packet has been aggregated into
another, and at that point gso_size should still be 0.
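That invariant, and the suggested BUG_ON(), can be sketched in userspace. Everything here is hypothetical: struct pkt_model, the two-valued enum standing in for the GRO result codes, and assert() standing in for BUG_ON(). The point is only the ordering: a held packet gets gso_size set and is never recycled, while a merged packet is recycled and should still have gso_size == 0 at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pkt_model {
	uint16_t gso_size;
	uint32_t gso_type;
};

/* Hypothetical stand-ins for "merged into another skb" vs "held by GRO". */
enum gro_outcome_model { MODEL_MERGED, MODEL_HELD };

/* Reuse path with the debugging check; assert() plays the role of BUG_ON(). */
static void reuse_checked(struct pkt_model *p)
{
	assert(p->gso_size == 0);
	p->gso_type = 0;
}

/* Returns true if the buffer was handed to the reuse path. */
static bool finish_model(struct pkt_model *p, enum gro_outcome_model o,
			 uint16_t mss)
{
	if (o == MODEL_HELD) {
		/* Queued in the aggregation engine: gso_size is set here and
		 * the buffer is not recycled. */
		p->gso_size = mss;
		return false;
	}
	/* Payload was aggregated into another packet: recycle this one. */
	reuse_checked(p);
	return true;
}
```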


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-02 11:48           ` Eric Dumazet
@ 2020-10-08 16:37             ` Heiner Kallweit
  2020-10-08 17:15               ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-08 16:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, netdev

On 02.10.2020 13:48, Eric Dumazet wrote:
> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> What I can do is let the affected user test the following.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 62b06523b..8e75399cc 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>>
>>         skb->encapsulation = 0;
>>         skb_shinfo(skb)->gso_type = 0;
>> +       skb_shinfo(skb)->gso_size = 0;
>>         skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>>         skb_ext_reset(skb);
>>
> 
> As I hinted, this should not be needed.
> 
> For debugging purposes, I would rather do :
> 
> BUG_ON(skb_shinfo(skb)->gso_size);
> 

We did the following for debugging:

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..4c943b774 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
 {
 	u16 gso_segs = skb_shinfo(skb)->gso_segs;
 
+	if (!skb_shinfo(skb)->gso_type)
+		skb_warn_bad_offload(skb);
+
 	if (gso_segs > dev->gso_max_segs)
 		return features & ~NETIF_F_GSO_MASK;

The following skb then triggered skb_warn_bad_offload(). I'm not sure whether this helps
to find out where in the network stack something goes wrong.


[236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
                mac=(778,14) net=(792,20) trans=812
                shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
                csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
[236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
[236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
[236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
[236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
[236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
[236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
[236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
[236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
[236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
[236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
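As a sanity check, the headers in this dump can be decoded by hand. The sketch below uses a hypothetical helper (summarize() and struct hdr_summary are not kernel APIs) over the first 66 bytes of the "skb linear" lines, reading the standard Ethernet, IPv4 and TCP header fields. It decodes to a 120-byte IPv4/TCP packet from port 443 (14 + 120 matches the reported len=134), a 32-byte TCP header with options, and a 68-byte payload, far below the gso_size of 568 and consistent with segs=1.

```c
#include <stddef.h>
#include <stdint.h>

/* First 66 bytes of the "skb linear" dump (Ethernet, IPv4 and TCP
 * headers); the remaining payload bytes are not needed for this check. */
static const uint8_t dump[] = {
	0x00, 0x13, 0x3b, 0xa0, 0x01, 0xe8, 0x7c, 0xd3,
	0x0a, 0x2d, 0x1b, 0x3b, 0x08, 0x00, 0x45, 0x00,
	0x00, 0x78, 0xe2, 0xe6, 0x00, 0x00, 0x7b, 0x06,
	0x52, 0xe1, 0xd8, 0x3a, 0xd0, 0xce, 0xc0, 0xa8,
	0xa0, 0x06, 0x01, 0xbb, 0x8b, 0xc6, 0x53, 0x91,
	0xbe, 0x5e, 0x6e, 0x60, 0xbd, 0xe2, 0x80, 0x18,
	0x01, 0x13, 0x5c, 0xf6, 0x00, 0x00, 0x01, 0x01,
	0x08, 0x0a, 0x3d, 0xd6, 0x6a, 0xa3, 0x63, 0xea,
	0x5c, 0xd9,
};

#define ETH_HDR_LEN 14u  /* untagged Ethernet II header */

struct hdr_summary {
	uint16_t ethertype;
	unsigned ip_hdr_len;   /* IHL * 4 */
	uint16_t ip_tot_len;   /* IPv4 total length */
	uint8_t  ip_proto;
	uint16_t sport, dport;
	unsigned tcp_hdr_len;  /* data offset * 4 */
	unsigned payload_len;
};

static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the buffer is too short or not IPv4/TCP. */
static int summarize(const uint8_t *f, size_t n, struct hdr_summary *s)
{
	if (n < ETH_HDR_LEN + 20)
		return -1;
	s->ethertype = be16(f + 12);
	const uint8_t *ip = f + ETH_HDR_LEN;
	s->ip_hdr_len = (ip[0] & 0x0fu) * 4u;
	s->ip_tot_len = be16(ip + 2);
	s->ip_proto = ip[9];
	if (s->ethertype != 0x0800 || s->ip_proto != 6)
		return -1;
	if (n < ETH_HDR_LEN + s->ip_hdr_len + 20)
		return -1;
	const uint8_t *tcp = ip + s->ip_hdr_len;
	s->sport = be16(tcp);
	s->dport = be16(tcp + 2);
	s->tcp_hdr_len = (tcp[12] >> 4) * 4u;
	if (s->ip_tot_len < s->ip_hdr_len + s->tcp_hdr_len)
		return -1;
	s->payload_len = s->ip_tot_len - s->ip_hdr_len - s->tcp_hdr_len;
	return 0;
}
```

The payload bytes that follow (17 03 03 00 3f ...) look like a TLS record header announcing 63 bytes, and 5 + 63 gives the same 68 bytes.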






* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 16:37             ` Heiner Kallweit
@ 2020-10-08 17:15               ` Eric Dumazet
  2020-10-08 18:41                 ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-08 17:15 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Eric Dumazet, netdev

On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> We did the following for debugging:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 62b06523b..4c943b774 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>  {
>         u16 gso_segs = skb_shinfo(skb)->gso_segs;
>
> +       if (!skb_shinfo(skb)->gso_type)
> +               skb_warn_bad_offload(skb);

You also want to get a stack trace here, to give us the call graph.


> +
>         if (gso_segs > dev->gso_max_segs)
>                 return features & ~NETIF_F_GSO_MASK;
>
> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps
> to find out where in the network stack something goes wrong.
>
>
> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
>                 mac=(778,14) net=(792,20) trans=812
>                 shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
>                 csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
>                 hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
> [236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
> [236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
> [236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
> [236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
> [236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
> [236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
> [236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
> [236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
> [236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
>
>
>
> >
> > Nothing in the GRO stack will change gso_size, unless the packet is queued
> > by the GRO layer (after this, napi_reuse_skb() won't be called).
> >
> > napi_reuse_skb() is only used when a packet has been aggregated to
> > another, and at this point gso_size should still be 0.
> >
>


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 17:15               ` Eric Dumazet
@ 2020-10-08 18:41                 ` Heiner Kallweit
  2020-10-08 18:50                   ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-08 18:41 UTC
  To: Eric Dumazet; +Cc: Eric Dumazet, netdev

On 08.10.2020 19:15, Eric Dumazet wrote:
> On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 02.10.2020 13:48, Eric Dumazet wrote:
>>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> On 02.10.2020 10:46, Eric Dumazet wrote:
>>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote:
>>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I have a problem with the following code in ndo_start_xmit() of
>>>>>>>> the r8169 driver. A user reported the WARN being triggered due
>>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
>>>>>>>> The driver is widely used, so I'd expect many more such
>>>>>>>> reports if it were a common problem. Not sure what's special.
>>>>>>>> My primary question: Is it a valid use case that gso_size is
>>>>>>>> greater than 0, and no SKB_GSO_ flag is set?
>>>>>>>> Any hint would be appreciated.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
>>>>>>>
>>>>>>> You might add a
>>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
>>>>>>>
>>>>>
>>>>>>
>>>>>> Ah, sorry I see you already printed gso_type
>>>>>>
>>>>>> Must then be a bug somewhere :/
>>>>>
>>>>>
>>>>> napi_reuse_skb() does :
>>>>>
>>>>> skb_shinfo(skb)->gso_type = 0;
>>>>>
>>>>> It does _not_ clear gso_size.
>>>>>
>>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero.
>>>>>
>>>>> Normally, we set it only from dev_gro_receive() when the skb is queued
>>>>> into GRO engine (status being GRO_HELD)
>>>>>
>>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder
>>>> why napi_reuse_skb() re-initializes fewer fields in shinfo than __alloc_skb().
>>>> The latter one does a
>>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
>>>>
>>>
>>> memset() over the whole thing is more expensive.
>>>
>>> Here we know the prior state of some fields, while __alloc_skb() just
>>> got a piece of memory with random content.
>>>
>>>> What I can do is have the affected user test the following.
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 62b06523b..8e75399cc 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>>>>
>>>>         skb->encapsulation = 0;
>>>>         skb_shinfo(skb)->gso_type = 0;
>>>> +       skb_shinfo(skb)->gso_size = 0;
>>>>         skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>>>>         skb_ext_reset(skb);
>>>>
>>>
>>> As I hinted, this should not be needed.
>>>
>>> For debugging purposes, I would rather do :
>>>
>>> BUG_ON(skb_shinfo(skb)->gso_size);
>>>
>>
>> We did the following for debugging:
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 62b06523b..4c943b774 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>  {
>>         u16 gso_segs = skb_shinfo(skb)->gso_segs;
>>
>> +       if (!skb_shinfo(skb)->gso_type)
>> +               skb_warn_bad_offload(skb);
> 
> You also want to get a stack trace here, to give us the call graph.
> 

Here it comes; the full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423


[236222.967498] ------------[ cut here ]------------
[236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
[236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
[236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
[236222.967776]  ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
[236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
[236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
[236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
[236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
[236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
[236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
[236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
[236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
[236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
[236222.968000] FS:  0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
[236222.968013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
[236222.968035] Call Trace:
[236222.968047]  <IRQ>
[236222.968064]  netif_skb_features+0x25e/0x2c0
[236222.968084]  ? ipt_do_table+0x333/0x600 [ip_tables]
[236222.968098]  validate_xmit_skb+0x1d/0x300
[236222.968111]  validate_xmit_skb_list+0x48/0x70
[236222.968126]  sch_direct_xmit+0x129/0x2f0
[236222.968140]  __dev_queue_xmit+0x710/0x8a0
[236222.968184]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[236222.968200]  ? nf_hook_slow+0x3f/0xb0
[236222.968214]  ip_finish_output2+0x2ad/0x560
[236222.968229]  __netif_receive_skb_core+0x4f0/0xf40
[236222.968244]  ? packet_rcv+0x44/0x490
[236222.968257]  __netif_receive_skb_one_core+0x2d/0x70
[236222.968277]  process_backlog+0x96/0x160
[236222.968290]  net_rx_action+0x13c/0x3e0
[236222.968312]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[236222.968327]  __do_softirq+0xd9/0x2c4
[236222.968340]  asm_call_on_stack+0x12/0x20
[236222.968350]  </IRQ>
[236222.968362]  do_softirq_own_stack+0x39/0x50
[236222.968376]  irq_exit_rcu+0xc2/0x100
[236222.968389]  common_interrupt+0x75/0x140
[236222.968405]  asm_common_interrupt+0x1e/0x40
[236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
[236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
[236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
[236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
[236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
[236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[236222.968535]  acpi_safe_halt+0x1b/0x30
[236222.968549]  acpi_idle_enter+0x27e/0x2e0
[236222.968566]  cpuidle_enter_state+0x81/0x3f0
[236222.968589]  cpuidle_enter+0x29/0x40
[236222.968602]  do_idle+0x1d5/0x2a0
[236222.968615]  cpu_startup_entry+0x19/0x20
[236222.968628]  start_kernel+0x7f4/0x804
[236222.968645]  secondary_startup_64+0xb6/0xc0
[236222.968659] ---[ end trace 8a4d7f639ad88505 ]---


> 
>> +
>>         if (gso_segs > dev->gso_max_segs)
>>                 return features & ~NETIF_F_GSO_MASK;
>>
>> The following skb then triggered skb_warn_bad_offload(). Not sure whether this helps
>> to find out where in the network stack something goes wrong.
>>
>>
>> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
>>                 mac=(778,14) net=(792,20) trans=812
>>                 shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
>>                 csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
>>                 hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
>> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
>> [236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
>> [236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
>> [236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
>> [236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
>> [236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
>> [236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
>> [236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
>> [236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
>> [236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
>>
>>
>>
>>>
>>> Nothing in the GRO stack will change gso_size, unless the packet is queued
>>> by the GRO layer (after this, napi_reuse_skb() won't be called).
>>>
>>> napi_reuse_skb() is only used when a packet has been aggregated to
>>> another, and at this point gso_size should still be 0.
>>>
>>



* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 18:41                 ` Heiner Kallweit
@ 2020-10-08 18:50                   ` Eric Dumazet
  2020-10-08 19:07                     ` Eric Dumazet
  2021-01-19 12:40                     ` Juerg Haefliger
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2020-10-08 18:50 UTC
  To: Heiner Kallweit; +Cc: Eric Dumazet, netdev

On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 08.10.2020 19:15, Eric Dumazet wrote:
> > On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> On 02.10.2020 13:48, Eric Dumazet wrote:
> >>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> On 02.10.2020 10:46, Eric Dumazet wrote:
> >>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote:
> >>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> I have a problem with the following code in ndo_start_xmit() of
> >>>>>>>> the r8169 driver. A user reported the WARN being triggered due
> >>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
> >>>>>>>> The driver is widely used, so I'd expect many more such
> >>>>>>>> reports if it were a common problem. Not sure what's special.
> >>>>>>>> My primary question: Is it a valid use case that gso_size is
> >>>>>>>> greater than 0, and no SKB_GSO_ flag is set?
> >>>>>>>> Any hint would be appreciated.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> >>>>>>>
> >>>>>>> You might add a
> >>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
> >>>>>>>
> >>>>>
> >>>>>>
> >>>>>> Ah, sorry I see you already printed gso_type
> >>>>>>
> >>>>>> Must then be a bug somewhere :/
> >>>>>
> >>>>>
> >>>>> napi_reuse_skb() does :
> >>>>>
> >>>>> skb_shinfo(skb)->gso_type = 0;
> >>>>>
> >>>>> It does _not_ clear gso_size.
> >>>>>
> >>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero.
> >>>>>
> >>>>> Normally, we set it only from dev_gro_receive() when the skb is queued
> >>>>> into GRO engine (status being GRO_HELD)
> >>>>>
> >>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder
> >>>> why napi_reuse_skb() re-initializes fewer fields in shinfo than __alloc_skb().
> >>>> The latter one does a
> >>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> >>>>
> >>>
> >>> memset() over the whole thing is more expensive.
> >>>
> >>> Here we know the prior state of some fields, while __alloc_skb() just
> >>> got a piece of memory with random content.
> >>>
> >>>> What I can do is have the affected user test the following.
> >>>>
> >>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>> index 62b06523b..8e75399cc 100644
> >>>> --- a/net/core/dev.c
> >>>> +++ b/net/core/dev.c
> >>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
> >>>>
> >>>>         skb->encapsulation = 0;
> >>>>         skb_shinfo(skb)->gso_type = 0;
> >>>> +       skb_shinfo(skb)->gso_size = 0;
> >>>>         skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> >>>>         skb_ext_reset(skb);
> >>>>
> >>>
> >>> As I hinted, this should not be needed.
> >>>
> >>> For debugging purposes, I would rather do :
> >>>
> >>> BUG_ON(skb_shinfo(skb)->gso_size);
> >>>
> >>
> >> We did the following for debugging:
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 62b06523b..4c943b774 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
> >>  {
> >>         u16 gso_segs = skb_shinfo(skb)->gso_segs;
> >>
> >> +       if (!skb_shinfo(skb)->gso_type)
> >> +               skb_warn_bad_offload(skb);
> >
> > You also want to get a stack trace here, to give us the call graph.
> >
>
> Here it comes; the full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423
>
>
> [236222.967498] ------------[ cut here ]------------
> [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
> [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
> [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
> [236222.967776]  ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
> [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
> [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
> [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
> [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
> [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
> [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
> [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
> [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
> [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
> [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
> [236222.968000] FS:  0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
> [236222.968013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
> [236222.968035] Call Trace:
> [236222.968047]  <IRQ>
> [236222.968064]  netif_skb_features+0x25e/0x2c0
> [236222.968084]  ? ipt_do_table+0x333/0x600 [ip_tables]
> [236222.968098]  validate_xmit_skb+0x1d/0x300
> [236222.968111]  validate_xmit_skb_list+0x48/0x70
> [236222.968126]  sch_direct_xmit+0x129/0x2f0
> [236222.968140]  __dev_queue_xmit+0x710/0x8a0
> [236222.968184]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
> [236222.968200]  ? nf_hook_slow+0x3f/0xb0
> [236222.968214]  ip_finish_output2+0x2ad/0x560
> [236222.968229]  __netif_receive_skb_core+0x4f0/0xf40
> [236222.968244]  ? packet_rcv+0x44/0x490
> [236222.968257]  __netif_receive_skb_one_core+0x2d/0x70
> [236222.968277]  process_backlog+0x96/0x160
> [236222.968290]  net_rx_action+0x13c/0x3e0
> [236222.968312]  ? usbnet_bh+0x24/0x2b0 [usbnet]
> [236222.968327]  __do_softirq+0xd9/0x2c4
> [236222.968340]  asm_call_on_stack+0x12/0x20
> [236222.968350]  </IRQ>
> [236222.968362]  do_softirq_own_stack+0x39/0x50
> [236222.968376]  irq_exit_rcu+0xc2/0x100
> [236222.968389]  common_interrupt+0x75/0x140
> [236222.968405]  asm_common_interrupt+0x1e/0x40
> [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
> [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
> [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
> [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
> [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
> [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
> [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
> [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
> [236222.968535]  acpi_safe_halt+0x1b/0x30
> [236222.968549]  acpi_idle_enter+0x27e/0x2e0
> [236222.968566]  cpuidle_enter_state+0x81/0x3f0
> [236222.968589]  cpuidle_enter+0x29/0x40
> [236222.968602]  do_idle+0x1d5/0x2a0
> [236222.968615]  cpu_startup_entry+0x19/0x20
> [236222.968628]  start_kernel+0x7f4/0x804
> [236222.968645]  secondary_startup_64+0xb6/0xc0
> [236222.968659] ---[ end trace 8a4d7f639ad88505 ]---
>
>

OK, it would be nice to know what the input interface is.

if4 -> look at "ip link | grep 4:"

Then identify the driver that built such a strange packet (32000
bytes allocated in skb->head):

ethtool -i ifname



> >
> >> +
> >>         if (gso_segs > dev->gso_max_segs)
> >>                 return features & ~NETIF_F_GSO_MASK;
> >>
> >> The following skb then triggered skb_warn_bad_offload(). Not sure whether this helps
> >> to find out where in the network stack something goes wrong.
> >>
> >>
> >> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
> >>                 mac=(778,14) net=(792,20) trans=812
> >>                 shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
> >>                 csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
> >>                 hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
> >> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
> >> [236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
> >> [236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
> >> [236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
> >> [236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
> >> [236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
> >> [236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
> >> [236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
> >> [236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
> >> [236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
> >>
> >>
> >>
> >>>
> >>> Nothing in the GRO stack will change gso_size, unless the packet is queued
> >>> by the GRO layer (after this, napi_reuse_skb() won't be called).
> >>>
> >>> napi_reuse_skb() is only used when a packet has been aggregated to
> >>> another, and at this point gso_size should still be 0.
> >>>
> >>
>


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 18:50                   ` Eric Dumazet
@ 2020-10-08 19:07                     ` Eric Dumazet
  2020-10-08 20:54                       ` Heiner Kallweit
  2021-01-19 12:40                     ` Juerg Haefliger
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2020-10-08 19:07 UTC
  To: Eric Dumazet, Heiner Kallweit; +Cc: netdev



On 10/8/20 8:50 PM, Eric Dumazet wrote:
>
> 
> OK, it would be nice to know what the input interface is.
> 
> if4 -> look at "ip link | grep 4:"
> 
> Then identify the driver that built such a strange packet (32000
> bytes allocated in skb->head):
> 
> ethtool -i ifname
>

According to https://bugzilla.kernel.org/show_bug.cgi?id=209423

iif4 is the tun200 interface used by openvpn.

So this might be a tun bug, or a lack of proper SKB_GSO_DODGY validation
in our stack for buggy/malicious packets.




* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 19:07                     ` Eric Dumazet
@ 2020-10-08 20:54                       ` Heiner Kallweit
  2020-10-09  8:29                         ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2020-10-08 20:54 UTC
  To: Eric Dumazet, Eric Dumazet; +Cc: netdev

On 08.10.2020 21:07, Eric Dumazet wrote:
> 
> 
> On 10/8/20 8:50 PM, Eric Dumazet wrote:
>>
>>
>> OK, it would be nice to know what the input interface is.
>>
>> if4 -> look at "ip link | grep 4:"
>>
>> Then identify the driver that built such a strange packet (32000
>> bytes allocated in skb->head):
>>
>> ethtool -i ifname
>>
> 
> According to https://bugzilla.kernel.org/show_bug.cgi?id=209423
> 
> iif4 is the tun200 interface used by openvpn.
> 
> So this might be a tun bug, or a lack of proper SKB_GSO_DODGY validation
> in our stack for buggy/malicious packets.
> 
> 

The following old commit sounds like it might be related:
622e0ca1cd4d ("gro: Fix bogus gso_size on the first fraglist entry")

However, this code was later removed in 58025e46ea2d ("net: gro: remove
obsolete code from skb_gro_receive()").


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 20:54                       ` Heiner Kallweit
@ 2020-10-09  8:29                         ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2020-10-09  8:29 UTC
  To: Heiner Kallweit, Eric Dumazet, Eric Dumazet; +Cc: netdev



On 10/8/20 10:54 PM, Heiner Kallweit wrote:
> On 08.10.2020 21:07, Eric Dumazet wrote:
>>
>>
>> On 10/8/20 8:50 PM, Eric Dumazet wrote:
>>>
>>>
>>> OK, it would be nice to know what the input interface is.
>>>
>>> if4 -> look at "ip link | grep 4:"
>>>
>>> Then identify the driver that built such a strange packet (32000
>>> bytes allocated in skb->head):
>>>
>>> ethtool -i ifname
>>>
>>
>> According to https://bugzilla.kernel.org/show_bug.cgi?id=209423
>>
>> iif4 is the tun200 interface used by openvpn.
>>
>> So this might be a tun bug, or a lack of proper SKB_GSO_DODGY validation
>> in our stack for buggy/malicious packets.
>>
>>
> 
> The following old commit sounds like it might be related:
> 622e0ca1cd4d ("gro: Fix bogus gso_size on the first fraglist entry")
> 
> However, this code was later removed in 58025e46ea2d ("net: gro: remove
> obsolete code from skb_gro_receive()").
> 

GRO won't keep a GSO packet in its queues. From dev_gro_receive():
...
NAPI_GRO_CB(skb)->flush = skb_is_gso(skb) || skb_has_frag_list(skb);
...

Also note that tun can no longer inject a packet with a length of 134 bytes pretending
to have gso_size == 568.

Look at virtio_net_hdr_to_skb() and commits
6dd912f82680 ("net: check untrusted gso_size at kernel entry")
7c6d2ecbda83 ("net: be more gentle about silly gso requests coming from user")

Looking more closely at the skb layout, I suspect a usbnet bug and a use-after-free.

A KASAN build might help.




* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2020-10-08 18:50                   ` Eric Dumazet
  2020-10-08 19:07                     ` Eric Dumazet
@ 2021-01-19 12:40                     ` Juerg Haefliger
  2021-01-19 13:47                       ` Heiner Kallweit
  2021-01-19 13:54                       ` Eric Dumazet
  1 sibling, 2 replies; 19+ messages in thread
From: Juerg Haefliger @ 2021-01-19 12:40 UTC
  To: Eric Dumazet
  Cc: Heiner Kallweit, Eric Dumazet, netdev, UNGLinuxDriver, Woojung Huh

[-- Attachment #1: Type: text/plain, Size: 32213 bytes --]

On Thu, 8 Oct 2020 20:50:28 +0200
Eric Dumazet <edumazet@google.com> wrote:

> On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >
> > On 08.10.2020 19:15, Eric Dumazet wrote:  
> > > On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:  
> > >>
> > >> On 02.10.2020 13:48, Eric Dumazet wrote:  
> > >>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:  
> > >>>>
> > >>>> On 02.10.2020 10:46, Eric Dumazet wrote:  
> > >>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:  
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote:  
> > >>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:  
> > >>>>>>>>
> > >>>>>>>> I have a problem with the following code in ndo_start_xmit() of
> > >>>>>>>> the r8169 driver. A user reported the WARN being triggered due
> > >>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
> > >>>>>>>> The driver is widely used, so I'd expect many more such
> > >>>>>>>> reports if it were a common problem. Not sure what's special.
> > >>>>>>>> My primary question: Is it a valid use case that gso_size is
> > >>>>>>>> greater than 0, and no SKB_GSO_ flag is set?
> > >>>>>>>> Any hint would be appreciated.
> > >>>>>>>>
> > >>>>>>>>  
> > >>>>>>>
> > >>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> > >>>>>>>
> > >>>>>>> You might add a
> > >>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
> > >>>>>>>  
> > >>>>>  
> > >>>>>>
> > >>>>>> Ah, sorry I see you already printed gso_type
> > >>>>>>
> > >>>>>> Must then be a bug somewhere :/  
> > >>>>>
> > >>>>>
> > >>>>> napi_reuse_skb() does :
> > >>>>>
> > >>>>> skb_shinfo(skb)->gso_type = 0;
> > >>>>>
> > >>>>> It does _not_ clear gso_size.
> > >>>>>
> > >>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero.
> > >>>>>
> > >>>>> Normally, we set it only from dev_gro_receive() when the skb is queued
> > >>>>> into GRO engine (status being GRO_HELD)
> > >>>>>  
> > >>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder
> > >>>> why napi_reuse_skb() re-initializes fewer fields in shinfo than __alloc_skb().
> > >>>> The latter one does a
> > >>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> > >>>>  
> > >>>
> > >>> memset() over the whole thing is more expensive.
> > >>>
> > >>> Here we know the prior state of some fields, while __alloc_skb() just
> > >>> got a piece of memory with random content.
> > >>>  
> > >>>> What I can do is have the affected user test the following.
> > >>>>
> > >>>> diff --git a/net/core/dev.c b/net/core/dev.c
> > >>>> index 62b06523b..8e75399cc 100644
> > >>>> --- a/net/core/dev.c
> > >>>> +++ b/net/core/dev.c
> > >>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
> > >>>>
> > >>>>         skb->encapsulation = 0;
> > >>>>         skb_shinfo(skb)->gso_type = 0;
> > >>>> +       skb_shinfo(skb)->gso_size = 0;
> > >>>>         skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> > >>>>         skb_ext_reset(skb);
> > >>>>  
> > >>>
> > >>> As I hinted, this should not be needed.
> > >>>
> > >>> For debugging purposes, I would rather do :
> > >>>
> > >>> BUG_ON(skb_shinfo(skb)->gso_size);
> > >>>  
> > >>
> > >> We did the following for debugging:
> > >>
> > >> diff --git a/net/core/dev.c b/net/core/dev.c
> > >> index 62b06523b..4c943b774 100644
> > >> --- a/net/core/dev.c
> > >> +++ b/net/core/dev.c
> > >> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
> > >>  {
> > >>         u16 gso_segs = skb_shinfo(skb)->gso_segs;
> > >>
> > >> +       if (!skb_shinfo(skb)->gso_type)
> > >> +               skb_warn_bad_offload(skb);  
> > >
> > > You also want to get a stack trace here, to give us the call graph.
> > >  
> >
> > Here it comes; the full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423
> >
> >
> > [236222.967498] ------------[ cut here ]------------
> > [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
> > [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
> > [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
> > [236222.967776]  ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
> > [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
> > [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
> > [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
> > [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
> > [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
> > [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
> > [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
> > [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
> > [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
> > [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
> > [236222.968000] FS:  0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
> > [236222.968013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
> > [236222.968035] Call Trace:
> > [236222.968047]  <IRQ>
> > [236222.968064]  netif_skb_features+0x25e/0x2c0
> > [236222.968084]  ? ipt_do_table+0x333/0x600 [ip_tables]
> > [236222.968098]  validate_xmit_skb+0x1d/0x300
> > [236222.968111]  validate_xmit_skb_list+0x48/0x70
> > [236222.968126]  sch_direct_xmit+0x129/0x2f0
> > [236222.968140]  __dev_queue_xmit+0x710/0x8a0
> > [236222.968184]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
> > [236222.968200]  ? nf_hook_slow+0x3f/0xb0
> > [236222.968214]  ip_finish_output2+0x2ad/0x560
> > [236222.968229]  __netif_receive_skb_core+0x4f0/0xf40
> > [236222.968244]  ? packet_rcv+0x44/0x490
> > [236222.968257]  __netif_receive_skb_one_core+0x2d/0x70
> > [236222.968277]  process_backlog+0x96/0x160
> > [236222.968290]  net_rx_action+0x13c/0x3e0
> > [236222.968312]  ? usbnet_bh+0x24/0x2b0 [usbnet]
> > [236222.968327]  __do_softirq+0xd9/0x2c4
> > [236222.968340]  asm_call_on_stack+0x12/0x20
> > [236222.968350]  </IRQ>
> > [236222.968362]  do_softirq_own_stack+0x39/0x50
> > [236222.968376]  irq_exit_rcu+0xc2/0x100
> > [236222.968389]  common_interrupt+0x75/0x140
> > [236222.968405]  asm_common_interrupt+0x1e/0x40
> > [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
> > [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
> > [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
> > [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
> > [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
> > [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
> > [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
> > [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
> > [236222.968535]  acpi_safe_halt+0x1b/0x30
> > [236222.968549]  acpi_idle_enter+0x27e/0x2e0
> > [236222.968566]  cpuidle_enter_state+0x81/0x3f0
> > [236222.968589]  cpuidle_enter+0x29/0x40
> > [236222.968602]  do_idle+0x1d5/0x2a0
> > [236222.968615]  cpu_startup_entry+0x19/0x20
> > [236222.968628]  start_kernel+0x7f4/0x804
> > [236222.968645]  secondary_startup_64+0xb6/0xc0
> > [236222.968659] ---[ end trace 8a4d7f639ad88505 ]---
> >
> >  
> 
> OK, it would be nice to know what is the input interface
> 
> if4 -> look at "ip link | grep 4:"
> 
> Then identifying the driver that built such a strange packet (32000
> bytes allocated in skb->head)
> 
> ethtool -i ifname
> 
> 
> 
> > >  
> > >> +
> > >>         if (gso_segs > dev->gso_max_segs)
> > >>                 return features & ~NETIF_F_GSO_MASK;
> > >>
> > >> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps
> > >> to find out where in the network stack something goes wrong.
> > >>
> > >>
> > >> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
> > >>                 mac=(778,14) net=(792,20) trans=812
> > >>                 shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
> > >>                 csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
> > >>                 hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
> > >> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
> > >> [236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
> > >> [236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
> > >> [236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
> > >> [236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
> > >> [236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
> > >> [236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
> > >> [236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
> > >> [236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
> > >> [236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
> > >>
> > >>
> > >>  
> > >>>
> > >>> Nothing in GRO stack will change gso_size, unless the packet is queued
> > >>> by GRO layer (after this, napi_reuse_skb() won't be called)
> > >>>
> > >>> napi_reuse_skb() is only used when a packet has been aggregated to
> > >>> another, and at this point gso_size should be still 0.
> > >>>  
> > >>  

I seem to have stumbled over the same or a similar issue with a Raspberry Pi
3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used
as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I
get tons of the following warnings after a couple of seconds:

Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear:   00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear:   00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear:   00000030: 00 00 6a 58 00 00
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in:
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000 
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace:
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599]  skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020]  netif_skb_features+0x218/0x2a0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350]  validate_xmit_skb.isra.0+0x28/0x2c8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125]  validate_xmit_skb_list+0x44/0x98
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631]  sch_direct_xmit+0xf0/0x3a8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599]  __qdisc_run+0x140/0x668
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297]  __dev_queue_xmit+0x59c/0x980
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446]  dev_queue_xmit+0x1c/0x28
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237]  ip_finish_output2+0x30c/0x558
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476]  __ip_finish_output+0xe4/0x260
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715]  ip_finish_output+0x3c/0xd8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683]  ip_output+0xb4/0x148
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116]  ip_forward_finish+0x7c/0xc0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174]  ip_forward+0x42c/0x4f0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783]  ip_rcv_finish+0x98/0xb8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481]  ip_rcv+0xe0/0xf0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552]  __netif_receive_skb_one_core+0x5c/0x88
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597]  __netif_receive_skb+0x20/0x70
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834]  process_backlog+0xc0/0x1d0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802]  net_rx_action+0x134/0x478
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682]  __do_softirq+0x130/0x378
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472]  irq_exit+0xc0/0xe8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725]  __handle_domain_irq+0x70/0xc8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185]  el1_irq+0xb4/0x140
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377]  arch_cpu_idle+0x18/0x28
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981]  default_idle_call+0x44/0x178
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009]  do_idle+0x224/0x270
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147]  cpu_startup_entry+0x30/0x98
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916]  rest_init+0xc8/0xd8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937]  arch_call_rest_init+0x18/0x24
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829]  start_kernel+0x57c/0x5b8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]---

If I also add the following rule:
  $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
I get a single warning followed by a TX timeout:

Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] mac=(5194,14) net=(5208,20) trans=5228
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.546872] dev name=eth0 feat=0x0x0000010000114b09
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.552060] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.560090] skb linear:   00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.568019] skb linear:   00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.575921] skb linear:   00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.583918] skb linear:   00000040: 9f fc
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.588105] ------------[ cut here ]------------
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.607900] Modules linked in:
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.630396] pc : skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.635175] lr : skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.639953] sp : ffff800010003810
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.643374] x29: ffff800010003810 x28: ffff50043b196290 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.648870] x27: ffff500407371600 x26: 0000000000000001 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.670847] x19: ffff500407371600 x18: 0000000000000010 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.676340] x17: 0000000000000000 x16: 0000000000000000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.681833] x15: 000000000000ad55 x14: 0000000000000010 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.703806] x7 : c0000000ffffefff x6 : 0000000000000003 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.709300] x5 : 0000000000000000 x4 : 0000000000000000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.714791] x3 : 0000000000000100 x2 : 0000000000001000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.720283] x1 : 0000000000000000 x0 : 0000000000000000 
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.725778] Call trace:
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.728306]  skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.732728]  netif_skb_features+0x218/0x2a0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.737057]  validate_xmit_skb.isra.0+0x28/0x2c8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.741833]  validate_xmit_skb_list+0x44/0x98
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.746339]  sch_direct_xmit+0xf0/0x3a8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.750309]  __qdisc_run+0x140/0x668
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.754008]  __dev_queue_xmit+0x59c/0x980
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.758156]  dev_queue_xmit+0x1c/0x28
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.761945]  neigh_resolve_output+0x108/0x230
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.766450]  ip_finish_output2+0x180/0x558
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.770690]  __ip_finish_output+0xe4/0x260
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.774928]  ip_finish_output+0x3c/0xd8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.778896]  ip_output+0xb4/0x148
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.782328]  ip_forward_finish+0x7c/0xc0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.786385]  ip_forward+0x42c/0x4f0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.789995]  ip_rcv_finish+0x98/0xb8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.793694]  ip_rcv+0xe0/0xf0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.796765]  __netif_receive_skb_one_core+0x5c/0x88
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.801810]  __netif_receive_skb+0x20/0x70
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.806047]  process_backlog+0xc0/0x1d0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.810016]  net_rx_action+0x134/0x478
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.813897]  __do_softirq+0x130/0x378
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.817686]  irq_exit+0xc0/0xe8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.820940]  __handle_domain_irq+0x70/0xc8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.829099]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.838223]  el1_irq+0xb4/0x140
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.845371]  arch_cpu_idle+0x18/0x28
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.852882]  default_idle_call+0x44/0x178
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.860756]  do_idle+0x224/0x270
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.867794]  cpu_startup_entry+0x30/0x98
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.875516]  rest_init+0xc8/0xd8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.882496]  arch_call_rest_init+0x18/0x24
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.890352]  start_kernel+0x57c/0x5b8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.897706] ---[ end trace a5789410f231a10b ]---
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.046337] ------------[ cut here ]------------
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.077534] Modules linked in:
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W         5.11.0-rc4 #103
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.116085] pc : dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.123857] lr : dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.131558] sp : ffff800010013d90
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.138497] x29: ffff800010013d90 x28: 0000000000000140 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.156489] x25: 0000000000000002 x24: 0000000000000000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.165496] x23: 0000000000000001 x22: ffff50042e96b000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.183490] x19: 0000000000000000 x18: 0000000000000010 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.192493] x17: 0000000000000000 x16: 0000000000000000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.201473] x15: 000000000000ad55 x14: 0000000000000010 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.237362] x7 : c0000000ffffefff x6 : 0000000000000003 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.246353] x5 : 0000000000000000 x4 : 0000000000000000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.255328] x3 : 0000000000000100 x2 : 0000000000001000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.264273] x1 : 0000000000000000 x0 : 0000000000000000 
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.273192] Call trace:
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.279183]  dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.286461]  call_timer_fn+0x38/0x188
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.293762]  run_timer_softirq+0x494/0x688
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.301489]  __do_softirq+0x130/0x378
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.308767]  irq_exit+0xc0/0xe8
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.315500]  __handle_domain_irq+0x70/0xc8
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.323214]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.331940]  el1_irq+0xb4/0x140
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.338706]  arch_cpu_idle+0x18/0x28
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.345916]  default_idle_call+0x44/0x178
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.353577]  do_idle+0x224/0x270
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.360433]  cpu_startup_entry+0x2c/0x98
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.368000]  secondary_start_kernel+0x148/0x180
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.376199] ---[ end trace a5789410f231a10c ]---

I did some bisecting and found commit [2] to be problematic. Reverting that
commit plus the two follow-on fixes [3] and [4] prevents both the warnings and
the timeout. I'm no networking expert, so I can't determine whether [2] is
broken or merely exposes a different underlying issue. I failed to reproduce
the problem using a dedicated Realtek-based USB NIC plugged into the Pi, which
points towards the lan78xx driver/HW being the culprit.

Enabling KASAN didn't trigger any error reports.

Let me know if there's anything else I can try to narrow this down.

...Juerg

[1]
On the Pi, I run:
  $ nc -l 1234 | dd status=progress >/dev/null

And on another machine that is configured to use the Pi as its gateway:
  $ nc 192.168.99.115 1234 < /dev/urandom
plus a couple of Firefox instances that keep opening public URLs.

[2]
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Nov 27 14:42:03 2018 -0800

    tcp: implement coalescing on backlog queue
    
    In case GRO is not as efficient as it should be or disabled,
    we might have a user thread trapped in __release_sock() while
    softirq handler flood packets up to the point we have to drop.
    
    This patch balances work done from user thread and softirq,
    to give more chances to __release_sock() to complete its work
    before new packets are added the the backlog.
    
    This also helps if we receive many ACK packets, since GRO
    does not aggregate them.
    
    This patch brings ~60% throughput increase on a receiver
    without GRO, but the spectacular gain is really on
    1000x release_sock() latency reduction I have measured.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Neal Cardwell <ncardwell@google.com>
    Cc: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

[3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
[4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 12:40                     ` Juerg Haefliger
@ 2021-01-19 13:47                       ` Heiner Kallweit
  2021-01-19 13:58                         ` Eric Dumazet
  2021-01-19 13:54                       ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2021-01-19 13:47 UTC
  To: Juerg Haefliger, Eric Dumazet
  Cc: Eric Dumazet, netdev, UNGLinuxDriver, Woojung Huh

On 19.01.2021 13:40, Juerg Haefliger wrote:
> On Thu, 8 Oct 2020 20:50:28 +0200
> Eric Dumazet <edumazet@google.com> wrote:
> 
>> On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>
>>> On 08.10.2020 19:15, Eric Dumazet wrote:  
>>>> On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:  
>>>>>
>>>>> On 02.10.2020 13:48, Eric Dumazet wrote:  
>>>>>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:  
>>>>>>>
>>>>>>> On 02.10.2020 10:46, Eric Dumazet wrote:  
>>>>>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:  
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote:  
>>>>>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:  
>>>>>>>>>>>
>>>>>>>>>>> I have a problem with the following code in ndo_start_xmit() of
>>>>>>>>>>> the r8169 driver. A user reported the WARN being triggered due
>>>>>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
>>>>>>>>>>> The driver is widely used, therefore I'd expect much more such
>>>>>>>>>>> reports if it should be a common problem. Not sure what's special.
>>>>>>>>>>> My primary question: Is it a valid use case that gso_size is
>>>>>>>>>>> greater than 0, and no SKB_GSO_ flag is set?
>>>>>>>>>>> Any hint would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>>  
>>>>>>>>>>
>>>>>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
>>>>>>>>>>
>>>>>>>>>> You might add a
>>>>>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
>>>>>>>>>>  
>>>>>>>>  
>>>>>>>>>
>>>>>>>>> Ah, sorry I see you already printed gso_type
>>>>>>>>>
>>>>>>>>> Must then be a bug somewhere :/  
>>>>>>>>
>>>>>>>>
>>>>>>>> napi_reuse_skb() does :
>>>>>>>>
>>>>>>>> skb_shinfo(skb)->gso_type = 0;
>>>>>>>>
>>>>>>>> It does _not_ clear gso_size.
>>>>>>>>
>>>>>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero.
>>>>>>>>
>>>>>>>> Normally, we set it only from dev_gro_receive() when the skb is queued
>>>>>>>> into GRO engine (status being GRO_HELD)
>>>>>>>>  
>>>>>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder
>>>>>>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb().
>>>>>>> The latter one does a
>>>>>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
>>>>>>>  
>>>>>>
>>>>>> memset() over the whole thing is more expensive.
>>>>>>
>>>>>> Here we know the prior state of some fields, while __alloc_skb() just
>>>>>> got a piece of memory with random content.
>>>>>>  
>>>>>>> What I can do is letting the affected user test the following.
>>>>>>>
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index 62b06523b..8e75399cc 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>>>>>>>
>>>>>>>         skb->encapsulation = 0;
>>>>>>>         skb_shinfo(skb)->gso_type = 0;
>>>>>>> +       skb_shinfo(skb)->gso_size = 0;
>>>>>>>         skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>>>>>>>         skb_ext_reset(skb);
>>>>>>>  
>>>>>>
>>>>>> As I hinted, this should not be needed.
>>>>>>
>>>>>> For debugging purposes, I would rather do :
>>>>>>
>>>>>> BUG_ON(skb_shinfo(skb)->gso_size);
>>>>>>  
>>>>>
>>>>> We did the following for debugging:
>>>>>
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index 62b06523b..4c943b774 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>>>>  {
>>>>>         u16 gso_segs = skb_shinfo(skb)->gso_segs;
>>>>>
>>>>> +       if (!skb_shinfo(skb)->gso_type)
>>>>> +               skb_warn_bad_offload(skb);  
>>>>
>>>> You also want to get a stack trace here, to give us the call graph.
>>>>  
>>>
>>> Here it comes, full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423
>>>
>>>
>>> [236222.967498] ------------[ cut here ]------------
>>> [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
>>> [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
>>> [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
>>> [236222.967776]  ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
>>> [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
>>> [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
>>> [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
>>> [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
>>> [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
>>> [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
>>> [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
>>> [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
>>> [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
>>> [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
>>> [236222.968000] FS:  0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
>>> [236222.968013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
>>> [236222.968035] Call Trace:
>>> [236222.968047]  <IRQ>
>>> [236222.968064]  netif_skb_features+0x25e/0x2c0
>>> [236222.968084]  ? ipt_do_table+0x333/0x600 [ip_tables]
>>> [236222.968098]  validate_xmit_skb+0x1d/0x300
>>> [236222.968111]  validate_xmit_skb_list+0x48/0x70
>>> [236222.968126]  sch_direct_xmit+0x129/0x2f0
>>> [236222.968140]  __dev_queue_xmit+0x710/0x8a0
>>> [236222.968184]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
>>> [236222.968200]  ? nf_hook_slow+0x3f/0xb0
>>> [236222.968214]  ip_finish_output2+0x2ad/0x560
>>> [236222.968229]  __netif_receive_skb_core+0x4f0/0xf40
>>> [236222.968244]  ? packet_rcv+0x44/0x490
>>> [236222.968257]  __netif_receive_skb_one_core+0x2d/0x70
>>> [236222.968277]  process_backlog+0x96/0x160
>>> [236222.968290]  net_rx_action+0x13c/0x3e0
>>> [236222.968312]  ? usbnet_bh+0x24/0x2b0 [usbnet]
>>> [236222.968327]  __do_softirq+0xd9/0x2c4
>>> [236222.968340]  asm_call_on_stack+0x12/0x20
>>> [236222.968350]  </IRQ>
>>> [236222.968362]  do_softirq_own_stack+0x39/0x50
>>> [236222.968376]  irq_exit_rcu+0xc2/0x100
>>> [236222.968389]  common_interrupt+0x75/0x140
>>> [236222.968405]  asm_common_interrupt+0x1e/0x40
>>> [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
>>> [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
>>> [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
>>> [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
>>> [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
>>> [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
>>> [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
>>> [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
>>> [236222.968535]  acpi_safe_halt+0x1b/0x30
>>> [236222.968549]  acpi_idle_enter+0x27e/0x2e0
>>> [236222.968566]  cpuidle_enter_state+0x81/0x3f0
>>> [236222.968589]  cpuidle_enter+0x29/0x40
>>> [236222.968602]  do_idle+0x1d5/0x2a0
>>> [236222.968615]  cpu_startup_entry+0x19/0x20
>>> [236222.968628]  start_kernel+0x7f4/0x804
>>> [236222.968645]  secondary_startup_64+0xb6/0xc0
>>> [236222.968659] ---[ end trace 8a4d7f639ad88505 ]---
>>>
>>>  
>>
>> OK, it would be nice to know what the input interface is
>>
>> if4 -> look at "ip link | grep 4:"
>>
>> Then identify the driver that built such a strange packet (32000
>> bytes allocated in skb->head)
>>
>> ethtool -i ifname
>>
>>
>>
>>>>  
>>>>> +
>>>>>         if (gso_segs > dev->gso_max_segs)
>>>>>                 return features & ~NETIF_F_GSO_MASK;
>>>>>
>>>>> The following skb then triggered skb_warn_bad_offload(). Not sure whether this helps
>>>>> to find out where in the network stack something goes wrong.
>>>>>
>>>>>
>>>>> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
>>>>>                 mac=(778,14) net=(792,20) trans=812
>>>>>                 shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
>>>>>                 csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
>>>>>                 hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
>>>>> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
>>>>> [236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
>>>>> [236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
>>>>> [236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
>>>>> [236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
>>>>> [236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
>>>>> [236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
>>>>> [236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
>>>>> [236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
>>>>> [236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
>>>>>
>>>>>
>>>>>  
>>>>>>
>>>>>> Nothing in the GRO stack will change gso_size, unless the packet is queued
>>>>>> by the GRO layer (after this, napi_reuse_skb() won't be called)
>>>>>>
>>>>>> napi_reuse_skb() is only used when a packet has been aggregated to
>>>>>> another, and at this point gso_size should be still 0.
>>>>>>  
>>>>>  
> 
> I seem to have stumbled over the same or a similar issue with a Raspberry Pi
> 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used
> as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I
> get tons of the below warnings after a couple of seconds:
> 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear:   00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear:   00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear:   00000030: 00 00 6a 58 00 00
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in:
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000 
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace:
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599]  skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020]  netif_skb_features+0x218/0x2a0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350]  validate_xmit_skb.isra.0+0x28/0x2c8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125]  validate_xmit_skb_list+0x44/0x98
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631]  sch_direct_xmit+0xf0/0x3a8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599]  __qdisc_run+0x140/0x668
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297]  __dev_queue_xmit+0x59c/0x980
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446]  dev_queue_xmit+0x1c/0x28
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237]  ip_finish_output2+0x30c/0x558
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476]  __ip_finish_output+0xe4/0x260
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715]  ip_finish_output+0x3c/0xd8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683]  ip_output+0xb4/0x148
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116]  ip_forward_finish+0x7c/0xc0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174]  ip_forward+0x42c/0x4f0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783]  ip_rcv_finish+0x98/0xb8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481]  ip_rcv+0xe0/0xf0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552]  __netif_receive_skb_one_core+0x5c/0x88
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597]  __netif_receive_skb+0x20/0x70
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834]  process_backlog+0xc0/0x1d0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802]  net_rx_action+0x134/0x478
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682]  __do_softirq+0x130/0x378
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472]  irq_exit+0xc0/0xe8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725]  __handle_domain_irq+0x70/0xc8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185]  el1_irq+0xb4/0x140
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377]  arch_cpu_idle+0x18/0x28
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981]  default_idle_call+0x44/0x178
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009]  do_idle+0x224/0x270
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147]  cpu_startup_entry+0x30/0x98
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916]  rest_init+0xc8/0xd8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937]  arch_call_rest_init+0x18/0x24
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829]  start_kernel+0x57c/0x5b8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]---
> 
> If I also add the following rule:
>   $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> I get a single warning followed by a TX timeout:
> 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] mac=(5194,14) net=(5208,20) trans=5228
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.546872] dev name=eth0 feat=0x0x0000010000114b09
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.552060] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.560090] skb linear:   00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.568019] skb linear:   00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.575921] skb linear:   00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.583918] skb linear:   00000040: 9f fc
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.588105] ------------[ cut here ]------------
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.607900] Modules linked in:
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.630396] pc : skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.635175] lr : skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.639953] sp : ffff800010003810
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.643374] x29: ffff800010003810 x28: ffff50043b196290 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.648870] x27: ffff500407371600 x26: 0000000000000001 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.670847] x19: ffff500407371600 x18: 0000000000000010 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.676340] x17: 0000000000000000 x16: 0000000000000000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.681833] x15: 000000000000ad55 x14: 0000000000000010 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.703806] x7 : c0000000ffffefff x6 : 0000000000000003 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.709300] x5 : 0000000000000000 x4 : 0000000000000000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.714791] x3 : 0000000000000100 x2 : 0000000000001000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.720283] x1 : 0000000000000000 x0 : 0000000000000000 
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.725778] Call trace:
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.728306]  skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.732728]  netif_skb_features+0x218/0x2a0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.737057]  validate_xmit_skb.isra.0+0x28/0x2c8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.741833]  validate_xmit_skb_list+0x44/0x98
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.746339]  sch_direct_xmit+0xf0/0x3a8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.750309]  __qdisc_run+0x140/0x668
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.754008]  __dev_queue_xmit+0x59c/0x980
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.758156]  dev_queue_xmit+0x1c/0x28
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.761945]  neigh_resolve_output+0x108/0x230
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.766450]  ip_finish_output2+0x180/0x558
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.770690]  __ip_finish_output+0xe4/0x260
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.774928]  ip_finish_output+0x3c/0xd8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.778896]  ip_output+0xb4/0x148
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.782328]  ip_forward_finish+0x7c/0xc0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.786385]  ip_forward+0x42c/0x4f0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.789995]  ip_rcv_finish+0x98/0xb8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.793694]  ip_rcv+0xe0/0xf0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.796765]  __netif_receive_skb_one_core+0x5c/0x88
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.801810]  __netif_receive_skb+0x20/0x70
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.806047]  process_backlog+0xc0/0x1d0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.810016]  net_rx_action+0x134/0x478
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.813897]  __do_softirq+0x130/0x378
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.817686]  irq_exit+0xc0/0xe8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.820940]  __handle_domain_irq+0x70/0xc8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.829099]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.838223]  el1_irq+0xb4/0x140
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.845371]  arch_cpu_idle+0x18/0x28
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.852882]  default_idle_call+0x44/0x178
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.860756]  do_idle+0x224/0x270
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.867794]  cpu_startup_entry+0x30/0x98
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.875516]  rest_init+0xc8/0xd8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.882496]  arch_call_rest_init+0x18/0x24
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.890352]  start_kernel+0x57c/0x5b8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.897706] ---[ end trace a5789410f231a10b ]---
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.046337] ------------[ cut here ]------------
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.077534] Modules linked in:
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W         5.11.0-rc4 #103
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.116085] pc : dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.123857] lr : dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.131558] sp : ffff800010013d90
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.138497] x29: ffff800010013d90 x28: 0000000000000140 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.156489] x25: 0000000000000002 x24: 0000000000000000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.165496] x23: 0000000000000001 x22: ffff50042e96b000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.183490] x19: 0000000000000000 x18: 0000000000000010 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.192493] x17: 0000000000000000 x16: 0000000000000000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.201473] x15: 000000000000ad55 x14: 0000000000000010 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.237362] x7 : c0000000ffffefff x6 : 0000000000000003 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.246353] x5 : 0000000000000000 x4 : 0000000000000000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.255328] x3 : 0000000000000100 x2 : 0000000000001000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.264273] x1 : 0000000000000000 x0 : 0000000000000000 
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.273192] Call trace:
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.279183]  dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.286461]  call_timer_fn+0x38/0x188
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.293762]  run_timer_softirq+0x494/0x688
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.301489]  __do_softirq+0x130/0x378
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.308767]  irq_exit+0xc0/0xe8
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.315500]  __handle_domain_irq+0x70/0xc8
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.323214]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.331940]  el1_irq+0xb4/0x140
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.338706]  arch_cpu_idle+0x18/0x28
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.345916]  default_idle_call+0x44/0x178
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.353577]  do_idle+0x224/0x270
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.360433]  cpu_startup_entry+0x2c/0x98
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.368000]  secondary_start_kernel+0x148/0x180
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.376199] ---[ end trace a5789410f231a10c ]---
> 
> I did some bisecting and found commit [2] to be problematic. Reverting that
> commit plus the two follow-on fixes [3] and [4] prevents the warnings and
> timeout. I'm no networking expert so can't determine if [2] is broken or
> merely exposes a different underlying issue. I failed to reproduce the problem
> using a dedicated Realtek-based USB NIC plugged into the Pi, which points
> towards the lan78xx driver/HW being the culprit.
> 
> Enabling KASAN didn't trigger any error reports.
> 
> Let me know if there's anything else I can try to narrow this down.
> 
> ...Juerg
> 
> [1]
> On the Pi, I run:
>   $ nc -l 1234 | dd status=progress >/dev/null
> 
> And on another machine, that is configured to use the Pi as the gateway:
>   $ nc 192.168.99.115 1234 < /dev/urandom
> and a couple of firefox instances that keep opening public URLs.
> 
> [2]
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Nov 27 14:42:03 2018 -0800
> 
>     tcp: implement coalescing on backlog queue
>     
>     In case GRO is not as efficient as it should be or disabled,
>     we might have a user thread trapped in __release_sock() while
>     softirq handler flood packets up to the point we have to drop.
>     
>     This patch balances work done from user thread and softirq,
>     to give more chances to __release_sock() to complete its work
>     before new packets are added to the backlog.
>     
>     This also helps if we receive many ACK packets, since GRO
>     does not aggregate them.
>     
>     This patch brings ~60% throughput increase on a receiver
>     without GRO, but the spectacular gain is really on
>     1000x release_sock() latency reduction I have measured.
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Neal Cardwell <ncardwell@google.com>
>     Cc: Yuchung Cheng <ycheng@google.com>
>     Acked-by: Neal Cardwell <ncardwell@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> [3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
> [4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()
> 

In tcp_add_backlog() we have the following code, which looks like it could
be related to the problem: gso_size is filled in when it is zero, but
gso_type is never set. I'm not sure whether this is a bug or intentional
(because gso_type is expected to be set already, or is supposed to be set
somewhere else). Maybe Eric can comment on this.


	if (!shinfo->gso_size)
		shinfo->gso_size = skb->len - hdrlen;

	if (!shinfo->gso_segs)
		shinfo->gso_segs = 1;



* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 12:40                     ` Juerg Haefliger
  2021-01-19 13:47                       ` Heiner Kallweit
@ 2021-01-19 13:54                       ` Eric Dumazet
  2021-01-19 15:38                         ` Juerg Haefliger
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2021-01-19 13:54 UTC
  To: Juerg Haefliger
  Cc: Heiner Kallweit, Eric Dumazet, netdev,
	Microchip Linux Driver Support, Woojung Huh

On Tue, Jan 19, 2021 at 1:40 PM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:

>
> I seem to have stumbled over the same or a similar issue with a Raspberry Pi
> 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used
> as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I
> get tons of the below warnings after a couple of seconds:
>
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear:   00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear:   00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear:   00000030: 00 00 6a 58 00 00
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in:
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace:
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599]  skb_warn_bad_offload+0x84/0x100
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020]  netif_skb_features+0x218/0x2a0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350]  validate_xmit_skb.isra.0+0x28/0x2c8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125]  validate_xmit_skb_list+0x44/0x98
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631]  sch_direct_xmit+0xf0/0x3a8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599]  __qdisc_run+0x140/0x668
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297]  __dev_queue_xmit+0x59c/0x980
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446]  dev_queue_xmit+0x1c/0x28
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237]  ip_finish_output2+0x30c/0x558
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476]  __ip_finish_output+0xe4/0x260
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715]  ip_finish_output+0x3c/0xd8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683]  ip_output+0xb4/0x148
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116]  ip_forward_finish+0x7c/0xc0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174]  ip_forward+0x42c/0x4f0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783]  ip_rcv_finish+0x98/0xb8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481]  ip_rcv+0xe0/0xf0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552]  __netif_receive_skb_one_core+0x5c/0x88
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597]  __netif_receive_skb+0x20/0x70
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834]  process_backlog+0xc0/0x1d0
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802]  net_rx_action+0x134/0x478
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682]  __do_softirq+0x130/0x378
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472]  irq_exit+0xc0/0xe8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725]  __handle_domain_irq+0x70/0xc8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185]  el1_irq+0xb4/0x140
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377]  arch_cpu_idle+0x18/0x28
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981]  default_idle_call+0x44/0x178
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009]  do_idle+0x224/0x270
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147]  cpu_startup_entry+0x30/0x98
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916]  rest_init+0xc8/0xd8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937]  arch_call_rest_init+0x18/0x24
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829]  start_kernel+0x57c/0x5b8
> Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]---
>
> If I also add the following rule:
>   $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> I get a single warning followed by a TX timeout:
>
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] mac=(5194,14) net=(5208,20) trans=5228
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.546872] dev name=eth0 feat=0x0x0000010000114b09
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.552060] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.560090] skb linear:   00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.568019] skb linear:   00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.575921] skb linear:   00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.583918] skb linear:   00000040: 9f fc
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.588105] ------------[ cut here ]------------
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.607900] Modules linked in:
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.630396] pc : skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.635175] lr : skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.639953] sp : ffff800010003810
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.643374] x29: ffff800010003810 x28: ffff50043b196290
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.648870] x27: ffff500407371600 x26: 0000000000000001
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.670847] x19: ffff500407371600 x18: 0000000000000010
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.676340] x17: 0000000000000000 x16: 0000000000000000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.681833] x15: 000000000000ad55 x14: 0000000000000010
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.703806] x7 : c0000000ffffefff x6 : 0000000000000003
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.709300] x5 : 0000000000000000 x4 : 0000000000000000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.714791] x3 : 0000000000000100 x2 : 0000000000001000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.720283] x1 : 0000000000000000 x0 : 0000000000000000
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.725778] Call trace:
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.728306]  skb_warn_bad_offload+0x84/0x100
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.732728]  netif_skb_features+0x218/0x2a0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.737057]  validate_xmit_skb.isra.0+0x28/0x2c8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.741833]  validate_xmit_skb_list+0x44/0x98
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.746339]  sch_direct_xmit+0xf0/0x3a8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.750309]  __qdisc_run+0x140/0x668
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.754008]  __dev_queue_xmit+0x59c/0x980
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.758156]  dev_queue_xmit+0x1c/0x28
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.761945]  neigh_resolve_output+0x108/0x230
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.766450]  ip_finish_output2+0x180/0x558
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.770690]  __ip_finish_output+0xe4/0x260
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.774928]  ip_finish_output+0x3c/0xd8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.778896]  ip_output+0xb4/0x148
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.782328]  ip_forward_finish+0x7c/0xc0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.786385]  ip_forward+0x42c/0x4f0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.789995]  ip_rcv_finish+0x98/0xb8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.793694]  ip_rcv+0xe0/0xf0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.796765]  __netif_receive_skb_one_core+0x5c/0x88
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.801810]  __netif_receive_skb+0x20/0x70
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.806047]  process_backlog+0xc0/0x1d0
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.810016]  net_rx_action+0x134/0x478
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.813897]  __do_softirq+0x130/0x378
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.817686]  irq_exit+0xc0/0xe8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.820940]  __handle_domain_irq+0x70/0xc8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.829099]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.838223]  el1_irq+0xb4/0x140
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.845371]  arch_cpu_idle+0x18/0x28
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.852882]  default_idle_call+0x44/0x178
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.860756]  do_idle+0x224/0x270
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.867794]  cpu_startup_entry+0x30/0x98
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.875516]  rest_init+0xc8/0xd8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.882496]  arch_call_rest_init+0x18/0x24
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.890352]  start_kernel+0x57c/0x5b8
> Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.897706] ---[ end trace a5789410f231a10b ]---
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.046337] ------------[ cut here ]------------
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.077534] Modules linked in:
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W         5.11.0-rc4 #103
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.116085] pc : dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.123857] lr : dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.131558] sp : ffff800010013d90
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.138497] x29: ffff800010013d90 x28: 0000000000000140
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.156489] x25: 0000000000000002 x24: 0000000000000000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.165496] x23: 0000000000000001 x22: ffff50042e96b000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.183490] x19: 0000000000000000 x18: 0000000000000010
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.192493] x17: 0000000000000000 x16: 0000000000000000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.201473] x15: 000000000000ad55 x14: 0000000000000010
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.237362] x7 : c0000000ffffefff x6 : 0000000000000003
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.246353] x5 : 0000000000000000 x4 : 0000000000000000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.255328] x3 : 0000000000000100 x2 : 0000000000001000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.264273] x1 : 0000000000000000 x0 : 0000000000000000
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.273192] Call trace:
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.279183]  dev_watchdog+0x384/0x390
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.286461]  call_timer_fn+0x38/0x188
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.293762]  run_timer_softirq+0x494/0x688
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.301489]  __do_softirq+0x130/0x378
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.308767]  irq_exit+0xc0/0xe8
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.315500]  __handle_domain_irq+0x70/0xc8
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.323214]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.331940]  el1_irq+0xb4/0x140
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.338706]  arch_cpu_idle+0x18/0x28
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.345916]  default_idle_call+0x44/0x178
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.353577]  do_idle+0x224/0x270
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.360433]  cpu_startup_entry+0x2c/0x98
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.368000]  secondary_start_kernel+0x148/0x180
> Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.376199] ---[ end trace a5789410f231a10c ]---
>
> I did some bisecting and found commit [2] to be problematic. Reverting that
> commit plus the two follow-on fixes [3] and [4] prevents the warnings and
> timeout. I'm no networking expert, so I can't determine whether [2] is broken or
> merely exposes a different underlying issue. I failed to reproduce the problem
> using a dedicated Realtek-based USB NIC plugged into the Pi, which points
> towards the lan78xx driver/HW being the culprit.
>
> Enabling KASAN didn't trigger any error reports.
>
> Let me know if there's anything else I can try to narrow this down.
>
> ...Juerg
>
> [1]
> On the Pi, I run:
>   $ nc -l 1234 | dd status=progress >/dev/null
>
> And on another machine, that is configured to use the Pi as the gateway:
>   $ nc 192.168.99.115 1234 < /dev/urandom
> and a couple of firefox instances that keep opening public URLs.
>
> [2]
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Nov 27 14:42:03 2018 -0800
>
>     tcp: implement coalescing on backlog queue
>
>     In case GRO is not as efficient as it should be or disabled,
>     we might have a user thread trapped in __release_sock() while
>     the softirq handler floods packets up to the point we have to drop.
>
>     This patch balances work done from user thread and softirq,
>     to give more chances to __release_sock() to complete its work
>     before new packets are added to the backlog.
>
>     This also helps if we receive many ACK packets, since GRO
>     does not aggregate them.
>
>     This patch brings ~60% throughput increase on a receiver
>     without GRO, but the spectacular gain is really on
>     1000x release_sock() latency reduction I have measured.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Neal Cardwell <ncardwell@google.com>
>     Cc: Yuchung Cheng <ycheng@google.com>
>     Acked-by: Neal Cardwell <ncardwell@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> [3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
> [4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()


Oops. Very nice detective work :)

It is true that the skb_clone() done in lan78xx (and some other USB
drivers) is probably triggering this issue.
(lan78xx is also lying about skb->truesize)

skb_try_coalesce() bails if the target skb is cloned, but not if the source is.
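A minimal userspace model of the failure described above, assuming (as with skb_clone()) that all clones of one USB receive buffer share a single shared-info block, because skb_shinfo() lives at the end of the shared data buffer. The names model_shinfo, model_skb, model_clone(), old_backlog_prep() and model_can_coalesce() are hypothetical, and the coalescing check is reduced to the one asymmetry mentioned above; this is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct skb_shared_info (only the GSO fields). */
struct model_shinfo {
	unsigned int gso_size;
	unsigned int gso_segs;
	unsigned int gso_type;
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
	struct model_shinfo *shinfo; /* clones point at the same block */
	unsigned int len;
	bool cloned;
};

/* Like skb_clone(): a new header sharing the data buffer, and therefore
 * sharing the shared-info block stored at the end of that buffer. */
static struct model_skb model_clone(struct model_skb *orig)
{
	struct model_skb c = *orig;

	orig->cloned = true;
	c.cloned = true;
	return c;
}

/* Pre-patch tcp_add_backlog() prologue: defaults are written into the
 * incoming skb's shared info without any skb_cloned() check. */
static void old_backlog_prep(struct model_skb *skb, unsigned int hdrlen)
{
	assert(skb->len >= hdrlen);
	if (!skb->shinfo->gso_size)
		skb->shinfo->gso_size = skb->len - hdrlen;
	if (!skb->shinfo->gso_segs)
		skb->shinfo->gso_segs = 1;
}

/* Simplified asymmetry: only the target ("to") is rejected when cloned. */
static bool model_can_coalesce(const struct model_skb *to,
			       const struct model_skb *from)
{
	(void)from; /* a cloned source is not rejected here */
	return !to->cloned;
}
```

Cloning one buffer twice and running old_backlog_prep() on a 1480-byte segment with a 32-byte TCP header (timestamps assumed) leaves the sibling clone with gso_size=1448, gso_type=0, segs=1, the same shape as the forwarded skb in the dumps.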


Can you try the following patch?
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 58207c7769d05693b650e3c93e4ef405a5d4b23a..4e82745d336fc3fb0d9ce8c92aaeb39702f64b8a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1760,6 +1760,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
        u32 limit = READ_ONCE(sk->sk_rcvbuf) + READ_ONCE(sk->sk_sndbuf);
+       u32 tail_gso_size, tail_gso_segs;
        struct skb_shared_info *shinfo;
        const struct tcphdr *th;
        struct tcphdr *thtail;
@@ -1767,6 +1768,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
        unsigned int hdrlen;
        bool fragstolen;
        u32 gso_segs;
+       u32 gso_size;
        int delta;

        /* In case all data was pulled from skb frags (in __pskb_pull_tail()),
@@ -1792,13 +1794,6 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
         */
        th = (const struct tcphdr *)skb->data;
        hdrlen = th->doff * 4;
-       shinfo = skb_shinfo(skb);
-
-       if (!shinfo->gso_size)
-               shinfo->gso_size = skb->len - hdrlen;
-
-       if (!shinfo->gso_segs)
-               shinfo->gso_segs = 1;

        tail = sk->sk_backlog.tail;
        if (!tail)
@@ -1821,6 +1816,15 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
                goto no_coalesce;

        __skb_pull(skb, hdrlen);
+
+       shinfo = skb_shinfo(skb);
+       gso_size = shinfo->gso_size ?: skb->len;
+       gso_segs = shinfo->gso_segs ?: 1;
+
+       shinfo = skb_shinfo(tail);
+       tail_gso_size = shinfo->gso_size ?: (tail->len - hdrlen);
+       tail_gso_segs = shinfo->gso_segs ?: 1;
+
        if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
                TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;

@@ -1847,11 +1851,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
                }

                /* Not as strict as GRO. We only need to carry mss max value */
-               skb_shinfo(tail)->gso_size = max(shinfo->gso_size,
-                                                skb_shinfo(tail)->gso_size);
-
-               gso_segs = skb_shinfo(tail)->gso_segs + shinfo->gso_segs;
-               skb_shinfo(tail)->gso_segs = min_t(u32, gso_segs, 0xFFFF);
+               shinfo->gso_size = max(gso_size, tail_gso_size);
+               shinfo->gso_segs = min_t(u32, gso_segs + tail_gso_segs, 0xFFFF);

                sk->sk_backlog.len += delta;
                __NET_INC_STATS(sock_net(sk),

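The accounting that the patch moves into local variables can be sketched in userspace as follows. This is a hypothetical model (gso_acct, merge_gso() and the min/max helpers are not kernel names); the GNU "a ?: b" shorthand from the patch is spelled out as a plain ternary, hdrlen is the incoming skb's TCP header length (the tail is assumed to have the same doff), and the 0xFFFF cap reflects that gso_segs is a 16-bit field in struct skb_shared_info.

```c
#include <assert.h>
#include <stdint.h>

/* Values the patch ends up storing in the tail's shared info. */
struct gso_acct {
	uint32_t gso_size;
	uint32_t gso_segs;
};

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* The incoming skb has already had its TCP header pulled, so its default
 * gso_size is its whole length; the tail still carries its header, so its
 * default is tail_len - hdrlen. Nothing is written back to the incoming
 * skb, whose shared info may be shared with other clones. */
static struct gso_acct merge_gso(uint32_t skb_gso_size, uint32_t skb_gso_segs,
				 uint32_t skb_len_after_pull,
				 uint32_t tail_gso_size, uint32_t tail_gso_segs,
				 uint32_t tail_len, uint32_t hdrlen)
{
	uint32_t size = skb_gso_size ? skb_gso_size : skb_len_after_pull;
	uint32_t segs = skb_gso_segs ? skb_gso_segs : 1;
	uint32_t tsize = tail_gso_size ? tail_gso_size : tail_len - hdrlen;
	uint32_t tsegs = tail_gso_segs ? tail_gso_segs : 1;
	struct gso_acct out;

	/* Not as strict as GRO: only the max mss is carried. */
	out.gso_size = max_u32(size, tsize);
	/* gso_segs is 16 bits wide, so the sum is capped. */
	out.gso_segs = min_u32(segs + tsegs, 0xFFFF);
	return out;
}
```

For two non-GSO segments carrying 1448 bytes each, merge_gso(0, 0, 1448, 0, 0, 1480, 32) gives gso_size 1448 and gso_segs 2, and only the tail's shared info would be updated with those values.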

* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 13:47                       ` Heiner Kallweit
@ 2021-01-19 13:58                         ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2021-01-19 13:58 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Juerg Haefliger, Eric Dumazet, netdev,
	Microchip Linux Driver Support, Woojung Huh

On Tue, Jan 19, 2021 at 2:47 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
>
> In tcp_add_backlog() we have the following, which looks like it could
> be related to the problem. gso_type doesn't get set; I'm not sure, however,
> whether this is a bug or intentional (because we expect gso_type
> to be set already, or because it's supposed to be set somewhere else).
> Maybe Eric can comment on this.
>
>
>         if (!shinfo->gso_size)
>                 shinfo->gso_size = skb->len - hdrlen;
>
>         if (!shinfo->gso_segs)
>                 shinfo->gso_segs = 1;
>

Yes, at this point TCP is supposed to own the skb, which is partially true.

See the skb_cloned() checks in places like skb_try_coalesce().

I think that calling skb_unclone() would be terribly expensive for all
these USB drivers that build fake skbs (all clones of one giant skb),
because their very big headroom would be copied by the generic
pskb_expand_head().
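To put a rough number on the cost argument, here is a back-of-the-envelope calculation using the geometry from the skb dumps in Juerg's report (len=54 headroom=5194 headlen=54 tailroom=10816, nr_frags=0). It assumes that skb_unclone() on a cloned skb calls pskb_expand_head(skb, 0, 0, ...), which allocates a data area the size of the old one and copies everything from skb->head up to the tail pointer; allocator rounding and the skb_shared_info trailer are ignored. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linear skb geometry as printed by the skb dump. */
struct skb_geom {
	size_t headroom;
	size_t headlen;
	size_t tailroom;
};

/* Bytes copied when the private copy is made: head .. tail. */
static size_t unclone_copy_bytes(struct skb_geom g)
{
	return g.headroom + g.headlen;
}

/* Data area allocated for the copy (old head .. end, nhead = ntail = 0),
 * excluding the skb_shared_info trailer and allocator rounding. */
static size_t unclone_alloc_bytes(struct skb_geom g)
{
	return g.headroom + g.headlen + g.tailroom;
}
```

For the 54-byte frame this comes to 5248 bytes copied and a 16064-byte data area allocated, close to 100 times the packet size, for every cloned skb received.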


* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 13:54                       ` Eric Dumazet
@ 2021-01-19 15:38                         ` Juerg Haefliger
  2021-01-19 15:50                           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Juerg Haefliger @ 2021-01-19 15:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Juerg Haefliger, Heiner Kallweit, Eric Dumazet, netdev,
	Microchip Linux Driver Support, Woojung Huh

[-- Attachment #1: Type: text/plain, Size: 25217 bytes --]

On Tue, 19 Jan 2021 14:54:31 +0100
Eric Dumazet <edumazet@google.com> wrote:

> On Tue, Jan 19, 2021 at 1:40 PM Juerg Haefliger
> <juerg.haefliger@canonical.com> wrote:
> 
> >
> > I seem to have stumbled over the same or a similar issue with a Raspberry Pi
> > 3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used
> > as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I
> > get tons of the below warnings after a couple of seconds:
> >
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0)
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear:   00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear:   00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear:   00000030: 00 00 6a 58 00 00
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in:
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace:
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599]  skb_warn_bad_offload+0x84/0x100
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020]  netif_skb_features+0x218/0x2a0
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350]  validate_xmit_skb.isra.0+0x28/0x2c8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125]  validate_xmit_skb_list+0x44/0x98
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631]  sch_direct_xmit+0xf0/0x3a8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599]  __qdisc_run+0x140/0x668
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297]  __dev_queue_xmit+0x59c/0x980
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446]  dev_queue_xmit+0x1c/0x28
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237]  ip_finish_output2+0x30c/0x558
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476]  __ip_finish_output+0xe4/0x260
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715]  ip_finish_output+0x3c/0xd8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683]  ip_output+0xb4/0x148
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116]  ip_forward_finish+0x7c/0xc0
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174]  ip_forward+0x42c/0x4f0
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783]  ip_rcv_finish+0x98/0xb8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481]  ip_rcv+0xe0/0xf0
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552]  __netif_receive_skb_one_core+0x5c/0x88
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597]  __netif_receive_skb+0x20/0x70
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834]  process_backlog+0xc0/0x1d0
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802]  net_rx_action+0x134/0x478
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682]  __do_softirq+0x130/0x378
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472]  irq_exit+0xc0/0xe8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725]  __handle_domain_irq+0x70/0xc8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185]  el1_irq+0xb4/0x140
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377]  arch_cpu_idle+0x18/0x28
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981]  default_idle_call+0x44/0x178
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009]  do_idle+0x224/0x270
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147]  cpu_startup_entry+0x30/0x98
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916]  rest_init+0xc8/0xd8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937]  arch_call_rest_init+0x18/0x24
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829]  start_kernel+0x57c/0x5b8
> > Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]---
> >
> > If I also add the following rule:
> >   $ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> > I get a single warning followed by a TX timeout:
> >
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] mac=(5194,14) net=(5208,20) trans=5228
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0)
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.546872] dev name=eth0 feat=0x0x0000010000114b09
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.552060] skb linear:   00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.560090] skb linear:   00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.568019] skb linear:   00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.575921] skb linear:   00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.583918] skb linear:   00000040: 9f fc
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.588105] ------------[ cut here ]------------
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.607900] Modules linked in:
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.630396] pc : skb_warn_bad_offload+0x84/0x100
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.635175] lr : skb_warn_bad_offload+0x84/0x100
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.639953] sp : ffff800010003810
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.643374] x29: ffff800010003810 x28: ffff50043b196290
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.648870] x27: ffff500407371600 x26: 0000000000000001
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.670847] x19: ffff500407371600 x18: 0000000000000010
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.676340] x17: 0000000000000000 x16: 0000000000000000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.681833] x15: 000000000000ad55 x14: 0000000000000010
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.703806] x7 : c0000000ffffefff x6 : 0000000000000003
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.709300] x5 : 0000000000000000 x4 : 0000000000000000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.714791] x3 : 0000000000000100 x2 : 0000000000001000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.720283] x1 : 0000000000000000 x0 : 0000000000000000
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.725778] Call trace:
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.728306]  skb_warn_bad_offload+0x84/0x100
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.732728]  netif_skb_features+0x218/0x2a0
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.737057]  validate_xmit_skb.isra.0+0x28/0x2c8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.741833]  validate_xmit_skb_list+0x44/0x98
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.746339]  sch_direct_xmit+0xf0/0x3a8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.750309]  __qdisc_run+0x140/0x668
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.754008]  __dev_queue_xmit+0x59c/0x980
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.758156]  dev_queue_xmit+0x1c/0x28
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.761945]  neigh_resolve_output+0x108/0x230
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.766450]  ip_finish_output2+0x180/0x558
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.770690]  __ip_finish_output+0xe4/0x260
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.774928]  ip_finish_output+0x3c/0xd8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.778896]  ip_output+0xb4/0x148
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.782328]  ip_forward_finish+0x7c/0xc0
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.786385]  ip_forward+0x42c/0x4f0
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.789995]  ip_rcv_finish+0x98/0xb8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.793694]  ip_rcv+0xe0/0xf0
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.796765]  __netif_receive_skb_one_core+0x5c/0x88
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.801810]  __netif_receive_skb+0x20/0x70
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.806047]  process_backlog+0xc0/0x1d0
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.810016]  net_rx_action+0x134/0x478
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.813897]  __do_softirq+0x130/0x378
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.817686]  irq_exit+0xc0/0xe8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.820940]  __handle_domain_irq+0x70/0xc8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.829099]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.838223]  el1_irq+0xb4/0x140
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.845371]  arch_cpu_idle+0x18/0x28
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.852882]  default_idle_call+0x44/0x178
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.860756]  do_idle+0x224/0x270
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.867794]  cpu_startup_entry+0x30/0x98
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.875516]  rest_init+0xc8/0xd8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.882496]  arch_call_rest_init+0x18/0x24
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.890352]  start_kernel+0x57c/0x5b8
> > Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [   81.897706] ---[ end trace a5789410f231a10b ]---
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.046337] ------------[ cut here ]------------
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.077534] Modules linked in:
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W         5.11.0-rc4 #103
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.116085] pc : dev_watchdog+0x384/0x390
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.123857] lr : dev_watchdog+0x384/0x390
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.131558] sp : ffff800010013d90
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.138497] x29: ffff800010013d90 x28: 0000000000000140
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.156489] x25: 0000000000000002 x24: 0000000000000000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.165496] x23: 0000000000000001 x22: ffff50042e96b000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.183490] x19: 0000000000000000 x18: 0000000000000010
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.192493] x17: 0000000000000000 x16: 0000000000000000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.201473] x15: 000000000000ad55 x14: 0000000000000010
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.237362] x7 : c0000000ffffefff x6 : 0000000000000003
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.246353] x5 : 0000000000000000 x4 : 0000000000000000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.255328] x3 : 0000000000000100 x2 : 0000000000001000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.264273] x1 : 0000000000000000 x0 : 0000000000000000
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.273192] Call trace:
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.279183]  dev_watchdog+0x384/0x390
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.286461]  call_timer_fn+0x38/0x188
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.293762]  run_timer_softirq+0x494/0x688
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.301489]  __do_softirq+0x130/0x378
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.308767]  irq_exit+0xc0/0xe8
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.315500]  __handle_domain_irq+0x70/0xc8
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.323214]  bcm2836_arm_irqchip_handle_irq+0x6c/0x80
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.331940]  el1_irq+0xb4/0x140
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.338706]  arch_cpu_idle+0x18/0x28
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.345916]  default_idle_call+0x44/0x178
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.353577]  do_idle+0x224/0x270
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.360433]  cpu_startup_entry+0x2c/0x98
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.368000]  secondary_start_kernel+0x148/0x180
> > Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [  110.376199] ---[ end trace a5789410f231a10c ]---
> >
> > I did some bisecting and found commit [2] to be problematic. Reverting that
> > commit plus the two follow-on fixes [3] and [4] prevents the warnings and
> > timeout. I'm no networking expert so can't determine if [2] is broken or
> > merely exposes a different underlying issue. I failed to reproduce the problem
> > using a dedicated Realtek-based USB NIC plugged into the Pi, which points
> > towards the lan78xx driver/HW being the culprit.
> >
> > Enabling KASAN didn't trigger any error reports.
> >
> > Let me know if there's anything else I can try to narrow this down.
> >
> > ...Juerg
> >
> > [1]
> > On the Pi, I run:
> >   $ nc -l 1234 | dd status=progress >/dev/null
> >
> > And on another machine, that is configured to use the Pi as the gateway:
> >   $ nc 192.168.99.115 1234 < /dev/urandom
> > and a couple of Firefox instances that keep opening public URLs.
> >
> > [2]
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Tue Nov 27 14:42:03 2018 -0800
> >
> >     tcp: implement coalescing on backlog queue
> >
> >     In case GRO is not as efficient as it should be or disabled,
> >     we might have a user thread trapped in __release_sock() while
> >     softirq handler floods packets up to the point we have to drop.
> >
> >     This patch balances work done from user thread and softirq,
> >     to give more chances to __release_sock() to complete its work
> >     before new packets are added to the backlog.
> >
> >     This also helps if we receive many ACK packets, since GRO
> >     does not aggregate them.
> >
> >     This patch brings ~60% throughput increase on a receiver
> >     without GRO, but the spectacular gain is really on
> >     1000x release_sock() latency reduction I have measured.
> >
> >     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >     Cc: Neal Cardwell <ncardwell@google.com>
> >     Cc: Yuchung Cheng <ycheng@google.com>
> >     Acked-by: Neal Cardwell <ncardwell@google.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > [3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
> > [4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()  
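A rough sketch of the idea behind commit [2], for readers who do not know tcp_add_backlog(): an incoming segment is appended to the tail skb of the socket backlog when it continues the tail's sequence space, instead of being queued as a separate skb. This is a hypothetical, heavily simplified userspace model, not the kernel code. struct seg, can_coalesce(), coalesce() and the SEG_* flags are invented for illustration. The real function also compares header length, TCP options and other fields, only advances ack_seq when it moves forward, and moves the payload itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified segment descriptor (not struct sk_buff). */
struct seg {
	uint32_t seq;		/* first sequence number of the payload */
	uint32_t end_seq;	/* seq + payload length */
	uint32_t ack_seq;
	uint8_t flags;		/* hypothetical subset of TCP flags */
};

#define SEG_SYN 0x01
#define SEG_RST 0x02
#define SEG_URG 0x04

/* In sequence, and no flag that must stay on its own segment. */
static bool can_coalesce(const struct seg *tail, const struct seg *skb)
{
	if (tail->end_seq != skb->seq)
		return false;
	if ((tail->flags | skb->flags) & (SEG_SYN | SEG_RST | SEG_URG))
		return false;
	return true;
}

/* Extend the backlog tail to cover skb (payload copy omitted). */
static void coalesce(struct seg *tail, const struct seg *skb)
{
	assert(can_coalesce(tail, skb));
	tail->end_seq = skb->end_seq;
	tail->ack_seq = skb->ack_seq;
}
```

Fewer, larger skbs in the backlog is what shortens the time a user thread spends in __release_sock(), per the commit message.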
> 
> 
> Oops. Very nice detective work :)
> 
> It is true that the skb_clone() done in lan78xx (and some other usb
> drivers) is probably triggering this issue.
> (lan78xx is also lying about skb->truesize)
> 
> skb_try_coalesce() bails if the target  skb is cloned, but not if the source is.
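One plausible reading of the clone problem, modeled in userspace (this is not a confirmed root-cause analysis). skb_clone()'d buffers share one skb_shinfo(). If the pre-fix tcp_add_backlog() writes a default gso_size through a locally delivered clone, a sibling clone that is being forwarded ends up with gso_size != 0 and gso_type == 0. That appears to be the state the TX-side warnings in this thread report. struct shared_info, struct pkt and the helper names are hypothetical; only the field names mirror skb_shared_info. len counts from the TCP header, as in tcp_add_backlog(), where skb->data points at the TCP header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of skb_shared_info: cloned packets share one copy. */
struct shared_info {
	uint16_t gso_size;
	uint16_t gso_segs;
	uint32_t gso_type;	/* 0: no SKB_GSO_* type set */
};

/* Hypothetical packet: len counts from the TCP header onward. */
struct pkt {
	struct shared_info *shinfo;	/* shared with every clone */
	uint32_t len;
};

/* Pre-fix pattern (sketch): fill in defaults by writing shared state. */
static void prep_by_writing(struct pkt *p, uint32_t hdrlen)
{
	assert(p->len >= hdrlen);
	if (!p->shinfo->gso_size)
		p->shinfo->gso_size = (uint16_t)(p->len - hdrlen);
	if (!p->shinfo->gso_segs)
		p->shinfo->gso_segs = 1;
}

/* Post-fix pattern (sketch): derive the default without any write. */
static uint32_t effective_gso_size(const struct pkt *p, uint32_t hdrlen)
{
	return p->shinfo->gso_size ? p->shinfo->gso_size : p->len - hdrlen;
}

/* The combination the TX-side warnings object to. */
static int gso_inconsistent(const struct shared_info *si)
{
	return si->gso_size != 0 && si->gso_type == 0;
}
```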
> 
> 
> Can you try the following patch ?

Works. Nice :-)

If you submit this and care you can add:

Tested-by: Juerg Haefliger <juergh@canonical.com>

Thanks a lot for the quick turnaround!

...Juerg


> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 58207c7769d05693b650e3c93e4ef405a5d4b23a..4e82745d336fc3fb0d9ce8c92aaeb39702f64b8a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1760,6 +1760,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
>  bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
>         u32 limit = READ_ONCE(sk->sk_rcvbuf) + READ_ONCE(sk->sk_sndbuf);
> +       u32 tail_gso_size, tail_gso_segs;
>         struct skb_shared_info *shinfo;
>         const struct tcphdr *th;
>         struct tcphdr *thtail;
> @@ -1767,6 +1768,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>         unsigned int hdrlen;
>         bool fragstolen;
>         u32 gso_segs;
> +       u32 gso_size;
>         int delta;
> 
>         /* In case all data was pulled from skb frags (in __pskb_pull_tail()),
> @@ -1792,13 +1794,6 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>          */
>         th = (const struct tcphdr *)skb->data;
>         hdrlen = th->doff * 4;
> -       shinfo = skb_shinfo(skb);
> -
> -       if (!shinfo->gso_size)
> -               shinfo->gso_size = skb->len - hdrlen;
> -
> -       if (!shinfo->gso_segs)
> -               shinfo->gso_segs = 1;
> 
>         tail = sk->sk_backlog.tail;
>         if (!tail)
> @@ -1821,6 +1816,15 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>                 goto no_coalesce;
> 
>         __skb_pull(skb, hdrlen);
> +
> +       shinfo = skb_shinfo(skb);
> +       gso_size = shinfo->gso_size ?: skb->len;
> +       gso_segs = shinfo->gso_segs ?: 1;
> +
> +       shinfo = skb_shinfo(tail);
> +       tail_gso_size = shinfo->gso_size ?: (tail->len - hdrlen);
> +       tail_gso_segs = shinfo->gso_segs ?: 1;
> +
>         if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
>                 TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;
> 
> @@ -1847,11 +1851,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>                 }
> 
>                 /* Not as strict as GRO. We only need to carry mss max value */
> -               skb_shinfo(tail)->gso_size = max(shinfo->gso_size,
> -                                                skb_shinfo(tail)->gso_size);
> -
> -               gso_segs = skb_shinfo(tail)->gso_segs + shinfo->gso_segs;
> -               skb_shinfo(tail)->gso_segs = min_t(u32, gso_segs, 0xFFFF);
> +               shinfo->gso_size = max(gso_size, tail_gso_size);
> +               shinfo->gso_segs = min_t(u32, gso_segs + tail_gso_segs, 0xFFFF);
> 
>                 sk->sk_backlog.len += delta;
>                 __NET_INC_STATS(sock_net(sk),
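The arithmetic of the patched merge step, modeled in userspace. Defaults are computed into locals, with the GNU "a ?: b" extension spelled out as a ternary, and only the tail's fields are written. Eric notes that skb_try_coalesce() refuses a cloned target, so writing the tail is safe, while the source skb may share its shinfo with a clone. merge_gso(), or_default() and struct gso_fields are hypothetical. The 0xFFFF cap mirrors the min_t() in the patch and the 16-bit gso_segs field of skb_shared_info.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 16-bit GSO fields of skb_shared_info. */
struct gso_fields {
	uint16_t gso_size;
	uint16_t gso_segs;
};

/* Equivalent of the GNU extension "v ?: dflt" for a side-effect-free v. */
static uint32_t or_default(uint32_t v, uint32_t dflt)
{
	return v ? v : dflt;
}

/*
 * Sketch of the fixed merge step after a successful coalesce.
 * skb_len:  skb->len after __skb_pull(skb, hdrlen), i.e. payload only
 * tail_len: tail->len, which still includes the TCP header
 * Only *tail is written; *skb may share its fields with a clone.
 */
static void merge_gso(struct gso_fields *tail, uint32_t tail_len,
		      const struct gso_fields *skb, uint32_t skb_len,
		      uint32_t hdrlen)
{
	assert(tail_len >= hdrlen);
	uint32_t gso_size = or_default(skb->gso_size, skb_len);
	uint32_t gso_segs = or_default(skb->gso_segs, 1);
	uint32_t tail_gso_size = or_default(tail->gso_size, tail_len - hdrlen);
	uint32_t tail_gso_segs = or_default(tail->gso_segs, 1);
	uint32_t segs = gso_segs + tail_gso_segs;

	/* Not as strict as GRO: carry only the largest MSS seen. */
	tail->gso_size = (uint16_t)(gso_size > tail_gso_size ? gso_size : tail_gso_size);
	/* Cap at 0xFFFF, as min_t(u32, ..., 0xFFFF) does in the patch. */
	tail->gso_segs = (uint16_t)(segs < 0xFFFF ? segs : 0xFFFF);
}
```

Merging two plain 1448-byte segments leaves the tail with gso_size 1448 and gso_segs 2, and the source's fields stay untouched at zero.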




* Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
  2021-01-19 15:38                         ` Juerg Haefliger
@ 2021-01-19 15:50                           ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2021-01-19 15:50 UTC
  To: Juerg Haefliger
  Cc: Heiner Kallweit, Eric Dumazet, netdev,
	Microchip Linux Driver Support, Woojung Huh

On Tue, Jan 19, 2021 at 4:39 PM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:
>
> On Tue, 19 Jan 2021 14:54:31 +0100
> Eric Dumazet <edumazet@google.com> wrote:
>

> >
> > Oops. Very nice detective work :)
> >
> > It is true that the skb_clone() done in lan78xx (and some other usb
> > drivers) is probably triggering this issue.
> > (lan78xx is also lying about skb->truesize)
> >
> > skb_try_coalesce() bails if the target  skb is cloned, but not if the source is.
> >
> >
> > Can you try the following patch ?
>
> Works. Nice :-)
>

Excellent!

> If you submit this and care you can add:
>
> Tested-by: Juerg Haefliger <juergh@canonical.com>

Sure, I will also add:

Bisected-by: Juerg Haefliger <juergh@canonical.com>

Because you did quite a lot of work narrowing down the problem!

>
> Thanks a lot for the quick turnaround!
>
> ...Juerg
>
>


end of thread

Thread overview: 19+ messages
     [not found] <bug-209423-201211-atteo0d1ZY@https.bugzilla.kernel.org/>
2020-10-01 20:34 ` Fwd: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2() Heiner Kallweit
2020-10-02  8:26   ` Eric Dumazet
2020-10-02  8:32     ` Eric Dumazet
2020-10-02  8:46       ` Eric Dumazet
2020-10-02 11:09         ` Heiner Kallweit
2020-10-02 11:48           ` Eric Dumazet
2020-10-08 16:37             ` Heiner Kallweit
2020-10-08 17:15               ` Eric Dumazet
2020-10-08 18:41                 ` Heiner Kallweit
2020-10-08 18:50                   ` Eric Dumazet
2020-10-08 19:07                     ` Eric Dumazet
2020-10-08 20:54                       ` Heiner Kallweit
2020-10-09  8:29                         ` Eric Dumazet
2021-01-19 12:40                     ` Juerg Haefliger
2021-01-19 13:47                       ` Heiner Kallweit
2021-01-19 13:58                         ` Eric Dumazet
2021-01-19 13:54                       ` Eric Dumazet
2021-01-19 15:38                         ` Juerg Haefliger
2021-01-19 15:50                           ` Eric Dumazet
