* Re: PROBLEM: virtio_net LRO kernel panics
       [not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
@ 2021-07-23  1:28 ` Tonghao Zhang
       [not found]   ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
  2021-07-30 11:42   ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: Tonghao Zhang @ 2021-07-23  1:28 UTC (permalink / raw)
  To: Ivan
  Cc: David S. Miller, virtualization, Willem de Bruijn, Michael S. Tsirkin

On Fri, Jul 23, 2021 at 7:29 AM Ivan <ivan@prestigetransportation.com> wrote:
>
> Dear Sir,
>
> I've been plagued with kernel panics recently. The problem is easily
> reproducible on any virtual machine that uses the virtio-net driver
> from the stock Linux kernel. Simply issue this command:
>
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ...and the kernel panics.
>
> Is there any way we can possibly fix this?
Hi,
What is your kernel version, and what features does your netdevice have?
I set the option on 5.13.0+, and the kernel does not panic:

echo 1 > /proc/sys/net/ipv4/ip_forward

root@localhost-upstream:~# ethtool -k eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> virtio_pci_modern_dev virtio_ring virtio loop unix
> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> kernel: Call Trace:
> kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> kernel:  proc_sys_call_handler+0x127/0x230
> kernel:  new_sync_write+0x114/0x1a0
> kernel:  vfs_write+0x18c/0x220
> kernel:  ksys_write+0x5a/0xd0
> kernel:  do_syscall_64+0x45/0x80
> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> kernel: RIP: 0033:0x7fd4912b79b3
> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> kernel: ---[ end trace ee7985b10570603d ]---
> kernel: ------------[ cut here ]------------



-- 
Best regards, Tonghao
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]   ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
@ 2021-07-23  2:37     ` Jason Wang
       [not found]       ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-07-23  2:37 UTC (permalink / raw)
  To: Ivan, Tonghao Zhang
  Cc: Willem de Bruijn, virtualization, David S. Miller, Michael S. Tsirkin


On 2021/7/23 at 9:40 AM, Ivan wrote:
> On Thu, Jul 22, 2021 at 8:28 PM Tonghao Zhang<xiangxia.m.yue@gmail.com>  wrote:
>> what is your kernel version, and features of your netdevice
> Currently, 5.13.4.  But I also tested with kernels back to 5.10, and
> it always panics.  I also downloaded the stock generic kernel from
> Slackware, and it too panicked.
>
> 0 root@NuRaid:~# ethtool -k eth0
> Features for eth0:
> rx-checksumming: on [fixed]
> tx-checksumming: on
>          tx-checksum-ipv4: off [fixed]
>          tx-checksum-ip-generic: on
>          tx-checksum-ipv6: off [fixed]
>          tx-checksum-fcoe-crc: off [fixed]
>          tx-checksum-sctp: off [fixed]
> scatter-gather: on
>          tx-scatter-gather: on
>          tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>          tx-tcp-segmentation: on
>          tx-tcp-ecn-segmentation: off [fixed]
>          tx-tcp-mangleid-segmentation: off
>          tx-tcp6-segmentation: on
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: on [fixed]


Does it work if you turn off lro before enabling the forwarding?

Btw, using the LRO flag for virtio-net is suspicious; it's actually GSO in
the RX path, not LRO.

Thanks


* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]       ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
@ 2021-07-23  4:25         ` Jason Wang
       [not found]           ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-07-23  4:25 UTC (permalink / raw)
  To: Ivan
  Cc: David S. Miller, virtualization, Willem de Bruijn, Michael S. Tsirkin


On 2021/7/23 at 10:54 AM, Ivan wrote:
> On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/7/23 at 9:40 AM, Ivan wrote:
>>> On Thu, Jul 22, 2021 at 8:28 PM Tonghao Zhang<xiangxia.m.yue@gmail.com>  wrote:
>>>> what is your kernel version, and features of your netdevice
>>> Currently, 5.13.4.  But I also tested with kernels back to 5.10, and
>>> it always panics.  I also downloaded the stock generic kernel from
>>> Slackware, and it too panicked.
>>>
>>> 0 root@NuRaid:~# ethtool -k eth0
>>> Features for eth0:
>>> rx-checksumming: on [fixed]
>>> tx-checksumming: on
>>>           tx-checksum-ipv4: off [fixed]
>>>           tx-checksum-ip-generic: on
>>>           tx-checksum-ipv6: off [fixed]
>>>           tx-checksum-fcoe-crc: off [fixed]
>>>           tx-checksum-sctp: off [fixed]
>>> scatter-gather: on
>>>           tx-scatter-gather: on
>>>           tx-scatter-gather-fraglist: off [fixed]
>>> tcp-segmentation-offload: on
>>>           tx-tcp-segmentation: on
>>>           tx-tcp-ecn-segmentation: off [fixed]
>>>           tx-tcp-mangleid-segmentation: off
>>>           tx-tcp6-segmentation: on
>>> generic-segmentation-offload: on
>>> generic-receive-offload: on
>>> large-receive-offload: on [fixed]
>>
>> Does it work if you turn off lro before enabling the forwarding?
>>
>> Btw, using the LRO flag for virtio-net is suspicious; it's actually GSO in
>> the RX path, not LRO.
> As I mentioned, it's a freshly booted system on which I have not made
> any setting changes with ethtool or sysctl. (So, whatever the kernel
> defaults are)
>
> Per your suggestion:
>
> 0 root@NuRaid:~# ethtool -K eth0 lro off
> Actual changes:
> rx-lro: on [requested off]
> Could not change any device features
> 1 root@NuRaid:~#


OK, it looks like the device lacks the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
feature, which makes it impossible to change the LRO setting.
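
For readers following along, here is a minimal sketch of why the
`ethtool -K eth0 lro off` request above has no effect. It is a toy model,
not kernel code: it assumes that a user request can only change feature
bits the driver marked as changeable (the role `hw_features` plays), and
that every other bit keeps its current value. `TOY_F_LRO`,
`toy_apply_request()` and `toy_demo_fixed_lro()` are hypothetical names
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_F_LRO (1u << 0)   /* hypothetical bit position, not the kernel's */

/*
 * Simplified model (an assumption, not the kernel implementation):
 * only bits present in hw_features may be changed by the user;
 * every other bit keeps its current value.
 */
static uint32_t toy_apply_request(uint32_t features, uint32_t hw_features,
                                  uint32_t wanted)
{
    return (features & ~hw_features) | (wanted & hw_features);
}

/* Mirrors the ethtool attempt above: LRO is on and not changeable. */
void toy_demo_fixed_lro(void)
{
    uint32_t after = toy_apply_request(TOY_F_LRO, 0, 0);

    assert(after & TOY_F_LRO);   /* the request to clear LRO has no effect */
}
```

With LRO present in the changeable set instead, the same request would
clear the bit, which matches the "on [fixed]" versus plain "on" difference
in the two ethtool listings in this thread.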

Are you using QEMU? If so, which version?

Thanks



* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]           ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
@ 2021-07-23  7:59             ` Michael S. Tsirkin
       [not found]               ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
       [not found]               ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23  7:59 UTC (permalink / raw)
  To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn

On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > >> Does it work if you turn off lro before enabling the forwarding?
> > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > Actual changes:
> > > rx-lro: on [requested off]
> > > Could not change any device features
> >
> > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > which makes it impossible to change the LRO setting.
> >
> > Did you use qemu? If yes, what's the qemu version you've used?
> 
> These are VirtualBox machines, which I've been using for years with
> longterm kernels 4.19, and I never had such a problem.  But now that I
> tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> are just generic kernel builds, and a minimalistic userspace.

It would be useful to see the features your VirtualBox instance provides:

cat /sys/class/net/eth0/device/features

(replacing eth0 with the device name as appropriate)



-- 
MST


* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]               ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
@ 2021-07-23  8:13                 ` Michael S. Tsirkin
  2021-07-23 12:10                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23  8:13 UTC (permalink / raw)
  To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn

On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > Actual changes:
> > > > > rx-lro: on [requested off]
> > > > > Could not change any device features
> > > >
> > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > which makes it impossible to change the LRO setting.
> > > >
> > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >
> > > These are VirtualBox machines, which I've been using for years with
> > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > are just generic kernel builds, and a minimalistic userspace.
> >
> > It would be useful to see the features your VirtualBox instance provides
> >
> > cat /sys/class/net/eth0/device/features
> 
> # cat /sys/class/net/eth0/device/features
> 1100010110111011111100000000000000000000000000000000000000000000


So if I'm not wrong:

1#define VIRTIO_NET_F_CSUM      0       /* Host handles pkts w/ partial csum */
1#define VIRTIO_NET_F_GUEST_CSUM        1       /* Guest handles pkts w/ partial csum */
0#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2 /* Dynamic offload configuration. */
0#define VIRTIO_NET_F_MTU       3       /* Initial MTU advice */
0
1#define VIRTIO_NET_F_MAC       5       /* Host has given MAC address. */
0
1#define VIRTIO_NET_F_GUEST_TSO4        7       /* Guest can handle TSOv4 in. */
1#define VIRTIO_NET_F_GUEST_TSO6        8       /* Guest can handle TSOv6 in. */
0#define VIRTIO_NET_F_GUEST_ECN 9       /* Guest can handle TSO[6] w/ ECN in. */
1#define VIRTIO_NET_F_GUEST_UFO 10      /* Guest can handle UFO in. */
1#define VIRTIO_NET_F_HOST_TSO4 11      /* Host can handle TSOv4 in. */
1#define VIRTIO_NET_F_HOST_TSO6 12      /* Host can handle TSOv6 in. */
0#define VIRTIO_NET_F_HOST_ECN  13      /* Host can handle TSO[6] w/ ECN in. */
1#define VIRTIO_NET_F_HOST_UFO  14      /* Host can handle UFO in. */
1#define VIRTIO_NET_F_MRG_RXBUF 15      /* Host can merge receive buffers. */
1#define VIRTIO_NET_F_STATUS    16      /* virtio_net_config.status available */
1#define VIRTIO_NET_F_CTRL_VQ   17      /* Control channel available */
1#define VIRTIO_NET_F_CTRL_RX   18      /* Control channel RX mode support */
1#define VIRTIO_NET_F_CTRL_VLAN 19      /* Control channel VLAN filtering */
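
The decoding above can be sketched in a few lines of C. This is only an
illustration for readers of the thread, assuming (as the list above does)
that the sysfs string prints one character per feature bit with bit 0
first. The helper names `sysfs_feature_set()` and `rx_offload_stuck_on()`
are hypothetical; the bit numbers are the ones from the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Bit numbers copied from the list above. */
enum {
    F_CTRL_GUEST_OFFLOADS = 2,
    F_GUEST_TSO4          = 7,
    F_GUEST_TSO6          = 8,
};

/*
 * One character per feature bit, bit 0 first, so bit N is simply the
 * character at index N. Bits past the end of the string count as unset.
 */
static bool sysfs_feature_set(const char *bits, size_t n)
{
    assert(bits != NULL);
    return n < strlen(bits) && bits[n] == '1';
}

/*
 * True when the device offers guest TSO (receive offload) but no control
 * command to switch it off at runtime.
 */
static bool rx_offload_stuck_on(const char *bits)
{
    bool guest_tso = sysfs_feature_set(bits, F_GUEST_TSO4) ||
                     sysfs_feature_set(bits, F_GUEST_TSO6);

    return guest_tso && !sysfs_feature_set(bits, F_CTRL_GUEST_OFFLOADS);
}
```

Fed the string reported for the VirtualBox device, `rx_offload_stuck_on()`
returns true: bits 7 and 8 are set while bit 2 is clear.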




* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]               ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
@ 2021-07-23  8:59                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23  8:59 UTC (permalink / raw)
  To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn

On Fri, Jul 23, 2021 at 03:31:02AM -0500, Ivan wrote:
> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > Actual changes:
> > > > > rx-lro: on [requested off]
> > > > > Could not change any device features
> > > >
> > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > which makes it impossible to change the LRO setting.
> > > >
> > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >
> > > These are VirtualBox machines, which I've been using for years with
> > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > are just generic kernel builds, and a minimalistic userspace.
> >
> > It would be useful to see the features your VirtualBox instance provides
> >
> > cat /sys/class/net/eth0/device/features
> >
> > replacing eth0 with device name as appropriate
> 
> # grep . /sys/class/net/eth0/device/* 2>/dev/null
> /sys/class/net/eth0/device/device:0x0001
> /sys/class/net/eth0/device/
> features:1100010110111011111100000000000000000000000000000000000000000000
> /sys/class/net/eth0/device/modalias:virtio:d00000001v00001AF4
> /sys/class/net/eth0/device/status:0x00000007
> /sys/class/net/eth0/device/uevent:DRIVER=virtio_net
> /sys/class/net/eth0/device/uevent:MODALIAS=virtio:d00000001v00001AF4
> /sys/class/net/eth0/device/vendor:0x1af4
> 
> # lspci -vv -nn
> 00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device
> [1af4:1000]
>         Subsystem: Red Hat, Inc. Virtio network device [1af4:0001]
>         Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
>         Latency: 64
>         Interrupt: pin A routed to IRQ 19
>         Region 0: I/O ports at d000 [size=32]
>         Capabilities: [80] Null
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
> 


Disabling guest offloads reproduces the warning, but not the crash
for me.

-- 
MST


* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]               ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
  2021-07-23  8:13                 ` Michael S. Tsirkin
@ 2021-07-23 12:10                 ` Michael S. Tsirkin
       [not found]                   ` <CACFia2en0JJDFyz3Umk-JTnMT=kjvRogt4PudED4kiLeMjcHFg@mail.gmail.com>
  1 sibling, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23 12:10 UTC (permalink / raw)
  To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn

On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > Actual changes:
> > > > > rx-lro: on [requested off]
> > > > > Could not change any device features
> > > >
> > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > which makes it impossible to change the LRO setting.
> > > >
> > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >
> > > These are VirtualBox machines, which I've been using for years with
> > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > are just generic kernel builds, and a minimalistic userspace.
> >
> > It would be useful to see the features your VirtualBox instance provides
> >
> > cat /sys/class/net/eth0/device/features
> 
> # cat /sys/class/net/eth0/device/features
> 1100010110111011111100000000000000000000000000000000000000000000

I was able to reproduce the warning but not the panic.
OTOH if LRO stays on when enabling forwarding that
is already a problem. Any chance you can bisect to
find out which change introduced the panic?

-- 
MST


* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]                         ` <CACFia2dns1rTe5OQj4H-kpurVm2CTtGfAXz0aOUS0_cs0QUrsA@mail.gmail.com>
@ 2021-07-27  9:11                           ` Michael S. Tsirkin
       [not found]                             ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-27  9:11 UTC (permalink / raw)
  To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn

On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> >
> > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > >
> > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > >>
> > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>>
> > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>> > >
> > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>> > > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > >>> > > > > > Actual changes:
> > >>> > > > > > rx-lro: on [requested off]
> > >>> > > > > > Could not change any device features
> > >>> > > > >
> > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > >>> > > > > which makes it impossible to change the LRO setting.
> > >>> > > > >
> > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >>> > > >
> > >>> > > > These are VirtualBox machines, which I've been using for years with
> > >>> > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > >>> > >
> > >>> > > It would be useful to see the features your VirtualBox instance provides
> > >>> > >
> > >>> > > cat /sys/class/net/eth0/device/features
> > >>> >
> > >>> > # cat /sys/class/net/eth0/device/features
> > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > >>>
> > >>> I was able to reproduce the warning but not the panic.
> > >>> OTOH if LRO stays on when enabling forwarding that
> > >>> is already a problem. Any chance you can bisect to
> > >>> find out which change introduced the panic?
> > >>
> > >>
> > >> Any kernels up to 4.19.198 don't panic.
> > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > >> I have not tested any kernels between 4.19 and 5.10.
> > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > >> That may take a day or so.  I'll get on with it now, and report my findings.
> > >
> > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> >
> > More narrowly, the problem seems to be coming from commit
> > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > Just to test my suspicion, I deleted a few lines from that code,
> > and the panic went away.  Hope that helps you guys figure out
> > what the problem might be.

Well, it disables LRO, but we knew this :( It'd help if we knew
where it panics; all we see is the warning, which is
related for sure but not the immediate root cause ...

> >
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2978,11 +2978,6 @@
> >   }
> >   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >      dev->features |= NETIF_F_RXCSUM;
> > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > -    dev->features |= NETIF_F_LRO;
> > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > -    dev->hw_features |= NETIF_F_LRO;
> >
> >   dev->vlan_features = dev->features;
> 
> Just FYI, Google turned up two similar bug reports...
> Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> 
> Is there any sensible thing I could do, temporarily, until this
> problem is sorted out?
> Or am I simply stuck with kernels 4.19 on these machines for now?


Something like this I guess:


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..cc5982193a40 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
 			__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
 	}
 
+	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
+	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
 	return 0;
 }
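
To spell out why clearing those two bits should sidestep the warning, here
is a small stand-alone model. It is a sketch only, not driver code: it
assumes LRO gets switched on and made changeable exactly by the two checks
shown in the quoted diff. `struct toy_netdev`, `toy_probe()` and
`toy_validate()` are hypothetical names; the bit numbers are the virtio-net
ones listed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* virtio-net feature bit numbers, as listed earlier in the thread. */
#define VNET_F_CTRL_GUEST_OFFLOADS 2
#define VNET_F_GUEST_TSO4          7
#define VNET_F_GUEST_TSO6          8

struct toy_netdev {
    bool lro_on;          /* stands in for NETIF_F_LRO in dev->features */
    bool lro_changeable;  /* stands in for NETIF_F_LRO in dev->hw_features */
};

static bool has_bit(uint64_t dev_features, unsigned int bit)
{
    assert(bit < 64);
    return (dev_features >> bit) & 1;
}

/* Mirrors the two checks from the quoted diff. */
static struct toy_netdev toy_probe(uint64_t dev_features)
{
    struct toy_netdev dev = { false, false };

    if (has_bit(dev_features, VNET_F_GUEST_TSO4) ||
        has_bit(dev_features, VNET_F_GUEST_TSO6))
        dev.lro_on = true;
    if (has_bit(dev_features, VNET_F_CTRL_GUEST_OFFLOADS))
        dev.lro_changeable = true;
    return dev;
}

/* Mirrors the workaround: drop both guest TSO bits before probing. */
static uint64_t toy_validate(uint64_t dev_features)
{
    dev_features &= ~(1ULL << VNET_F_GUEST_TSO4);
    dev_features &= ~(1ULL << VNET_F_GUEST_TSO6);
    return dev_features;
}
```

In this model a device offering guest TSO without CTRL_GUEST_OFFLOADS ends
up with LRO on and not changeable; once the TSO bits are cleared, LRO is
never turned on, so there is nothing left to fail to disable.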
 


* Re: PROBLEM: virtio_net LRO kernel panics
       [not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
@ 2021-07-30 11:42   ` Michael S. Tsirkin
  2021-07-30 11:42   ` Michael S. Tsirkin
  1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-30 11:42 UTC (permalink / raw)
  To: Ivan
  Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
	virtualization, netdev, Eric Dumazet, Jakub Kicinski

On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> Dear Sir,
> 
> I've been plagued with kernel panics recently. The problem is easily
> reproducible on any virtual machine that uses the virtio-net driver
> from the stock Linux kernel. Simply issue this command:
> 
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ...and the kernel panics.
> 
> Is there any way we can possibly fix this?
> 
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> virtio_pci_modern_dev virtio_ring virtio loop unix
> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> kernel: Call Trace:
> kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> kernel:  proc_sys_call_handler+0x127/0x230
> kernel:  new_sync_write+0x114/0x1a0
> kernel:  vfs_write+0x18c/0x220
> kernel:  ksys_write+0x5a/0xd0
> kernel:  do_syscall_64+0x45/0x80
> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> kernel: RIP: 0033:0x7fd4912b79b3
> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> kernel: ---[ end trace ee7985b10570603d ]---
> kernel: ------------[ cut here ]------------

So the warning is easy to reproduce.
On qemu/kvm just set ctrl_guest_offloads=off for the device.

The panic does not seem to trigger for me, and you did not provide
any data about it.  What happens? Does the guest just freeze?

I am guessing the issue is that dev_disable_lro does not report the
return status and inet_forward_change assumes it's successful.  We then
end up with LRO packets in unexpected places.
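
The guess above can be illustrated with a toy model. None of this is the
kernel's code, and the "checked" variant is not a proposed fix; it only
shows the difference between ignoring and checking the result. All names
(`struct toy_dev`, `toy_disable_lro()`, `toy_forward_unchecked()`,
`toy_forward_checked()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
    bool lro_on;          /* LRO currently enabled */
    bool lro_changeable;  /* whether LRO can be switched off at all */
    bool forwarding;      /* forwarding enabled */
};

/* Tries to clear LRO; reports whether LRO is actually off afterwards. */
static bool toy_disable_lro(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->lro_changeable)
        dev->lro_on = false;
    return !dev->lro_on;
}

/* The guessed behaviour: result ignored, forwarding enabled regardless. */
static void toy_forward_unchecked(struct toy_dev *dev)
{
    (void)toy_disable_lro(dev);
    dev->forwarding = true;
}

/* Hypothetical alternative: refuse to forward while LRO is still on. */
static int toy_forward_checked(struct toy_dev *dev)
{
    if (!toy_disable_lro(dev))
        return -1;
    dev->forwarding = true;
    return 0;
}
```

With a device whose LRO is on and not changeable, the unchecked path leaves
both `forwarding` and `lro_on` set, which is the combination suspected of
letting LRO packets reach the forwarding path.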

Cc netdev and a bunch of people who might have a better idea.

-- 
MST



* Re: PROBLEM: virtio_net LRO kernel panics
@ 2021-07-30 11:42   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-30 11:42 UTC (permalink / raw)
  To: Ivan
  Cc: Willem de Bruijn, netdev, virtualization, Eric Dumazet,
	Jakub Kicinski, David S. Miller

On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> Dear Sir,
> 
> I've been plagued with kernel panics recently. The problem is easily
> reproducible on any virtual machine that uses the virtio-net driver
> from the stock Linux kernel. Simply issue this command:
> 
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ...and the kernel panics.
> 
> Is there any way we can possibly fix this?
> 
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> virtio_pci_modern_dev virtio_ring virtio loop unix
> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> kernel: Call Trace:
> kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> kernel:  proc_sys_call_handler+0x127/0x230
> kernel:  new_sync_write+0x114/0x1a0
> kernel:  vfs_write+0x18c/0x220
> kernel:  ksys_write+0x5a/0xd0
> kernel:  do_syscall_64+0x45/0x80
> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> kernel: RIP: 0033:0x7fd4912b79b3
> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> kernel: ---[ end trace ee7985b10570603d ]---
> kernel: ------------[ cut here ]------------

So the warning is easy to reproduce.
On qemu/kvm just set ctrl_guest_offloads=off for the device.

The panic does not seem to trigger for me and you did not provide
any data about it.  What happens? Does guest just freeze?

I am guessing the issue is that dev_disable_lro does not report the
return status and inet_forward_change assumes it's successful.  We then
end up with LRO packets in unexpected places.

Cc netdev and a bunch of people who might have a better idea.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-07-30 11:42   ` Michael S. Tsirkin
  (?)
@ 2021-07-30 17:04   ` Ivan
  2021-07-31 20:53       ` Michael S. Tsirkin
  2021-08-02  4:35       ` Jason Wang
  -1 siblings, 2 replies; 24+ messages in thread
From: Ivan @ 2021-07-30 17:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
	virtualization, netdev, Eric Dumazet, Jakub Kicinski, Ivan

On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> > Dear Sir,
> >
> > I've been plagued with kernel panics recently. The problem is easily
> > reproducible on any virtual machine that uses the virtio-net driver
> > from the stock Linux kernel. Simply issue this command:
> >
> > echo 1 > /proc/sys/net/ipv4/ip_forward
> > ...and the kernel panics.
> >
> > Is there any way we can possibly fix this?
> >
> > kernel: ------------[ cut here ]------------
> > kernel: netdevice: eth0: failed to disable LRO!
> > kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> > dev_disable_lro+0x108/0x150
> > kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> > atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> > i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> > rng_core i2c_piix4 i2c_core virtio_pci usb_common
> > virtio_pci_modern_dev virtio_ring virtio loop unix
> > kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > VirtualBox 12/01/2006
> > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> > c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> > <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> > kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> > kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> > kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> > kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> > kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> > kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> > kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> > knlGS:0000000000000000
> > kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> > kernel: Call Trace:
> > kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> > kernel:  proc_sys_call_handler+0x127/0x230
> > kernel:  new_sync_write+0x114/0x1a0
> > kernel:  vfs_write+0x18c/0x220
> > kernel:  ksys_write+0x5a/0xd0
> > kernel:  do_syscall_64+0x45/0x80
> > kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > kernel: RIP: 0033:0x7fd4912b79b3
> > kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> > b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> > <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> > kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> > kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> > kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> > kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> > kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> > kernel: ---[ end trace ee7985b10570603d ]---
> > kernel: ------------[ cut here ]------------
>
> So the warning is easy to reproduce.
> On qemu/kvm just set ctrl_guest_offloads=off for the device.

I have no control over the settings of the host.
I have full control over the guest.

> The panic does not seem to trigger for me and you did not provide
> any data about it.  What happens? Does guest just freeze?

I'm not sure if I am misusing the word "panic". (Apologies, I'm not a programmer.)
No, the guest does not freeze. The moment I issue the command...
  echo 1 > /proc/sys/net/ipv4/ip_forward
... I see the "--[ cut here ]--" message appear in the syslog.
Shortly thereafter my ssh session to that host dies.
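
For anyone collecting the same data, a small filter like the following
could pull just the warning block out of the kernel log. It is a sketch
only: extract_warnings is a hypothetical name, and it assumes the log
uses the "[ cut here ]" and "[ end trace ...]" markers seen in the trace
above.

```shell
# extract_warnings: hypothetical helper.
# Prints every block of log lines on stdin that starts at a
# "[ cut here ]" marker and ends at the next "[ end trace" marker.
extract_warnings() {
    sed -n '/\[ cut here ]/,/\[ end trace /p'
}

# Live use in the guest:
#   dmesg | extract_warnings
#   journalctl -k | extract_warnings
```

Saving that output to a file before touching ip_forward again keeps the
trace available even if the ssh session drops.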

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-07-30 17:04   ` Ivan
@ 2021-07-31 20:53       ` Michael S. Tsirkin
  2021-08-02  4:35       ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-31 20:53 UTC (permalink / raw)
  To: Ivan
  Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
	virtualization, netdev, Eric Dumazet, Jakub Kicinski

On Fri, Jul 30, 2021 at 12:04:18PM -0500, Ivan wrote:
> On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> > > Dear Sir,
> > >
> > > I've been plagued with kernel panics recently. The problem is easily
> > > reproducible on any virtual machine that uses the virtio-net driver
> > > from the stock Linux kernel. Simply issue this command:
> > >
> > > echo 1 > /proc/sys/net/ipv4/ip_forward
> > > ...and the kernel panics.
> > >
> > > Is there any way we can possibly fix this?
> > >
> > > kernel: ------------[ cut here ]------------
> > > kernel: netdevice: eth0: failed to disable LRO!
> > > kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> > > dev_disable_lro+0x108/0x150
> > > kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> > > atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> > > i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> > > rng_core i2c_piix4 i2c_core virtio_pci usb_common
> > > virtio_pci_modern_dev virtio_ring virtio loop unix
> > > kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > VirtualBox 12/01/2006
> > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > > kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> > > c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> > > <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> > > kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> > > kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> > > kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> > > kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> > > kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> > > kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> > > kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> > > knlGS:0000000000000000
> > > kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> > > kernel: Call Trace:
> > > kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> > > kernel:  proc_sys_call_handler+0x127/0x230
> > > kernel:  new_sync_write+0x114/0x1a0
> > > kernel:  vfs_write+0x18c/0x220
> > > kernel:  ksys_write+0x5a/0xd0
> > > kernel:  do_syscall_64+0x45/0x80
> > > kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > kernel: RIP: 0033:0x7fd4912b79b3
> > > kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> > > b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> > > <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> > > kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> > > kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> > > kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> > > kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> > > kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> > > kernel: ---[ end trace ee7985b10570603d ]---
> > > kernel: ------------[ cut here ]------------
> >
> > So the warning is easy to reproduce.
> > On qemu/kvm just set ctrl_guest_offloads=off for the device.
> 
> I have no control over the settings of the host.
> I have full control over the guest.
> 
> > The panic does not seem to trigger for me and you did not provide
> > any data about it.  What happens? Does guest just freeze?
> 
> I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> No, the guest does not freeze, just, the moment I issue the command...
>   echo 1 > /proc/sys/net/ipv4/ip_forward
> ... and I see the "--[ cut here ]--" message appear in the syslog.
> Shortly thereafter my ssh session to that host dies.

So is that your ssh session to the host, or to the guest?

-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-07-31 20:53       ` Michael S. Tsirkin
  (?)
@ 2021-07-31 23:52       ` Ivan
  -1 siblings, 0 replies; 24+ messages in thread
From: Ivan @ 2021-07-31 23:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
	virtualization, netdev, Eric Dumazet, Jakub Kicinski, Ivan

On Sat, Jul 31, 2021 at 3:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 30, 2021 at 12:04:18PM -0500, Ivan wrote:
> > On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> > > > Dear Sir,
> > > >
> > > > I've been plagued with kernel panics recently. The problem is easily
> > > > reproducible on any virtual machine that uses the virtio-net driver
> > > > from the stock Linux kernel. Simply issue this command:
> > > >
> > > > echo 1 > /proc/sys/net/ipv4/ip_forward
> > > > ...and the kernel panics.
> > > >
> > > > Is there any way we can possibly fix this?
> > > >
> > > > kernel: ------------[ cut here ]------------
> > > > kernel: netdevice: eth0: failed to disable LRO!
> > > > kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> > > > dev_disable_lro+0x108/0x150
> > > > kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> > > > atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> > > > i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> > > > rng_core i2c_piix4 i2c_core virtio_pci usb_common
> > > > virtio_pci_modern_dev virtio_ring virtio loop unix
> > > > kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> > > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > > VirtualBox 12/01/2006
> > > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > > > kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> > > > c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> > > > <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> > > > kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> > > > kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> > > > kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> > > > kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> > > > kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> > > > kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> > > > kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> > > > knlGS:0000000000000000
> > > > kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> > > > kernel: Call Trace:
> > > > kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> > > > kernel:  proc_sys_call_handler+0x127/0x230
> > > > kernel:  new_sync_write+0x114/0x1a0
> > > > kernel:  vfs_write+0x18c/0x220
> > > > kernel:  ksys_write+0x5a/0xd0
> > > > kernel:  do_syscall_64+0x45/0x80
> > > > kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > kernel: RIP: 0033:0x7fd4912b79b3
> > > > kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> > > > b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> > > > <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> > > > kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > > kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> > > > kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> > > > kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> > > > kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> > > > kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> > > > kernel: ---[ end trace ee7985b10570603d ]---
> > > > kernel: ------------[ cut here ]------------
> > >
> > > So the warning is easy to reproduce.
> > > On qemu/kvm just set ctrl_guest_offloads=off for the device.
> >
> > I have no control over the settings of the host.
> > I have full control over the guest.
> >
> > > The panic does not seem to trigger for me and you did not provide
> > > any data about it.  What happens? Does guest just freeze?
> >
> > I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> > No, the guest does not freeze, just, the moment I issue the command...
> >   echo 1 > /proc/sys/net/ipv4/ip_forward
> > ... and I see the "--[ cut here ]--" message appear in the syslog.
> > Shortly thereafter my ssh session to that host dies.
>
> So the host or to the guest?
Sorry!  The guest. (My bad)  This problem happens in the guest.
My ssh session to that guest dies shortly after I issue that command.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-07-30 17:04   ` Ivan
@ 2021-08-02  4:35       ` Jason Wang
  2021-08-02  4:35       ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2021-08-02  4:35 UTC (permalink / raw)
  To: Ivan, Michael S. Tsirkin
  Cc: Willem de Bruijn, David S. Miller, Tonghao Zhang, virtualization,
	netdev, Eric Dumazet, Jakub Kicinski


On 2021/7/31 1:04 AM, Ivan wrote:
> On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
>>> Dear Sir,
>>>
>>> I've been plagued with kernel panics recently. The problem is easily
>>> reproducible on any virtual machine that uses the virtio-net driver
>>> from the stock Linux kernel. Simply issue this command:
>>>
>>> echo 1 > /proc/sys/net/ipv4/ip_forward
>>> ...and the kernel panics.
>>>
>>> Is there any way we can possibly fix this?
>>>
>>> kernel: ------------[ cut here ]------------
>>> kernel: netdevice: eth0: failed to disable LRO!
>>> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
>>> dev_disable_lro+0x108/0x150
>>> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
>>> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
>>> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
>>> rng_core i2c_piix4 i2c_core virtio_pci usb_common
>>> virtio_pci_modern_dev virtio_ring virtio loop unix
>>> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
>>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>> VirtualBox 12/01/2006
>>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>>> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
>>> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
>>> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
>>> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
>>> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
>>> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
>>> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
>>> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
>>> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
>>> kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
>>> knlGS:0000000000000000
>>> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
>>> kernel: Call Trace:
>>> kernel:  devinet_sysctl_forward+0x1ac/0x1e0
>>> kernel:  proc_sys_call_handler+0x127/0x230
>>> kernel:  new_sync_write+0x114/0x1a0
>>> kernel:  vfs_write+0x18c/0x220
>>> kernel:  ksys_write+0x5a/0xd0
>>> kernel:  do_syscall_64+0x45/0x80
>>> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> kernel: RIP: 0033:0x7fd4912b79b3
>>> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
>>> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
>>> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
>>> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
>>> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
>>> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
>>> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
>>> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
>>> kernel: ---[ end trace ee7985b10570603d ]---
>>> kernel: ------------[ cut here ]------------
>> So the warning is easy to reproduce.
>> On qemu/kvm just set ctrl_guest_offloads=off for the device.
> I have no control over the settings of the host.
> I have full control over the guest.
>
>> The panic does not seem to trigger for me and you did not provide
>> any data about it.  What happens? Does guest just freeze?
> I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> No, the guest does not freeze, just, the moment I issue the command...
>    echo 1 > /proc/sys/net/ipv4/ip_forward
> ... and I see the "--[ cut here ]--" message appear in the syslog.
> Shortly thereafter my ssh session to that host dies.


Does it work before this commit?

commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
Author: Willem de Bruijn <willemb@google.com>
Date:   Thu Dec 20 17:14:54 2018 -0500

     virtio-net: ethtool configurable LRO
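
If a kernel git tree is at hand, whether a given release already
contains that commit can be checked roughly as below. This is a sketch
under assumptions: commit_in_ref is a hypothetical helper name, and
~/linux and v5.13 in the usage comment are example values, not taken
from the report.

```shell
# commit_in_ref: hypothetical helper.
#   $1: path to a git tree, $2: commit id, $3: tag, branch or commit
# Exits 0 if $2 is an ancestor of (i.e. contained in) $3, non-zero otherwise.
commit_in_ref() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Example (tree path and tag are assumptions):
#   commit_in_ref ~/linux a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 v5.13 \
#       && echo "commit is present"
```

A kernel built from a tag for which this check fails would be a
candidate for the "before this commit" test.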

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-08-02  4:35       ` Jason Wang
  (?)
@ 2021-08-02 18:16       ` Ivan
  -1 siblings, 0 replies; 24+ messages in thread
From: Ivan @ 2021-08-02 18:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Willem de Bruijn, David S. Miller,
	Tonghao Zhang, virtualization, netdev, Eric Dumazet,
	Jakub Kicinski, Ivan

On Sun, Aug 1, 2021 at 11:35 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/7/31 1:04 AM, Ivan wrote:
> > On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> >>> Dear Sir,
> >>>
> >>> I've been plagued with kernel panics recently. The problem is easily
> >>> reproducible on any virtual machine that uses the virtio-net driver
> >>> from the stock Linux kernel. Simply issue this command:
> >>>
> >>> echo 1 > /proc/sys/net/ipv4/ip_forward
> >>> ...and the kernel panics.
> >>>
> >>> Is there any way we can possibly fix this?
> >>>
> >>> kernel: ------------[ cut here ]------------
> >>> kernel: netdevice: eth0: failed to disable LRO!
> >>> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> >>> dev_disable_lro+0x108/0x150
> >>> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> >>> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> >>> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> >>> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> >>> virtio_pci_modern_dev virtio_ring virtio loop unix
> >>> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> >>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> >>> VirtualBox 12/01/2006
> >>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> >>> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> >>> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> >>> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> >>> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> >>> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> >>> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> >>> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> >>> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> >>> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> >>> kernel: FS:  00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> >>> knlGS:0000000000000000
> >>> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> >>> kernel: Call Trace:
> >>> kernel:  devinet_sysctl_forward+0x1ac/0x1e0
> >>> kernel:  proc_sys_call_handler+0x127/0x230
> >>> kernel:  new_sync_write+0x114/0x1a0
> >>> kernel:  vfs_write+0x18c/0x220
> >>> kernel:  ksys_write+0x5a/0xd0
> >>> kernel:  do_syscall_64+0x45/0x80
> >>> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> kernel: RIP: 0033:0x7fd4912b79b3
> >>> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> >>> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> >>> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> >>> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >>> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> >>> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> >>> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> >>> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> >>> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> >>> kernel: ---[ end trace ee7985b10570603d ]---
> >>> kernel: ------------[ cut here ]------------
> >> So the warning is easy to reproduce.
> >> On qemu/kvm just set ctrl_guest_offloads=off for the device.
> > I have no control over the settings of the host.
> > I have full control over the guest.
> >
> >> The panic does not seem to trigger for me and you did not provide
> >> any data about it.  What happens? Does guest just freeze?
> > I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> > No, the guest does not freeze, just, the moment I issue the command...
> >    echo 1 > /proc/sys/net/ipv4/ip_forward
> > ... and I see the "--[ cut here ]--" message appear in the syslog.
> > Shortly thereafter my ssh session to that host dies.
>
>
> Does it work before this commit?
>
> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> Author: Willem de Bruijn <willemb@google.com>
> Date:   Thu Dec 20 17:14:54 2018 -0500
>
>      virtio-net: ethtool configurable LRO

Yes.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]                             ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
@ 2021-08-02 19:51                               ` Michael S. Tsirkin
       [not found]                                 ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-02 19:51 UTC (permalink / raw)
  To: Ivan
  Cc: Willem de Bruijn, virtualization, Eric Dumazet, Jakub Kicinski,
	David S. Miller

On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > >
> > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > >
> > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > >>
> > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >>>
> > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >>> > >
> > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>> > > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > >>> > > > > > Actual changes:
> > > > >>> > > > > > rx-lro: on [requested off]
> > > > >>> > > > > > Could not change any device features
> > > > >>> > > > >
> > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > >>> > > > >
> > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > >>> > > >
> > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > >>> > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > >>> > >
> > > > >>> > > It would be useful to see the features your VirtualBox instance provides
> > > > >>> > >
> > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > >>> >
> > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > >>>
> > > > >>> I was able to reproduce the warning but not the panic.
> > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > >>> is already a problem. Any chance you can bisect to
> > > > >>> find out which change introduced the panic?
> > > > >>
> > > > >>
> > > > >> Any kernels up to 4.19.198 don't panic.
> > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > >> That may take a day or so.  I'll get on with it now, and report my findings.
> > > > >
> > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > >
> > > > More narrowly, the problem seems to be coming from commit
> > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > and the panic went away.  Hope that helps you guys figure out
> > > > what the problem might be.
> >
> > Well it disables LRO but we knew this :( It'd help if we knew
> > where it panics; all we see is the warning, which is
> > related for sure but not the immediate root cause ...
> >
> > > >
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2978,11 +2978,6 @@
> > > >   }
> > > >   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > >      dev->features |= NETIF_F_RXCSUM;
> > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > -    dev->features |= NETIF_F_LRO;
> > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > -    dev->hw_features |= NETIF_F_LRO;
> > > >
> > > >   dev->vlan_features = dev->features;
> > >
> > > Just FYI, Google turned up two similar bug reports...
> > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > >
> > > Is there any sensible thing I could do, temporarily, until this
> > > problem is sorted out?
> > > Or am I simply stuck to kernels 4.19 on these machines for now?
> >
> >
> > Something like this I guess:
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a58a2f013af..cc5982193a40 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> >                         __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> >         }
> >
> > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> >         return 0;
> >  }
> 
> When I apply your patch, then I see drastic (more than half)
> reductions in speed. (confirmed with iperf).
> 
> But if instead I just remove a few lines from commit
> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> as in my earlier post, then I'm back to full speed
> 
> I understand that this is just temporary workaround, until we figure this out.


Oh weird. So it's not about getting some weird LRO packet. We will get it with
VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
features.
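
To make the "flag being set in features" point concrete, here is a
simplified userspace model of how the core decides whether LRO could be
turned off. It is only a sketch and not the kernel code: MODEL_F_LRO,
struct model_netdev and the model_* functions are made-up names, and the
mask arithmetic is an assumption based on my understanding of how
features, hw_features and wanted_features interact. A bit that is set in
features but missing from hw_features cannot be cleared, which matches the
"failed to disable LRO!" warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value only; the kernel's real NETIF_F_LRO bit differs. */
#define MODEL_F_LRO (1ULL << 0)

/* Minimal stand-in for the three feature masks kept per network device. */
struct model_netdev {
	uint64_t features;        /* currently active features */
	uint64_t hw_features;     /* features user space may toggle */
	uint64_t wanted_features; /* features user space asked for */
};

/*
 * Recompute the active set: bits outside hw_features are not toggleable
 * and stay as they are; toggleable bits follow wanted_features.
 */
static void model_update_features(struct model_netdev *dev)
{
	uint64_t fixed = dev->features & ~dev->hw_features;
	uint64_t requested = dev->wanted_features & dev->hw_features;

	dev->features = fixed | requested;
}

/* Returns true if LRO is really off afterwards, false if it stayed on. */
static bool model_disable_lro(struct model_netdev *dev)
{
	assert(dev != NULL);

	dev->wanted_features &= ~MODEL_F_LRO;
	model_update_features(dev);

	return !(dev->features & MODEL_F_LRO);
}
```

With features = MODEL_F_LRO and hw_features = 0 (a device without
CTRL_GUEST_OFFLOADS), model_disable_lro() returns false; with
hw_features = MODEL_F_LRO it returns true.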

How about this then? Just pretend to Linux that we disabled LRO.


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..8e7e4cea176b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
 				   ~GUEST_OFFLOAD_LRO_MASK;
 
 		err = virtnet_set_guest_offloads(vi, offloads);
-		if (err)
-			return err;
+		WARN_ON(err);
+		//if (err)
+		//	return err;
 		vi->guest_offloads = offloads;
 	}
 
 

-- 
MST

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
       [not found]                                 ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
@ 2021-08-10 15:31                                   ` Michael S. Tsirkin
  2021-08-11  3:38                                     ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-10 15:31 UTC (permalink / raw)
  To: Ivan
  Cc: Willem de Bruijn, virtualization, Eric Dumazet, Jakub Kicinski,
	David S. Miller

On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > >
> > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > >
> > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > >>
> > > > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >>>
> > > > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >>> > >
> > > > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >>> > > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > > >>> > > > > > Actual changes:
> > > > > > >>> > > > > > rx-lro: on [requested off]
> > > > > > >>> > > > > > Could not change any device features
> > > > > > >>> > > > >
> > > > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > > > >>> > > > >
> > > > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > >>> > > >
> > > > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > > > >>> > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > >>> > >
> > > > > > >>> > > It would be useful to see the features your VirtualBox instance provides
> > > > > > >>> > >
> > > > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > > > >>> >
> > > > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > >>>
> > > > > > >>> I was able to reproduce the warning but not the panic.
> > > > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > > > >>> is already a problem. Any chance you can bisect to
> > > > > > >>> find out which change introduced the panic?
> > > > > > >>
> > > > > > >>
> > > > > > >> Any kernels up to 4.19.198 don't panic.
> > > > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > > > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > > > >> That may take a day or so.  I'll get on with it now, and report my findings.
> > > > > > >
> > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > >
> > > > > > More narrowly, the problem seems to be coming from commit
> > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > and the panic went away.  Hope that helps you guys figure out
> > > > > > what the problem might be.
> > > >
> > > > Well it disables LRO but we knew this :( It'd help if we knew
> > > > where it panics; all we see is the warning, which is
> > > > related for sure but not the immediate root cause ...
> > > >
> > > > > >
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -2978,11 +2978,6 @@
> > > > > >   }
> > > > > >   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > >      dev->features |= NETIF_F_RXCSUM;
> > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > -    dev->features |= NETIF_F_LRO;
> > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > -    dev->hw_features |= NETIF_F_LRO;
> > > > > >
> > > > > >   dev->vlan_features = dev->features;
> > > > >
> > > > > Just FYI, Google turned up two similar bug reports...
> > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > >
> > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > problem is sorted out?
> > > > > Or am I simply stuck to kernels 4.19 on these machines for now?
> > > >
> > > >
> > > > Something like this I guess:
> > > >
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8a58a2f013af..cc5982193a40 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > >                         __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > >         }
> > > >
> > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > >         return 0;
> > > >  }
> > >
> > > When I apply your patch, then I see drastic (more than half)
> > > reductions in speed. (confirmed with iperf).
> > >
> > > But if instead I just remove a few lines from commit
> > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > as in my earlier post, then I'm back to full speed
> > >
> > > I understand that this is just temporary workaround, until we figure this out.
> >
> >
> > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > features.
> >
> > How about this then? Just pretend to Linux that we disabled LRO.
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a58a2f013af..8e7e4cea176b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> >                                    ~GUEST_OFFLOAD_LRO_MASK;
> >
> >                 err = virtnet_set_guest_offloads(vi, offloads);
> > -               if (err)
> > -                       return err;
> > +               WARN_ON(err);
> > +               //if (err)
> > +               //      return err;
> >                 vi->guest_offloads = offloads;
> >         }
> 
> No. With this applied, the problem persists:
> 
> # echo "1" > /proc/sys/net/ipv4/ip_forward
> 
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150

Again, the warning isn't a big deal. I agree we should address it - Jason,
any update? But the main issue is that you lose connectivity. Does that
still persist with this? Can't you get a serial connection
out? I know qemu Did the kernel oops afterwards?

-- 
MST

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-08-10 15:31                                   ` Michael S. Tsirkin
@ 2021-08-11  3:38                                     ` Jason Wang
  2021-08-11  7:39                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-08-11  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
	Jakub Kicinski, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 9044 bytes --]

On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> > On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > >>
> > > > > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >>>
> > > > > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >>> > >
> > > > > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >>> > > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > > > >>> > > > > > Actual changes:
> > > > > > > >>> > > > > > rx-lro: on [requested off]
> > > > > > > >>> > > > > > Could not change any device features
> > > > > > > >>> > > > >
> > > > > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > > > > >>> > > > >
> > > > > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > > >>> > > >
> > > > > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > > > > >>> > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > > > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > > > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > > >>> > >
> > > > > > > >>> > > It would be useful to see the features your VirtualBox instance provides
> > > > > > > >>> > >
> > > > > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > > > > >>> >
> > > > > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > > >>>
> > > > > > > >>> I was able to reproduce the warning but not the panic.
> > > > > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > > > > >>> is already a problem. Any chance you can bisect to
> > > > > > > >>> find out which change introduced the panic?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Any kernels up to 4.19.198 don't panic.
> > > > > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > > > > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > > > > >> That may take a day or so.  I'll get on with it now, and report my findings.
> > > > > > > >
> > > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > > >
> > > > > > > More narrowly, the problem seems to be coming from commit
> > > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > > and the panic went away.  Hope that helps you guys figure out
> > > > > > > what the problem might be.
> > > > >
> > > > > Well it disables LRO but we knew this :( It'd help if we knew
> > > > > where it panics; all we see is the warning, which is
> > > > > related for sure but not the immediate root cause ...
> > > > >
> > > > > > >
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -2978,11 +2978,6 @@
> > > > > > >   }
> > > > > > >   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > > >      dev->features |= NETIF_F_RXCSUM;
> > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > > -    dev->features |= NETIF_F_LRO;
> > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > > -    dev->hw_features |= NETIF_F_LRO;
> > > > > > >
> > > > > > >   dev->vlan_features = dev->features;
> > > > > >
> > > > > > Just FYI, Google turned up two similar bug reports...
> > > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > > >
> > > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > > problem is sorted out?
> > > > > > Or am I simply stuck to kernels 4.19 on these machines for now?
> > > > >
> > > > >
> > > > > Something like this I guess:
> > > > >
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 8a58a2f013af..cc5982193a40 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > > >                         __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > > >         }
> > > > >
> > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > > >         return 0;
> > > > >  }
> > > >
> > > > When I apply your patch, then I see drastic (more than half)
> > > > reductions in speed. (confirmed with iperf).
> > > >
> > > > But if instead I just remove a few lines from commit
> > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > > as in my earlier post, then I'm back to full speed
> > > >
> > > > I understand that this is just temporary workaround, until we figure this out.
> > >
> > >
> > > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > > features.
> > >
> > > How about this then? Just pretend to Linux that we disabled LRO.
> > >
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 8a58a2f013af..8e7e4cea176b 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > >                                    ~GUEST_OFFLOAD_LRO_MASK;
> > >
> > >                 err = virtnet_set_guest_offloads(vi, offloads);
> > > -               if (err)
> > > -                       return err;
> > > +               WARN_ON(err);
> > > +               //if (err)
> > > +               //      return err;
> > >                 vi->guest_offloads = offloads;
> > >         }
> >
> > No. With this applied, the problem persists:
> >
> > # echo "1" > /proc/sys/net/ipv4/ip_forward
> >
> > kernel: ------------[ cut here ]------------
> > kernel: netdevice: eth0: failed to disable LRO!
> > kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> > dev_disable_lro+0x108/0x150
> > kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> > hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> > libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> > ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> > i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> > rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> > kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > VirtualBox 12/01/2006
> > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>
> Again, the warning isn't a big deal. I agree we should address it - Jason,
> any update?

I still think using NETIF_F_LRO might not be correct. Since we're
basically receiving GSO packets.

And it might cause a lot of issues if the device doesn't have
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
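
For reference, the features string Ivan posted earlier can be checked
against that bit directly. The sketch below assumes the sysfs file prints
one character per feature bit, with bit 0 first, and uses the bit numbers
from the virtio-net specification; features_has_bit() is a hypothetical
helper written for this example, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Feature bit numbers as defined by the virtio-net specification. */
#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2
#define VIRTIO_NET_F_GUEST_TSO4          7
#define VIRTIO_NET_F_GUEST_TSO6          8

/*
 * Hypothetical helper: test one feature bit in a string as read from
 * /sys/class/net/<iface>/device/features, where character N is assumed
 * to be '1' when feature bit N was negotiated.
 */
static bool features_has_bit(const char *features, unsigned int bit)
{
	assert(features != NULL);

	return bit < strlen(features) && features[bit] == '1';
}
```

Applied to the reported string (1100010110111011...), bits 7 and 8
(GUEST_TSO4/GUEST_TSO6) are set while bit 2 (CTRL_GUEST_OFFLOADS) is not.
That is the combination where NETIF_F_LRO gets advertised but cannot be
switched off.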

I see two possible fixes:

1) using NETIF_F_GRO_HW instead (the patch is attached)

or

2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
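
A rough userspace sketch of what option 2 could look like, for
illustration only: it is not a tested driver change, and SKETCH_F_LRO,
struct sketch_device, struct sketch_netdev and sketch_probe_lro() are
names invented here to stand in for the probe-time logic. The assumption
is that LRO is neither advertised nor toggleable unless the device offers
CTRL_GUEST_OFFLOADS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag value; not the kernel's real NETIF_F_LRO bit. */
#define SKETCH_F_LRO (1ULL << 0)

/* The negotiated virtio feature bits that matter for this decision. */
struct sketch_device {
	bool guest_tso4;
	bool guest_tso6;
	bool ctrl_guest_offloads;
};

/* The two netdev masks touched at probe time. */
struct sketch_netdev {
	uint64_t features;
	uint64_t hw_features;
};

/*
 * Option 2: without CTRL_GUEST_OFFLOADS the guest cannot change the
 * offloads at runtime, so do not advertise LRO at all in that case.
 */
static void sketch_probe_lro(const struct sketch_device *vdev,
			     struct sketch_netdev *dev)
{
	assert(vdev != NULL && dev != NULL);

	if (!vdev->ctrl_guest_offloads)
		return;

	dev->hw_features |= SKETCH_F_LRO;
	if (vdev->guest_tso4 || vdev->guest_tso6)
		dev->features |= SKETCH_F_LRO;
}
```

For a device like the one reported here (GUEST_TSO4/6 set, no
CTRL_GUEST_OFFLOADS) both masks stay clear of LRO, so a later request to
disable LRO has nothing left to fail on.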

Thanks

> But the main issue is that you lose connectivity. Does that still
> persist with this? Can't you get a serial connection
> out? I know qemu Did the kernel oops afterwards?
>
> --
> MST
>

[-- Attachment #2: 0001-virtio-net-use-NETIF_F_GRO_HW-instead-of-NETIF_F_LRO.patch --]
[-- Type: application/octet-stream, Size: 2478 bytes --]

From 3fcf302686bc5fc080a58338ec84fb21f3973071 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 11 Aug 2021 10:48:20 +0800
Subject: [PATCH] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0416a7e00914..10c382b08bce 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -63,7 +63,7 @@ static const unsigned long guest_offloads[] = {
 	VIRTIO_NET_F_GUEST_CSUM
 };
 
-#define GUEST_OFFLOAD_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
 				(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
 				(1ULL << VIRTIO_NET_F_GUEST_UFO))
@@ -2481,7 +2481,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
 		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
 		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
-		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO/CSUM, disable LRO/CSUM first");
+		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
 		return -EOPNOTSUPP;
 	}
 
@@ -2612,15 +2612,15 @@ static int virtnet_set_features(struct net_device *dev,
 	u64 offloads;
 	int err;
 
-	if ((dev->features ^ features) & NETIF_F_LRO) {
+	if ((dev->features ^ features) & NETIF_F_GRO_HW) {
 		if (vi->xdp_enabled)
 			return -EBUSY;
 
-		if (features & NETIF_F_LRO)
+		if (features & NETIF_F_GRO_HW)
 			offloads = vi->guest_offloads_capable;
 		else
 			offloads = vi->guest_offloads_capable &
-				   ~GUEST_OFFLOAD_LRO_MASK;
+				   ~GUEST_OFFLOAD_GRO_HW_MASK;
 
 		err = virtnet_set_guest_offloads(vi, offloads);
 		if (err)
@@ -3100,9 +3100,9 @@ static int virtnet_probe(struct virtio_device *vdev)
 		dev->features |= NETIF_F_RXCSUM;
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
-		dev->features |= NETIF_F_LRO;
+		dev->features |= NETIF_F_GRO_HW;
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
-		dev->hw_features |= NETIF_F_LRO;
+		dev->hw_features |= NETIF_F_GRO_HW;
 
 	dev->vlan_features = dev->features;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: PROBLEM: virtio_net LRO kernel panics
  2021-08-11  3:38                                     ` Jason Wang
@ 2021-08-11  7:39                                       ` Michael S. Tsirkin
  2021-08-11  7:45                                         ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-11  7:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
	Jakub Kicinski, David S. Miller

On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
> On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> > > On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > >>
> > > > > > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >>>
> > > > > > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >>> > >
> > > > > > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >>> > > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > > > > > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > > > > >>> > > > > > Actual changes:
> > > > > > > > >>> > > > > > rx-lro: on [requested off]
> > > > > > > > >>> > > > > > Could not change any device features
> > > > > > > > >>> > > > >
> > > > > > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > > > > > >>> > > > >
> > > > > > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > > > >>> > > >
> > > > > > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > > > > > >>> > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > > > > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > > > > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > > > >>> > >
> > > > > > > > > >>> > > It would be useful to see the features your virtualbox instance provides
> > > > > > > > >>> > >
> > > > > > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > > > > > >>> >
> > > > > > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > > > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > > > >>>
> > > > > > > > >>> I was able to reproduce the warning but not the panic.
> > > > > > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > > > > > >>> is already a problem. Any chance you can bisect to
> > > > > > > > >>> find out which change introduced the panic?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Any kernels up to 4.19.198 don't panic.
> > > > > > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > > > > > > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > > > > > > >> That may take a day or so.  I'll get on with it now, and report my findings.
> > > > > > > > > >
> > > > > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > > > >
> > > > > > > > More narrowly, the problem seems to be coming from commit
> > > > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > > > and the panic went away.  Hope that helps you guys figure out
> > > > > > > > what the problem might be.
> > > > > >
> > > > > > Well it disables LRO but we knew this :( It'd help if we knew
> > > > > > where it panics; all we see is the warning, which is
> > > > > > related for sure but not the immediate root cause ...
> > > > > >
> > > > > > > >
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -2978,11 +2978,6 @@
> > > > > > > >   }
> > > > > > > >   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > > > >      dev->features |= NETIF_F_RXCSUM;
> > > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > > > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > > > -    dev->features |= NETIF_F_LRO;
> > > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > > > -    dev->hw_features |= NETIF_F_LRO;
> > > > > > > >
> > > > > > > >   dev->vlan_features = dev->features;
> > > > > > >
> > > > > > > Just FYI, Google turned up two similar bug reports...
> > > > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > > > >
> > > > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > > > problem is sorted out?
> > > > > > > Or am I simply stuck to kernels 4.19 on these machines for now?
> > > > > >
> > > > > >
> > > > > > Something like this I guess:
> > > > > >
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 8a58a2f013af..cc5982193a40 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > > > >                         __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > > > >         }
> > > > > >
> > > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > > > >         return 0;
> > > > > >  }
> > > > >
> > > > > When I apply your patch, then I see drastic (more than half)
> > > > > reductions in speed. (confirmed with iperf).
> > > > >
> > > > > But if instead I just remove a few lines from commit
> > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > > > as in my earlier post, then I'm back to full speed
> > > > >
> > > > > I understand that this is just temporary workaround, until we figure this out.
> > > >
> > > >
> > > > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > > > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > > > features.
> > > >
> > > > How about this then? Just pretend to Linux that we disabled LRO.
> > > >
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8a58a2f013af..8e7e4cea176b 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > > >                                    ~GUEST_OFFLOAD_LRO_MASK;
> > > >
> > > >                 err = virtnet_set_guest_offloads(vi, offloads);
> > > > -               if (err)
> > > > -                       return err;
> > > > +               WARN_ON(err);
> > > > +               //if (err)
> > > > +               //      return err;
> > > >                 vi->guest_offloads = offloads;
> > > >         }
> > >
> > > No. With this applied, the problem persists:
> > >
> > > # echo "1" > /proc/sys/net/ipv4/ip_forward
> > >
> > > kernel: ------------[ cut here ]------------
> > > kernel: netdevice: eth0: failed to disable LRO!
> > > kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> > > dev_disable_lro+0x108/0x150
> > > kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> > > hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> > > libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> > > ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> > > i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> > > rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> > > kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > VirtualBox 12/01/2006
> > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> >
> > Again the warning isn't a big deal. I agree we should address - Jason
> > any update?
> 
> I still think using NETIF_F_LRO might not be correct. Since we're
> basically receiving GSO packets.
> 
> And it might cause a lot of issues if the device doesn't have
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> 
> I see two possible fixes:
> 
> 1) using NETIF_F_GRO_HW instead (the patch is attached)

It's unfortunate you didn't inline. Anyway.
Ivan could you test the patch and report?

> 
> or

Hmm. I am not sure we always preserve the GRO_HW requirement that
packets can be re-segmented to reconstruct the original packet stream. 
Do all backends guarantee this? Could you explain why?



> 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
> 
> Thanks


This one would slow guests on old hosts down significantly.

I am not sure why this didn't trigger previously btw -
we used not to have CTRL_GUEST_OFFLOADS after all.
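
The feature string Ivan posted earlier in this thread can be checked for CTRL_GUEST_OFFLOADS directly. A minimal sketch, assuming the sysfs features file prints one character per feature bit starting with bit 0; feature_bit_set() is a hypothetical helper name, and the bit numbers are the virtio-net feature bit numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* virtio-net feature bit numbers, as defined by the virtio specification. */
#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2
#define VIRTIO_NET_F_GUEST_TSO4          7
#define VIRTIO_NET_F_GUEST_TSO6          8

/*
 * Hypothetical helper: test one bit in a string read from
 * /sys/class/net/<dev>/device/features. Assumes character i of the
 * string is '1' when feature bit i is set. Bits beyond the end of the
 * string are treated as clear.
 */
static bool feature_bit_set(const char *features, size_t bit)
{
	return bit < strlen(features) && features[bit] == '1';
}
```

Applied to the string from the VirtualBox guest, this reports GUEST_TSO4 and GUEST_TSO6 as set and CTRL_GUEST_OFFLOADS as clear, which matches the diagnosis quoted above.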



> > But the main issue is you lose connectivity. That still
> > persists with this? Can't you get a serial connection
> > out? I know qemu Did the kernel oops afterwards?
> >
> > --
> > MST
> >




* Re: PROBLEM: virtio_net LRO kernel panics
  2021-08-11  7:39                                       ` Michael S. Tsirkin
@ 2021-08-11  7:45                                         ` Jason Wang
  2021-08-11  8:01                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-08-11  7:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
	Jakub Kicinski, David S. Miller


On 2021/8/11 3:39 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
>> On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
>>>> On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
>>>>>> On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
>>>>>>>> On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>> On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>> On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>>>>> On 2021/7/23 10:54 AM, Ivan wrote:
>>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>>>>>>> Does it work if you turn off lro before enabling the forwarding?
>>>>>>>>>>>>>>>>> 0 root@NuRaid:~# ethtool -K eth0 lro off
>>>>>>>>>>>>>>>>> Actual changes:
>>>>>>>>>>>>>>>>> rx-lro: on [requested off]
>>>>>>>>>>>>>>>>> Could not change any device features
>>>>>>>>>>>>>>>> Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>>>>>>>>>>>>>>>> which makes it impossible to change the LRO setting.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Did you use qemu? If yes, what's the qemu version you've used?
>>>>>>>>>>>>>>> These are VirtualBox machines, which I've been using for years with
>>>>>>>>>>>>>>> longterm kernels 4.19, and I never had such a problem.  But now that I
>>>>>>>>>>>>>>> tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
>>>>>>>>>>>>>>> are just generic kernel builds, and a minimalistic userspace.
>>>>>>>>>>>>>> It would be useful to see the features your virtualbox instance provides
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cat /sys/class/net/eth0/device/features
>>>>>>>>>>>>> # cat /sys/class/net/eth0/device/features
>>>>>>>>>>>>> 1100010110111011111100000000000000000000000000000000000000000000
>>>>>>>>>>>> I was able to reproduce the warning but not the panic.
>>>>>>>>>>>> OTOH if LRO stays on when enabling forwarding that
>>>>>>>>>>>> is already a problem. Any chance you can bisect to
>>>>>>>>>>>> find out which change introduced the panic?
>>>>>>>>>>>
>>>>>>>>>>> Any kernels up to 4.19.198 don't panic.
>>>>>>>>>>> Any kernels 5.10+ panic immediately upon starting forwarding.
>>>>>>>>>>> I have not tested any kernels between 4.19 and 5.10.
>>>>>>>>>>> I guess I can build a few kernels in between, and try to pinpoint where it starts.
>>>>>>>>>>> That may take a day or so.  I'll get on with it now, and report my findings.
>>>>>>>>>> So, I narrowed it down: the panics start with kernel 5.0-rc.
>>>>>>>>> More narrowly, the problem seems to be coming from commit
>>>>>>>>> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
>>>>>>>>> Just to test my suspicion, I deleted a few lines from that code,
>>>>>>>>> and the panic went away.  Hope that helps you guys figure out
>>>>>>>>> what the problem might be.
>>>>>>> Well it disables LRO but we knew this :( It'd help if we knew
>>>>>>> where it panics; all we see is the warning, which is
>>>>>>> related for sure but not the immediate root cause ...
>>>>>>>
>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>> @@ -2978,11 +2978,6 @@
>>>>>>>>>    }
>>>>>>>>>    if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>>>>>>>>>       dev->features |= NETIF_F_RXCSUM;
>>>>>>>>> - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>>>>>> -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
>>>>>>>>> -    dev->features |= NETIF_F_LRO;
>>>>>>>>> - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
>>>>>>>>> -    dev->hw_features |= NETIF_F_LRO;
>>>>>>>>>
>>>>>>>>>    dev->vlan_features = dev->features;
>>>>>>>> Just FYI, Google turned up two similar bug reports...
>>>>>>>> Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
>>>>>>>> Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
>>>>>>>>
>>>>>>>> Is there any sensible thing I could do, temporarily, until this
>>>>>>>> problem is sorted out?
>>>>>>>> Or am I simply stuck to kernels 4.19 on these machines for now?
>>>>>>>
>>>>>>> Something like this I guess:
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 8a58a2f013af..cc5982193a40 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
>>>>>>>                          __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>>>>>>>          }
>>>>>>>
>>>>>>> +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
>>>>>>> +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
>>>>>>>          return 0;
>>>>>>>   }
>>>>>> When I apply your patch, then I see drastic (more than half)
>>>>>> reductions in speed. (confirmed with iperf).
>>>>>>
>>>>>> But if instead I just remove a few lines from commit
>>>>>> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
>>>>>> as in my earlier post, then I'm back to full speed
>>>>>>
>>>>>> I understand that this is just temporary workaround, until we figure this out.
>>>>>
>>>>> Oh weird. So it's not about getting some weird LRO packet. We will get it with
>>>>> VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
>>>>> features.
>>>>>
>>>>> How about this then? Just pretend to Linux that we disabled LRO.
>>>>>
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index 8a58a2f013af..8e7e4cea176b 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
>>>>>                                     ~GUEST_OFFLOAD_LRO_MASK;
>>>>>
>>>>>                  err = virtnet_set_guest_offloads(vi, offloads);
>>>>> -               if (err)
>>>>> -                       return err;
>>>>> +               WARN_ON(err);
>>>>> +               //if (err)
>>>>> +               //      return err;
>>>>>                  vi->guest_offloads = offloads;
>>>>>          }
>>>> No. With this applied, the problem persists:
>>>>
>>>> # echo "1" > /proc/sys/net/ipv4/ip_forward
>>>>
>>>> kernel: ------------[ cut here ]------------
>>>> kernel: netdevice: eth0: failed to disable LRO!
>>>> kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
>>>> dev_disable_lro+0x108/0x150
>>>> kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
>>>> hid_generic usbhid hid virtio_net net_failover failover aesni_intel
>>>> libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
>>>> ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
>>>> i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
>>>> rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
>>>> kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
>>>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>>> VirtualBox 12/01/2006
>>>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>>> Again the warning isn't a big deal. I agree we should address - Jason
>>> any update?
>> I still think using NETIF_F_LRO might not be correct. Since we're
>> basically receiving GSO packets.
>>
>> And it might cause a lot of issues if the device doesn't have
>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
>>
>> I see two possible fixes:
>>
>> 1) using NETIF_F_GRO_HW instead (the patch is attached)
> It's unfortunate you didn't inline. Anyway.
> Ivan could you test the patch and report?
>
>> or
> Hmm. I am not sure we always preserve the GRO_HW requirement that
> packets can be re-segmented to reconstruct the original packet stream.
> Do all backends guarantee this?


I think we can't.


> Could you explain why?


Or we probably need another new netdev feature like rx-gso?


>
>
>
>> 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
>>
>> Thanks
>
> This one would slow guests on old hosts down significantly.


Actually, it's not this proposal but see below.


>
> I am not sure why this didn't trigger previously


It looks to me like it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
("virtio-net: ethtool configurable LRO").

Before this commit we didn't even advertise NETIF_F_LRO, so
dev_disable_lro() wouldn't warn.

After this commit, we advertise LRO and dev_disable_lro() will try to
disable all guest offloads, which will:

1) slow the traffic

and

2) warn if "lro" can't be disabled on a device without ctrl guest
offloads (e.g. the virtualbox host)
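
The second case can be sketched in userspace as a toy model. This is a simplified illustration, not the kernel code: struct toy_netdev, TOY_F_LRO and both functions are hypothetical names, and the only rule modelled is that a feature bit can change only if it is present in hw_features.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F_LRO (1ULL << 0)	/* toy bit position, not the kernel's value */

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct toy_netdev {
	uint64_t features;		/* currently active features */
	uint64_t hw_features;		/* features that may be toggled */
	uint64_t wanted_features;	/* what was requested */
};

/* Bits outside hw_features keep their current value; the rest follow wanted. */
static void toy_update_features(struct toy_netdev *dev)
{
	dev->features = (dev->features & ~dev->hw_features) |
			(dev->wanted_features & dev->hw_features);
}

/*
 * Models the check made when forwarding is enabled: request LRO off,
 * re-evaluate, and report whether the flag is still set. A true return
 * corresponds to the "failed to disable LRO!" warning in the log above.
 */
static bool toy_disable_lro(struct toy_netdev *dev)
{
	dev->wanted_features &= ~TOY_F_LRO;
	toy_update_features(dev);
	return (dev->features & TOY_F_LRO) != 0;
}
```

A device that sets the flag in features but not in hw_features, as happens without CTRL_GUEST_OFFLOADS, makes toy_disable_lro() return true; a device with the flag in both returns false.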

Thanks


> btw -
> we used not to have CTRL_GUEST_OFFLOADS after all.
>
>
>
>>> But the main issue is you lose connectivity. That still
>>> persists with this? Can't you get a serial connection
>>> out? I know qemu Did the kernel oops afterwards?
>>>
>>> --
>>> MST
>>>
>
>


* Re: PROBLEM: virtio_net LRO kernel panics
  2021-08-11  7:45                                         ` Jason Wang
@ 2021-08-11  8:01                                           ` Michael S. Tsirkin
  2021-08-11  8:17                                             ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-11  8:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
	Jakub Kicinski, David S. Miller

On Wed, Aug 11, 2021 at 03:45:48PM +0800, Jason Wang wrote:
> 
> On 2021/8/11 3:39 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
> > > On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> > > > > On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > > > > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > > > > > On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > > > > > > > > > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > > > > > > > > > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > > > > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > > > > > > > > > > > > > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > > > > > > > > Does it work if you turn off lro before enabling the forwarding?
> > > > > > > > > > > > > > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > > > > > > > > > > > > > > Actual changes:
> > > > > > > > > > > > > > > > > > rx-lro: on [requested off]
> > > > > > > > > > > > > > > > > > Could not change any device features
> > > > > > > > > > > > > > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > > > > > > > > > > > > which makes it impossible to change the LRO setting.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > > > > > > > > > > > These are VirtualBox machines, which I've been using for years with
> > > > > > > > > > > > > > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > > > > > > > > > > > > > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > > > > > > > > > > > > > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > > > > > > > > > > > It would be useful to see the features your virtualbox instance provides
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > cat /sys/class/net/eth0/device/features
> > > > > > > > > > > > > > # cat /sys/class/net/eth0/device/features
> > > > > > > > > > > > > > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > > > > > > > > I was able to reproduce the warning but not the panic.
> > > > > > > > > > > > > OTOH if LRO stays on when enabling forwarding that
> > > > > > > > > > > > > is already a problem. Any chance you can bisect to
> > > > > > > > > > > > > find out which change introduced the panic?
> > > > > > > > > > > > 
> > > > > > > > > > > > Any kernels up to 4.19.198 don't panic.
> > > > > > > > > > > > Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > > > > > > > I have not tested any kernels between 4.19 and 5.10.
> > > > > > > > > > > > > I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > > > > > > > > > > That may take a day or so.  I'll get on with it now, and report my findings.
> > > > > > > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > > > > > > > More narrowly, the problem seems to be coming from commit
> > > > > > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > > > > > and the panic went away.  Hope that helps you guys figure out
> > > > > > > > > > what the problem might be.
> > > > > > > > > Well it disables LRO but we knew this :( It'd help if we knew
> > > > > > > > > where it panics; all we see is the warning, which is
> > > > > > > > > related for sure but not the immediate root cause ...
> > > > > > > > 
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -2978,11 +2978,6 @@
> > > > > > > > > >    }
> > > > > > > > > >    if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > > > > > >       dev->features |= NETIF_F_RXCSUM;
> > > > > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > > > > > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > > > > > -    dev->features |= NETIF_F_LRO;
> > > > > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > > > > > -    dev->hw_features |= NETIF_F_LRO;
> > > > > > > > > > 
> > > > > > > > > >    dev->vlan_features = dev->features;
> > > > > > > > > > Just FYI, Google turned up two similar bug reports...
> > > > > > > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > > > > > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > > > > > > 
> > > > > > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > > > > > problem is sorted out?
> > > > > > > > > Or am I simply stuck to kernels 4.19 on these machines for now?
> > > > > > > > 
> > > > > > > > Something like this I guess:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 8a58a2f013af..cc5982193a40 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > > > > > >                          __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > > > > > >          }
> > > > > > > > 
> > > > > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > > > > > >          return 0;
> > > > > > > >   }
> > > > > > > When I apply your patch, then I see drastic (more than half)
> > > > > > > reductions in speed. (confirmed with iperf).
> > > > > > > 
> > > > > > > But if instead I just remove a few lines from commit
> > > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > > > > > as in my earlier post, then I'm back to full speed
> > > > > > > 
> > > > > > > I understand that this is just temporary workaround, until we figure this out.
> > > > > > 
> > > > > > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > > > > > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > > > > > features.
> > > > > > 
> > > > > > How about this then? Just pretend to Linux that we disabled LRO.
> > > > > > 
> > > > > > 
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 8a58a2f013af..8e7e4cea176b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > > > > >                                     ~GUEST_OFFLOAD_LRO_MASK;
> > > > > > 
> > > > > >                  err = virtnet_set_guest_offloads(vi, offloads);
> > > > > > -               if (err)
> > > > > > -                       return err;
> > > > > > +               WARN_ON(err);
> > > > > > +               //if (err)
> > > > > > +               //      return err;
> > > > > >                  vi->guest_offloads = offloads;
> > > > > >          }
> > > > > No. With this applied, the problem persists:
> > > > > 
> > > > > # echo "1" > /proc/sys/net/ipv4/ip_forward
> > > > > 
> > > > > kernel: ------------[ cut here ]------------
> > > > > kernel: netdevice: eth0: failed to disable LRO!
> > > > > kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> > > > > dev_disable_lro+0x108/0x150
> > > > > kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> > > > > hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> > > > > libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> > > > > ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> > > > > i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> > > > > rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> > > > > kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> > > > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > > > VirtualBox 12/01/2006
> > > > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > > > Again the warning isn't a big deal. I agree we should address - Jason
> > > > any update?
> > > I still think using NETIF_F_LRO might not be correct. Since we're
> > > basically receiving GSO packets.
> > > 
> > > And it might cause a lot of issues if the device doesn't have
> > > VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> > > 
> > > I see two possible fixes:
> > > 
> > > 1) using NETIF_F_GRO_HW instead (the patch is attached)
> > It's unfortunate you didn't inline. Anyway.
> > Ivan could you test the patch and report?
> > 
> > > or
> > Hmm. I am not sure we always preserve the GRO_HW requirement that
> > packets can be re-segmented to reconstruct the original packet stream.
> > Do all backends guarantee this?
> 
> 
> I think we can't.
> 
> 
> > Could you explain why?
> 
> 
> Or we probably need another new netdev feature like rx-gso?
> 
> 
> > 
> > 
> > 
> > > 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
> > > 
> > > Thanks
> > 
> > This one would slow guests on old hosts down significantly.
> 
> 
> Actually, it's not this proposal but see below.
> 
> 
> > 
> > I am not sure why this didn't trigger previously
> 
> 
> It looks to me like it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> ("virtio-net: ethtool configurable LRO").
> 
> Before this commit we didn't even advertise NETIF_F_LRO, so dev_disable_lro()
> wouldn't warn.
> 
> After this commit, we advertise LRO and dev_disable_lro() will try to
> disable all guest offloads, which will:
> 
> 1) slow the traffic
> 
> and
> 
> 2) warn if "lro" can't be disabled on a device without ctrl guest offloads
> (e.g. the virtualbox host)
> 
> Thanks

OK. So I think I understand your comment now: GRO_HW makes sense simply
because historically before a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 we
never advertised LRO.

Can you post a patch RFC properly so Ivan can test?



> 
> > btw -
> > we used not to have CTRL_GUEST_OFFLOADS after all.
> > 
> > 
> > 
> > > > But the main issue is you lose connectivity. That still
> > > > persists with this? Can't you get a serial connection
> > > > out? I know qemu Did the kernel oops afterwards?
> > > > 
> > > > --
> > > > MST
> > > > 
> > 
> > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: PROBLEM: virtio_net LRO kernel panics
  2021-08-11  8:01                                           ` Michael S. Tsirkin
@ 2021-08-11  8:17                                             ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2021-08-11  8:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
	Jakub Kicinski, David S. Miller


On 2021/8/11 at 4:01 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 11, 2021 at 03:45:48PM +0800, Jason Wang wrote:
>> On 2021/8/11 at 3:39 PM, Michael S. Tsirkin wrote:
>>> On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
>>>> On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
>>>>>> On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
>>>>>>>> On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>> On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
>>>>>>>>>> On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>> On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>>> On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
>>>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
>>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>>>>>>> On 2021/7/23 at 10:54 AM, Ivan wrote:
>>>>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>> Does it work if you turn off lro before enabling the forwarding?
>>>>>>>>>>>>>>>>>>> 0 root@NuRaid:~# ethtool -K eth0 lro off
>>>>>>>>>>>>>>>>>>> Actual changes:
>>>>>>>>>>>>>>>>>>> rx-lro: on [requested off]
>>>>>>>>>>>>>>>>>>> Could not change any device features
>>>>>>>>>>>>>>>>>> Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>>>>>>>>>>>>>>>>>> which makes it impossible to change the LRO setting.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Did you use qemu? If yes, what's the qemu version you've used?
>>>>>>>>>>>>>>>>> These are VirtualBox machines, which I've been using for years with
>>>>>>>>>>>>>>>>> longterm kernels 4.19, and I never had such a problem.  But now that I
>>>>>>>>>>>>>>>>> tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
>>>>>>>>>>>>>>>>> are just generic kernel builds, and a minimalistic userspace.
>>>>>>>>>>>>>>>> It would be useful to see the features your virtualbox instance provides
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cat /sys/class/net/eth0/device/features
>>>>>>>>>>>>>>> # cat /sys/class/net/eth0/device/features
>>>>>>>>>>>>>>> 1100010110111011111100000000000000000000000000000000000000000000
>>>>>>>>>>>>>> I was able to reproduce the warning but not the panic.
>>>>>>>>>>>>>> OTOH if LRO stays on when enabling forwarding that
>>>>>>>>>>>>>> is already a problem. Any chance you can bisect to
>>>>>>>>>>>>>> find out which change introduced the panic?
>>>>>>>>>>>>> Any kernels up to 4.19.198 don't panic.
>>>>>>>>>>>>> Any kernels 5.10+ panic immediately upon starting forwarding.
>>>>>>>>>>>>> I have not tested any kernels between 4.19 and 5.10.
>>>>>>>>>>>>> I guess I can build a few kernels in between, and try to pinpoint where it starts.
>>>>>>>>>>>>> That may take a day or so.  I'll get on with it now, and report my findings.
>>>>>>>>>>>> So, I narrowed it down: the panics start with kernel 5.0-rc.
>>>>>>>>>>> More narrowly, the problem seems to be coming from commit
>>>>>>>>>>> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
>>>>>>>>>>> Just to test my suspicion, I deleted a few lines from that code,
>>>>>>>>>>> and the panic went away.  Hope that helps you guys figure out
>>>>>>>>>>> what the problem might be.
>>>>>>>>> Well it disables LRO but we knew this :( It'd help if we knew
>>>>>>>>> where it panics; all we see is the warning, which is
>>>>>>>>> related for sure but not the immediate root cause ...
>>>>>>>>>
>>>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>>>> @@ -2978,11 +2978,6 @@
>>>>>>>>>>>     }
>>>>>>>>>>>     if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>>>>>>>>>>>        dev->features |= NETIF_F_RXCSUM;
>>>>>>>>>>> - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>>>>>>>> -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
>>>>>>>>>>> -    dev->features |= NETIF_F_LRO;
>>>>>>>>>>> - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
>>>>>>>>>>> -    dev->hw_features |= NETIF_F_LRO;
>>>>>>>>>>>
>>>>>>>>>>>     dev->vlan_features = dev->features;
>>>>>>>>>> Just FYI, Google turned up two similar bug reports...
>>>>>>>>>> Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
>>>>>>>>>> Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
>>>>>>>>>>
>>>>>>>>>> Is there any sensible thing I could do, temporarily, until this
>>>>>>>>>> problem is sorted out?
>>>>>>>>>> Or am I simply stuck to kernels 4.19 on these machines for now?
>>>>>>>>> Something like this I guess:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>> index 8a58a2f013af..cc5982193a40 100644
>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>> @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
>>>>>>>>>                           __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>>>>>>>>>           }
>>>>>>>>>
>>>>>>>>> +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
>>>>>>>>> +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
>>>>>>>>>           return 0;
>>>>>>>>>    }
>>>>>>>> When I apply your patch, then I see drastic (more than half)
>>>>>>>> reductions in speed. (confirmed with iperf).
>>>>>>>>
>>>>>>>> But if instead I just remove a few lines from commit
>>>>>>>> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
>>>>>>>> as in my earlier post, then I'm back to full speed.
>>>>>>>>
>>>>>>>> I understand that this is just temporary workaround, until we figure this out.
>>>>>>> Oh weird. So it's not about getting some weird LRO packet. We will get it with
>>>>>>> VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
>>>>>>> features.
>>>>>>>
>>>>>>> How about this then? Just pretend to Linux that we disabled LRO.
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 8a58a2f013af..8e7e4cea176b 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
>>>>>>>                                      ~GUEST_OFFLOAD_LRO_MASK;
>>>>>>>
>>>>>>>                   err = virtnet_set_guest_offloads(vi, offloads);
>>>>>>> -               if (err)
>>>>>>> -                       return err;
>>>>>>> +               WARN_ON(err);
>>>>>>> +               //if (err)
>>>>>>> +               //      return err;
>>>>>>>                   vi->guest_offloads = offloads;
>>>>>>>           }
>>>>>> No. With this applied, the problem persists:
>>>>>>
>>>>>> # echo "1" > /proc/sys/net/ipv4/ip_forward
>>>>>>
>>>>>> kernel: ------------[ cut here ]------------
>>>>>> kernel: netdevice: eth0: failed to disable LRO!
>>>>>> kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
>>>>>> dev_disable_lro+0x108/0x150
>>>>>> kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
>>>>>> hid_generic usbhid hid virtio_net net_failover failover aesni_intel
>>>>>> libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
>>>>>> ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
>>>>>> i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
>>>>>> rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
>>>>>> kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
>>>>>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>>>>> VirtualBox 12/01/2006
>>>>>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>>>>> Again the warning isn't a big deal. I agree we should address - Jason
>>>>> any update?
>>>> I still think using NETIF_F_LRO might not be correct. Since we're
>>>> basically receiving GSO packets.
>>>>
>>>> And it might cause a lot of issues if the device doesn't have
>>>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
>>>>
>>>> I see two possible fixes:
>>>>
>>>> 1) using NETIF_F_GRO_HW instead (the patch is attached)
>>> It's unfortunate you didn't inline. Anyway.
>>> Ivan could you test the patch and report?
>>>
>>>> or
>>> Hmm. I am not sure we always preserve the GRO_HW requirement that
>>> packets can be re-segmented to reconstruct the original packet stream.
>>> Do all backends guarantee this?
>>
>> I think we can't.
>>
>>
>>> Could you explain why?
>>
>> Or we probably need another new netdev feature like rx-gso?
>>
>>
>>>
>>>
>>>> 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
>>>>
>>>> Thanks
>>> This one would slow guests on old hosts down significantly.
>>
>> Actually, it's not this proposal but see below.
>>
>>
>>> I am not sure why this didn't trigger previously
>>
>> It looks to me it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
>> ("virtio-net: ethtool configurable LRO").
>>
>> Before this commit we won't even advertise NETIF_F_LRO, so dev_disable_lro()
>> won't warn.
>>
>> After this commit, we advertise LRO and dev_disable_lro() will try to
>> disable all guest offloads which will:
>>
>> 1) slow the traffic
>>
>> and
>>
>> 2) warn if "lro" can't be disabled on the device without ctrl guest offloads
>> (e.g the virtualbox host)
>>
>> Thanks
> OK. So I think I understand your comment now: GRO_HW makes sense simply
> because historically before a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 we
> never advertised LRO.


Yes.


>
> Can you post a patch RFC properly so Ivan can test?


Done.

Thanks


>
>
>>> btw -
>>> we used not to have CTRL_GUEST_OFFLOADS after all.
>>>
>>>
>>>
>>>>> But the main issue is you lose connectivity. That still
>>>>> persists with this? Can't you get a serial connection
>>>>> out? I know qemu Did the kernel oops afterwards?
>>>>>
>>>>> --
>>>>> MST
>>>>>
>>>



Thread overview: 24+ messages
     [not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
2021-07-23  1:28 ` PROBLEM: virtio_net LRO kernel panics Tonghao Zhang
     [not found]   ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
2021-07-23  2:37     ` Jason Wang
     [not found]       ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
2021-07-23  4:25         ` Jason Wang
     [not found]           ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
2021-07-23  7:59             ` Michael S. Tsirkin
     [not found]               ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
2021-07-23  8:13                 ` Michael S. Tsirkin
2021-07-23 12:10                 ` Michael S. Tsirkin
     [not found]                   ` <CACFia2en0JJDFyz3Umk-JTnMT=kjvRogt4PudED4kiLeMjcHFg@mail.gmail.com>
     [not found]                     ` <CACFia2fx7Lt-4o_uqDznvk-VgdsMtD64qv6RYkrCjKLu2yt8bg@mail.gmail.com>
     [not found]                       ` <CACFia2eUi4KNRC7MYktzUS9Nq2WcBiesX04Tbn2pTuvuGkY4qA@mail.gmail.com>
     [not found]                         ` <CACFia2dns1rTe5OQj4H-kpurVm2CTtGfAXz0aOUS0_cs0QUrsA@mail.gmail.com>
2021-07-27  9:11                           ` Michael S. Tsirkin
     [not found]                             ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
2021-08-02 19:51                               ` Michael S. Tsirkin
     [not found]                                 ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
2021-08-10 15:31                                   ` Michael S. Tsirkin
2021-08-11  3:38                                     ` Jason Wang
2021-08-11  7:39                                       ` Michael S. Tsirkin
2021-08-11  7:45                                         ` Jason Wang
2021-08-11  8:01                                           ` Michael S. Tsirkin
2021-08-11  8:17                                             ` Jason Wang
     [not found]               ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
2021-07-23  8:59                 ` Michael S. Tsirkin
2021-07-30 11:42 ` Michael S. Tsirkin
2021-07-30 11:42   ` Michael S. Tsirkin
2021-07-30 17:04   ` Ivan
2021-07-31 20:53     ` Michael S. Tsirkin
2021-07-31 20:53       ` Michael S. Tsirkin
2021-07-31 23:52       ` Ivan
2021-08-02  4:35     ` Jason Wang
2021-08-02  4:35       ` Jason Wang
2021-08-02 18:16       ` Ivan
