* Re: PROBLEM: virtio_net LRO kernel panics
[not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
@ 2021-07-23 1:28 ` Tonghao Zhang
[not found] ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
2021-07-30 11:42 ` Michael S. Tsirkin
1 sibling, 1 reply; 24+ messages in thread
From: Tonghao Zhang @ 2021-07-23 1:28 UTC (permalink / raw)
To: Ivan
Cc: David S. Miller, virtualization, Willem de Bruijn, Michael S. Tsirkin
On Fri, Jul 23, 2021 at 7:29 AM Ivan <ivan@prestigetransportation.com> wrote:
>
> Dear Sir,
>
> I've been plagued with kernel panics recently. The problem is easily
> reproducible on any virtual machine that uses the virtio-net driver
> from the stock Linux kernel. Simply issue this command:
>
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ...and the kernel panics.
>
> Is there any way we can possibly fix this?
Hi
What is your kernel version, and what features does your netdevice have?
I set the option on 5.13.0+, and the kernel does not panic:
echo 1 > /proc/sys/net/ipv4/ip_forward
root@localhost-upstream:~# ethtool -k eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> virtio_pci_modern_dev virtio_ring virtio loop unix
> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> kernel: Call Trace:
> kernel: devinet_sysctl_forward+0x1ac/0x1e0
> kernel: proc_sys_call_handler+0x127/0x230
> kernel: new_sync_write+0x114/0x1a0
> kernel: vfs_write+0x18c/0x220
> kernel: ksys_write+0x5a/0xd0
> kernel: do_syscall_64+0x45/0x80
> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> kernel: RIP: 0033:0x7fd4912b79b3
> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> kernel: ---[ end trace ee7985b10570603d ]---
> kernel: ------------[ cut here ]------------
--
Best regards, Tonghao
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
@ 2021-07-23 2:37 ` Jason Wang
[not found] ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-07-23 2:37 UTC (permalink / raw)
To: Ivan, Tonghao Zhang
Cc: Willem de Bruijn, virtualization, David S. Miller, Michael S. Tsirkin
On 2021/7/23 9:40 AM, Ivan wrote:
> On Thu, Jul 22, 2021 at 8:28 PM Tonghao Zhang<xiangxia.m.yue@gmail.com> wrote:
>> what is your kernel version, and features of your netdevice
> Currently, 5.13.4. But I also tested with kernels back to 5.10, and
> it always panics. I also downloaded the stock generic kernel from
> Slackware, and it too panicked.
>
> 0 root@NuRaid:~# ethtool -k eth0
> Features for eth0:
> rx-checksumming: on [fixed]
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp-ecn-segmentation: off [fixed]
> tx-tcp-mangleid-segmentation: off
> tx-tcp6-segmentation: on
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: on [fixed]
Does it work if you turn off lro before enabling the forwarding?
Btw, using LRO for virtio-net is suspicious; it's actually GSO in
the RX path, not LRO.
Thanks
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
@ 2021-07-23 4:25 ` Jason Wang
[not found] ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-07-23 4:25 UTC (permalink / raw)
To: Ivan
Cc: David S. Miller, virtualization, Willem de Bruijn, Michael S. Tsirkin
On 2021/7/23 10:54 AM, Ivan wrote:
> On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/7/23 9:40 AM, Ivan wrote:
>>> On Thu, Jul 22, 2021 at 8:28 PM Tonghao Zhang<xiangxia.m.yue@gmail.com> wrote:
>>>> what is your kernel version, and features of your netdevice
>>> Currently, 5.13.4. But I also tested with kernels back to 5.10, and
>>> it always panics. I also downloaded the stock generic kernel from
>>> Slackware, and it too panicked.
>>>
>>> 0 root@NuRaid:~# ethtool -k eth0
>>> Features for eth0:
>>> rx-checksumming: on [fixed]
>>> tx-checksumming: on
>>> tx-checksum-ipv4: off [fixed]
>>> tx-checksum-ip-generic: on
>>> tx-checksum-ipv6: off [fixed]
>>> tx-checksum-fcoe-crc: off [fixed]
>>> tx-checksum-sctp: off [fixed]
>>> scatter-gather: on
>>> tx-scatter-gather: on
>>> tx-scatter-gather-fraglist: off [fixed]
>>> tcp-segmentation-offload: on
>>> tx-tcp-segmentation: on
>>> tx-tcp-ecn-segmentation: off [fixed]
>>> tx-tcp-mangleid-segmentation: off
>>> tx-tcp6-segmentation: on
>>> generic-segmentation-offload: on
>>> generic-receive-offload: on
>>> large-receive-offload: on [fixed]
>>
>> Does it work if you turn off lro before enabling the forwarding?
>>
>> Btw, using LRO for virtio-net is suspicious; it's actually GSO in
>> the RX path, not LRO.
> As I mentioned, it's a freshly booted system on which I have not made
> any setting changes with ethtool or sysctl (so, whatever the kernel
> defaults are).
>
> Per your suggestion:
>
> 0 root@NuRaid:~# ethtool -K eth0 lro off
> Actual changes:
> rx-lro: on [requested off]
> Could not change any device features
> 1 root@NuRaid:~#
Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
which makes it impossible to change the LRO setting.
Did you use qemu? If yes, what's the qemu version you've used?
Thanks
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
@ 2021-07-23 7:59 ` Michael S. Tsirkin
[not found] ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
[not found] ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
0 siblings, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23 7:59 UTC (permalink / raw)
To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn
On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > On 2021/7/23 10:54 AM, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > >> Does it work if you turn off lro before enabling the forwarding?
> > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > Actual changes:
> > > rx-lro: on [requested off]
> > > Could not change any device features
> >
> > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > which makes it impossible to change the LRO setting.
> >
> > Did you use qemu? If yes, what's the qemu version you've used?
>
> These are VirtualBox machines, which I've been using for years with
> longterm kernels 4.19, and I never had such a problem. But now that I
> tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> are just generic kernel builds, and a minimalistic userspace.
It would be useful to see the features your virtualbox instance provides
cat /sys/class/net/eth0/device/features
replacing eth0 with the device name as appropriate
--
MST
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
@ 2021-07-23 8:13 ` Michael S. Tsirkin
2021-07-23 12:10 ` Michael S. Tsirkin
1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23 8:13 UTC (permalink / raw)
To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn
On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > Actual changes:
> > > > > rx-lro: on [requested off]
> > > > > Could not change any device features
> > > >
> > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > which makes it impossible to change the LRO setting.
> > > >
> > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >
> > > These are VirtualBox machines, which I've been using for years with
> > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > are just generic kernel builds, and a minimalistic userspace.
> >
> > It would be useful to see the features your virtualbox instance provides
> >
> > cat /sys/class/net/eth0/device/features
>
> # cat /sys/class/net/eth0/device/features
> 1100010110111011111100000000000000000000000000000000000000000000
So if I'm not wrong (each line is prefixed with the value of that feature bit):
1 #define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */
1 #define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */
0 #define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2 /* Dynamic offload configuration. */
0 #define VIRTIO_NET_F_MTU 3 /* Initial MTU advice */
0 (bit 4)
1 #define VIRTIO_NET_F_MAC 5 /* Host has given MAC address. */
0 (bit 6)
1 #define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */
1 #define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */
0 #define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. */
1 #define VIRTIO_NET_F_GUEST_UFO 10 /* Guest can handle UFO in. */
1 #define VIRTIO_NET_F_HOST_TSO4 11 /* Host can handle TSOv4 in. */
1 #define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */
0 #define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */
1 #define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */
1 #define VIRTIO_NET_F_MRG_RXBUF 15 /* Host can merge receive buffers. */
1 #define VIRTIO_NET_F_STATUS 16 /* virtio_net_config.status available */
1 #define VIRTIO_NET_F_CTRL_VQ 17 /* Control channel available */
1 #define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
1 #define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
@ 2021-07-23 8:59 ` Michael S. Tsirkin
0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23 8:59 UTC (permalink / raw)
To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn
On Fri, Jul 23, 2021 at 03:31:02AM -0500, Ivan wrote:
> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > Actual changes:
> > > > > rx-lro: on [requested off]
> > > > > Could not change any device features
> > > >
> > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > which makes it impossible to change the LRO setting.
> > > >
> > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >
> > > These are VirtualBox machines, which I've been using for years with
> > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > are just generic kernel builds, and a minimalistic userspace.
> >
> > It would be useful to see the features your virtualbox instance provides
> >
> > cat /sys/class/net/eth0/device/features
> >
> > replacing eth0 with the device name as appropriate
>
> # grep . /sys/class/net/eth0/device/* 2>/dev/null
> /sys/class/net/eth0/device/device:0x0001
> /sys/class/net/eth0/device/features:1100010110111011111100000000000000000000000000000000000000000000
> /sys/class/net/eth0/device/modalias:virtio:d00000001v00001AF4
> /sys/class/net/eth0/device/status:0x00000007
> /sys/class/net/eth0/device/uevent:DRIVER=virtio_net
> /sys/class/net/eth0/device/uevent:MODALIAS=virtio:d00000001v00001AF4
> /sys/class/net/eth0/device/vendor:0x1af4
>
> # lspci -vv -nn
> 00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
> 	Subsystem: Red Hat, Inc. Virtio network device [1af4:0001]
> 	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 64
> 	Interrupt: pin A routed to IRQ 19
> 	Region 0: I/O ports at d000 [size=32]
> 	Capabilities: [80] Null
> 	Kernel driver in use: virtio-pci
> 	Kernel modules: virtio_pci
>
Disabling guest offloads reproduces the warning, but not the crash
for me.
--
MST
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
2021-07-23 8:13 ` Michael S. Tsirkin
@ 2021-07-23 12:10 ` Michael S. Tsirkin
[not found] ` <CACFia2en0JJDFyz3Umk-JTnMT=kjvRogt4PudED4kiLeMjcHFg@mail.gmail.com>
1 sibling, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-23 12:10 UTC (permalink / raw)
To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn
On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > Actual changes:
> > > > > rx-lro: on [requested off]
> > > > > Could not change any device features
> > > >
> > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > which makes it impossible to change the LRO setting.
> > > >
> > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >
> > > These are VirtualBox machines, which I've been using for years with
> > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > are just generic kernel builds, and a minimalistic userspace.
> >
> > It would be useful to see the features your virtualbox instance provides
> >
> > cat /sys/class/net/eth0/device/features
>
> # cat /sys/class/net/eth0/device/features
> 1100010110111011111100000000000000000000000000000000000000000000
I was able to reproduce the warning but not the panic.
OTOH if LRO stays on when enabling forwarding that
is already a problem. Any chance you can bisect to
find out which change introduced the panic?
--
MST
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2dns1rTe5OQj4H-kpurVm2CTtGfAXz0aOUS0_cs0QUrsA@mail.gmail.com>
@ 2021-07-27 9:11 ` Michael S. Tsirkin
[not found] ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-27 9:11 UTC (permalink / raw)
To: Ivan; +Cc: David S. Miller, virtualization, Willem de Bruijn
On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> >
> > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > >
> > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > >>
> > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>>
> > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>> > >
> > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>> > > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > >>> > > > > > Actual changes:
> > >>> > > > > > rx-lro: on [requested off]
> > >>> > > > > > Could not change any device features
> > >>> > > > >
> > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > >>> > > > > which makes it impossible to change the LRO setting.
> > >>> > > > >
> > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >>> > > >
> > >>> > > > These are VirtualBox machines, which I've been using for years with
> > >>> > > > longterm kernels 4.19, and I never had such a problem. But now that I
> > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > >>> > >
> > >>> > > It would be useful to see the features your virtualbox instance provides
> > >>> > >
> > >>> > > cat /sys/class/net/eth0/device/features
> > >>> >
> > >>> > # cat /sys/class/net/eth0/device/features
> > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > >>>
> > >>> I was able to reproduce the warning but not the panic.
> > >>> OTOH if LRO stays on when enabling forwarding that
> > >>> is already a problem. Any chance you can bisect to
> > >>> find out which change introduced the panic?
> > >>
> > >>
> > >> Any kernels up to 4.19.198 don't panic.
> > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > >> I have not tested any kernels between 4.19 and 5.10.
> > >> I guess I can build a few kernels in between and try to pinpoint where it starts.
> > >> That may take a day or so. I'll get on with it now, and report my findings.
> > >
> > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> >
> > More narrowly, the problem seems to be coming from commit
> > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > Just to test my suspicion, I deleted a few lines from that code,
> > and the panic went away. Hope that helps you guys figure out
> > what the problem might be.
Well, it disables LRO, but we knew this :( It would help if we knew
where it panics; all we see is the warning, which is
related for sure but is not the immediate root cause...
> >
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2978,11 +2978,6 @@
> >  	}
> >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >  		dev->features |= NETIF_F_RXCSUM;
> > -	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > -	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > -		dev->features |= NETIF_F_LRO;
> > -	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > -		dev->hw_features |= NETIF_F_LRO;
> >  
> >  	dev->vlan_features = dev->features;
>
> Just FYI, Google turned up two similar bug reports...
> Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
>
> Is there any sensible thing I could do, temporarily, until this
> problem is sorted out?
> Or am I simply stuck to kernels 4.19 on these machines for now?
Something like this I guess:
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..cc5982193a40 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
 		__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
 	}
 
+	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
+	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
 	return 0;
 }
 
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
@ 2021-07-30 11:42 ` Michael S. Tsirkin
2021-07-30 11:42 ` Michael S. Tsirkin
1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-30 11:42 UTC (permalink / raw)
To: Ivan
Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
virtualization, netdev, Eric Dumazet, Jakub Kicinski
On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> Dear Sir,
>
> I've been plagued with kernel panics recently. The problem is easily
> reproducible on any virtual machine that uses the virtio-net driver
> from the stock Linux kernel. Simply issue this command:
>
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ...and the kernel panics.
>
> Is there any way we can possibly fix this?
>
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> virtio_pci_modern_dev virtio_ring virtio loop unix
> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> kernel: Call Trace:
> kernel: devinet_sysctl_forward+0x1ac/0x1e0
> kernel: proc_sys_call_handler+0x127/0x230
> kernel: new_sync_write+0x114/0x1a0
> kernel: vfs_write+0x18c/0x220
> kernel: ksys_write+0x5a/0xd0
> kernel: do_syscall_64+0x45/0x80
> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> kernel: RIP: 0033:0x7fd4912b79b3
> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> kernel: ---[ end trace ee7985b10570603d ]---
> kernel: ------------[ cut here ]------------
So the warning is easy to reproduce.
On qemu/kvm just set ctrl_guest_offloads=off for the device.
The panic does not seem to trigger for me and you did not provide
any data about it. What happens? Does guest just freeze?
I am guessing the issue is that dev_disable_lro does not report the
return status and inet_forward_change assumes it's successful. We then
end up with LRO packets in unexpected places.
Cc netdev and a bunch of people who might have a better idea.
--
MST
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PROBLEM: virtio_net LRO kernel panics
@ 2021-07-30 11:42 ` Michael S. Tsirkin
0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-30 11:42 UTC (permalink / raw)
To: Ivan
Cc: Willem de Bruijn, netdev, virtualization, Eric Dumazet,
Jakub Kicinski, David S. Miller
On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> Dear Sir,
>
> I've been plagued with kernel panics recently. The problem is easily
> reproducible on any virtual machine that uses the virtio-net driver
> from stock Linux kernel. Simply isuse this command:
>
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ...and the kernel panics.
>
> Is there any way we can possibly fix this?
>
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> virtio_pci_modern_dev virtio_ring virtio loop unix
> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> kernel: Call Trace:
> kernel: devinet_sysctl_forward+0x1ac/0x1e0
> kernel: proc_sys_call_handler+0x127/0x230
> kernel: new_sync_write+0x114/0x1a0
> kernel: vfs_write+0x18c/0x220
> kernel: ksys_write+0x5a/0xd0
> kernel: do_syscall_64+0x45/0x80
> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> kernel: RIP: 0033:0x7fd4912b79b3
> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> kernel: ---[ end trace ee7985b10570603d ]---
> kernel: ------------[ cut here ]------------
So the warning is easy to reproduce.
On qemu/kvm just set ctrl_guest_offloads=off for the device.
The panic does not seem to trigger for me and you did not provide
any data about it. What happens? Does guest just freeze?
I am guessing the issue is that dev_disable_lro does not report the
return status and inet_forward_change assumes it's successful. We then
end up with LRO packets in unexpected places.
Cc netdev and a bunch of people who might have a better idea.
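MST's guess is that forwarding ends up enabled while eth0 still has LRO on. Below is a minimal sketch for checking that state from inside the guest. It assumes the `ethtool -k` output format shown earlier in this thread, with one `feature: value` line per feature. The helper names `lro_state` and `forwarding_with_lro` are hypothetical and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k IFACE` output on stdin and print the
# value of the large-receive-offload line, e.g. "on [fixed]" or "off".
lro_state() {
    sed -n 's/^[[:space:]]*large-receive-offload:[[:space:]]*//p'
}

# Hypothetical check: return success if IPv4 forwarding is enabled ($1 is the
# content of /proc/sys/net/ipv4/ip_forward) while LRO ($2, as printed by
# lro_state) is still reported as on.
forwarding_with_lro() {
    [ "$1" = 1 ] && case $2 in
        on*) true ;;
        *) false ;;
    esac
}

# Example use inside the guest:
#   lro=$(ethtool -k eth0 | lro_state)
#   if forwarding_with_lro "$(cat /proc/sys/net/ipv4/ip_forward)" "$lro"; then
#       echo "eth0: forwarding is on but LRO is still $lro"
#   fi
```

A hit means the device kept LRO after forwarding was turned on. That is the condition the `failed to disable LRO!` warning reports.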
--
MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PROBLEM: virtio_net LRO kernel panics
2021-07-30 11:42 ` Michael S. Tsirkin
(?)
@ 2021-07-30 17:04 ` Ivan
2021-07-31 20:53 ` Michael S. Tsirkin
2021-08-02 4:35 ` Jason Wang
-1 siblings, 2 replies; 24+ messages in thread
From: Ivan @ 2021-07-30 17:04 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
virtualization, netdev, Eric Dumazet, Jakub Kicinski, Ivan
On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> > Dear Sir,
> >
> > I've been plagued with kernel panics recently. The problem is easily
> > reproducible on any virtual machine that uses the virtio-net driver
> > from stock Linux kernel. Simply issue this command:
> >
> > echo 1 > /proc/sys/net/ipv4/ip_forward
> > ...and the kernel panics.
> >
> > Is there any way we can possibly fix this?
> >
> > kernel: ------------[ cut here ]------------
> > kernel: netdevice: eth0: failed to disable LRO!
> > kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> > dev_disable_lro+0x108/0x150
> > kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> > atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> > i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> > rng_core i2c_piix4 i2c_core virtio_pci usb_common
> > virtio_pci_modern_dev virtio_ring virtio loop unix
> > kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > VirtualBox 12/01/2006
> > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> > c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> > <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> > kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> > kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> > kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> > kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> > kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> > kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> > kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> > knlGS:0000000000000000
> > kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> > kernel: Call Trace:
> > kernel: devinet_sysctl_forward+0x1ac/0x1e0
> > kernel: proc_sys_call_handler+0x127/0x230
> > kernel: new_sync_write+0x114/0x1a0
> > kernel: vfs_write+0x18c/0x220
> > kernel: ksys_write+0x5a/0xd0
> > kernel: do_syscall_64+0x45/0x80
> > kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> > kernel: RIP: 0033:0x7fd4912b79b3
> > kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> > b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> > <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> > kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> > kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> > kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> > kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> > kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> > kernel: ---[ end trace ee7985b10570603d ]---
> > kernel: ------------[ cut here ]------------
>
> So the warning is easy to reproduce.
> On qemu/kvm just set ctrl_guest_offloads=off for the device.
I have no control over the settings of the host.
I have full control over the guest.
> The panic does not seem to trigger for me and you did not provide
> any data about it. What happens? Does guest just freeze?
I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
No, the guest does not freeze, just, the moment I issue the command...
echo 1 > /proc/sys/net/ipv4/ip_forward
... and I see the "--[ cut here ]--" message appear in the syslog.
Shortly thereafter my ssh session to that host dies.
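MST asked for more data about what happens. One low-effort way to capture it is to filter the guest's kernel log for the warning quoted above. Below is a minimal sketch that assumes the `netdevice: IFACE: failed to disable LRO!` message format from the trace. `lro_warned_ifaces` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print, once each,
# the interfaces named in "netdevice: IFACE: failed to disable LRO!" lines.
lro_warned_ifaces() {
    sed -n 's/.*netdevice: \([^:]*\): failed to disable LRO!.*/\1/p' | sort -u
}

# Example use inside the guest (either source of kernel messages works):
#   dmesg | lro_warned_ifaces
#   journalctl -k | lro_warned_ifaces
```

The ssh session is what drops, so it is probably easier to run this from the VM console. Saving its output together with `ethtool -k eth0` and `uname -r` would answer the questions asked in this thread.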
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PROBLEM: virtio_net LRO kernel panics
2021-07-30 17:04 ` Ivan
@ 2021-07-31 20:53 ` Michael S. Tsirkin
2021-08-02 4:35 ` Jason Wang
1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-07-31 20:53 UTC (permalink / raw)
To: Ivan
Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
virtualization, netdev, Eric Dumazet, Jakub Kicinski
On Fri, Jul 30, 2021 at 12:04:18PM -0500, Ivan wrote:
> On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> > > Dear Sir,
> > >
> > > I've been plagued with kernel panics recently. The problem is easily
> > > reproducible on any virtual machine that uses the virtio-net driver
> > > from stock Linux kernel. Simply issue this command:
> > >
> > > echo 1 > /proc/sys/net/ipv4/ip_forward
> > > ...and the kernel panics.
> > >
> > > Is there any way we can possibly fix this?
> > >
> > > kernel: ------------[ cut here ]------------
> > > kernel: netdevice: eth0: failed to disable LRO!
> > > kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> > > dev_disable_lro+0x108/0x150
> > > kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> > > atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> > > i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> > > rng_core i2c_piix4 i2c_core virtio_pci usb_common
> > > virtio_pci_modern_dev virtio_ring virtio loop unix
> > > kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > VirtualBox 12/01/2006
> > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > > kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> > > c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> > > <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> > > kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> > > kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> > > kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> > > kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> > > kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> > > kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> > > kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> > > knlGS:0000000000000000
> > > kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> > > kernel: Call Trace:
> > > kernel: devinet_sysctl_forward+0x1ac/0x1e0
> > > kernel: proc_sys_call_handler+0x127/0x230
> > > kernel: new_sync_write+0x114/0x1a0
> > > kernel: vfs_write+0x18c/0x220
> > > kernel: ksys_write+0x5a/0xd0
> > > kernel: do_syscall_64+0x45/0x80
> > > kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > kernel: RIP: 0033:0x7fd4912b79b3
> > > kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> > > b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> > > <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> > > kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> > > kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> > > kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> > > kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> > > kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> > > kernel: ---[ end trace ee7985b10570603d ]---
> > > kernel: ------------[ cut here ]------------
> >
> > So the warning is easy to reproduce.
> > On qemu/kvm just set ctrl_guest_offloads=off for the device.
>
> I have no control over the settings of the host.
> I have full control over the guest.
>
> > The panic does not seem to trigger for me and you did not provide
> > any data about it. What happens? Does guest just freeze?
>
> I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> No, the guest does not freeze, just, the moment I issue the command...
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ... and I see the "--[ cut here ]--" message appear in the syslog.
> Shortly thereafter my ssh session to that host dies.
So to the host or to the guest?
--
MST
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PROBLEM: virtio_net LRO kernel panics
2021-07-31 20:53 ` Michael S. Tsirkin
(?)
@ 2021-07-31 23:52 ` Ivan
-1 siblings, 0 replies; 24+ messages in thread
From: Ivan @ 2021-07-31 23:52 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, Willem de Bruijn, David S. Miller, Tonghao Zhang,
virtualization, netdev, Eric Dumazet, Jakub Kicinski, Ivan
On Sat, Jul 31, 2021 at 3:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 30, 2021 at 12:04:18PM -0500, Ivan wrote:
> > On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> > > > Dear Sir,
> > > >
> > > > I've been plagued with kernel panics recently. The problem is easily
> > > > reproducible on any virtual machine that uses the virtio-net driver
> > > > from stock Linux kernel. Simply issue this command:
> > > >
> > > > echo 1 > /proc/sys/net/ipv4/ip_forward
> > > > ...and the kernel panics.
> > > >
> > > > Is there any way we can possibly fix this?
> > > >
> > > > kernel: ------------[ cut here ]------------
> > > > kernel: netdevice: eth0: failed to disable LRO!
> > > > kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> > > > dev_disable_lro+0x108/0x150
> > > > kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> > > > atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> > > > i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> > > > rng_core i2c_piix4 i2c_core virtio_pci usb_common
> > > > virtio_pci_modern_dev virtio_ring virtio loop unix
> > > > kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> > > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > > VirtualBox 12/01/2006
> > > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> > > > kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> > > > c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> > > > <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> > > > kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> > > > kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> > > > kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> > > > kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> > > > kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> > > > kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> > > > kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> > > > knlGS:0000000000000000
> > > > kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> > > > kernel: Call Trace:
> > > > kernel: devinet_sysctl_forward+0x1ac/0x1e0
> > > > kernel: proc_sys_call_handler+0x127/0x230
> > > > kernel: new_sync_write+0x114/0x1a0
> > > > kernel: vfs_write+0x18c/0x220
> > > > kernel: ksys_write+0x5a/0xd0
> > > > kernel: do_syscall_64+0x45/0x80
> > > > kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > kernel: RIP: 0033:0x7fd4912b79b3
> > > > kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> > > > b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> > > > <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> > > > kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > > kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> > > > kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> > > > kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> > > > kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> > > > kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> > > > kernel: ---[ end trace ee7985b10570603d ]---
> > > > kernel: ------------[ cut here ]------------
> > >
> > > So the warning is easy to reproduce.
> > > On qemu/kvm just set ctrl_guest_offloads=off for the device.
> >
> > I have no control over the settings of the host.
> > I have full control over the guest.
> >
> > > The panic does not seem to trigger for me and you did not provide
> > > any data about it. What happens? Does guest just freeze?
> >
> > I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> > No, the guest does not freeze, just, the moment I issue the command...
> > echo 1 > /proc/sys/net/ipv4/ip_forward
> > ... and I see the "--[ cut here ]--" message appear in the syslog.
> > Shortly thereafter my ssh session to that host dies.
>
> So to the host or to the guest?
Sorry! The guest. (My bad) This problem happens in the guest.
My ssh session to that guest dies shortly after I issue that command.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PROBLEM: virtio_net LRO kernel panics
2021-07-30 17:04 ` Ivan
@ 2021-08-02 4:35 ` Jason Wang
2021-08-02 4:35 ` Jason Wang
1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2021-08-02 4:35 UTC (permalink / raw)
To: Ivan, Michael S. Tsirkin
Cc: Willem de Bruijn, David S. Miller, Tonghao Zhang, virtualization,
netdev, Eric Dumazet, Jakub Kicinski
On 2021/7/31 1:04 AM, Ivan wrote:
> On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
>>> Dear Sir,
>>>
>>> I've been plagued with kernel panics recently. The problem is easily
>>> reproducible on any virtual machine that uses the virtio-net driver
>>> from stock Linux kernel. Simply issue this command:
>>>
>>> echo 1 > /proc/sys/net/ipv4/ip_forward
>>> ...and the kernel panics.
>>>
>>> Is there any way we can possibly fix this?
>>>
>>> kernel: ------------[ cut here ]------------
>>> kernel: netdevice: eth0: failed to disable LRO!
>>> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
>>> dev_disable_lro+0x108/0x150
>>> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
>>> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
>>> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
>>> rng_core i2c_piix4 i2c_core virtio_pci usb_common
>>> virtio_pci_modern_dev virtio_ring virtio loop unix
>>> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
>>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>> VirtualBox 12/01/2006
>>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>>> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
>>> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
>>> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
>>> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
>>> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
>>> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
>>> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
>>> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
>>> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
>>> kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
>>> knlGS:0000000000000000
>>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
>>> kernel: Call Trace:
>>> kernel: devinet_sysctl_forward+0x1ac/0x1e0
>>> kernel: proc_sys_call_handler+0x127/0x230
>>> kernel: new_sync_write+0x114/0x1a0
>>> kernel: vfs_write+0x18c/0x220
>>> kernel: ksys_write+0x5a/0xd0
>>> kernel: do_syscall_64+0x45/0x80
>>> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> kernel: RIP: 0033:0x7fd4912b79b3
>>> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
>>> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
>>> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
>>> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
>>> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
>>> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
>>> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
>>> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
>>> kernel: ---[ end trace ee7985b10570603d ]---
>>> kernel: ------------[ cut here ]------------
>> So the warning is easy to reproduce.
>> On qemu/kvm just set ctrl_guest_offloads=off for the device.
> I have no control over the settings of the host.
> I have full control over the guest.
>
>> The panic does not seem to trigger for me and you did not provide
>> any data about it. What happens? Does guest just freeze?
> I'm not sure if I am misusing the word "panic". (Apologies, not a programmer)
> No, the guest does not freeze, just, the moment I issue the command...
> echo 1 > /proc/sys/net/ipv4/ip_forward
> ... and I see the "--[ cut here ]--" message appear in the syslog.
> Shortly thereafter my ssh session to that host dies.
Does it work before this commit?
commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
Author: Willem de Bruijn <willemb@google.com>
Date:   Thu Dec 20 17:14:54 2018 -0500
    virtio-net: ethtool configurable LRO
Thanks
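Jason's question amounts to two steps. First, check whether the tested kernel contains commit a02e8964eaf9. Second, if it does, try a build without it. Below is a minimal sketch of the first step, meant to run in a kernel git checkout. `contains_commit` is a hypothetical helper that wraps the standard `git merge-base --is-ancestor` check.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if commit $1 is an ancestor of $2
# (default HEAD) in the current git repository, "no" otherwise.
# An unknown commit also prints "no", since git exits non-zero for it.
contains_commit() {
    if git merge-base --is-ancestor "$1" "${2:-HEAD}" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

# Example, in a Linux kernel checkout:
#   contains_commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 v5.13
#   # If "yes", one option is to revert it on a test branch and rebuild:
#   git checkout -b lro-test v5.13
#   git revert --no-edit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
```

The revert is not guaranteed to apply cleanly, because the tree is about two and a half years newer than the commit. Testing an older kernel that predates the commit is an alternative.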
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PROBLEM: virtio_net LRO kernel panics
2021-08-02 4:35 ` Jason Wang
(?)
@ 2021-08-02 18:16 ` Ivan
-1 siblings, 0 replies; 24+ messages in thread
From: Ivan @ 2021-08-02 18:16 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S. Tsirkin, Willem de Bruijn, David S. Miller,
Tonghao Zhang, virtualization, netdev, Eric Dumazet,
Jakub Kicinski, Ivan
On Sun, Aug 1, 2021 at 11:35 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/7/31 1:04 AM, Ivan wrote:
> > On Fri, Jul 30, 2021 at 6:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Thu, Jul 22, 2021 at 06:27:18PM -0500, Ivan wrote:
> >>> Dear Sir,
> >>>
> >>> I've been plagued with kernel panics recently. The problem is easily
> >>> reproducible on any virtual machine that uses the virtio-net driver
> >>> from stock Linux kernel. Simply issue this command:
> >>>
> >>> echo 1 > /proc/sys/net/ipv4/ip_forward
> >>> ...and the kernel panics.
> >>>
> >>> Is there any way we can possibly fix this?
> >>>
> >>> kernel: ------------[ cut here ]------------
> >>> kernel: netdevice: eth0: failed to disable LRO!
> >>> kernel: WARNING: CPU: 1 PID: 424 at net/core/dev.c:1768
> >>> dev_disable_lro+0x108/0x150
> >>> kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usbhid
> >>> atkbd libps2 ahci libahci virtio_net ohci_pci net_failover failover
> >>> i8042 serio lpc_ich mfd_core libata ohci_hcd ehci_pci ehci_hcd usbcore
> >>> rng_core i2c_piix4 i2c_core virtio_pci usb_common
> >>> virtio_pci_modern_dev virtio_ring virtio loop unix
> >>> kernel: CPU: 1 PID: 424 Comm: bash Not tainted 5.13.4-gnu.4-NuMini #1
> >>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> >>> VirtualBox 12/01/2006
> >>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> >>> kernel: Code: ae 88 74 14 be 25 00 00 00 48 89 df e8 f1 54 ed ff 48 85
> >>> c0 48 0f 44 eb 4c 89 e2 48 89 ee 48 c7 c7 00 c6 ae 88 e8 7a 76 0c 00
> >>> <0f> 0b e9 2d ff ff ff 80 3d e8 70 97 00 00 49 c7 c4 73 bb ae 88 75
> >>> kernel: RSP: 0018:ffffb596c0237d80 EFLAGS: 00010282
> >>> kernel: RAX: 0000000000000000 RBX: ffff9af9c1835000 RCX: ffff9af9fed17538
> >>> kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9af9fed17530
> >>> kernel: RBP: ffff9af9c1835000 R08: ffffffff88c96ac8 R09: 0000000000004ffb
> >>> kernel: R10: 00000000fffff000 R11: 3fffffffffffffff R12: ffffffff88ac7c3d
> >>> kernel: R13: 0000000000000000 R14: ffffffff88cb2748 R15: ffff9af9c12166c8
> >>> kernel: FS: 00007fd4911b8740(0000) GS:ffff9af9fed00000(0000)
> >>> knlGS:0000000000000000
> >>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> kernel: CR2: 0000000000532008 CR3: 000000000115c000 CR4: 00000000000406e0
> >>> kernel: Call Trace:
> >>> kernel: devinet_sysctl_forward+0x1ac/0x1e0
> >>> kernel: proc_sys_call_handler+0x127/0x230
> >>> kernel: new_sync_write+0x114/0x1a0
> >>> kernel: vfs_write+0x18c/0x220
> >>> kernel: ksys_write+0x5a/0xd0
> >>> kernel: do_syscall_64+0x45/0x80
> >>> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> kernel: RIP: 0033:0x7fd4912b79b3
> >>> kernel: Code: 8b 15 b9 74 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> >>> b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05
> >>> <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> >>> kernel: RSP: 002b:00007ffe96fdd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >>> kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4912b79b3
> >>> kernel: RDX: 0000000000000002 RSI: 0000000000536810 RDI: 0000000000000001
> >>> kernel: RBP: 0000000000536810 R08: 000000000000000a R09: 0000000000000000
> >>> kernel: R10: 00007fd49134f040 R11: 0000000000000246 R12: 0000000000000002
> >>> kernel: R13: 00007fd4913906c0 R14: 00007fd49138c520 R15: 00007fd49138b920
> >>> kernel: ---[ end trace ee7985b10570603d ]---
> >>> kernel: ------------[ cut here ]------------
> >> So the warning is easy to reproduce.
> >> On qemu/kvm just set ctrl_guest_offloads=off for the device.
> > I have no control over the settings of the host.
> > I have full control over the guest.
> >
> >> The panic does not seem to trigger for me and you did not provide
> >> any data about it. What happens? Does guest just freeze?
> > I'm not sure if I am misusing the word "panic". (Apologies, not a programmer.)
> > No, the guest does not freeze. The moment I issue the command...
> > echo 1 > /proc/sys/net/ipv4/ip_forward
> > ...I see the "--[ cut here ]--" message appear in the syslog.
> > Shortly thereafter my ssh session to that host dies.
>
>
> Does it work before this commit?
>
> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> Author: Willem de Bruijn <willemb@google.com>
> Date: Thu Dec 20 17:14:54 2018 -0500
>
> virtio-net: ethtool configurable LRO
Yes.
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
@ 2021-08-02 19:51 ` Michael S. Tsirkin
[not found] ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-02 19:51 UTC (permalink / raw)
To: Ivan
Cc: Willem de Bruijn, virtualization, Eric Dumazet, Jakub Kicinski,
David S. Miller
On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > >
> > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > >
> > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > >>
> > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >>>
> > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >>> > >
> > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>> > > > > On 2021/7/23 10:54 AM, Ivan wrote:
> > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > >>> > > > > > Actual changes:
> > > > >>> > > > > > rx-lro: on [requested off]
> > > > >>> > > > > > Could not change any device features
> > > > >>> > > > >
> > > > >>> > > > > Ok, it looks like the device lacks VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
> > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > >>> > > > >
> > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > >>> > > >
> > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > >>> > > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > >>> > >
> > > > >>> > > It would be useful to see the features your VirtualBox instance provides
> > > > >>> > >
> > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > >>> >
> > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > >>>
> > > > >>> I was able to reproduce the warning but not the panic.
> > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > >>> is already a problem. Any chance you can bisect to
> > > > >>> find out which change introduced the panic?
> > > > >>
> > > > >>
> > > > >> Any kernels up to 4.19.198 don't panic.
> > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > >> I guess I can build a few kernels in between and try to pinpoint where it starts.
> > > > >> That may take a day or so. I'll get on with it now, and report my findings.
> > > > >
> > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > >
> > > > More narrowly, the problem seems to be coming from commit
> > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > and the panic went away. Hope that helps you guys figure out
> > > > what the problem might be.
> >
> > Well, it disables LRO, but we knew this :( It'd help if we knew
> > where it panics; all we see is the warning, which is
> > related for sure but not the immediate root cause ...
> >
> > > >
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2978,11 +2978,6 @@
> > > > }
> > > > if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > dev->features |= NETIF_F_RXCSUM;
> > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > - virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > - dev->features |= NETIF_F_LRO;
> > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > - dev->hw_features |= NETIF_F_LRO;
> > > >
> > > > dev->vlan_features = dev->features;
> > >
> > > Just FYI, Google turned up two similar bug reports...
> > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > >
> > > Is there any sensible thing I could do, temporarily, until this
> > > problem is sorted out?
> > > Or am I simply stuck with 4.19 kernels on these machines for now?
> >
> >
> > Something like this I guess:
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a58a2f013af..cc5982193a40 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > }
> >
> > + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > return 0;
> > }
>
> When I apply your patch, I see drastic (more than half)
> reductions in speed (confirmed with iperf).
>
> But if instead I just remove a few lines from commit
> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> as in my earlier post, then I'm back to full speed.
>
> I understand that this is just a temporary workaround until we figure this out.
Oh, weird. So it's not about getting some weird LRO packet; we will get those with
VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
features.
How about this then? Just pretend to Linux that we disabled LRO.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..8e7e4cea176b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
~GUEST_OFFLOAD_LRO_MASK;
err = virtnet_set_guest_offloads(vi, offloads);
- if (err)
- return err;
+ WARN_ON(err);
+ //if (err)
+ // return err;
vi->guest_offloads = offloads;
}
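The warning itself can be illustrated with a simplified userspace model of the net core. It assumes, as in netdev_get_wanted_features() in kernels of this era, that bits outside dev->hw_features are carried over from dev->features unchanged; it ignores ndo_fix_features, ndo_set_features and lower devices, and the struct, helper names and LRO bit value are hypothetical. Under that model, a device with LRO in features but not in hw_features (the GUEST_TSO4/6 without CTRL_GUEST_OFFLOADS case from the probe code quoted earlier) keeps LRO after the disable attempt. If the model holds, virtnet_set_features() would not even be called to clear LRO in that case, so a change inside it alone may not silence the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value for illustration; not the real NETIF_F_LRO index. */
#define MODEL_F_LRO (1ULL << 15)

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct model_netdev {
	uint64_t features;        /* features currently active */
	uint64_t hw_features;     /* features user space may toggle */
	uint64_t wanted_features; /* toggleable features user space wants */
};

/* Same expression as netdev_get_wanted_features(): bits outside
 * hw_features stay as they are, toggleable bits follow wanted_features. */
static uint64_t model_get_wanted(const struct model_netdev *dev)
{
	return (dev->features & ~dev->hw_features) | dev->wanted_features;
}

/* Reduced dev_disable_lro(): drop LRO from the wanted set, recompute the
 * active features, and report whether LRO is still on (the case in which
 * the kernel warns "failed to disable LRO!"). */
static bool model_disable_lro(struct model_netdev *dev)
{
	/* Model invariant: only toggleable bits can be in wanted_features. */
	assert((dev->wanted_features & ~dev->hw_features) == 0);

	dev->wanted_features &= ~MODEL_F_LRO;
	dev->features = model_get_wanted(dev);
	return (dev->features & MODEL_F_LRO) != 0;
}
```

For example, features = hw_features = wanted_features = MODEL_F_LRO ends with LRO cleared, while features = MODEL_F_LRO with hw_features = 0 keeps it, which is the combination the quoted probe code produces without CTRL_GUEST_OFFLOADS.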
--
MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: PROBLEM: virtio_net LRO kernel panics
[not found] ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
@ 2021-08-10 15:31 ` Michael S. Tsirkin
2021-08-11 3:38 ` Jason Wang
0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-10 15:31 UTC (permalink / raw)
To: Ivan
Cc: Willem de Bruijn, virtualization, Eric Dumazet, Jakub Kicinski,
David S. Miller
On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > How about this then? Just pretend to Linux that we disabled LRO.
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a58a2f013af..8e7e4cea176b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > ~GUEST_OFFLOAD_LRO_MASK;
> >
> > err = virtnet_set_guest_offloads(vi, offloads);
> > - if (err)
> > - return err;
> > + WARN_ON(err);
> > + //if (err)
> > + // return err;
> > vi->guest_offloads = offloads;
> > }
>
> No. With this applied, the problem persists:
>
> # echo "1" > /proc/sys/net/ipv4/ip_forward
>
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
Again, the warning isn't a big deal, though I agree we should address it. Jason,
any update? But the main issue is that you lose connectivity. Does that still
persist with this? Can't you get a serial connection
out? I know qemu Did the kernel oops afterwards?
--
MST
* Re: PROBLEM: virtio_net LRO kernel panics
2021-08-10 15:31 ` Michael S. Tsirkin
@ 2021-08-11 3:38 ` Jason Wang
2021-08-11 7:39 ` Michael S. Tsirkin
0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-08-11 3:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
Jakub Kicinski, David S. Miller
[-- Attachment #1: Type: text/plain, Size: 9044 bytes --]
On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
>
> Again, the warning isn't a big deal, though I agree we should address it. Jason,
> any update?
I still think using NETIF_F_LRO might not be correct, since we're
basically receiving GSO packets.
And it might cause a lot of issues if the device doesn't have
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
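Whether a guest is in that situation can be checked from the bitstring Ivan posted from /sys/class/net/eth0/device/features, which, as far as I know, prints one '0'/'1' character per negotiated virtio feature bit, bit 0 first. A small sketch follows; the VNET_F_ constants are local copies of the virtio-net bit numbers from the virtio specification, and the names and helpers are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* virtio-net feature bit numbers (virtio specification); local names. */
enum {
	VNET_F_GUEST_CSUM = 1,
	VNET_F_CTRL_GUEST_OFFLOADS = 2,
	VNET_F_GUEST_TSO4 = 7,
	VNET_F_GUEST_TSO6 = 8,
};

/* The sysfs string has one character per feature bit, bit 0 leftmost.
 * Bits beyond the end of the string are treated as not set. */
static bool feature_set(const char *bits, unsigned int bit)
{
	assert(bits != NULL);
	return bit < strlen(bits) && bits[bit] == '1';
}

/* True for the problematic combination: LRO gets advertised because
 * GUEST_TSO4/6 is offered, but it cannot be switched off because
 * CTRL_GUEST_OFFLOADS is missing. */
static bool lro_stuck_on(const char *bits)
{
	bool tso = feature_set(bits, VNET_F_GUEST_TSO4) ||
		   feature_set(bits, VNET_F_GUEST_TSO6);
	return tso && !feature_set(bits, VNET_F_CTRL_GUEST_OFFLOADS);
}
```

With the reported string, bits 7 and 8 (GUEST_TSO4/6) are set while bit 2 (CTRL_GUEST_OFFLOADS) is clear, which matches the diagnosis that the device cannot change the LRO setting.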
I see two possible fixes:
1) use NETIF_F_GRO_HW instead (the patch is attached)
or
2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
Thanks
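Option 2 amounts to advertising LRO only when it can also be switched off. Below is a userspace sketch of that probe-time decision; struct probe_caps, probe_lro_option2() and the bit value are hypothetical, not the driver's code. As noted earlier in the thread, the host may still deliver coalesced (GUEST_TSO) packets either way; only the flags exposed to the stack change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value for illustration; not the real NETIF_F_LRO index. */
#define MODEL_F_LRO (1ULL << 15)

/* Hypothetical summary of the negotiated virtio-net features we care about. */
struct probe_caps {
	bool guest_tso4;
	bool guest_tso6;
	bool ctrl_guest_offloads;
};

struct probe_result {
	uint64_t features;    /* what the device starts with */
	uint64_t hw_features; /* what user space (and the stack) may toggle */
};

/* Option 2: set LRO only if the device can also turn it off again. */
static struct probe_result probe_lro_option2(struct probe_caps caps)
{
	struct probe_result r = { 0, 0 };

	if ((caps.guest_tso4 || caps.guest_tso6) && caps.ctrl_guest_offloads) {
		r.features |= MODEL_F_LRO;
		r.hw_features |= MODEL_F_LRO;
	}
	/* Invariant behind the warning: active LRO must also be toggleable. */
	assert(!(r.features & MODEL_F_LRO) || (r.hw_features & MODEL_F_LRO));
	return r;
}
```

The assert encodes the condition dev_disable_lro() effectively relies on: if LRO is active, it must be present in hw_features so that it can be cleared.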
> But the main issue is that you lose connectivity. Does that still
> persist with this? Can't you get a serial connection
> out? I know qemu Did the kernel oops afterwards?
>
> --
> MST
>
[-- Attachment #2: 0001-virtio-net-use-NETIF_F_GRO_HW-instead-of-NETIF_F_LRO.patch --]
[-- Type: application/octet-stream, Size: 2478 bytes --]
From 3fcf302686bc5fc080a58338ec84fb21f3973071 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 11 Aug 2021 10:48:20 +0800
Subject: [PATCH] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0416a7e00914..10c382b08bce 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -63,7 +63,7 @@ static const unsigned long guest_offloads[] = {
VIRTIO_NET_F_GUEST_CSUM
};
-#define GUEST_OFFLOAD_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
(1ULL << VIRTIO_NET_F_GUEST_ECN) | \
(1ULL << VIRTIO_NET_F_GUEST_UFO))
@@ -2481,7 +2481,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
- NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO/CSUM, disable LRO/CSUM first");
+ NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
return -EOPNOTSUPP;
}
@@ -2612,15 +2612,15 @@ static int virtnet_set_features(struct net_device *dev,
u64 offloads;
int err;
- if ((dev->features ^ features) & NETIF_F_LRO) {
+ if ((dev->features ^ features) & NETIF_F_GRO_HW) {
if (vi->xdp_enabled)
return -EBUSY;
- if (features & NETIF_F_LRO)
+ if (features & NETIF_F_GRO_HW)
offloads = vi->guest_offloads_capable;
else
offloads = vi->guest_offloads_capable &
- ~GUEST_OFFLOAD_LRO_MASK;
+ ~GUEST_OFFLOAD_GRO_HW_MASK;
err = virtnet_set_guest_offloads(vi, offloads);
if (err)
@@ -3100,9 +3100,9 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->features |= NETIF_F_RXCSUM;
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
- dev->features |= NETIF_F_LRO;
+ dev->features |= NETIF_F_GRO_HW;
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
- dev->hw_features |= NETIF_F_LRO;
+ dev->hw_features |= NETIF_F_GRO_HW;
dev->vlan_features = dev->features;
--
2.25.1
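The offload selection in the patched virtnet_set_features() is plain mask arithmetic. The sketch below reproduces it in userspace using the virtio-net bit numbers from the specification; offloads_for() and the VNET_F_/MODEL_ names are illustrative, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* virtio-net feature bit numbers (virtio specification); local names. */
enum {
	VNET_F_GUEST_CSUM = 1,
	VNET_F_GUEST_TSO4 = 7,
	VNET_F_GUEST_TSO6 = 8,
	VNET_F_GUEST_ECN = 9,
	VNET_F_GUEST_UFO = 10,
};

/* Same bits as GUEST_OFFLOAD_GRO_HW_MASK in the patch above. */
#define MODEL_GRO_HW_MASK ((1ULL << VNET_F_GUEST_TSO4) | \
			   (1ULL << VNET_F_GUEST_TSO6) | \
			   (1ULL << VNET_F_GUEST_ECN) | \
			   (1ULL << VNET_F_GUEST_UFO))

/* Bits 7..10 together. */
static_assert(MODEL_GRO_HW_MASK == 0x780ULL, "TSO4|TSO6|ECN|UFO are bits 7..10");

/* Offloads to request from the device, following the branch in
 * virtnet_set_features(): everything the device is capable of when
 * GRO_HW is wanted, otherwise everything except the coalescing bits. */
static uint64_t offloads_for(uint64_t capable, bool want_gro_hw)
{
	return want_gro_hw ? capable : capable & ~MODEL_GRO_HW_MASK;
}
```

With GRO_HW turned off, only GUEST_CSUM (bit 1, value 0x2) is left in the requested offloads when the device is capable of all five.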
* Re: PROBLEM: virtio_net LRO kernel panics
2021-08-11 3:38 ` Jason Wang
@ 2021-08-11 7:39 ` Michael S. Tsirkin
2021-08-11 7:45 ` Jason Wang
0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-11 7:39 UTC (permalink / raw)
To: Jason Wang
Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
Jakub Kicinski, David S. Miller
On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
> On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> > > On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > > > >>
> > > > > > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >>>
> > > > > > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >>> > >
> > > > > > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >>> > > > > 在 2021/7/23 上午10:54, Ivan 写道:
> > > > > > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > > > > >>> > > > > > Actual changes:
> > > > > > > > >>> > > > > > rx-lro: on [requested off]
> > > > > > > > >>> > > > > > Could not change any device features
> > > > > > > > >>> > > > >
> > > > > > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > > > > > >>> > > > >
> > > > > > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > > > >>> > > >
> > > > > > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > > > > > >>> > > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > > > > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > > > > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > > > >>> > >
> > > > > > > > >>> > > It would be useful to see the features your VirtualBox instance provides:
> > > > > > > > >>> > >
> > > > > > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > > > > > >>> >
> > > > > > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > > > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > > > >>>
> > > > > > > > >>> I was able to reproduce the warning but not the panic.
> > > > > > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > > > > > >>> is already a problem. Any chance you can bisect to
> > > > > > > > >>> find out which change introduced the panic?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Any kernels up to 4.19.198 don't panic.
> > > > > > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > > > > > >> I guess I can build a few kernels in between and try to pinpoint where it starts.
> > > > > > > > >> That may take a day or so. I'll get on with it now, and report my findings.
> > > > > > > > >
> > > > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > > > >
> > > > > > > > More narrowly, the problem seems to be coming from commit
> > > > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > > > and the panic went away. Hope that helps you guys figure out
> > > > > > > > what the problem might be.
> > > > > >
> > > > > > Well, it disables LRO, but we knew this :( It'd help if we knew
> > > > > > where it panics; all we see is the warning, which is
> > > > > > related for sure but not the immediate root cause ...
> > > > > >
> > > > > > > >
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -2978,11 +2978,6 @@
> > > > > > > > }
> > > > > > > > if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > > > > dev->features |= NETIF_F_RXCSUM;
> > > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > > > - virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > > > - dev->features |= NETIF_F_LRO;
> > > > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > > > - dev->hw_features |= NETIF_F_LRO;
> > > > > > > >
> > > > > > > > dev->vlan_features = dev->features;
> > > > > > >
> > > > > > > Just FYI, Google turned up two similar bug reports...
> > > > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > > > >
> > > > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > > > problem is sorted out?
> > > > > > > Or am I simply stuck with 4.19 kernels on these machines for now?
> > > > > >
> > > > > >
> > > > > > Something like this I guess:
> > > > > >
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 8a58a2f013af..cc5982193a40 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > > > > __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > > > > }
> > > > > >
> > > > > > + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > > > + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > > > > return 0;
> > > > > > }
> > > > >
> > > > > When I apply your patch, I see drastic (more than half)
> > > > > reductions in speed (confirmed with iperf).
> > > > >
> > > > > But if I instead just remove a few lines from commit
> > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > > > as in my earlier post, then I'm back to full speed.
> > > > >
> > > > > I understand that this is just a temporary workaround until we figure this out.
> > > >
> > > >
> > > > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > > > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > > > features.
> > > >
> > > > How about this then? Just pretend to Linux that we disabled LRO.
> > > >
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8a58a2f013af..8e7e4cea176b 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > > > ~GUEST_OFFLOAD_LRO_MASK;
> > > >
> > > > err = virtnet_set_guest_offloads(vi, offloads);
> > > > - if (err)
> > > > - return err;
> > > > + WARN_ON(err);
> > > > + //if (err)
> > > > + // return err;
> > > > vi->guest_offloads = offloads;
> > > > }
> > >
> > > No. With this applied, the problem persists:
> > >
> > > # echo "1" > /proc/sys/net/ipv4/ip_forward
> > >
> > > kernel: ------------[ cut here ]------------
> > > kernel: netdevice: eth0: failed to disable LRO!
> > > kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> > > dev_disable_lro+0x108/0x150
> > > kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> > > hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> > > libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> > > ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> > > i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> > > rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> > > kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> > > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > > VirtualBox 12/01/2006
> > > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
> >
> > Again, the warning isn't a big deal, though I agree we should address it.
> > Jason, any update?
>
> I still think using NETIF_F_LRO might not be correct, since we're
> basically receiving GSO packets.
>
> And it might cause a lot of issues if the device doesn't have
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
>
> I see two possible fixes:
>
> 1) using NETIF_F_GRO_HW instead (the patch is attached)
It's unfortunate you didn't inline the patch. Anyway.
Ivan, could you test the patch and report back?
>
> or
Hmm. I am not sure we always preserve the GRO_HW requirement that
packets can be re-segmented to reconstruct the original packet stream.
Do all backends guarantee this? Could you explain why?
> 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
>
> Thanks
This one would slow guests on old hosts down significantly.
I am not sure why this didn't trigger previously btw -
we used not to have CTRL_GUEST_OFFLOADS after all.
> > But the main issue is you lose connectivity. That still
> > persists with this? Can't you get a serial connection
> > out? I know qemu Did the kernel oops afterwards?
> >
> > --
> > MST
> >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: PROBLEM: virtio_net LRO kernel panics
2021-08-11 7:39 ` Michael S. Tsirkin
@ 2021-08-11 7:45 ` Jason Wang
2021-08-11 8:01 ` Michael S. Tsirkin
0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2021-08-11 7:45 UTC
To: Michael S. Tsirkin
Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
Jakub Kicinski, David S. Miller
On 2021/8/11 3:39 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
>> On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> Again, the warning isn't a big deal, though I agree we should address it.
>>> Jason, any update?
>> I still think using NETIF_F_LRO might not be correct, since we're
>> basically receiving GSO packets.
>>
>> And it might cause a lot of issues if the device doesn't have
>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
>>
>> I see two possible fixes:
>>
>> 1) using NETIF_F_GRO_HW instead (the patch is attached)
> It's unfortunate you didn't inline the patch. Anyway.
> Ivan, could you test the patch and report back?
>
>> or
> Hmm. I am not sure we always preserve the GRO_HW requirement that
> packets can be re-segmented to reconstruct the original packet stream.
> Do all backends guarantee this?
I think we can't.
> Could you explain why?
Or perhaps we need a new netdev feature, something like rx-gso?
>
>
>
>> 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
>>
>> Thanks
>
> This one would slow guests on old hosts down significantly.
Actually, the slowdown isn't caused by this proposal; see below.
>
> I am not sure why this didn't trigger previously
It looks to me like it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
("virtio-net: ethtool configurable LRO").
Before this commit, we didn't advertise NETIF_F_LRO at all, so
dev_disable_lro() didn't warn.
After this commit, we advertise LRO, and dev_disable_lro() tries to
disable all guest offloads, which will:
1) slow the traffic
and
2) warn if LRO can't be disabled on a device without ctrl guest
offloads (e.g. the VirtualBox host)
Thanks
> btw -
> we used not to have CTRL_GUEST_OFFLOADS after all.
>
>
>
>>> But the main issue is you lose connectivity. That still
>>> persists with this? Can't you get a serial connection
>>> out? I know qemu Did the kernel oops afterwards?
>>>
>>> --
>>> MST
>>>
>
>
* Re: PROBLEM: virtio_net LRO kernel panics
2021-08-11 7:45 ` Jason Wang
@ 2021-08-11 8:01 ` Michael S. Tsirkin
2021-08-11 8:17 ` Jason Wang
0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-08-11 8:01 UTC
To: Jason Wang
Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
Jakub Kicinski, David S. Miller
On Wed, Aug 11, 2021 at 03:45:48PM +0800, Jason Wang wrote:
>
> On 2021/8/11 3:39 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
> > > On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > Again, the warning isn't a big deal, though I agree we should address it.
> > > > Jason, any update?
> > > I still think using NETIF_F_LRO might not be correct, since we're
> > > basically receiving GSO packets.
> > >
> > > And it might cause a lot of issues if the device doesn't have
> > > VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> > >
> > > I see two possible fixes:
> > >
> > > 1) using NETIF_F_GRO_HW instead (the patch is attached)
> > It's unfortunate you didn't inline the patch. Anyway.
> > Ivan, could you test the patch and report back?
> >
> > > or
> > Hmm. I am not sure we always preserve the GRO_HW requirement that
> > packets can be re-segmented to reconstruct the original packet stream.
> > Do all backends guarantee this?
>
>
> I think we can't.
>
>
> > Could you explain why?
>
>
> Or perhaps we need a new netdev feature, something like rx-gso?
>
>
> >
> >
> >
> > > 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
> > >
> > > Thanks
> >
> > This one would slow guests on old hosts down significantly.
>
>
> Actually, the slowdown isn't caused by this proposal; see below.
>
>
> >
> > I am not sure why this didn't trigger previously
>
>
> It looks to me like it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> ("virtio-net: ethtool configurable LRO").
>
> Before this commit, we didn't advertise NETIF_F_LRO at all, so dev_disable_lro()
> didn't warn.
>
> After this commit, we advertise LRO, and dev_disable_lro() tries to
> disable all guest offloads, which will:
>
> 1) slow the traffic
>
> and
>
> 2) warn if LRO can't be disabled on a device without ctrl guest offloads
> (e.g. the VirtualBox host)
>
> Thanks
OK. I think I understand your comment now: GRO_HW makes sense simply
because, historically, before a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 we
never advertised LRO.
Can you post a proper RFC patch so Ivan can test it?
>
> > btw -
> > we used not to have CTRL_GUEST_OFFLOADS after all.
> >
> >
> >
> > > > But the main issue is you lose connectivity. That still
> > > > persists with this? Can't you get a serial connection
> > > > out? I know qemu Did the kernel oops afterwards?
> > > >
> > > > --
> > > > MST
> > > >
> >
> >
* Re: PROBLEM: virtio_net LRO kernel panics
2021-08-11 8:01 ` Michael S. Tsirkin
@ 2021-08-11 8:17 ` Jason Wang
0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2021-08-11 8:17 UTC
To: Michael S. Tsirkin
Cc: Willem de Bruijn, Ivan, virtualization, Eric Dumazet,
Jakub Kicinski, David S. Miller
On 2021/8/11 4:01 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 11, 2021 at 03:45:48PM +0800, Jason Wang wrote:
>> On 2021/8/11 3:39 PM, Michael S. Tsirkin wrote:
>>> On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
>>>> On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
>>>>>> On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
>>>>>>>> On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>> On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
>>>>>>>>>> On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>> On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>>> On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
>>>>>>>>>>>>>>> On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
>>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>>>>>>> On 2021/7/23 10:54 AM, Ivan wrote:
>>>>>>>>>>>>>>>>>>> On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>> Does it work if you turn off lro before enabling the forwarding?
>>>>>>>>>>>>>>>>>>> 0 root@NuRaid:~# ethtool -K eth0 lro off
>>>>>>>>>>>>>>>>>>> Actual changes:
>>>>>>>>>>>>>>>>>>> rx-lro: on [requested off]
>>>>>>>>>>>>>>>>>>> Could not change any device features
>>>>>>>>>>>>>>>>>> OK, it looks like the device lacks VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
>>>>>>>>>>>>>>>>>> which makes it impossible to change the LRO setting.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Did you use qemu? If yes, what's the qemu version you've used?
>>>>>>>>>>>>>>>>> These are VirtualBox machines, which I've been using for years with
>>>>>>>>>>>>>>>>> longterm kernels 4.19, and I never had such a problem. But now that I
>>>>>>>>>>>>>>>>> tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
>>>>>>>>>>>>>>>>> are just generic kernel builds, and a minimalistic userspace.
>>>>>>>>>>>>>>>> It would be useful to see the features your VirtualBox instance provides:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cat /sys/class/net/eth0/device/features
>>>>>>>>>>>>>>> # cat /sys/class/net/eth0/device/features
>>>>>>>>>>>>>>> 1100010110111011111100000000000000000000000000000000000000000000
>>>>>>>>>>>>>> I was able to reproduce the warning but not the panic.
>>>>>>>>>>>>>> OTOH if LRO stays on when enabling forwarding that
>>>>>>>>>>>>>> is already a problem. Any chance you can bisect to
>>>>>>>>>>>>>> find out which change introduced the panic?
>>>>>>>>>>>>> Any kernels up to 4.19.198 don't panic.
>>>>>>>>>>>>> Any kernels 5.10+ panic immediately upon starting forwarding.
>>>>>>>>>>>>> I have not tested any kernels between 4.19 and 5.10.
>>>>>>>>>>>>> I guess I can build a few kernels in between and try to pinpoint where it starts.
>>>>>>>>>>>>> That may take a day or so. I'll get on with it now, and report my findings.
>>>>>>>>>>>> So, I narrowed it down: the panics start with kernel 5.0-rc.
>>>>>>>>>>> More narrowly, the problem seems to be coming from commit
>>>>>>>>>>> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
>>>>>>>>>>> Just to test my suspicion, I deleted a few lines from that code,
>>>>>>>>>>> and the panic went away. Hope that helps you guys figure out
>>>>>>>>>>> what the problem might be.
>>>>>>>>> Well, it disables LRO, but we knew this :( It'd help if we knew
>>>>>>>>> where it panics; all we see is the warning, which is
>>>>>>>>> related for sure but not the immediate root cause ...
>>>>>>>>>
>>>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>>>> @@ -2978,11 +2978,6 @@
>>>>>>>>>>> }
>>>>>>>>>>> if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>>>>>>>>>>> dev->features |= NETIF_F_RXCSUM;
>>>>>>>>>>> - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>>>>>>>> - virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
>>>>>>>>>>> - dev->features |= NETIF_F_LRO;
>>>>>>>>>>> - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
>>>>>>>>>>> - dev->hw_features |= NETIF_F_LRO;
>>>>>>>>>>>
>>>>>>>>>>> dev->vlan_features = dev->features;
>>>>>>>>>> Just FYI, Google turned up two similar bug reports...
>>>>>>>>>> Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
>>>>>>>>>> Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
>>>>>>>>>>
>>>>>>>>>> Is there any sensible thing I could do, temporarily, until this
>>>>>>>>>> problem is sorted out?
>>>>>>>>>> Or am I simply stuck with 4.19 kernels on these machines for now?
>>>>>>>>> Something like this I guess:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>> index 8a58a2f013af..cc5982193a40 100644
>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>> @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
>>>>>>>>> __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
>>>>>>>>> + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
>>>>>>>>> return 0;
>>>>>>>>> }
>>>>>>>> When I apply your patch, I see a drastic (more than half)
>>>>>>>> reduction in speed (confirmed with iperf).
>>>>>>>>
>>>>>>>> But if instead I just remove a few lines from commit
>>>>>>>> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
>>>>>>>> as in my earlier post, I'm back to full speed.
>>>>>>>>
>>>>>>>> I understand that this is just a temporary workaround until we figure this out.
>>>>>>> Oh weird. So it's not about getting some weird LRO packet. We will get it with
>>>>>>> VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
>>>>>>> features.
>>>>>>>
>>>>>>> How about this then? Just pretend to Linux that we disabled LRO.
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 8a58a2f013af..8e7e4cea176b 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
>>>>>>> ~GUEST_OFFLOAD_LRO_MASK;
>>>>>>>
>>>>>>> err = virtnet_set_guest_offloads(vi, offloads);
>>>>>>> - if (err)
>>>>>>> - return err;
>>>>>>> + WARN_ON(err);
>>>>>>> + //if (err)
>>>>>>> + // return err;
>>>>>>> vi->guest_offloads = offloads;
>>>>>>> }
>>>>>> No. With this applied, the problem persists:
>>>>>>
>>>>>> # echo "1" > /proc/sys/net/ipv4/ip_forward
>>>>>>
>>>>>> kernel: ------------[ cut here ]------------
>>>>>> kernel: netdevice: eth0: failed to disable LRO!
>>>>>> kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768 dev_disable_lro+0x108/0x150
>>>>>> kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
>>>>>> hid_generic usbhid hid virtio_net net_failover failover aesni_intel
>>>>>> libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
>>>>>> ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
>>>>>> i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
>>>>>> rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
>>>>>> kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
>>>>>> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>>>>> Again, the warning isn't a big deal, though I agree we should address it. Jason,
>>>>> any update?
>>>> I still think using NETIF_F_LRO might not be correct, since we're
>>>> basically receiving GSO packets.
>>>>
>>>> And it might cause a lot of issues if the device doesn't have
>>>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
>>>>
>>>> I see two possible fixes:
>>>>
>>>> 1) using NETIF_F_GRO_HW instead (the patch is attached)
>>> It's unfortunate you didn't inline. Anyway.
>>> Ivan could you test the patch and report?
>>>
>>>> or
>>> Hmm. I am not sure we always preserve the GRO_HW requirement that
>>> packets can be re-segmented to reconstruct the original packet stream.
>>> Do all backends guarantee this?
>>
>> I think we can't.
>>
>>
>>> Could you explain why?
>>
>> Or we probably need another new netdev feature like rx-gso?
>>
>>
>>>
>>>
>>>> 2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
>>>>
>>>> Thanks
>>> This one would slow guests on old hosts down significantly.
>>
>> Actually, it's not this proposal but see below.
>>
>>
>>> I am not sure why this didn't trigger previously.
>>
>> It looks to me like it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
>> ("virtio-net: ethtool configurable LRO").
>>
>> Before this commit, we didn't even advertise NETIF_F_LRO, so dev_disable_lro()
>> wouldn't warn.
>>
>> After this commit, we advertise LRO, and dev_disable_lro() will try to
>> disable all guest offloads, which will:
>>
>> 1) slow the traffic
>>
>> and
>>
>> 2) warn if "lro" can't be disabled on the device without ctrl guest offloads
>> (e.g. the VirtualBox host)
>>
>> Thanks
> OK. So I think I understand your comment now: GRO_HW makes sense simply
> because historically before a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 we
> never advertised LRO.
Yes.
>
> Can you post a patch RFC properly so Ivan can test?
Done.
Thanks
>
>
>>> btw -
>>> we used not to have CTRL_GUEST_OFFLOADS after all.
>>>
>>>
>>>
>>>>> But the main issue is that you lose connectivity. Does that still
>>>>> persist with this? Can't you get a serial connection
>>>>> out? I know qemu Did the kernel oops afterwards?
>>>>>
>>>>> --
>>>>> MST
>>>>>
>>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Thread overview: 24+ messages
[not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
2021-07-23 1:28 ` PROBLEM: virtio_net LRO kernel panics Tonghao Zhang
[not found] ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
2021-07-23 2:37 ` Jason Wang
[not found] ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
2021-07-23 4:25 ` Jason Wang
[not found] ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
2021-07-23 7:59 ` Michael S. Tsirkin
[not found] ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
2021-07-23 8:13 ` Michael S. Tsirkin
2021-07-23 12:10 ` Michael S. Tsirkin
[not found] ` <CACFia2en0JJDFyz3Umk-JTnMT=kjvRogt4PudED4kiLeMjcHFg@mail.gmail.com>
[not found] ` <CACFia2fx7Lt-4o_uqDznvk-VgdsMtD64qv6RYkrCjKLu2yt8bg@mail.gmail.com>
[not found] ` <CACFia2eUi4KNRC7MYktzUS9Nq2WcBiesX04Tbn2pTuvuGkY4qA@mail.gmail.com>
[not found] ` <CACFia2dns1rTe5OQj4H-kpurVm2CTtGfAXz0aOUS0_cs0QUrsA@mail.gmail.com>
2021-07-27 9:11 ` Michael S. Tsirkin
[not found] ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
2021-08-02 19:51 ` Michael S. Tsirkin
[not found] ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
2021-08-10 15:31 ` Michael S. Tsirkin
2021-08-11 3:38 ` Jason Wang
2021-08-11 7:39 ` Michael S. Tsirkin
2021-08-11 7:45 ` Jason Wang
2021-08-11 8:01 ` Michael S. Tsirkin
2021-08-11 8:17 ` Jason Wang
[not found] ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
2021-07-23 8:59 ` Michael S. Tsirkin
2021-07-30 11:42 ` Michael S. Tsirkin
2021-07-30 11:42 ` Michael S. Tsirkin
2021-07-30 17:04 ` Ivan
2021-07-31 20:53 ` Michael S. Tsirkin
2021-07-31 20:53 ` Michael S. Tsirkin
2021-07-31 23:52 ` Ivan
2021-08-02 4:35 ` Jason Wang
2021-08-02 4:35 ` Jason Wang
2021-08-02 18:16 ` Ivan