Linux 4.14 - regression: broken tun/tap / bridge network with virtio

* Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
@ 2017-11-26 14:17 Andreas Hartmann
  2017-11-27 16:46 ` Andreas Hartmann
  2017-12-01 10:11 ` Andreas Hartmann
  0 siblings, 2 replies; 34+ messages in thread
From: Andreas Hartmann @ 2017-11-26 14:17 UTC (permalink / raw)
  To: john.fastabend, netdev

Hello!

Since Linux 4.14 (running on the host), the virtual network based on
bridges and tun/tap devices is partly broken. Linux 4.13.x and earlier
work perfectly.

The host has the following setup:

VM1 -> tun/tap -> br1
VM2 -> tun/tap -> br0 / br1
VM3 -> tun/tap -> br0

Example network configuration of the VMs:

    <interface type='bridge'>
      <mac address='...'/>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>

The host is connected through br1.

VM2 is the router between the two networks provided by br1 and br0.

The host can reach VM3 via ssh through the router.
There are some more VMs on the br0 network, e.g. VM4.

Almost all VMs are CentOS 7 / 64-bit (Linux 3.10.x) guests running under
kvm_amd. VM1 uses Linux 4.4.x.

Now, VM1 sends a UDP IPv4 packet to VM3 (it's the first RADIUS message,
an Access-Request, sent by hostapd when it starts an EAP-TLS handshake
with a WLAN client). VM3 answers this packet with an Access-Challenge,
which VM1 receives.

Next, VM1 sends the second Access-Request. I'm not sure whether VM3 still
receives this packet. What is certain: VM1 never gets an answer, and the
connection to VM3 is now *completely dead*. VM3 can't even be reached by
ssh any more.

There are no log messages, neither on the host nor in the VMs.
Other VMs aren't affected.

Bisecting the problem leads to this patch:

2ddf71e23cc246e95af72a6deed67b4a50a7b81c
net: add notifier hooks for devmap bpf map

It turns out that the problem can be worked around by using e1000
instead of virtio as the interface model for VM1 and VM3:

    <interface type='bridge'>
      <mac address='52:54:00:15:ac:42'/>
      <source bridge='br0'/>
      <!-- <model type='virtio'/> -->
      <model type='e1000'/>
    </interface>
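
To check which NIC model each guest interface actually uses before and
after such a change, the domain XML can be filtered. This is only a
minimal sketch: it assumes the XML has one element per line, as in the
snippet above, and the function name nic_models is made up for this
example.

```shell
# Print the NIC model of every <interface> block in a libvirt domain XML
# read from stdin. Lines containing an XML comment are skipped, so a
# commented-out <model .../> is ignored.
nic_models() {
    awk '
        /<interface[ >]/  { in_if = 1 }
        /<\/interface>/   { in_if = 0 }
        in_if && /<model type=/ && !/<!--/ {
            split($0, q, "\047")   # \047 is a single quote
            print q[2]
        }
    '
}
```

Possible usage, with VM1 standing in for the real domain name:
virsh dumpxml VM1 | nic_models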

Would it be possible to fix this problem to get it working again with
virtio? Do you need some more information? Feel free to ask!

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-11-26 14:17 Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected Andreas Hartmann
@ 2017-11-27 16:46 ` Andreas Hartmann
  2017-11-27 16:55   ` Michal Kubecek
  2017-12-01 10:11 ` Andreas Hartmann
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-11-27 16:46 UTC (permalink / raw)
  To: netdev; +Cc: john.fastabend

Some additional information:

Using virtio not only breaks the network completely as described above,
it also leaves behind a qemu process that can be neither stopped nor
restarted (even kill -9 doesn't work). A *forced* reboot is the only way
to exit or restart the VM.

I switched back to Linux 4.13, as virtualization on 4.14 is practically
unusable.

I'm not the only one affected:
https://bugzilla.kernel.org/show_bug.cgi?id=197861


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-11-27 16:46 ` Andreas Hartmann
@ 2017-11-27 16:55   ` Michal Kubecek
  2017-11-27 19:09     ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Kubecek @ 2017-11-27 16:55 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: netdev, john.fastabend

On Mon, Nov 27, 2017 at 05:46:14PM +0100, Andreas Hartmann wrote:
> 
> Using virtio not only breaks the network completely as described above,
> it also leaves behind a qemu process that can be neither stopped nor
> restarted (even kill -9 doesn't work). A *forced* reboot is the only way
> to exit or restart the VM.
> 
> I switched back to Linux 4.13, as virtualization on 4.14 is practically
> unusable.
> 
> I'm not the only one affected:
> https://bugzilla.kernel.org/show_bug.cgi?id=197861

What does stack trace of that process look like (/proc/$pid/stack)? Is
it similar to the stack trace from kernel.org bugzilla?
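
A minimal sketch of how such a stack could be collected. The helper name
dump_stack is made up for this example, and it assumes a Linux /proc
that exposes /proc/<pid>/stack (usually readable by root only):

```shell
# Print the kernel-side stack of one process. Reading /proc/<pid>/stack
# usually requires root; a note is printed if the file cannot be read.
dump_stack() {
    pid=$1
    name=$(cat "/proc/$pid/comm" 2>/dev/null || echo unknown)
    printf '== pid %s (%s) ==\n' "$pid" "$name"
    cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable, try as root)"
}
```

Possible usage, assuming the hung processes have "qemu" in their name:
for pid in $(pgrep qemu); do dump_stack "$pid"; done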

Michal Kubecek


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-11-27 16:55   ` Michal Kubecek
@ 2017-11-27 19:09     ` Andreas Hartmann
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Hartmann @ 2017-11-27 19:09 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev, john.fastabend

On 11/27/2017 at 05:55 PM Michal Kubecek wrote:
> On Mon, Nov 27, 2017 at 05:46:14PM +0100, Andreas Hartmann wrote:
>>
>> Using virtio not only breaks the network completely as described above,
>> it also leaves behind a qemu process that can be neither stopped nor
>> restarted (even kill -9 doesn't work). A *forced* reboot is the only way
>> to exit or restart the VM.
>>
>> I switched back to Linux 4.13, as virtualization on 4.14 is practically
>> unusable.
>>
>> I'm not the only one affected:
>> https://bugzilla.kernel.org/show_bug.cgi?id=197861
> 
> What does stack trace of that process look like (/proc/$pid/stack)? Is
> it similar to the stack trace from kernel.org bugzilla?

Unfortunately, I didn't look at /proc/$pid/stack at that time (I had
two hanging VMs). I didn't get any log entries, neither on the host nor
in the VMs. I tried to strace one of the processes, but strace didn't
show anything either.

As far as I can tell from the VMs' log files, the guests shut down
completely, but qemu can't be killed:

Nov 26 20:43:08 proxy systemd: Stopping Remount Root and Kernel File Systems...
Nov 26 20:43:08 proxy systemd: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Nov 26 20:43:08 proxy systemd: Stopped Create Static Device Nodes in /dev.
Nov 26 20:43:08 proxy systemd: Stopping Create Static Device Nodes in /dev...
Nov 26 20:43:08 proxy systemd: Reached target Shutdown.
Nov 26 20:43:08 proxy systemd: Starting Shutdown.
Nov 26 20:43:08 proxy systemd: Reached target Final Step.
Nov 26 20:43:08 proxy systemd: Starting Final Step.
Nov 26 20:43:08 proxy systemd: Starting Power-Off...
Nov 26 20:43:08 proxy systemd: Shutting down.
Nov 26 20:43:08 proxy lvm: 3 logical volume(s) in volume group "centos" unmonitored
Nov 26 20:43:08 proxy systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Nov 26 20:43:08 proxy lvmetad[500]: Failed to accept connection errno 11.
Nov 26 20:43:08 proxy journal: Journal stopped

Anyway, I have just reproduced the situation to capture the stack while
qemu hangs. It looks like this:

[<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
[<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
[<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
[<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
[<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
[<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

Regards,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-11-26 14:17 Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected Andreas Hartmann
  2017-11-27 16:46 ` Andreas Hartmann
@ 2017-12-01 10:11 ` Andreas Hartmann
  2017-12-03 11:35   ` Andreas Hartmann
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-01 10:11 UTC (permalink / raw)
  To: netdev; +Cc: john.fastabend, Michal Kubecek

Hello!

I have hopefully got rid of both of my problems (the hanging network
with virtio and the qemu process hanging forever on VM shutdown) by
upgrading qemu from 2.6.2 to 2.10.1. I hope this will persist.

BTW: the base system is openSUSE Leap 42.2.

Sorry for the noise,
thanks,
kind regards,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-01 10:11 ` Andreas Hartmann
@ 2017-12-03 11:35   ` Andreas Hartmann
  2017-12-04 16:28     ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-03 11:35 UTC (permalink / raw)
  To: netdev; +Cc: john.fastabend, Michal Kubecek

On 12/01/2017 at 11:11 AM Andreas Hartmann wrote:
> Hello!
> 
> I hopefully could get rid of both of my problems (hanging network w/
> virtio) and endless hanging qemu-process on VM shutdown by upgrading
> qemu from 2.6.2 to 2.10.1. I hope it will persist.

It didn't persist. 10 hours later, the same problems happened again.
They are just much harder to trigger.

I'm now trying it with

CONFIG_RCU_NOCB_CPU=y and
rcu_nocbs=0-15

Since then, I haven't seen any problems. But that doesn't mean anything
yet.
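
CONFIG_RCU_NOCB_CPU is a build-time option, while rcu_nocbs= is passed
on the kernel command line, so it is worth confirming that the running
kernel really received the parameter. A small sketch; the function name
rcu_nocbs_value is made up for this example:

```shell
# Extract the value of the rcu_nocbs= parameter from a kernel command line
# read from stdin. Prints nothing if the parameter is absent.
rcu_nocbs_value() {
    tr ' ' '\n' | sed -n 's/^rcu_nocbs=//p'
}
```

Possible usage on the host: rcu_nocbs_value < /proc/cmdline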


Regards,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-03 11:35   ` Andreas Hartmann
@ 2017-12-04 16:28     ` Andreas Hartmann
  2017-12-05  3:50       ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-04 16:28 UTC (permalink / raw)
  To: Andreas Hartmann, netdev; +Cc: john.fastabend, Michal Kubecek

On 12/03/2017 at 12:35 PM Andreas Hartmann wrote:
> On 12/01/2017 at 11:11 AM Andreas Hartmann wrote:
>> Hello!
>>
>> I hopefully could get rid of both of my problems (hanging network w/
>> virtio) and endless hanging qemu-process on VM shutdown by upgrading
>> qemu from 2.6.2 to 2.10.1. I hope it will persist.
> 
> It didn't persist. 10h later - same problems happened again. It's just
> much harder to trigger the problems.
> 
> I'm now trying it with
> 
> CONFIG_RCU_NOCB_CPU=y and
> rcu_nocbs=0-15
> 
> Since then, I didn't see any problem any more. But this doesn't mean
> anything until now ... .

Didn't work either. Disabling vhost_net's zcopy had no effect either.

=> It has simply been broken since

2ddf71e23cc246e95af72a6deed67b4a50a7b81c
net: add notifier hooks for devmap bpf map

Here is the same problem:
https://bugzilla.kernel.org/show_bug.cgi?id=197861

Any idea how to proceed?


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-04 16:28     ` Andreas Hartmann
@ 2017-12-05  3:50       ` Jason Wang
  2017-12-05 16:23         ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2017-12-05  3:50 UTC (permalink / raw)
  To: Andreas Hartmann, netdev; +Cc: john.fastabend, Michal Kubecek



On 2017-12-05 00:28, Andreas Hartmann wrote:
> Didn't work either. Disabling vhost_net's zcopy had no effect either.
>
> => It has simply been broken since
>
> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c
> net: add notifier hooks for devmap bpf map

Hi:

Do you use XDP devmap on the host? If not, please double-check that this
really is the first bad commit, since the patch should only take effect
when XDP/devmap is used on the host.

Thanks



* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-05  3:50       ` Jason Wang
@ 2017-12-05 16:23         ` Andreas Hartmann
  2017-12-06  3:08           ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-05 16:23 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, john.fastabend, Michal Kubecek

On 12/05/2017 at 04:50 AM Jason Wang wrote:
> Hi:
> 
> Did you use XDP devmap in host? If not, please double check it was the
> first bad commit since the patch should only work when XDP/devmap is
> used on host.

How can I tell whether XDP/devmap is enabled / used? Could you please
give me a hint?


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-05 16:23         ` Andreas Hartmann
@ 2017-12-06  3:08           ` Jason Wang
  2017-12-08  7:21             ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2017-12-06  3:08 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: netdev, john.fastabend, Michal Kubecek



On 2017-12-06 00:23, Andreas Hartmann wrote:
> How do I know if XDP/devmap is enabled / used? Could you please give
> some hint?
>
>
> Thanks,
> Andreas

Something like:

./ip link | grep xdp
10: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc mq master kvmbr0 state UNKNOWN mode DEFAULT group default qlen 1000
     prog/xdp id 4 tag 0381911915bc8d7f

But you need a reasonably recent version of ip.
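
The check above can be turned into a small filter that lists only the
interface names. This is a sketch, not a definitive test: it assumes
"ip link" prints either an xdp flag on the interface line or a
"prog/xdp" line below it, as in the sample output above, and the
function name xdp_interfaces is made up for this example.

```shell
# List interfaces that have an XDP program attached, given `ip link`
# output on stdin. Each matching interface is printed once.
xdp_interfaces() {
    awk '
        /^[0-9]+: / {
            name = $2
            sub(/:$/, "", name)      # drop the trailing colon
            sub(/@.*$/, "", name)    # drop a "@peer" suffix, if any
            seen = 0
            if ($0 ~ / xdp(generic|drv|offload)?( |$)/) { print name; seen = 1 }
            next
        }
        /prog\/xdp/ && !seen { print name; seen = 1 }
    '
}
```

Possible usage on the host: ip link show | xdp_interfaces
No output means no interface reported an attached XDP program.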

Thanks


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-06  3:08           ` Jason Wang
@ 2017-12-08  7:21             ` Andreas Hartmann
  2017-12-08  8:47               ` Michal Kubecek
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-08  7:21 UTC (permalink / raw)
  To: Jason Wang, David Miller; +Cc: netdev, Michal Kubecek

On 12/06/2017 at 04:08 AM Jason Wang wrote:
> Something like:
> 
> ./ip link | grep xdp
> 10: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc mq master kvmbr0 state UNKNOWN mode DEFAULT group default qlen 1000
>     prog/xdp id 4 tag 0381911915bc8d7f
> 
> But you should have some recent version of ip.


Thanks for the hint. I'm not using XDP, so I rechecked my bisect and
found a mistake. The repeated bisect now leads to



[v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [1]



For the repeated bisect, I switched back to the original qemu 2.6.2
(instead of 2.10.1), because the problems show up reliably with 2.6.2.

All my VMs use virtio_net. BTW: I don't see the problems (which
sometimes include a VM that can't be stopped at all) when all my VMs use
e1000 as the interface instead.

This finding fits well with the UDP packet that triggered the stall,
which I already mentioned here [2].

To prove it, I reverted the following patches from the series
"[PATCH v2 RFC 0/13] Remove UDP Fragmentation Offload support" [3]

11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]

on top of Linux 4.14.4. The result compiled fine and is running fine.
The vnet doesn't die any more. I can't say yet whether the qemu hangs on
shutdown are gone, too.

Obviously, something is broken in the new UDP handling. Could you
please analyze this problem? I can test further patches.



Thanks,
kind regards,
Andreas



[1] http://patchwork.ozlabs.org/patch/785411/
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg201635.html
[3] http://lists.openwall.net/netdev/2017/07/07/26
[4] http://patchwork.ozlabs.org/patch/785411/
[5] https://patchwork.ozlabs.org/patch/785413/
[6] https://patchwork.ozlabs.org/patch/785412/


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08  7:21             ` Andreas Hartmann
@ 2017-12-08  8:47               ` Michal Kubecek
  2017-12-08 10:31                 ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Kubecek @ 2017-12-08  8:47 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Jason Wang, David Miller, netdev

On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
> To prove it, I reverted from the patch series "[PATCH v2 RFC 0/13]
> Remove UDP Fragmentation Offload support" [3]
> 
> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
> 
> and applied it to Linux 4.14.4. It compiled fine and is running fine.
> The vnet doesn't die anymore. Yet, I can't say if the qemu stop hangs
> are gone, too.
> 
> Obviously, there is something broken with the new UDP handling. Could
> you please analyze this problem? I could test some more patches ... .

Any chance your VMs were live-migrated from a host running a pre-4.14
kernel? If so, you should try commit 0c19f846d582 ("net: accept UFO
datagrams from tuntap and packet"). Alternatively, disabling UFO in the
guest should work around the issue.

                                                          Michal Kubecek


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08  8:47               ` Michal Kubecek
@ 2017-12-08 10:31                 ` Andreas Hartmann
  2017-12-08 11:40                   ` Michal Kubecek
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-08 10:31 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Jason Wang, David Miller, netdev

On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
> Any chance your VMs were live migrated from pre-4.14 host kernel?

No - the VMs are not live migrated. They are always running on the same
host - either with kernel < 4.14 or with kernel 4.14.x.

> If
> this is the case, you should try commit 0c19f846d582 ("net: accept UFO
> datagrams from tuntap and packet"). 

It doesn't apply to 4.14.4

> Or disabling UFO in the guest should
> work around the issue.

I ran ethtool -K ethX ufo off for each device / bridge in the VM.

Yes, this seems to work. I'll wait and see whether the problem with the
unstoppable qemu process on shutdown remains.
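
Doing this by hand for every device is error-prone, so the workaround
can be generated from a list of interface names. A minimal sketch,
assuming the names come from /sys/class/net in the guest; the function
name ufo_off_commands is made up for this example, and it only prints
the commands instead of running them:

```shell
# Emit one "ethtool -K <dev> ufo off" command per interface name read
# from stdin. Loopback and empty lines are skipped. Nothing is executed;
# pipe the result to `sh` (as root) to apply it.
ufo_off_commands() {
    while IFS= read -r dev; do
        [ -n "$dev" ] || continue
        [ "$dev" = "lo" ] && continue
        printf 'ethtool -K %s ufo off\n' "$dev"
    done
}
```

Possible usage in the guest: ls /sys/class/net | ufo_off_commands
Review the printed commands first, then append "| sh" to run them.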


When will there be a fix for 4.14? It is clearly a regression. Is it
possible / a good idea to just revert the complete patch series "Remove
UDP Fragmentation Offload support"?


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 10:31                 ` Andreas Hartmann
@ 2017-12-08 11:40                   ` Michal Kubecek
  2017-12-08 12:45                     ` Andreas Hartmann
  2017-12-08 16:04                     ` Willem de Bruijn
  0 siblings, 2 replies; 34+ messages in thread
From: Michal Kubecek @ 2017-12-08 11:40 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Jason Wang, David Miller, netdev

On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
> > Any chance your VMs were live migrated from pre-4.14 host kernel?
> 
> No - the VMs are not live migrated. They are always running on the same
> host - either with kernel < 4.14 or with kernel 4.14.x.

This is disturbing... unless I'm mistaken, it shouldn't be possible to
have UFO enabled on a virtio device in a VM booted on a host with 4.14
kernel.

> > If this is the case, you should try commit 0c19f846d582 ("net:
> > accept UFO datagrams from tuntap and packet"). 
> 
> It doesn't apply to 4.14.4
> 
> > Or disabling UFO in the guest should
> > work around the issue.
> 
> ethtool -K ethX ufo off for each device / bridge in VM.
> 
> Yes, this seems to work. I'll wait and see if the non stoppable
> qemu-problem on shutdown will remain.
> 
> When will there be a fix for 4.14? It is clearly a regression. Is it
> possible / a good idea to just remove the complete patch series "Remove
> UDP Fragmentation Offload support"?

I cannot give an exact date, but the patch is queued for stable
(see http://patchwork.ozlabs.org/bundle/davem/stable/?state=* ), so
it should land in stable-4.14 in the near future (weeks at most).

                                                         Michal Kubecek


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 11:40                   ` Michal Kubecek
@ 2017-12-08 12:45                     ` Andreas Hartmann
  2017-12-08 12:58                       ` Michal Kubecek
  2017-12-08 16:04                     ` Willem de Bruijn
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-08 12:45 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Jason Wang, David Miller, netdev

On 12/08/2017 at 12:40 PM Michal Kubecek wrote:
> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>>> Any chance your VMs were live migrated from pre-4.14 host kernel?
>>
>> No - the VMs are not live migrated. They are always running on the same
>> host - either with kernel < 4.14 or with kernel 4.14.x.
> 
> This is disturbing... unless I'm mistaken, it shouldn't be possible to
> have UFO enabled on a virtio device in a VM booted on a host with 4.14
> kernel.

It is on by default; I have to switch it off explicitly, as shown below.

host:
# rebooted to kernel 4.14.x
uname -r
4.14.4-2.1-default

# just checked: with 4.14, bridges on the host have ufo disabled by default.


guest:
uname -r
4.9.63-1.2-default # same with 3.10.x

lsmod | grep -e e1000 -e virtio_net
virtio_net             32768  0
virtio                 16384  4 virtio_net,virtio_balloon,virtio_pci,virtio_scsi
virtio_ring            24576  4 virtio_net,virtio_balloon,virtio_pci,virtio_scsi

lspci -vs 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
        Subsystem: Red Hat, Inc Device 0001
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 10
        I/O ports at c060 [size=32]
        Memory at febf1000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at feb80000 [disabled] [size=256K]
        Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci

# after ufo was manually turned off on VM boot:
ethtool -k eth0 | grep fragm
udp-fragmentation-offload: off

ethtool -K eth0 ufo on

ethtool -k eth0 | grep fragm
udp-fragmentation-offload: on

ethtool -K eth0 ufo off

ethtool -k eth0 | grep fragm
udp-fragmentation-offload: off

> 
>>> If this is the case, you should try commit 0c19f846d582 ("net:
>>> accept UFO datagrams from tuntap and packet"). 
>>
>> It doesn't apply to 4.14.4
>>
>>> Or disabling UFO in the guest should
>>> work around the issue.
>>
>> ethtool -K ethX ufo off for each device / bridge in VM.
>>
>> Yes, this seems to work. I'll wait and see if the non stoppable
>> qemu-problem on shutdown will remain.
>>
>> When will there be a fix for 4.14? It is clearly a regression. Is it
>> possible / a good idea to just remove the complete patch series "Remove
>> UDP Fragmentation Offload support"?
> 
> I cannot give an exact date but the patch is queued for stable
> (see http://patchwork.ozlabs.org/bundle/davem/stable/?state=* ) so that
> it should land in stable-4.14 in near future (weeks at most).

Which one is it? I couldn't find any patch related to this problem at
first glance.


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 12:45                     ` Andreas Hartmann
@ 2017-12-08 12:58                       ` Michal Kubecek
  2017-12-08 13:13                         ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Kubecek @ 2017-12-08 12:58 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Jason Wang, David Miller, netdev

On Fri, Dec 08, 2017 at 01:45:38PM +0100, Andreas Hartmann wrote:
> On 12/08/2017 at 12:40 PM Michal Kubecek wrote:
> > On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
> >>
> >> When will there be a fix for 4.14? It is clearly a regression. Is
> >> it possible / a good idea to just remove the complete patch series
> >> "Remove UDP Fragmentation Offload support"?
> > 
> > I cannot give an exact date but the patch is queued for stable (see
> > http://patchwork.ozlabs.org/bundle/davem/stable/?state=* ) so that
> > it should land in stable-4.14 in near future (weeks at most).
> 
> Which one is it? I couldn't find any patch related to this problem at
> first glance.

"[net,v2] net: accept UFO datagrams from tuntap and packet" - the
subject was mentioned in one of my earlier e-mails (with commit id).

Michal Kubecek


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 12:58                       ` Michal Kubecek
@ 2017-12-08 13:13                         ` Andreas Hartmann
  2017-12-08 15:11                           ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-08 13:13 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Jason Wang, David Miller, netdev

On 12/08/2017 at 01:58 PM Michal Kubecek wrote:
> On Fri, Dec 08, 2017 at 01:45:38PM +0100, Andreas Hartmann wrote:
>> On 12/08/2017 at 12:40 PM Michal Kubecek wrote:
>>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>>>
>>>> When will there be a fix for 4.14? It is clearly a regression. Is
>>>> it possible / a good idea to just remove the complete patch series
>>>> "Remove UDP Fragmentation Offload support"?
>>>
>>> I cannot give an exact date but the patch is queued for stable (see
>>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=* ) so that
>>> it should land in stable-4.14 in near future (weeks at most).
>>
>> Which one is it? I couldn't find any patch related to this problem at
>> first glance.
> 
> "[net,v2] net: accept UFO datagrams from tuntap and packet" - the
> subject was mentioned in one of my earlier e-mails (with commit id).

Oh - I thought this would only work during live migration (which doesn't
happen here). My error.


Thanks,
regards,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 13:13                         ` Andreas Hartmann
@ 2017-12-08 15:11                           ` Jason Wang
  0 siblings, 0 replies; 34+ messages in thread
From: Jason Wang @ 2017-12-08 15:11 UTC (permalink / raw)
  To: Andreas Hartmann, Michal Kubecek; +Cc: David Miller, netdev, Willem de Bruijn



On 2017-12-08 21:13, Andreas Hartmann wrote:
> On 12/08/2017 at 01:58 PM Michal Kubecek wrote:
>> On Fri, Dec 08, 2017 at 01:45:38PM +0100, Andreas Hartmann wrote:
>>> On 12/08/2017 at 12:40 PM Michal Kubecek wrote:
>>>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>>>> When will there be a fix for 4.14? It is clearly a regression. Is
>>>>> it possible / a good idea to just remove the complete patch series
>>>>> "Remove UDP Fragmentation Offload support"?
>>>> I cannot give an exact date but the patch is queued for stable (see
>>>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=* ) so that
>>>> it should land in stable-4.14 in near future (weeks at most).
>>> Which one is it? I couldn't find any patch related to this problem at
>>> first glance.
>> "[net,v2] net: accept UFO datagrams from tuntap and packet" - the
>> subject was mentioned in one of my earlier e-mails (with commit id).
> Oh - I thought this would only work during live migration (which doesn't
> happen here). My error.
>
>
> Thanks,
> regards,
> Andreas

I think you can either wait for it to reach stable or test Linus' tree,
which already contains the patch.

Thanks


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 11:40                   ` Michal Kubecek
  2017-12-08 12:45                     ` Andreas Hartmann
@ 2017-12-08 16:04                     ` Willem de Bruijn
  2017-12-08 20:11                       ` Andreas Hartmann
  1 sibling, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-08 16:04 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Andreas Hartmann, Jason Wang, David Miller, Network Development

On Fri, Dec 8, 2017 at 6:40 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>> > On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
>> >>
>> >> All my VMs are using virtio_net. BTW: I couldn't see the problems
>> >> (sometimes, the VM couldn't be stopped at all) if all my VMs are using
>> >> e1000 as interface instead.
>> >>
>> >> This finding now matches pretty much the responsible UDP-package which
>> >> caused the stall. I already mentioned it here [2].
>> >>
>> >> To prove it, I reverted from the patch series "[PATCH v2 RFC 0/13]
>> >> Remove UDP Fragmentation Offload support" [3]
>> >>
>> >> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
>> >> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
>> >> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
>> >>
>> >> and applied it to Linux 4.14.4. It compiled fine and is running fine.
>> >> The vnet doesn't die anymore. Yet, I can't say if the qemu stop hangs
>> >> are gone, too.
>> >>
>> >> Obviously, there is something broken with the new UDP handling. Could
>> >> you please analyze this problem? I could test some more patches ... .
>> >
>> > Any chance your VMs were live migrated from pre-4.14 host kernel?
>>
>> No - the VMs are not live migrated. They are always running on the same
>> host - either with kernel < 4.14 or with kernel 4.14.x.
>
> This is disturbing... unless I'm mistaken, it shouldn't be possible to
> have UFO enabled on a virtio device in a VM booted on a host with 4.14
> kernel.

Indeed. When working on that revert patch I verified that UFO in
the guest virtio_net was off before the revert patch, on after.

Qemu should check host support with tap_probe_has_ufo
before advertising support to the guest. Indeed, this is exactly
what broke live migration in virtio_net_load_device at

    if (qemu_get_byte(f) && !peer_has_ufo(n)) {
        error_report("virtio-net: saved image requires TUN_F_UFO support");
        return -1;
    }

Which follows

   peer_has_ufo
     qemu_has_ufo
       tap_has_ufo
         s->has_ufo

where s->has_ufo was set by tap_probe_has_ufo in net_tap_fd_init.
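
For reference, the probe at the bottom of that chain can be sketched
roughly as follows. This is a minimal sketch, assuming the check amounts
to asking the tap fd to accept TUN_F_UFO through the TUNSETOFFLOAD ioctl.
The function name probe_has_ufo is made up for illustration, and this is
not a copy of the QEMU source.

```c
#include <linux/if_tun.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: returns 1 if the tap file descriptor accepts the
 * UFO offload flag, 0 otherwise (including when fd is not a valid tap fd).
 */
static int probe_has_ufo(int fd)
{
    unsigned int offload = TUN_F_CSUM | TUN_F_UFO;

    /* The kernel rejects the request if it does not support the flags. */
    if (ioctl(fd, TUNSETOFFLOAD, (unsigned long)offload) < 0)
        return 0;
    return 1;
}
```

If the host's tun driver no longer accepts the flag, the ioctl fails, the
probe returns 0, and UFO should not be offered to the guest.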

Now, checking my qemu git branch, I ran pretty old 2.7.0-rc3. But this
codepath does not seem to have changed between then and 2.10.1.

I cherry-picked the revert onto 4.14.3. It did not apply cleanly, but the
fix-up wasn't too hard. Compiled and booted, but untested otherwise. At

  https://github.com/wdebruij/linux/commits/v4.14.3-aargh-ufo


>
>> > If this is the case, you should try commit 0c19f846d582 ("net:
>> > accept UFO datagrams from tuntap and packet").
>>
>> It doesn't apply to 4.14.4
>>
>> > Or disabling UFO in the guest should
>> > work around the issue.
>>
>> ethtool -K ethX ufo off for each device / bridge in VM.
>>
>> Yes, this seems to work. I'll wait and see if the non stoppable
>> qemu-problem on shutdown will remain.
>>
>> When will there be a fix for 4.14? It is clearly a regression. Is it
>> possible / a good idea to just remove the complete patch series "Remove
>> UDP Fragmentation Offload support"?
>
> I cannot give an exact date but the patch is queued for stable
> (see http://patchwork.ozlabs.org/bundle/davem/stable/?state=* ) so that
> it should land in stable-4.14 in near future (weeks at most).
>
>                                                          Michal Kubecek
>


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 16:04                     ` Willem de Bruijn
@ 2017-12-08 20:11                       ` Andreas Hartmann
  2017-12-08 20:44                         ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-08 20:11 UTC (permalink / raw)
  To: Willem de Bruijn, Michal Kubecek
  Cc: Jason Wang, David Miller, Network Development

On 12/08/2017 at 05:04 PM Willem de Bruijn wrote:
> On Fri, Dec 8, 2017 at 6:40 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>>>> On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
>>>>>
>>>>> All my VMs are using virtio_net. BTW: I couldn't see the problems
>>>>> (sometimes, the VM couldn't be stopped at all) if all my VMs are using
>>>>> e1000 as interface instead.
>>>>>
>>>>> This finding now matches pretty much the responsible UDP-package which
>>>>> caused the stall. I already mentioned it here [2].
>>>>>
>>>>> To prove it, I reverted from the patch series "[PATCH v2 RFC 0/13]
>>>>> Remove UDP Fragmentation Offload support" [3]
>>>>>
>>>>> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
>>>>> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
>>>>> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
>>>>>
>>>>> and applied it to Linux 4.14.4. It compiled fine and is running fine.
>>>>> The vnet doesn't die anymore. Yet, I can't say if the qemu stop hangs
>>>>> are gone, too.
>>>>>
>>>>> Obviously, there is something broken with the new UDP handling. Could
>>>>> you please analyze this problem? I could test some more patches ... .
>>>>
>>>> Any chance your VMs were live migrated from pre-4.14 host kernel?
>>>
>>> No - the VMs are not live migrated. They are always running on the same
>>> host - either with kernel < 4.14 or with kernel 4.14.x.
>>
>> This is disturbing... unless I'm mistaken, it shouldn't be possible to
>> have UFO enabled on a virtio device in a VM booted on a host with 4.14
>> kernel.
> 
> Indeed. When working on that revert patch I verified that UFO in
> the guest virtio_net was off before the revert patch, on after.
> 
> Qemu should check host support with tap_probe_has_ufo
> before advertising support to the guest. Indeed, this is exactly
> what broke live migration in virtio_net_load_device at
> 
>     if (qemu_get_byte(f) && !peer_has_ufo(n)) {
>         error_report("virtio-net: saved image requires TUN_F_UFO support");
>         return -1;
>     }
> 
> Which follows
> 
>    peer_has_ufo
>      qemu_has_ufo
>        tap_has_ufo
>          s->has_ufo
> 
> where s->has_ufo was set by tap_probe_has_ufo in net_tap_fd_init.
> 
> Now, checking my qemu git branch, I ran pretty old 2.7.0-rc3. But this
> codepath does not seem to have changed between then and 2.10.1.
> 
> I cherry-picked the revert onto 4.14.3. It did not apply cleanly, but the
> fix-up wasn't too hard. Compiled and booted, but untested otherwise. At
> 
>   https://github.com/wdebruij/linux/commits/v4.14.3-aargh-ufo

I'm running it at the moment. I haven't seen any network hang so far,
although the critical UDP packets have gone through.
So far it looks good.


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 20:11                       ` Andreas Hartmann
@ 2017-12-08 20:44                         ` Andreas Hartmann
  2017-12-11 15:54                           ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-08 20:44 UTC (permalink / raw)
  To: Willem de Bruijn, Michal Kubecek
  Cc: Jason Wang, David Miller, Network Development

On 12/08/2017 at 09:11 PM Andreas Hartmann wrote:
> On 12/08/2017 at 05:04 PM Willem de Bruijn wrote:
>> On Fri, Dec 8, 2017 at 6:40 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>>>>> On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
>>>>>>
>>>>>> All my VMs are using virtio_net. BTW: I couldn't see the problems
>>>>>> (sometimes, the VM couldn't be stopped at all) if all my VMs are using
>>>>>> e1000 as interface instead.
>>>>>>
>>>>>> This finding now matches pretty much the responsible UDP-package which
>>>>>> caused the stall. I already mentioned it here [2].
>>>>>>
>>>>>> To prove it, I reverted from the patch series "[PATCH v2 RFC 0/13]
>>>>>> Remove UDP Fragmentation Offload support" [3]
>>>>>>
>>>>>> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
>>>>>> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
>>>>>> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
>>>>>>
>>>>>> and applied it to Linux 4.14.4. It compiled fine and is running fine.
>>>>>> The vnet doesn't die anymore. Yet, I can't say if the qemu stop hangs
>>>>>> are gone, too.
>>>>>>
>>>>>> Obviously, there is something broken with the new UDP handling. Could
>>>>>> you please analyze this problem? I could test some more patches ... .
>>>>>
>>>>> Any chance your VMs were live migrated from pre-4.14 host kernel?
>>>>
>>>> No - the VMs are not live migrated. They are always running on the same
>>>> host - either with kernel < 4.14 or with kernel 4.14.x.
>>>
>>> This is disturbing... unless I'm mistaken, it shouldn't be possible to
>>> have UFO enabled on a virtio device in a VM booted on a host with 4.14
>>> kernel.
>>
>> Indeed. When working on that revert patch I verified that UFO in
>> the guest virtio_net was off before the revert patch, on after.
>>
>> Qemu should check host support with tap_probe_has_ufo
>> before advertising support to the guest. Indeed, this is exactly
>> what broke live migration in virtio_net_load_device at
>>
>>     if (qemu_get_byte(f) && !peer_has_ufo(n)) {
>>         error_report("virtio-net: saved image requires TUN_F_UFO support");
>>         return -1;
>>     }
>>
>> Which follows
>>
>>    peer_has_ufo
>>      qemu_has_ufo
>>        tap_has_ufo
>>          s->has_ufo
>>
>> where s->has_ufo was set by tap_probe_has_ufo in net_tap_fd_init.
>>
>> Now, checking my qemu git branch, I ran pretty old 2.7.0-rc3. But this
>> codepath does not seem to have changed between then and 2.10.1.
>>
>> I cherry-picked the revert onto 4.14.3. It did not apply cleanly, but the
>> fix-up wasn't too hard. Compiled and booted, but untested otherwise. At
>>
>>   https://github.com/wdebruij/linux/commits/v4.14.3-aargh-ufo
> 
> I'm just running it at the moment. I didn't face any network hang until
> now - although the critical UDP packages have been gone through.
> Therefore: looks nice.

Well, the patch does not fix the hanging VMs: after being shut down, they
can't be killed any more.
Because of the stack trace

[<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
[<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
[<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
[<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
[<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
[<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

I was hoping that the problems were related, but that seems not to be
the case.

Does anybody have any idea what happened here and how to analyze / fix it?


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-08 20:44                         ` Andreas Hartmann
@ 2017-12-11 15:54                           ` Andreas Hartmann
  2017-12-14 16:31                             ` Andreas Hartmann
  2017-12-14 22:17                             ` Willem de Bruijn
  0 siblings, 2 replies; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-11 15:54 UTC (permalink / raw)
  To: Willem de Bruijn, Michal Kubecek
  Cc: Jason Wang, David Miller, Network Development

[-- Attachment #1: Type: text/plain, Size: 4176 bytes --]

On 12/08/2017 at 09:44 PM Andreas Hartmann wrote:
> On 12/08/2017 at 09:11 PM Andreas Hartmann wrote:
>> On 12/08/2017 at 05:04 PM Willem de Bruijn wrote:
>>> On Fri, Dec 8, 2017 at 6:40 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>>>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>>>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>>>>>> On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
>>>>>>>
>>>>>>> All my VMs are using virtio_net. BTW: I couldn't see the problems
>>>>>>> (sometimes, the VM couldn't be stopped at all) if all my VMs are using
>>>>>>> e1000 as interface instead.
>>>>>>>
>>>>>>> This finding now matches pretty much the responsible UDP-package which
>>>>>>> caused the stall. I already mentioned it here [2].
>>>>>>>
>>>>>>> To prove it, I reverted from the patch series "[PATCH v2 RFC 0/13]
>>>>>>> Remove UDP Fragmentation Offload support" [3]
>>>>>>>
>>>>>>> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
>>>>>>> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
>>>>>>> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
>>>>>>>
>>>>>>> and applied it to Linux 4.14.4. It compiled fine and is running fine.
>>>>>>> The vnet doesn't die anymore. Yet, I can't say if the qemu stop hangs
>>>>>>> are gone, too.
>>>>>>>
>>>>>>> Obviously, there is something broken with the new UDP handling. Could
>>>>>>> you please analyze this problem? I could test some more patches ... .
>>>>>>
>>>>>> Any chance your VMs were live migrated from pre-4.14 host kernel?
>>>>>
>>>>> No - the VMs are not live migrated. They are always running on the same
>>>>> host - either with kernel < 4.14 or with kernel 4.14.x.
>>>>
>>>> This is disturbing... unless I'm mistaken, it shouldn't be possible to
>>>> have UFO enabled on a virtio device in a VM booted on a host with 4.14
>>>> kernel.
>>>
>>> Indeed. When working on that revert patch I verified that UFO in
>>> the guest virtio_net was off before the revert patch, on after.
>>>
>>> Qemu should check host support with tap_probe_has_ufo
>>> before advertising support to the guest. Indeed, this is exactly
>>> what broke live migration in virtio_net_load_device at
>>>
>>>     if (qemu_get_byte(f) && !peer_has_ufo(n)) {
>>>         error_report("virtio-net: saved image requires TUN_F_UFO support");
>>>         return -1;
>>>     }
>>>
>>> Which follows
>>>
>>>    peer_has_ufo
>>>      qemu_has_ufo
>>>        tap_has_ufo
>>>          s->has_ufo
>>>
>>> where s->has_ufo was set by tap_probe_has_ufo in net_tap_fd_init.
>>>
>>> Now, checking my qemu git branch, I ran pretty old 2.7.0-rc3. But this
>>> codepath does not seem to have changed between then and 2.10.1.
>>>
>>> I cherry-picked the revert onto 4.14.3. It did not apply cleanly, but the
>>> fix-up wasn't too hard. Compiled and booted, but untested otherwise. At
>>>
>>>   https://github.com/wdebruij/linux/commits/v4.14.3-aargh-ufo
>>
>> I'm just running it at the moment. I didn't face any network hang until
>> now - although the critical UDP packages have been gone through.
>> Therefore: looks nice.
> 
> Well, the patch does not fix hanging VMs, which have been shutdown and
> can't be killed any more.
> Because of the stack trace
> 
> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> I was hoping, that the problems could be related - but that seems not to
> be true.

However, it turned out that reverting the complete patchset "Remove UDP
Fragmentation Offload support" prevents hanging qemu processes. I tested
today and over the whole weekend and didn't face any problem. Before that,
I saw hanging qemu processes almost immediately after shutdown w/
libvirt.

Tested w/ 4.14.4, qemu 2.6.2 and libvirt 2.0.0 and 4 VMs.

I'll be back if the problem comes up again while the patchset is reverted.


Thanks,
Andreas

[-- Attachment #2: Revert - Remove UDP Fragmentation Offload support.tar.xz --]
[-- Type: application/x-xz, Size: 9152 bytes --]


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-11 15:54                           ` Andreas Hartmann
@ 2017-12-14 16:31                             ` Andreas Hartmann
  2017-12-14 22:17                             ` Willem de Bruijn
  1 sibling, 0 replies; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-14 16:31 UTC (permalink / raw)
  To: Willem de Bruijn, Michal Kubecek
  Cc: Jason Wang, David Miller, Network Development

On 12/11/2017 at 04:54 PM Andreas Hartmann wrote:
> On 12/08/2017 at 09:44 PM Andreas Hartmann wrote:
>> On 12/08/2017 at 09:11 PM Andreas Hartmann wrote:
>>> On 12/08/2017 at 05:04 PM Willem de Bruijn wrote:
>>>> On Fri, Dec 8, 2017 at 6:40 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>>>>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>>>>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>>>>>>> On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
>>>>>>>>
>>>>>>>> All my VMs are using virtio_net. BTW: I couldn't see the problems
>>>>>>>> (sometimes, the VM couldn't be stopped at all) if all my VMs are using
>>>>>>>> e1000 as interface instead.
>>>>>>>>
>>>>>>>> This finding now matches pretty much the responsible UDP-package which
>>>>>>>> caused the stall. I already mentioned it here [2].
>>>>>>>>
>>>>>>>> To prove it, I reverted from the patch series "[PATCH v2 RFC 0/13]
>>>>>>>> Remove UDP Fragmentation Offload support" [3]
>>>>>>>>
>>>>>>>> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
>>>>>>>> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
>>>>>>>> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
>>>>>>>>
>>>>>>>> and applied it to Linux 4.14.4. It compiled fine and is running fine.
>>>>>>>> The vnet doesn't die anymore. Yet, I can't say if the qemu stop hangs
>>>>>>>> are gone, too.
>>>>>>>>
>>>>>>>> Obviously, there is something broken with the new UDP handling. Could
>>>>>>>> you please analyze this problem? I could test some more patches ... .
>>>>>>>
>>>>>>> Any chance your VMs were live migrated from pre-4.14 host kernel?
>>>>>>
>>>>>> No - the VMs are not live migrated. They are always running on the same
>>>>>> host - either with kernel < 4.14 or with kernel 4.14.x.
>>>>>
>>>>> This is disturbing... unless I'm mistaken, it shouldn't be possible to
>>>>> have UFO enabled on a virtio device in a VM booted on a host with 4.14
>>>>> kernel.
>>>>
>>>> Indeed. When working on that revert patch I verified that UFO in
>>>> the guest virtio_net was off before the revert patch, on after.
>>>>
>>>> Qemu should check host support with tap_probe_has_ufo
>>>> before advertising support to the guest. Indeed, this is exactly
>>>> what broke live migration in virtio_net_load_device at
>>>>
>>>>     if (qemu_get_byte(f) && !peer_has_ufo(n)) {
>>>>         error_report("virtio-net: saved image requires TUN_F_UFO support");
>>>>         return -1;
>>>>     }
>>>>
>>>> Which follows
>>>>
>>>>    peer_has_ufo
>>>>      qemu_has_ufo
>>>>        tap_has_ufo
>>>>          s->has_ufo
>>>>
>>>> where s->has_ufo was set by tap_probe_has_ufo in net_tap_fd_init.
>>>>
>>>> Now, checking my qemu git branch, I ran pretty old 2.7.0-rc3. But this
>>>> codepath does not seem to have changed between then and 2.10.1.
>>>>
>>>> I cherry-picked the revert onto 4.14.3. It did not apply cleanly, but the
>>>> fix-up wasn't too hard. Compiled and booted, but untested otherwise. At
>>>>
>>>>   https://github.com/wdebruij/linux/commits/v4.14.3-aargh-ufo
>>>
>>> I'm just running it at the moment. I didn't face any network hang until
>>> now - although the critical UDP packages have been gone through.
>>> Therefore: looks nice.
>>
>> Well, the patch does not fix hanging VMs, which have been shutdown and
>> can't be killed any more.
>> Because of the stack trace
>>
>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> I was hoping, that the problems could be related - but that seems not to
>> be true.
> 
> However, it turned out, that reverting the complete patchset "Remove UDP
> Fragmentation Offload support" prevent hanging qemu processes. I tested
> today and the whole weekend - I didn't face any problem. Before that, I
> could see nearly immediately hanging qemu processes after shutdown w/
> libvirt.
> 
> Tested w/ 4.14.4, qemu 2.6.2 and libvirt 2.0.0 and 4 VMs.
> 
> I'll be back if the problem comes up again while the patchset is reverted.

After reverting "Remove UDP Fragmentation Offload support", I haven't
seen any problems so far.


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-11 15:54                           ` Andreas Hartmann
  2017-12-14 16:31                             ` Andreas Hartmann
@ 2017-12-14 22:17                             ` Willem de Bruijn
  2017-12-14 22:47                               ` Willem de Bruijn
  2017-12-15  6:05                               ` Andreas Hartmann
  1 sibling, 2 replies; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-14 22:17 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

>> Well, the patch does not fix hanging VMs, which have been shutdown and
>> can't be killed any more.
>> Because of the stack trace
>>
>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> I was hoping, that the problems could be related - but that seems not to
>> be true.
>
> However, it turned out, that reverting the complete patchset "Remove UDP
> Fragmentation Offload support" prevent hanging qemu processes.

That implies a combination of UFO and vhost zerocopy. Disabling
experimental_zcopytx in vhost_net will probably work around the bug
then.
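
To check whether zerocopy TX is currently enabled, the module parameter
can be read back. The sketch below assumes the parameter is exported
under the usual sysfs location for module parameters. The macro and
function names are made up for illustration.

```c
#include <stdio.h>

/* Assumed sysfs location of the vhost_net module parameter. */
#define VHOST_ZCOPYTX_PARAM \
    "/sys/module/vhost_net/parameters/experimental_zcopytx"

/* Read one integer from a sysfs-style file; 0 on success, -1 on failure. */
static int read_int_param(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if zerocopy TX is enabled, 0 if disabled, -1 if unknown. */
static int vhost_zcopytx_state(const char *path)
{
    int v;

    if (read_int_param(path, &v) != 0)
        return -1;
    return v != 0;
}
```

Calling vhost_zcopytx_state(VHOST_ZCOPYTX_PARAM) on the host reports the
current setting. Changing it normally means reloading vhost_net with
experimental_zcopytx=0, for example through an "options" line in a
modprobe configuration file.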

On the surface the two features are independent. Most of the relevant
UFO code is reverted with the patch mentioned earlier. Missing from
that is protocol stack support, but it is unlikely that your host OS is
generating these UFO packets.

They are coming from a guest over virtio_net, to which vhost_net then
applies zerocopy. Then the packet(s) is/are either freed without calling
uarg->callback() or queued somewhere for a very long time.

Looking at the diff-of-diffs between my stable patch and your full revert,
the majority of missing bits besides the protocol layer is in device driver
support. Removing that causes the UFO packets to be segmented at any
dev_queue_xmit on their path. skb_segment ensures that when it segments
a large zerocopy packet, all new segments also point to the zerocopy
callback struct (ubuf_info), as the shared memory pages may not be
released until all skbs pointing to them are freed.

That may be wrong with vhost_zerocopy_callback, which does not use
refcounting. I will look into that. It may be that before the msg_zerocopy
patchsets large packets were copied before entering segmentation. It is
safe to enter segmentation for msg_zerocopy skbs, but not legacy zerocopy
skbs.

I will also set up two VMs and try to send UFO packets and see whether
they indeed are freed somewhere in the stack without notifying vhost_net.
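
The invariant described above (every segment points at the same
ubuf_info, and the owner is notified only once all of them are freed)
can be illustrated with a toy user-space model. This is a sketch under
stated assumptions, not kernel code: toy_ubuf, toy_skb and the helper
functions are hypothetical names, and incrementing a counter stands in
for uarg->callback().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ubuf_info: one per guest buffer. */
struct toy_ubuf {
    int refcnt;       /* toy_skbs still pointing at the shared pages */
    int completions;  /* how many times the owner was notified */
};

/* Hypothetical stand-in for an skb that borrows the guest's pages. */
struct toy_skb {
    struct toy_ubuf *uarg;
};

/* Attach an skb (or a segment split from one) to the shared buffer. */
static void toy_skb_attach(struct toy_skb *skb, struct toy_ubuf *uarg)
{
    skb->uarg = uarg;
    uarg->refcnt++;
}

/* Free an skb; notify the owner only when the last reference is dropped. */
static void toy_skb_free(struct toy_skb *skb)
{
    struct toy_ubuf *uarg = skb->uarg;

    assert(uarg != NULL && uarg->refcnt > 0);
    skb->uarg = NULL;
    if (--uarg->refcnt == 0)
        uarg->completions++;  /* models the zerocopy callback */
}

/*
 * Split one large packet into nsegs segments, free everything, and
 * return how many completions the owner observed.
 */
static int toy_segment_and_free(int nsegs)
{
    struct toy_ubuf uarg = { 0, 0 };
    struct toy_skb head, segs[16];
    int i;

    assert(nsegs > 0 && nsegs <= 16);
    toy_skb_attach(&head, &uarg);
    for (i = 0; i < nsegs; i++)
        toy_skb_attach(&segs[i], &uarg);
    toy_skb_free(&head);  /* the original is consumed after segmentation */
    for (i = 0; i < nsegs; i++)
        toy_skb_free(&segs[i]);
    return uarg.completions;
}
```

With reference counting, the owner sees exactly one completion no matter
how many segments were created. A callback without such counting would
either fire once per freed skb or, if a segment is lost, never complete,
which would leave a waiter such as vhost_net_ubuf_put_and_wait blocked.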


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-14 22:17                             ` Willem de Bruijn
@ 2017-12-14 22:47                               ` Willem de Bruijn
  2017-12-15  6:05                               ` Andreas Hartmann
  1 sibling, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-14 22:47 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On Thu, Dec 14, 2017 at 5:17 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>> can't be killed any more.
>>> Because of the stack trace
>>>
>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> I was hoping, that the problems could be related - but that seems not to
>>> be true.
>>
>> However, it turned out, that reverting the complete patchset "Remove UDP
>> Fragmentation Offload support" prevent hanging qemu processes.
>
> That implies a combination of UFO and vhost zerocopy. Disabling
> experimental_zcopytx in vhost_net will probably work around the bug
> then.
>
> On the surface the two features are independent. Most of the relevant
> UFO code is reverted with the patch mentioned earlier. Missing from
> that is protocol stack support, but it is unlikely that your host OS is
> generating these UFO packets.
>
> They are coming from a guest over virtio_net, to which vhost_net then
> applies zerocopy. Then the packet(s) is/are either freed without calling
> uarg->callback() or queued somewhere for a very long time.
>
> Looking at the diff-of-diffs between my stable patch and your full revert,
> the majority of missing bits beside the protocol layer is in device driver
> support. Removing that causes the UFO packets to be segmented at any
> dev_queue_xmit on their path. skb_segment ensures that when it segments
> a large zerocopy packet, all new segments also point to the zerocopy
> callback struct (ubuf_info), as the shared memory pages may not be
> released until all skbs pointing to them are freed.
>
> That may be wrong with vhost_zerocopy_callback, which does not use
> refcounting. I will look into that. It may be that before the msg_zerocopy
> patchsets large packets were copied before entering segmentation. It is
> safe to enter segmentation for msg_zerocopy skbs, but not legacy zerocopy
> skbs.

If this is the cause, then the following, while not a real solution, would
probably also resolve the observed issue.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e140ba49b30a..8fe5bca1d6ae 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3655,10 +3655,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
                skb_copy_from_linear_data_offset(head_skb, offset,
                                                 skb_put(nskb, hsize), hsize);

+               if (unlikely(skb_orphan_frags_rx(head_skb, GFP_ATOMIC)))
+                       goto err;
                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-                       goto err;

This basically converts zerocopy TSO skbs to regular and calls their
uarg->callback just before segmenting them.


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-14 22:17                             ` Willem de Bruijn
  2017-12-14 22:47                               ` Willem de Bruijn
@ 2017-12-15  6:05                               ` Andreas Hartmann
  2017-12-17 22:33                                 ` Willem de Bruijn
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-15  6:05 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>> can't be killed any more.
>>> Because of the stack trace
>>>
>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> I was hoping, that the problems could be related - but that seems not to
>>> be true.
>>
>> However, it turned out, that reverting the complete patchset "Remove UDP
>> Fragmentation Offload support" prevent hanging qemu processes.
> 
> That implies a combination of UFO and vhost zerocopy. Disabling
> experimental_zcopytx in vhost_net will probably work around the bug
> then.

I already tested it w/ options vhost_net experimental_zcopytx=0 - but
this didn't "resolve" anything. See
https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html

Therefore, I think your following thoughts unfortunately no longer apply,
do they?

> On the surface the two features are independent. Most of the relevant
> UFO code is reverted with the patch mentioned earlier. Missing from
> that is protocol stack support, but it is unlikely that your host OS is
> generating these UFO packets.
> 
> They are coming from a guest over virtio_net, to which vhost_net then
> applies zerocopy. Then the packet(s) is/are either freed without calling
> uarg->callback() or queued somewhere for a very long time.
> 
> Looking at the diff-of-diffs between my stable patch and your full revert,
> the majority of missing bits beside the protocol layer is in device driver
> support. Removing that causes the UFO packets to be segmented at any
> dev_queue_xmit on their path. skb_segment ensures that when it segments
> a large zerocopy packet, all new segments also point to the zerocopy
> callback struct (ubuf_info), as the shared memory pages may not be
> released until all skbs pointing to them are freed.
> 
> That may be wrong with vhost_zerocopy_callback, which does not use
> refcounting. I will look into that. It may be that before the msg_zerocopy
> patchsets large packets were copied before entering segmentation. It is
> safe to enter segmentation for msg_zerocopy skbs, but not legacy zerocopy
> skbs.
> 
> I will also set up two VMs and try to send UFO packets and see whether
> they indeed are freed somewhere in the stack without notifying vhost_net.


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-15  6:05                               ` Andreas Hartmann
@ 2017-12-17 22:33                                 ` Willem de Bruijn
  2017-12-18 17:11                                   ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-17 22:33 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On Fri, Dec 15, 2017 at 1:05 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>>> can't be killed any more.
>>>> Because of the stack trace
>>>>
>>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> I was hoping, that the problems could be related - but that seems not to
>>>> be true.
>>>
>>> However, it turned out, that reverting the complete patchset "Remove UDP
>>> Fragmentation Offload support" prevent hanging qemu processes.
>>
>> That implies a combination of UFO and vhost zerocopy. Disabling
>> experimental_zcopytx in vhost_net will probably work around the bug
>> then.

I have been able to reproduce the hang by sending a UFO packet
between two guests running v4.13 on a host running v4.15-rc1.

The vhost_net_ubuf_ref refcount indeed underflows (to -1) because
vhost_zerocopy_callback is called for each segment of a
segmented UFO skb. The refcount is then decremented once per
segment, but incremented only once for the entire UFO skb.
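
As a rough illustration of that imbalance, here is a minimal userspace
sketch. It is not kernel code: struct ubuf_ref_model and the model_*
functions are hypothetical names invented for this example, and the
starting value of the counter is an assumption chosen only to show how
one increment paired with several decrements drives a plain int below
zero.

```c
#include <assert.h>

/* Hypothetical stand-in for a non-saturating reference counter. */
struct ubuf_ref_model {
	int refcount;
};

/* One zerocopy submission takes one reference for the whole large skb. */
static void model_submit(struct ubuf_ref_model *r)
{
	r->refcount++;
}

/* Model of the completion callback: drops one reference per call. */
static void model_callback(struct ubuf_ref_model *r)
{
	r->refcount--;
}

/*
 * Submit one large packet, let it be split into nsegs segments that
 * all point at the same callback state, and complete each segment.
 * Returns the final counter value.
 */
int model_segment_and_complete(int initial, int nsegs)
{
	struct ubuf_ref_model r = { .refcount = initial };

	assert(nsegs >= 0);
	model_submit(&r);
	for (int i = 0; i < nsegs; i++)
		model_callback(&r);
	return r.refcount;
}
```

With an assumed initial value of 1, a single completion leaves the
counter balanced at 1, while three completions for one submission leave
it at -1. Anything waiting for the counter to reach exactly zero would
then wait forever.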

Before v4.14, these packets would be converted in skb_segment to
regular copy packets with skb_orphan_frags and the callback function
called once at this point. v4.14 added support for reference counted
zerocopy skb that can pass through skb_orphan_frags unmodified and
have their zerocopy state safely cloned with skb_zerocopy_clone.

The call to skb_zerocopy_clone must come after skb_orphan_frags
to limit cloning of this state to those skbs that can do so safely.
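
The ordering argument can be sketched in plain userspace C. This is only
a model under stated assumptions: struct uarg_model, struct skb_model and
the model_* functions are hypothetical and merely mimic the behaviour
described above (orphaning converts a non-refcounted zerocopy skb to a
regular one and fires its callback once; a refcounted one passes through
untouched). They are not the kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zerocopy state shared between skbs. */
struct uarg_model {
	bool refcounted;	/* true: safe to clone; false: legacy */
	int refs;		/* references held on this state */
	int callbacks;		/* how often the completion callback ran */
};

struct skb_model {
	struct uarg_model *uarg;	/* NULL once the skb is a regular copy */
};

/* Legacy state is completed once and detached; refcounted state is kept. */
static void model_orphan_frags(struct skb_model *skb)
{
	if (!skb->uarg || skb->uarg->refcounted)
		return;
	skb->uarg->callbacks++;
	skb->uarg = NULL;
}

/* Clone whatever zerocopy state the source still carries, if any. */
static void model_zerocopy_clone(struct skb_model *nskb,
				 const struct skb_model *orig)
{
	nskb->uarg = orig->uarg;
	if (nskb->uarg)
		nskb->uarg->refs++;
}

/*
 * Segment head into nsegs pieces, orphaning before cloning.
 * Returns how many segments still share zerocopy state.
 */
int model_segment(struct skb_model *head, int nsegs)
{
	int sharing = 0;

	assert(nsegs >= 0);
	for (int i = 0; i < nsegs; i++) {
		struct skb_model nskb = { NULL };

		model_orphan_frags(head);
		model_zerocopy_clone(&nskb, head);
		if (nskb.uarg)
			sharing++;
	}
	return sharing;
}
```

In this model a legacy skb split into three segments runs its callback
exactly once and no segment inherits the state, while a refcounted skb
gains one reference per segment and runs no callback during segmentation.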

Please try a host with the following patch. This fixes it for me. I intend to
send it to net.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a592ca025fc4..d2d985418819 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-                       goto err;

                while (pos < offset + len) {
                        if (i >= nfrags) {
@@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                        if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
                                goto err;
+                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+                               goto err;

                        *nskb_frag = *frag;
                        __skb_frag_ref(nskb_frag);


This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
in the frags[] array. I will follow up with a patch to net-next that only
checks once per skb:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466581cf4cdc..a293a33604ec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
+               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
                        goto err;

                while (pos < offset + len) {
@@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                                BUG_ON(!nfrags);

+                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+                                   skb_zerocopy_clone(nskb, frag_skb,
+                                                      GFP_ATOMIC))
+                                       goto err;
+
                                list_skb = list_skb->next;
                        }

@@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
                                goto err;
                        }

-                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
-                               goto err;
-

I'll also send to net-next

(1) a patch to convert the vhost_net_ubuf_ref refcnt to refcount_t

(2) a patch to skb_zerocopy_clone to warn on clone if the callback is
     not sock_zerocopy_callback

> I already tested it w/ options vhost_net experimental_zcopytx=0 - but
> this didn't "resolve" anything. See
> https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html
>
> Therefore, I think your following thoughts are lapsed unfortunately,
> aren't they?

That experiment was perhaps run before commit 0c19f846d582 ("net:
accept UFO datagrams from tuntap and packet") and hit the other UFO
bug.


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-17 22:33                                 ` Willem de Bruijn
@ 2017-12-18 17:11                                   ` Andreas Hartmann
  2017-12-20 15:56                                     ` Andreas Hartmann
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-18 17:11 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
> On Fri, Dec 15, 2017 at 1:05 AM, Andreas Hartmann
> <andihartmann@01019freenet.de> wrote:
>> On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>>>> can't be killed any more.
>>>>> Because of the stack trace
>>>>>
>>>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>
>>>>> I was hoping, that the problems could be related - but that seems not to
>>>>> be true.
>>>>
>>>> However, it turned out, that reverting the complete patchset "Remove UDP
>>>> Fragmentation Offload support" prevent hanging qemu processes.
>>>
>>> That implies a combination of UFO and vhost zerocopy. Disabling
>>> experimental_zcopytx in vhost_net will probably work around the bug
>>> then.
> 
> I have been able to reproduce the hang by sending a UFO packet
> between two guests running v4.13 on a host running v4.15-rc1.
> 
> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
> vhost_zerocopy_callback being called for each segment of a
> segmented UFO skb. This refcount is decremented then on each
> segment, but incremented only once for the entire UFO skb.
> 
> Before v4.14, these packets would be converted in skb_segment to
> regular copy packets with skb_orphan_frags and the callback function
> called once at this point. v4.14 added support for reference counted
> zerocopy skb that can pass through skb_orphan_frags unmodified and
> have their zerocopy state safely cloned with skb_zerocopy_clone.
> 
> The call to skb_zerocopy_clone must come after skb_orphan_frags
> to limit cloning of this state to those skbs that can do so safely.
> 
> Please try a host with the following patch. This fixes it for me. I intend to
> send it to net.
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a592ca025fc4..d2d985418819 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>                                               SKBTX_SHARED_FRAG;
> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> -                       goto err;
> 
>                 while (pos < offset + len) {
>                         if (i >= nfrags) {
> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>                                 goto err;
> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> +                               goto err;
> 
>                         *nskb_frag = *frag;
>                         __skb_frag_ref(nskb_frag);
> 
> 
> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
> in the frags[] array. I will follow-up with a patch to net-next that only
> checks once per skb:
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 466581cf4cdc..a293a33604ec 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>                                               SKBTX_SHARED_FRAG;
> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>                         goto err;
> 
>                 while (pos < offset + len) {
> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                                 BUG_ON(!nfrags);
> 
> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +                                   skb_zerocopy_clone(nskb, frag_skb,
> +                                                      GFP_ATOMIC))
> +                                       goto err;
> +
>                                 list_skb = list_skb->next;
>                         }
> 
> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                                 goto err;
>                         }
> 
> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
> -                               goto err;
> -

I'm currently testing this one.

> 
> I'll also send to net-next
> 
> (1) a patch to convert the vhost_net_ubuf_ref refcnt to refcount_t
> 
> (2) a patch to skb_zerocopy_clone to warn on clone if the callback is
>      not sock_zerocopy_callback
> 
>> I already tested it w/ options vhost_net experimental_zcopytx=0 - but
>> this didn't "resolve" anything. See
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html
>>
>> Therefore, I think your following thoughts are lapsed unfortunately,
>> aren't they?
> 
> That experiment was perhaps run before commit 0c19f846d582 ("net:
> accept UFO datagrams from tuntap and packet") and hit the other UFO
> bug.

That's probably true.


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-18 17:11                                   ` Andreas Hartmann
@ 2017-12-20 15:56                                     ` Andreas Hartmann
  2017-12-20 22:44                                       ` Willem de Bruijn
  2017-12-24 16:24                                       ` Andreas Hartmann
  0 siblings, 2 replies; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-20 15:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
[...]
>> I have been able to reproduce the hang by sending a UFO packet
>> between two guests running v4.13 on a host running v4.15-rc1.
>>
>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>> vhost_zerocopy_callback being called for each segment of a
>> segmented UFO skb. This refcount is decremented then on each
>> segment, but incremented only once for the entire UFO skb.
>>
>> Before v4.14, these packets would be converted in skb_segment to
>> regular copy packets with skb_orphan_frags and the callback function
>> called once at this point. v4.14 added support for reference counted
>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>
>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>> to limit cloning of this state to those skbs that can do so safely.
>>
>> Please try a host with the following patch. This fixes it for me. I intend to
>> send it to net.
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a592ca025fc4..d2d985418819 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> -                       goto err;
>>
>>                 while (pos < offset + len) {
>>                         if (i >= nfrags) {
>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>                                 goto err;
>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> +                               goto err;
>>
>>                         *nskb_frag = *frag;
>>                         __skb_frag_ref(nskb_frag);
>>
>>
>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>> in the frags[] array. I will follow-up with a patch to net-next that only
>> checks once per skb:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 466581cf4cdc..a293a33604ec 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>                         goto err;
>>
>>                 while (pos < offset + len) {
>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                                 BUG_ON(!nfrags);
>>
>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>> +                                                      GFP_ATOMIC))
>> +                                       goto err;
>> +
>>                                 list_skb = list_skb->next;
>>                         }
>>
>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>                                 goto err;
>>                         }
>>
>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>> -                               goto err;
>> -
> 
> I'm currently testing this one.
> 

Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
accept UFO datagrams from tuntap and packet".

First, I tested an unpatched 4.14.7. The problem (a qemu process that can
no longer be killed) occurred promptly on shutdown of the machine. This
was expected.

Next, I applied the above patch (the second one). So far, I have not
seen any further problems on shutdown of VMs. Looks promising.


Thanks,
regards,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-20 15:56                                     ` Andreas Hartmann
@ 2017-12-20 22:44                                       ` Willem de Bruijn
  2017-12-21 17:05                                         ` Andreas Hartmann
  2017-12-24 16:24                                       ` Andreas Hartmann
  1 sibling, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-20 22:44 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
> [...]
>>> I have been able to reproduce the hang by sending a UFO packet
>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>
>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>> vhost_zerocopy_callback being called for each segment of a
>>> segmented UFO skb. This refcount is decremented then on each
>>> segment, but incremented only once for the entire UFO skb.
>>>
>>> Before v4.14, these packets would be converted in skb_segment to
>>> regular copy packets with skb_orphan_frags and the callback function
>>> called once at this point. v4.14 added support for reference counted
>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>
>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>> to limit cloning of this state to those skbs that can do so safely.
>>>
>>> Please try a host with the following patch. This fixes it for me. I intend to
>>> send it to net.
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index a592ca025fc4..d2d985418819 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>                                               SKBTX_SHARED_FRAG;
>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>> -                       goto err;
>>>
>>>                 while (pos < offset + len) {
>>>                         if (i >= nfrags) {
>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>                                 goto err;
>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>> +                               goto err;
>>>
>>>                         *nskb_frag = *frag;
>>>                         __skb_frag_ref(nskb_frag);
>>>
>>>
>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>>> in the frags[] array. I will follow-up with a patch to net-next that only
>>> checks once per skb:
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 466581cf4cdc..a293a33604ec 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>                                               SKBTX_SHARED_FRAG;
>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>                         goto err;
>>>
>>>                 while (pos < offset + len) {
>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                                 BUG_ON(!nfrags);
>>>
>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>> +                                                      GFP_ATOMIC))
>>> +                                       goto err;
>>> +
>>>                                 list_skb = list_skb->next;
>>>                         }
>>>
>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>                                 goto err;
>>>                         }
>>>
>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>> -                               goto err;
>>> -
>>
>> I'm currently testing this one.
>>
>
> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
> accept UFO datagrams from tuntap and packet".
>
> At first, I tested an unpatched 4.14.7 - the problem (no more killable
> qemu-process) did occur promptly on shutdown of the machine. This was
> expected.
>
> Next, I applied the above patch (the second one). Until now, I didn't
> face any problem any more on shutdown of VMs. Looks promising.

Thanks for testing.

I sent the first, simpler, one to net together with another fix.

  http://patchwork.ozlabs.org/patch/851715/


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-20 22:44                                       ` Willem de Bruijn
@ 2017-12-21 17:05                                         ` Andreas Hartmann
  2017-12-21 17:11                                           ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-21 17:05 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On 12/20/2017 at 11:44 PM Willem de Bruijn wrote:
> On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann
> <andihartmann@01019freenet.de> wrote:
>> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
>> [...]
>>>> I have been able to reproduce the hang by sending a UFO packet
>>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>>
>>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>>> vhost_zerocopy_callback being called for each segment of a
>>>> segmented UFO skb. This refcount is decremented then on each
>>>> segment, but incremented only once for the entire UFO skb.
>>>>
>>>> Before v4.14, these packets would be converted in skb_segment to
>>>> regular copy packets with skb_orphan_frags and the callback function
>>>> called once at this point. v4.14 added support for reference counted
>>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>>
>>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>>> to limit cloning of this state to those skbs that can do so safely.
>>>>
>>>> Please try a host with the following patch. This fixes it for me. I intend to
>>>> send it to net.
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index a592ca025fc4..d2d985418819 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                  skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>                                                SKBTX_SHARED_FRAG;
>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>> -                       goto err;
>>>>
>>>>                  while (pos < offset + len) {
>>>>                          if (i >= nfrags) {
>>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                          if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>>                                  goto err;
>>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>> +                               goto err;
>>>>
>>>>                          *nskb_frag = *frag;
>>>>                          __skb_frag_ref(nskb_frag);
>>>>
>>>>
>>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>>>> in the frags[] array. I will follow-up with a patch to net-next that only
>>>> checks once per skb:
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index 466581cf4cdc..a293a33604ec 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                  skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>                                                SKBTX_SHARED_FRAG;
>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>>                          goto err;
>>>>
>>>>                  while (pos < offset + len) {
>>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                                  BUG_ON(!nfrags);
>>>>
>>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>>> +                                                      GFP_ATOMIC))
>>>> +                                       goto err;
>>>> +
>>>>                                  list_skb = list_skb->next;
>>>>                          }
>>>>
>>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>                                  goto err;
>>>>                          }
>>>>
>>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>> -                               goto err;
>>>> -
>>>
>>> I'm currently testing this one.
>>>
>>
>> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
>> accept UFO datagrams from tuntap and packet".
>>
>> At first, I tested an unpatched 4.14.7 - the problem (no more killable
>> qemu-process) did occur promptly on shutdown of the machine. This was
>> expected.
>>
>> Next, I applied the above patch (the second one). Until now, I didn't
>> face any problem any more on shutdown of VMs. Looks promising.
> 
> Thanks for testing.
> 
> I sent the first, simpler, one to net together with another fix.
> 
>    http://patchwork.ozlabs.org/patch/851715/
> 

I'm already testing the second patch above (the more efficient one, not
"[net,1/2] skbuff: orphan frags before zerocopy clone"). With that patch
applied, is it still necessary to apply "[net,2/2] skbuff: skb_copy_ubufs
must release uarg even without user frags" as well?


Thanks,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-21 17:05                                         ` Andreas Hartmann
@ 2017-12-21 17:11                                           ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-21 17:11 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On Thu, Dec 21, 2017 at 12:05 PM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/20/2017 at 11:44 PM Willem de Bruijn wrote:
>>
>> On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann
>> <andihartmann@01019freenet.de> wrote:
>>>
>>> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>>>>
>>>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
>>>
>>> [...]
>>>>>
>>>>> I have been able to reproduce the hang by sending a UFO packet
>>>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>>>
>>>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>>>> vhost_zerocopy_callback being called for each segment of a
>>>>> segmented UFO skb. This refcount is decremented then on each
>>>>> segment, but incremented only once for the entire UFO skb.
>>>>>
>>>>> Before v4.14, these packets would be converted in skb_segment to
>>>>> regular copy packets with skb_orphan_frags and the callback function
>>>>> called once at this point. v4.14 added support for reference counted
>>>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>>>
>>>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>>>> to limit cloning of this state to those skbs that can do so safely.
>>>>>
>>>>> Please try a host with the following patch. This fixes it for me. I
>>>>> intend to send it to net.
>>>>>
>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>>> index a592ca025fc4..d2d985418819 100644
>>>>> --- a/net/core/skbuff.c
>>>>> +++ b/net/core/skbuff.c
>>>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>>
>>>>>                  skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>>                                                SKBTX_SHARED_FRAG;
>>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>>> -                       goto err;
>>>>>
>>>>>                  while (pos < offset + len) {
>>>>>                          if (i >= nfrags) {
>>>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>>
>>>>>                          if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>>>                                  goto err;
>>>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>>> +                               goto err;
>>>>>
>>>>>                          *nskb_frag = *frag;
>>>>>                          __skb_frag_ref(nskb_frag);
>>>>>
>>>>>
>>>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each
>>>>> frag in the frags[] array. I will follow-up with a patch to net-next that
>>>>> only checks once per skb:
>>>>>
>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>>> index 466581cf4cdc..a293a33604ec 100644
>>>>> --- a/net/core/skbuff.c
>>>>> +++ b/net/core/skbuff.c
>>>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>>
>>>>>                  skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>>                                                SKBTX_SHARED_FRAG;
>>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>>>                          goto err;
>>>>>
>>>>>                  while (pos < offset + len) {
>>>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>>
>>>>>                                  BUG_ON(!nfrags);
>>>>>
>>>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>>>> +                                                      GFP_ATOMIC))
>>>>> +                                       goto err;
>>>>> +
>>>>>                                  list_skb = list_skb->next;
>>>>>                          }
>>>>>
>>>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>>                                  goto err;
>>>>>                          }
>>>>>
>>>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>>> -                               goto err;
>>>>> -
>>>>
>>>>
>>>> I'm currently testing this one.
>>>>
>>>
>>> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
>>> accept UFO datagrams from tuntap and packet".
>>>
>>> At first, I tested an unpatched 4.14.7 - the problem (no more killable
>>> qemu-process) did occur promptly on shutdown of the machine. This was
>>> expected.
>>>
>>> Next, I applied the above patch (the second one). Until now, I didn't
>>> face any problem any more on shutdown of VMs. Looks promising.
>>
>>
>> Thanks for testing.
>>
>> I sent the first, simpler, one to net together with another fix.
>>
>>    http://patchwork.ozlabs.org/patch/851715/
>>
>
> If I'm using the second patch above (the more efficient one and not
> "[net,1/2] skbuff: orphan frags before zerocopy clone"), which I'm already
> testing here: Is it still necessary to apply this patch "[net,2/2] skbuff:
> skb_copy_ubufs must release uarg even without user frags"?

Not for this issue. That patch fixes an unrelated bug, which virtio_net
does not trigger as normally configured.
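
To make the refcount imbalance from the quoted analysis concrete, here is a
minimal user-space sketch. It is not kernel code: ubuf_ref_model,
model_submit, model_callback and model_segment_and_complete are hypothetical
names, and the starting count of 0 is an assumption chosen for illustration.
It only models "incremented once for the entire UFO skb, decremented once per
segment".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a zerocopy reference counter (not vhost code). */
struct ubuf_ref_model {
	int refcount;
};

/* Submitting one zerocopy skb takes exactly one reference. */
static void model_submit(struct ubuf_ref_model *r)
{
	r->refcount++;
}

/* Each run of the completion callback drops exactly one reference. */
static void model_callback(struct ubuf_ref_model *r)
{
	r->refcount--;
}

/*
 * Submit one UFO skb, split it into nsegs segments, and complete them.
 * per_segment_callback == true models the broken case from the thread:
 * every segment carries the zerocopy state and fires the callback.
 * per_segment_callback == false models the callback firing once per skb.
 * Returns the final count, starting from an assumed count of 0.
 */
static int model_segment_and_complete(int nsegs, bool per_segment_callback)
{
	struct ubuf_ref_model r = { .refcount = 0 };
	int callbacks = per_segment_callback ? nsegs : 1;
	int i;

	assert(nsegs >= 1);

	model_submit(&r);
	for (i = 0; i < callbacks; i++)
		model_callback(&r);

	return r.refcount;
}
```

With two segments, model_segment_and_complete(2, true) ends at -1, while
model_segment_and_complete(2, false) returns to 0.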


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-20 15:56                                     ` Andreas Hartmann
  2017-12-20 22:44                                       ` Willem de Bruijn
@ 2017-12-24 16:24                                       ` Andreas Hartmann
  2017-12-24 18:54                                         ` Willem de Bruijn
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Hartmann @ 2017-12-24 16:24 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On 12/20/2017 at 04:56 PM Andreas Hartmann wrote:
> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
> [...]
>>> I have been able to reproduce the hang by sending a UFO packet
>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>
>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>> vhost_zerocopy_callback being called for each segment of a
>>> segmented UFO skb. This refcount is decremented then on each
>>> segment, but incremented only once for the entire UFO skb.
>>>
>>> Before v4.14, these packets would be converted in skb_segment to
>>> regular copy packets with skb_orphan_frags and the callback function
>>> called once at this point. v4.14 added support for reference counted
>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>
>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>> to limit cloning of this state to those skbs that can do so safely.
>>>
>>> Please try a host with the following patch. This fixes it for me. I intend to
>>> send it to net.
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index a592ca025fc4..d2d985418819 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>                                               SKBTX_SHARED_FRAG;
>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>> -                       goto err;
>>>
>>>                 while (pos < offset + len) {
>>>                         if (i >= nfrags) {
>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>                                 goto err;
>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>> +                               goto err;
>>>
>>>                         *nskb_frag = *frag;
>>>                         __skb_frag_ref(nskb_frag);
>>>
>>>
>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>>> in the frags[] array. I will follow-up with a patch to net-next that only
>>> checks once per skb:
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 466581cf4cdc..a293a33604ec 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>                                               SKBTX_SHARED_FRAG;
>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>                         goto err;
>>>
>>>                 while (pos < offset + len) {
>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                                 BUG_ON(!nfrags);
>>>
>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>> +                                                      GFP_ATOMIC))
>>> +                                       goto err;
>>> +
>>>                                 list_skb = list_skb->next;
>>>                         }
>>>
>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>                                 goto err;
>>>                         }
>>>
>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>> -                               goto err;
>>> -
>>
>> I'm currently testing this one.
>>
> 
> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
> accept UFO datagrams from tuntap and packet".
> 
> At first, I tested an unpatched 4.14.7 - the problem (no more killable
> qemu-process) did occur promptly on shutdown of the machine. This was
> expected.
> 
> Next, I applied the above patch (the second one). Until now, I didn't
> face any problem any more on shutdown of VMs. Looks promising.

OK, I haven't faced any problems since! Many thanks for your effort and
your two patches that get 4.14 working again with qemu and virtual
networks / virtio!


Regards,
Andreas


* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
  2017-12-24 16:24                                       ` Andreas Hartmann
@ 2017-12-24 18:54                                         ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2017-12-24 18:54 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development

On Sun, Dec 24, 2017 at 11:24 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/20/2017 at 04:56 PM Andreas Hartmann wrote:
>> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
>> [...]
>>>> I have been able to reproduce the hang by sending a UFO packet
>>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>>
>>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>>> vhost_zerocopy_callback being called for each segment of a
>>>> segmented UFO skb. This refcount is decremented then on each
>>>> segment, but incremented only once for the entire UFO skb.
>>>>
>>>> Before v4.14, these packets would be converted in skb_segment to
>>>> regular copy packets with skb_orphan_frags and the callback function
>>>> called once at this point. v4.14 added support for reference counted
>>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>>
>>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>>> to limit cloning of this state to those skbs that can do so safely.
>>>>
>>>> Please try a host with the following patch. This fixes it for me. I intend to
>>>> send it to net.
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index a592ca025fc4..d2d985418819 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>                                               SKBTX_SHARED_FRAG;
>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>> -                       goto err;
>>>>
>>>>                 while (pos < offset + len) {
>>>>                         if (i >= nfrags) {
>>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>>                                 goto err;
>>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>> +                               goto err;
>>>>
>>>>                         *nskb_frag = *frag;
>>>>                         __skb_frag_ref(nskb_frag);
>>>>
>>>>
>>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>>>> in the frags[] array. I will follow-up with a patch to net-next that only
>>>> checks once per skb:
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index 466581cf4cdc..a293a33604ec 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>                                               SKBTX_SHARED_FRAG;
>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>>                         goto err;
>>>>
>>>>                 while (pos < offset + len) {
>>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                                 BUG_ON(!nfrags);
>>>>
>>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>>> +                                                      GFP_ATOMIC))
>>>> +                                       goto err;
>>>> +
>>>>                                 list_skb = list_skb->next;
>>>>                         }
>>>>
>>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>                                 goto err;
>>>>                         }
>>>>
>>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>> -                               goto err;
>>>> -
>>>
>>> I'm currently testing this one.
>>>
>>
>> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
>> accept UFO datagrams from tuntap and packet".
>>
>> At first, I tested an unpatched 4.14.7 - the problem (no more killable
>> qemu-process) did occur promptly on shutdown of the machine. This was
>> expected.
>>
>> Next, I applied the above patch (the second one). Until now, I didn't
>> face any problem any more on shutdown of VMs. Looks promising.
>
> Ok, I didn't face any problem any more! Many thanks for your effort and
> your 2 patches to get 4.14. working again w/ qemu and virtual networks /
> virtio!

That is great news. Thanks a lot for testing!
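
As a rough illustration of why the quoted analysis says skb_zerocopy_clone
must come after skb_orphan_frags, here is a small user-space model. All
names (uarg_model, skb_model, model_orphan_frags, model_zerocopy_clone,
model_segment_keeps_uarg) are hypothetical, and the behaviour is a
simplification of the description in the thread, not the kernel
implementation: orphaning strips zerocopy state from skbs that are not
reference counted, and cloning shares whatever state is still attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zerocopy state attached to an skb. */
struct uarg_model {
	int refs;         /* references held on this state */
	bool refcounted;  /* true: state may be shared across skbs safely */
	int callbacks;    /* how often the completion callback has run */
};

struct skb_model {
	struct uarg_model *uarg;  /* NULL means a regular copy skb */
};

/* Model of orphaning: non-refcounted state is completed and detached. */
static int model_orphan_frags(struct skb_model *skb)
{
	assert(skb != NULL);
	if (!skb->uarg || skb->uarg->refcounted)
		return 0;          /* nothing to do, state stays attached */
	skb->uarg->callbacks++;    /* callback runs once, at this point */
	skb->uarg = NULL;          /* skb is now a regular copy skb */
	return 0;
}

/* Model of cloning: share the state still attached to orig, if any. */
static int model_zerocopy_clone(struct skb_model *nskb,
				const struct skb_model *orig)
{
	assert(nskb != NULL && orig != NULL);
	if (orig->uarg) {
		orig->uarg->refs++;
		nskb->uarg = orig->uarg;
	}
	return 0;
}

/* Returns true if the new segment ends up carrying zerocopy state. */
static bool model_segment_keeps_uarg(bool refcounted, bool orphan_first)
{
	struct uarg_model u = { .refs = 1, .refcounted = refcounted,
				.callbacks = 0 };
	struct skb_model frag_skb = { .uarg = &u };
	struct skb_model nskb = { .uarg = NULL };

	if (orphan_first) {
		model_orphan_frags(&frag_skb);
		model_zerocopy_clone(&nskb, &frag_skb);
	} else {
		model_zerocopy_clone(&nskb, &frag_skb);
		model_orphan_frags(&frag_skb);
	}
	return nskb.uarg != NULL;
}
```

In this model, a non-refcounted skb only leaks its zerocopy state into the
segment when the clone happens before the orphan; a refcounted skb keeps
its state in either order.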


end of thread, other threads:[~2017-12-24 18:55 UTC | newest]

Thread overview: 34+ messages
2017-11-26 14:17 Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected Andreas Hartmann
2017-11-27 16:46 ` Andreas Hartmann
2017-11-27 16:55   ` Michal Kubecek
2017-11-27 19:09     ` Andreas Hartmann
2017-12-01 10:11 ` Andreas Hartmann
2017-12-03 11:35   ` Andreas Hartmann
2017-12-04 16:28     ` Andreas Hartmann
2017-12-05  3:50       ` Jason Wang
2017-12-05 16:23         ` Andreas Hartmann
2017-12-06  3:08           ` Jason Wang
2017-12-08  7:21             ` Andreas Hartmann
2017-12-08  8:47               ` Michal Kubecek
2017-12-08 10:31                 ` Andreas Hartmann
2017-12-08 11:40                   ` Michal Kubecek
2017-12-08 12:45                     ` Andreas Hartmann
2017-12-08 12:58                       ` Michal Kubecek
2017-12-08 13:13                         ` Andreas Hartmann
2017-12-08 15:11                           ` Jason Wang
2017-12-08 16:04                     ` Willem de Bruijn
2017-12-08 20:11                       ` Andreas Hartmann
2017-12-08 20:44                         ` Andreas Hartmann
2017-12-11 15:54                           ` Andreas Hartmann
2017-12-14 16:31                             ` Andreas Hartmann
2017-12-14 22:17                             ` Willem de Bruijn
2017-12-14 22:47                               ` Willem de Bruijn
2017-12-15  6:05                               ` Andreas Hartmann
2017-12-17 22:33                                 ` Willem de Bruijn
2017-12-18 17:11                                   ` Andreas Hartmann
2017-12-20 15:56                                     ` Andreas Hartmann
2017-12-20 22:44                                       ` Willem de Bruijn
2017-12-21 17:05                                         ` Andreas Hartmann
2017-12-21 17:11                                           ` Willem de Bruijn
2017-12-24 16:24                                       ` Andreas Hartmann
2017-12-24 18:54                                         ` Willem de Bruijn
