From: Andreas Hartmann <andihartmann@01019freenet.de>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Michal Kubecek <mkubecek@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
Date: Mon, 11 Dec 2017 16:54:00 +0100
Message-ID: <d71df64e-e65f-4db4-6f2e-c002c15fcbe4@01019freenet.de>
In-Reply-To: <bc075f2c-85d0-cf97-60cd-ee6d6efca12e@01019freenet.de>

[-- Attachment #1: Type: text/plain, Size: 4176 bytes --]

On 12/08/2017 at 09:44 PM Andreas Hartmann wrote:
> On 12/08/2017 at 09:11 PM Andreas Hartmann wrote:
>> On 12/08/2017 at 05:04 PM Willem de Bruijn wrote:
>>> On Fri, Dec 8, 2017 at 6:40 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>>>> On Fri, Dec 08, 2017 at 11:31:50AM +0100, Andreas Hartmann wrote:
>>>>> On 12/08/2017 at 09:47 AM Michal Kubecek wrote:
>>>>>> On Fri, Dec 08, 2017 at 08:21:16AM +0100, Andreas Hartmann wrote:
>>>>>>>
>>>>>>> All my VMs are using virtio_net. BTW: I didn't see the problems
>>>>>>> (sometimes the VM couldn't be stopped at all) when all my VMs used
>>>>>>> e1000 as the interface instead.
>>>>>>>
>>>>>>> This finding matches the UDP packet that caused the stall, which I
>>>>>>> already mentioned here [2].
>>>>>>>
>>>>>>> To prove it, I reverted the following patches from the series
>>>>>>> "[PATCH v2 RFC 0/13] Remove UDP Fragmentation Offload support" [3]:
>>>>>>>
>>>>>>> 11/13 [v2,RFC,11/13] net: Remove all references to SKB_GSO_UDP. [4]
>>>>>>> 12/13 [v2,RFC,12/13] inet: Remove software UFO fragmenting code. [5]
>>>>>>> 13/13 [v2,RFC,13/13] net: Kill NETIF_F_UFO and SKB_GSO_UDP. [6]
>>>>>>>
>>>>>>> and applied the revert to Linux 4.14.4. It compiles and runs fine.
>>>>>>> The vnet doesn't die anymore. However, I can't tell yet whether the
>>>>>>> qemu hangs on stop are gone, too.
>>>>>>>
>>>>>>> Obviously, something is broken in the new UDP handling. Could you
>>>>>>> please analyze this problem? I can test more patches.
>>>>>>
>>>>>> Any chance your VMs were live migrated from pre-4.14 host kernel?
>>>>>
>>>>> No - the VMs are not live migrated. They are always running on the same
>>>>> host - either with kernel < 4.14 or with kernel 4.14.x.
>>>>
>>>> This is disturbing... unless I'm mistaken, it shouldn't be possible to
>>>> have UFO enabled on a virtio device in a VM booted on a host with 4.14
>>>> kernel.
>>>
>>> Indeed. When working on that revert patch I verified that UFO in
>>> the guest virtio_net was off before the revert patch and on after it.
>>>
>>> Qemu should check host support with tap_probe_has_ufo
>>> before advertising support to the guest. Indeed, this is exactly
>>> what broke live migration in virtio_net_load_device at
>>>
>>>     if (qemu_get_byte(f) && !peer_has_ufo(n)) {
>>>         error_report("virtio-net: saved image requires TUN_F_UFO support");
>>>         return -1;
>>>     }
>>>
>>> Which follows
>>>
>>>    peer_has_ufo
>>>      qemu_has_ufo
>>>        tap_has_ufo
>>>          s->has_ufo
>>>
>>> where s->has_ufo was set by tap_probe_has_ufo in net_tap_fd_init.
>>>
>>> Now, checking my qemu git branch, I see I ran a pretty old 2.7.0-rc3.
>>> But this codepath does not seem to have changed between then and 2.10.1.
>>>
>>> I cherry-picked the revert onto 4.14.3. It did not apply cleanly, but the
>>> fix-up wasn't too hard. Compiled and booted, but untested otherwise. At
>>>
>>>   https://github.com/wdebruij/linux/commits/v4.14.3-aargh-ufo
>>
>> I'm running it at the moment. I haven't hit any network hang so far,
>> although the critical UDP packets have already gone through.
>> So: looks good.
> 
> Well, the patch does not fix VMs that hang after shutdown and can't be
> killed anymore.
> Judging from the stack trace
> 
> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> I had hoped that the two problems were related, but that does not seem
> to be the case.

However, it turned out that reverting the complete patchset "Remove UDP
Fragmentation Offload support" prevents the hanging qemu processes. I
tested it today and over the whole weekend and didn't hit any problem.
Before that, I saw hanging qemu processes almost immediately after
shutting down VMs with libvirt.

Tested with Linux 4.14.4, qemu 2.6.2, libvirt 2.0.0 and 4 VMs.

I'll report back if the problem comes up again while the patchset is
reverted.


Thanks,
Andreas

[-- Attachment #2: Revert - Remove UDP Fragmentation Offload support.tar.xz --]
[-- Type: application/x-xz, Size: 9152 bytes --]


Thread overview: 34+ messages
2017-11-26 14:17 Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected Andreas Hartmann
2017-11-27 16:46 ` Andreas Hartmann
2017-11-27 16:55   ` Michal Kubecek
2017-11-27 19:09     ` Andreas Hartmann
2017-12-01 10:11 ` Andreas Hartmann
2017-12-03 11:35   ` Andreas Hartmann
2017-12-04 16:28     ` Andreas Hartmann
2017-12-05  3:50       ` Jason Wang
2017-12-05 16:23         ` Andreas Hartmann
2017-12-06  3:08           ` Jason Wang
2017-12-08  7:21             ` Andreas Hartmann
2017-12-08  8:47               ` Michal Kubecek
2017-12-08 10:31                 ` Andreas Hartmann
2017-12-08 11:40                   ` Michal Kubecek
2017-12-08 12:45                     ` Andreas Hartmann
2017-12-08 12:58                       ` Michal Kubecek
2017-12-08 13:13                         ` Andreas Hartmann
2017-12-08 15:11                           ` Jason Wang
2017-12-08 16:04                     ` Willem de Bruijn
2017-12-08 20:11                       ` Andreas Hartmann
2017-12-08 20:44                         ` Andreas Hartmann
2017-12-11 15:54                           ` Andreas Hartmann [this message]
2017-12-14 16:31                             ` Andreas Hartmann
2017-12-14 22:17                             ` Willem de Bruijn
2017-12-14 22:47                               ` Willem de Bruijn
2017-12-15  6:05                               ` Andreas Hartmann
2017-12-17 22:33                                 ` Willem de Bruijn
2017-12-18 17:11                                   ` Andreas Hartmann
2017-12-20 15:56                                     ` Andreas Hartmann
2017-12-20 22:44                                       ` Willem de Bruijn
2017-12-21 17:05                                         ` Andreas Hartmann
2017-12-21 17:11                                           ` Willem de Bruijn
2017-12-24 16:24                                       ` Andreas Hartmann
2017-12-24 18:54                                         ` Willem de Bruijn
