From: David Hill <dhill@redhat.com>
To: Jason Wang <jasowang@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Tue, 28 Nov 2017 13:00:55 -0500
Message-ID: <c63ba0d1-c0d2-85c6-ad1c-7f777b59eae8@redhat.com>
In-Reply-To: <6b41b4e5-6c0c-fce6-21fe-02dd8f550095@redhat.com>



On 2017-11-27 02:38 PM, David Hill wrote:
>
>
> On 2017-11-26 10:44 PM, Jason Wang wrote:
>>
>>
>> On 2017-11-25 00:22, David Hill wrote:
>>> The VMs all have 2 vNICs ... and this is the hypervisor:
>>>
>>> [root@zappa ~]# brctl show
>>> bridge name    bridge id        STP enabled    interfaces
>>> virbr0        8000.525400914858    yes        virbr0-nic
>>>                             vnet0
>>>                             vnet1
>>>
>>>
>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
>>> group default qlen 1000
>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>     inet 127.0.0.1/8 scope host lo
>>>        valid_lft forever preferred_lft forever
>>>     inet6 ::1/128 scope host
>>>        valid_lft forever preferred_lft forever
>>> 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>> UP group default qlen 1000
>>>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>>>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>>>        valid_lft 48749sec preferred_lft 48749sec
>>>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>> UP group default qlen 1000
>>>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>>>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
>>> state UP group default qlen 1000
>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.10/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.11/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.12/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.15/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.16/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.17/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.18/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.31/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.32/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.33/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.34/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.35/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.36/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.37/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.45/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.46/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.47/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.48/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.49/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.50/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.51/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>> 5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master 
>>> virbr0 state DOWN group default qlen 1000
>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>> 125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc 
>>> fq_codel state UNKNOWN group default qlen 100
>>>     link/none
>>>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>>>        valid_lft forever preferred_lft forever
>>> 402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>>>        valid_lft forever preferred_lft forever
>>>
>>
>> I could not reproduce this locally by simply running netperf through 
>> an mlx4 card. Some more questions:
>>
>> - What kind of workloads did you run in the guest?
>> - Did you hit this issue with a specific type of network card (I guess 
>> Broadcom is used in this case)?
>> - virbr0 looks like a bridge created by libvirt that does NAT and 
>> other things; can you still hit this issue if you don't use virbr0?
>>
>> More importantly, zerocopy is known to have issues; for production 
>> environments, you need to disable it through the vhost_net module 
>> parameters.
>>
>> Thanks
>
> I'm deploying an overcloud through an undercloud virtual machine... The 
> VM has 4 vCPUs and 16GB of RAM as well as two virtio NICs, so I'm using 
> only virtual hardware here.
> I spawn 7 VMs on the hypervisor and deploy an overcloud using tripleo 
> on them ... everything's virtual and if I remove the bridge, then I'll 
> have to configure each VM differently.
> The load is quite high on the VM that won't shut down, but when I shut 
> it down, it's doing nothing ...   This is a hard bug to troubleshoot 
> and I can't bisect the kernel because at some
> point the system simply won't boot properly.

I've disabled zerocopy with the following:

[root@zappa modprobe.d]# cat vhost-net.conf
options vhost_net  experimental_zcopytx=0
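
A modprobe.d option only applies when the module is loaded, so it is worth
checking that the running module really picked it up. The following is a
minimal Python sketch of such a check, not a definitive tool: the helper
names are hypothetical, and it assumes vhost_net is loaded and exposes
experimental_zcopytx read-only under /sys/module/vhost_net/parameters/.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the value of a loaded module's parameter, or None if unreadable.

    Assumes the parameter is exported under <sysfs_root>/<module>/parameters/.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, or parameter not exported through sysfs.
        return None


def parse_modprobe_options(text, module):
    """Collect key=value pairs from 'options <module> ...' lines.

    Simplified parser: skips blank lines and lines starting with '#',
    and ignores every other modprobe.d directive.
    """
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            for field in fields[2:]:
                key, sep, value = field.partition("=")
                if sep:
                    options[key] = value
    return options


if __name__ == "__main__":
    conf = "options vhost_net  experimental_zcopytx=0\n"
    print("configured:", parse_modprobe_options(conf, "vhost_net"))
    print("running:", read_module_param("vhost_net", "experimental_zcopytx"))
```

If the running value differs from the configured one, the module was
probably loaded before the file was written and needs to be reloaded with
no guests using it.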


And I haven't reproduced this issue so far.   The problem I have right 
now is that experimental_zcopytx has been enabled by default since this 
commit:

commit f9611c43ab0ddaf547b395c90fb842f55959334c
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Thu Dec 6 14:56:00 2012 +0200

     vhost-net: enable zerocopy tx by default

     Zero copy TX has been around for a while now.
     We seem to be down to eliminating theoretical bugs
     and performance tuning at this point:
     it's probably time to enable it by default so that
     most users get the benefit.

     Keep the flag around meanwhile so users can experiment
     with disabling this if they experience regressions.
     I expect that we will remove it in the future.

     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

I'll make a few more attempts at reproducing this issue and I'll keep you posted.

Thank you very much,

David Hill
