From: Jason Wang
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Mon, 27 Nov 2017 11:44:55 +0800
Message-ID: <10fe2b98-1e26-9539-9f49-0d01f8693e04@redhat.com>
References: <92c4f997-80db-fabf-98c8-fcb92da064a7@redhat.com>
 <7bd45f84-d07e-7fca-6ca3-07dededd092d@redhat.com>
 <29f8e09f-8920-52d0-02f4-c0fb779135ee@redhat.com>
 <9c912f3b-081c-8b02-17c8-453ebf36f42c@redhat.com>
In-Reply-To: <9c912f3b-081c-8b02-17c8-453ebf36f42c@redhat.com>
To: David Hill, Paolo Bonzini, kvm@vger.kernel.org

On 2017-11-25 00:22, David Hill wrote:
> The VMs all have 2 vNICs ... and this is the hypervisor:
>
> [root@zappa ~]# brctl show
> bridge name    bridge id        STP enabled    interfaces
> virbr0        8000.525400914858    yes        virbr0-nic
>                             vnet0
>                             vnet1
>
> 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: eno1: mtu 1500 qdisc mq state UP group default qlen 1000
>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>        valid_lft 48749sec preferred_lft 48749sec
>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>        valid_lft forever preferred_lft forever
> 3: eno2: mtu 1500 qdisc mq state UP group default qlen 1000
>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>        valid_lft forever preferred_lft forever
>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>        valid_lft forever preferred_lft forever
> 4: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.10/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.11/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.12/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.15/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.16/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.17/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.18/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.31/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.32/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.33/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.34/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.35/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.36/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.37/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.45/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.46/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.47/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.48/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.49/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.50/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.51/32 scope global virbr0
>        valid_lft forever preferred_lft forever
> 5: virbr0-nic: mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
> 125: tun0: mtu 1360 qdisc fq_codel state UNKNOWN group default qlen 100
>     link/none
>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>        valid_lft forever preferred_lft forever
>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>        valid_lft forever preferred_lft forever
> 402: vnet0: mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>        valid_lft forever preferred_lft forever
> 403: vnet1: mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>        valid_lft forever preferred_lft forever

I could not reproduce this locally by simply running netperf through an mlx4 card. Some more questions:

- What kind of workloads did you run in the guest?
- Did you hit this issue only with a specific type of network card (I guess a Broadcom NIC is used in this case)?
- virbr0 looks like a bridge created by libvirt that does NAT and other things; can you still hit this issue if you don't use virbr0?

More importantly, zerocopy is known to have issues; for a production environment it should be disabled through the vhost_net module parameters, as in the sketch below.
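A minimal sketch of doing that, assuming the zerocopy TX path is controlled by vhost_net's experimental_zcopytx parameter, that it is read-only at runtime (so the module must be reloaded while no VM is using vhost_net), and that your distro reads module options from /etc/modprobe.d:

    # Check the current setting (1 = zerocopy TX enabled):
    cat /sys/module/vhost_net/parameters/experimental_zcopytx

    # Make the setting persistent, then reload the module
    # (shut down all VMs backed by vhost_net first):
    echo "options vhost_net experimental_zcopytx=0" > /etc/modprobe.d/vhost-net.conf
    modprobe -r vhost_net
    modprobe vhost_net

Thanks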