From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Hill Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover. Date: Wed, 22 Nov 2017 13:22:45 -0500 Message-ID: <29f8e09f-8920-52d0-02f4-c0fb779135ee@redhat.com> References: <92c4f997-80db-fabf-98c8-fcb92da064a7@redhat.com> <7bd45f84-d07e-7fca-6ca3-07dededd092d@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit To: kvm@vger.kernel.org Return-path: Received: from mail-qk0-f171.google.com ([209.85.220.171]:33173 "EHLO mail-qk0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751292AbdKVSWs (ORCPT ); Wed, 22 Nov 2017 13:22:48 -0500 Received: by mail-qk0-f171.google.com with SMTP id 78so17955321qkz.0 for ; Wed, 22 Nov 2017 10:22:48 -0800 (PST) Received: from [192.168.1.27] (modemcable010.138-178-173.mc.videotron.ca. [173.178.138.10]) by smtp.gmail.com with ESMTPSA id 46sm11442836qtx.65.2017.11.22.10.22.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 22 Nov 2017 10:22:46 -0800 (PST) In-Reply-To: <7bd45f84-d07e-7fca-6ca3-07dededd092d@redhat.com> Content-Language: en-US Sender: kvm-owner@vger.kernel.org List-ID: Anyone had time to take a little look at the kernel trace? On 2017-11-15 04:08 PM, David Hill wrote: > Hi guys, > >     The same behavior is seen with 4.15-rc0 : > > $ sudo virsh list >  Id    Name                           State > ---------------------------------------------------- >  2     undercloud-0-pike              in shutdown > > $ uname -a > Linux zappa.orion 4.15.0-0.rc0.git1.1.fc28.x86_64 #1 SMP Mon Nov 13 > 19:54:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > > $ ps aufxg | grep qemu > > qemu      8595 81.6 12.8 22831200 16961240 ?   D    11:44 214:26 > /usr/bin/qemu-system-x86_64 -machine accel=kvm -name > guest=undercloud-0-pike,debug-threads=on -S -object > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/master-key.aes > -machine > pc-i440fx-2.10,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu > Westmere -m 16384 -realtime mlock=off -smp > 4,sockets=4,cores=1,threads=1 -uuid > 0de0918d-f67e-426a-91c1-5e38e86c96b3 -no-user-config -nodefaults > -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/monitor.sock,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc > base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet > -no-shutdown -global PIIX4_PM.disable_s3=1 -global > PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device > ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device > ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 > -device > ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 > -device > ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive > file=/var/lib/jenkins/VMs/undercloud-0-pike.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 > -drive if=none,id=drive-ide0-0-0,readonly=on -device > ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=3 > -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:8e:41,bus=pci.0,addr=0x3,bootindex=1 > -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device > virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:b8:8b:a5,bus=pci.0,addr=0x8 > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 -chardev > spicevmc,id=charchannel0,name=vdagent -device > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 > -spice > port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on > -device > qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 > -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device > hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev > spicevmc,id=charredir0,name=usbredir -device > usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev > spicevmc,id=charredir1,name=usbredir -device > usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > Thank you very much, > > David Hill > > > On 2017-11-14 10:44 AM, Dave Hill wrote: >> I'm not able to bisect further as the system wont boot properly when >> I reach that commit. >> >> On Mon, Nov 13, 2017 at 10:54 AM, David Hill > > wrote: >> >> >>     Hi guys, >> >>        Starting with kernel 4.14-rc1, my CI failed to completely >>     shutdown one of the >>     VMs and it stuck in "in shutdown" while sending this kernel message: >> >>     [ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more >>     than 120 seconds. >>     [ 7496.552987]       Tainted: G          I >>     4.14.0-0.rc1.git3.1.fc28.x86_64 #1 >>     [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs" >>     disables this message. >>     [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004 >>     [ 7496.553024] Call Trace: >>     [ 7496.553044]  __schedule+0x2dc/0xbb0 >>     [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10 >>     [ 7496.553074]  schedule+0x3d/0x90 >>     [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net] >>     [ 7496.553100]  ? finish_wait+0x90/0x90 >>     [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net] >>     [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0 >>     [ 7496.553166]  SyS_ioctl+0x79/0x90 >>     [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe >>     [ 7496.553190] RIP: 0033:0x7fa1ea0e1817 >>     [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246 >>     ORIG_RAX: 0000000000000010 >>     [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX: >>     00007fa1ea0e1817 >>     [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI: >>     0000000000000021 >>     [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09: >>     000055e330245d92 >>     [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12: >>     000055e33351a000 >>     [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15: >>     0000000000000000 >>     [ 7496.553284] >>                    Showing all locks held in the system: >>     [ 7496.553313] 1 lock held by khungtaskd/161: >>     [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at: >>     [] debug_show_all_locks+0x3d/0x1a0 >>     [ 7496.553373] 1 lock held by in:imklog/1194: >>     [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at: >>     [] __fdget_pos+0x4c/0x60 >>     [ 7496.553541] 1 lock held by qemu-system-x86/5978: >>     [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at: >>     [] vhost_net_ioctl+0x358/0x910 [vhost_net] >> >> >> >>     I'm currently bisecting to figure out which commit breaks this but >>     for some reasons, when hitting this commit: >> >>     # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag >>     'wireless-drivers-next-for-davem-2017-08-07' of >> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next >> >>     git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63 >> >>     the host will not allow SSHd to establish a new session and when >>     starting a KVM guest, the host will hard lock.   I'm still >>     bisecting but I marked that commit as bad even though perhaps it >>     would be good. >>     Hopefully, this commit was a bad one and my bisection will >>     pinpoint which >>     commit broke the kernel.    If you have an idea of which commit >>     might break >>     the system, please let me know which one I should test first. >> >>     Thank you very much, >>     David Hill >> >>     [1] https://bugzilla.kernel.org/show_bug.cgi?id=197861 >>     >> >> >> >