From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hill
Subject: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover. [1]
Date: Mon, 13 Nov 2017 10:54:09 -0500
Message-ID: <92c4f997-80db-fabf-98c8-fcb92da064a7@redhat.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
To: kvm@vger.kernel.org
Return-path:
Received: from mail-qk0-f177.google.com ([209.85.220.177]:52379 "EHLO mail-qk0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753143AbdKMPyM (ORCPT ); Mon, 13 Nov 2017 10:54:12 -0500
Received: by mail-qk0-f177.google.com with SMTP id a194so18102701qkc.9 for ; Mon, 13 Nov 2017 07:54:11 -0800 (PST)
Received: from [192.168.1.27] (modemcable010.138-178-173.mc.videotron.ca. [173.178.138.10]) by smtp.gmail.com with ESMTPSA id d205sm7623292qke.21.2017.11.13.07.54.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 Nov 2017 07:54:09 -0800 (PST)
In-Reply-To:
Content-Language: en-US
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi guys,

Starting with kernel 4.14-rc1, my CI sometimes fails to completely shut down one of the VMs. The VM stays stuck "in shutdown" and the host logs this kernel message:

[ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more than 120 seconds.
[ 7496.552987]       Tainted: G          I 4.14.0-0.rc1.git3.1.fc28.x86_64 #1
[ 7496.552996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
[ 7496.553024] Call Trace:
[ 7496.553044]  __schedule+0x2dc/0xbb0
[ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
[ 7496.553074]  schedule+0x3d/0x90
[ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
[ 7496.553100]  ? finish_wait+0x90/0x90
[ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
[ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
[ 7496.553166]  SyS_ioctl+0x79/0x90
[ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 7496.553190] RIP: 0033:0x7fa1ea0e1817
[ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007fa1ea0e1817
[ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI: 0000000000000021
[ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09: 000055e330245d92
[ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12: 000055e33351a000
[ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15: 0000000000000000
[ 7496.553284]                Showing all locks held in the system:
[ 7496.553313] 1 lock held by khungtaskd/161:
[ 7496.553319]  #0:  (tasklist_lock){.+.+}, at: [] debug_show_all_locks+0x3d/0x1a0
[ 7496.553373] 1 lock held by in:imklog/1194:
[ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at: [] __fdget_pos+0x4c/0x60
[ 7496.553541] 1 lock held by qemu-system-x86/5978:
[ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at: [] vhost_net_ioctl+0x358/0x910 [vhost_net]

I'm currently bisecting to figure out which commit breaks this, but for some reason, when I hit this commit:

# bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63

the host will not allow SSHd to establish a new session, and when a KVM guest is started the host hard locks. I'm still bisecting, but I marked that commit as bad even though it might actually be good. Hopefully it really is a bad one and my bisection will pinpoint which commit broke the kernel.

If you have an idea of which commit might have broken this, please let me know which one I should test first.

Thank you very much,

David Hill

[1] https://bugzilla.kernel.org/show_bug.cgi?id=197861
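
In case it helps anyone trying to reproduce this in their own CI: the hung-task report quoted above starts with a fixed "INFO: task NAME:PID blocked for more than N seconds" line, so a job could flag it by scanning the kernel log. This is only a rough sketch, not what my CI actually runs; the function name find_hung_tasks and the regular expression are my own assumptions based on the single log line in this report.

```python
import re

# Assumed shape of the hung-task header, taken from the one line quoted above:
#   INFO: task qemu-system-x86:5978 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a (comm, pid, seconds) tuple for each hung-task header found."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]
```

Feeding it the output of dmesg after the guest shutdown step and failing the job when the returned list is non-empty should be enough to catch the hang without waiting for someone to notice the stuck VM.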