From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harald Moeller
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Sat, 2 Dec 2017 17:37:02 +0100
Message-ID: <08dddfe6-97c0-95a3-8b26-327c14d618a8@hakimo.net>
References: <92c4f997-80db-fabf-98c8-fcb92da064a7@redhat.com>
 <7bd45f84-d07e-7fca-6ca3-07dededd092d@redhat.com>
 <29f8e09f-8920-52d0-02f4-c0fb779135ee@redhat.com>
 <9c912f3b-081c-8b02-17c8-453ebf36f42c@redhat.com>
 <10fe2b98-1e26-9539-9f49-0d01f8693e04@redhat.com>
 <6b41b4e5-6c0c-fce6-21fe-02dd8f550095@redhat.com>
 <634116a6-6338-4249-7d2d-430b654cc99c@redhat.com>
 <1f789868-7fda-3553-7078-3298873fb355@redhat.com>
 <918c4152-bcf9-b28c-0f54-f51d07d82bfc@redhat.com>
 <68b5d4aa-1d48-d9a1-fc47-62ee8d7ad07a@redhat.com>
 <623df785-b79c-80d1-899f-6fcc10f70e69@redhat.com>
 <61be2e2b-9aeb-1a82-d607-a6af00f8c9c6@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
To: kvm@vger.kernel.org
Return-path:
Received: from mout.kundenserver.de ([212.227.126.135]:61717 "EHLO
 mout.kundenserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751859AbdLBQhH (ORCPT ); Sat, 2 Dec 2017 11:37:07 -0500
Received: from hakimo.net ([91.39.86.182]) by mrelayeu.kundenserver.de
 (mreue005 [212.227.15.167]) with ESMTPSA (Nemesis)
 id 0Me7T8-1ejq0d3hmg-00Pqm5 for ; Sat, 02 Dec 2017 17:37:05 +0100
In-Reply-To: <61be2e2b-9aeb-1a82-d607-a6af00f8c9c6@redhat.com>
Content-Language: en-US
Sender: kvm-owner@vger.kernel.org
List-ID:

Hello,

my name is Harry and this is my first post here. I hope I'm doing this the right way; sorry if not ...

I'm not subscribed to the full list yet, so I understand I should ask to be CCed personally.

I am following this thread because I see the same (or roughly the same) issue with 4.14.2. My setup is simpler, just an oVirt host shutting down some VMs. It doesn't happen every time, but I'd say in around 3 out of 10 shutdowns.

This is what I see (slightly different from David):

Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 blocked for more than 120 seconds.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          I     4.14.2-1.el7.hakimo.x86_64 #4
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0 1173      1 0x00000084
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  __schedule+0x28d/0x880
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? remove_wait_queue+0x60/0x60
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  vhost_net_ioctl+0x317/0x8e0 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  do_vfs_ioctl+0xa7/0x5f0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  do_syscall_64+0x67/0x1b0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  entry_SYSCALL64_slow_path+0x25/0x25
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 000055abaa2d29c0 RCX: 00007fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 000000004008af30 RDI: 0000000000000028
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 000055aba805e10f R09: 00000000ffffffff
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 000055ababf32510
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 000055ababf32498 R15: 000055abaa2a0b40

This is still happening after reverting the three suggested commits:

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct ubuf_info)->refcnt to refcount_t")
581fe0ea61584d88072527ae9fb9dcb9d1f2783e ("net: orphan frags on stand-alone ptype in dev_queue_xmit_nit")

Is there anything I can do to help solve this? Any more info I could provide?

Harry