From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hill
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Wed, 6 Dec 2017 21:44:46 -0500
Message-ID:
References: <92c4f997-80db-fabf-98c8-fcb92da064a7@redhat.com>
 <7bd45f84-d07e-7fca-6ca3-07dededd092d@redhat.com>
 <29f8e09f-8920-52d0-02f4-c0fb779135ee@redhat.com>
 <9c912f3b-081c-8b02-17c8-453ebf36f42c@redhat.com>
 <10fe2b98-1e26-9539-9f49-0d01f8693e04@redhat.com>
 <6b41b4e5-6c0c-fce6-21fe-02dd8f550095@redhat.com>
 <634116a6-6338-4249-7d2d-430b654cc99c@redhat.com>
 <1f789868-7fda-3553-7078-3298873fb355@redhat.com>
 <918c4152-bcf9-b28c-0f54-f51d07d82bfc@redhat.com>
 <68b5d4aa-1d48-d9a1-fc47-62ee8d7ad07a@redhat.com>
 <623df785-b79c-80d1-899f-6fcc10f70e69@redhat.com>
 <61be2e2b-9aeb-1a82-d607-a6af00f8c9c6@redhat.com>
 <08dddfe6-97c0-95a3-8b26-327c14d618a8@hakimo.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
To: Harald Moeller , kvm@vger.kernel.org
Return-path:
Received: from mail-qt0-f172.google.com ([209.85.216.172]:45403 "EHLO mail-qt0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751182AbdLGCou (ORCPT ); Wed, 6 Dec 2017 21:44:50 -0500
Received: by mail-qt0-f172.google.com with SMTP id g10so14105129qtj.12 for ; Wed, 06 Dec 2017 18:44:50 -0800 (PST)
In-Reply-To: <08dddfe6-97c0-95a3-8b26-327c14d618a8@hakimo.net>
Content-Language: en-US
Sender: kvm-owner@vger.kernel.org
List-ID:

Have you tried adding this:

cat <<EOF > /etc/modprobe.d/vhost-net.conf
options vhost_net experimental_zcopytx=0
EOF
reboot

Other than this, you can try bisecting, but in my case the system won't
boot once the bisect reaches a certain commit.

On 2017-12-02 11:37 AM, Harald Moeller wrote:
> Hello, my name is Harry and this is my first post here, hope I'm doing
> this the right way, sorry if not ...
>
> I'm not a subscriber to the full list yet so I understand I shall ask
> you to be personally CCed.
>
> I am following this as I do experience the same (or sort-a same) issue
> with 4.14.2.
>
> My setup is more simple, just an oVirt host shutting down some VMs.
> Doesn't happen all the time but I'd say around 3 from 10.
>
> This is what I see (slightly different from David):
>
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 blocked for more than 120 seconds.
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          I     4.14.2-1.el7.hakimo.x86_64 #4
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0 1173      1 0x00000084
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  __schedule+0x28d/0x880
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? remove_wait_queue+0x60/0x60
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  vhost_net_ioctl+0x317/0x8e0 [vhost_net]
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  do_vfs_ioctl+0xa7/0x5f0
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  do_syscall_64+0x67/0x1b0
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  entry_SYSCALL64_slow_path+0x25/0x25
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 000055abaa2d29c0 RCX: 00007fb8862d1107
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 000000004008af30 RDI: 0000000000000028
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 000055aba805e10f R09: 00000000ffffffff
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 000055ababf32510
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 000055ababf32498 R15: 000055abaa2a0b40
>
> This is still happening after reverting the three suggested commits
>
> 1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
>
> c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct
> ubuf_info)->refcnt to refcount_t")
>
> 581fe0ea61584d88072527ae9fb9dcb9d1f2783e {"net: orphan frags on
> stand-alone ptype in dev_queue_xmit_nit"}
>
> Anything I could be helpful with trying to solve this? Any more info I
> could provide?
>
> Harry
>
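
If you do try the experimental_zcopytx=0 option, it may be worth confirming after the reboot that it actually took effect before testing shutdowns again. A rough sketch, assuming the parameter is readable at the usual sysfs location for module parameters; the function name check_zcopytx and the optional path argument are made up for illustration:

```shell
# Report whether vhost_net zero-copy TX is disabled.
# Assumption: the value is exposed under /sys/module/<module>/parameters/.
# An alternative path can be passed as $1 (hypothetical convenience).
check_zcopytx() {
    param="${1:-/sys/module/vhost_net/parameters/experimental_zcopytx}"

    # The file only exists while the vhost_net module is loaded.
    if [ ! -r "$param" ]; then
        echo "cannot read $param (is vhost_net loaded?)"
        return 2
    fi

    value=$(cat "$param")
    if [ "$value" = "0" ]; then
        echo "zerocopy tx disabled"
        return 0
    fi

    echo "zerocopy tx still enabled (value=$value)"
    return 1
}
```

Paste it into a root shell on the host and run check_zcopytx with no arguments. If it reports that zerocopy tx is still enabled, the modprobe.d file was probably not picked up when the module was loaded.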