From: Jason Wang
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Thu, 7 Dec 2017 13:12:04 +0800
Message-ID: <1c81f62d-b0fd-c17e-4e5f-2e8ba3e1413b@redhat.com>
To: David Hill, Paolo Bonzini, kvm@vger.kernel.org

On 2017-12-07 12:34, David Hill wrote:
>
>
> On 2017-12-04 02:51 PM, David Hill wrote:
>>
>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>
>>>
>>> On 2017-12-02 00:38, David Hill wrote:
>>>>>
>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too
>>>>> ... compiling and I'll keep you posted.
>>>>
>>>> So I'm still able to reproduce this issue even with these
>>>> 3 commits reverted. Would you have other suspect commits?
>>>
>>> Thanks for the testing. No, I don't have other suspect commits.
>>>
>>> Looks like somebody else is hitting your issue too (see
>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>
>>> But he claims the issue was fixed by using qemu 2.10.1.
>>>
>>> So you may:
>>>
>>> - try to see if qemu 2.10.1 solves your issue
>> It didn't solve it for him... it's only harder to reproduce. [1]
>>> - if not, try to see if commit
>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier hooks
>>> for devmap bpf map") is the first bad commit
>> I'll try to see what I can do here
> I'm looking at that commit and it was introduced before v4.13, if
> I'm not mistaken, while this issue appeared between v4.13 and v4.14-rc1.
> Between those two releases, there are 1352 commits.
> Is there a way to quickly know which commits are touching vhost-net
> or zerocopy?
>
>
> [ 7496.553044]  __schedule+0x2dc/0xbb0
> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
> [ 7496.553074]  schedule+0x3d/0x90
> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
> [ 7496.553100]  ? finish_wait+0x90/0x90
> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
> [ 7496.553166]  SyS_ioctl+0x79/0x90
> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe

E.g. you can do:

# git log --oneline v4.13..v4.14-rc1 drivers/vhost/net.c
8b949be vhost_net: correctly check tx avail during rx busy polling
c1d1b43 net: convert (struct ubuf_info)->refcnt to refcount_t
1f8b977 sock: enable MSG_ZEROCOPY
7a68ada Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

If I understand it correctly, you can still hit the issue before 1f8b977?
If so, you can probably bisect between 7a68ada and 1f8b977.
Thanks

>
>>> - if not, maybe you can continue your bisection through git bisect skip
>>>
>> Some commits are so broken that the system won't boot... What I
>> fear is that if I git bisect skip those commits, I'll also skip the
>> commit culprit of my original problem
>>
>> [1] https://www.spinics.net/lists/netdev/msg469887.html
>
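On the worry quoted above that `git bisect skip` might also skip the culprit: below is a minimal, self-contained sketch on a hypothetical throwaway repository. The eight toy commits, the `version` file and the check script are invented for illustration and have nothing to do with the kernel tree. Commit 5 both introduces the "regression" and is pretended to be unbootable, so the check script exits 125, which `git bisect run` treats as "skip this commit".

```shell
#!/bin/sh
# Toy demo on a HYPOTHETICAL throwaway repo: what git bisect does when the
# culprit commit itself has to be skipped (e.g. it does not boot).
set -eu

repo=$(mktemp -d)
check=$(mktemp)
log=$(mktemp)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# Eight commits; commit N writes the number N into the file "version".
for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > version
  git add version
  git commit -q -m "commit $i"
done

# Remember the hashes of commit 5 (the culprit) and commit 6 (first testable bad).
c5=$(git rev-parse HEAD~3)
c6=$(git rev-parse HEAD~2)

# Check script, following the git bisect run convention:
# exit 0 = good, exit 125 = skip, other exit codes from 1 to 127 = bad.
cat > "$check" <<'EOF'
v=$(cat version)
if [ "$v" -eq 5 ]; then exit 125; fi  # pretend commit 5 does not boot: skip it
if [ "$v" -ge 5 ]; then exit 1; fi    # the regression is present from commit 5 on
exit 0                                # commits 1-4 are good
EOF

git bisect start HEAD HEAD~7 >/dev/null   # bad = commit 8, good = commit 1
# bisect run exits non-zero when only skipped commits are left, so don't abort.
git bisect run sh "$check" >"$log" 2>&1 || true
git bisect reset >/dev/null 2>&1

cat "$log"   # show what bisect concluded
cd /
rm -rf "$repo" "$check"
```

In this toy run, bisect should not silently blame a wrong commit: it stops with "The first bad commit could be any of:" followed by the hashes of commit 5 (skipped) and commit 6 (first known bad), leaving a short candidate list to inspect by hand.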