From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Wang
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Tue, 19 Dec 2017 11:36:19 +0800
Message-ID:
References: <10fe2b98-1e26-9539-9f49-0d01f8693e04@redhat.com>
 <6b41b4e5-6c0c-fce6-21fe-02dd8f550095@redhat.com>
 <634116a6-6338-4249-7d2d-430b654cc99c@redhat.com>
 <1f789868-7fda-3553-7078-3298873fb355@redhat.com>
 <918c4152-bcf9-b28c-0f54-f51d07d82bfc@redhat.com>
 <68b5d4aa-1d48-d9a1-fc47-62ee8d7ad07a@redhat.com>
 <623df785-b79c-80d1-899f-6fcc10f70e69@redhat.com>
 <61be2e2b-9aeb-1a82-d607-a6af00f8c9c6@redhat.com>
 <094aabc6-4e6b-841e-2b7b-177b31e8ed07@redhat.com>
 <9da15781-b6e0-3688-f6b2-2ef483b39d0d@redhat.com>
 <2c153ff8-57cc-715b-6d2f-1758bcb66abb@redhat.com>
 <4c8c81e6-e582-f292-79ed-f3d62518e2d9@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: Willem de Bruijn , netdev
To: David Hill , Paolo Bonzini , kvm@vger.kernel.org
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:38882 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934063AbdLSDg1 (ORCPT ); Mon, 18 Dec 2017 22:36:27 -0500
In-Reply-To:
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 2017-12-12 11:53, David Hill wrote:
>
>
> On 2017-12-08 01:03 PM, David Hill wrote:
>>
>>
>> On 2017-12-07 12:13 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017-12-07 12:42, David Hill wrote:
>>>>
>>>>
>>>> On 2017-12-06 11:34 PM, David Hill wrote:
>>>>>
>>>>>
>>>>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>>>>
>>>>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017-12-02 00:38, David Hill wrote:
>>>>>>>>>
>>>>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e
>>>>>>>>> too ... compiling and I'll keep you posted.
>>>>>>>>
>>>>>>>> So I'm still able to reproduce this issue even with reverting
>>>>>>>> these 3 commits.
>>>>>>>> Would you have other suspect commits ?
>>>>>>>
>>>>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>>>>
>>>>>>> Looks like somebody else is hitting your issue too (see
>>>>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>>>>
>>>>>>> But he claims the issue was fixed by using qemu 2.10.1.
>>>>>>>
>>>>>>> So you may:
>>>>>>>
>>>>>>> -try to see if qemu 2.10.1 solves your issue
>>>>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>>>>> -if not, try to see if commit
>>>>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier
>>>>>>> hooks for devmap bpf map") is the first bad commit
>>>>>> I'll try to see what I can do here
>>>>> I'm looking at that commit and it was introduced before v4.13,
>>>>> if I'm not mistaken, while this issue appeared between v4.13 and
>>>>> v4.14-rc1. Between those two releases, there are 1352 commits.
>>>>> Is there a way to quickly know which commits are touching
>>>>> vhost-net, zerocopy ?
>>>>>
>>>>>
>>>>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>>> [ 7496.553074]  schedule+0x3d/0x90
>>>>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>>> [ 7496.553100]  ? finish_wait+0x90/0x90
>>>>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>>
>>>> That vhost_net_ubuf_put_and_wait call has been changed in this
>>>> commit with the following comment:
>>>>
>>>> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
>>>> Author: Michael S. Tsirkin
>>>> Date:   Thu Feb 13 11:42:05 2014 +0200
>>>>
>>>>     vhost: fix ref cnt checking deadlock
>>>>
>>>>     vhost checked the counter within the refcnt before decrementing.
>>>>     It really wanted to know that it is the one that has the last
>>>>     reference, as a way to batch freeing resources a bit more
>>>>     efficiently.
>>>>
>>>>     Note: we only let refcount go to 0 on device release.
>>>>
>>>>     This works well but we now access the ref counter twice so
>>>>     there's a race: all users might see a high count and decide to
>>>>     defer freeing resources.
>>>>     In the end no one initiates freeing resources until the last
>>>>     reference is gone (which is on VM shotdown so might happen
>>>>     after a looooong time).
>>>>
>>>>     Let's do what we probably should have done straight away:
>>>>     switch from kref to plain atomic, documenting the
>>>>     semantics, return the refcount value atomically after decrement,
>>>>     then use that to avoid the deadlock.
>>>>
>>>>     Reported-by: Qin Chuanyu
>>>>     Signed-off-by: Michael S. Tsirkin
>>>>     Acked-by: Jason Wang
>>>>     Signed-off-by: David S. Miller
>>>>
>>>>
>>>>
>>>> So at this point, are we hitting a deadlock when using
>>>> experimental_zcopytx ?
>>>
>>> Yes. But there is another possibility: it may not be caused by
>>> vhost_net itself but by some other place that holds a packet.
>>>
>>> Thanks
>>
>> While bisecting, when I reach this commit
>> 46d4b68f891bee5d83a32508bfbd9778be6b1b63, the kernel panics
>> when I run virt-customize :
>>
>> Message from syslogd@zappa at Dec  8 12:52:06 ...
>>  kernel:[  350.016376] Kernel panic - not syncing: Fatal exception in interrupt
>>
>> I marked that commit as bad again.   Will continue bisecting!
>>
>
> It looks like the first bad commit would be the following:
>
> [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
> Author: Willem de Bruijn
> Date:   Thu Aug 3 16:29:38 2017 -0400
>
>     sock: skb_copy_ubufs support for compound pages
>
>     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>     zerocopy sendmsg, such fragments may appear.
>
>     The existing code replaces each page one for one. Splitting each
>     compound page into an independent number of regular pages can result
>     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>
>     Instead, fill all destination pages but the last to PAGE_SIZE.
>     Split the existing alloc + copy loop into separate stages:
>     1. compute bytelength and minimum number of pages to store this.
>     2. allocate
>     3. copy, filling each page except the last to PAGE_SIZE bytes
>     4. update skb frag array
>
>     Signed-off-by: Willem de Bruijn
>     Signed-off-by: David S. Miller
>
> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 4fc8384362693e4619fab39b0a945f6f2349226b M    net
>
> Here is the bisect log:

Thanks for the hard bisecting. Cc netdev and Willem.
>
> [root@zappa linux-stable-new]# git bisect log
> git bisect start
> # bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
> git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
> # good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16
> git bisect good e87c13993f16549e77abce9744af844c55154349
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1
> # good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 'staging-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733
> # bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add notification of end-of-Tx phase
> git bisect bad e833251ad813168253fef9915aaf6a8c883337b0
> # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
> git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
> # good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix the FIFO numbers in A000 devices
> git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4
> # good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the typedef sctp_addiphdr_t
> git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc
> # bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support for MII ioctl access to Clause 45 PHYs
> git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558
> # bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY
> git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3
> # good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the FIB notification chain generic
> git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844
> # good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate host route according to node pointer upon loopback up
> git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c
> # good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: spectrum_router: Add support for route replace
> git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549
> # good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 'mlxsw-Support-for-IPv6-UC-router'
> git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e
> # bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
> git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7
> # good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs from optmem
> git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f
> # first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
>
>
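
For anyone reading the commit message of 3ece782693c4 quoted above, the page-count argument behind stage 1 can be sketched roughly as follows. This is illustrative arithmetic in Python, not the kernel code. PAGE_SIZE = 4096 and MAX_SKB_FRAGS = 17 are assumed values for a 4 KiB-page build, the helper names are made up, and the "one for one" model (each fragment split along its own page boundaries) is only one reading of the commit text.

```python
# Illustrative arithmetic only; this is not the kernel implementation.
PAGE_SIZE = 4096      # assumption: 4 KiB pages
MAX_SKB_FRAGS = 17    # assumption: limit on a 4 KiB-page build


def pages_one_for_one(frags):
    """Pages needed if each fragment is split along its own page boundaries.

    Each fragment is a hypothetical (offset, length) pair inside a
    compound page.
    """
    total = 0
    for offset, length in frags:
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        total += last - first + 1
    return total


def pages_packed(frags):
    """Pages needed when every destination page but the last is filled."""
    bytelen = sum(length for _, length in frags)
    return (bytelen + PAGE_SIZE - 1) // PAGE_SIZE


# Hypothetical skb: 9 fragments of 4096 bytes, each starting 100 bytes
# into its compound page, so each one straddles two regular pages.
frags = [(100, 4096)] * 9

split = pages_one_for_one(frags)
packed = pages_packed(frags)
print("one-for-one:", split, "exceeds limit:", split > MAX_SKB_FRAGS)
print("packed:     ", packed, "fits:", packed <= MAX_SKB_FRAGS)
```

With unaligned data the per-fragment split needs more pages than the packed layout, which is the case the commit message says can exceed MAX_SKB_FRAGS. Whether that change is related to the vhost_net hang reported in this thread is not something the sketch can show.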