From mboxrd@z Thu Jan 1 00:00:00 1970
From: Willem de Bruijn
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Tue, 19 Dec 2017 11:19:51 -0500
Message-ID:
References: <10fe2b98-1e26-9539-9f49-0d01f8693e04@redhat.com>
 <6b41b4e5-6c0c-fce6-21fe-02dd8f550095@redhat.com>
 <634116a6-6338-4249-7d2d-430b654cc99c@redhat.com>
 <1f789868-7fda-3553-7078-3298873fb355@redhat.com>
 <918c4152-bcf9-b28c-0f54-f51d07d82bfc@redhat.com>
 <68b5d4aa-1d48-d9a1-fc47-62ee8d7ad07a@redhat.com>
 <623df785-b79c-80d1-899f-6fcc10f70e69@redhat.com>
 <61be2e2b-9aeb-1a82-d607-a6af00f8c9c6@redhat.com>
 <094aabc6-4e6b-841e-2b7b-177b31e8ed07@redhat.com>
 <9da15781-b6e0-3688-f6b2-2ef483b39d0d@redhat.com>
 <2c153ff8-57cc-715b-6d2f-1758bcb66abb@redhat.com>
 <4c8c81e6-e582-f292-79ed-f3d62518e2d9@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: David Hill, Paolo Bonzini, kvm@vger.kernel.org, Willem de Bruijn, netdev
To: Jason Wang
Return-path:
Received: from mail-ot0-f196.google.com ([74.125.82.196]:45719 "EHLO
 mail-ot0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1750801AbdLSQUc (ORCPT ); Tue, 19 Dec 2017 11:20:32 -0500
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

>> It looks like the first bad commit would be the following:
>>
>> [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
>> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
>> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
>> Author: Willem de Bruijn
>> Date: Thu Aug 3 16:29:38 2017 -0400
>>
>>     sock: skb_copy_ubufs support for compound pages
>>
>>     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>>     zerocopy sendmsg, such fragments may appear.
>>
>>     The existing code replaces each page one for one. Splitting each
>>     compound page into an independent number of regular pages can result
>>     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>>
>>     Instead, fill all destination pages but the last to PAGE_SIZE.
>>     Split the existing alloc + copy loop into separate stages:
>>     1. compute bytelength and minimum number of pages to store this.
>>     2. allocate
>>     3. copy, filling each page except the last to PAGE_SIZE bytes
>>     4. update skb frag array
>>
>>     Signed-off-by: Willem de Bruijn
>>     Signed-off-by: David S. Miller
>>
>> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2
>> 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M include
>> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb
>> 4fc8384362693e4619fab39b0a945f6f2349226b M net
>>
>> Here is the bisect log:
>
>
> Thanks for the hard bisecting.
>
> Cc netdev and Willem.

This is being discussed in

  http://lkml.kernel.org/r/

David also previously reported this at

  https://bugzilla.kernel.org/show_bug.cgi?id=197861

which has a pointer to the above thread, too. Let's discuss this in a
single thread. I have suggested a fix there.

Thanks for bisecting. Please also test the patch in the above thread
if possible.