From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.9 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 008A5C433E0 for ; Sun, 24 May 2020 19:12:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A2B0A207D8 for ; Sun, 24 May 2020 19:12:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="qhpWBa9i" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A2B0A207D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2D04180008; Sun, 24 May 2020 15:12:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2807280007; Sun, 24 May 2020 15:12:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1995580008; Sun, 24 May 2020 15:12:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0036.hostedemail.com [216.40.44.36]) by kanga.kvack.org (Postfix) with ESMTP id 01F5780007 for ; Sun, 24 May 2020 15:12:39 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id A8E06181AEF00 for ; Sun, 24 May 2020 19:12:39 +0000 (UTC) X-FDA: 76852559238.08.ducks54_51644ffe18d3c X-HE-Tag: ducks54_51644ffe18d3c X-Filterd-Recvd-Size: 9421 Received: from mail-ot1-f68.google.com (mail-ot1-f68.google.com [209.85.210.68]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Sun, 24 May 2020 19:12:39 +0000 (UTC) Received: by mail-ot1-f68.google.com with SMTP id g25so12321364otp.13 for ; Sun, 24 May 2020 12:12:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=pTUti2iXX/HBqPw+z4+24eIvkZrygpmEFB+h6bK3Nmw=; b=qhpWBa9i7c7elhwkam2/ANNIvn6v/uMNoMQt6T6WcKbBuR8oiWSKQTSsheuLHCMzQQ DirBC59lqDOEE56j5+q9esfziom9Fsj6g1FKhe1/raPCxstCJdspslppNNQog9Da4wBV TpFKk8QYOW3Y+X+6cCrxdsBPuu9iAbIRwCfUxdnu0yv+9Eig0nk9IO61hTnKA00+rpNm JRWM2ue8xw2c88OYjvcEisXmfYyrSHb7ItPRShFV8yI2ZMA4DRqBJWzLOcPSXgQYtBWo FohQ4USDDLrcGncoaYxNJR7kCkye3MTEApPNITgzTw5lczxKaM3ZDN0hBaHI22Yfzs2e owwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=pTUti2iXX/HBqPw+z4+24eIvkZrygpmEFB+h6bK3Nmw=; b=CWbdZAu3nO4SWNiIH2+E8gUbr41hqz7/uCE3PERrfxTAMjgNQnulfv4gV9LdBzFdQx C92hul7hgj8N68k+WzNKbvm9hDUoVEXtcKuC1sCaXSKx+8D36w6Y2d1g2/gKTQk4/cIK EtJBGYeXu7E98Bb5EnSavFtLLlvRO4R1FLhp9os3eY5H/jxjgHI/PtJU9NIw3yfxfgsK 1zkxVDkP0vkJVatgbPnAq+wBV/v9ZHdYj33inK+t0tnEEjZEbBzyFBNdEJAwdy479oU0 VYqViGTXDr2v8bT5abtaUl4qRA7G5pbNyjVnydVkr7nDhLjDj6gpGPWw+vp/00iV4pev ZE/g== X-Gm-Message-State: AOAM533fP6gMbH0Wpel1aXQ2xvhiW+7fD/jgpzZKdUoXzLNB27VpopN+ v+3EFrPeg5iBxWROPx0Dp69geQ== X-Google-Smtp-Source: ABdhPJwy+hC5I5E5Rn6xDqS7/+IUpTlI0yZmhW/si0NNH0sK7DmSuLqoI2I5jZCeaCHSdtM/qfbGCw== X-Received: by 2002:a9d:8e7:: with SMTP id 94mr18908037otf.370.1590347558212; Sun, 24 May 2020 12:12:38 -0700 (PDT) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id c26sm4403012oov.13.2020.05.24.12.12.36 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Sun, 24 May 2020 12:12:36 -0700 (PDT) Date: Sun, 24 May 2020 12:12:35 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Konstantin Khlebnikov cc: Hugh Dickins , Andrew Morton , Vlastimil Babka , David Rientjes , "Kirill A. Shutemov" , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] mm/compaction: avoid VM_BUG_ON(PageSlab()) in page_mapcount() In-Reply-To: <63fe94c7-78d1-ae03-00da-ba0e6d207a70@yandex-team.ru> Message-ID: References: <158937872515.474360.5066096871639561424.stgit@buzz> <63fe94c7-78d1-ae03-00da-ba0e6d207a70@yandex-team.ru> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, 24 May 2020, Konstantin Khlebnikov wrote: > On 24/05/2020 04.01, Hugh Dickins wrote: > > On Wed, 13 May 2020, Konstantin Khlebnikov wrote: > > > > > Function isolate_migratepages_block() runs some checks out of lru_lock > > > when choose pages for migration. After checking PageLRU() it checks extra > > > page references by comparing page_count() and page_mapcount(). Between > > > these two checks page could be removed from lru, freed and taken by slab. > > > > > > As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount(). > > > Race window is tiny. For certain workload this happens around once a > > > year. > > > > Around once a year, that was my guess too. I have no record of us ever > > hitting this, but yes it could happen when you have CONFIG_DEBUG_VM=y > > (which I too like to run with, but would not recommend for users). > > Yep, but for large cluster and pinpointed workload this happens surprisingly > frequently =) I've believed into this race only after seeing statistics for > count of compactions and how it correlates with incidents. > > Probably the key component is a slab allocation from network irq/bh context > which interrupts compaction exactly at this spot. Yes, I bet you're right. > > > > > > > > > > > > page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 > > > index:0x0 compound_mapcount: 0 > > > flags: 0x500000000008100(slab|head) > > > raw: 0500000000008100 dead000000000100 dead000000000200 > > > ffff88ff7712c180 > > > raw: 0000000000000000 0000000080200020 00000001ffffffff > > > 0000000000000000 > > > page dumped because: VM_BUG_ON_PAGE(PageSlab(page)) > > > ------------[ cut here ]------------ > > > kernel BUG at ./include/linux/mm.h:628! > > > invalid opcode: 0000 [#1] SMP NOPTI > > > CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W > > > 4.19.109-27 #1 > > > Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019 > > > RIP: 0010:isolate_migratepages_block+0x986/0x9b0 > > > > > > > > > To fix just opencode page_mapcount() in racy check for 0-order case and > > > recheck carefully under lru_lock when page cannot escape from lru. > > > > > > Also add checking extra references for file pages and swap cache. > > > > > > Signed-off-by: Konstantin Khlebnikov > > > Fixes: 119d6d59dcc0 ("mm, compaction: avoid isolating pinned pages") > > > > Not really, that commit was correct at the time it went in. > > > > > Fixes: 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()") > > > > Exactly, that commit was well-intentioned, but did not allow for this > > (admittedly very exceptional) usage. How many developers actually > > make the mistake of applying page_mapcount() to their slab pages? > > None, I expect. That VM_BUG_ON_PAGE() is there for documentation, > > and could just be replaced by a comment - and Linus would be happy > > with that. > > Ok, I'll redo the fix in this way. Thanks. > > > > > > --- > > > mm/compaction.c | 17 +++++++++++++---- > > > 1 file changed, 13 insertions(+), 4 deletions(-) > > > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > index 46f0fcc93081..91bb87fd9420 100644 > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -935,12 +935,16 @@ isolate_migratepages_block(struct compact_control > > > *cc, unsigned long low_pfn, > > > } > > > /* > > > - * Migration will fail if an anonymous page is pinned in > > > memory, > > > + * Migration will fail if an page is pinned in memory, > > > * so avoid taking lru_lock and isolating it unnecessarily in > > > an > > > - * admittedly racy check. > > > + * admittedly racy check simplest case for 0-order pages. > > > + * > > > + * Open code page_mapcount() to avoid > > > VM_BUG_ON(PageSlab(page)). > > > > But open coding page_mapcount() is not all that you did. You have > > (understandably) chosen to avoid calling page_mapping(page), but... > > > > > + * Page could have extra reference from mapping or swap > > > cache. > > > */ > > > - if (!page_mapping(page) && > > > - page_count(page) > page_mapcount(page)) > > > + if (!PageCompound(page) && > > > + page_count(page) > atomic_read(&page->_mapcount) + 1 + > > > + (!PageAnon(page) || PageSwapCache(page))) > > > goto isolate_fail; > > > > Isn't that test going to send all the file cache pages with buffer heads > > in page->private, off to isolate_fail when they're actually great > > candidates for migration? > > Yes. What a shame. Adding page_has_private() could fix that? > > Kind of > > page_count(page) > page_mapcount(page) + > (PageAnon(page) ? PageSwapCache(page) : (1 + page_has_private(page))) Certainly it was fixable, but I'm too lazy to want to think through the correct answer; and though I'm often out of sympathy with helper functions (why do people want an inline bool function for every simple flag test?!?!), here is a place that cries out for a helper, if you complicate it beyond page_count > page_mapcount (especially when driven into that detail of adding 1 to _mapcount). > > or probably something like this: > > page_count(page) > page_mapcount(page) + > (PageAnon(page) ? PageSwapCache(page) : GUP_PIN_COUNTING_BIAS) > > I.e. skip only file pages pinned by dma or something slower. > I see some movements in this direction in recent changes. > > of course that's independent matter. Yes, once the gup/pin conversion is widespread, I expect that it will allow a better implementation of this compaction test, one not limited to the anonymous pages. (We do internally use a patch extending the current test to file pages, which in practice has saved a lot of time wasted on failing compactions: but, last I looked anyway, it gets some cases wrong - cases we happen not to care about ourselves, but would be unacceptable upstream. So I hope the distinction of pinned pages will work out well here later.) Hugh