From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7EB5CC2D0E4 for ; Tue, 24 Nov 2020 13:32:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0BF372076E for ; Tue, 24 Nov 2020 13:32:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0BF372076E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 797D56B012C; Tue, 24 Nov 2020 08:32:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7473D6B012E; Tue, 24 Nov 2020 08:32:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 635D66B012F; Tue, 24 Nov 2020 08:32:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0108.hostedemail.com [216.40.44.108]) by kanga.kvack.org (Postfix) with ESMTP id 4D1086B012C for ; Tue, 24 Nov 2020 08:32:10 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 103713631 for ; Tue, 24 Nov 2020 13:32:10 +0000 (UTC) X-FDA: 77519400420.22.wind99_090e6002736e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id D99C018038E60 for ; Tue, 24 Nov 2020 13:32:09 +0000 (UTC) X-HE-Tag: wind99_090e6002736e X-Filterd-Recvd-Size: 5750 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Tue, 24 Nov 2020 13:32:09 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id DD8CFAFC9; Tue, 24 Nov 2020 13:32:07 +0000 (UTC) Date: Tue, 24 Nov 2020 13:32:05 +0000 From: Mel Gorman To: Vlastimil Babka Cc: Andrea Arcangeli , Andrew Morton , linux-mm@kvack.org, Qian Cai , Michal Hocko , David Hildenbrand , linux-kernel@vger.kernel.org, Mike Rapoport , Baoquan He Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Message-ID: <20201124133205.GK3306@suse.de> References: <8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw> <20201121194506.13464-1-aarcange@redhat.com> <20201121194506.13464-2-aarcange@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote: > > I actually never reproduced it until v5.9, but it's still the same bug > > as it was reported first for v5.7. > > > > See the page is "reserved" in all 3 cases. In the last two crashes > > with the pfn: > > > > pfn 0x39800 -> 0x39800000 min_pfn hit non-RAM: > > > > 39639000-39814fff : Unknown E820 type > > > > pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM: > > > > 7a17b000-7a216fff : Unknown E820 type > > It would be nice to also provide a /proc/zoneinfo and how exactly the > "zone_spans_pfn" was violated. I assume we end up below zone's start_pfn, > but is it true? > It must be if zone_spans_pfn is getting triggered. > > This actually seems a false positive bugcheck, the page structures are > > valid and the zones are correct, just it's non-RAM but setting > > pageblockskip should do no harm. However it's possible to solve the > > crash without lifting the bugcheck, by enforcing the invariant that > > the free_pfn cursor doesn't point to reserved pages (which would be > > otherwise implicitly achieved through the PageBuddy check, except in > > the new fast_isolate_around() path). > > > > Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target") > > Signed-off-by: Andrea Arcangeli > > --- > > mm/compaction.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > index 13cb7a961b31..d17e69549d34 100644 > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -1433,7 +1433,10 @@ fast_isolate_freepages(struct compact_control *cc) > > page = pageblock_pfn_to_page(min_pfn, > > pageblock_end_pfn(min_pfn), > > cc->zone); > > - cc->free_pfn = min_pfn; > > + if (likely(!PageReserved(page))) > > PageReserved check seems rather awkward solution to me. Wouldn't it be more > obvious if we made sure we don't end up below zone's start_pfn (if my > assumption is correct) in the first place? > > When I check the code: > > unsigned long distance; > distance = (cc->free_pfn - cc->migrate_pfn); > low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2)); > min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1)); > > I think what can happen is that cc->free_pfn <= cc->migrate_pfn after the > very last isolate_migratepages(). I would hope that is not the case because they are not meant to overlap. However, if the beginning of the pageblock was not the start of a zone then the pages would be valid but the pfn would still be outside the zone boundary. If it was reserved, the struct page is valid but not suitable for set_pfnblock_flags_mask. However, it is a concern in general because the potential is there that pages are isolated from the wrong zone. > Then compact_finished() detects that in > compact_zone(), but only after migrate_pages() and thus > fast_isolate_freepages() is called. > > That would mean distance can be negative, or rather a large unsigned number > and low_pfn and min_pfn end up away from the zone? > > Or maybe the above doesn't happen, but cc->free_pfn gets so close to start > of the zone, that the calculations above make min_pfn go below start_pfn? > I think the last part is more likely, going below start_pfn > In any case I would rather make sure we stay within the expected zone > boundaries, than play tricks with PageReserved. Mel? > It would be preferable because this time it's PageReserved that happened to trip up an assumption in set_pfnblock_flags_mask but if it was a real zone and real page then compaction is migrating cross-zone which would be surprising. Maybe this untested patch? diff --git a/mm/compaction.c b/mm/compaction.c index 13cb7a961b31..ef1b5dacc289 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1330,6 +1330,10 @@ fast_isolate_freepages(struct compact_control *cc) low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2)); min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1)); + /* Ensure the PFN is within the zone */ + low_pfn = max(cc->zone->zone_start_pfn, low_pfn); + min_pfn = max(cc->zone->zone_start_pfn, min_pfn); + if (WARN_ON_ONCE(min_pfn > low_pfn)) low_pfn = min_pfn; -- Mel Gorman SUSE Labs