Date: Thu, 13 Aug 2020 13:46:38 +0200
From: Michal Hocko
To: Mike Kravetz
Cc: Baoquan He, Wei Yang, akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page
 to workaround the nasty free_huge_page
Message-ID: <20200813114638.GJ9477@dhcp22.suse.cz>
References: <20200807091251.12129-1-richard.weiyang@linux.alibaba.com>
 <20200807091251.12129-11-richard.weiyang@linux.alibaba.com>
 <20200810021737.GV14854@MiWiFi-R3L-srv>
 <129cc03e-c6d5-24f8-2f3c-f5a3cc821e76@oracle.com>
 <20200811015148.GA10792@MiWiFi-R3L-srv>
 <20200811065406.GC4793@dhcp22.suse.cz>

On Tue 11-08-20 14:43:28, Mike Kravetz wrote:
> On 8/10/20 11:54 PM, Michal Hocko wrote:
> >
> > I have managed to forget all the juicy details since I made that
> > change. All that remains is that the surplus pages accounting was quite
> > tricky, and back then I didn't figure out a simpler method that would
> > give a consistent view of those counters. As mentioned above, I
> > suspect this could lead to premature allocation failures while the
> > migration is ongoing.
>
> It is likely lost in the e-mail thread, but the suggested change was to
> alloc_surplus_huge_page(). The code which allocates the migration target
> (alloc_migrate_huge_page) will not be changed. So, this should not be
> an issue.

OK, I've obviously missed that.

> > Sure, this is quite unlikely to happen, and the race window is likely
> > very small. Maybe this is even acceptable, but I would strongly
> > recommend having all this thinking documented in the changelog.
>
> I wrote down a description of what happens in the two different approaches,
> "temporary page" vs "surplus page". It is at the very end of this e-mail.
> When looking at the details, I came up with what may be an even better
> approach. Why not just call the low level routine to free the page instead
> of going through put_page/free_huge_page? At the very least, it saves a
> lock roundtrip, and there is no need to worry about the counters/accounting.
>
> Here is a patch to do that. However, we are optimizing a return path in
> a race condition that we are unlikely to ever hit. I 'tested' it by
> allocating an 'extra' page and freeing it via this method in
> alloc_surplus_huge_page.
>
> From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001
> From: Mike Kravetz
> Date: Tue, 11 Aug 2020 12:45:41 -0700
> Subject: [PATCH] hugetlb: optimize race error return in
>  alloc_surplus_huge_page
>
> The routine alloc_surplus_huge_page() could race with a pool
> size change. If this happens, the allocated page may not be needed.
> To free the page, the current code will 'Abuse temporary page to
> workaround the nasty free_huge_page codeflow'. Instead, directly
> call the low level routine that free_huge_page uses. This works
> out well because the page is new, we hold the only reference, and
> we already hold the hugetlb_lock.
>
> Signed-off-by: Mike Kravetz
> ---
>  mm/hugetlb.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 590111ea6975..ac89b91fba86 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1923,14 +1923,17 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  	/*
>  	 * We could have raced with the pool size change.
>  	 * Double check that and simply deallocate the new page
> -	 * if we would end up overcommiting the surpluses. Abuse
> -	 * temporary page to workaround the nasty free_huge_page
> -	 * codeflow
> +	 * if we would end up overcommiting the surpluses.
>  	 */
>  	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
> -		SetPageHugeTemporary(page);
> +		/*
> +		 * Since this page is new, we hold the only reference, and
> +		 * we already hold the hugetlb_lock, call the low level free
> +		 * page routine. This saves at least a lock roundtrip.
> +		 */
> +		(void)put_page_testzero(page);	/* don't call destructor */
> +		update_and_free_page(h, page);
>  		spin_unlock(&hugetlb_lock);
> -		put_page(page);
>  		return NULL;
>  	} else {
>  		h->surplus_huge_pages++;

Yes, this makes sense. I would have to think about this more to be
confident enough to give an Acked-by, but it looks sensible from a quick
glance.

Thanks!
-- 
Michal Hocko
SUSE Labs
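
As an aside for readers following the accounting discussion: below is a
minimal user-space sketch, in plain C, of the overcommit check at the end
of alloc_surplus_huge_page() and of the two cleanup strategies being
compared. The struct, the helper names, and the printed messages are
simplified stand-ins invented for illustration; this is not the kernel
code path itself.

/*
 * Minimal user-space sketch of the race check discussed above, assuming
 * simplified stand-ins for the kernel's hstate counters and free helpers.
 * Illustrative only; not the actual kernel implementation.
 */
#include <stdbool.h>
#include <stdio.h>

struct hstate_sketch {
	unsigned long surplus_huge_pages;
	unsigned long nr_overcommit_huge_pages;
};

/* Stand-in for the proposed path: free the unneeded page directly. */
static void free_directly(void)
{
	printf("direct: update_and_free_page() while still holding hugetlb_lock\n");
}

/* Stand-in for the old path: mark the page temporary and drop the ref. */
static void free_as_temporary(void)
{
	printf("old: SetPageHugeTemporary() + put_page() after dropping the lock\n");
}

/*
 * Decision point at the end of alloc_surplus_huge_page(): a concurrent
 * pool resize may have consumed the overcommit headroom while the new
 * page was being allocated.
 */
static bool account_surplus_page(struct hstate_sketch *h, bool use_direct_free)
{
	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
		/* Raced with a pool size change: the new page is not wanted. */
		if (use_direct_free)
			free_directly();
		else
			free_as_temporary();
		return false;
	}
	h->surplus_huge_pages++;	/* page is accounted as surplus */
	return true;
}

int main(void)
{
	/* Overcommit headroom already used up: the fresh page is discarded. */
	struct hstate_sketch h = {
		.surplus_huge_pages = 2,
		.nr_overcommit_huge_pages = 2,
	};

	account_surplus_page(&h, true);
	return 0;
}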