Date: Wed, 29 Jun 2022 14:39:17 +0800
From: Muchun Song
To: James Houghton
Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes,
    Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
    "Dr. David Alan Gilbert", linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
David Alan Gilbert" , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates Message-ID: References: <20220624173656.2033256-1-jthoughton@google.com> <20220624173656.2033256-3-jthoughton@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=bvufUtoD; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf20.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1656484766; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3fDtVdFXcJ1THjzBC8Y1/G61yWX9UP0wjExNA/N4V6k=; b=EKBSZYtgP9Bg6xXKbW28YEGqbqvFtzdYw9pv50zhb+x3fLAOdw8tnM5ctiur/E7iL1bD3F ibEGSnPAoGj7sX688rsUXDJZ1uqmBuKC2webZcBbXSLceB8qY/+Tirav5wciyLbDRWGUIK t937JjwSSMNTbCeseMH6wFMxrDS9Xlw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1656484766; a=rsa-sha256; cv=none; b=xgD/Us3dxf8bXabliWsHiC3OpTpr9Da/D2HjK93F6p4yX+RzvqwCSbrIq71jDyVzQ9Uii0 GJBoaGeo38iI/bDs4UIHBuhKJjNy9Jj2nXhQcyRUxY/iPi4uTEDALeYT2YukJIUVHa+SIY E53lPxZZX8fKvquYpUZrUXxawqdosiw= X-Stat-Signature: 98o7817oj8dd94i9xatoacpjx5cuas5y X-Rspamd-Queue-Id: 377F51C0042 Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=bvufUtoD; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf20.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Rspam-User: X-Rspamd-Server: rspam04 X-HE-Tag: 1656484763-571300 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jun 28, 2022 at 08:40:27AM -0700, James Houghton wrote: > On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz wrote: > > > > On 06/24/22 17:36, James Houghton wrote: > > > When using HugeTLB high-granularity mapping, we need to go through the > > > supported hugepage sizes in decreasing order so that we pick the largest > > > size that works. Consider the case where we're faulting in a 1G hugepage > > > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with > > > a PUD. By going through the sizes in decreasing order, we will find that > > > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too. > > > > > > Signed-off-by: James Houghton > > > --- > > > mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++--- > > > 1 file changed, 37 insertions(+), 3 deletions(-) > > > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > > index a57e1be41401..5df838d86f32 100644 > > > --- a/mm/hugetlb.c > > > +++ b/mm/hugetlb.c > > > @@ -33,6 +33,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > @@ -48,6 +49,10 @@ > > > > > > int hugetlb_max_hstate __read_mostly; > > > unsigned int default_hstate_idx; > > > +/* > > > + * After hugetlb_init_hstates is called, hstates will be sorted from largest > > > + * to smallest. 
> > > + */ > > > struct hstate hstates[HUGE_MAX_HSTATE]; > > > > > > #ifdef CONFIG_CMA > > > @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h) > > > kfree(node_alloc_noretry); > > > } > > > > > > +static int compare_hstates_decreasing(const void *a, const void *b) > > > +{ > > > + const int shift_a = huge_page_shift((const struct hstate *)a); > > > + const int shift_b = huge_page_shift((const struct hstate *)b); > > > + > > > + if (shift_a < shift_b) > > > + return 1; > > > + if (shift_a > shift_b) > > > + return -1; > > > + return 0; > > > +} > > > + > > > +static void sort_hstates(void) > > > +{ > > > + unsigned long default_hstate_sz = huge_page_size(&default_hstate); > > > + > > > + /* Sort from largest to smallest. */ > > > + sort(hstates, hugetlb_max_hstate, sizeof(*hstates), > > > + compare_hstates_decreasing, NULL); > > > + > > > + /* > > > + * We may have changed the location of the default hstate, so we need to > > > + * update it. > > > + */ > > > + default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz)); > > > +} > > > + > > > static void __init hugetlb_init_hstates(void) > > > { > > > struct hstate *h, *h2; > > > > > > - for_each_hstate(h) { > > > - if (minimum_order > huge_page_order(h)) > > > - minimum_order = huge_page_order(h); > > > + sort_hstates(); > > > > > > + /* The last hstate is now the smallest. */ > > > + minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]); > > > + > > > + for_each_hstate(h) { > > > /* oversize hugepages were init'ed in early boot */ > > > if (!hstate_is_gigantic(h)) > > > hugetlb_hstate_alloc_pages(h); > > > > This may/will cause problems for gigantic hugetlb pages allocated at boot > > time. See alloc_bootmem_huge_page() where a pointer to the associated hstate > > is encoded within the allocated hugetlb page. These pages are added to > > hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved > > hstate to add prep the gigantic page and add to the correct pool. Currently, > > gather_bootmem_prealloc is called after hugetlb_init_hstates. So, changing > > hstate order will cause errors. > > > > I do not see any reason why we could not call gather_bootmem_prealloc before > > hugetlb_init_hstates to avoid this issue. > > Thanks for catching this, Mike. Your suggestion certainly seems to > work, but it also seems kind of error prone. I'll have to look at the > code more closely, but maybe it would be better if I just maintained a > separate `struct hstate *sorted_hstate_ptrs[]`, where the original I don't think this is a good idea. If you really rely on the order of the initialization in this patch. The easier solution is changing huge_bootmem_page->hstate to huge_bootmem_page->hugepagesz. Then we can use size_to_hstate(huge_bootmem_page->hugepagesz) in gather_bootmem_prealloc(). Thanks. > locations of the hstates remain unchanged, as to not break > gather_bootmem_prealloc/other things. > > > -- > > Mike Kravetz >