From: Muchun Song <songmuchun@bytedance.com>
Date: Thu, 17 Dec 2020 11:22:24 +0800
Subject: Re: [External] Re: [PATCH v9 05/11] mm/hugetlb: Allocate the vmemmap pages associated with each HugeTLB page
To: Mike Kravetz
Cc: Jonathan Corbet, Thomas Gleixner, mingo@redhat.com, bp@alien8.de,
 x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton, paulmck@kernel.org,
 mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, Randy Dunlap,
 oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry,
 David Rientjes, Matthew Wilcox, Oscar Salvador, Michal Hocko,
 "Song Bao Hua (Barry Song)", David Hildenbrand, Xiongchun duan,
 linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <153c505c-d78f-42f2-9a56-04b2b4f6ae7c@oracle.com>
References: <20201213154534.54826-1-songmuchun@bytedance.com>
 <20201213154534.54826-6-songmuchun@bytedance.com>
 <153c505c-d78f-42f2-9a56-04b2b4f6ae7c@oracle.com>

On Thu, Dec 17, 2020 at 9:17 AM Mike Kravetz wrote:
>
> On 12/13/20 7:45 AM, Muchun Song wrote:
> > When we free a HugeTLB page to the buddy allocator, we should allocate the
> > vmemmap pages associated with it. We can do that in the __free_hugepage()
> > before freeing it to buddy.
>
> ...
>
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index 78c527617e8d..ffcf092c92ed 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -29,6 +29,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> > @@ -39,7 +40,8 @@
> >   *
> >   * @rmap_pte:		called for each non-empty PTE (lowest-level) entry.
> >   * @reuse:		the page which is reused for the tail vmemmap pages.
> > - * @vmemmap_pages:	the list head of the vmemmap pages that can be freed.
> > + * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
> > + *			or is mapped from.
> >   */
> >  struct vmemmap_rmap_walk {
> >  	void (*rmap_pte)(pte_t *pte, unsigned long addr,
> > @@ -54,6 +56,9 @@ struct vmemmap_rmap_walk {
> >   */
> >  #define VMEMMAP_TAIL_PAGE_REUSE	-1
> >
> > +/* The gfp mask of allocating vmemmap page */
> > +#define GFP_VMEMMAP_PAGE	(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> > +
> >  static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> >  			      unsigned long end, struct vmemmap_rmap_walk *walk)
> >  {
> > @@ -200,6 +205,68 @@ void vmemmap_remap_reuse(unsigned long start, unsigned long size)
> >  	free_vmemmap_page_list(&vmemmap_pages);
> >  }
> >
> > +static void vmemmap_remap_restore_pte(pte_t *pte, unsigned long addr,
> > +				      struct vmemmap_rmap_walk *walk)
> > +{
> > +	pgprot_t pgprot = PAGE_KERNEL;
> > +	struct page *page;
> > +	void *to;
> > +
> > +	BUG_ON(pte_page(*pte) != walk->reuse);
> > +
> > +	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
> > +	list_del(&page->lru);
> > +	to = page_to_virt(page);
> > +	copy_page(to, page_to_virt(walk->reuse));
> > +
> > +	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> > +}
> > +
> > +static void alloc_vmemmap_page_list(struct list_head *list,
> > +				    unsigned long nr_pages)
> > +{
> > +	while (nr_pages--) {
> > +		struct page *page;
> > +
> > +retry:
> > +		page = alloc_page(GFP_VMEMMAP_PAGE);
>
> Should we try (or require) the vmemmap page be on the same node as the
> pages they describe? I imagine performance would be impacted if a
> struct page and the page it describes are on different numa nodes.

Yeah, it is a good idea. I also think that we should do this.
I will do that in the next version. Thanks.

>
> > +		if (unlikely(!page)) {
> > +			msleep(100);
> > +			/*
> > +			 * We should retry infinitely, because we cannot
> > +			 * handle allocation failures. Once we allocate
> > +			 * vmemmap pages successfully, then we can free
> > +			 * a HugeTLB page.
> > +			 */
> > +			goto retry;
> > +		}
> > +		list_add_tail(&page->lru, list);
> > +	}
> > +}
> > +
> --
> Mike Kravetz

--
Yours,
Muchun
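A rough sizing example for the allocation being discussed, assuming x86_64 with 4 KiB base pages and a 64-byte struct page (the common configuration this series targets): a 2 MiB HugeTLB page spans 512 base pages, so its vmemmap occupies 512 * 64 B = 32 KiB, i.e. 8 pages. The series frees most of those 8 pages when the HugeTLB page is allocated, by remapping the tail struct pages onto a single reused page, which is why alloc_vmemmap_page_list() has to allocate them back before the HugeTLB page can be returned to the buddy allocator.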
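A minimal sketch of the node-preferred allocation Muchun agreed to implement might look like the following. The nid parameter and its source are assumptions, not code from the series: a caller would pass something like page_to_nid() of the HugeTLB head page, and the version actually posted may differ.

/*
 * Sketch only: node-preferred variant of alloc_vmemmap_page_list().
 * The nid argument is assumed to come from the caller, e.g.
 * page_to_nid(head) of the HugeTLB page whose vmemmap is being
 * restored; it is not part of the quoted patch.
 */
static void alloc_vmemmap_page_list(struct list_head *list, int nid,
				    unsigned long nr_pages)
{
	while (nr_pages--) {
		struct page *page;

retry:
		/* Prefer (but do not require) pages from node nid. */
		page = alloc_pages_node(nid, GFP_VMEMMAP_PAGE, 0);
		if (unlikely(!page)) {
			/* Sleep and retry forever, as in the original. */
			msleep(100);
			goto retry;
		}
		list_add_tail(&page->lru, list);
	}
}

Note that alloc_pages_node() only expresses a preference: if the target node is depleted, the allocator falls back to other nodes. Adding __GFP_THISNODE to GFP_VMEMMAP_PAGE would turn the preference into a hard requirement, at the cost of spinning indefinitely in the retry loop when that node is under memory pressure, so the softer form seems the safer match for this code path.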