Date: Tue, 12 Jan 2021 09:04:58 +0100
From: Oscar Salvador
To: Muchun Song
Cc: corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
 mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
 rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
 jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
 willy@infradead.org, mhocko@suse.com, song.bao.hua@hisilicon.com,
 david@redhat.com, naoya.horiguchi@nec.com, duanxiongchun@bytedance.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v12 04/13] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page
Message-ID: <20210112080453.GA10895@linux>
References: <20210106141931.73931-1-songmuchun@bytedance.com>
 <20210106141931.73931-5-songmuchun@bytedance.com>
In-Reply-To: <20210106141931.73931-5-songmuchun@bytedance.com>

On Wed, Jan 06, 2021 at 10:19:22PM +0800, Muchun Song wrote:
> Every HugeTLB has more than one struct page structure. We __know__ that
> we only use the first 4 (HUGETLB_CGROUP_MIN_ORDER) struct page
> structures to store metadata associated with each HugeTLB.
>
> There are a lot of struct page structures associated with each HugeTLB
> page. For tail pages, the value of compound_head is the same, so we can
> reuse the first page of the tail page structures: we map the virtual
> addresses of the remaining tail page structures to that first tail page
> struct, and then free the underlying page frames. Therefore, we need to
> reserve two pages as vmemmap areas.
>
> When we allocate a HugeTLB page from the buddy allocator, we can free
> some of the vmemmap pages associated with it. It is more appropriate to
> do it in prep_new_huge_page().
>
> free_vmemmap_pages_per_hpage(), which indicates how many vmemmap pages
> associated with a HugeTLB page can be freed, returns zero for now,
> which means the feature is disabled. We will enable it once all the
> infrastructure is there.
>
> Signed-off-by: Muchun Song
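To put numbers on the quoted description: assuming 4 KiB base pages and
a 64-byte struct page (the usual x86_64 values, but assumptions here),
a 2 MiB HugeTLB page is backed by 8 vmemmap pages and a 1 GiB one by
4096, so keeping two of them mapped leaves 6 and 4094 pages
respectively to hand back to the buddy allocator. A self-contained
sketch of that arithmetic, not code from the series:

#include <stdio.h>

#define PAGE_SIZE		4096UL	/* assumed base page size */
#define STRUCT_PAGE_SIZE	64UL	/* assumed sizeof(struct page) */
#define RESERVE_VMEMMAP_NR	2UL	/* vmemmap pages kept mapped per the series */

/* vmemmap pages backing the struct pages of one HugeTLB page */
static unsigned long vmemmap_pages(unsigned long hpage_size)
{
	return hpage_size / PAGE_SIZE * STRUCT_PAGE_SIZE / PAGE_SIZE;
}

int main(void)
{
	/* prints 6: 8 vmemmap pages minus the 2 reserved ones */
	printf("2M: %lu freeable\n",
	       vmemmap_pages(2UL << 20) - RESERVE_VMEMMAP_NR);
	/* prints 4094: 4096 vmemmap pages minus the 2 reserved ones */
	printf("1G: %lu freeable\n",
	       vmemmap_pages(1UL << 30) - RESERVE_VMEMMAP_NR);
	return 0;
}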
My memory may betray me after vacation, so bear with me.

> +/*
> + * Any memory allocated via the memblock allocator and not via the
> + * buddy will be marked reserved already in the memmap. For those
> + * pages, we can call this function to free it to buddy allocator.
> + */
> +static inline void free_bootmem_page(struct page *page)
> +{
> +	unsigned long magic = (unsigned long)page->freelist;
> +
> +	/*
> +	 * The reserve_bootmem_region sets the reserved flag on bootmem
> +	 * pages.
> +	 */
> +	VM_WARN_ON_PAGE(page_ref_count(page) != 2, page);

I have been thinking about this some more, and while this macro might
have its place somewhere, I do not think this is it.
Here, if the page's refcount differs from 2, it means we had an earlier
corruption. Now, as someone who has spent his share of time debugging
memory corruptions, I think there is no point in proceeding once such a
corruption has happened: it can lead to problems elsewhere that
manifest in funny ways, and you will find yourself scratching your head
trying to work out what went wrong.
I am aware that the check itself is not the root of the problem here,
as something else must have dropped the refcount earlier, but I would
definitely change this to its VM_BUG_ON_* variant, i.e.
VM_BUG_ON_PAGE(page_ref_count(page) != 2, page).

> --- /dev/null
> +++ b/mm/hugetlb_vmemmap.c
[...]
> diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> new file mode 100644
> index 000000000000..6923f03534d5
> --- /dev/null
> +++ b/mm/hugetlb_vmemmap.h
[...]
> +/**
> + * vmemmap_remap_free - remap the vmemmap virtual address range
> + *			[@start, @end) to the page which @reuse is
> + *			mapped, then free vmemmap pages.
> + * @start: start address of the vmemmap virtual address range.
> + * @end: end address of the vmemmap virtual address range.
> + * @reuse: reuse address.
> + */
> +void vmemmap_remap_free(unsigned long start, unsigned long end,
> +			unsigned long reuse)
> +{
> +	LIST_HEAD(vmemmap_pages);
> +	struct vmemmap_remap_walk walk = {
> +		.remap_pte	= vmemmap_remap_pte,
> +		.reuse_addr	= reuse,
> +		.vmemmap_pages	= &vmemmap_pages,
> +	};
> +
> +	BUG_ON(start != reuse + PAGE_SIZE);

It seems a bit odd to only pass "start" to the BUG_ON.
Also, I kind of dislike the "addr += PAGE_SIZE" in vmemmap_pte_range().
I wonder if adding a ".remap_start_addr" field would make more sense,
setting it here together with the rest of the vmemmap_remap_walk init,
along the lines of the sketch below.
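Just to illustrate what I mean (a sketch only; "remap_start_addr" is a
placeholder name, and the other initializers are the ones from the
quoted hunk):

void vmemmap_remap_free(unsigned long start, unsigned long end,
			unsigned long reuse)
{
	LIST_HEAD(vmemmap_pages);
	struct vmemmap_remap_walk walk = {
		.remap_pte		= vmemmap_remap_pte,
		.reuse_addr		= reuse,
		/* hypothetical field: where the remapping really starts */
		.remap_start_addr	= start,
		.vmemmap_pages		= &vmemmap_pages,
	};

That way vmemmap_pte_range() could key off walk->remap_start_addr
instead of silently bumping "addr" past the reuse page.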
-- 
Oscar Salvador
SUSE L3