From: Muchun Song
Date: Thu, 14 Jan 2021 18:57:16 +0800
Subject: Re: [External] Re: [PATCH v12 04/13] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page
To: Oscar Salvador
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
 paulmck@kernel.org, mchehab+huawei@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry,
 David Rientjes, Matthew Wilcox, Michal Hocko,
 "Song Bao Hua (Barry Song)", David Hildenbrand,
 HORIGUCHI NAOYA(堀口 直也), Xiongchun duan, linux-doc@vger.kernel.org,
 LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <20210112080453.GA10895@linux>
References: <20210106141931.73931-1-songmuchun@bytedance.com>
 <20210106141931.73931-5-songmuchun@bytedance.com>
 <20210112080453.GA10895@linux>

On Tue, Jan 12, 2021 at 4:05 PM Oscar Salvador wrote:
>
> On Wed, Jan 06, 2021 at 10:19:22PM +0800, Muchun Song wrote:
> > Every HugeTLB has more than one struct page structure. We __know__
> > that we only use the first 4 (HUGETLB_CGROUP_MIN_ORDER) struct page
> > structures to store metadata associated with each HugeTLB.
> >
> > There are a lot of struct page structures associated with each
> > HugeTLB page. For tail pages, the value of compound_head is the same,
> > so we can reuse the first tail page's struct page: we map the virtual
> > addresses of the remaining tail pages' struct pages to the first tail
> > page's struct page and then free those page frames. Therefore, we
> > need to reserve only two pages as vmemmap areas.
> >
> > When we allocate a HugeTLB page from the buddy allocator, we can free
> > some vmemmap pages associated with each HugeTLB page. It is more
> > appropriate to do it in prep_new_huge_page().
> >
> > free_vmemmap_pages_per_hpage(), which indicates how many vmemmap
> > pages associated with a HugeTLB page can be freed, returns zero for
> > now, which means the feature is disabled. We will enable it once all
> > the infrastructure is there.
> >
> > Signed-off-by: Muchun Song
>
> My memory may betray me after vacation, so bear with me.
>
> > +/*
> > + * Any memory allocated via the memblock allocator and not via the
> > + * buddy will be marked reserved already in the memmap. For those
> > + * pages, we can call this function to free them to the buddy
> > + * allocator.
> > + */
> > +static inline void free_bootmem_page(struct page *page)
> > +{
> > +	unsigned long magic = (unsigned long)page->freelist;
> > +
> > +	/*
> > +	 * The reserve_bootmem_region sets the reserved flag on bootmem
> > +	 * pages.
> > +	 */
> > +	VM_WARN_ON_PAGE(page_ref_count(page) != 2, page);
>
> I have been thinking about this some more.
> And while I think that this macro might have its room somewhere, I do
> not think this is the case.
>
> Here, if we see that the page's refcount differs from 2, it means that
> we had an earlier corruption.
> Now, as a person who has dealt with debugging memory corruptions, I
> think it is of no use to proceed further once such a corruption has
> happened, as it can lead to problems somewhere else that manifest in
> funny ways, and you will find yourself scratching your head trying to
> work out what happened.
>
> I am aware that this is not the root of the problem here, as someone
> might have had to decrease the refcount, but I would definitely change
> this to its VM_BUG_ON_* variant.
>
> > --- /dev/null
> > +++ b/mm/hugetlb_vmemmap.c
>
> [...]
>
> > diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> > new file mode 100644
> > index 000000000000..6923f03534d5
> > --- /dev/null
> > +++ b/mm/hugetlb_vmemmap.h
>
> [...]
>
> > +/**
> > + * vmemmap_remap_free - remap the vmemmap virtual address range
> > + *                      [@start, @end) to the page which @reuse is
> > + *                      mapped to, then free vmemmap pages.
> > + * @start: start address of the vmemmap virtual address range.
> > + * @end: end address of the vmemmap virtual address range.
> > + * @reuse: reuse address.
> > + */
> > +void vmemmap_remap_free(unsigned long start, unsigned long end,
> > +			unsigned long reuse)
> > +{
> > +	LIST_HEAD(vmemmap_pages);
> > +	struct vmemmap_remap_walk walk = {
> > +		.remap_pte	= vmemmap_remap_pte,
> > +		.reuse_addr	= reuse,
> > +		.vmemmap_pages	= &vmemmap_pages,
> > +	};
> > +
> > +	BUG_ON(start != reuse + PAGE_SIZE);
>
> It seems a bit odd to only pass "start" for the BUG_ON.
> Also, I kind of dislike the "addr += PAGE_SIZE" in vmemmap_pte_range.
>
> I wonder if adding a ".remap_start_addr" would make more sense.
> And adding it here with the vmemmap_remap_walk init.

How would vmemmap_pte_range() look then? If we introduce a
".remap_start_addr" to vmemmap_remap_walk, can the "addr += PAGE_SIZE"
be dropped?

>
> --
> Oscar Salvador
> SUSE L3
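
By the way, your VM_BUG_ON_* suggestion for free_bootmem_page() makes
sense to me: once the refcount check fails we already have a corruption,
and limping on only makes the eventual crash harder to debug. A minimal
sketch of the helper with the BUG variant (the put_page_bootmem() tail
is my guess at the part elided from the quote above, modelled on the
existing bootmem info code, so treat it as a sketch rather than the
final patch):

static inline void free_bootmem_page(struct page *page)
{
	unsigned long magic = (unsigned long)page->freelist;

	/*
	 * The reserve_bootmem_region sets the reserved flag on bootmem
	 * pages.
	 */
	VM_BUG_ON_PAGE(page_ref_count(page) != 2, page);

	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
		put_page_bootmem(page);
	else
		VM_BUG_ON_PAGE(1, page);
}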
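
And to make my question above concrete, here is a rough, untested
sketch of what I imagine vmemmap_pte_range() could look like with such
a member (the ".remap_start_addr" and ".reuse_page" fields and the
exact shape of the walker are assumptions on my side, not code from
this series):

struct vmemmap_remap_walk {
	void (*remap_pte)(pte_t *pte, unsigned long addr,
			  struct vmemmap_remap_walk *walk);
	unsigned long reuse_addr;
	unsigned long remap_start_addr;	/* assumed: reuse_addr + PAGE_SIZE */
	struct page *reuse_page;
	struct list_head *vmemmap_pages;
};

static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
			      unsigned long end,
			      struct vmemmap_remap_walk *walk)
{
	pte_t *pte = pte_offset_kernel(pmd, addr);

	/*
	 * The pte of the reuse address is walked first. Cache the page
	 * it maps so that remap_pte() can point the remaining ptes at
	 * it, then jump to remap_start_addr for the remapping loop.
	 */
	if (addr == walk->reuse_addr) {
		walk->reuse_page = pte_page(*pte++);
		addr = walk->remap_start_addr;
	}

	for (; addr != end; addr += PAGE_SIZE, pte++)
		walk->remap_pte(pte, addr, walk);
}

If it looks like this, the "addr += PAGE_SIZE" does not really go away,
it just becomes "addr = walk->remap_start_addr". So I am not sure the
extra member buys us much, but maybe I am missing what you have in mind.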