References: <20201217121303.13386-1-songmuchun@bytedance.com> <20201217121303.13386-4-songmuchun@bytedance.com> <20201221091123.GB14343@linux>
In-Reply-To: <20201221091123.GB14343@linux>
From: Muchun Song
Date: Mon, 21 Dec 2020 19:25:15 +0800
Subject: Re: [External] Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page
To: Oscar Salvador
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
    bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
    luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
    paulmck@kernel.org, mchehab+huawei@kernel.org,
    pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
    anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
    Matthew Wilcox, Michal Hocko, "Song Bao Hua (Barry Song)",
    David Hildenbrand, naoya.horiguchi@nec.com, Xiongchun duan,
    linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel

On Mon, Dec 21, 2020 at 5:11 PM Oscar Salvador wrote:
>
> On Thu, Dec 17, 2020 at 08:12:55PM +0800, Muchun Song wrote:
> > +static inline void free_bootmem_page(struct page *page)
> > +{
> > +	unsigned long magic = (unsigned long)page->freelist;
> > +
> > +	/*
> > +	 * The reserve_bootmem_region sets the reserved flag on bootmem
> > +	 * pages.
> > +	 */
> > +	VM_WARN_ON(page_ref_count(page) != 2);
> > +
> > +	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> > +		put_page_bootmem(page);
> > +	else
> > +		VM_WARN_ON(1);
>
> Ideally, I think we want to see how the page looks since its state
> is not what we expected, so maybe join both conditions and use dump_page().

Agree. Will do. Thanks. (A rough sketch of what I have in mind is further
below.)

> > + * By removing redundant page structs for HugeTLB pages, memory can returned to
>                                                                        ^^ be

Thanks.

> > + * the buddy allocator for other uses.
>
> [...]
>
> > +void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> > +{
> > +	unsigned long vmemmap_addr = (unsigned long)head;
> > +
> > +	if (!free_vmemmap_pages_per_hpage(h))
> > +		return;
> > +
> > +	vmemmap_remap_free(vmemmap_addr + RESERVE_VMEMMAP_SIZE,
> > +			   free_vmemmap_pages_size_per_hpage(h));
>
> I am not sure what others think, but I would like to see vmemmap_remap_free taking
> three arguments: start, end, and reuse addr, e.g:
>
>  void free_huge_page_vmemmap(struct hstate *h, struct page *head)
>  {
>         unsigned long vmemmap_addr = (unsigned long)head;
>         unsigned long vmemmap_end, vmemmap_reuse;
>
>         if (!free_vmemmap_pages_per_hpage(h))
>                 return;
>
>         vmemmap_addr += RESERVE_VMEMMAP_SIZE;
>         vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
>         vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>
>         vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
>  }
>
> The reason for me to do this is to let the callers of vmemmap_remap_free decide
> __what__ they want to remap.
>
> More on this below.
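Coming back to the dump_page() point above, one possible shape for it, purely
as a sketch (the helpers are the ones from the hunk quoted at the top, and
what exactly to do with a page in an unexpected state is still open):

    static inline void free_bootmem_page(struct page *page)
    {
            unsigned long magic = (unsigned long)page->freelist;

            /*
             * The reserve_bootmem_region sets the reserved flag on bootmem
             * pages, so a reference count of 2 is what we expect here.
             */
            if (page_ref_count(page) != 2 ||
                (magic != SECTION_INFO && magic != MIX_SECTION_INFO)) {
                    /* Unexpected state: dump the page so the report is actionable. */
                    dump_page(page, "bootmem page with unexpected state");
                    return;
            }

            put_page_bootmem(page);
    }

Folding both checks into one branch means a bad refcount and a bad magic
value end up in the same dump, which should be enough to see how the page
actually looks.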
> > +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> > +			      unsigned long end,
> > +			      struct vmemmap_remap_walk *walk)
> > +{
> > +	pte_t *pte;
> > +
> > +	pte = pte_offset_kernel(pmd, addr);
> > +
> > +	if (walk->reuse_addr == addr) {
> > +		BUG_ON(pte_none(*pte));
> > +		walk->reuse_page = pte_page(*pte++);
> > +		addr += PAGE_SIZE;
> > +	}
>
> Although it is quite obvious, a brief comment here pointing out what we are
> doing and that this is meant to be set only once would be nice.

OK. Will do.

> > +static void vmemmap_remap_range(unsigned long start, unsigned long end,
> > +				struct vmemmap_remap_walk *walk)
> > +{
> > +	unsigned long addr = start - PAGE_SIZE;
> > +	unsigned long next;
> > +	pgd_t *pgd;
> > +
> > +	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> > +	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> > +
> > +	walk->reuse_page = NULL;
> > +	walk->reuse_addr = addr;
>
> With the change I suggested above, struct vmemmap_remap_walk should be
> initialized at once in vmemmap_remap_free, so this should no longer be needed.

You are right.

> (And btw, you do not need to set reuse_page to NULL, the way you init the struct
> in vmemmap_remap_free makes sure to null any field you do not explicitly set).
>
> > +static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> > +			      struct vmemmap_remap_walk *walk)
> > +{
> > +	/*
> > +	 * Make the tail pages are mapped with read-only to catch
> > +	 * illegal write operation to the tail pages.
>
> "Remap the tail pages as read-only to ..."

Thanks.

> > +	 */
> > +	pgprot_t pgprot = PAGE_KERNEL_RO;
> > +	pte_t entry = mk_pte(walk->reuse_page, pgprot);
> > +	struct page *page;
> > +
> > +	page = pte_page(*pte);
>
> struct page *page = pte_page(*pte);
>
> since you did the same for the other two.

Yeah. Will change to this.

> > +	list_add(&page->lru, walk->vmemmap_pages);
> > +
> > +	set_pte_at(&init_mm, addr, pte, entry);
> > +}
> > +
> > +/**
> > + * vmemmap_remap_free - remap the vmemmap virtual address range
> > + *                      [start, start + size) to the page which
> > + *                      [start - PAGE_SIZE, start) is mapped,
> > + *                      then free vmemmap pages.
> > + * @start:	start address of the vmemmap virtual address range
> > + * @size:	size of the vmemmap virtual address range
> > + */
> > +void vmemmap_remap_free(unsigned long start, unsigned long size)
> > +{
> > +	unsigned long end = start + size;
> > +	LIST_HEAD(vmemmap_pages);
> > +
> > +	struct vmemmap_remap_walk walk = {
> > +		.remap_pte	= vmemmap_remap_pte,
> > +		.vmemmap_pages	= &vmemmap_pages,
> > +	};
>
> As stated above, this would become:
>
>  void vmemmap_remap_free(unsigned long start, unsigned long end,
>                          unsigned long reuse)
>  {
>         LIST_HEAD(vmemmap_pages);
>         struct vmemmap_remap_walk walk = {
>                 .reuse_addr     = reuse,
>                 .remap_pte      = vmemmap_remap_pte,
>                 .vmemmap_pages  = &vmemmap_pages,
>         };
>
> You might have had your reasons to do it this way, but this looks more natural
> to me, with the plus that callers of vmemmap_remap_free can specify
> what they want to remap.

Should we add a BUG_ON in vmemmap_remap_free() for now?

    BUG_ON(reuse != start - PAGE_SIZE);

>
> --
> Oscar Salvador
> SUSE L3

--
Yours,
Muchun
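For reference, a minimal sketch of the reworked vmemmap_remap_free() with the
explicit reuse argument and the sanity check suggested at the end of the mail.
This is illustrative only, not the posted patch: it relies on
vmemmap_remap_pte(), vmemmap_remap_range() and struct vmemmap_remap_walk from
the hunks quoted above, and the actual freeing of the collected pages is left
out:

    void vmemmap_remap_free(unsigned long start, unsigned long end,
                            unsigned long reuse)
    {
            LIST_HEAD(vmemmap_pages);
            struct vmemmap_remap_walk walk = {
                    .remap_pte      = vmemmap_remap_pte,
                    .reuse_addr     = reuse,
                    .vmemmap_pages  = &vmemmap_pages,
            };

            /*
             * The only layout handled here is "reuse the page mapped
             * immediately below the remapped range", so the reuse address
             * must sit exactly one page before start.
             */
            BUG_ON(start - reuse != PAGE_SIZE);

            /*
             * Walk from the reuse address so that vmemmap_pte_range() can
             * pick up the reuse page on its first iteration.
             */
            vmemmap_remap_range(reuse, end, &walk);

            /* Freeing of the pages collected on vmemmap_pages is elided. */
    }

Starting the walk at the reuse address is what makes the check natural:
vmemmap_pte_range() only records the reuse page when walk->reuse_addr matches
the first address it visits, so reuse and start must be exactly one page apart.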