Date: Wed, 12 Jan 2022 12:01:57 +0000
From: Mark Rutland
To: Muchun Song
Cc: will@kernel.org, akpm@linux-foundation.org, david@redhat.com,
	bodeddub@amazon.com, osalvador@suse.de, mike.kravetz@oracle.com,
	rientjes@google.com, catalin.marinas@arm.com, james.morse@arm.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, duanxiongchun@bytedance.com,
	fam.zheng@bytedance.com
Subject: Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
References: <20220111131652.61947-1-songmuchun@bytedance.com>
In-Reply-To: <20220111131652.61947-1-songmuchun@bytedance.com>

Hi,

On Tue, Jan 11, 2022 at 09:16:52PM +0800, Muchun Song wrote:
> The preparation of supporting freeing vmemmap associated with each
> HugeTLB page is ready, so we can support this feature for arm64.
>
> Signed-off-by: Muchun Song

It's a bit difficult to understand this commit message, as there's not much
context here.

What is HUGETLB_PAGE_FREE_VMEMMAP intended to achieve? Is this intended to save
memory, find bugs, or some other goal? If this is a memory saving or
performance improvement, can we quantify that benefit?

Does the alloc/free happen dynamically, or does this happen once during kernel
boot? IIUC it's the former, which sounds pretty scary.
Especially if we need to re-allocate the vmemmap pages later -- can't we run
out of memory, and then fail to free a HugeTLB page?

Are there any requirements upon arch code, e.g. mutual exclusion?

Below there are a bunch of comments trying to explain that this is safe. Having
some of that rationale in the commit message itself would be helpful.

I see that commit:

  6be24bed9da367c2 ("mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP")

... has a much more complete description, and cribbing some of that wording
would be helpful.

> ---
> There is already some discussions about this in [1], but there was no
> conclusion in the end. I copied the concern proposed by Anshuman to here.
>
> 1st concern:
> "
> But what happens when a hot remove section's vmemmap area (which is being
> teared down) is nearby another vmemmap area which is either created or
> being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
> pages inside the hot remove section might be safe. But what about other
> HugeTLB areas whose vmemmap area shares page table entries with vmemmap
> entries for a section being hot removed ? Massive HugeTLB alloc/use/free
> test cycle using memory just adjacent to a memory hotplug area, which is
> always added and removed periodically, should be able to expose this problem.
> "
> My Answer: As you already know HugeTLB pages inside the hot remove section
> is safe.

It would be helpful if you could explain *why* that's safe, since those of us
coming at this cold have no idea whether this is the case.

> Let's talk your question "what about other HugeTLB areas whose
> vmemmap area shares page table entries with vmemmap entries for a section
> being hot removed ?", the question is not established. Why?
> The minimal granularity size of hotplug memory 128MB (on arm64, 4k base
> page), so any HugeTLB smaller than 128MB is within a section, then, there
> is no share (PTE) page tables between HugeTLB in this section and ones in
> other sections and a HugeTLB could not cross two sections.

Am I correct in assuming that in this case we never free the section?

> Any HugeTLB bigger than 128MB (e.g. 1GB) whose size is an integer multiple of
> a section and vmemmap area is also an integer multiple of 2MB. At the time
> memory is removed, all huge pages either have been migrated away or
> dissolved. The vmemmap is stable. So there is no problem in this case as
> well.

Are you mentioning 2MB here because we PMD-map the vmemmap with 4K pages?

IIUC, so long as:

1) HugeTLBs are naturally aligned, power-of-two sizes
2) The HugeTLB size >= the section size
3) The HugeTLB size >= the vmemmap leaf mapping size

... then a HugeTLB will not share any leaf page table entries with *anything
else*, but will share intermediate entries.

Perhaps that's a clearer line of argument?

Regardless, this should be in the commit message.

> 2nd concern:
> "
> differently, not sure if ptdump would require any synchronization.
>
> Dumping an wrong value is probably okay but crashing because a page table
> entry is being freed after ptdump acquired the pointer is bad. On arm64,
> ptdump() is protected against hotremove via [get|put]_online_mems().
> "
> My Answer: The ptdump should be fine since vmemmap_remap_free() only exchanges
> PTEs or split the PMD entry (which means allocating a PTE page table). Both
> operations do not free any page tables, so ptdump cannot run into a UAF on
> any page tables. The worst case is just dumping an wrong value.

This should be in the commit message.

Thanks,
Mark.
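
For reference, the arithmetic behind the 128MB and 2MB figures discussed above
can be sketched as follows. This is only an illustration of one reading of the
three conditions, not code from the patch or the kernel. It assumes a 64-byte
struct page (not stated in the thread), takes the quoted 2MB figure as the
vmemmap leaf mapping size, and assumes the vmemmap base is aligned to that
leaf size. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values stated in the thread for arm64 with 4K base pages. */
#define BASE_PAGE_SIZE   (4ULL << 10)    /* 4K base page */
#define SECTION_SIZE     (128ULL << 20)  /* minimum hotplug granularity */
#define VMEMMAP_LEAF     (2ULL << 20)    /* 2MB figure, taken as the leaf mapping size */

/* ASSUMPTION: 64 bytes per struct page; the thread does not state this. */
#define STRUCT_PAGE_SIZE 64ULL

/* Bytes of vmemmap needed to describe `bytes` of physical memory. */
static uint64_t vmemmap_bytes(uint64_t bytes)
{
	return bytes / BASE_PAGE_SIZE * STRUCT_PAGE_SIZE;
}

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * True if the vmemmap of a naturally aligned, power-of-two sized huge page
 * covers a whole number of leaf mappings, so that it shares no leaf entry
 * with the vmemmap of any neighbouring memory.
 */
static bool vmemmap_leaf_exclusive(uint64_t hsize)
{
	assert(is_pow2(hsize));
	return vmemmap_bytes(hsize) % VMEMMAP_LEAF == 0;
}
```

Under these assumptions a 128MB section needs exactly 2MB of vmemmap (one leaf
mapping), a 1GB HugeTLB needs 16MB (eight leaf mappings), and a 2MB HugeTLB
needs only 32KB, so its vmemmap necessarily shares a 2MB leaf with other pages
in the same section.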
>
> [1] https://lore.kernel.org/linux-mm/b8cdc9c8-853c-8392-a2fa-4f1a8f02057a@arm.com/T/
>
>  fs/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7a2b11c0b803..04cfd5bf5ec9 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -247,7 +247,7 @@ config HUGETLB_PAGE
>
>  config HUGETLB_PAGE_FREE_VMEMMAP
>  	def_bool HUGETLB_PAGE
> -	depends on X86_64
> +	depends on X86_64 || ARM64
>  	depends on SPARSEMEM_VMEMMAP
>
>  config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
> --
> 2.11.0
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel