From: Muchun Song
Date: Thu, 13 Jan 2022 14:28:08 +0800
Subject: Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
To: Mark Rutland
Cc: Will Deacon, Andrew Morton, David Hildenbrand, "Bodeddula, Balasubramaniam",
 Oscar Salvador, Mike Kravetz, David Rientjes, Catalin Marinas,
 james.morse@arm.com, linux-arm-kernel@lists.infradead.org, LKML,
 Linux Memory Management List, Xiongchun duan, Fam Zheng
References: <20220111131652.61947-1-songmuchun@bytedance.com>

On Wed, Jan 12, 2022 at 8:02 PM Mark Rutland wrote:
>
> Hi,
>
> On Tue, Jan 11, 2022 at 09:16:52PM +0800, Muchun Song wrote:
> > The preparation of supporting freeing vmemmap associated with each
> > HugeTLB page is ready, so we can support this feature for arm64.
> >
> > Signed-off-by: Muchun Song
>
> It's a bit difficult to understand this commit message, as there's not much
> context here.

Hi Mark,

My bad. More info can be found here [1].

[1] https://lore.kernel.org/all/20210510030027.56044-1-songmuchun@bytedance.com/T/#u

>
> What is HUGETLB_PAGE_FREE_VMEMMAP intended to achieve? Is this intended to save
> memory, find bugs, or some other goal? If this is a memory saving or
> performance improvement, can we quantify that benefit?

It is for memory saving. It can save about 12GB or 16GB per 1TB of HugeTLB
pages (2MB or 1GB type).
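
For reference, the rough arithmetic behind those numbers (my assumptions:
4K base pages, a 64-byte struct page, and two vmemmap pages kept per
HugeTLB page, as in the generic series this builds on):

    2MB HugeTLB page:
      vmemmap size       = 512 struct pages * 64B = 32KB (8 x 4K pages)
      freeable           = 6 of those 8 pages = 24KB per huge page
      per 1TB of HugeTLB = 524288 huge pages * 24KB ~= 12GB

    1GB HugeTLB page:
      vmemmap size       = 262144 struct pages * 64B = 16MB (4096 x 4K pages)
      freeable           = 4094 of those 4096 pages ~= 16MB per huge page
      per 1TB of HugeTLB = 1024 huge pages * ~16MB ~= 16GB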

>
> Does the alloc/free happen dynamically, or does this happen once during kernel
> boot? IIUC it's the former, which sounds pretty scary. Especially if we need to
> re-allocate the vmemmap pages later -- can't we run out of memory, and then
> fail to free a HugeTLB page?

Right. The implementation of this can be found in commit:

  ad2fa3717b74994 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")

>
> Are there any requirements upon arch code, e.g. mutual exclusion?

No. The implementation is generic. There is no architecture-specific code
that needs to be implemented.

>
> Below there are a bunch of comments trying to explain that this is safe. Having
> some of that rationale in the commit message itself would be helpful.
>
> I see that commit:
>
>   6be24bed9da367c2 ("mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP")
>
> ... has a much more complete description, and cribbing some of that wording
> would be helpful.

Will do in the next version once we are on the same page about this feature.

>
> > ---
> > There has already been some discussion about this in [1], but there was
> > no conclusion in the end. I have copied the concern raised by Anshuman
> > here.
> >
> > 1st concern:
> > "
> > But what happens when a hot remove section's vmemmap area (which is being
> > torn down) is nearby another vmemmap area which is either created or
> > being destroyed for HugeTLB alloc/free purposes. As you mentioned, HugeTLB
> > pages inside the hot remove section might be safe. But what about other
> > HugeTLB areas whose vmemmap area shares page table entries with vmemmap
> > entries for a section being hot removed? A massive HugeTLB alloc/use/free
> > test cycle using memory just adjacent to a memory hotplug area, which is
> > always added and removed periodically, should be able to expose this problem.
> > "
> > My answer: As you already know, HugeTLB pages inside the hot remove
> > section are safe.
>
> It would be helpful if you could explain *why* that's safe, since those of us
> coming at this cold have no idea whether this is the case.

At the time memory is removed, all huge pages either have been migrated away
or dissolved. So there is no race between memory hot remove and
free_huge_page_vmemmap().

>
> > Let's talk about your question "what about other HugeTLB areas whose
> > vmemmap area shares page table entries with vmemmap entries for a section
> > being hot removed?": that situation cannot arise. Why? The minimal
> > granularity of hotplug memory is 128MB (on arm64 with a 4K base page), so
> > any HugeTLB page smaller than 128MB lies within a single section. Then no
> > (PTE) page tables are shared between HugeTLB pages in this section and
> > ones in other sections, and a HugeTLB page cannot cross two sections.
>
> Am I correct in assuming that in this case we never free the section?

Right. So there is no race between memory hot remove and
free_huge_page_vmemmap() in this case either.

>
> > Any HugeTLB page bigger than 128MB (e.g. 1GB) has a size that is an
> > integer multiple of a section, and its vmemmap area is also an integer
> > multiple of 2MB. At the time memory is removed, all huge pages either
> > have been migrated away or dissolved. The vmemmap is stable. So there is
> > no problem in this case as well.
>
> Are you mentioning 2MB here because we PMD-map the vmemmap with 4K pages?

Right.

>
> IIUC, so long as:
>
> 1) HugeTLBs are naturally aligned, power-of-two sizes
> 2) The HugeTLB size >= the section size
> 3) The HugeTLB size >= the vmemmap leaf mapping size
>
> ... then a HugeTLB will not share any leaf page table entries with *anything
> else*, but will share intermediate entries.

Right.
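
To put rough numbers on that (my assumptions: 4K base pages, a 64-byte
struct page, and the arm64 layout in which each 128MB section's vmemmap is
covered by a single 2MB PMD leaf):

    one 2MB vmemmap PMD leaf  = 2MB / 64B = 32768 struct pages, i.e. the
                                struct pages for exactly one 128MB section
    2MB HugeTLB page vmemmap  = 512 * 64B = 32KB, entirely within its own
                                section's 2MB PMD leaf
    1GB HugeTLB page vmemmap  = 262144 * 64B = 16MB, i.e. eight whole PMD
                                leaves, shared with nothing else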

>
> Perhaps that's a clearer line of argument?
>
> Regardless, this should be in the commit message.

Will do.

>
> > 2nd concern:
> > "
> > differently, not sure if ptdump would require any synchronization.
> >
> > Dumping a wrong value is probably okay but crashing because a page table
> > entry is being freed after ptdump acquired the pointer is bad. On arm64,
> > ptdump() is protected against hotremove via [get|put]_online_mems().
> > "
> > My answer: The ptdump should be fine since vmemmap_remap_free() only
> > exchanges PTEs or splits the PMD entry (which means allocating a PTE page
> > table). Neither operation frees any page tables, so ptdump cannot run
> > into a UAF on any page tables. The worst case is just dumping a wrong
> > value.
>
> This should be in the commit message.

Will do.

Thanks.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel