From: Mark Rutland <mark.rutland@arm.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: will@kernel.org, akpm@linux-foundation.org, david@redhat.com,
	bodeddub@amazon.com, osalvador@suse.de, mike.kravetz@oracle.com,
	rientjes@google.com, catalin.marinas@arm.com,
	james.morse@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, fam.zheng@bytedance.com
Subject: Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
Date: Wed, 12 Jan 2022 12:01:57 +0000	[thread overview]
Message-ID: <Yd7DNesaCACPndv2@FVFF77S0Q05N> (raw)
In-Reply-To: <20220111131652.61947-1-songmuchun@bytedance.com>

Hi,

On Tue, Jan 11, 2022 at 09:16:52PM +0800, Muchun Song wrote:
> The preparation of supporting freeing vmemmap associated with each
> HugeTLB page is ready, so we can support this feature for arm64.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

It's a bit difficult to understand this commit message, as there's not much
context here.

What is HUGETLB_PAGE_FREE_VMEMMAP intended to achieve? Is this intended to save
memory, find bugs, or some other goal? If this is a memory saving or
performance improvement, can we quantify that benefit?
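For context, here is my own back-of-the-envelope arithmetic for the potential saving (a sketch on my part, not taken from the series; it assumes 4K base pages and a 64-byte struct page, as in a typical arm64 configuration):

```python
# Back-of-the-envelope sketch of the potential vmemmap saving.
# Assumptions (not verified against the series): 4K base pages and a
# 64-byte struct page, as in a typical arm64 configuration.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64

def vmemmap_pages(hugetlb_size):
    """Number of base pages backing one HugeTLB's struct pages."""
    return hugetlb_size // PAGE_SIZE * STRUCT_PAGE_SIZE // PAGE_SIZE

# A 2MB HugeTLB has 512 struct pages (32KB), i.e. 8 vmemmap pages;
# a 1GB HugeTLB has 4096 vmemmap pages (16MB). IIRC the series keeps
# a couple of reserved pages per HugeTLB and frees the rest.
print(vmemmap_pages(2 << 20), vmemmap_pages(1 << 30))
```

If those numbers are right, the saving is roughly 1.5% of each HugeTLB's size, which would be worth stating in the commit message.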

Does the alloc/free happen dynamically, or does this happen once during kernel
boot? IIUC it's the former, which sounds pretty scary. Especially if we need to
re-allocate the vmemmap pages later -- can't we run out of memory, and then
fail to free a HugeTLB page?

Are there any requirements upon arch code, e.g. mutual exclusion?

Below there are a bunch of comments trying to explain that this is safe. Having
some of that rationale in the commit message itself would be helpful.

I see that commit:

  6be24bed9da367c2 ("mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP")

... has a much more complete description, and cribbing some of that wording
would be helpful.


> ---
> There has already been some discussion about this in [1], but there was
> no conclusion in the end. I have copied the concerns raised by Anshuman here.
> 
> 1st concern:
> "
>   But what happens when a hot remove section's vmemmap area (which is being
>   torn down) is nearby another vmemmap area which is either created or
>   being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
>   pages inside the hot remove section might be safe. But what about other
>   HugeTLB areas whose vmemmap area shares page table entries with vmemmap
>   entries for a section being hot removed ? Massive HugeTLB alloc/use/free
>   test cycle using memory just adjacent to a memory hotplug area, which is
>   always added and removed periodically, should be able to expose this problem.
> "
> My Answer: As you already know, HugeTLB pages inside the hot remove
> section are safe.

It would be helpful if you could explain *why* that's safe, since those of us
coming at this cold have no idea whether this is the case.

> Let's talk about your question "what about other HugeTLB areas whose
> vmemmap area shares page table entries with vmemmap entries for a section
> being hot removed ?": that situation cannot arise. Why? The minimal
> granularity of hotplug memory is 128MB (on arm64 with 4K base pages), so
> any HugeTLB smaller than 128MB lies within a single section. Therefore
> there are no shared (PTE) page tables between HugeTLB pages in this
> section and ones in other sections, and a HugeTLB cannot cross two
> sections.
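To make the quoted arithmetic concrete (a sketch on my part; it assumes 4K base pages, a 64-byte struct page, and 128MB hotplug sections, and ignores the vmemmap base offset):

```python
# Sketch of the section-granularity argument above. Assumptions: 4K
# base pages, 64-byte struct page, 128MB hotplug sections (arm64).
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64
SECTION_SIZE = 128 << 20

# vmemmap needed for one full section's struct pages:
section_vmemmap = SECTION_SIZE // PAGE_SIZE * STRUCT_PAGE_SIZE
assert section_vmemmap == 2 << 20  # exactly 2MB -- one PMD leaf

# So a HugeTLB smaller than a section sits inside one section, and its
# struct pages sit inside that section's own 2MB vmemmap block; it
# cannot share a PTE-level table with another section's vmemmap.
```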

Am I correct in assuming that in this case we never free the section?

> For any HugeTLB bigger than 128MB (e.g. 1GB), its size is an integer
> multiple of a section, and its vmemmap area is also an integer multiple
> of 2MB. At the time memory is removed, all huge pages have either been
> migrated away or dissolved, so the vmemmap is stable. So there is no
> problem in this case either.

Are you mentioning 2MB here because we PMD-map the vmemmap with 4K pages?

IIUC, so long as:

1) HugeTLBs are naturally aligned, power-of-two sizes
2) The HugeTLB size >= the section size
3) The HugeTLB size >= the vmemmap leaf mapping size

... then a HugeTLB will not share any leaf page table entries with *anything
else*, but will share intermediate entries.

Perhaps that's a clearer line of argument?
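To spell that out as a sanity check (a sketch; the 4K page, 64-byte struct page, and 2MB PMD-leaf sizes are assumptions, and the vmemmap base offset is ignored):

```python
# Sketch of conditions 1-3. A naturally aligned HugeTLB whose vmemmap
# span starts and ends on leaf boundaries shares no *leaf* entry with
# anything else. Sizes are assumptions (4K pages, 2MB PMD leaf).
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64
VMEMMAP_LEAF = 2 << 20

def shares_no_leaf(hugetlb_size, phys_addr):
    """True if the HugeTLB's vmemmap span is leaf-aligned at both ends."""
    assert phys_addr % hugetlb_size == 0        # natural alignment
    start = phys_addr // PAGE_SIZE * STRUCT_PAGE_SIZE
    span = hugetlb_size // PAGE_SIZE * STRUCT_PAGE_SIZE
    return start % VMEMMAP_LEAF == 0 and span % VMEMMAP_LEAF == 0

# 1GB HugeTLB: vmemmap span is 16MB, leaf-aligned -> no leaf sharing.
# 2MB HugeTLB: span is only 32KB -> it shares a leaf with neighbours.
```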

Regardless, this should be in the commit message.

> 2nd concern:
> "
>   differently, not sure if ptdump would require any synchronization.
> 
>   Dumping a wrong value is probably okay but crashing because a page table
>   entry is being freed after ptdump acquired the pointer is bad. On arm64,
>   ptdump() is protected against hotremove via [get|put]_online_mems().
> "
> My Answer: The ptdump should be fine since vmemmap_remap_free() only
> exchanges PTEs or splits the PMD entry (which means allocating a PTE page
> table). Neither operation frees any page tables, so ptdump cannot run
> into a UAF on any page tables. The worst case is just dumping a wrong
> value.
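To illustrate the claimed safety property (a toy model of my own, not kernel code): both operations only install or overwrite entries, never free a table, so a concurrent walker holding a table pointer can at worst observe a stale value.

```python
# Toy model (not kernel code) of why vmemmap_remap_free() cannot give
# ptdump a use-after-free: splitting a PMD *allocates* a PTE table and
# remapping *overwrites* PTEs in place; no table is ever freed.
pmd = {"entry": ("leaf", "2MB block")}

def split_pmd(pmd):
    # Allocate a fresh PTE table and repoint the PMD at it; the old
    # leaf value is overwritten, nothing is freed.
    pmd["entry"] = ("table", {i: ("page", i) for i in range(512)})

def remap_ptes(pte_table, target):
    # Exchange PTEs in place so every entry points at `target`.
    for i in pte_table:
        pte_table[i] = ("page", target)

walker_view = pmd      # "ptdump" grabbed this pointer before the remap
split_pmd(pmd)
remap_ptes(pmd["entry"][1], 0)
# The walker's pointer is still valid; at worst it reads the updated
# ("wrong", from its perspective) entries, never freed memory.
```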

This should be in the commit message.

Thanks,
Mark.

> 
> [1] https://lore.kernel.org/linux-mm/b8cdc9c8-853c-8392-a2fa-4f1a8f02057a@arm.com/T/
> 
>  fs/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7a2b11c0b803..04cfd5bf5ec9 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -247,7 +247,7 @@ config HUGETLB_PAGE
>  
>  config HUGETLB_PAGE_FREE_VMEMMAP
>  	def_bool HUGETLB_PAGE
> -	depends on X86_64
> +	depends on X86_64 || ARM64
>  	depends on SPARSEMEM_VMEMMAP
>  
>  config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
> -- 
> 2.11.0
> 

Thread overview: 20+ messages
2022-01-11 13:16 [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB Muchun Song
2022-01-12 12:01 ` Mark Rutland [this message]
2022-01-13  6:28   ` Muchun Song
  -- strict thread matches above, loose matches on Subject: below --
2021-05-18  9:18 Muchun Song
2021-05-19 11:45 ` Anshuman Khandual
2021-05-19 12:03   ` David Hildenbrand
2021-05-20 11:54     ` Anshuman Khandual
2021-05-20 11:59       ` David Hildenbrand
2021-05-21  5:02         ` Anshuman Khandual
2021-05-19 12:36 ` Anshuman Khandual
