linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
@ 2022-01-11 13:16 Muchun Song
  2022-01-12 12:01 ` Mark Rutland
  0 siblings, 1 reply; 10+ messages in thread
From: Muchun Song @ 2022-01-11 13:16 UTC (permalink / raw)
  To: will, akpm, david, bodeddub, osalvador, mike.kravetz, rientjes,
	mark.rutland, catalin.marinas, james.morse
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, Muchun Song

The preparation for freeing the vmemmap pages associated with each
HugeTLB page is complete, so we can support this feature for arm64.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
There have already been some discussions about this in [1], but no conclusion
was reached in the end. I have copied the concerns raised by Anshuman here.

1st concern:
"
  But what happens when a hot remove section's vmemmap area (which is being
  torn down) is near another vmemmap area which is either created or
  being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
  pages inside the hot remove section might be safe. But what about other
  HugeTLB areas whose vmemmap area shares page table entries with vmemmap
  entries for a section being hot removed? Massive HugeTLB alloc/use/free
  test cycle using memory just adjacent to a memory hotplug area, which is
  always added and removed periodically, should be able to expose this problem.
"
My answer: As you already know, HugeTLB pages inside the hot-removed section
are safe. As for your question "what about other HugeTLB areas whose vmemmap
area shares page table entries with vmemmap entries for a section being hot
removed?", the situation cannot arise. Why? The minimal granularity of
hotplug memory is 128MB (on arm64 with a 4K base page size), so any HugeTLB
page smaller than 128MB lies entirely within a single section; there are
therefore no shared (PTE) page tables between HugeTLB pages in this section
and those in other sections, and a HugeTLB page cannot span two sections.
Any HugeTLB page bigger than 128MB (e.g. 1GB) has a size that is an integer
multiple of a section, and its vmemmap area is an integer multiple of 2MB.
At the time memory is removed, all huge pages have either been migrated away
or dissolved, so the vmemmap is stable. There is no problem in this case
either.
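
To make the alignment argument concrete, here is a small standalone sketch
of the arithmetic (userspace C, not kernel code; the 4K base page size, the
64-byte struct page and the 128MB section size are my assumptions, stated
above):

#include <stdio.h>

int main(void)
{
        const unsigned long base_page   = 4096;        /* assumed 4K base page size */
        const unsigned long struct_page = 64;          /* assumed sizeof(struct page) */
        const unsigned long section     = 128UL << 20; /* arm64 hotplug granularity */

        /* vmemmap covering one 128MB section: one struct page per base page. */
        printf("vmemmap per section:     %lu KB\n",
               section / base_page * struct_page >> 10);       /* 2048 KB: one 2MB PMD */

        /* A 2MB HugeTLB page's vmemmap lies entirely inside one section. */
        printf("vmemmap per 2MB HugeTLB: %lu KB\n",
               (2UL << 20) / base_page * struct_page >> 10);   /* 32 KB */

        /* A 1GB HugeTLB page's vmemmap is an integer multiple of 2MB. */
        printf("vmemmap per 1GB HugeTLB: %lu KB\n",
               (1UL << 30) / base_page * struct_page >> 10);   /* 16384 KB = 8 x 2MB */

        return 0;
}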

2nd concern:
"
  differently, not sure if ptdump would require any synchronization.

  Dumping a wrong value is probably okay but crashing because a page table
  entry is being freed after ptdump acquired the pointer is bad. On arm64,
  ptdump() is protected against hotremove via [get|put]_online_mems().
"
My answer: ptdump should be fine, since vmemmap_remap_free() only exchanges
PTEs or splits a PMD entry (which means allocating a PTE page table). Neither
operation frees any page tables, so ptdump cannot run into a use-after-free
on any page table. The worst case is just dumping a wrong value.
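
To illustrate why exchanging entries (as opposed to freeing page tables)
cannot crash a concurrent walker, here is a toy userspace model (purely an
illustration, not the kernel's actual data structures): the remap side
atomically exchanges slot values and never frees the table, so a lock-free
reader can at worst observe a stale value.

#include <stdatomic.h>
#include <stdio.h>

/* A "PTE page table": 512 atomic slots that are exchanged, never freed. */
static _Atomic unsigned long pte_table[512];

/* The remap side: swap in a new value, the way vmemmap_remap_free() swaps
 * a vmemmap PTE to point at the reused page.  No table is torn down. */
static void remap_pte(unsigned int i, unsigned long new_val)
{
        atomic_exchange(&pte_table[i], new_val);
}

/* The ptdump side: reads without locks.  It may see the old or the new
 * value (a possibly wrong dump), but never a freed table. */
static unsigned long ptdump_read(unsigned int i)
{
        return atomic_load(&pte_table[i]);
}

int main(void)
{
        atomic_store(&pte_table[0], 0x1000);
        remap_pte(0, 0x2000);
        printf("slot 0 -> %#lx\n", ptdump_read(0));
        return 0;
}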

[1] https://lore.kernel.org/linux-mm/b8cdc9c8-853c-8392-a2fa-4f1a8f02057a@arm.com/T/

 fs/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 7a2b11c0b803..04cfd5bf5ec9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -247,7 +247,7 @@ config HUGETLB_PAGE
 
 config HUGETLB_PAGE_FREE_VMEMMAP
 	def_bool HUGETLB_PAGE
-	depends on X86_64
+	depends on X86_64 || ARM64
 	depends on SPARSEMEM_VMEMMAP
 
 config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
-- 
2.11.0



* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2022-01-11 13:16 [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB Muchun Song
@ 2022-01-12 12:01 ` Mark Rutland
  2022-01-13  6:28   ` Muchun Song
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Rutland @ 2022-01-12 12:01 UTC (permalink / raw)
  To: Muchun Song
  Cc: will, akpm, david, bodeddub, osalvador, mike.kravetz, rientjes,
	catalin.marinas, james.morse, linux-arm-kernel, linux-kernel,
	linux-mm, duanxiongchun, fam.zheng

Hi,

On Tue, Jan 11, 2022 at 09:16:52PM +0800, Muchun Song wrote:
> The preparation for freeing the vmemmap pages associated with each
> HugeTLB page is complete, so we can support this feature for arm64.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

It's a bit difficult to understand this commit message, as there's not much
context here.

What is HUGETLB_PAGE_FREE_VMEMMAP intended to achieve? Is this intended to save
memory, find bugs, or some other goal? If this is a memory saving or
performance improvement, can we quantify that benefit?

Does the alloc/free happen dynamically, or does this happen once during kernel
boot? IIUC it's the former, which sounds pretty scary. Especially if we need to
re-allocate the vmemmap pages later -- can't we run out of memory, and then
fail to free a HugeTLB page?

Are there any requirements upon arch code, e.g. mutual exclusion?

Below there are a bunch of comments trying to explain that this is safe. Having
some of that rationale in the commit message itself would be helpful.

I see that commit:

  6be24bed9da367c2 ("mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP")

... has a much more complete description, and cribbing some of that wording
would be helpful.


> ---
> There have already been some discussions about this in [1], but no conclusion
> was reached in the end. I have copied the concerns raised by Anshuman here.
> 
> 1st concern:
> "
>   But what happens when a hot remove section's vmemmap area (which is being
>   torn down) is near another vmemmap area which is either created or
>   being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
>   pages inside the hot remove section might be safe. But what about other
>   HugeTLB areas whose vmemmap area shares page table entries with vmemmap
>   entries for a section being hot removed? Massive HugeTLB alloc/use/free
>   test cycle using memory just adjacent to a memory hotplug area, which is
>   always added and removed periodically, should be able to expose this problem.
> "
> My answer: As you already know, HugeTLB pages inside the hot-removed section
> are safe.

It would be helpful if you could explain *why* that's safe, since those of us
coming at this cold have no idea whether this is the case.

> As for your question "what about other HugeTLB areas whose vmemmap
> area shares page table entries with vmemmap entries for a section being hot
> removed?", the situation cannot arise. Why? The minimal granularity of
> hotplug memory is 128MB (on arm64 with a 4K base page size), so any HugeTLB
> page smaller than 128MB lies entirely within a single section; there are
> therefore no shared (PTE) page tables between HugeTLB pages in this section
> and those in other sections, and a HugeTLB page cannot span two sections.

Am I correct in assuming that in this case we never free the section?

> Any HugeTLB page bigger than 128MB (e.g. 1GB) has a size that is an integer
> multiple of a section, and its vmemmap area is an integer multiple of 2MB.
> At the time memory is removed, all huge pages have either been migrated away
> or dissolved, so the vmemmap is stable. There is no problem in this case
> either.

Are you mentioning 2MB here because we PMD-map the vmemmap with 4K base pages?

IIUC, so long as:

1) HugeTLBs are naturally aligned, power-of-two sizes
2) The HugeTLB size >= the section size
3) The HugeTLB size >= the vmemmap leaf mapping size

... then a HugeTLB will not share any leaf page table entries with *anything
else*, but will share intermediate entries.
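
Something like the following check captures those conditions (an
illustrative userspace sketch; the sizes are parameters here, not values
taken from any real kernel):

#include <stdbool.h>

/* True if a HugeTLB page of size `huge` might share *leaf* page table
 * entries with something outside itself, per conditions 1-3 above. */
static bool hugetlb_may_share_leaf_entries(unsigned long huge,
                                           unsigned long section,
                                           unsigned long vmemmap_leaf)
{
        bool naturally_aligned = huge && !(huge & (huge - 1));

        return !(naturally_aligned && huge >= section && huge >= vmemmap_leaf);
}

/* e.g. hugetlb_may_share_leaf_entries(1UL << 30, 128UL << 20, 2UL << 20)
 * returns false: a 1GB HugeTLB shares only intermediate entries. */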

Perhaps that's a clearer line of argument?

Regardless, this should be in the commit message.

> 2nd concern:
> "
>   differently, not sure if ptdump would require any synchronization.
> 
>   Dumping a wrong value is probably okay but crashing because a page table
>   entry is being freed after ptdump acquired the pointer is bad. On arm64,
>   ptdump() is protected against hotremove via [get|put]_online_mems().
> "
> My answer: ptdump should be fine, since vmemmap_remap_free() only exchanges
> PTEs or splits a PMD entry (which means allocating a PTE page table). Neither
> operation frees any page tables, so ptdump cannot run into a use-after-free
> on any page table. The worst case is just dumping a wrong value.

This should be in the commit message.

Thanks,
Mark.

> 
> [1] https://lore.kernel.org/linux-mm/b8cdc9c8-853c-8392-a2fa-4f1a8f02057a@arm.com/T/
> 
>  fs/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7a2b11c0b803..04cfd5bf5ec9 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -247,7 +247,7 @@ config HUGETLB_PAGE
>  
>  config HUGETLB_PAGE_FREE_VMEMMAP
>  	def_bool HUGETLB_PAGE
> -	depends on X86_64
> +	depends on X86_64 || ARM64
>  	depends on SPARSEMEM_VMEMMAP
>  
>  config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
> -- 
> 2.11.0
> 


* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2022-01-12 12:01 ` Mark Rutland
@ 2022-01-13  6:28   ` Muchun Song
  0 siblings, 0 replies; 10+ messages in thread
From: Muchun Song @ 2022-01-13  6:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Andrew Morton, David Hildenbrand, Bodeddula,
	Balasubramaniam, Oscar Salvador, Mike Kravetz, David Rientjes,
	Catalin Marinas, james.morse, linux-arm-kernel, LKML,
	Linux Memory Management List, Xiongchun duan, Fam Zheng

On Wed, Jan 12, 2022 at 8:02 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi,
>
> On Tue, Jan 11, 2022 at 09:16:52PM +0800, Muchun Song wrote:
> > The preparation for freeing the vmemmap pages associated with each
> > HugeTLB page is complete, so we can support this feature for arm64.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> It's a bit difficult to understand this commit message, as there's not much
> context here.

Hi Mark,

My bad. More information can be found in [1].

[1] https://lore.kernel.org/all/20210510030027.56044-1-songmuchun@bytedance.com/T/#u

>
> What is HUGETLB_PAGE_FREE_VMEMMAP intended to achieve? Is this intended to save
> memory, find bugs, or some other goal? If this is a memory saving or
> performance improvement, can we quantify that benefit?

It is for memory saving. It can save about 12GB per 1TB of 2MB HugeTLB
pages, or about 16GB per 1TB of 1GB HugeTLB pages.
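
A rough sketch of where those numbers come from (userspace arithmetic only;
it assumes 4K base pages, a 64-byte struct page, and that 2 vmemmap pages
are kept per HugeTLB page, as I understand the implementation at the time
does):

#include <stdio.h>

int main(void)
{
        const double kb = 1024, mb = kb * kb, gb = mb * kb, tb = gb * kb;
        const double page = 4 * kb;            /* assumed 4K base pages */

        /* 2MB HugeTLB: 8 vmemmap pages, 6 freed (2 kept) => 24KB saved each. */
        double n2m = tb / (2 * mb);
        printf("2MB pages: ~%.0f GB saved per 1TB\n", n2m * 6 * page / gb);    /* ~12 */

        /* 1GB HugeTLB: 4096 vmemmap pages, 4094 freed => ~16MB saved each. */
        double n1g = tb / gb;
        printf("1GB pages: ~%.0f GB saved per 1TB\n", n1g * 4094 * page / gb); /* ~16 */

        return 0;
}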

>
> Does the alloc/free happen dynamically, or does this happen once during kernel
> boot? IIUC it's the former, which sounds pretty scary. Especially if we need to
> re-allocate the vmemmap pages later -- can't we run out of memory, and then
> fail to free a HugeTLB page?

Right. The implementation of this can be found in commit:

  ad2fa3717b74994 ("mm: hugetlb: alloc the vmemmap pages associated
with each HugeTLB page")

>
> Are there any requirements upon arch code, e.g. mutual exclusion?

No. The implementation is generic; there is no architecture-specific code
that needs to be implemented.

>
> Below there are a bunch of comments trying to explain that this is safe. Having
> some of that rationale in the commit message itself would be helpful.
>
> I see that commit:
>
>   6be24bed9da367c2 ("mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP")
>
> ... has a much more complete description, and cribbing some of that wording
> would be helpful.

Will do in the next version once we are on the same page about this feature.

>
>
> > ---
> > There have already been some discussions about this in [1], but no conclusion
> > was reached in the end. I have copied the concerns raised by Anshuman here.
> >
> > 1st concern:
> > "
> >   But what happens when a hot remove section's vmemmap area (which is being
> >   torn down) is near another vmemmap area which is either created or
> >   being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
> >   pages inside the hot remove section might be safe. But what about other
> >   HugeTLB areas whose vmemmap area shares page table entries with vmemmap
> >   entries for a section being hot removed? Massive HugeTLB alloc/use/free
> >   test cycle using memory just adjacent to a memory hotplug area, which is
> >   always added and removed periodically, should be able to expose this problem.
> > "
> > My answer: As you already know, HugeTLB pages inside the hot-removed section
> > are safe.
>
> It would be helpful if you could explain *why* that's safe, since those of us
> coming at this cold have no idea whether this is the case.

At the time memory is removed, all huge pages either have been migrated
away or dissolved. So there is no race between memory hot remove and
free_huge_page_vmemmap().

>
> > As for your question "what about other HugeTLB areas whose vmemmap
> > area shares page table entries with vmemmap entries for a section being hot
> > removed?", the situation cannot arise. Why? The minimal granularity of
> > hotplug memory is 128MB (on arm64 with a 4K base page size), so any HugeTLB
> > page smaller than 128MB lies entirely within a single section; there are
> > therefore no shared (PTE) page tables between HugeTLB pages in this section
> > and those in other sections, and a HugeTLB page cannot span two sections.
>
> Am I correct in assuming that in this case we never free the section?

Right. So there is no race between memory hot remove and
free_huge_page_vmemmap() in this case either.

>
> > Any HugeTLB page bigger than 128MB (e.g. 1GB) has a size that is an integer
> > multiple of a section, and its vmemmap area is an integer multiple of 2MB.
> > At the time memory is removed, all huge pages have either been migrated away
> > or dissolved, so the vmemmap is stable. There is no problem in this case
> > either.
>
> Are you mentioning 2MB here because we PMD-map the vmemmap with 4K base pages?

Right.

>
> IIUC, so long as:
>
> 1) HugeTLBs are naturally aligned, power-of-two sizes
> 2) The HugeTLB size >= the section size
> 3) The HugeTLB size >= the vmemmap leaf mapping size
>
> ... then a HugeTLB will not share any leaf page table entries with *anything
> else*, but will share intermediate entries.

Right.

>
> Perhaps that's a clearer line of argument?
>
> Regardless, this should be in the commit message.

Will do.

>
> > 2nd concern:
> > "
> >   differently, not sure if ptdump would require any synchronization.
> >
> >   Dumping a wrong value is probably okay but crashing because a page table
> >   entry is being freed after ptdump acquired the pointer is bad. On arm64,
> >   ptdump() is protected against hotremove via [get|put]_online_mems().
> > "
> > My answer: ptdump should be fine, since vmemmap_remap_free() only exchanges
> > PTEs or splits a PMD entry (which means allocating a PTE page table). Neither
> > operation frees any page tables, so ptdump cannot run into a use-after-free
> > on any page table. The worst case is just dumping a wrong value.
>
> This should be in the commit message.
>

Will do.

Thanks.


* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2021-05-20 11:59       ` David Hildenbrand
@ 2021-05-21  5:02         ` Anshuman Khandual
  0 siblings, 0 replies; 10+ messages in thread
From: Anshuman Khandual @ 2021-05-21  5:02 UTC (permalink / raw)
  To: David Hildenbrand, Muchun Song, will, akpm, bodeddub, osalvador,
	mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch, Mark Rutland, Catalin Marinas,
	James Morse



On 5/20/21 5:29 PM, David Hildenbrand wrote:
> On 20.05.21 13:54, Anshuman Khandual wrote:
>>
>> On 5/19/21 5:33 PM, David Hildenbrand wrote:
>>> On 19.05.21 13:45, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 5/18/21 2:48 PM, Muchun Song wrote:
>>>>> The preparation for freeing the vmemmap pages associated with each
>>>>> HugeTLB page is complete, so we can support this feature for arm64.
>>>>>
>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>>> ---
>>>>>    arch/arm64/mm/mmu.c | 5 +++++
>>>>>    fs/Kconfig          | 2 +-
>>>>>    2 files changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>> index 5d37e461c41f..967b01ce468d 100644
>>>>> --- a/arch/arm64/mm/mmu.c
>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>> @@ -23,6 +23,7 @@
>>>>>    #include <linux/mm.h>
>>>>>    #include <linux/vmalloc.h>
>>>>>    #include <linux/set_memory.h>
>>>>> +#include <linux/hugetlb.h>
>>>>>      #include <asm/barrier.h>
>>>>>    #include <asm/cputype.h>
>>>>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>>        pmd_t *pmdp;
>>>>>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>>> +
>>>>> +    if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>>>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>>>
>>>> Not considering the fact that this will force the kernel to have only
>>>> base page size mapping for vmemmap (unless altmap is also requested)
>>>> which might reduce the performance, it also enables vmemmap mapping to
>>>> be torn down or built up at runtime, which could potentially collide
>>>> with other kernel page table walkers like ptdump or memory hotremove
>>>> operation! How are those possible collisions protected right now?
>>>
>>> Hi Anshuman,
>>>
>>> Memory hotremove is not an issue IIRC. At the time memory is removed, all huge pages either have been migrated away or dissolved; the vmemmap is stable.
>>
>> But what happens when a hot remove section's vmemmap area (which is being
>> torn down) is near another vmemmap area which is either created or
>> being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
>> pages inside the hot remove section might be safe. But what about other
>> HugeTLB areas whose vmemmap area shares page table entries with vmemmap
>> entries for a section being hot removed? Massive HugeTLB alloc/use/free
>> test cycle using memory just adjacent to a memory hotplug area, which is
>> always added and removed periodically, should be able to expose this problem.
>>
>> IIUC unlike vmalloc(), vmemmap mapping areas in the kernel page table were
>> always constant unless there are hotplug add or remove operations which
>> are protected with a hotplug lock. Now with this change, we could have
>> simultaneous walking and add or remove of the vmemmap areas without any
>> synchronization. Isn't this problematic?
>>
>> On arm64, the memory hot remove operation empties free portions of the vmemmap
>> table after clearing them. Hence all concurrent walkers (hugetlb_vmemmap,
>> hot remove, ptdump etc) need to be synchronized against hot remove.
>>
>>  From arch/arm64/mm/mmu.c
>>
>> void vmemmap_free(unsigned long start, unsigned long end,
>>                  struct vmem_altmap *altmap)
>> {
>> #ifdef CONFIG_MEMORY_HOTPLUG
>>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>
>>          unmap_hotplug_range(start, end, true, altmap);
>>          free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
>> #endif
>> }
> 
> You are right, however, AFAIR
> 
> 1) We always populate base pages, meaning we only modify PTEs and not actually add/remove page tables when creating/destroying a hugetlb page. Page table walkers should be fine and not suddenly run into a use-after-free.
> 
> 2) For pfn_to_page() users to never fault, we have to do an atomic exchange of PTEs, meaning, someone traversing a page table looking for pte_none() entries (like free_empty_tables() in your example) should never get a false positive.
> 
> Makes sense, or am I missing something?
> 
>>
>>>
>>> vmemmap access (accessing the memmap via a virtual address) itself is not an issue. Manually walking (vmemmap) page tables might behave
>>
>> Right.
>>
>> differently, not sure if ptdump would require any synchronization.
>>
>> Dumping a wrong value is probably okay but crashing because a page table
>> entry is being freed after ptdump acquired the pointer is bad. On arm64,
>> ptdump() is protected against hotremove via [get|put]_online_mems().
> 
> Okay, and as the feature in question only exchanges PTEs, we should be fine.

Adding Mark, Catalin and James here in case I might have missed
something regarding possible vmemmap collision.


* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2021-05-20 11:54     ` Anshuman Khandual
@ 2021-05-20 11:59       ` David Hildenbrand
  2021-05-21  5:02         ` Anshuman Khandual
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-05-20 11:59 UTC (permalink / raw)
  To: Anshuman Khandual, Muchun Song, will, akpm, bodeddub, osalvador,
	mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch

On 20.05.21 13:54, Anshuman Khandual wrote:
> 
> On 5/19/21 5:33 PM, David Hildenbrand wrote:
>> On 19.05.21 13:45, Anshuman Khandual wrote:
>>>
>>>
>>> On 5/18/21 2:48 PM, Muchun Song wrote:
>>>> The preparation for freeing the vmemmap pages associated with each
>>>> HugeTLB page is complete, so we can support this feature for arm64.
>>>>
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> ---
>>>>    arch/arm64/mm/mmu.c | 5 +++++
>>>>    fs/Kconfig          | 2 +-
>>>>    2 files changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index 5d37e461c41f..967b01ce468d 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -23,6 +23,7 @@
>>>>    #include <linux/mm.h>
>>>>    #include <linux/vmalloc.h>
>>>>    #include <linux/set_memory.h>
>>>> +#include <linux/hugetlb.h>
>>>>      #include <asm/barrier.h>
>>>>    #include <asm/cputype.h>
>>>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>        pmd_t *pmdp;
>>>>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>> +
>>>> +    if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>>
>>> Not considering the fact that this will force the kernel to have only
>>> base page size mapping for vmemmap (unless altmap is also requested)
>>> which might reduce the performance, it also enables vmemmap mapping to
>> be torn down or built up at runtime, which could potentially collide
>> with other kernel page table walkers like ptdump or memory hotremove
>> operation! How are those possible collisions protected right now?
>>
>> Hi Anshuman,
>>
>> Memory hotremove is not an issue IIRC. At the time memory is removed, all huge pages either have been migrated away or dissolved; the vmemmap is stable.
> 
> But what happens when a hot remove section's vmemmap area (which is being
> torn down) is near another vmemmap area which is either created or
> being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
> pages inside the hot remove section might be safe. But what about other
> HugeTLB areas whose vmemmap area shares page table entries with vmemmap
> entries for a section being hot removed? Massive HugeTLB alloc/use/free
> test cycle using memory just adjacent to a memory hotplug area, which is
> always added and removed periodically, should be able to expose this problem.
> 
> IIUC unlike vmalloc(), vmemmap mapping areas in the kernel page table were
> always constant unless there are hotplug add or remove operations which
> are protected with a hotplug lock. Now with this change, we could have
> simultaneous walking and add or remove of the vmemmap areas without any
> synchronization. Isn't this problematic?
> 
> On arm64, the memory hot remove operation empties free portions of the vmemmap
> table after clearing them. Hence all concurrent walkers (hugetlb_vmemmap,
> hot remove, ptdump etc) need to be synchronized against hot remove.
> 
>  From arch/arm64/mm/mmu.c
> 
> void vmemmap_free(unsigned long start, unsigned long end,
>                  struct vmem_altmap *altmap)
> {
> #ifdef CONFIG_MEMORY_HOTPLUG
>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
> 
>          unmap_hotplug_range(start, end, true, altmap);
>          free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
> #endif
> }

You are right, however, AFAIR

1) We always populate base pages, meaning we only modify PTEs and not 
actually add/remove page tables when creating/destroying a hugetlb page. 
Page table walkers should be fine and not suddenly run into a 
use-after-free.

2) For pfn_to_page() users to never fault, we have to do an atomic 
exchange of PTEs, meaning, someone traversing a page table looking for
pte_none() entries (like free_empty_tables() in your example) should 
never get a false positive.

Makes sense, or am I missing something?

> 
>>
>> vmemmap access (accessing the memmap via a virtual address) itself is not an issue. Manually walking (vmemmap) page tables might behave
> 
> Right.
> 
> differently, not sure if ptdump would require any synchronization.
> 
> Dumping a wrong value is probably okay but crashing because a page table
> entry is being freed after ptdump acquired the pointer is bad. On arm64,
> ptdump() is protected against hotremove via [get|put]_online_mems().

Okay, and as the feature in question only exchanges PTEs, we should be 
fine.



-- 
Thanks,

David / dhildenb



* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2021-05-19 12:03   ` David Hildenbrand
@ 2021-05-20 11:54     ` Anshuman Khandual
  2021-05-20 11:59       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Anshuman Khandual @ 2021-05-20 11:54 UTC (permalink / raw)
  To: David Hildenbrand, Muchun Song, will, akpm, bodeddub, osalvador,
	mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch


On 5/19/21 5:33 PM, David Hildenbrand wrote:
> On 19.05.21 13:45, Anshuman Khandual wrote:
>>
>>
>> On 5/18/21 2:48 PM, Muchun Song wrote:
>>> The preparation for freeing the vmemmap pages associated with each
>>> HugeTLB page is complete, so we can support this feature for arm64.
>>>
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>>   arch/arm64/mm/mmu.c | 5 +++++
>>>   fs/Kconfig          | 2 +-
>>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 5d37e461c41f..967b01ce468d 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -23,6 +23,7 @@
>>>   #include <linux/mm.h>
>>>   #include <linux/vmalloc.h>
>>>   #include <linux/set_memory.h>
>>> +#include <linux/hugetlb.h>
>>>     #include <asm/barrier.h>
>>>   #include <asm/cputype.h>
>>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>       pmd_t *pmdp;
>>>         WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>> +
>>> +    if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>
>> Not considering the fact that this will force the kernel to have only
>> base page size mapping for vmemmap (unless altmap is also requested)
>> which might reduce the performance, it also enables vmemmap mapping to
>> be torn down or built up at runtime, which could potentially collide
>> with other kernel page table walkers like ptdump or memory hotremove
>> operation! How are those possible collisions protected right now?
> 
> Hi Anshuman,
> 
> Memory hotremove is not an issue IIRC. At the time memory is removed, all huge pages either have been migrated away or dissolved; the vmemmap is stable.

But what happens when a hot remove section's vmemmap area (which is being
torn down) is near another vmemmap area which is either created or
being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
pages inside the hot remove section might be safe. But what about other
HugeTLB areas whose vmemmap area shares page table entries with vmemmap
entries for a section being hot removed? Massive HugeTLB alloc/use/free
test cycle using memory just adjacent to a memory hotplug area, which is
always added and removed periodically, should be able to expose this problem.

IIUC unlike vmalloc(), vmemmap mapping areas in the kernel page table were
always constant unless there are hotplug add or remove operations which
are protected with a hotplug lock. Now with this change, we could have
simultaneous walking and add or remove of the vmemmap areas without any
synchronization. Isn't this problematic?

On arm64, the memory hot remove operation empties free portions of the vmemmap
table after clearing them. Hence all concurrent walkers (hugetlb_vmemmap,
hot remove, ptdump etc) need to be synchronized against hot remove.

From arch/arm64/mm/mmu.c

void vmemmap_free(unsigned long start, unsigned long end,
                struct vmem_altmap *altmap)
{
#ifdef CONFIG_MEMORY_HOTPLUG
        WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));

        unmap_hotplug_range(start, end, true, altmap);
        free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
#endif
}

> 
> vmemmap access (accessing the memmap via a virtual address) itself is not an issue. Manually walking (vmemmap) page tables might behave 

Right.

differently, not sure if ptdump would require any synchronization.

Dumping a wrong value is probably okay but crashing because a page table
entry is being freed after ptdump acquired the pointer is bad. On arm64,
ptdump() is protected against hotremove via [get|put]_online_mems().

> 
>>
>> Doesn't this vmemmap operation increase latency for HugeTLB usage?
>> Shouldn't this runtime enablement also take into account some other
>> qualifying information, apart from the potential memory saved from struct
>> page areas? Just wondering.
> 
> That's one of the reasons why it explicitly has to be enabled by an admin.
> 

config HUGETLB_PAGE_FREE_VMEMMAP
        def_bool HUGETLB_PAGE
        depends on X86_64 || ARM64
        depends on SPARSEMEM_VMEMMAP

Shouldn't this depend on EXPERT as well? Regardless, there is a
synchronization problem on arm64 if this feature is enabled, as vmemmap
portions can be freed during a hot remove operation. But I wonder how latency
would be impacted if vmemmap_remap_[alloc|free]() added [get|put]_online_mems().
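
Roughly something like this (an untested sketch, purely to illustrate the
idea):

/* Rough sketch: wrap the remap operation so it cannot race with memory
 * hot remove.  The open question is what this does to HugeTLB alloc/free
 * latency when hot remove is frequent. */
static void vmemmap_remap_free_synced(unsigned long start, unsigned long end,
                                      unsigned long reuse)
{
        get_online_mems();      /* block concurrent memory hot remove */
        vmemmap_remap_free(start, end, reuse);
        put_online_mems();
}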


* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2021-05-18  9:18 Muchun Song
  2021-05-19 11:45 ` Anshuman Khandual
@ 2021-05-19 12:36 ` Anshuman Khandual
  1 sibling, 0 replies; 10+ messages in thread
From: Anshuman Khandual @ 2021-05-19 12:36 UTC (permalink / raw)
  To: Muchun Song, will, akpm, david, bodeddub, osalvador,
	mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch


On 5/18/21 2:48 PM, Muchun Song wrote:
> The preparation for freeing the vmemmap pages associated with each
> HugeTLB page is complete, so we can support this feature for arm64.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  arch/arm64/mm/mmu.c | 5 +++++
>  fs/Kconfig          | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 5d37e461c41f..967b01ce468d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -23,6 +23,7 @@
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
>  #include <linux/set_memory.h>
> +#include <linux/hugetlb.h>
>  
>  #include <asm/barrier.h>
>  #include <asm/cputype.h>
> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	pmd_t *pmdp;
>  
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
> +
> +	if (is_hugetlb_free_vmemmap_enabled() && !altmap)
> +		return vmemmap_populate_basepages(start, end, node, altmap);
> +
>  	do {
>  		next = pmd_addr_end(addr, end);
>  
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 6ce6fdac00a3..02c2d3bf1cb8 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -242,7 +242,7 @@ config HUGETLB_PAGE
>  
>  config HUGETLB_PAGE_FREE_VMEMMAP
>  	def_bool HUGETLB_PAGE
> -	depends on X86_64
> +	depends on X86_64 || ARM64
>  	depends on SPARSEMEM_VMEMMAP
>  
>  config MEMFD_CREATE
> 

How does this interact with HugeTLB migration, which might iterate over
individual constituent struct pages (all tail pages are remapped to the
same struct page when this feature is enabled)? A simple test involving
madvise(ptr, size, MADV_SOFT_OFFLINE) fails on various HugeTLB page sizes
with this patch applied, although I have not debugged this any further.

Soft offlining pfn 0x101c00 at process virtual address 0xffff7fa00000
soft offline: 0x101c00: hugepage migration failed 1, type bfffc0000010006 
              (referenced|uptodate|head|node=0|zone=2|lastcpupid=0xffff)
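
The test was essentially along these lines (a minimal reproducer; it assumes
a 2MB default huge page size, at least one huge page reserved via
/proc/sys/vm/nr_hugepages, and root privileges for the soft offline):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101   /* from the kernel UAPI headers */
#endif

#define SIZE (2UL << 20)        /* one 2MB huge page (assumed default size) */

int main(void)
{
        /* Back the mapping with a HugeTLB page. */
        void *ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (ptr == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        memset(ptr, 0, SIZE);   /* fault the huge page in */

        /* Soft offline forces the huge page to be migrated away. */
        if (madvise(ptr, SIZE, MADV_SOFT_OFFLINE))
                perror("madvise(MADV_SOFT_OFFLINE)");

        munmap(ptr, SIZE);
        return 0;
}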



* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2021-05-19 11:45 ` Anshuman Khandual
@ 2021-05-19 12:03   ` David Hildenbrand
  2021-05-20 11:54     ` Anshuman Khandual
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-05-19 12:03 UTC (permalink / raw)
  To: Anshuman Khandual, Muchun Song, will, akpm, bodeddub, osalvador,
	mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch

On 19.05.21 13:45, Anshuman Khandual wrote:
> 
> 
> On 5/18/21 2:48 PM, Muchun Song wrote:
>> The preparation for freeing the vmemmap pages associated with each
>> HugeTLB page is complete, so we can support this feature for arm64.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>>   arch/arm64/mm/mmu.c | 5 +++++
>>   fs/Kconfig          | 2 +-
>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 5d37e461c41f..967b01ce468d 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -23,6 +23,7 @@
>>   #include <linux/mm.h>
>>   #include <linux/vmalloc.h>
>>   #include <linux/set_memory.h>
>> +#include <linux/hugetlb.h>
>>   
>>   #include <asm/barrier.h>
>>   #include <asm/cputype.h>
>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>   	pmd_t *pmdp;
>>   
>>   	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>> +
>> +	if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>> +		return vmemmap_populate_basepages(start, end, node, altmap);
> 
> Not considering the fact that this will force the kernel to have only
> base page size mapping for vmemmap (unless altmap is also requested)
> which might reduce the performance, it also enables vmemmap mapping to
> be torn down or built up at runtime, which could potentially collide
> with other kernel page table walkers like ptdump or memory hotremove
> operation! How are those possible collisions protected right now?

Hi Anshuman,

Memory hotremove is not an issue IIRC. At the time memory is removed, 
all huge pages either have been migrated away or dissolved; the vmemmap 
is stable.

vmemmap access (accessing the memmap via a virtual address) itself is 
not an issue. Manually walking (vmemmap) page tables might behave 
differently, not sure if ptdump would require any synchronization.

> 
> Doesn't this vmemmap operation increase latency for HugeTLB usage?
> Shouldn't this runtime enablement also take into account some other
> qualifying information, apart from the potential memory saved from struct
> page areas? Just wondering.

That's one of the reasons why it explicitly has to be enabled by an admin.

-- 
Thanks,

David / dhildenb



* Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
  2021-05-18  9:18 Muchun Song
@ 2021-05-19 11:45 ` Anshuman Khandual
  2021-05-19 12:03   ` David Hildenbrand
  2021-05-19 12:36 ` Anshuman Khandual
  1 sibling, 1 reply; 10+ messages in thread
From: Anshuman Khandual @ 2021-05-19 11:45 UTC (permalink / raw)
  To: Muchun Song, will, akpm, david, bodeddub, osalvador,
	mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch



On 5/18/21 2:48 PM, Muchun Song wrote:
> The preparation for freeing the vmemmap pages associated with each
> HugeTLB page is complete, so we can support this feature for arm64.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  arch/arm64/mm/mmu.c | 5 +++++
>  fs/Kconfig          | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 5d37e461c41f..967b01ce468d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -23,6 +23,7 @@
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
>  #include <linux/set_memory.h>
> +#include <linux/hugetlb.h>
>  
>  #include <asm/barrier.h>
>  #include <asm/cputype.h>
> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	pmd_t *pmdp;
>  
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
> +
> +	if (is_hugetlb_free_vmemmap_enabled() && !altmap)
> +		return vmemmap_populate_basepages(start, end, node, altmap);

Not considering the fact that this will force the kernel to have only
base page size mapping for vmemmap (unless altmap is also requested)
which might reduce the performance, it also enables vmemmap mapping to
be torn down or built up at runtime, which could potentially collide
with other kernel page table walkers like ptdump or memory hotremove
operation! How are those possible collisions protected right now?

Doesn't this vmemmap operation increase latency for HugeTLB usage?
Shouldn't this runtime enablement also take into account some other
qualifying information, apart from the potential memory saved from struct
page areas? Just wondering.

> +
>  	do {
>  		next = pmd_addr_end(addr, end);
>  
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 6ce6fdac00a3..02c2d3bf1cb8 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -242,7 +242,7 @@ config HUGETLB_PAGE
>  
>  config HUGETLB_PAGE_FREE_VMEMMAP
>  	def_bool HUGETLB_PAGE
> -	depends on X86_64
> +	depends on X86_64 || ARM64
>  	depends on SPARSEMEM_VMEMMAP
>  
>  config MEMFD_CREATE
>


* [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
@ 2021-05-18  9:18 Muchun Song
  2021-05-19 11:45 ` Anshuman Khandual
  2021-05-19 12:36 ` Anshuman Khandual
  0 siblings, 2 replies; 10+ messages in thread
From: Muchun Song @ 2021-05-18  9:18 UTC (permalink / raw)
  To: will, akpm, david, bodeddub, osalvador, mike.kravetz, rientjes
  Cc: linux-arm-kernel, linux-kernel, linux-mm, duanxiongchun,
	fam.zheng, zhengqi.arch, Muchun Song

The preparation for freeing the vmemmap pages associated with each
HugeTLB page is complete, so we can support this feature for arm64.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 arch/arm64/mm/mmu.c | 5 +++++
 fs/Kconfig          | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5d37e461c41f..967b01ce468d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <linux/set_memory.h>
+#include <linux/hugetlb.h>
 
 #include <asm/barrier.h>
 #include <asm/cputype.h>
@@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	pmd_t *pmdp;
 
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+
+	if (is_hugetlb_free_vmemmap_enabled() && !altmap)
+		return vmemmap_populate_basepages(start, end, node, altmap);
+
 	do {
 		next = pmd_addr_end(addr, end);
 
diff --git a/fs/Kconfig b/fs/Kconfig
index 6ce6fdac00a3..02c2d3bf1cb8 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -242,7 +242,7 @@ config HUGETLB_PAGE
 
 config HUGETLB_PAGE_FREE_VMEMMAP
 	def_bool HUGETLB_PAGE
-	depends on X86_64
+	depends on X86_64 || ARM64
 	depends on SPARSEMEM_VMEMMAP
 
 config MEMFD_CREATE
-- 
2.11.0


