From: David Hildenbrand <david@redhat.com>
To: Anshuman Khandual <anshuman.khandual@arm.com>,
	Muchun Song <songmuchun@bytedance.com>,
	will@kernel.org, akpm@linux-foundation.org, bodeddub@amazon.com,
	osalvador@suse.de, mike.kravetz@oracle.com, rientjes@google.com
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
	zhengqi.arch@bytedance.com
Subject: Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
Date: Thu, 20 May 2021 13:59:28 +0200	[thread overview]
Message-ID: <45c1a368-3d31-e92d-f120-4dca0eb2111d@redhat.com> (raw)
In-Reply-To: <df8a0fd5-2389-6ef0-b8e2-1c56663e7868@arm.com>

On 20.05.21 13:54, Anshuman Khandual wrote:
> 
> On 5/19/21 5:33 PM, David Hildenbrand wrote:
>> On 19.05.21 13:45, Anshuman Khandual wrote:
>>>
>>>
>>> On 5/18/21 2:48 PM, Muchun Song wrote:
>>>> The preparation of supporting freeing vmemmap associated with each
>>>> HugeTLB page is ready, so we can support this feature for arm64.
>>>>
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> ---
>>>>    arch/arm64/mm/mmu.c | 5 +++++
>>>>    fs/Kconfig          | 2 +-
>>>>    2 files changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index 5d37e461c41f..967b01ce468d 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -23,6 +23,7 @@
>>>>    #include <linux/mm.h>
>>>>    #include <linux/vmalloc.h>
>>>>    #include <linux/set_memory.h>
>>>> +#include <linux/hugetlb.h>
>>>>
>>>>    #include <asm/barrier.h>
>>>>    #include <asm/cputype.h>
>>>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>        pmd_t *pmdp;
>>>>
>>>>        WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>> +
>>>> +    if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>>
>>> Setting aside the fact that this will force the kernel to use only
>>> base page size mappings for the vmemmap (unless an altmap is also
>>> requested), which might reduce performance, it also allows vmemmap
>>> mappings to be torn down or built up at runtime, which could
>>> potentially collide with other kernel page table walkers such as
>>> ptdump or the memory hotremove operation! How are those possible
>>> collisions protected against right now?
>>
>> Hi Anshuman,
>>
>> Memory hotremove is not an issue IIRC. At the time memory is removed, all huge pages either have been migrated away or dissolved; the vmemmap is stable.
> 
> But what happens when a hot remove section's vmemmap area (which is being
> torn down) is near another vmemmap area which is being either created or
> destroyed for HugeTLB alloc/free purposes? As you mentioned, HugeTLB
> pages inside the hot remove section might be safe. But what about other
> HugeTLB areas whose vmemmap shares page table entries with the vmemmap
> entries of a section being hot removed? A massive HugeTLB alloc/use/free
> test cycle using memory just adjacent to a memory hotplug area that is
> repeatedly added and removed should be able to expose this problem.
> 
> IIUC, unlike vmalloc(), vmemmap mapping areas in the kernel page table were
> always constant except during hotplug add or remove operations, which
> are protected by a hotplug lock. Now with this change, we could have
> simultaneous walking and adding or removing of vmemmap areas without any
> synchronization. Isn't this problematic?
> 
> On arm64, the memory hot remove operation frees the empty portions of the
> vmemmap tables after clearing them. Hence all concurrent walkers
> (hugetlb_vmemmap, hot remove, ptdump etc.) need to be synchronized against
> hot remove.
> 
>  From arch/arm64/mm/mmu.c
> 
> void vmemmap_free(unsigned long start, unsigned long end,
>                  struct vmem_altmap *altmap)
> {
> #ifdef CONFIG_MEMORY_HOTPLUG
>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
> 
>          unmap_hotplug_range(start, end, true, altmap);
>          free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
> #endif
> }

You are right, however, AFAIR

1) We always populate base pages, meaning we only modify PTEs and not 
actually add/remove page tables when creating/destroying a hugetlb page. 
Page table walkers should be fine and not suddenly run into a 
use-after-free.

2) For pfn_to_page() users to never fault, we have to do an atomic
exchange of PTEs, meaning that someone traversing a page table looking for
pte_none() entries (like free_empty_tables() in your example) should
never get a false positive.

Makes sense, or am I missing something?
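
For illustration, a minimal sketch of what such an atomic PTE exchange
could look like, loosely modelled on the generic HugeTLB vmemmap remap
walk (the function name and details are simplified here, not the exact
upstream code). The old mapping is replaced by a mapping of the shared
"reuse" page in a single set_pte_at(), so a concurrent software walker
observes either the old or the new valid entry, never pte_none(), and
no page table pages are freed in the process:

/* Sketch only: remap one vmemmap PTE to the shared "reuse" page. */
static void remap_vmemmap_pte(pte_t *ptep, unsigned long addr,
			      struct page *reuse_page,
			      struct list_head *vmemmap_pages)
{
	pte_t entry = mk_pte(reuse_page, PAGE_KERNEL_RO);
	struct page *old_page = pte_page(*ptep);

	/* Queue the now-unused vmemmap page so it can be freed later. */
	list_add_tail(&old_page->lru, vmemmap_pages);

	/* Valid -> valid exchange; walkers never observe pte_none(). */
	set_pte_at(&init_mm, addr, ptep, entry);
}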

> 
>>
>> vmemmap access (accessing the memmap via a virtual address) itself is not an issue. Manually walking (vmemmap) page tables might behave
> 
> Right.
> 
>> differently, not sure if ptdump would require any synchronization.
> 
> Dumping a wrong value is probably okay, but crashing because a page table
> entry is freed after ptdump has acquired the pointer is bad. On arm64,
> ptdump() is protected against hotremove via [get|put]_online_mems().

Okay, and as the feature in question only exchanges PTEs, we should be 
fine.
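
As a point of reference for the synchronization Anshuman mentions, here is
a hedged sketch of how such a walker can serialize against hot remove via
the existing hotplug reference counting; walk_kernel_page_tables() is a
hypothetical stand-in for the actual walker entry point:

/* Sketch only: hold the memory hotplug reference across the walk. */
static void dump_vmemmap_mappings(void)
{
	get_online_mems();		/* blocks concurrent hot remove */
	walk_kernel_page_tables();	/* hypothetical walker */
	put_online_mems();
}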



-- 
Thanks,

David / dhildenb


Thread overview: 36+ messages

2021-05-18  9:18 [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB Muchun Song
2021-05-19 11:45 ` Anshuman Khandual
2021-05-19 12:03   ` David Hildenbrand
2021-05-20 11:54     ` Anshuman Khandual
2021-05-20 11:59       ` David Hildenbrand [this message]
2021-05-21  5:02         ` Anshuman Khandual
2021-05-19 12:49   ` [External] " Muchun Song
2021-05-20 12:00     ` Anshuman Khandual
2021-05-19 12:36 ` Anshuman Khandual
2021-05-19 14:43   ` [External] " Muchun Song
2021-05-19 15:22     ` Muchun Song
2021-05-19 16:21       ` Muchun Song
2021-05-19 22:44         ` Mike Kravetz
2022-01-11 13:16 Muchun Song
2022-01-12 12:01 ` Mark Rutland
2022-01-13  6:28   ` Muchun Song