linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>,
	will@kernel.org, akpm@linux-foundation.org, bodeddub@amazon.com,
	osalvador@suse.de, mike.kravetz@oracle.com, rientjes@google.com
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
	zhengqi.arch@bytedance.com, Mark Rutland <mark.rutland@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	James Morse <james.morse@arm.com>
Subject: Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
Date: Fri, 21 May 2021 10:32:41 +0530	[thread overview]
Message-ID: <b8cdc9c8-853c-8392-a2fa-4f1a8f02057a@arm.com> (raw)
In-Reply-To: <45c1a368-3d31-e92d-f120-4dca0eb2111d@redhat.com>



On 5/20/21 5:29 PM, David Hildenbrand wrote:
> On 20.05.21 13:54, Anshuman Khandual wrote:
>>
>> On 5/19/21 5:33 PM, David Hildenbrand wrote:
>>> On 19.05.21 13:45, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 5/18/21 2:48 PM, Muchun Song wrote:
>>>>> The preparation of supporting freeing vmemmap associated with each
>>>>> HugeTLB page is ready, so we can support this feature for arm64.
>>>>>
>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>>> ---
>>>>>    arch/arm64/mm/mmu.c | 5 +++++
>>>>>    fs/Kconfig          | 2 +-
>>>>>    2 files changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>> index 5d37e461c41f..967b01ce468d 100644
>>>>> --- a/arch/arm64/mm/mmu.c
>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>> @@ -23,6 +23,7 @@
>>>>>    #include <linux/mm.h>
>>>>>    #include <linux/vmalloc.h>
>>>>>    #include <linux/set_memory.h>
>>>>> +#include <linux/hugetlb.h>
>>>>>      #include <asm/barrier.h>
>>>>>    #include <asm/cputype.h>
>>>>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>>        pmd_t *pmdp;
>>>>>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>>> +
>>>>> +    if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>>>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>>>
>>>> Not considering the fact that this will force the kernel to have only
>>>> base page size mapping for vmemmap (unless altmap is also requested)
>>>> which might reduce the performance, it also enables vmemmap mapping to
>>>> be teared down or build up at runtime which could potentially collide
>>>> with other kernel page table walkers like ptdump or memory hotremove
>>>> operation ! How those possible collisions are protected right now ?
>>>
>>> Hi Anshuman,
>>>
>>> Memory hotremove is not an issue IIRC. At the time memory is removed, all huge pages either have been migrated away or dissolved; the vmemmap is stable.
>>
>> But what happens when a hot remove section's vmemmap area (which is being
>> teared down) is nearby another vmemmap area which is either created or
>> being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
>> pages inside the hot remove section might be safe. But what about other
>> HugeTLB areas whose vmemmap area shares page table entries with vmemmap
>> entries for a section being hot removed ? Massive HugeTLB alloc/use/free
>> test cycle using memory just adjacent to a memory hotplug area, which is
>> always added and removed periodically, should be able to expose this problem.
>>
>> IIUC unlike vmalloc(), vmemap mapping areas in the kernel page table were
>> always constant unless there are hotplug add or remove operations which
>> are protected with a hotplug lock. Now with this change, we could have
>> simultaneous walking and add or remove of the vmemap areas without any
>> synchronization. Is not this problematic ?
>>
>> On arm64 memory hot remove operation empties free portions of the vmemmap
>> table after clearing them. Hence all concurrent walkers (hugetlb_vmemmap,
>> hot remove, ptdump etc) need to be synchronized against hot remove.
>>
>>  From arch/arm64/mm/mmu.c
>>
>> void vmemmap_free(unsigned long start, unsigned long end,
>>                  struct vmem_altmap *altmap)
>> {
>> #ifdef CONFIG_MEMORY_HOTPLUG
>>          WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>
>>          unmap_hotplug_range(start, end, true, altmap);
>>          free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
>> #endif
>> }
> 
> You are right, however, AFAIR
> 
> 1) We always populate base pages, meaning we only modify PTEs and not actually add/remove page tables when creating/destroying a hugetlb page. Page table walkers should be fine and not suddenly run into a use-after-free.
> 
> 2) For pfn_to_page() users to never fault, we have to do an atomic exchange of PTES, meaning, someone traversing a page table looking for pte_none() entries (like free_empty_tables() in your example) should never get a false positive.
> 
> Makes sense, or am I missing something?
> 
>>
>>>
>>> vmemmap access (accessing the memmap via a virtual address) itself is not an issue. Manually walking (vmemmap) page tables might behave
>>
>> Right.
>>
>> differently, not sure if ptdump would require any synchronization.
>>
>> Dumping an wrong value is probably okay but crashing because a page table
>> entry is being freed after ptdump acquired the pointer is bad. On arm64,
>> ptdump() is protected against hotremove via [get|put]_online_mems().
> 
> Okay, and as the feature in question only exchanges PTEs, we should be fine.

Adding Mark, Catalin and James here in case I might have missed
something regarding possible vmemmap collision.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-05-21  5:04 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-18  9:18 [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB Muchun Song
2021-05-19 11:45 ` Anshuman Khandual
2021-05-19 12:03   ` David Hildenbrand
2021-05-20 11:54     ` Anshuman Khandual
2021-05-20 11:59       ` David Hildenbrand
2021-05-21  5:02         ` Anshuman Khandual [this message]
2021-05-19 12:49   ` [External] " Muchun Song
2021-05-20 12:00     ` Anshuman Khandual
2021-05-19 12:36 ` Anshuman Khandual
2021-05-19 14:43   ` [External] " Muchun Song
2021-05-19 15:22     ` Muchun Song
2021-05-19 16:21       ` Muchun Song
2021-05-19 22:44         ` Mike Kravetz
2022-01-11 13:16 Muchun Song
2022-01-12 12:01 ` Mark Rutland
2022-01-13  6:28   ` Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8cdc9c8-853c-8392-a2fa-4f1a8f02057a@arm.com \
    --to=anshuman.khandual@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bodeddub@amazon.com \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=fam.zheng@bytedance.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.kravetz@oracle.com \
    --cc=osalvador@suse.de \
    --cc=rientjes@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=will@kernel.org \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).