From: Catalin Marinas <catalin.marinas@arm.com> To: Anshuman Khandual <anshuman.khandual@arm.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org, will@kernel.org, mark.rutland@arm.com, mhocko@suse.com, ira.weiny@intel.com, david@redhat.com, cai@lca.pw, logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org, dan.j.williams@intel.com, mgorman@techsingularity.net, osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com, broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com, steven.price@arm.com, suzuki.poulose@arm.com Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove Date: Thu, 12 Sep 2019 21:15:17 +0100 [thread overview] Message-ID: <20190912201517.GB1068@C02TF0J2HF1T.local> (raw) In-Reply-To: <1567503958-25831-4-git-send-email-anshuman.khandual@arm.com> Hi Anshuman, Thanks for the details on the need for removing the page tables and vmemmap backing. Some comments on the code below. On Tue, Sep 03, 2019 at 03:15:58PM +0530, Anshuman Khandual wrote: > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -60,6 +60,14 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused; > > static DEFINE_SPINLOCK(swapper_pgdir_lock); > > +/* > + * This represents if vmalloc and vmemmap address range overlap with > + * each other on an intermediate level kernel page table entry which > + * in turn helps in deciding whether empty kernel page table pages > + * if any can be freed during memory hotplug operation. > + */ > +static bool vmalloc_vmemmap_overlap; I'd say just move the static find_vmalloc_vmemmap_overlap() function here, the compiler should be sufficiently smart enough to figure out that it's just a build-time constant. > @@ -770,6 +1022,28 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, > void vmemmap_free(unsigned long start, unsigned long end, > struct vmem_altmap *altmap) > { > +#ifdef CONFIG_MEMORY_HOTPLUG > + /* > + * FIXME: We should have called remove_pagetable(start, end, true). > + * vmemmap and vmalloc virtual range might share intermediate kernel > + * page table entries. Removing vmemmap range page table pages here > + * can potentially conflict with a concurrent vmalloc() allocation. > + * > + * This is primarily because vmalloc() does not take init_mm ptl for > + * the entire page table walk and it's modification. Instead it just > + * takes the lock while allocating and installing page table pages > + * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table > + * entry via memory hot remove can cause vmalloc() kernel page table > + * walk pointers to be invalid on the fly which can cause corruption > + * or worst, a crash. > + * > + * So free_empty_tables() gets called where vmalloc and vmemmap range > + * do not overlap at any intermediate level kernel page table entry. > + */ > + unmap_hotplug_range(start, end, true); > + if (!vmalloc_vmemmap_overlap) > + free_empty_tables(start, end); > +#endif > } So, I see the risk with overlapping and I guess for some kernel configurations (PAGE_SIZE == 64K) we may not be able to avoid it. If we can, that's great, otherwise could we rewrite the above functions to handle floor and ceiling similar to free_pgd_range()? (I wonder how this function works if you called it on init_mm and kernel address range). By having the vmemmap start/end information it avoids freeing partially filled page table pages. Another question: could we do the page table and the actual vmemmap pages freeing in a single pass (sorry if this has been discussed before)? > @@ -1048,10 +1322,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr) > } > > #ifdef CONFIG_MEMORY_HOTPLUG > +static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size) > +{ > + unsigned long end = start + size; > + > + WARN_ON(pgdir != init_mm.pgd); > + remove_pagetable(start, end, false); > +} I think the point I've made previously still stands: you only call remove_pagetable() with sparse_vmap == false in this patch. Just remove the extra argument and call unmap_hotplug_range() with sparse_vmap == false directly in remove_pagetable(). -- Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com> To: Anshuman Khandual <anshuman.khandual@arm.com> Cc: mark.rutland@arm.com, mhocko@suse.com, david@redhat.com, linux-mm@kvack.org, arunks@codeaurora.org, cpandya@codeaurora.org, ira.weiny@intel.com, will@kernel.org, steven.price@arm.com, valentin.schneider@arm.com, suzuki.poulose@arm.com, Robin.Murphy@arm.com, broonie@kernel.org, cai@lca.pw, ard.biesheuvel@arm.com, dan.j.williams@intel.com, linux-arm-kernel@lists.infradead.org, osalvador@suse.de, steve.capper@arm.com, logang@deltatee.com, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@techsingularity.net Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove Date: Thu, 12 Sep 2019 21:15:17 +0100 [thread overview] Message-ID: <20190912201517.GB1068@C02TF0J2HF1T.local> (raw) In-Reply-To: <1567503958-25831-4-git-send-email-anshuman.khandual@arm.com> Hi Anshuman, Thanks for the details on the need for removing the page tables and vmemmap backing. Some comments on the code below. On Tue, Sep 03, 2019 at 03:15:58PM +0530, Anshuman Khandual wrote: > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -60,6 +60,14 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused; > > static DEFINE_SPINLOCK(swapper_pgdir_lock); > > +/* > + * This represents if vmalloc and vmemmap address range overlap with > + * each other on an intermediate level kernel page table entry which > + * in turn helps in deciding whether empty kernel page table pages > + * if any can be freed during memory hotplug operation. > + */ > +static bool vmalloc_vmemmap_overlap; I'd say just move the static find_vmalloc_vmemmap_overlap() function here, the compiler should be sufficiently smart enough to figure out that it's just a build-time constant. > @@ -770,6 +1022,28 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, > void vmemmap_free(unsigned long start, unsigned long end, > struct vmem_altmap *altmap) > { > +#ifdef CONFIG_MEMORY_HOTPLUG > + /* > + * FIXME: We should have called remove_pagetable(start, end, true). > + * vmemmap and vmalloc virtual range might share intermediate kernel > + * page table entries. Removing vmemmap range page table pages here > + * can potentially conflict with a concurrent vmalloc() allocation. > + * > + * This is primarily because vmalloc() does not take init_mm ptl for > + * the entire page table walk and it's modification. Instead it just > + * takes the lock while allocating and installing page table pages > + * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table > + * entry via memory hot remove can cause vmalloc() kernel page table > + * walk pointers to be invalid on the fly which can cause corruption > + * or worst, a crash. > + * > + * So free_empty_tables() gets called where vmalloc and vmemmap range > + * do not overlap at any intermediate level kernel page table entry. > + */ > + unmap_hotplug_range(start, end, true); > + if (!vmalloc_vmemmap_overlap) > + free_empty_tables(start, end); > +#endif > } So, I see the risk with overlapping and I guess for some kernel configurations (PAGE_SIZE == 64K) we may not be able to avoid it. If we can, that's great, otherwise could we rewrite the above functions to handle floor and ceiling similar to free_pgd_range()? (I wonder how this function works if you called it on init_mm and kernel address range). By having the vmemmap start/end information it avoids freeing partially filled page table pages. Another question: could we do the page table and the actual vmemmap pages freeing in a single pass (sorry if this has been discussed before)? > @@ -1048,10 +1322,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr) > } > > #ifdef CONFIG_MEMORY_HOTPLUG > +static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size) > +{ > + unsigned long end = start + size; > + > + WARN_ON(pgdir != init_mm.pgd); > + remove_pagetable(start, end, false); > +} I think the point I've made previously still stands: you only call remove_pagetable() with sparse_vmap == false in this patch. Just remove the extra argument and call unmap_hotplug_range() with sparse_vmap == false directly in remove_pagetable(). -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2019-09-12 20:15 UTC|newest] Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-09-03 9:45 [PATCH V7 0/3] arm64/mm: Enable memory hot remove Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-03 9:45 ` [PATCH V7 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory() Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-04 8:16 ` David Hildenbrand 2019-09-04 8:16 ` David Hildenbrand 2019-09-05 4:27 ` Anshuman Khandual 2019-09-05 4:27 ` Anshuman Khandual 2019-09-16 1:44 ` Balbir Singh 2019-09-16 1:44 ` Balbir Singh 2019-09-18 9:28 ` Anshuman Khandual 2019-09-18 9:28 ` Anshuman Khandual 2019-09-03 9:45 ` [PATCH V7 2/3] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-15 2:35 ` Balbir Singh 2019-09-15 2:35 ` Balbir Singh 2019-09-18 9:12 ` Anshuman Khandual 2019-09-18 9:12 ` Anshuman Khandual 2019-09-03 9:45 ` [PATCH V7 3/3] arm64/mm: Enable memory hot remove Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-10 16:17 ` Catalin Marinas 2019-09-10 16:17 ` Catalin Marinas 2019-09-11 10:31 ` David Hildenbrand 2019-09-11 10:31 ` David Hildenbrand 2019-09-12 4:28 ` Anshuman Khandual 2019-09-12 4:28 ` Anshuman Khandual 2019-09-12 8:37 ` Anshuman Khandual 2019-09-12 8:37 ` Anshuman Khandual 2019-09-12 20:15 ` Catalin Marinas [this message] 2019-09-12 20:15 ` Catalin Marinas 2019-09-13 5:58 ` Anshuman Khandual 2019-09-13 5:58 ` Anshuman Khandual 2019-09-13 10:09 ` Catalin Marinas 2019-09-13 10:09 ` Catalin Marinas 2019-09-17 4:36 ` Anshuman Khandual 2019-09-17 4:36 ` Anshuman Khandual 2019-09-17 15:08 ` Catalin Marinas 2019-09-17 15:08 ` Catalin Marinas
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190912201517.GB1068@C02TF0J2HF1T.local \ --to=catalin.marinas@arm.com \ --cc=Robin.Murphy@arm.com \ --cc=akpm@linux-foundation.org \ --cc=anshuman.khandual@arm.com \ --cc=ard.biesheuvel@arm.com \ --cc=arunks@codeaurora.org \ --cc=broonie@kernel.org \ --cc=cai@lca.pw \ --cc=cpandya@codeaurora.org \ --cc=dan.j.williams@intel.com \ --cc=david@redhat.com \ --cc=ira.weiny@intel.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=logang@deltatee.com \ --cc=mark.rutland@arm.com \ --cc=mgorman@techsingularity.net \ --cc=mhocko@suse.com \ --cc=osalvador@suse.de \ --cc=steve.capper@arm.com \ --cc=steven.price@arm.com \ --cc=suzuki.poulose@arm.com \ --cc=valentin.schneider@arm.com \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.