From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Steven Price <steven.price@arm.com>,
	Sudarshan Rajagopalan <sudaraja@codeaurora.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
Date: Thu, 10 Sep 2020 16:20:42 +0530
Message-ID: <145c57a3-1753-3ff8-4353-3bf7bac0b7de@arm.com>
In-Reply-To: <b8c6e11f-00f3-a89c-6ebc-eef55f92298b@arm.com>

On 09/10/2020 01:57 PM, Steven Price wrote:
> On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from physically
>> continuous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
>> mappings are good to reduce TLB pressure. But when system is highly fragmented
>> and memory blocks are being hot-added at runtime, it's possible that such
>> physically continuous memory allocations can fail. Rather than failing the
>> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
>> discontinuous pages using vmemmap_populate_basepages().
>>
>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Steven Price <steven.price@arm.com>
>> ---
>>  arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75df62f..a46c7d4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  	p4d_t *p4dp;
>>  	pud_t *pudp;
>>  	pmd_t *pmdp;
>> +	int ret = 0;
>>  	do {
>>  		next = pmd_addr_end(addr, end);
>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  			void *p = NULL;
>>  			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>> -			if (!p)
>> -				return -ENOMEM;
>> +			if (!p) {
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +				vmemmap_free(start, end, altmap);
>> +#endif
>> +				ret = -ENOMEM;
>> +				break;
>> +			}
>>  			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>  		} else
>>  			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>  	} while (addr = next, addr != end);
>> -	return 0;
>> +	if (ret)
>> +		return vmemmap_populate_basepages(start, end, node, altmap);
>> +	else
>> +		return ret;
>
> Style comment: I find this usage of 'ret' confusing. When we assign -ENOMEM
> above that is never actually the return value of the function (in that case
> vmemmap_populate_basepages() provides the actual return value).

Right.

>
> Also the "return ret" is misleading since we know by that point that ret==0
> (and the 'else' is redundant).

Right.
>
> Can you not just move the call to vmemmap_populate_basepages() up to just
> after the (possible) vmemmap_free() call and remove the 'ret' variable?
>
> AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the
> function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel you

Right, CONFIG_MEMORY_HOTPLUG is not required.

> need at least a comment to explain Anshuman's point that it looks like
> you're freeing an unmapped area. Although if I'm reading the code correctly
> it seems like the unmapped area will just be skipped.

The proposed vmemmap_free() call attempts to free the entire requested
vmemmap range [start, end] when an intermediate PMD entry cannot be
allocated. Hence, even if vmemmap_free() can skip an unmapped area (will
double check on that), it unnecessarily walks large unmapped parts of the
range, which could never have been mapped at that point.

So, basically there could be two different methods for doing this fallback.

1. Call vmemmap_populate_basepages() for sections when PMD_SIZE allocation fails

	- vmemmap_free() need not be called

2. Abort at the first instance of PMD_SIZE allocation failure

	- Call vmemmap_free() to unmap all sections mapped till that point
	- Call vmemmap_populate_basepages() to map the entire requested section

The proposed patch tried to mix both approaches. Regardless, the first
approach here seems better and is the case in the
vmemmap_populate_hugepages() implementation on x86 as well.
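
To make the difference between the two fallback methods above concrete,
here is a small user-space sketch. It is only a toy model, not kernel code:
each array slot stands for one PMD-sized vmemmap section, and every name in
it (section_map, alloc_pmd_block(), populate_fallback_per_section(),
populate_fallback_whole_range()) is hypothetical and made up for
illustration. The assumption is simply that a PMD_SIZE allocation fails for
any section flagged as fragmented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How one PMD-sized vmemmap section ended up mapped in this toy model. */
enum section_map { SECTION_UNMAPPED, SECTION_HUGE, SECTION_BASE };

/*
 * Hypothetical stand-in for vmemmap_alloc_block_buf(PMD_SIZE, ...):
 * the allocation fails for sections flagged as fragmented.
 */
static bool alloc_pmd_block(const bool *fragmented, size_t idx)
{
	return !fragmented[idx];
}

/*
 * Method 1: fall back to base pages only for the section whose
 * PMD_SIZE allocation failed. Nothing mapped earlier is torn down.
 * Returns the number of sections that kept a huge mapping.
 */
static size_t populate_fallback_per_section(enum section_map *map,
					    const bool *fragmented, size_t n)
{
	size_t huge = 0;

	assert(map && fragmented);
	for (size_t i = 0; i < n; i++) {
		if (alloc_pmd_block(fragmented, i)) {
			map[i] = SECTION_HUGE;
			huge++;
		} else {
			map[i] = SECTION_BASE;
		}
	}
	return huge;
}

/*
 * Method 2: abort at the first failure, unmap what was mapped so far,
 * then remap the whole range with base pages.
 * Returns the number of sections that kept a huge mapping.
 */
static size_t populate_fallback_whole_range(enum section_map *map,
					    const bool *fragmented, size_t n)
{
	size_t huge = 0;

	assert(map && fragmented);
	for (size_t i = 0; i < n; i++) {
		if (alloc_pmd_block(fragmented, i)) {
			map[i] = SECTION_HUGE;
			huge++;
			continue;
		}
		/* Models vmemmap_free() over sections [0, i). */
		for (size_t j = 0; j < i; j++)
			map[j] = SECTION_UNMAPPED;
		/* Models vmemmap_populate_basepages() over the full range. */
		for (size_t j = 0; j < n; j++)
			map[j] = SECTION_BASE;
		return 0;
	}
	return huge;
}
```

In this model, a single fragmented section costs method 2 every huge
mapping in the range, while method 1 loses only the one section that
failed, which is the reason the first approach looks preferable.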