From: Anshuman Khandual
Subject: Re: [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
To: Robin Murphy, linux-mm@kvack.org
Cc: Mark Rutland, Michal Hocko, linux-ia64@vger.kernel.org,
 David Hildenbrand, Peter Zijlstra, Catalin Marinas, Dave Hansen,
 linux-riscv@lists.infradead.org, Will Deacon, Thomas Gleixner,
 x86@kernel.org, "Matthew Wilcox (Oracle)", Mike Rapoport, Ingo Molnar,
 Fenghua Yu, Pavel Tatashin, Andy Lutomirski, Paul Walmsley, Dan Williams,
 linux-arm-kernel@lists.infradead.org, Tony Luck,
 linux-kernel@vger.kernel.org, Palmer Dabbelt, Andrew Morton,
 "Kirill A. Shutemov"
References: <1583331030-7335-1-git-send-email-anshuman.khandual@arm.com>
 <1583331030-7335-2-git-send-email-anshuman.khandual@arm.com>
 <5e1bad9b-11d7-344c-766f-162f7a779941@arm.com>
In-Reply-To: <5e1bad9b-11d7-344c-766f-162f7a779941@arm.com>
Date: Tue, 24 Mar 2020 17:31:21 +0530

On 03/20/2020 10:38 PM, Robin Murphy wrote:
> On 2020-03-04 2:10 pm, Anshuman Khandual wrote:
>> vmemmap_populate_basepages() is used across platforms to allocate backing
>> memory for vmemmap mapping. This is used as a standard default choice or
>> as a fallback when intended huge pages allocation fails.
>> This just creates entire vmemmap mapping with base pages (PAGE_SIZE).
>>
>> On arm64 platforms, vmemmap_populate_basepages() is called instead of the
>> platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
>> is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.
>>
>> At present vmemmap_populate_basepages() does not support allocating from
>> driver defined struct vmem_altmap while trying to create vmemmap mapping
>> for a device memory range. It prevents ARM64_16K_PAGES and ARM64_64K_PAGES
>> configs on arm64 from supporting device memory with vmem_altmap request.
>>
>> This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
>> device memory allocation for vmemmap mapping on arm64 platforms with 16K or
>> 64K base page configs.
>>
>> Each architecture should evaluate and decide on subscribing device memory
>> based base page allocation through vmemmap_populate_basepages(). Hence let's
>> keep it disabled on all archs in order to preserve the existing semantics.
>> A subsequent patch enables it on arm64.
>
> I guess buy-in for this change largely depends on whether any other
> architectures are likely to want to share it. The existing altmap users
> don't look like they would, so that's probably more a question for the
> likes of S390 and RISC-V.

If vmemmap_populate_basepages() exists to be shared across platforms for
creating vmemmap mapping with base pages, then there does not seem to be
any good reason for it not to support altmap requests as well.

>
> Failing that, simply decoupling arm64 from vmemmap_populate_basepages()
> seems viable - I tried hacking up a quick proof-of-concept (attached at
> the end) and it doesn't come out looking *too* disgusting.

Even though this option seemed viable to me at the beginning, there was no
particularly pressing reason for vmemmap_populate_basepages() to exist as a
generic function and not support altmap.
If each architecture just creates its own policy regarding which level
should support altmap or not while also using a generic function, then why
even have a minimal shared function like vmemmap_populate_basepages() in
the first place?

>
>> Cc: Catalin Marinas
>> Cc: Will Deacon
>> Cc: Mark Rutland
>> Cc: Paul Walmsley
>> Cc: Palmer Dabbelt
>> Cc: Tony Luck
>> Cc: Fenghua Yu
>> Cc: Dave Hansen
>> Cc: Andy Lutomirski
>> Cc: Peter Zijlstra
>> Cc: Thomas Gleixner
>> Cc: Ingo Molnar
>> Cc: David Hildenbrand
>> Cc: Mike Rapoport
>> Cc: Michal Hocko
>> Cc: "Matthew Wilcox (Oracle)"
>> Cc: "Kirill A. Shutemov"
>> Cc: Andrew Morton
>> Cc: Dan Williams
>> Cc: Pavel Tatashin
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-ia64@vger.kernel.org
>> Cc: linux-riscv@lists.infradead.org
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Acked-by: Will Deacon
>> Signed-off-by: Anshuman Khandual
>> ---
>>  arch/arm64/mm/mmu.c      |  2 +-
>>  arch/ia64/mm/discontig.c |  2 +-
>>  arch/riscv/mm/init.c     |  2 +-
>>  arch/x86/mm/init_64.c    |  6 +++---
>>  include/linux/mm.h       |  5 +++--
>>  mm/sparse-vmemmap.c      | 16 +++++++++++-----
>>  6 files changed, 20 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 9b08f7c7e6f0..27cb95c471eb 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1036,7 +1036,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
>>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>          struct vmem_altmap *altmap)
>>  {
>> -    return vmemmap_populate_basepages(start, end, node);
>> +    return vmemmap_populate_basepages(start, end, node, NULL);
>>  }
>>  #else    /* !ARM64_SWAPPER_USES_SECTION_MAPS */
>>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
>> index 4f33f6e7e206..20409f3afea8 100644
>> --- a/arch/ia64/mm/discontig.c
>> +++ b/arch/ia64/mm/discontig.c
>> @@ -656,7 +656,7 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
>>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>          struct vmem_altmap *altmap)
>>  {
>> -    return vmemmap_populate_basepages(start, end, node);
>> +    return vmemmap_populate_basepages(start, end, node, NULL);
>>  }
>>
>>  void vmemmap_free(unsigned long start, unsigned long end,
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 965a8cf4829c..1d7451c91982 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -501,6 +501,6 @@ void __init paging_init(void)
>>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>                     struct vmem_altmap *altmap)
>>  {
>> -    return vmemmap_populate_basepages(start, end, node);
>> +    return vmemmap_populate_basepages(start, end, node, NULL);
>>  }
>>  #endif
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index abbdecb75fad..3272fe0d844a 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1471,7 +1471,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
>>              vmemmap_verify((pte_t *)pmd, node, addr, next);
>>              continue;
>>          }
>> -        if (vmemmap_populate_basepages(addr, next, node))
>> +        if (vmemmap_populate_basepages(addr, next, node, NULL))
>>              return -ENOMEM;
>>      }
>>      return 0;
>> @@ -1483,7 +1483,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>      int err;
>>
>>      if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>> -        err = vmemmap_populate_basepages(start, end, node);
>> +        err = vmemmap_populate_basepages(start, end, node, NULL);
>>      else if (boot_cpu_has(X86_FEATURE_PSE))
>>          err = vmemmap_populate_hugepages(start, end, node, altmap);
>>      else if (altmap) {
>> @@ -1491,7 +1491,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>                  __func__);
>>          err = -ENOMEM;
>>      } else
>> -        err = vmemmap_populate_basepages(start, end, node);
>> +        err = vmemmap_populate_basepages(start, end, node, NULL);
>>      if (!err)
>>          sync_global_pgds(start, end - 1);
>>      return err;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 52269e56c514..42f99c8d63c0 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2780,14 +2780,15 @@ pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>>  p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
>>  pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
>>  pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
>> -pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
>> +pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
>> +                struct vmem_altmap *altmap);
>>  void *vmemmap_alloc_block(unsigned long size, int node);
>>  struct vmem_altmap;
>>  void *vmemmap_alloc_block_buf(unsigned long size, int node);
>>  void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
>>  void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
>>  int vmemmap_populate_basepages(unsigned long start, unsigned long end,
>> -                   int node);
>> +                   int node, struct vmem_altmap *altmap);
>>  int vmemmap_populate(unsigned long start, unsigned long end, int node,
>>          struct vmem_altmap *altmap);
>>  void vmemmap_populate_print_last(void);
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 200aef686722..a407abc9b46c 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -140,12 +140,18 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>>              start, end - 1);
>>  }
>>
>> -pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node)
>> +pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
>> +                       struct vmem_altmap *altmap)
>>  {
>>      pte_t *pte = pte_offset_kernel(pmd, addr);
>>      if (pte_none(*pte)) {
>>          pte_t entry;
>> -        void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
>> +        void *p;
>> +
>> +        if (altmap)
>> +            p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
>> +        else
>> +            p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
>
> This pattern ends up appearing a number of times by the end - if we do
> go down the generic code route, might it be worth pushing it down into
> vmemmap_alloc_block_buf() itself to make it automatic? (possibly even
> including the powerpc fallback behaviour too?)

Yes, this pattern is now there in a couple more places. Sure, will change
vmemmap_alloc_block_buf() to handle altmap with a fallback request.

Something like this (not tested properly)

---------------------------------------------------
From: Anshuman Khandual
Date: Tue, 24 Mar 2020 07:35:47 +0000
Subject: [PATCH] mm/sparse: Enable vmemmap_alloc_block_buf() for altmap allocations

There are many instances where vmemmap allocation is switched between
device memory and regular memory based on whether altmap is available
or not.
vmemmap_alloc_block_buf() is used in various platforms to allocate
vmemmap. Hence enable it to handle altmap based device memory allocation
as well. While here implement a regular memory allocation fallback
mechanism that is used in powerpc.

Suggested-by: Robin Murphy
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/mmu.c       |  6 ++----
 arch/powerpc/mm/init_64.c | 12 ++++++------
 arch/x86/mm/init_64.c     |  6 ++----
 include/linux/mm.h        |  3 ++-
 mm/sparse-vmemmap.c       | 27 +++++++++++++++++++++------
 5 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 88c5b357013b..45f09935c160 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1080,10 +1080,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		if (pmd_none(READ_ONCE(*pmdp))) {
 			void *p = NULL;
 
-			if (altmap)
-				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
-			else
-				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			p = vmemmap_alloc_block_buf(PMD_SIZE, node,
+						    altmap, false);
 			if (!p)
 				return -ENOMEM;
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 4002ced3596f..31995eb4b62a 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -150,7 +150,7 @@ static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node)
 
 	/* allocate a page when required and hand out chunks */
 	if (!num_left) {
-		next = vmemmap_alloc_block(PAGE_SIZE, node);
+		next = vmemmap_alloc_block(PAGE_SIZE, node, NULL, false);
 		if (unlikely(!next)) {
 			WARN_ON(1);
 			return NULL;
@@ -226,12 +226,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		 * fall back to system memory if the altmap allocation fail.
 		 */
 		if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
-			p = altmap_alloc_block_buf(page_size, altmap);
-			if (!p)
-				pr_debug("altmap block allocation failed, falling back to system memory");
+			p = vmemmap_alloc_block_buf(page_size, node,
+						    altmap, true);
+		} else {
+			p = vmemmap_alloc_block_buf(page_size, node,
+						    NULL, false);
 		}
-		if (!p)
-			p = vmemmap_alloc_block_buf(page_size, node);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index c22677571619..35cc0c9d9578 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1444,10 +1444,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 		if (pmd_none(*pmd)) {
 			void *p;
 
-			if (altmap)
-				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
-			else
-				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			p = vmemmap_alloc_block_buf(PMD_SIZE, node,
+						    altmap, false);
 			if (p) {
 				pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4a987d173488..a2cb9c669800 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2994,7 +2994,8 @@ pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
 			    struct vmem_altmap *altmap);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
-void *vmemmap_alloc_block_buf(unsigned long size, int node);
+void *vmemmap_alloc_block_buf(unsigned long size, int node,
+			      struct vmem_altmap *altmap, bool fallback);
 void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a407abc9b46c..f502fcdf539f 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -71,10 +71,28 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 }
 
 /* need to make sure size is all the same during early stage */
-void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node,
+					 struct vmem_altmap *altmap,
+					 bool fallback)
 {
-	void *ptr = sparse_buffer_alloc(size);
+	void *ptr;
 
+	/*
+	 * There is no point in asking for fallback without
+	 * an altmap request to begin with. Just warn here
+	 * to catch potential call sites violating this.
+	 */
+	WARN_ON(!altmap && fallback);
+
+	if (altmap) {
+		ptr = altmap_alloc_block_buf(size, altmap);
+		if (!ptr && !fallback)
+			return NULL;
+		pr_debug("altmap block allocation failed,\
+			falling back to system memory");
+	}
+
+	ptr = sparse_buffer_alloc(size);
 	if (!ptr)
 		ptr = vmemmap_alloc_block(size, node);
 	return ptr;
@@ -148,10 +166,7 @@ pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
 		pte_t entry;
 		void *p;
 
-		if (altmap)
-			p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
-		else
-			p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
+		p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap, false);
 		if (!p)
 			return NULL;
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
--
2.20.1

>
> Robin.
>
>>          if (!p)
>>              return NULL;
>>          entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
>> @@ -213,8 +219,8 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
>>      return pgd;
>>  }
>>
>> -int __meminit vmemmap_populate_basepages(unsigned long start,
>> -                     unsigned long end, int node)
>> +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
>> +                     int node, struct vmem_altmap *altmap)
>>  {
>>      unsigned long addr = start;
>>      pgd_t *pgd;
>> @@ -236,7 +242,7 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
>>          pmd = vmemmap_pmd_populate(pud, addr, node);
>>          if (!pmd)
>>              return -ENOMEM;
>> -        pte = vmemmap_pte_populate(pmd, addr, node);
>> +        pte = vmemmap_pte_populate(pmd, addr, node, altmap);
>>          if (!pte)
>>              return -ENOMEM;
>>          vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
>>
>
> ----->8-----
> From: Robin Murphy
> Subject: [PATCH] arm64/mm: Consolidate vmemmap_populate()
>
> Since we already have a custom vmemmap_populate() implementation, fold
> the non-section-map case into that as well, so that we can easily add
> altmap support for both cases without having to mess with core code.
>
> Signed-off-by: Robin Murphy
> ---
>  arch/arm64/mm/mmu.c | 34 +++++++++++++++++++++-------------
>  1 file changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 128f70852bf3..e250fd414b2b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -725,13 +725,6 @@ int kern_addr_valid(unsigned long addr)
>      return pfn_valid(pte_pfn(pte));
>  }
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -#if !ARM64_SWAPPER_USES_SECTION_MAPS
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -        struct vmem_altmap *altmap)
> -{
> -    return vmemmap_populate_basepages(start, end, node);
> -}
> -#else    /* !ARM64_SWAPPER_USES_SECTION_MAPS */
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>          struct vmem_altmap *altmap)
>  {
> @@ -740,6 +733,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>      pgd_t *pgdp;
>      pud_t *pudp;
>      pmd_t *pmdp;
> +    pte_t *ptep;
>
>      do {
>          next = pmd_addr_end(addr, end);
> @@ -752,22 +746,36 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>          if (!pudp)
>              return -ENOMEM;
>
> +#if ARM64_SWAPPER_USES_SECTION_MAPS
>          pmdp = pmd_offset(pudp, addr);
>          if (pmd_none(READ_ONCE(*pmdp))) {
> -            void *p = NULL;
> -
> -            p = vmemmap_alloc_block_buf(PMD_SIZE, node);
> +            void *p = vmemmap_alloc_block_buf(PMD_SIZE, node);
>              if (!p)
>                  return -ENOMEM;
>
>              pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
> -        } else
> -            vmemmap_verify((pte_t *)pmdp, node, addr, next);
> +            continue;
> +        }
> +#else
> +        pmdp = vmemmap_pmd_populate(pmdp, addr, node);
> +        if (!pmdp)
> +            return -ENOMEM;
> +
> +        ptep = pte_offset_kernel(pmdp, addr);
> +        if (pte_none(READ_ONCE(*ptep))) {
> +            void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
> +            if (!p)
> +                return -ENOMEM;
> +
> +            set_pte(ptep, pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL));
> +        }
> +#endif
> +        vmemmap_verify((pte_t *)pmdp, node, addr, next);
>      } while (addr = next, addr != end);
>
>      return 0;
>  }
> -#endif    /* !ARM64_SWAPPER_USES_SECTION_MAPS */
> +
>  void vmemmap_free(unsigned long start, unsigned long end,
>          struct vmem_altmap *altmap)
>  {
>
>
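
For reference, the allocation policy intended for vmemmap_alloc_block_buf()
in the draft earlier in this message (try altmap first, optionally fall back
to regular memory) can be sketched in user space as below. This is only an
illustration under stated assumptions, not kernel code: struct mock_altmap
and every mock_* function are hypothetical stand-ins, and malloc() plays the
role of the sparse_buffer_alloc()/vmemmap_alloc_block() path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmem_altmap: a fixed pool of "device" memory. */
struct mock_altmap {
	char *base;	/* start of the reserved pool */
	size_t size;	/* total bytes in the pool */
	size_t used;	/* bytes already handed out */
};

/* Mock of altmap_alloc_block_buf(): bump allocation, NULL once the pool is exhausted. */
static void *mock_altmap_alloc(size_t size, struct mock_altmap *altmap)
{
	void *p;

	if (size > altmap->size - altmap->used)
		return NULL;
	p = altmap->base + altmap->used;
	altmap->used += size;
	return p;
}

/* Returns true when p lies inside the altmap pool. */
static bool mock_from_altmap(const void *p, const struct mock_altmap *altmap)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)altmap->base;

	return addr >= lo && addr < lo + altmap->size;
}

/*
 * Sketch of the intended policy:
 *  - altmap given and allocation succeeds: return the altmap block
 *  - altmap given, allocation fails, no fallback: return NULL
 *  - altmap given, allocation fails, fallback: use regular memory
 *  - no altmap: use regular memory
 */
static void *mock_alloc_block_buf(size_t size, struct mock_altmap *altmap,
				  bool fallback)
{
	void *ptr;

	/* Asking for a fallback without an altmap makes no sense; just warn. */
	if (!altmap && fallback)
		fprintf(stderr, "warning: fallback requested without altmap\n");

	if (altmap) {
		ptr = mock_altmap_alloc(size, altmap);
		if (ptr)
			return ptr;	/* altmap satisfied the request */
		if (!fallback)
			return NULL;
		fprintf(stderr, "altmap allocation failed, falling back to system memory\n");
	}

	return malloc(size);	/* stand-in for the regular-memory path */
}
```

Note that the sketch returns as soon as the altmap allocation succeeds. The
untested draft would need the same early return, since as posted a
successful altmap pointer is overwritten by the sparse_buffer_alloc() call
that follows the if block.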