linux-mm.kvack.org archive mirror
* [PATCH V2 0/2] arm64: Enable vmemmap mapping from device memory
@ 2020-03-04 14:10 Anshuman Khandual
  2020-03-04 14:10 ` [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages() Anshuman Khandual
  2020-03-04 14:10 ` [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings Anshuman Khandual
  0 siblings, 2 replies; 7+ messages in thread
From: Anshuman Khandual @ 2020-03-04 14:10 UTC
  To: linux-mm
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Tony Luck, Fenghua Yu,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, David Hildenbrand, Mike Rapoport, Michal Hocko,
	Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Andrew Morton, Dan Williams, Pavel Tatashin,
	linux-arm-kernel, linux-ia64, linux-riscv, x86, linux-kernel

This series enables vmemmap backing memory allocation from device memory
ranges on arm64. But before that, it enables vmemmap_populate_basepages()
to accommodate struct vmem_altmap based requests.

This series applies on top of the latest (v14) arm64 memory hot remove
series (https://lkml.org/lkml/2020/3/3/1746) on Linux 5.6-rc4.

Changes in V2:

- Rebased on latest hot-remove series (v14) adding P4D page table support

Changes in V1: (https://lkml.org/lkml/2020/1/23/12)

- Added a WARN_ON() in unmap_hotplug_range() when altmap is
  provided without the page table backing memory being freed

Changes in RFC V2: (https://lkml.org/lkml/2019/10/21/11)

- Changed the commit message on 1/2 patch per Will
- Changed the commit message on 2/2 patch as well
- Rebased on arm64 memory hot remove series (v10)

RFC V1: (https://lkml.org/lkml/2019/6/28/32)

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (2):
  mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
  arm64/mm: Enable vmem_altmap support for vmemmap mappings

 arch/arm64/mm/mmu.c      | 71 +++++++++++++++++++++++++++++-----------
 arch/ia64/mm/discontig.c |  2 +-
 arch/riscv/mm/init.c     |  2 +-
 arch/x86/mm/init_64.c    |  6 ++--
 include/linux/mm.h       |  5 +--
 mm/sparse-vmemmap.c      | 16 ++++++---
 6 files changed, 70 insertions(+), 32 deletions(-)

-- 
2.20.1




* [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
  2020-03-04 14:10 [PATCH V2 0/2] arm64: Enable vmemmap mapping from device memory Anshuman Khandual
@ 2020-03-04 14:10 ` Anshuman Khandual
  2020-03-20 17:08   ` Robin Murphy
  2020-03-04 14:10 ` [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings Anshuman Khandual
  1 sibling, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2020-03-04 14:10 UTC
  To: linux-mm
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Tony Luck, Fenghua Yu,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, David Hildenbrand, Mike Rapoport, Michal Hocko,
	Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Andrew Morton, Dan Williams, Pavel Tatashin,
	linux-arm-kernel, linux-ia64, linux-riscv, x86, linux-kernel

vmemmap_populate_basepages() is used across platforms to allocate backing
memory for the vmemmap mapping. It serves either as the default choice or
as a fallback when an intended huge page allocation fails, and it creates
the entire vmemmap mapping with base pages (PAGE_SIZE).

On arm64 platforms, vmemmap_populate() simply calls
vmemmap_populate_basepages() rather than using the platform specific
implementation when ARM64_SWAPPER_USES_SECTION_MAPS is not enabled, as is
the case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.

At present vmemmap_populate_basepages() does not support allocating from
a driver defined struct vmem_altmap while creating the vmemmap mapping
for a device memory range. This prevents ARM64_16K_PAGES and
ARM64_64K_PAGES configs on arm64 from supporting device memory with a
vmem_altmap request.

This patch enables vmem_altmap support in vmemmap_populate_basepages(),
unlocking device memory allocation for the vmemmap mapping on arm64
platforms with 16K or 64K base page configs.

Each architecture should evaluate and decide whether to opt in to device
memory based base page allocation through vmemmap_populate_basepages().
Hence let's keep it disabled on all archs in order to preserve the
existing semantics. A subsequent patch enables it on arm64.
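
For illustration only, the behaviour described above can be modelled in
userspace. This is a minimal sketch, not kernel code and not part of this
patch: every model_* name is hypothetical, page tables are reduced to
counters, and system memory is assumed never to run out (the real
vmemmap_alloc_block_buf() can fail). It only shows how a non-NULL altmap
redirects each base page allocation in the populate loop, and how passing
NULL keeps the existing behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct vmem_altmap, reduced to two counters. */
struct model_altmap {
	unsigned long free;   /* pages the device range set aside for vmemmap */
	unsigned long alloc;  /* pages consumed so far */
};

/* Records where each backing page came from. */
struct model_counts {
	unsigned long from_altmap;
	unsigned long from_system;
};

/* Models the choice made per PTE: device pool if altmap is set, else system. */
static int model_alloc_page(struct model_altmap *altmap, struct model_counts *c)
{
	if (altmap) {
		if (altmap->alloc >= altmap->free)
			return -ENOMEM;       /* device pool exhausted */
		altmap->alloc++;
		c->from_altmap++;
	} else {
		c->from_system++;             /* assumed to always succeed here */
	}
	return 0;
}

/* Models the base page loop: one backing page per PAGE_SIZE step. */
static int model_populate_basepages(unsigned long start, unsigned long end,
				    struct model_altmap *altmap,
				    struct model_counts *c)
{
	unsigned long addr;

	assert(start <= end);
	for (addr = start; addr < end; addr += MODEL_PAGE_SIZE) {
		if (model_alloc_page(altmap, c))
			return -ENOMEM;
	}
	return 0;
}
```

Callers that pass NULL, as every architecture does after this patch, only
ever increment from_system, which is why the existing semantics are
preserved until an architecture opts in.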

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/mmu.c      |  2 +-
 arch/ia64/mm/discontig.c |  2 +-
 arch/riscv/mm/init.c     |  2 +-
 arch/x86/mm/init_64.c    |  6 +++---
 include/linux/mm.h       |  5 +++--
 mm/sparse-vmemmap.c      | 16 +++++++++++-----
 6 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9b08f7c7e6f0..27cb95c471eb 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1036,7 +1036,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
-	return vmemmap_populate_basepages(start, end, node);
+	return vmemmap_populate_basepages(start, end, node, NULL);
 }
 #else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 4f33f6e7e206..20409f3afea8 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -656,7 +656,7 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
-	return vmemmap_populate_basepages(start, end, node);
+	return vmemmap_populate_basepages(start, end, node, NULL);
 }
 
 void vmemmap_free(unsigned long start, unsigned long end,
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 965a8cf4829c..1d7451c91982 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -501,6 +501,6 @@ void __init paging_init(void)
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 			       struct vmem_altmap *altmap)
 {
-	return vmemmap_populate_basepages(start, end, node);
+	return vmemmap_populate_basepages(start, end, node, NULL);
 }
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index abbdecb75fad..3272fe0d844a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1471,7 +1471,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 			vmemmap_verify((pte_t *)pmd, node, addr, next);
 			continue;
 		}
-		if (vmemmap_populate_basepages(addr, next, node))
+		if (vmemmap_populate_basepages(addr, next, node, NULL))
 			return -ENOMEM;
 	}
 	return 0;
@@ -1483,7 +1483,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	int err;
 
 	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
-		err = vmemmap_populate_basepages(start, end, node);
+		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
 	else if (altmap) {
@@ -1491,7 +1491,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 				__func__);
 		err = -ENOMEM;
 	} else
-		err = vmemmap_populate_basepages(start, end, node);
+		err = vmemmap_populate_basepages(start, end, node, NULL);
 	if (!err)
 		sync_global_pgds(start, end - 1);
 	return err;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..42f99c8d63c0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2780,14 +2780,15 @@ pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
 pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
-pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
+pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
+			    struct vmem_altmap *altmap);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
 void *vmemmap_alloc_block_buf(unsigned long size, int node);
 void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
-			       int node);
+			       int node, struct vmem_altmap *altmap);
 int vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap);
 void vmemmap_populate_print_last(void);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 200aef686722..a407abc9b46c 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -140,12 +140,18 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 			start, end - 1);
 }
 
-pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node)
+pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
+				       struct vmem_altmap *altmap)
 {
 	pte_t *pte = pte_offset_kernel(pmd, addr);
 	if (pte_none(*pte)) {
 		pte_t entry;
-		void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
+		void *p;
+
+		if (altmap)
+			p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
+		else
+			p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
@@ -213,8 +219,8 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
 	return pgd;
 }
 
-int __meminit vmemmap_populate_basepages(unsigned long start,
-					 unsigned long end, int node)
+int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
+					 int node, struct vmem_altmap *altmap)
 {
 	unsigned long addr = start;
 	pgd_t *pgd;
@@ -236,7 +242,7 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 		pmd = vmemmap_pmd_populate(pud, addr, node);
 		if (!pmd)
 			return -ENOMEM;
-		pte = vmemmap_pte_populate(pmd, addr, node);
+		pte = vmemmap_pte_populate(pmd, addr, node, altmap);
 		if (!pte)
 			return -ENOMEM;
 		vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
-- 
2.20.1




* [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings
  2020-03-04 14:10 [PATCH V2 0/2] arm64: Enable vmemmap mapping from device memory Anshuman Khandual
  2020-03-04 14:10 ` [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages() Anshuman Khandual
@ 2020-03-04 14:10 ` Anshuman Khandual
  2020-03-20 19:35   ` Robin Murphy
  1 sibling, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2020-03-04 14:10 UTC
  To: linux-mm
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Mark Rutland,
	Steve Capper, David Hildenbrand, Yu Zhao, Hsin-Yi Wang,
	Thomas Gleixner, Andrew Morton, linux-arm-kernel, linux-kernel

Device memory ranges, when hot added into ZONE_DEVICE, might require the
backing memory for their vmemmap mapping to be allocated from their own
range instead of consuming system memory. This avoids large system memory
usage for potentially large device memory ranges. The device driver
communicates this request via the vmem_altmap structure. The architecture
needs to take this request into account while creating and tearing down
vmemmap mappings.

This enables vmem_altmap support in vmemmap_populate() and vmemmap_free(),
which includes vmemmap_populate_basepages() used for ARM64_16K_PAGES and
ARM64_64K_PAGES configs.
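
For illustration only, the teardown side can be modelled in userspace as
below. This is a minimal sketch under stated assumptions, not the kernel
implementation: all model_* names are hypothetical, a 4K page size with a
2M PMD block is assumed, and the altmap is reduced to a single "pages
allocated" counter. It shows the two points the patch relies on: a range
backed by the altmap is returned to the device pool as a page count
(size >> PAGE_SHIFT) rather than to the page allocator, and only PAGE_SIZE
or PMD_SIZE ranges are expected on that path.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PMD_SIZE   (1UL << 21)    /* assumed: 2M block with 4K pages */

/* Hypothetical stand-in for struct vmem_altmap. */
struct model_altmap {
	unsigned long alloc;   /* device pages currently backing the vmemmap */
};

/* Records what the free path did, in place of WARN_ON() and free_pages(). */
struct model_free_stats {
	unsigned long system_frees;
	unsigned long warnings;
};

/* Models free_hotplug_page_range() as changed by this patch. */
static void model_free_range(size_t size, struct model_altmap *altmap,
			     struct model_free_stats *stats)
{
	if (altmap) {
		unsigned long nr = size >> MODEL_PAGE_SHIFT;

		/* Only PTE or PMD sized mappings are expected here. */
		if (size != MODEL_PAGE_SIZE && size != MODEL_PMD_SIZE)
			stats->warnings++;

		/* Model invariant: never give back more than was taken. */
		assert(altmap->alloc >= nr);
		altmap->alloc -= nr;
	} else {
		stats->system_frees++;  /* would go back to the page allocator */
	}
}
```

Freeing one PMD sized block in this model returns 512 pages to the device
pool in a single step, while a NULL altmap takes the unchanged system
memory path.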

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/mmu.c | 71 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 27cb95c471eb..0e0a0ecc812e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -727,15 +727,30 @@ int kern_addr_valid(unsigned long addr)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static void free_hotplug_page_range(struct page *page, size_t size)
+static void free_hotplug_page_range(struct page *page, size_t size,
+				    struct vmem_altmap *altmap)
 {
-	WARN_ON(PageReserved(page));
-	free_pages((unsigned long)page_address(page), get_order(size));
+	if (altmap) {
+		/*
+		 * Though unmap_hotplug_range() will tear down altmap based
+		 * vmemmap mappings at all page table levels, these mappings
+		 * should only have been created either at PTE or PMD level
+		 * with vmemmap_populate_basepages() or vmemmap_populate()
+		 * respectively. Unmapping requests at any other level will
+		 * be problematic. Drop these warnings when vmemmap mapping
+		 * is supported at PUD (even perhaps P4D) level.
+		 */
+		WARN_ON((size != PAGE_SIZE) && (size != PMD_SIZE));
+		vmem_altmap_free(altmap, size >> PAGE_SHIFT);
+	} else {
+		WARN_ON(PageReserved(page));
+		free_pages((unsigned long)page_address(page), get_order(size));
+	}
 }
 
 static void free_hotplug_pgtable_page(struct page *page)
 {
-	free_hotplug_page_range(page, PAGE_SIZE);
+	free_hotplug_page_range(page, PAGE_SIZE, NULL);
 }
 
 static bool pgtable_range_aligned(unsigned long start, unsigned long end,
@@ -758,7 +773,8 @@ static bool pgtable_range_aligned(unsigned long start, unsigned long end,
 }
 
 static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
-				    unsigned long end, bool free_mapped)
+				    unsigned long end, bool free_mapped,
+				    struct vmem_altmap *altmap)
 {
 	pte_t *ptep, pte;
 
@@ -772,12 +788,14 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
 		pte_clear(&init_mm, addr, ptep);
 		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 		if (free_mapped)
-			free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
+			free_hotplug_page_range(pte_page(pte),
+						PAGE_SIZE, altmap);
 	} while (addr += PAGE_SIZE, addr < end);
 }
 
 static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
-				    unsigned long end, bool free_mapped)
+				    unsigned long end, bool free_mapped,
+				    struct vmem_altmap *altmap)
 {
 	unsigned long next;
 	pmd_t *pmdp, pmd;
@@ -800,16 +818,17 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
 			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 			if (free_mapped)
 				free_hotplug_page_range(pmd_page(pmd),
-							PMD_SIZE);
+							PMD_SIZE, altmap);
 			continue;
 		}
 		WARN_ON(!pmd_table(pmd));
-		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
+		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
 static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
-				    unsigned long end, bool free_mapped)
+				    unsigned long end, bool free_mapped,
+				    struct vmem_altmap *altmap)
 {
 	unsigned long next;
 	pud_t *pudp, pud;
@@ -832,16 +851,17 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
 			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 			if (free_mapped)
 				free_hotplug_page_range(pud_page(pud),
-							PUD_SIZE);
+							PUD_SIZE, altmap);
 			continue;
 		}
 		WARN_ON(!pud_table(pud));
-		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
+		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
 static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
-				    unsigned long end, bool free_mapped)
+				    unsigned long end, bool free_mapped,
+				    struct vmem_altmap *altmap)
 {
 	unsigned long next;
 	p4d_t *p4dp, p4d;
@@ -854,16 +874,24 @@ static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
 			continue;
 
 		WARN_ON(!p4d_present(p4d));
-		unmap_hotplug_pud_range(p4dp, addr, next, free_mapped);
+		unmap_hotplug_pud_range(p4dp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
 static void unmap_hotplug_range(unsigned long addr, unsigned long end,
-				bool free_mapped)
+				bool free_mapped, struct vmem_altmap *altmap)
 {
 	unsigned long next;
 	pgd_t *pgdp, pgd;
 
+	/*
+	 * vmem_altmap can only be used as backing memory in a given
+	 * page table mapping. In case backing memory itself is not
+	 * being freed, then altmap is irrelevant. Warn about this
+	 * inconsistency when encountered.
+	 */
+	WARN_ON(!free_mapped && altmap);
+
 	do {
 		next = pgd_addr_end(addr, end);
 		pgdp = pgd_offset_k(addr);
@@ -872,7 +900,7 @@ static void unmap_hotplug_range(unsigned long addr, unsigned long end,
 			continue;
 
 		WARN_ON(!pgd_present(pgd));
-		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped);
+		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
@@ -1036,7 +1064,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
-	return vmemmap_populate_basepages(start, end, node, NULL);
+	return vmemmap_populate_basepages(start, end, node, altmap);
 }
 #else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
@@ -1063,7 +1091,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		if (pmd_none(READ_ONCE(*pmdp))) {
 			void *p = NULL;
 
-			p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			if (altmap)
+				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
+			else
+				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
 			if (!p)
 				return -ENOMEM;
 
@@ -1081,7 +1112,7 @@ void vmemmap_free(unsigned long start, unsigned long end,
 #ifdef CONFIG_MEMORY_HOTPLUG
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
-	unmap_hotplug_range(start, end, true);
+	unmap_hotplug_range(start, end, true, altmap);
 	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
 #endif
 }
@@ -1369,7 +1400,7 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 	WARN_ON(pgdir != init_mm.pgd);
 	WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
 
-	unmap_hotplug_range(start, end, false);
+	unmap_hotplug_range(start, end, false, NULL);
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-- 
2.20.1




* Re: [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
  2020-03-04 14:10 ` [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages() Anshuman Khandual
@ 2020-03-20 17:08   ` Robin Murphy
  2020-03-24 12:01     ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2020-03-20 17:08 UTC
  To: Anshuman Khandual, linux-mm
  Cc: Mark Rutland, Michal Hocko, linux-ia64, David Hildenbrand,
	Peter Zijlstra, Catalin Marinas, Dave Hansen, linux-riscv,
	Will Deacon, Thomas Gleixner, x86, Matthew Wilcox (Oracle),
	Mike Rapoport, Ingo Molnar, Fenghua Yu, Pavel Tatashin,
	Andy Lutomirski, Paul Walmsley, Dan Williams, linux-arm-kernel,
	Tony Luck, linux-kernel, Palmer Dabbelt, Andrew Morton,
	Kirill A. Shutemov

On 2020-03-04 2:10 pm, Anshuman Khandual wrote:
> vmemmap_populate_basepages() is used across platforms to allocate backing
> memory for vmemmap mapping. This is used as a standard default choice or
> as a fallback when intended huge pages allocation fails. This just creates
> entire vmemmap mapping with base pages (PAGE_SIZE).
> 
> On arm64 platforms, vmemmap_populate_basepages() is called instead of the
> platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
> is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.
> 
> At present vmemmap_populate_basepages() does not support allocating from
> driver defined struct vmem_altmap while trying to create vmemmap mapping
> for a device memory range. It prevents ARM64_16K_PAGES and ARM64_64K_PAGES
> configs on arm64 from supporting device memory with vmemap_altmap request.
> 
> This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
> device memory allocation for vmemap mapping on arm64 platforms with 16K or
> 64K base page configs.
> 
> Each architecture should evaluate and decide on subscribing device memory
> based base page allocation through vmemmap_populate_basepages(). Hence lets
> keep it disabled on all archs in order to preserve the existing semantics.
> A subsequent patch enables it on arm64.

I guess buy-in for this change largely depends on whether any other 
architectures are likely to want to share it. The existing altmap users 
don't look like they would, so that's probably more a question for the 
likes of S390 and RISC-V.

Failing that, simply decoupling arm64 from vmemmap_populate_basepages() 
seems viable - I tried hacking up a quick proof-of-concept (attached at 
the end) and it doesn't come out looking *too* disgusting.

> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-riscv@lists.infradead.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Acked-by: Will Deacon <will@kernel.org>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>   arch/arm64/mm/mmu.c      |  2 +-
>   arch/ia64/mm/discontig.c |  2 +-
>   arch/riscv/mm/init.c     |  2 +-
>   arch/x86/mm/init_64.c    |  6 +++---
>   include/linux/mm.h       |  5 +++--
>   mm/sparse-vmemmap.c      | 16 +++++++++++-----
>   6 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 9b08f7c7e6f0..27cb95c471eb 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1036,7 +1036,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   		struct vmem_altmap *altmap)
>   {
> -	return vmemmap_populate_basepages(start, end, node);
> +	return vmemmap_populate_basepages(start, end, node, NULL);
>   }
>   #else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index 4f33f6e7e206..20409f3afea8 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -656,7 +656,7 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   		struct vmem_altmap *altmap)
>   {
> -	return vmemmap_populate_basepages(start, end, node);
> +	return vmemmap_populate_basepages(start, end, node, NULL);
>   }
>   
>   void vmemmap_free(unsigned long start, unsigned long end,
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 965a8cf4829c..1d7451c91982 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -501,6 +501,6 @@ void __init paging_init(void)
>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   			       struct vmem_altmap *altmap)
>   {
> -	return vmemmap_populate_basepages(start, end, node);
> +	return vmemmap_populate_basepages(start, end, node, NULL);
>   }
>   #endif
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index abbdecb75fad..3272fe0d844a 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1471,7 +1471,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
>   			vmemmap_verify((pte_t *)pmd, node, addr, next);
>   			continue;
>   		}
> -		if (vmemmap_populate_basepages(addr, next, node))
> +		if (vmemmap_populate_basepages(addr, next, node, NULL))
>   			return -ENOMEM;
>   	}
>   	return 0;
> @@ -1483,7 +1483,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   	int err;
>   
>   	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> -		err = vmemmap_populate_basepages(start, end, node);
> +		err = vmemmap_populate_basepages(start, end, node, NULL);
>   	else if (boot_cpu_has(X86_FEATURE_PSE))
>   		err = vmemmap_populate_hugepages(start, end, node, altmap);
>   	else if (altmap) {
> @@ -1491,7 +1491,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   				__func__);
>   		err = -ENOMEM;
>   	} else
> -		err = vmemmap_populate_basepages(start, end, node);
> +		err = vmemmap_populate_basepages(start, end, node, NULL);
>   	if (!err)
>   		sync_global_pgds(start, end - 1);
>   	return err;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..42f99c8d63c0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2780,14 +2780,15 @@ pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>   p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
>   pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
>   pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
> -pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
> +pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
> +			    struct vmem_altmap *altmap);
>   void *vmemmap_alloc_block(unsigned long size, int node);
>   struct vmem_altmap;
>   void *vmemmap_alloc_block_buf(unsigned long size, int node);
>   void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
>   void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
>   int vmemmap_populate_basepages(unsigned long start, unsigned long end,
> -			       int node);
> +			       int node, struct vmem_altmap *altmap);
>   int vmemmap_populate(unsigned long start, unsigned long end, int node,
>   		struct vmem_altmap *altmap);
>   void vmemmap_populate_print_last(void);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 200aef686722..a407abc9b46c 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -140,12 +140,18 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>   			start, end - 1);
>   }
>   
> -pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node)
> +pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
> +				       struct vmem_altmap *altmap)
>   {
>   	pte_t *pte = pte_offset_kernel(pmd, addr);
>   	if (pte_none(*pte)) {
>   		pte_t entry;
> -		void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
> +		void *p;
> +
> +		if (altmap)
> +			p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
> +		else
> +			p = vmemmap_alloc_block_buf(PAGE_SIZE, node);

This pattern ends up appearing a number of times by the end - if we do 
go down the generic code route, might it be worth pushing it down into 
vmemmap_alloc_block_buf() itself to make it automatic? (possibly even 
including the powerpc fallback behaviour too?)
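
As a rough userspace illustration of that idea (a sketch only, not a
proposed kernel interface): the model_* names below are hypothetical, the
device range is a plain byte pool, and the "fallback" is assumed to mean
retrying from system memory when the altmap pool cannot satisfy the
request. The point is simply that callers would pass altmap through and
never open-code the if/else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmem_altmap: a simple bump allocator. */
struct model_altmap {
	unsigned char *pool;   /* start of the device-backed range */
	size_t size;           /* bytes available in the range */
	size_t used;           /* bytes handed out so far */
};

/*
 * Single entry point: prefer the altmap when one is given, and optionally
 * fall back to system memory (modelled with malloc()) when it is exhausted.
 */
static void *model_alloc_block_buf(size_t size, struct model_altmap *altmap,
				   int allow_fallback)
{
	if (altmap) {
		assert(altmap->used <= altmap->size);
		if (altmap->size - altmap->used >= size) {
			void *p = altmap->pool + altmap->used;

			altmap->used += size;
			return p;
		}
		if (!allow_fallback)
			return NULL;   /* strict mode: fail like -ENOMEM */
	}
	return malloc(size);           /* system memory path */
}
```

Whether the fallback should be on by default is exactly the per-arch
policy question, which is why it is a parameter in this sketch.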

Robin.

>   		if (!p)
>   			return NULL;
>   		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
> @@ -213,8 +219,8 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
>   	return pgd;
>   }
>   
> -int __meminit vmemmap_populate_basepages(unsigned long start,
> -					 unsigned long end, int node)
> +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
> +					 int node, struct vmem_altmap *altmap)
>   {
>   	unsigned long addr = start;
>   	pgd_t *pgd;
> @@ -236,7 +242,7 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
>   		pmd = vmemmap_pmd_populate(pud, addr, node);
>   		if (!pmd)
>   			return -ENOMEM;
> -		pte = vmemmap_pte_populate(pmd, addr, node);
> +		pte = vmemmap_pte_populate(pmd, addr, node, altmap);
>   		if (!pte)
>   			return -ENOMEM;
>   		vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
> 

----->8-----
From: Robin Murphy <robin.murphy@arm.com>
Subject: [PATCH] arm64/mm: Consolidate vmemmap_populate()

Since we already have a custom vmemmap_populate() implementation, fold
the non-section-map case into that as well, so that we can easily add
altmap support for both cases without having to mess with core code.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  arch/arm64/mm/mmu.c | 34 +++++++++++++++++++++-------------
  1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 128f70852bf3..e250fd414b2b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -725,13 +725,6 @@ int kern_addr_valid(unsigned long addr)
  	return pfn_valid(pte_pfn(pte));
  }
  #ifdef CONFIG_SPARSEMEM_VMEMMAP
-#if !ARM64_SWAPPER_USES_SECTION_MAPS
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-		struct vmem_altmap *altmap)
-{
-	return vmemmap_populate_basepages(start, end, node);
-}
-#else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
  		struct vmem_altmap *altmap)
  {
@@ -740,6 +733,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
  	pgd_t *pgdp;
  	pud_t *pudp;
  	pmd_t *pmdp;
+	pte_t *ptep;

  	do {
  		next = pmd_addr_end(addr, end);
@@ -752,22 +746,36 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
  		if (!pudp)
  			return -ENOMEM;

+#if ARM64_SWAPPER_USES_SECTION_MAPS
  		pmdp = pmd_offset(pudp, addr);
  		if (pmd_none(READ_ONCE(*pmdp))) {
-			void *p = NULL;
-
-			p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			void *p = vmemmap_alloc_block_buf(PMD_SIZE, node);
  			if (!p)
  				return -ENOMEM;

  			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
-		} else
-			vmemmap_verify((pte_t *)pmdp, node, addr, next);
+			continue;
+		}
+#else
+		pmdp = vmemmap_pmd_populate(pudp, addr, node);
+		if (!pmdp)
+			return -ENOMEM;
+
+		ptep = pte_offset_kernel(pmdp, addr);
+		if (pte_none(READ_ONCE(*ptep))) {
+			void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
+			if (!p)
+				return -ENOMEM;
+
+			set_pte(ptep, pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL));
+		}
+#endif
+		vmemmap_verify((pte_t *)pmdp, node, addr, next);
  	} while (addr = next, addr != end);

  	return 0;
  }
-#endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
+
  void vmemmap_free(unsigned long start, unsigned long end,
  		struct vmem_altmap *altmap)
  {



* Re: [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings
  2020-03-04 14:10 ` [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings Anshuman Khandual
@ 2020-03-20 19:35   ` Robin Murphy
  2020-03-24 12:03     ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2020-03-20 19:35 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm
  Cc: Mark Rutland, Yu Zhao, David Hildenbrand, Catalin Marinas,
	Steve Capper, linux-kernel, Hsin-Yi Wang, Thomas Gleixner,
	Will Deacon, Andrew Morton, linux-arm-kernel

On 2020-03-04 2:10 pm, Anshuman Khandual wrote:
> Device memory ranges when getting hot added into ZONE_DEVICE, might require
> their vmemmap mapping's backing memory to be allocated from their own range
> instead of consuming system memory. This prevents large system memory usage
> for potentially large device memory ranges. Device driver communicates this
> request via vmem_altmap structure. Architecture needs to take this request
> into account while creating and tearing down vmemmap mappings.
> 
> This enables vmem_altmap support in vmemmap_populate() and vmemmap_free()
> which includes vmemmap_populate_basepages() used for ARM64_16K_PAGES and
> ARM64_64K_PAGES configs.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Hsin-Yi Wang <hsinyi@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>   arch/arm64/mm/mmu.c | 71 ++++++++++++++++++++++++++++++++-------------
>   1 file changed, 51 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 27cb95c471eb..0e0a0ecc812e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -727,15 +727,30 @@ int kern_addr_valid(unsigned long addr)
>   }
>   
>   #ifdef CONFIG_MEMORY_HOTPLUG
> -static void free_hotplug_page_range(struct page *page, size_t size)
> +static void free_hotplug_page_range(struct page *page, size_t size,
> +				    struct vmem_altmap *altmap)
>   {
> -	WARN_ON(PageReserved(page));
> -	free_pages((unsigned long)page_address(page), get_order(size));
> +	if (altmap) {
> +		/*
> +		 * Though unmap_hotplug_range() will tear down altmap based
> +		 * vmemmap mappings at all page table levels, these mappings
> +		 * should only have been created either at PTE or PMD level
> +		 * with vmemmap_populate_basepages() or vmemmap_populate()
> +		 * respectively. Unmapping requests at any other level will
> +		 * be problematic. Drop these warnings when vmemmap mapping
> +		 * is supported at PUD (even perhaps P4D) level.
> +		 */
> +		WARN_ON((size != PAGE_SIZE) && (size != PMD_SIZE));

Isn't that comment equally true of the regular case? AFAICS we don't 
call vmemmap_alloc_block_buf() with larger than PMD_SIZE either. If the 
warnings are useful, shouldn't they cover both cases equally? However, 
given that we never warned before, and the code here appears that it 
would work fine anyway, *are* they really useful?
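
For reference, the two branches account for the freed range in different
units: the altmap branch passes a page count (size >> PAGE_SHIFT) while
the regular branch passes an order (get_order(size)). A small userspace
sketch of that arithmetic follows, assuming 4K pages (page shift 12, PMD
shift 21). The MODEL_* constants and model_*() helpers are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12UL
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PMD_SHIFT  21UL
#define MODEL_PMD_SIZE   (1UL << MODEL_PMD_SHIFT)

/* Smallest order such that (page size << order) covers 'size' bytes. */
static unsigned int model_get_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Page count handed back to the altmap for a range of 'size' bytes. */
static unsigned long model_altmap_nr_pages(unsigned long size)
{
	return size >> MODEL_PAGE_SHIFT;
}
```

Under these assumptions a PMD_SIZE range is 512 pages on the altmap side
and order 9 on the regular side, i.e. the same amount of memory. The two
agree for any power-of-two multiple of the page size.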

> +		vmem_altmap_free(altmap, size >> PAGE_SHIFT);
> +	} else {
> +		WARN_ON(PageReserved(page));
> +		free_pages((unsigned long)page_address(page), get_order(size));
> +	}
>   }
>   
>   static void free_hotplug_pgtable_page(struct page *page)
>   {
> -	free_hotplug_page_range(page, PAGE_SIZE);
> +	free_hotplug_page_range(page, PAGE_SIZE, NULL);
>   }
>   
>   static bool pgtable_range_aligned(unsigned long start, unsigned long end,
> @@ -758,7 +773,8 @@ static bool pgtable_range_aligned(unsigned long start, unsigned long end,
>   }
>   
>   static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
> -				    unsigned long end, bool free_mapped)
> +				    unsigned long end, bool free_mapped,
> +				    struct vmem_altmap *altmap)
>   {
>   	pte_t *ptep, pte;
>   
> @@ -772,12 +788,14 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
>   		pte_clear(&init_mm, addr, ptep);
>   		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>   		if (free_mapped)
> -			free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
> +			free_hotplug_page_range(pte_page(pte),
> +						PAGE_SIZE, altmap);
>   	} while (addr += PAGE_SIZE, addr < end);
>   }
>   
>   static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
> -				    unsigned long end, bool free_mapped)
> +				    unsigned long end, bool free_mapped,
> +				    struct vmem_altmap *altmap)
>   {
>   	unsigned long next;
>   	pmd_t *pmdp, pmd;
> @@ -800,16 +818,17 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
>   			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>   			if (free_mapped)
>   				free_hotplug_page_range(pmd_page(pmd),
> -							PMD_SIZE);
> +							PMD_SIZE, altmap);
>   			continue;
>   		}
>   		WARN_ON(!pmd_table(pmd));
> -		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
> +		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped, altmap);
>   	} while (addr = next, addr < end);
>   }
>   
>   static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
> -				    unsigned long end, bool free_mapped)
> +				    unsigned long end, bool free_mapped,
> +				    struct vmem_altmap *altmap)
>   {
>   	unsigned long next;
>   	pud_t *pudp, pud;
> @@ -832,16 +851,17 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
>   			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>   			if (free_mapped)
>   				free_hotplug_page_range(pud_page(pud),
> -							PUD_SIZE);
> +							PUD_SIZE, altmap);
>   			continue;
>   		}
>   		WARN_ON(!pud_table(pud));
> -		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
> +		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped, altmap);
>   	} while (addr = next, addr < end);
>   }
>   
>   static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
> -				    unsigned long end, bool free_mapped)
> +				    unsigned long end, bool free_mapped,
> +				    struct vmem_altmap *altmap)
>   {
>   	unsigned long next;
>   	p4d_t *p4dp, p4d;
> @@ -854,16 +874,24 @@ static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
>   			continue;
>   
>   		WARN_ON(!p4d_present(p4d));
> -		unmap_hotplug_pud_range(p4dp, addr, next, free_mapped);
> +		unmap_hotplug_pud_range(p4dp, addr, next, free_mapped, altmap);
>   	} while (addr = next, addr < end);
>   }
>   
>   static void unmap_hotplug_range(unsigned long addr, unsigned long end,
> -				bool free_mapped)
> +				bool free_mapped, struct vmem_altmap *altmap)
>   {
>   	unsigned long next;
>   	pgd_t *pgdp, pgd;
>   
> +	/*
> +	 * vmem_altmap can only be used as backing memory in a given
> +	 * page table mapping. In case backing memory itself is not
> +	 * being freed, then altmap is irrelevant. Warn about this
> +	 * inconsistency when encountered.
> +	 */
> +	WARN_ON(!free_mapped && altmap);

Personally I find that comment a bit unclear (particularly the first 
sentence which just seems like a confusing tautology). Is the overall 
point that the altmap only matters when we're unmapping and freeing 
vmemmap pages (such that we free them to the right allocator)? At face 
value it doesn't seem to warrant a warning - it's not necessary to know 
which allocator owns pages that we aren't freeing, but it isn't harmful 
either.

That said, however, after puzzling through the code I get the distinct
feeling it would be more useful if all those "free_mapped" arguments
were actually named "is_vmemmap" instead. At that point, the conceptual
inconsistency would be a little more obvious (and arguably might not
even need commenting).
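
As a rough userspace model of that reading (purely illustrative: enum
model_owner, struct altmap_model and both helpers are hypothetical names,
and assert() only stands in for WARN_ON(), which logs rather than
aborts), the relationship between the flag and the altmap could be
written as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vmem_altmap. */
struct altmap_model {
	unsigned long freed_pages;
};

/* Who gets the backing pages back when a range is unmapped. */
enum model_owner { OWNER_NONE, OWNER_BUDDY, OWNER_ALTMAP };

/* The combination the patch warns about: an altmap without vmemmap. */
static bool model_args_inconsistent(bool is_vmemmap,
				    const struct altmap_model *altmap)
{
	return !is_vmemmap && altmap != NULL;
}

/*
 * Linear map ranges keep their backing memory, so no allocator is
 * involved. vmemmap ranges free their backing pages to the altmap
 * when one was used to populate them, else to the page allocator.
 */
static enum model_owner model_free_target(bool is_vmemmap,
					   const struct altmap_model *altmap)
{
	assert(!model_args_inconsistent(is_vmemmap, altmap));

	if (!is_vmemmap)
		return OWNER_NONE;
	return altmap ? OWNER_ALTMAP : OWNER_BUDDY;
}
```

Named this way, passing an altmap for a non-vmemmap range reads as an
obviously odd request rather than something needing a comment.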

All the altmap plumbing itself looks pretty mechanical and hard to 
disagree with :)

Robin.

> +
>   	do {
>   		next = pgd_addr_end(addr, end);
>   		pgdp = pgd_offset_k(addr);
> @@ -872,7 +900,7 @@ static void unmap_hotplug_range(unsigned long addr, unsigned long end,
>   			continue;
>   
>   		WARN_ON(!pgd_present(pgd));
> -		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped);
> +		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap);
>   	} while (addr = next, addr < end);
>   }
>   
> @@ -1036,7 +1064,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   		struct vmem_altmap *altmap)
>   {
> -	return vmemmap_populate_basepages(start, end, node, NULL);
> +	return vmemmap_populate_basepages(start, end, node, altmap);
>   }
>   #else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> @@ -1063,7 +1091,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   		if (pmd_none(READ_ONCE(*pmdp))) {
>   			void *p = NULL;
>   
> -			p = vmemmap_alloc_block_buf(PMD_SIZE, node);
> +			if (altmap)
> +				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
> +			else
> +				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
>   			if (!p)
>   				return -ENOMEM;
>   
> @@ -1081,7 +1112,7 @@ void vmemmap_free(unsigned long start, unsigned long end,
>   #ifdef CONFIG_MEMORY_HOTPLUG
>   	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>   
> -	unmap_hotplug_range(start, end, true);
> +	unmap_hotplug_range(start, end, true, altmap);
>   	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
>   #endif
>   }
> @@ -1369,7 +1400,7 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
>   	WARN_ON(pgdir != init_mm.pgd);
>   	WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
>   
> -	unmap_hotplug_range(start, end, false);
> +	unmap_hotplug_range(start, end, false, NULL);
>   	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
>   }
>   
> 



* Re: [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
  2020-03-20 17:08   ` Robin Murphy
@ 2020-03-24 12:01     ` Anshuman Khandual
  0 siblings, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2020-03-24 12:01 UTC (permalink / raw)
  To: Robin Murphy, linux-mm
  Cc: Mark Rutland, Michal Hocko, linux-ia64, David Hildenbrand,
	Peter Zijlstra, Catalin Marinas, Dave Hansen, linux-riscv,
	Will Deacon, Thomas Gleixner, x86, Matthew Wilcox (Oracle),
	Mike Rapoport, Ingo Molnar, Fenghua Yu, Pavel Tatashin,
	Andy Lutomirski, Paul Walmsley, Dan Williams, linux-arm-kernel,
	Tony Luck, linux-kernel, Palmer Dabbelt, Andrew Morton,
	Kirill A. Shutemov


On 03/20/2020 10:38 PM, Robin Murphy wrote:
> On 2020-03-04 2:10 pm, Anshuman Khandual wrote:
>> vmemmap_populate_basepages() is used across platforms to allocate backing
>> memory for vmemmap mapping. This is used as a standard default choice or
>> as a fallback when intended huge pages allocation fails. This just creates
>> entire vmemmap mapping with base pages (PAGE_SIZE).
>>
>> On arm64 platforms, vmemmap_populate_basepages() is called instead of the
>> platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
>> is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.
>>
>> At present vmemmap_populate_basepages() does not support allocating from
>> driver defined struct vmem_altmap while trying to create vmemmap mapping
>> for a device memory range. It prevents ARM64_16K_PAGES and ARM64_64K_PAGES
>> configs on arm64 from supporting device memory with vmem_altmap request.
>>
>> This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
>> device memory allocation for vmemmap mapping on arm64 platforms with 16K or
>> 64K base page configs.
>>
>> Each architecture should evaluate and decide on subscribing device memory
>> based base page allocation through vmemmap_populate_basepages(). Hence let's
>> keep it disabled on all archs in order to preserve the existing semantics.
>> A subsequent patch enables it on arm64.
> 
> I guess buy-in for this change largely depends on whether any other architectures are likely to want to share it. The existing altmap users don't look like they would, so that's probably more a question for the likes of S390 and RISC-V.

If vmemmap_populate_basepages() exists to be shared across platforms for
creating vmemmap mapping with base pages, then there does not seem to be
any good reason for it not to support altmap requests as well.

> 
> Failing that, simply decoupling arm64 from vmemmap_populate_basepages() seems viable - I tried hacking up a quick proof-of-concept (attached at the end) and it doesn't come out looking *too* disgusting.

Even though this option seemed viable to me at the beginning, there was
no particularly pressing reason for vmemmap_populate_basepages() to exist
as a generic function and yet not support altmap. If each architecture
just creates its own policy regarding which levels support altmap while
also using a generic function, then why even have a minimal shared
function like vmemmap_populate_basepages() in the first place?

> 
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Paul Walmsley <paul.walmsley@sifive.com>
>> Cc: Palmer Dabbelt <palmer@dabbelt.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: Fenghua Yu <fenghua.yu@intel.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-ia64@vger.kernel.org
>> Cc: linux-riscv@lists.infradead.org
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Acked-by: Will Deacon <will@kernel.org>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   arch/arm64/mm/mmu.c      |  2 +-
>>   arch/ia64/mm/discontig.c |  2 +-
>>   arch/riscv/mm/init.c     |  2 +-
>>   arch/x86/mm/init_64.c    |  6 +++---
>>   include/linux/mm.h       |  5 +++--
>>   mm/sparse-vmemmap.c      | 16 +++++++++++-----
>>   6 files changed, 20 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 9b08f7c7e6f0..27cb95c471eb 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1036,7 +1036,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
>>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>           struct vmem_altmap *altmap)
>>   {
>> -    return vmemmap_populate_basepages(start, end, node);
>> +    return vmemmap_populate_basepages(start, end, node, NULL);
>>   }
>>   #else    /* !ARM64_SWAPPER_USES_SECTION_MAPS */
>>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
>> index 4f33f6e7e206..20409f3afea8 100644
>> --- a/arch/ia64/mm/discontig.c
>> +++ b/arch/ia64/mm/discontig.c
>> @@ -656,7 +656,7 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
>>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>           struct vmem_altmap *altmap)
>>   {
>> -    return vmemmap_populate_basepages(start, end, node);
>> +    return vmemmap_populate_basepages(start, end, node, NULL);
>>   }
>>     void vmemmap_free(unsigned long start, unsigned long end,
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 965a8cf4829c..1d7451c91982 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -501,6 +501,6 @@ void __init paging_init(void)
>>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>                      struct vmem_altmap *altmap)
>>   {
>> -    return vmemmap_populate_basepages(start, end, node);
>> +    return vmemmap_populate_basepages(start, end, node, NULL);
>>   }
>>   #endif
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index abbdecb75fad..3272fe0d844a 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1471,7 +1471,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
>>               vmemmap_verify((pte_t *)pmd, node, addr, next);
>>               continue;
>>           }
>> -        if (vmemmap_populate_basepages(addr, next, node))
>> +        if (vmemmap_populate_basepages(addr, next, node, NULL))
>>               return -ENOMEM;
>>       }
>>       return 0;
>> @@ -1483,7 +1483,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>       int err;
>>         if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>> -        err = vmemmap_populate_basepages(start, end, node);
>> +        err = vmemmap_populate_basepages(start, end, node, NULL);
>>       else if (boot_cpu_has(X86_FEATURE_PSE))
>>           err = vmemmap_populate_hugepages(start, end, node, altmap);
>>       else if (altmap) {
>> @@ -1491,7 +1491,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>                   __func__);
>>           err = -ENOMEM;
>>       } else
>> -        err = vmemmap_populate_basepages(start, end, node);
>> +        err = vmemmap_populate_basepages(start, end, node, NULL);
>>       if (!err)
>>           sync_global_pgds(start, end - 1);
>>       return err;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 52269e56c514..42f99c8d63c0 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2780,14 +2780,15 @@ pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>>   p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
>>   pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
>>   pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
>> -pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
>> +pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
>> +                struct vmem_altmap *altmap);
>>   void *vmemmap_alloc_block(unsigned long size, int node);
>>   struct vmem_altmap;
>>   void *vmemmap_alloc_block_buf(unsigned long size, int node);
>>   void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
>>   void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
>>   int vmemmap_populate_basepages(unsigned long start, unsigned long end,
>> -                   int node);
>> +                   int node, struct vmem_altmap *altmap);
>>   int vmemmap_populate(unsigned long start, unsigned long end, int node,
>>           struct vmem_altmap *altmap);
>>   void vmemmap_populate_print_last(void);
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 200aef686722..a407abc9b46c 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -140,12 +140,18 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>>               start, end - 1);
>>   }
>>   -pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node)
>> +pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
>> +                       struct vmem_altmap *altmap)
>>   {
>>       pte_t *pte = pte_offset_kernel(pmd, addr);
>>       if (pte_none(*pte)) {
>>           pte_t entry;
>> -        void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
>> +        void *p;
>> +
>> +        if (altmap)
>> +            p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
>> +        else
>> +            p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
> 
> This pattern ends up appearing a number of times by the end - if we do go down the generic code route, might it be worth pushing it down into vmemmap_alloc_block_buf() itself to make it automatic? (possibly even including the powerpc fallback behaviour too?)

Yes, this pattern is now there in a couple more places. Sure, will change
vmemmap_alloc_block_buf() to handle altmap with a fallback request.

Something like this (not tested properly):

--------------------------------------------------- 
From: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Tue, 24 Mar 2020 07:35:47 +0000
Subject: [PATCH] mm/sparse: Enable vmemmap_alloc_block_buf() for altmap
 allocations

There are many instances where vmemmap allocation is switched between
device memory and regular memory based on whether an altmap is available
or not. vmemmap_alloc_block_buf() is used on various platforms to allocate
vmemmap. Hence enable it to handle altmap based device memory allocation as
well. While here, implement the regular memory allocation fallback mechanism
that is used on powerpc.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/mmu.c       |  6 ++----
 arch/powerpc/mm/init_64.c | 12 ++++++------
 arch/x86/mm/init_64.c     |  6 ++----
 include/linux/mm.h        |  3 ++-
 mm/sparse-vmemmap.c       | 27 +++++++++++++++++++++------
 5 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 88c5b357013b..45f09935c160 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1080,10 +1080,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		if (pmd_none(READ_ONCE(*pmdp))) {
 			void *p = NULL;
 
-			if (altmap)
-				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
-			else
-				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			p = vmemmap_alloc_block_buf(PMD_SIZE, node,
+						    altmap, false);
 			if (!p)
 				return -ENOMEM;
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 4002ced3596f..31995eb4b62a 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -150,7 +150,7 @@ static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node)
 
 	/* allocate a page when required and hand out chunks */
 	if (!num_left) {
-		next = vmemmap_alloc_block(PAGE_SIZE, node);
+		next = vmemmap_alloc_block(PAGE_SIZE, node, NULL, false);
 		if (unlikely(!next)) {
 			WARN_ON(1);
 			return NULL;
@@ -226,12 +226,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		 * fall back to system memory if the altmap allocation fail.
 		 */
 		if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
-			p = altmap_alloc_block_buf(page_size, altmap);
-			if (!p)
-				pr_debug("altmap block allocation failed, falling back to system memory");
+			p = vmemmap_alloc_block_buf(page_size, node,
+						    altmap, true);
+		} else {
+			p = vmemmap_alloc_block_buf(page_size, node,
+						    NULL, false);
 		}
-		if (!p)
-			p = vmemmap_alloc_block_buf(page_size, node);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index c22677571619..35cc0c9d9578 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1444,10 +1444,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 		if (pmd_none(*pmd)) {
 			void *p;
 
-			if (altmap)
-				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
-			else
-				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			p = vmemmap_alloc_block_buf(PMD_SIZE, node,
+						    altmap, false);
 			if (p) {
 				pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4a987d173488..a2cb9c669800 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2994,7 +2994,8 @@ pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
 			    struct vmem_altmap *altmap);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
-void *vmemmap_alloc_block_buf(unsigned long size, int node);
+void *vmemmap_alloc_block_buf(unsigned long size, int node,
+			      struct vmem_altmap *altmap, bool fallback);
 void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a407abc9b46c..f502fcdf539f 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -71,10 +71,28 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 }
 
 /* need to make sure size is all the same during early stage */
-void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node,
+					 struct vmem_altmap *altmap,
+					 bool fallback)
 {
-	void *ptr = sparse_buffer_alloc(size);
+	void *ptr;
 
+	/*
+	 * There is no point in asking for fallback without
+	 * an altmap request to begin with. Just warn here
+	 * to catch potential call sites violating this.
+	 */
+	WARN_ON(!altmap && fallback);
+
+	if (altmap) {
+		ptr = altmap_alloc_block_buf(size, altmap);
+		if (ptr || !fallback)
+			return ptr;
+		pr_debug("altmap block allocation failed, "
+			 "falling back to system memory");
+	}
+
+	ptr = sparse_buffer_alloc(size);
 	if (!ptr)
 		ptr = vmemmap_alloc_block(size, node);
 	return ptr;
@@ -148,10 +166,7 @@ pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
 		pte_t entry;
 		void *p;
 
-		if (altmap)
-			p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
-		else
-			p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
+		p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap, false);
 		if (!p)
 			return NULL;
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
-- 
2.20.1



> 
> Robin.
> 
>>           if (!p)
>>               return NULL;
>>           entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
>> @@ -213,8 +219,8 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
>>       return pgd;
>>   }
>>   -int __meminit vmemmap_populate_basepages(unsigned long start,
>> -                     unsigned long end, int node)
>> +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
>> +                     int node, struct vmem_altmap *altmap)
>>   {
>>       unsigned long addr = start;
>>       pgd_t *pgd;
>> @@ -236,7 +242,7 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
>>           pmd = vmemmap_pmd_populate(pud, addr, node);
>>           if (!pmd)
>>               return -ENOMEM;
>> -        pte = vmemmap_pte_populate(pmd, addr, node);
>> +        pte = vmemmap_pte_populate(pmd, addr, node, altmap);
>>           if (!pte)
>>               return -ENOMEM;
>>           vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
>>
> 
> ----->8-----
> From: Robin Murphy <robin.murphy@arm.com>
> Subject: [PATCH] arm64/mm: Consolidate vmemmap_populate()
> 
> Since we already have a custom vmemmap_populate() implementation, fold
> the non-section-map case into that as well, so that we can easily add
> altmap support for both cases without having to mess with core code.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 34 +++++++++++++++++++++-------------
>  1 file changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 128f70852bf3..e250fd414b2b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -725,13 +725,6 @@ int kern_addr_valid(unsigned long addr)
>      return pfn_valid(pte_pfn(pte));
>  }
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -#if !ARM64_SWAPPER_USES_SECTION_MAPS
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -        struct vmem_altmap *altmap)
> -{
> -    return vmemmap_populate_basepages(start, end, node);
> -}
> -#else    /* !ARM64_SWAPPER_USES_SECTION_MAPS */
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>          struct vmem_altmap *altmap)
>  {
> @@ -740,6 +733,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>      pgd_t *pgdp;
>      pud_t *pudp;
>      pmd_t *pmdp;
> +    pte_t *ptep;
> 
>      do {
>          next = pmd_addr_end(addr, end);
> @@ -752,22 +746,36 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>          if (!pudp)
>              return -ENOMEM;
> 
> +#if ARM64_SWAPPER_USES_SECTION_MAPS
>          pmdp = pmd_offset(pudp, addr);
>          if (pmd_none(READ_ONCE(*pmdp))) {
> -            void *p = NULL;
> -
> -            p = vmemmap_alloc_block_buf(PMD_SIZE, node);
> +            void *p = vmemmap_alloc_block_buf(PMD_SIZE, node);
>              if (!p)
>                  return -ENOMEM;
> 
>              pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
> -        } else
> -            vmemmap_verify((pte_t *)pmdp, node, addr, next);
> +            continue;
> +        }
> +#else
> +        pmdp = vmemmap_pmd_populate(pudp, addr, node);
> +        if (!pmdp)
> +            return -ENOMEM;
> +
> +        ptep = pte_offset_kernel(pmdp, addr);
> +        if (pte_none(READ_ONCE(*ptep))) {
> +            void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
> +            if (!p)
> +                return -ENOMEM;
> +
> +            set_pte(ptep, pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL));
> +        }
> +#endif
> +        vmemmap_verify((pte_t *)pmdp, node, addr, next);
>      } while (addr = next, addr != end);
> 
>      return 0;
>  }
> -#endif    /* !ARM64_SWAPPER_USES_SECTION_MAPS */
> +
>  void vmemmap_free(unsigned long start, unsigned long end,
>          struct vmem_altmap *altmap)
>  {
> 
> 



* Re: [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings
  2020-03-20 19:35   ` Robin Murphy
@ 2020-03-24 12:03     ` Anshuman Khandual
  0 siblings, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2020-03-24 12:03 UTC (permalink / raw)
  To: Robin Murphy, linux-mm
  Cc: Mark Rutland, Yu Zhao, David Hildenbrand, Catalin Marinas,
	Steve Capper, linux-kernel, Hsin-Yi Wang, Thomas Gleixner,
	Will Deacon, Andrew Morton, linux-arm-kernel


On 03/21/2020 01:05 AM, Robin Murphy wrote:
> On 2020-03-04 2:10 pm, Anshuman Khandual wrote:
>> Device memory ranges when getting hot added into ZONE_DEVICE, might require
>> their vmemmap mapping's backing memory to be allocated from their own range
>> instead of consuming system memory. This prevents large system memory usage
>> for potentially large device memory ranges. Device driver communicates this
>> request via vmem_altmap structure. Architecture needs to take this request
>> into account while creating and tearing down vmemmap mappings.
>>
>> This enables vmem_altmap support in vmemmap_populate() and vmemmap_free()
>> which includes vmemmap_populate_basepages() used for ARM64_16K_PAGES and
>> ARM64_64K_PAGES configs.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Steve Capper <steve.capper@arm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Hsin-Yi Wang <hsinyi@chromium.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   arch/arm64/mm/mmu.c | 71 ++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 51 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 27cb95c471eb..0e0a0ecc812e 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -727,15 +727,30 @@ int kern_addr_valid(unsigned long addr)
>>   }
>>
>>   #ifdef CONFIG_MEMORY_HOTPLUG
>> -static void free_hotplug_page_range(struct page *page, size_t size)
>> +static void free_hotplug_page_range(struct page *page, size_t size,
>> +                    struct vmem_altmap *altmap)
>>   {
>> -    WARN_ON(PageReserved(page));
>> -    free_pages((unsigned long)page_address(page), get_order(size));
>> +    if (altmap) {
>> +        /*
>> +         * Though unmap_hotplug_range() will tear down altmap based
>> +         * vmemmap mappings at all page table levels, these mappings
>> +         * should only have been created either at PTE or PMD level
>> +         * with vmemmap_populate_basepages() or vmemmap_populate()
>> +         * respectively. Unmapping requests at any other level will
>> +         * be problematic. Drop these warnings when vmemmap mapping
>> +         * is supported at PUD (even perhaps P4D) level.
>> +         */
>> +        WARN_ON((size != PAGE_SIZE) && (size != PMD_SIZE));
> 
> Isn't that comment equally true of the regular case? AFAICS we don't call vmemmap_alloc_block_buf() with larger than PMD_SIZE either. If the warnings are useful, shouldn't they cover both cases equally? However, given that we never warned before, and the code here appears that it would work fine anyway, *are* they really useful?

Sure, this is not exclusive to altmap based vmemmap mappings. Will
drop it from here.
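
For reference, a minimal userspace sketch of the freeing decision in
free_hotplug_page_range() once the warning is dropped. The names
altmap_model, buddy_pages_freed and free_range_model() are hypothetical
stand-ins, 4K pages are assumed, and the altmap branch only models the
page accounting that vmem_altmap_free() is understood to perform. It is
not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                       /* assumption: 4K base pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)
#define PMD_SIZE   ((size_t)1 << 21)        /* 2M section with 4K pages */

/* Hypothetical stand-in for struct vmem_altmap: tracks allocated pages only. */
struct altmap_model {
	unsigned long alloc;	/* pages currently handed out from the device range */
};

/* Hypothetical stand-in for the buddy allocator: counts pages returned to it. */
static unsigned long buddy_pages_freed;

/*
 * Same shape as free_hotplug_page_range() in the patch, without the
 * WARN_ON(): the altmap pointer alone selects the allocator that the
 * backing pages go back to.
 */
static void free_range_model(size_t size, struct altmap_model *altmap)
{
	unsigned long nr_pages = size >> PAGE_SHIFT;

	if (altmap)
		altmap->alloc -= nr_pages;	/* models vmem_altmap_free() */
	else
		buddy_pages_freed += nr_pages;	/* models free_pages() */
}
```

With an altmap holding 600 allocated pages, freeing one PMD_SIZE block
drops the count by 512 and leaves the buddy counter untouched. Passing
NULL instead routes the pages to the buddy counter.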

> 
>> +        vmem_altmap_free(altmap, size >> PAGE_SHIFT);
>> +    } else {
>> +        WARN_ON(PageReserved(page));
>> +        free_pages((unsigned long)page_address(page), get_order(size));
>> +    }
>>   }
>>
>>   static void free_hotplug_pgtable_page(struct page *page)
>>   {
>> -    free_hotplug_page_range(page, PAGE_SIZE);
>> +    free_hotplug_page_range(page, PAGE_SIZE, NULL);
>>   }
>>
>>   static bool pgtable_range_aligned(unsigned long start, unsigned long end,
>> @@ -758,7 +773,8 @@ static bool pgtable_range_aligned(unsigned long start, unsigned long end,
>>   }
>>
>>   static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
>> -                    unsigned long end, bool free_mapped)
>> +                    unsigned long end, bool free_mapped,
>> +                    struct vmem_altmap *altmap)
>>   {
>>       pte_t *ptep, pte;
>>
>> @@ -772,12 +788,14 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
>>           pte_clear(&init_mm, addr, ptep);
>>           flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>           if (free_mapped)
>> -            free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
>> +            free_hotplug_page_range(pte_page(pte),
>> +                        PAGE_SIZE, altmap);
>>       } while (addr += PAGE_SIZE, addr < end);
>>   }
>>
>>   static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
>> -                    unsigned long end, bool free_mapped)
>> +                    unsigned long end, bool free_mapped,
>> +                    struct vmem_altmap *altmap)
>>   {
>>       unsigned long next;
>>       pmd_t *pmdp, pmd;
>> @@ -800,16 +818,17 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
>>               flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>               if (free_mapped)
>>                   free_hotplug_page_range(pmd_page(pmd),
>> -                            PMD_SIZE);
>> +                            PMD_SIZE, altmap);
>>               continue;
>>           }
>>           WARN_ON(!pmd_table(pmd));
>> -        unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
>> +        unmap_hotplug_pte_range(pmdp, addr, next, free_mapped, altmap);
>>       } while (addr = next, addr < end);
>>   }
>>
>>   static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
>> -                    unsigned long end, bool free_mapped)
>> +                    unsigned long end, bool free_mapped,
>> +                    struct vmem_altmap *altmap)
>>   {
>>       unsigned long next;
>>       pud_t *pudp, pud;
>> @@ -832,16 +851,17 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
>>               flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>               if (free_mapped)
>>                   free_hotplug_page_range(pud_page(pud),
>> -                            PUD_SIZE);
>> +                            PUD_SIZE, altmap);
>>               continue;
>>           }
>>           WARN_ON(!pud_table(pud));
>> -        unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
>> +        unmap_hotplug_pmd_range(pudp, addr, next, free_mapped, altmap);
>>       } while (addr = next, addr < end);
>>   }
>>
>>   static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
>> -                    unsigned long end, bool free_mapped)
>> +                    unsigned long end, bool free_mapped,
>> +                    struct vmem_altmap *altmap)
>>   {
>>       unsigned long next;
>>       p4d_t *p4dp, p4d;
>> @@ -854,16 +874,24 @@ static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
>>               continue;
>>
>>           WARN_ON(!p4d_present(p4d));
>> -        unmap_hotplug_pud_range(p4dp, addr, next, free_mapped);
>> +        unmap_hotplug_pud_range(p4dp, addr, next, free_mapped, altmap);
>>       } while (addr = next, addr < end);
>>   }
>>
>>   static void unmap_hotplug_range(unsigned long addr, unsigned long end,
>> -                bool free_mapped)
>> +                bool free_mapped, struct vmem_altmap *altmap)
>>   {
>>       unsigned long next;
>>       pgd_t *pgdp, pgd;
>>
>> +    /*
>> +     * vmem_altmap can only be used as backing memory in a given
>> +     * page table mapping. In case backing memory itself is not
>> +     * being freed, then altmap is irrelevant. Warn about this
>> +     * inconsistency when encountered.
>> +     */
>> +    WARN_ON(!free_mapped && altmap);
> 
> Personally I find that comment a bit unclear (particularly the first sentence which just seems like a confusing tautology). Is the overall point that the altmap only matters when we're unmapping and freeing vmemmap pages (such that we free them to the right allocator)? At face value it doesn't seem to warrant a warning - it's not necessary to know which allocator owns pages that we aren't freeing, but it isn't harmful either.

I will probably change the comment to something like this instead.

        /*
         * altmap can only be used as vmemmap mapping backing memory.
         * In case the backing memory itself is not being freed, then
         * altmap is just irrelevant. Warn about this inconsistency
         * when encountered.
         */

altmap does decide which allocator the backing pages will get freed
into. The primary purpose here is just to warn about this invalid
combination, i.e. (!free_mapped && altmap), which the function should
never be called with.
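
Spelled out as a predicate, the check looks like the userspace sketch
below. altmap_stub and unmap_args_valid() are hypothetical names used
only for illustration. The kernel code keeps the inline WARN_ON() and
does not return a status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vmem_altmap; contents do not matter here. */
struct altmap_stub {
	unsigned long alloc;
};

/*
 * Returns true when the (free_mapped, altmap) pair is one that
 * unmap_hotplug_range() expects. The only rejected pair is an altmap
 * passed along with free_mapped == false, i.e. the condition under
 * which WARN_ON(!free_mapped && altmap) fires.
 */
static bool unmap_args_valid(bool free_mapped, const struct altmap_stub *altmap)
{
	return free_mapped || !altmap;
}
```

Of the four combinations, only (free_mapped == false, altmap != NULL)
is rejected. Linear map teardown passes (false, NULL) and vmemmap
teardown passes true with or without an altmap.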

> 
> That said, however, after puzzling through the code I get the distinct feeling it would be more useful if all those "free_mapped" arguments were actually named "is_vmemmap" instead. At that point, the conceptual inconsistency would be a little more obvious (and arguably might not even need commenting).

The name 'free_mapped' was a conscious decision [1] made during hot
remove series V9. It avoids a vmemmap specific name, as the unmapping
and freeing functions are very generic in nature.

[1] https://lkml.org/lkml/2019/10/8/310

> 
> All the altmap plumbing itself looks pretty mechanical and hard to disagree with :)

Okay.

> 
> Robin.



end of thread, other threads:[~2020-03-24 12:03 UTC | newest]

Thread overview: 7+ messages
2020-03-04 14:10 [PATCH V2 0/2] arm64: Enable vmemmap mapping from device memory Anshuman Khandual
2020-03-04 14:10 ` [PATCH V2 1/2] mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages() Anshuman Khandual
2020-03-20 17:08   ` Robin Murphy
2020-03-24 12:01     ` Anshuman Khandual
2020-03-04 14:10 ` [PATCH V2 2/2] arm64/mm: Enable vmem_altmap support for vmemmap mappings Anshuman Khandual
2020-03-20 19:35   ` Robin Murphy
2020-03-24 12:03     ` Anshuman Khandual
