linux-doc.vger.kernel.org archive mirror
* [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages
@ 2021-06-12  9:45 Muchun Song
  2021-06-12  9:45 ` [PATCH v2 1/3] mm: sparsemem: split the " Muchun Song
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Muchun Song @ 2021-06-12  9:45 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, linux-doc, linux-kernel, linux-mm, Muchun Song

In order to reduce the difficulty of code review, series [1] disables huge
PMD mapping of vmemmap pages when that feature is enabled. In this series,
we do not disable huge PMD mapping of vmemmap pages anymore; instead, we
split the huge PMD mapping when needed. When HugeTLB pages are freed from
the pool, we do not attempt to coalesce and move back to a PMD mapping
because it is much more complex.

[1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/
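
As a rough overview of the approach, here is a simplified sketch of
split_vmemmap_huge_pmd() from patch 1 below; the "_sketch" suffix and the
dropped walk argument are for illustration only, and the locking and the
PTE-level remapping are omitted:

static int split_vmemmap_huge_pmd_sketch(pmd_t *pmd, unsigned long start)
{
	struct page *page = pmd_page(*pmd);	/* pages backing the huge mapping */
	pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
	unsigned long addr = start;
	pmd_t __pmd;
	int i;

	if (!pgtable)
		return -ENOMEM;	/* caller skips optimizing this HugeTLB page */

	/* Fill a detached PTE table that maps exactly what the PMD mapped. */
	pmd_populate_kernel(&init_mm, &__pmd, pgtable);
	for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE)
		set_pte_at(&init_mm, addr, pte_offset_kernel(&__pmd, addr),
			   mk_pte(page + i, PAGE_KERNEL));

	/* Make the PTEs visible before installing the new PMD. */
	smp_wmb();
	pmd_populate_kernel(&init_mm, pmd, pgtable);
	flush_tlb_kernel_range(start, start + PMD_SIZE);

	return 0;
}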

Changelog in v2:
  1. Collect Reviewed-by from Mike.
  2. Remove helpers used to preallocate/free page tables for HugeTLB pages.

  Thanks to Mike's suggestions, which really eliminated a lot of code.

Muchun Song (3):
  mm: sparsemem: split the huge PMD mapping of vmemmap pages
  mm: sparsemem: use huge PMD mapping for vmemmap pages
  mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON

 Documentation/admin-guide/kernel-parameters.txt |  10 +-
 arch/x86/mm/init_64.c                           |   8 +-
 fs/Kconfig                                      |  10 ++
 include/linux/hugetlb.h                         |  25 +---
 include/linux/mm.h                              |   4 +-
 mm/hugetlb_vmemmap.c                            |  11 +-
 mm/memory_hotplug.c                             |   2 +-
 mm/sparse-vmemmap.c                             | 157 ++++++++++++++++++------
 8 files changed, 149 insertions(+), 78 deletions(-)
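
For reference, the saving quoted in kernel-parameters.txt ("6 * PAGE_SIZE
for each 2MB hugetlb page") works out as below. The standalone program is
only an illustration and assumes x86_64 with 4 KiB base pages, a 64-byte
struct page and RESERVE_VMEMMAP_NR == 2:

#include <stdio.h>

int main(void)
{
	const long page_size      = 4096;	/* base page size */
	const long hugepage_size  = 2L << 20;	/* 2 MiB HugeTLB page */
	const long struct_page_sz = 64;		/* sizeof(struct page) */
	const long reserved       = 2;		/* RESERVE_VMEMMAP_NR */

	/* 512 struct pages -> 32 KiB of vmemmap -> 8 base pages */
	long vmemmap_pages = hugepage_size / page_size * struct_page_sz
			     / page_size;

	printf("vmemmap pages per 2MB HugeTLB page: %ld, freeable: %ld\n",
	       vmemmap_pages, vmemmap_pages - reserved);	/* 8, 6 */
	return 0;
}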

-- 
2.11.0



* [PATCH v2 1/3] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-12  9:45 [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Muchun Song
@ 2021-06-12  9:45 ` Muchun Song
  2021-06-15 22:31   ` Mike Kravetz
  2021-06-12  9:45 ` [PATCH v2 2/3] mm: sparsemem: use huge PMD mapping for " Muchun Song
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Muchun Song @ 2021-06-12  9:45 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, linux-doc, linux-kernel, linux-mm, Muchun Song

Currently, we disable huge PMD mapping of vmemmap pages when the feature
"Free some vmemmap pages of HugeTLB pages" is enabled. With this patch, if
the vmemmap is huge PMD mapped when we walk the vmemmap page tables, we
split the huge PMD first and then move to PTE mappings. When HugeTLB pages
are freed from the pool, we do not attempt to coalesce and move back to a
PMD mapping because it is much more complex.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/mm.h   |   4 +-
 mm/hugetlb_vmemmap.c |   5 +-
 mm/sparse-vmemmap.c  | 157 ++++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 123 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cadc8cc2c715..8284e8ed30c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3055,8 +3055,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 }
 #endif
 
-void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse);
+int vmemmap_remap_free(unsigned long start, unsigned long end,
+		       unsigned long reuse);
 int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 			unsigned long reuse, gfp_t gfp_mask);
 
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index f9f9bb212319..06802056f296 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -258,9 +258,8 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
-
-	SetHPageVmemmapOptimized(head);
+	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+		SetHPageVmemmapOptimized(head);
 }
 
 void __init hugetlb_vmemmap_init(struct hstate *h)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 693de0aec7a8..7f73c37f742d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -38,6 +38,7 @@
  * vmemmap_remap_walk - walk vmemmap page table
  *
  * @remap_pte:		called for each lowest-level entry (PTE).
+ * @walked_pte:		the number of walked pte.
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
  * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
@@ -46,11 +47,44 @@
 struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
 			  struct vmemmap_remap_walk *walk);
+	unsigned long walked_pte;
 	struct page *reuse_page;
 	unsigned long reuse_addr;
 	struct list_head *vmemmap_pages;
 };
 
+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				  struct vmemmap_remap_walk *walk)
+{
+	pmd_t __pmd;
+	int i;
+	unsigned long addr = start;
+	struct page *page = pmd_page(*pmd);
+	pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
+
+	if (!pgtable)
+		return -ENOMEM;
+
+	pmd_populate_kernel(&init_mm, &__pmd, pgtable);
+
+	for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE) {
+		pte_t entry, *pte;
+		pgprot_t pgprot = PAGE_KERNEL;
+
+		entry = mk_pte(page + i, pgprot);
+		pte = pte_offset_kernel(&__pmd, addr);
+		set_pte_at(&init_mm, addr, pte, entry);
+	}
+
+	/* Make pte visible before pmd. See comment in __pte_alloc(). */
+	smp_wmb();
+	pmd_populate_kernel(&init_mm, pmd, pgtable);
+
+	flush_tlb_kernel_range(start, start + PMD_SIZE);
+
+	return 0;
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -68,59 +102,81 @@ static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 		 * walking, skip the reuse address range.
 		 */
 		addr += PAGE_SIZE;
+		walk->walked_pte++;
 		pte++;
 	}
 
-	for (; addr != end; addr += PAGE_SIZE, pte++)
+	for (; addr != end; addr += PAGE_SIZE, pte++) {
 		walk->remap_pte(pte, addr, walk);
+		walk->walked_pte++;
+	}
 }
 
-static void vmemmap_pmd_range(pud_t *pud, unsigned long addr,
-			      unsigned long end,
-			      struct vmemmap_remap_walk *walk)
+static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
+			     unsigned long end,
+			     struct vmemmap_remap_walk *walk)
 {
 	pmd_t *pmd;
 	unsigned long next;
 
 	pmd = pmd_offset(pud, addr);
 	do {
-		BUG_ON(pmd_leaf(*pmd));
+		if (pmd_leaf(*pmd)) {
+			int ret;
 
+			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
+			if (ret)
+				return ret;
+		}
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
+
+	return 0;
 }
 
-static void vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
-			      unsigned long end,
-			      struct vmemmap_remap_walk *walk)
+static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
+			     unsigned long end,
+			     struct vmemmap_remap_walk *walk)
 {
 	pud_t *pud;
 	unsigned long next;
 
 	pud = pud_offset(p4d, addr);
 	do {
+		int ret;
+
 		next = pud_addr_end(addr, end);
-		vmemmap_pmd_range(pud, addr, next, walk);
+		ret = vmemmap_pmd_range(pud, addr, next, walk);
+		if (ret)
+			return ret;
 	} while (pud++, addr = next, addr != end);
+
+	return 0;
 }
 
-static void vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
-			      unsigned long end,
-			      struct vmemmap_remap_walk *walk)
+static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
+			     unsigned long end,
+			     struct vmemmap_remap_walk *walk)
 {
 	p4d_t *p4d;
 	unsigned long next;
 
 	p4d = p4d_offset(pgd, addr);
 	do {
+		int ret;
+
 		next = p4d_addr_end(addr, end);
-		vmemmap_pud_range(p4d, addr, next, walk);
+		ret = vmemmap_pud_range(p4d, addr, next, walk);
+		if (ret)
+			return ret;
 	} while (p4d++, addr = next, addr != end);
+
+	return 0;
 }
 
-static void vmemmap_remap_range(unsigned long start, unsigned long end,
-				struct vmemmap_remap_walk *walk)
+static int vmemmap_remap_range(unsigned long start, unsigned long end,
+			       struct vmemmap_remap_walk *walk)
 {
 	unsigned long addr = start;
 	unsigned long next;
@@ -131,8 +187,12 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
 
 	pgd = pgd_offset_k(addr);
 	do {
+		int ret;
+
 		next = pgd_addr_end(addr, end);
-		vmemmap_p4d_range(pgd, addr, next, walk);
+		ret = vmemmap_p4d_range(pgd, addr, next, walk);
+		if (ret)
+			return ret;
 	} while (pgd++, addr = next, addr != end);
 
 	/*
@@ -141,6 +201,8 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
 	 * belongs to the range.
 	 */
 	flush_tlb_kernel_range(start + PAGE_SIZE, end);
+
+	return 0;
 }
 
 /*
@@ -179,10 +241,27 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 	pte_t entry = mk_pte(walk->reuse_page, pgprot);
 	struct page *page = pte_page(*pte);
 
-	list_add(&page->lru, walk->vmemmap_pages);
+	list_add_tail(&page->lru, walk->vmemmap_pages);
 	set_pte_at(&init_mm, addr, pte, entry);
 }
 
+static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
+				struct vmemmap_remap_walk *walk)
+{
+	pgprot_t pgprot = PAGE_KERNEL;
+	struct page *page;
+	void *to;
+
+	BUG_ON(pte_page(*pte) != walk->reuse_page);
+
+	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+	list_del(&page->lru);
+	to = page_to_virt(page);
+	copy_page(to, (void *)walk->reuse_addr);
+
+	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+}
+
 /**
  * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
  *			to the page which @reuse is mapped to, then free vmemmap
@@ -193,12 +272,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
  *		remap.
  * @reuse:	reuse address.
  *
- * Note: This function depends on vmemmap being base page mapped. Please make
- * sure that we disable PMD mapping of vmemmap pages when calling this function.
+ * Return: %0 on success, negative error code otherwise.
  */
-void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse)
+int vmemmap_remap_free(unsigned long start, unsigned long end,
+		       unsigned long reuse)
 {
+	int ret;
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_remap_pte,
@@ -221,25 +300,25 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);
 
-	vmemmap_remap_range(reuse, end, &walk);
-	free_vmemmap_page_list(&vmemmap_pages);
-}
+	mmap_write_lock(&init_mm);
+	ret = vmemmap_remap_range(reuse, end, &walk);
+	mmap_write_downgrade(&init_mm);
 
-static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
-				struct vmemmap_remap_walk *walk)
-{
-	pgprot_t pgprot = PAGE_KERNEL;
-	struct page *page;
-	void *to;
+	if (ret && walk.walked_pte) {
+		end = reuse + walk.walked_pte * PAGE_SIZE;
+		walk = (struct vmemmap_remap_walk) {
+			.remap_pte	= vmemmap_restore_pte,
+			.reuse_addr	= reuse,
+			.vmemmap_pages	= &vmemmap_pages,
+		};
 
-	BUG_ON(pte_page(*pte) != walk->reuse_page);
+		vmemmap_remap_range(reuse, end, &walk);
+	}
+	mmap_read_unlock(&init_mm);
 
-	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
-	list_del(&page->lru);
-	to = page_to_virt(page);
-	copy_page(to, (void *)walk->reuse_addr);
+	free_vmemmap_page_list(&vmemmap_pages);
 
-	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+	return ret;
 }
 
 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
@@ -273,6 +352,8 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
  *		remap.
  * @reuse:	reuse address.
  * @gpf_mask:	GFP flag for allocating vmemmap pages.
+ *
+ * Return: %0 on success, negative error code otherwise.
  */
 int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 			unsigned long reuse, gfp_t gfp_mask)
@@ -287,12 +368,12 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 	/* See the comment in the vmemmap_remap_free(). */
 	BUG_ON(start - reuse != PAGE_SIZE);
 
-	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
-
 	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
 		return -ENOMEM;
 
+	mmap_read_lock(&init_mm);
 	vmemmap_remap_range(reuse, end, &walk);
+	mmap_read_unlock(&init_mm);
 
 	return 0;
 }
-- 
2.11.0



* [PATCH v2 2/3] mm: sparsemem: use huge PMD mapping for vmemmap pages
  2021-06-12  9:45 [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Muchun Song
  2021-06-12  9:45 ` [PATCH v2 1/3] mm: sparsemem: split the " Muchun Song
@ 2021-06-12  9:45 ` Muchun Song
  2021-06-12  9:45 ` [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
  2021-06-15  1:12 ` [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Andrew Morton
  3 siblings, 0 replies; 11+ messages in thread
From: Muchun Song @ 2021-06-12  9:45 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, linux-doc, linux-kernel, linux-mm, Muchun Song

The preparation for splitting huge PMD mappings of vmemmap pages is
complete, so switch the vmemmap mapping back from PTE to PMD.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 -------
 arch/x86/mm/init_64.c                           |  8 ++------
 include/linux/hugetlb.h                         | 25 ++++++-------------------
 mm/memory_hotplug.c                             |  2 +-
 4 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index db1ef6739613..a01aadafee38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1599,13 +1599,6 @@
 			enabled.
 			Allows heavy hugetlb users to free up some more
 			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
-			This feauture is not free though. Large page
-			tables are not used to back vmemmap pages which
-			can lead to a performance degradation for some
-			workloads. Also there will be memory allocation
-			required when hugetlb pages are freed from the
-			pool which can lead to corner cases under heavy
-			memory pressure.
 			Format: { on | off (default) }
 
 			on:  enable the feature
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9d9d18d0c2a1..65ea58527176 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -34,7 +34,6 @@
 #include <linux/gfp.h>
 #include <linux/kcore.h>
 #include <linux/bootmem_info.h>
-#include <linux/hugetlb.h>
 
 #include <asm/processor.h>
 #include <asm/bios_ebda.h>
@@ -1610,8 +1609,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
 	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
 
-	if ((is_hugetlb_free_vmemmap_enabled()  && !altmap) ||
-	    end - start < PAGES_PER_SECTION * sizeof(struct page))
+	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1639,8 +1637,6 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 	pmd_t *pmd;
 	unsigned int nr_pmd_pages;
 	struct page *page;
-	bool base_mapping = !boot_cpu_has(X86_FEATURE_PSE) ||
-			    is_hugetlb_free_vmemmap_enabled();
 
 	for (; addr < end; addr = next) {
 		pte_t *pte = NULL;
@@ -1666,7 +1662,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		}
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
 
-		if (base_mapping) {
+		if (!boot_cpu_has(X86_FEATURE_PSE)) {
 			next = (addr + PAGE_SIZE) & PAGE_MASK;
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 03ca83db0a3e..d43565dd5fb9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -904,20 +904,6 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return hugetlb_free_vmemmap_enabled;
-}
-#else
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
-#endif
-
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
@@ -1077,13 +1063,14 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+#else
+#define hugetlb_free_vmemmap_enabled	false
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d96a3c7551c8..9d8a551c08d5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1056,7 +1056,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !is_hugetlb_free_vmemmap_enabled() &&
+	       !hugetlb_free_vmemmap_enabled &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-- 
2.11.0



* [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
  2021-06-12  9:45 [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Muchun Song
  2021-06-12  9:45 ` [PATCH v2 1/3] mm: sparsemem: split the " Muchun Song
  2021-06-12  9:45 ` [PATCH v2 2/3] mm: sparsemem: use huge PMD mapping for " Muchun Song
@ 2021-06-12  9:45 ` Muchun Song
  2021-06-15 23:00   ` Joao Martins
  2021-06-15  1:12 ` [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Andrew Morton
  3 siblings, 1 reply; 11+ messages in thread
From: Muchun Song @ 2021-06-12  9:45 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, linux-doc, linux-kernel, linux-mm, Muchun Song

When HUGETLB_PAGE_FREE_VMEMMAP is enabled, freeing the unused vmemmap
pages associated with each HugeTLB page is off by default. Now that the
vmemmap is PMD mapped, there is no side effect when this feature is
enabled with no HugeTLB pages in the system. Some users may want to enable
this feature at compile time instead of via the boot command line, so add
a config option to make it on by default for those who do not want to
enable it via the command line.
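
To make the resulting behaviour concrete, the toy program below models the
precedence between the Kconfig default and the command-line parameter. It
is only an illustration: "default_on" stands in for
IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON), and parse_param()
mirrors early_hugetlb_free_vmemmap_param() in the diff below.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static const bool default_on = true;	/* the new Kconfig option */
static bool hugetlb_free_vmemmap_enabled;

static int parse_param(const char *buf)
{
	if (!buf)
		return -1;
	if (!strcmp(buf, "on"))
		hugetlb_free_vmemmap_enabled = true;
	else if (!strcmp(buf, "off"))
		hugetlb_free_vmemmap_enabled = false;
	else
		return -1;	/* -EINVAL in the kernel */
	return 0;
}

int main(void)
{
	hugetlb_free_vmemmap_enabled = default_on;	/* compile-time default */
	parse_param("off");	/* hugetlb_free_vmemmap=off on the command line */
	printf("enabled: %d\n", hugetlb_free_vmemmap_enabled);	/* prints 0 */
	return 0;
}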

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 fs/Kconfig                                      | 10 ++++++++++
 mm/hugetlb_vmemmap.c                            |  6 ++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a01aadafee38..8eee439d943c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1604,6 +1604,9 @@
 			on:  enable the feature
 			off: disable the feature
 
+			Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
+			the default is on.
+
 			This is not compatible with memory_hotplug.memmap_on_memory.
 			If both parameters are enabled, hugetlb_free_vmemmap takes
 			precedence over memory_hotplug.memmap_on_memory.
diff --git a/fs/Kconfig b/fs/Kconfig
index f40b5b98f7ba..e78bc5daf7b0 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -245,6 +245,16 @@ config HUGETLB_PAGE_FREE_VMEMMAP
 	depends on X86_64
 	depends on SPARSEMEM_VMEMMAP
 
+config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
+	bool "Default freeing vmemmap pages of HugeTLB to on"
+	default n
+	depends on HUGETLB_PAGE_FREE_VMEMMAP
+	help
+	  When using HUGETLB_PAGE_FREE_VMEMMAP, freeing the unused vmemmap
+	  pages associated with each HugeTLB page is off by default. Say Y
+	  here to enable freeing vmemmap pages of HugeTLB by default. It can
+	  then be disabled on the command line via hugetlb_free_vmemmap=off.
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS
 
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 06802056f296..c540c21e26f5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -182,7 +182,7 @@
 #define RESERVE_VMEMMAP_NR		2U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
-bool hugetlb_free_vmemmap_enabled;
+bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
 
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -197,7 +197,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
 
 	if (!strcmp(buf, "on"))
 		hugetlb_free_vmemmap_enabled = true;
-	else if (strcmp(buf, "off"))
+	else if (!strcmp(buf, "off"))
+		hugetlb_free_vmemmap_enabled = false;
+	else
 		return -EINVAL;
 
 	return 0;
-- 
2.11.0



* Re: [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages
  2021-06-12  9:45 [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Muchun Song
                   ` (2 preceding siblings ...)
  2021-06-12  9:45 ` [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
@ 2021-06-15  1:12 ` Andrew Morton
  2021-06-15  3:52   ` Mike Kravetz
  3 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2021-06-15  1:12 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, osalvador, mhocko, song.bao.hua, david, chenhuang5,
	bodeddub, corbet, duanxiongchun, fam.zheng, linux-doc,
	linux-kernel, linux-mm

On Sat, 12 Jun 2021 17:45:52 +0800 Muchun Song <songmuchun@bytedance.com> wrote:

> In order to reduce the difficulty of code review in series[1]. We disable
> huge PMD mapping of vmemmap pages when that feature is enabled. In this
> series, we do not disable huge PMD mapping of vmemmap pages anymore. We
> will split huge PMD mapping when needed. When HugeTLB pages are freed from
> the pool we do not attempt coalasce and move back to a PMD mapping because
> it is much more complex.
> 
> [1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/

[1] had a nice [0/n] description but the v2 series lost that.  I could
copy/paste the v1 changelogging but I am unsure that it has been
maintained appropriately for the v2 series.

I think I'll pass on this v2 pending additional review input.  Please reinstate
the [0/n] overview if/when resending?



* Re: [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages
  2021-06-15  1:12 ` [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages Andrew Morton
@ 2021-06-15  3:52   ` Mike Kravetz
  2021-06-15  5:37     ` [External] " Muchun Song
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2021-06-15  3:52 UTC (permalink / raw)
  To: Andrew Morton, Muchun Song
  Cc: osalvador, mhocko, song.bao.hua, david, chenhuang5, bodeddub,
	corbet, duanxiongchun, fam.zheng, linux-doc, linux-kernel,
	linux-mm

On 6/14/21 6:12 PM, Andrew Morton wrote:
> On Sat, 12 Jun 2021 17:45:52 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> 
>> In order to reduce the difficulty of code review in series[1]. We disable
>> huge PMD mapping of vmemmap pages when that feature is enabled. In this
>> series, we do not disable huge PMD mapping of vmemmap pages anymore. We
>> will split huge PMD mapping when needed. When HugeTLB pages are freed from
>> the pool we do not attempt coalasce and move back to a PMD mapping because
>> it is much more complex.
>>
>> [1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/
> 
> [1] had a nice [0/n] description but the v2 series lost that.  I could
> copy/paste the v1 changelogging but I am unsure that it has been
> maintained appropriately for the v2 series.
> 
> I think I'll pass on this v2 pending additional review input.  Please reinstate
> the [0/n] overview if/when resending?

There may be some confusion.

This series is a follow-on optimization for the functionality provided by
[1].  Early in the development of [1], it was decided to drop some code
for ease of review.  Specifically, splitting vmemmap PMD mappings to PTE
mappings as required when hugetlb pages were allocated.  The
'simplification' in [1] is that if the feature is enabled then vmemmap
will only be mapped with PTEs.

This series provides the ability to split PMD mappings 'on demand' as
hugetlb pages are allocated.  As mentioned, it really is a follow-on
optimization to the functionality provided in [1].  As such, I am not sure
that repeating the [0/n] description from [1] is necessary here.

In any case, this should be clearly stated in the [0/n] description of
this series.

BTW- I did get through the series today, and did not discover any
issues.  However, I want to sleep on it before signing off.
-- 
Mike Kravetz


* Re: [External] Re: [PATCH v2 0/3] Split huge PMD mapping of vmemmap pages
  2021-06-15  3:52   ` Mike Kravetz
@ 2021-06-15  5:37     ` Muchun Song
  0 siblings, 0 replies; 11+ messages in thread
From: Muchun Song @ 2021-06-15  5:37 UTC (permalink / raw)
  To: Mike Kravetz, Andrew Morton
  Cc: Oscar Salvador, Michal Hocko, Song Bao Hua (Barry Song),
	David Hildenbrand, Chen Huang, Bodeddula, Balasubramaniam,
	Jonathan Corbet, Xiongchun duan, fam.zheng, linux-doc, LKML,
	Linux Memory Management List

On Tue, Jun 15, 2021 at 11:52 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 6/14/21 6:12 PM, Andrew Morton wrote:
> > On Sat, 12 Jun 2021 17:45:52 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> >
> >> In order to reduce the difficulty of code review in series[1]. We disable
> >> huge PMD mapping of vmemmap pages when that feature is enabled. In this
> >> series, we do not disable huge PMD mapping of vmemmap pages anymore. We
> >> will split huge PMD mapping when needed. When HugeTLB pages are freed from
> >> the pool we do not attempt coalasce and move back to a PMD mapping because
> >> it is much more complex.
> >>
> >> [1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/
> >
> > [1] had a nice [0/n] description but the v2 series lost that.  I could
> > copy/paste the v1 changelogging but I am unsure that it has been
> > maintained appropriately for the v2 series.
> >
> > I think I'll pass on this v2 pending additional review input.  Please reinstate
> > the [0/n] overview if/when resending?
>
> There may be some confusion.
>
> This series is a follow on optimization for the functionality provided by
> [1].  Early in the development of [1], it was decided to drop some code
> for ease of review.  Specifically, splitting vmemmap PMD mappings to PTE
> mappings as required when hugetlb pages were allocated.  The
> 'simplification' in [1] is that if the feature is enabled then vmemmap
> will only be mapped with PTEs.
>
> This series provides the ability to split PMD mappings 'on demand' as
> hugetlb pages are allocated.  As mentioned, it really is a follow on and
> optimization to functionality provided in [1].  As such, I am not sure
> that repeating the [0/n] description from 1 is necessary here.
>
> In any case, this should be clearly stated in the [0/n] description of
> this series.

Thanks for clarifying this for me. I totally agree with you.

>
> BTW- I did get through the series today, and did not discover any
> issues.  However, I want to sleep on it before signing off.
> --
> Mike Kravetz


* Re: [PATCH v2 1/3] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-12  9:45 ` [PATCH v2 1/3] mm: sparsemem: split the " Muchun Song
@ 2021-06-15 22:31   ` Mike Kravetz
  2021-06-16  3:23     ` [External] " Muchun Song
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2021-06-15 22:31 UTC (permalink / raw)
  To: Muchun Song, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, linux-doc, linux-kernel, linux-mm

On 6/12/21 2:45 AM, Muchun Song wrote:
> Currently, we disable huge PMD mapping of vmemmap pages when that feature
> of "Free some vmemmap pages of HugeTLB pages" is enabled. If the vmemmap
> is huge PMD mapped when we walk the vmemmap page tables, we split the
> huge PMD firstly and then we move to PTE mappings. When HugeTLB pages are
> freed from the pool we do not attempt coalasce and move back to a PMD
> mapping because it is much more complex.

Possible rewording of commit message:

In [1], PMD mappings of vmemmap pages were disabled if the feature
hugetlb_free_vmemmap was enabled.  This was done to simplify the initial
implementation of vmemmap freeing for hugetlb pages.  Now, remove this
simplification by allowing PMD mapping and switching to PTE mappings as
needed for allocated hugetlb pages.

When a hugetlb page is allocated, the vmemmap page tables are walked to
free vmemmap pages.  During this walk, split huge PMD mappings to PTE
mappings as required.  In the unlikely case PTE pages cannot be allocated,
return an error (ENOMEM) and do not optimize the vmemmap of the hugetlb page.

When HugeTLB pages are freed from the pool, we do not attempt to coalesce
and move back to a PMD mapping because it is much more complex.

[1] https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com

> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/mm.h   |   4 +-
>  mm/hugetlb_vmemmap.c |   5 +-
>  mm/sparse-vmemmap.c  | 157 ++++++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 123 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cadc8cc2c715..8284e8ed30c9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3055,8 +3055,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
>  }
>  #endif
>  
> -void vmemmap_remap_free(unsigned long start, unsigned long end,
> -			unsigned long reuse);
> +int vmemmap_remap_free(unsigned long start, unsigned long end,
> +		       unsigned long reuse);
>  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
>  			unsigned long reuse, gfp_t gfp_mask);
>  
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index f9f9bb212319..06802056f296 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -258,9 +258,8 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head)
>  	 * to the page which @vmemmap_reuse is mapped to, then free the pages
>  	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
>  	 */
> -	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
> -
> -	SetHPageVmemmapOptimized(head);
> +	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
> +		SetHPageVmemmapOptimized(head);
>  }
>  
>  void __init hugetlb_vmemmap_init(struct hstate *h)
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 693de0aec7a8..7f73c37f742d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -38,6 +38,7 @@
>   * vmemmap_remap_walk - walk vmemmap page table
>   *
>   * @remap_pte:		called for each lowest-level entry (PTE).
> + * @walked_pte:		the number of walked pte.

Suggest name change to 'nr_walked_pte' or just 'nr_walked'?  walked_pte
could be confused with a pointer to pte.

>   * @reuse_page:		the page which is reused for the tail vmemmap pages.
>   * @reuse_addr:		the virtual address of the @reuse_page page.
>   * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
> @@ -46,11 +47,44 @@
>  struct vmemmap_remap_walk {
>  	void (*remap_pte)(pte_t *pte, unsigned long addr,
>  			  struct vmemmap_remap_walk *walk);
> +	unsigned long walked_pte;
>  	struct page *reuse_page;
>  	unsigned long reuse_addr;
>  	struct list_head *vmemmap_pages;
>  };
>  
> +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> +				  struct vmemmap_remap_walk *walk)
> +{
> +	pmd_t __pmd;
> +	int i;
> +	unsigned long addr = start;
> +	struct page *page = pmd_page(*pmd);
> +	pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
> +
> +	if (!pgtable)
> +		return -ENOMEM;
> +
> +	pmd_populate_kernel(&init_mm, &__pmd, pgtable);
> +
> +	for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE) {
> +		pte_t entry, *pte;
> +		pgprot_t pgprot = PAGE_KERNEL;
> +
> +		entry = mk_pte(page + i, pgprot);
> +		pte = pte_offset_kernel(&__pmd, addr);
> +		set_pte_at(&init_mm, addr, pte, entry);
> +	}
> +
> +	/* Make pte visible before pmd. See comment in __pte_alloc(). */
> +	smp_wmb();
> +	pmd_populate_kernel(&init_mm, pmd, pgtable);
> +
> +	flush_tlb_kernel_range(start, start + PMD_SIZE);
> +
> +	return 0;
> +}
> +
>  static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>  			      unsigned long end,
>  			      struct vmemmap_remap_walk *walk)
> @@ -68,59 +102,81 @@ static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>  		 * walking, skip the reuse address range.
>  		 */
>  		addr += PAGE_SIZE;
> +		walk->walked_pte++;
>  		pte++;
>  	}
>  
> -	for (; addr != end; addr += PAGE_SIZE, pte++)
> +	for (; addr != end; addr += PAGE_SIZE, pte++) {
>  		walk->remap_pte(pte, addr, walk);
> +		walk->walked_pte++;
> +	}
>  }
>  
> -static void vmemmap_pmd_range(pud_t *pud, unsigned long addr,
> -			      unsigned long end,
> -			      struct vmemmap_remap_walk *walk)
> +static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
> +			     unsigned long end,
> +			     struct vmemmap_remap_walk *walk)
>  {
>  	pmd_t *pmd;
>  	unsigned long next;
>  
>  	pmd = pmd_offset(pud, addr);
>  	do {
> -		BUG_ON(pmd_leaf(*pmd));
> +		if (pmd_leaf(*pmd)) {
> +			int ret;
>  
> +			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
> +			if (ret)
> +				return ret;
> +		}
>  		next = pmd_addr_end(addr, end);
>  		vmemmap_pte_range(pmd, addr, next, walk);
>  	} while (pmd++, addr = next, addr != end);
> +
> +	return 0;
>  }
>  
> -static void vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
> -			      unsigned long end,
> -			      struct vmemmap_remap_walk *walk)
> +static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
> +			     unsigned long end,
> +			     struct vmemmap_remap_walk *walk)
>  {
>  	pud_t *pud;
>  	unsigned long next;
>  
>  	pud = pud_offset(p4d, addr);
>  	do {
> +		int ret;
> +
>  		next = pud_addr_end(addr, end);
> -		vmemmap_pmd_range(pud, addr, next, walk);
> +		ret = vmemmap_pmd_range(pud, addr, next, walk);
> +		if (ret)
> +			return ret;
>  	} while (pud++, addr = next, addr != end);
> +
> +	return 0;
>  }
>  
> -static void vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
> -			      unsigned long end,
> -			      struct vmemmap_remap_walk *walk)
> +static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
> +			     unsigned long end,
> +			     struct vmemmap_remap_walk *walk)
>  {
>  	p4d_t *p4d;
>  	unsigned long next;
>  
>  	p4d = p4d_offset(pgd, addr);
>  	do {
> +		int ret;
> +
>  		next = p4d_addr_end(addr, end);
> -		vmemmap_pud_range(p4d, addr, next, walk);
> +		ret = vmemmap_pud_range(p4d, addr, next, walk);
> +		if (ret)
> +			return ret;
>  	} while (p4d++, addr = next, addr != end);
> +
> +	return 0;
>  }
>  
> -static void vmemmap_remap_range(unsigned long start, unsigned long end,
> -				struct vmemmap_remap_walk *walk)
> +static int vmemmap_remap_range(unsigned long start, unsigned long end,
> +			       struct vmemmap_remap_walk *walk)
>  {
>  	unsigned long addr = start;
>  	unsigned long next;
> @@ -131,8 +187,12 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
>  
>  	pgd = pgd_offset_k(addr);
>  	do {
> +		int ret;
> +
>  		next = pgd_addr_end(addr, end);
> -		vmemmap_p4d_range(pgd, addr, next, walk);
> +		ret = vmemmap_p4d_range(pgd, addr, next, walk);
> +		if (ret)
> +			return ret;
>  	} while (pgd++, addr = next, addr != end);
>  
>  	/*
> @@ -141,6 +201,8 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
>  	 * belongs to the range.
>  	 */
>  	flush_tlb_kernel_range(start + PAGE_SIZE, end);
> +
> +	return 0;
>  }
>  
>  /*
> @@ -179,10 +241,27 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>  	pte_t entry = mk_pte(walk->reuse_page, pgprot);
>  	struct page *page = pte_page(*pte);
>  
> -	list_add(&page->lru, walk->vmemmap_pages);
> +	list_add_tail(&page->lru, walk->vmemmap_pages);
>  	set_pte_at(&init_mm, addr, pte, entry);
>  }
>  
> +static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> +				struct vmemmap_remap_walk *walk)
> +{
> +	pgprot_t pgprot = PAGE_KERNEL;
> +	struct page *page;
> +	void *to;
> +
> +	BUG_ON(pte_page(*pte) != walk->reuse_page);
> +
> +	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
> +	list_del(&page->lru);
> +	to = page_to_virt(page);
> +	copy_page(to, (void *)walk->reuse_addr);
> +
> +	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> +}
> +
>  /**
>   * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
>   *			to the page which @reuse is mapped to, then free vmemmap
> @@ -193,12 +272,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>   *		remap.
>   * @reuse:	reuse address.
>   *
> - * Note: This function depends on vmemmap being base page mapped. Please make
> - * sure that we disable PMD mapping of vmemmap pages when calling this function.
> + * Return: %0 on success, negative error code otherwise.
>   */
> -void vmemmap_remap_free(unsigned long start, unsigned long end,
> -			unsigned long reuse)
> +int vmemmap_remap_free(unsigned long start, unsigned long end,
> +		       unsigned long reuse)
>  {
> +	int ret;
>  	LIST_HEAD(vmemmap_pages);
>  	struct vmemmap_remap_walk walk = {
>  		.remap_pte	= vmemmap_remap_pte,
> @@ -221,25 +300,25 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
>  	 */
>  	BUG_ON(start - reuse != PAGE_SIZE);
>  
> -	vmemmap_remap_range(reuse, end, &walk);
> -	free_vmemmap_page_list(&vmemmap_pages);
> -}
> +	mmap_write_lock(&init_mm);
> +	ret = vmemmap_remap_range(reuse, end, &walk);
> +	mmap_write_downgrade(&init_mm);
>  
> -static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> -				struct vmemmap_remap_walk *walk)
> -{
> -	pgprot_t pgprot = PAGE_KERNEL;
> -	struct page *page;
> -	void *to;
> +	if (ret && walk.walked_pte) {
> +		end = reuse + walk.walked_pte * PAGE_SIZE;

Might be good to have a comment saying:

		/*
		 * vmemmap_pages contains pages from the previous
		 * vmemmap_remap_range call which failed.  These
		 * are pages which were removed from the vmemmap.
		 * They will be restored in the following call.
		 */

Code looks good and I like that the changes were mostly isolated to
sparse-vmemmap.c.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

It still would be good if someone else takes a look at these changes.
-- 
Mike Kravetz

> +		walk = (struct vmemmap_remap_walk) {
> +			.remap_pte	= vmemmap_restore_pte,
> +			.reuse_addr	= reuse,
> +			.vmemmap_pages	= &vmemmap_pages,
> +		};
>  
> -	BUG_ON(pte_page(*pte) != walk->reuse_page);
> +		vmemmap_remap_range(reuse, end, &walk);
> +	}
> +	mmap_read_unlock(&init_mm);
>  
> -	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
> -	list_del(&page->lru);
> -	to = page_to_virt(page);
> -	copy_page(to, (void *)walk->reuse_addr);
> +	free_vmemmap_page_list(&vmemmap_pages);
>  
> -	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> +	return ret;
>  }
>  
>  static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> @@ -273,6 +352,8 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>   *		remap.
>   * @reuse:	reuse address.
>   * @gpf_mask:	GFP flag for allocating vmemmap pages.
> + *
> + * Return: %0 on success, negative error code otherwise.
>   */
>  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
>  			unsigned long reuse, gfp_t gfp_mask)
> @@ -287,12 +368,12 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
>  	/* See the comment in the vmemmap_remap_free(). */
>  	BUG_ON(start - reuse != PAGE_SIZE);
>  
> -	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
> -
>  	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
>  		return -ENOMEM;
>  
> +	mmap_read_lock(&init_mm);
>  	vmemmap_remap_range(reuse, end, &walk);
> +	mmap_read_unlock(&init_mm);
>  
>  	return 0;
>  }
> 


* Re: [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
  2021-06-12  9:45 ` [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
@ 2021-06-15 23:00   ` Joao Martins
  2021-06-16  3:04     ` [External] " Muchun Song
  0 siblings, 1 reply; 11+ messages in thread
From: Joao Martins @ 2021-06-15 23:00 UTC (permalink / raw)
  To: Muchun Song
  Cc: duanxiongchun, fam.zheng, linux-doc, linux-kernel, linux-mm,
	mike.kravetz, chenhuang5, akpm, osalvador, mhocko, song.bao.hua,
	david, corbet, bodeddub



On 6/12/21 10:45 AM, Muchun Song wrote:
> When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing unused vmemmap pages
> associated with each HugeTLB page is default off. Now the vmemmap is PMD
> mapped. So there is no side effect when this feature is enabled with no
> HugeTLB pages in the system. Someone may want to enable this feature in
> the compiler time instead of using boot command line. So add a config to
> make it default on when someone do not want to enable it via command line.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +++
>  fs/Kconfig                                      | 10 ++++++++++
>  mm/hugetlb_vmemmap.c                            |  6 ++++--
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a01aadafee38..8eee439d943c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1604,6 +1604,9 @@
>  			on:  enable the feature
>  			off: disable the feature
>  
> +			Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
> +			the default is on.
> +
>  			This is not compatible with memory_hotplug.memmap_on_memory.
>  			If both parameters are enabled, hugetlb_free_vmemmap takes
>  			precedence over memory_hotplug.memmap_on_memory.
> diff --git a/fs/Kconfig b/fs/Kconfig
> index f40b5b98f7ba..e78bc5daf7b0 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -245,6 +245,16 @@ config HUGETLB_PAGE_FREE_VMEMMAP
>  	depends on X86_64
>  	depends on SPARSEMEM_VMEMMAP
>  
Now that you no longer have the directmap-in-basepages limitation, I suppose you no
longer need explicit arch support for HUGETLB_PAGE_FREE_VMEMMAP, right?

If so, I suppose you might be able to remove the 'depends on X86_64' part and "gain"
ARM64, PPC, etc. support.

	Joao


* Re: [External] Re: [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
  2021-06-15 23:00   ` Joao Martins
@ 2021-06-16  3:04     ` Muchun Song
  0 siblings, 0 replies; 11+ messages in thread
From: Muchun Song @ 2021-06-16  3:04 UTC (permalink / raw)
  To: Joao Martins
  Cc: Xiongchun duan, fam.zheng, linux-doc, LKML,
	Linux Memory Management List, Mike Kravetz, Chen Huang,
	Andrew Morton, Oscar Salvador, Michal Hocko,
	Song Bao Hua (Barry Song),
	David Hildenbrand, Jonathan Corbet, Bodeddula, Balasubramaniam

On Wed, Jun 16, 2021 at 7:01 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
>
>
> On 6/12/21 10:45 AM, Muchun Song wrote:
> > When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing unused vmemmap pages
> > associated with each HugeTLB page is default off. Now the vmemmap is PMD
> > mapped. So there is no side effect when this feature is enabled with no
> > HugeTLB pages in the system. Someone may want to enable this feature in
> > the compiler time instead of using boot command line. So add a config to
> > make it default on when someone do not want to enable it via command line.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |  3 +++
> >  fs/Kconfig                                      | 10 ++++++++++
> >  mm/hugetlb_vmemmap.c                            |  6 ++++--
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index a01aadafee38..8eee439d943c 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1604,6 +1604,9 @@
> >                       on:  enable the feature
> >                       off: disable the feature
> >
> > +                     Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
> > +                     the default is on.
> > +
> >                       This is not compatible with memory_hotplug.memmap_on_memory.
> >                       If both parameters are enabled, hugetlb_free_vmemmap takes
> >                       precedence over memory_hotplug.memmap_on_memory.
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index f40b5b98f7ba..e78bc5daf7b0 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -245,6 +245,16 @@ config HUGETLB_PAGE_FREE_VMEMMAP
> >       depends on X86_64
> >       depends on SPARSEMEM_VMEMMAP
> >
> Now that you have no longer have the directmap in basepages limitation, I suppose you no
> longer need explicit arch support for HUGETLB_PAGE_FREE_VMEMMAP right?
>
> If so, I suppose you might be able to remove the 'depends on X86_64' part and "gain"
> ARM64, PPC, etc support.

You are right. This is the next step I want to take; it will also include IA64 and RISC-V.

>
>         Joao


* Re: [External] Re: [PATCH v2 1/3] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-15 22:31   ` Mike Kravetz
@ 2021-06-16  3:23     ` Muchun Song
  0 siblings, 0 replies; 11+ messages in thread
From: Muchun Song @ 2021-06-16  3:23 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Oscar Salvador, Michal Hocko,
	Song Bao Hua (Barry Song),
	David Hildenbrand, Chen Huang, Bodeddula, Balasubramaniam,
	Jonathan Corbet, Xiongchun duan, fam.zheng, linux-doc, LKML,
	Linux Memory Management List

On Wed, Jun 16, 2021 at 6:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 6/12/21 2:45 AM, Muchun Song wrote:
> > Currently, we disable huge PMD mapping of vmemmap pages when that feature
> > of "Free some vmemmap pages of HugeTLB pages" is enabled. If the vmemmap
> > is huge PMD mapped when we walk the vmemmap page tables, we split the
> > huge PMD firstly and then we move to PTE mappings. When HugeTLB pages are
> > freed from the pool we do not attempt coalasce and move back to a PMD
> > mapping because it is much more complex.
>
> Possible rewording of commit message:
>
> In [1], PMD mappings of vmemmap pages were disabled if the the feature
> hugetlb_free_vmemmap was enabled.  This was done to simplify the initial
> implementation of vmmemap freeing for hugetlb pages.  Now, remove this
> simplification by allowing PMD mapping and switching to PTE mappings as
> needed for allocated hugetlb pages.
>
> When a hugetlb page is allocated, the vmemmap page tables are walked to
> free vmemmap pages.  During this walk, split huge PMD mappings to PTE
> mappings as required.  In the unlikely case PTE pages can not be allocated,
> return error(ENOMEM) and do not optimize vmemmap of the hugetlb page.
>
> When HugeTLB pages are freed from the pool, we do not attempt to coalesce
> and move back to a PMD mapping because it is much more complex.
>
> [1] https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com

Thanks, Mike, I'll reuse this.

>
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/mm.h   |   4 +-
> >  mm/hugetlb_vmemmap.c |   5 +-
> >  mm/sparse-vmemmap.c  | 157 ++++++++++++++++++++++++++++++++++++++-------------
> >  3 files changed, 123 insertions(+), 43 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index cadc8cc2c715..8284e8ed30c9 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3055,8 +3055,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
> >  }
> >  #endif
> >
> > -void vmemmap_remap_free(unsigned long start, unsigned long end,
> > -                     unsigned long reuse);
> > +int vmemmap_remap_free(unsigned long start, unsigned long end,
> > +                    unsigned long reuse);
> >  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> >                       unsigned long reuse, gfp_t gfp_mask);
> >
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index f9f9bb212319..06802056f296 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -258,9 +258,8 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> >        * to the page which @vmemmap_reuse is mapped to, then free the pages
> >        * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
> >        */
> > -     vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
> > -
> > -     SetHPageVmemmapOptimized(head);
> > +     if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
> > +             SetHPageVmemmapOptimized(head);
> >  }
> >
> >  void __init hugetlb_vmemmap_init(struct hstate *h)
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index 693de0aec7a8..7f73c37f742d 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -38,6 +38,7 @@
> >   * vmemmap_remap_walk - walk vmemmap page table
> >   *
> >   * @remap_pte:               called for each lowest-level entry (PTE).
> > + * @walked_pte:              the number of walked pte.
>
> Suggest name change to 'nr_walked_pte' or just 'nr_walked'?  walked_pte
> could be confused with a pointer to pte.

Makes sense. Will update.

>
> >   * @reuse_page:              the page which is reused for the tail vmemmap pages.
> >   * @reuse_addr:              the virtual address of the @reuse_page page.
> >   * @vmemmap_pages:   the list head of the vmemmap pages that can be freed
> > @@ -46,11 +47,44 @@
> >  struct vmemmap_remap_walk {
> >       void (*remap_pte)(pte_t *pte, unsigned long addr,
> >                         struct vmemmap_remap_walk *walk);
> > +     unsigned long walked_pte;
> >       struct page *reuse_page;
> >       unsigned long reuse_addr;
> >       struct list_head *vmemmap_pages;
> >  };
> >
> > +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> > +                               struct vmemmap_remap_walk *walk)
> > +{
> > +     pmd_t __pmd;
> > +     int i;
> > +     unsigned long addr = start;
> > +     struct page *page = pmd_page(*pmd);
> > +     pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
> > +
> > +     if (!pgtable)
> > +             return -ENOMEM;
> > +
> > +     pmd_populate_kernel(&init_mm, &__pmd, pgtable);
> > +
> > +     for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE) {
> > +             pte_t entry, *pte;
> > +             pgprot_t pgprot = PAGE_KERNEL;
> > +
> > +             entry = mk_pte(page + i, pgprot);
> > +             pte = pte_offset_kernel(&__pmd, addr);
> > +             set_pte_at(&init_mm, addr, pte, entry);
> > +     }
> > +
> > +     /* Make pte visible before pmd. See comment in __pte_alloc(). */
> > +     smp_wmb();
> > +     pmd_populate_kernel(&init_mm, pmd, pgtable);
> > +
> > +     flush_tlb_kernel_range(start, start + PMD_SIZE);
> > +
> > +     return 0;
> > +}
> > +
> >  static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> >                             unsigned long end,
> >                             struct vmemmap_remap_walk *walk)
> > @@ -68,59 +102,81 @@ static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> >                * walking, skip the reuse address range.
> >                */
> >               addr += PAGE_SIZE;
> > +             walk->walked_pte++;
> >               pte++;
> >       }
> >
> > -     for (; addr != end; addr += PAGE_SIZE, pte++)
> > +     for (; addr != end; addr += PAGE_SIZE, pte++) {
> >               walk->remap_pte(pte, addr, walk);
> > +             walk->walked_pte++;
> > +     }
> >  }
> >
> > -static void vmemmap_pmd_range(pud_t *pud, unsigned long addr,
> > -                           unsigned long end,
> > -                           struct vmemmap_remap_walk *walk)
> > +static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
> > +                          unsigned long end,
> > +                          struct vmemmap_remap_walk *walk)
> >  {
> >       pmd_t *pmd;
> >       unsigned long next;
> >
> >       pmd = pmd_offset(pud, addr);
> >       do {
> > -             BUG_ON(pmd_leaf(*pmd));
> > +             if (pmd_leaf(*pmd)) {
> > +                     int ret;
> >
> > +                     ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
> > +                     if (ret)
> > +                             return ret;
> > +             }
> >               next = pmd_addr_end(addr, end);
> >               vmemmap_pte_range(pmd, addr, next, walk);
> >       } while (pmd++, addr = next, addr != end);
> > +
> > +     return 0;
> >  }
> >
> > -static void vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
> > -                           unsigned long end,
> > -                           struct vmemmap_remap_walk *walk)
> > +static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
> > +                          unsigned long end,
> > +                          struct vmemmap_remap_walk *walk)
> >  {
> >       pud_t *pud;
> >       unsigned long next;
> >
> >       pud = pud_offset(p4d, addr);
> >       do {
> > +             int ret;
> > +
> >               next = pud_addr_end(addr, end);
> > -             vmemmap_pmd_range(pud, addr, next, walk);
> > +             ret = vmemmap_pmd_range(pud, addr, next, walk);
> > +             if (ret)
> > +                     return ret;
> >       } while (pud++, addr = next, addr != end);
> > +
> > +     return 0;
> >  }
> >
> > -static void vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
> > -                           unsigned long end,
> > -                           struct vmemmap_remap_walk *walk)
> > +static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
> > +                          unsigned long end,
> > +                          struct vmemmap_remap_walk *walk)
> >  {
> >       p4d_t *p4d;
> >       unsigned long next;
> >
> >       p4d = p4d_offset(pgd, addr);
> >       do {
> > +             int ret;
> > +
> >               next = p4d_addr_end(addr, end);
> > -             vmemmap_pud_range(p4d, addr, next, walk);
> > +             ret = vmemmap_pud_range(p4d, addr, next, walk);
> > +             if (ret)
> > +                     return ret;
> >       } while (p4d++, addr = next, addr != end);
> > +
> > +     return 0;
> >  }
> >
> > -static void vmemmap_remap_range(unsigned long start, unsigned long end,
> > -                             struct vmemmap_remap_walk *walk)
> > +static int vmemmap_remap_range(unsigned long start, unsigned long end,
> > +                            struct vmemmap_remap_walk *walk)
> >  {
> >       unsigned long addr = start;
> >       unsigned long next;
> > @@ -131,8 +187,12 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
> >
> >       pgd = pgd_offset_k(addr);
> >       do {
> > +             int ret;
> > +
> >               next = pgd_addr_end(addr, end);
> > -             vmemmap_p4d_range(pgd, addr, next, walk);
> > +             ret = vmemmap_p4d_range(pgd, addr, next, walk);
> > +             if (ret)
> > +                     return ret;
> >       } while (pgd++, addr = next, addr != end);
> >
> >       /*
> > @@ -141,6 +201,8 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
> >        * belongs to the range.
> >        */
> >       flush_tlb_kernel_range(start + PAGE_SIZE, end);
> > +
> > +     return 0;
> >  }
> >
> >  /*
> > @@ -179,10 +241,27 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> >       pte_t entry = mk_pte(walk->reuse_page, pgprot);
> >       struct page *page = pte_page(*pte);
> >
> > -     list_add(&page->lru, walk->vmemmap_pages);
> > +     list_add_tail(&page->lru, walk->vmemmap_pages);
> >       set_pte_at(&init_mm, addr, pte, entry);
> >  }
> >
> > +static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> > +                             struct vmemmap_remap_walk *walk)
> > +{
> > +     pgprot_t pgprot = PAGE_KERNEL;
> > +     struct page *page;
> > +     void *to;
> > +
> > +     BUG_ON(pte_page(*pte) != walk->reuse_page);
> > +
> > +     page = list_first_entry(walk->vmemmap_pages, struct page, lru);
> > +     list_del(&page->lru);
> > +     to = page_to_virt(page);
> > +     copy_page(to, (void *)walk->reuse_addr);
> > +
> > +     set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> > +}
> > +
> >  /**
> >   * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
> >   *                   to the page which @reuse is mapped to, then free vmemmap
> > @@ -193,12 +272,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> >   *           remap.
> >   * @reuse:   reuse address.
> >   *
> > - * Note: This function depends on vmemmap being base page mapped. Please make
> > - * sure that we disable PMD mapping of vmemmap pages when calling this function.
> > + * Return: %0 on success, negative error code otherwise.
> >   */
> > -void vmemmap_remap_free(unsigned long start, unsigned long end,
> > -                     unsigned long reuse)
> > +int vmemmap_remap_free(unsigned long start, unsigned long end,
> > +                    unsigned long reuse)
> >  {
> > +     int ret;
> >       LIST_HEAD(vmemmap_pages);
> >       struct vmemmap_remap_walk walk = {
> >               .remap_pte      = vmemmap_remap_pte,
> > @@ -221,25 +300,25 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
> >        */
> >       BUG_ON(start - reuse != PAGE_SIZE);
> >
> > -     vmemmap_remap_range(reuse, end, &walk);
> > -     free_vmemmap_page_list(&vmemmap_pages);
> > -}
> > +     mmap_write_lock(&init_mm);
> > +     ret = vmemmap_remap_range(reuse, end, &walk);
> > +     mmap_write_downgrade(&init_mm);
> >
> > -static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> > -                             struct vmemmap_remap_walk *walk)
> > -{
> > -     pgprot_t pgprot = PAGE_KERNEL;
> > -     struct page *page;
> > -     void *to;
> > +     if (ret && walk.walked_pte) {
> > +             end = reuse + walk.walked_pte * PAGE_SIZE;
>
> Might be good to have a comment saying:
>
>                 /*
>                  * vmemmap_pages contains pages from the previous
>                  * vmemmap_remap_range call which failed.  These
>                  * are pages which were removed from the vmemmap.
>                  * They will be restored in the following call.
>                  */

More clear. Will do.

>
> Code looks good and I like that changes were mostly isolated to
> sparse-vmemmap.c.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks Mike.

>
> It still would be good if someone else takes a look at these changes.
> --
> Mike Kravetz
>
> > +             walk = (struct vmemmap_remap_walk) {
> > +                     .remap_pte      = vmemmap_restore_pte,
> > +                     .reuse_addr     = reuse,
> > +                     .vmemmap_pages  = &vmemmap_pages,
> > +             };
> >
> > -     BUG_ON(pte_page(*pte) != walk->reuse_page);
> > +             vmemmap_remap_range(reuse, end, &walk);
> > +     }
> > +     mmap_read_unlock(&init_mm);
> >
> > -     page = list_first_entry(walk->vmemmap_pages, struct page, lru);
> > -     list_del(&page->lru);
> > -     to = page_to_virt(page);
> > -     copy_page(to, (void *)walk->reuse_addr);
> > +     free_vmemmap_page_list(&vmemmap_pages);
> >
> > -     set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> > +     return ret;
> >  }
> >
> >  static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> > @@ -273,6 +352,8 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >   *           remap.
> >   * @reuse:   reuse address.
> >   * @gpf_mask:        GFP flag for allocating vmemmap pages.
> > + *
> > + * Return: %0 on success, negative error code otherwise.
> >   */
> >  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> >                       unsigned long reuse, gfp_t gfp_mask)
> > @@ -287,12 +368,12 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> >       /* See the comment in the vmemmap_remap_free(). */
> >       BUG_ON(start - reuse != PAGE_SIZE);
> >
> > -     might_sleep_if(gfpflags_allow_blocking(gfp_mask));
> > -
> >       if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
> >               return -ENOMEM;
> >
> > +     mmap_read_lock(&init_mm);
> >       vmemmap_remap_range(reuse, end, &walk);
> > +     mmap_read_unlock(&init_mm);
> >
> >       return 0;
> >  }
> >

