* [PATCH 0/5] Split huge PMD mapping of vmemmap pages
@ 2021-06-09 12:13 Muchun Song
  2021-06-09 12:13 ` [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables Muchun Song
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Muchun Song @ 2021-06-09 12:13 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm, Muchun Song

In order to reduce the difficulty of code review in series [1], we
disabled huge PMD mapping of vmemmap pages whenever that feature was
enabled. In this series, we no longer disable huge PMD mapping of
vmemmap pages; instead, we split the huge PMD mapping on demand, when
vmemmap pages actually need to be freed.

[1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/
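
A usage sketch (not part of the series): with this applied, the
optimization can simply be enabled at boot and the PMD split only happens
for the vmemmap of HugeTLB pages actually added to the pool. The hugepage
sizes and counts below are arbitrary examples:

    hugetlb_free_vmemmap=on hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=2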

Muchun Song (5):
  mm: hugetlb: introduce helpers to preallocate/free page tables
  mm: hugetlb: introduce helpers to preallocate page tables from bootmem
    allocator
  mm: sparsemem: split the huge PMD mapping of vmemmap pages
  mm: sparsemem: use huge PMD mapping for vmemmap pages
  mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON

 Documentation/admin-guide/kernel-parameters.txt |  10 +-
 arch/x86/mm/init_64.c                           |   8 +-
 fs/Kconfig                                      |  10 ++
 include/linux/hugetlb.h                         |  28 ++----
 include/linux/mm.h                              |   2 +-
 mm/hugetlb.c                                    |  42 +++++++-
 mm/hugetlb_vmemmap.c                            | 126 +++++++++++++++++++++++-
 mm/hugetlb_vmemmap.h                            |  25 +++++
 mm/memory_hotplug.c                             |   2 +-
 mm/sparse-vmemmap.c                             |  61 ++++++++++--
 10 files changed, 267 insertions(+), 47 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables
  2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
@ 2021-06-09 12:13 ` Muchun Song
  2021-06-10 21:49   ` Mike Kravetz
  2021-06-09 12:13 ` [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator Muchun Song
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2021-06-09 12:13 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm, Muchun Song

On some architectures (e.g. x86_64 and arm64), vmemmap pages are usually
mapped with a huge PMD. Previously, we disabled the huge PMD mapping of
vmemmap pages whenever the "Free vmemmap pages of HugeTLB page" feature
was enabled, which also affects non-HugeTLB pages. What we actually want
is to map only the vmemmap pages associated with HugeTLB pages with base
pages. We can split the huge PMD mapping of vmemmap pages when freeing
the vmemmap pages of a HugeTLB page, but that requires preallocated page
tables. In this patch, we introduce page table allocation/freeing helpers.
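
For intuition, here is a small userspace sketch (not kernel code) of the
arithmetic behind pgtable_pages_to_prealloc_per_hpage() below, assuming
x86_64 defaults of 4 KiB base pages, 2 MiB PMDs and a 64-byte struct page:

#include <stdio.h>

int main(void)
{
	unsigned long page_size   = 4096;
	unsigned long pmd_size    = 2UL << 20;	/* 2 MiB huge PMD */
	unsigned long struct_page = 64;		/* assumed sizeof(struct page) */
	unsigned long hpage_size  = 2UL << 20;	/* 2 MiB HugeTLB page */

	/* struct pages needed for one HugeTLB page, expressed in base pages */
	unsigned long vmemmap_pages = hpage_size / page_size * struct_page / page_size;
	unsigned long vmemmap_size  = vmemmap_pages * page_size;
	/* one preallocated PTE table per PMD covering that vmemmap range */
	unsigned long pgtables      = (vmemmap_size + pmd_size - 1) / pmd_size;

	/* prints "vmemmap pages: 8, preallocated PTE tables: 1" */
	printf("vmemmap pages: %lu, preallocated PTE tables: %lu\n",
	       vmemmap_pages, pgtables);
	return 0;
}

With RESERVE_VMEMMAP_NR (2) of those 8 pages kept, the remaining 6 match
the "6 * PAGE_SIZE for each 2MB hugetlb page" figure quoted in the
kernel-parameters documentation.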

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.h | 12 ++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index f9f9bb212319..628e2752714f 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -170,6 +170,9 @@
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
+#include <linux/list.h>
+#include <asm/pgalloc.h>
+
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -209,6 +212,57 @@ static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
 	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
 }
 
+static inline unsigned int vmemmap_pages_per_hpage(struct hstate *h)
+{
+	return free_vmemmap_pages_per_hpage(h) + RESERVE_VMEMMAP_NR;
+}
+
+static inline unsigned long vmemmap_pages_size_per_hpage(struct hstate *h)
+{
+	return (unsigned long)vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
+}
+
+static inline unsigned int pgtable_pages_to_prealloc_per_hpage(struct hstate *h)
+{
+	unsigned long vmemmap_size = vmemmap_pages_size_per_hpage(h);
+
+	/*
+	 * No need to pre-allocate page tables when there are no vmemmap pages
+	 * to be freed.
+	 */
+	if (!free_vmemmap_pages_per_hpage(h))
+		return 0;
+
+	return ALIGN(vmemmap_size, PMD_SIZE) >> PMD_SHIFT;
+}
+
+void vmemmap_pgtable_free(struct list_head *pgtables)
+{
+	struct page *pte_page, *t_page;
+
+	list_for_each_entry_safe(pte_page, t_page, pgtables, lru)
+		pte_free_kernel(&init_mm, page_to_virt(pte_page));
+}
+
+int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables)
+{
+	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+	while (nr--) {
+		pte_t *pte_p;
+
+		pte_p = pte_alloc_one_kernel(&init_mm);
+		if (!pte_p)
+			goto out;
+		list_add(&virt_to_page(pte_p)->lru, pgtables);
+	}
+
+	return 0;
+out:
+	vmemmap_pgtable_free(pgtables);
+	return -ENOMEM;
+}
+
 /*
  * Previously discarded vmemmap pages will be allocated and remapping
  * after this function returns zero.
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index cb2bef8f9e73..306e15519da1 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -14,6 +14,8 @@
 int alloc_huge_page_vmemmap(struct hstate *h, struct page *head);
 void free_huge_page_vmemmap(struct hstate *h, struct page *head);
 void hugetlb_vmemmap_init(struct hstate *h);
+int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables);
+void vmemmap_pgtable_free(struct list_head *pgtables);
 
 /*
  * How many vmemmap pages associated with a HugeTLB page that can be freed
@@ -33,6 +35,16 @@ static inline void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 }
 
+static inline int vmemmap_pgtable_prealloc(struct hstate *h,
+					   struct list_head *pgtables)
+{
+	return 0;
+}
+
+static inline void vmemmap_pgtable_free(struct list_head *pgtables)
+{
+}
+
 static inline void hugetlb_vmemmap_init(struct hstate *h)
 {
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator
  2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
  2021-06-09 12:13 ` [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables Muchun Song
@ 2021-06-09 12:13 ` Muchun Song
  2021-06-10 22:13   ` Mike Kravetz
  2021-06-09 12:13 ` [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages Muchun Song
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2021-06-09 12:13 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm, Muchun Song

If we want to split the huge PMD of the vmemmap pages associated with
each gigantic page allocated from the bootmem allocator, we should
pre-allocate the page tables from the bootmem allocator as well. In this
patch, we introduce some helpers to preallocate page tables for such
gigantic pages.
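
As a rough userspace analogy (not kernel code, and the names are made up)
of the two-step scheme these helpers implement: reserve one contiguous
block of nr page-sized chunks up front, the way
gigantic_vmemmap_pgtable_prealloc() calls memblock_alloc_try_nid(), and
only later carve it into individual list entries, the way
gigantic_vmemmap_pgtable_init() walks m->vmemmap_pte page by page:

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

struct chunk {
	struct chunk *next;
};

int main(void)
{
	unsigned long nr = 4;	/* arbitrary number of PTE pages to stash */
	unsigned long i, count = 0;
	struct chunk *head = NULL, *c;
	/* step 1: one contiguous "boot time" allocation */
	char *block = aligned_alloc(PAGE_SIZE, nr * PAGE_SIZE);

	if (!block)
		return 1;

	/* step 2: later, carve the block into per-page list entries */
	for (i = 0; i < nr; i++) {
		c = (struct chunk *)(block + i * PAGE_SIZE);
		c->next = head;
		head = c;
	}

	for (c = head; c; c = c->next)
		count++;
	printf("%lu page-sized chunks queued for later use\n", count);

	free(block);
	return 0;
}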

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb_vmemmap.c    | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.h    | 13 ++++++++++
 3 files changed, 79 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 03ca83db0a3e..c27a299c4211 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -622,6 +622,9 @@ struct hstate {
 struct huge_bootmem_page {
 	struct list_head list;
 	struct hstate *hstate;
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+	pte_t *vmemmap_pte;
+#endif
 };
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 628e2752714f..6f3a47b4ebd3 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -171,6 +171,7 @@
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
 #include <linux/list.h>
+#include <linux/memblock.h>
 #include <asm/pgalloc.h>
 
 #include "hugetlb_vmemmap.h"
@@ -263,6 +264,68 @@ int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables)
 	return -ENOMEM;
 }
 
+unsigned long __init gigantic_vmemmap_pgtable_prealloc(void)
+{
+	struct huge_bootmem_page *m, *tmp;
+	unsigned long nr_free = 0;
+
+	list_for_each_entry_safe(m, tmp, &huge_boot_pages, list) {
+		struct hstate *h = m->hstate;
+		unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+		unsigned long size;
+
+		if (!nr)
+			continue;
+
+		size = nr << PAGE_SHIFT;
+		m->vmemmap_pte = memblock_alloc_try_nid(size, PAGE_SIZE, 0,
+							MEMBLOCK_ALLOC_ACCESSIBLE,
+							NUMA_NO_NODE);
+		if (!m->vmemmap_pte) {
+			nr_free++;
+			list_del(&m->list);
+			memblock_free_early(__pa(m), huge_page_size(h));
+		}
+	}
+
+	return nr_free;
+}
+
+void __init gigantic_vmemmap_pgtable_init(struct huge_bootmem_page *m,
+					  struct page *head)
+{
+	struct hstate *h = m->hstate;
+	unsigned long pte = (unsigned long)m->vmemmap_pte;
+	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+	if (!nr)
+		return;
+
+	/*
+	 * If we had gigantic hugepages allocated at boot time, we need
+	 * to restore the 'stolen' pages to totalram_pages in order to
+	 * fix confusing memory reports from free(1) and other
+	 * side effects, like CommitLimit going negative.
+	 */
+	adjust_managed_page_count(head, nr);
+
+	/*
+	 * Use the huge page lru list to temporarily store the preallocated
+	 * pages. The preallocated pages are used and the list is emptied
+	 * before the huge page is put into use. When the huge page is put
+	 * into use by prep_new_huge_page() the list will be reinitialized.
+	 */
+	INIT_LIST_HEAD(&head->lru);
+
+	while (nr--) {
+		struct page *pte_page = virt_to_page(pte);
+
+		__ClearPageReserved(pte_page);
+		list_add(&pte_page->lru, &head->lru);
+		pte += PAGE_SIZE;
+	}
+}
+
 /*
  * Previously discarded vmemmap pages will be allocated and remapping
  * after this function returns zero.
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 306e15519da1..f6170720f183 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -16,6 +16,9 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head);
 void hugetlb_vmemmap_init(struct hstate *h);
 int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables);
 void vmemmap_pgtable_free(struct list_head *pgtables);
+unsigned long gigantic_vmemmap_pgtable_prealloc(void);
+void gigantic_vmemmap_pgtable_init(struct huge_bootmem_page *m,
+				   struct page *head);
 
 /*
  * How many vmemmap pages associated with a HugeTLB page that can be freed
@@ -45,6 +48,16 @@ static inline void vmemmap_pgtable_free(struct list_head *pgtables)
 {
 }
 
+static inline unsigned long gigantic_vmemmap_pgtable_prealloc(void)
+{
+	return 0;
+}
+
+static inline void gigantic_vmemmap_pgtable_init(struct huge_bootmem_page *m,
+						 struct page *head)
+{
+}
+
 static inline void hugetlb_vmemmap_init(struct hstate *h)
 {
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
  2021-06-09 12:13 ` [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables Muchun Song
  2021-06-09 12:13 ` [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator Muchun Song
@ 2021-06-09 12:13 ` Muchun Song
  2021-06-10 22:35   ` Mike Kravetz
  2021-06-09 12:13 ` [PATCH 4/5] mm: sparsemem: use huge PMD mapping for " Muchun Song
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2021-06-09 12:13 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm, Muchun Song

If the vmemmap is huge PMD mapped, we must split the huge PMD first
before we can change the PTE page table entries. In this patch, we add
the ability to split the huge PMD mapping of vmemmap pages.
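
To illustrate the idea with a toy userspace model (not the kernel
implementation; it assumes 512 PTEs per PMD): splitting replaces one leaf
entry covering 512 consecutive base pages with a preallocated table of
512 entries that describe exactly the same pages, and the new table is
only published once it is completely filled, which is what the smp_wmb()
in split_vmemmap_huge_pmd() below is for:

#include <stdio.h>
#include <stdlib.h>

#define PTRS_PER_PMD 512UL

struct toy_pmd {
	int leaf;		/* 1: maps 512 pages with a single entry */
	unsigned long pfn;	/* first page frame when leaf            */
	unsigned long *pte;	/* pointer to a PTE table once split     */
};

/* Fill the preallocated table so it maps exactly what the leaf mapped,
 * then publish it.  The kernel orders the PTE stores before the final
 * pmd_populate_kernel() with a write barrier. */
static void toy_split_huge_pmd(struct toy_pmd *pmd, unsigned long *prealloc)
{
	unsigned long i;

	for (i = 0; i < PTRS_PER_PMD; i++)
		prealloc[i] = pmd->pfn + i;

	pmd->pte = prealloc;
	pmd->leaf = 0;
}

int main(void)
{
	unsigned long *table = malloc(PTRS_PER_PMD * sizeof(*table));
	struct toy_pmd pmd = { .leaf = 1, .pfn = 0x100000 };

	if (!table)
		return 1;

	toy_split_huge_pmd(&pmd, table);
	/* same pages as before, now individually remappable */
	printf("pte[0] = %#lx, pte[511] = %#lx\n", table[0], table[511]);
	free(table);
	return 0;
}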

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/mm.h   |  2 +-
 mm/hugetlb.c         | 42 ++++++++++++++++++++++++++++++++++--
 mm/hugetlb_vmemmap.c |  3 ++-
 mm/sparse-vmemmap.c  | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cadc8cc2c715..b97e1486c5c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3056,7 +3056,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 #endif
 
 void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse);
+			unsigned long reuse, struct list_head *pgtables);
 int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 			unsigned long reuse, gfp_t gfp_mask);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c3b2a8a494d6..3137c72d9cc7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1609,6 +1609,13 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 static void __prep_new_huge_page(struct hstate *h, struct page *page)
 {
 	free_huge_page_vmemmap(h, page);
+	/*
+	 * Because we store preallocated pages on @page->lru,
+	 * vmemmap_pgtable_free() must be called before the
+	 * initialization of @page->lru in INIT_LIST_HEAD().
+	 */
+	vmemmap_pgtable_free(&page->lru);
+
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
 	hugetlb_set_page_subpool(page, NULL);
@@ -1775,14 +1782,29 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
 		nodemask_t *node_alloc_noretry)
 {
 	struct page *page;
+	LIST_HEAD(pgtables);
+
+	if (vmemmap_pgtable_prealloc(h, &pgtables))
+		return NULL;
 
 	if (hstate_is_gigantic(h))
 		page = alloc_gigantic_page(h, gfp_mask, nid, nmask);
 	else
 		page = alloc_buddy_huge_page(h, gfp_mask,
 				nid, nmask, node_alloc_noretry);
-	if (!page)
+	if (!page) {
+		vmemmap_pgtable_free(&pgtables);
 		return NULL;
+	}
+
+	/*
+	 * Use the huge page lru list to temporarily store the preallocated
+	 * pages. The preallocated pages are used and the list is emptied
+	 * before the huge page is put into use. When the huge page is put
+	 * into use by __prep_new_huge_page() the list will be reinitialized.
+	 */
+	INIT_LIST_HEAD(&page->lru);
+	list_splice(&pgtables, &page->lru);
 
 	if (hstate_is_gigantic(h))
 		prep_compound_gigantic_page(page, huge_page_order(h));
@@ -2417,6 +2439,10 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 	int nid = page_to_nid(old_page);
 	struct page *new_page;
 	int ret = 0;
+	LIST_HEAD(pgtables);
+
+	if (vmemmap_pgtable_prealloc(h, &pgtables))
+		return -ENOMEM;
 
 	/*
 	 * Before dissolving the page, we need to allocate a new one for the
@@ -2426,8 +2452,15 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 	 * under the lock.
 	 */
 	new_page = alloc_buddy_huge_page(h, gfp_mask, nid, NULL, NULL);
-	if (!new_page)
+	if (!new_page) {
+		vmemmap_pgtable_free(&pgtables);
 		return -ENOMEM;
+	}
+
+	/* See the comments in alloc_fresh_huge_page(). */
+	INIT_LIST_HEAD(&new_page->lru);
+	list_splice(&pgtables, &new_page->lru);
+
 	__prep_new_huge_page(h, new_page);
 
 retry:
@@ -2711,6 +2744,7 @@ static void __init gather_bootmem_prealloc(void)
 		WARN_ON(page_count(page) != 1);
 		prep_compound_huge_page(page, huge_page_order(h));
 		WARN_ON(PageReserved(page));
+		gigantic_vmemmap_pgtable_init(m, page);
 		prep_new_huge_page(h, page, page_to_nid(page));
 		put_page(page); /* free it into the hugepage allocator */
 
@@ -2763,6 +2797,10 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 			break;
 		cond_resched();
 	}
+
+	if (hstate_is_gigantic(h))
+		i -= gigantic_vmemmap_pgtable_prealloc();
+
 	if (i < h->max_huge_pages) {
 		char buf[32];
 
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 6f3a47b4ebd3..01f3652fa359 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -375,7 +375,8 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
+	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse,
+			   &head->lru);
 
 	SetHPageVmemmapOptimized(head);
 }
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 693de0aec7a8..fedb3f56110c 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -42,6 +42,8 @@
  * @reuse_addr:		the virtual address of the @reuse_page page.
  * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
  *			or is mapped from.
+ * @pgtables:		the list of preallocated page tables used for splitting
+ *			the huge PMD mapping.
  */
 struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
@@ -49,8 +51,49 @@ struct vmemmap_remap_walk {
 	struct page *reuse_page;
 	unsigned long reuse_addr;
 	struct list_head *vmemmap_pages;
+	struct list_head *pgtables;
 };
 
+#define VMEMMAP_HPMD_ORDER		(PMD_SHIFT - PAGE_SHIFT)
+#define VMEMMAP_HPMD_NR			(1 << VMEMMAP_HPMD_ORDER)
+
+static inline pte_t *pte_withdraw(struct vmemmap_remap_walk *walk)
+{
+	pgtable_t pgtable;
+
+	pgtable = list_first_entry(walk->pgtables, struct page, lru);
+	list_del(&pgtable->lru);
+
+	return page_to_virt(pgtable);
+}
+
+static void split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				   struct vmemmap_remap_walk *walk)
+{
+	int i;
+	pmd_t tmp;
+	pte_t *new = pte_withdraw(walk);
+	struct page *page = pmd_page(*pmd);
+	unsigned long addr = start;
+
+	pmd_populate_kernel(&init_mm, &tmp, new);
+
+	for (i = 0; i < VMEMMAP_HPMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t entry, *pte;
+		pgprot_t pgprot = PAGE_KERNEL;
+
+		entry = mk_pte(page + i, pgprot);
+		pte = pte_offset_kernel(&tmp, addr);
+		set_pte_at(&init_mm, addr, pte, entry);
+	}
+
+	/* Make pte visible before pmd. See comment in __pte_alloc(). */
+	smp_wmb();
+	pmd_populate_kernel(&init_mm, pmd, new);
+
+	flush_tlb_kernel_range(start, start + PMD_SIZE);
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -84,8 +127,8 @@ static void vmemmap_pmd_range(pud_t *pud, unsigned long addr,
 
 	pmd = pmd_offset(pud, addr);
 	do {
-		BUG_ON(pmd_leaf(*pmd));
-
+		if (pmd_leaf(*pmd))
+			split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
@@ -192,18 +235,17 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
  * @end:	end address of the vmemmap virtual address range that we want to
  *		remap.
  * @reuse:	reuse address.
- *
- * Note: This function depends on vmemmap being base page mapped. Please make
- * sure that we disable PMD mapping of vmemmap pages when calling this function.
+ * @pgtables:	the list of page tables used for splitting huge PMD.
  */
 void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse)
+			unsigned long reuse, struct list_head *pgtables)
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_remap_pte,
 		.reuse_addr	= reuse,
 		.vmemmap_pages	= &vmemmap_pages,
+		.pgtables	= pgtables,
 	};
 
 	/*
@@ -221,7 +263,10 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);
 
+	mmap_write_lock(&init_mm);
 	vmemmap_remap_range(reuse, end, &walk);
+	mmap_write_unlock(&init_mm);
+
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
@@ -287,12 +332,12 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 	/* See the comment in the vmemmap_remap_free(). */
 	BUG_ON(start - reuse != PAGE_SIZE);
 
-	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
-
 	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
 		return -ENOMEM;
 
+	mmap_read_lock(&init_mm);
 	vmemmap_remap_range(reuse, end, &walk);
+	mmap_read_unlock(&init_mm);
 
 	return 0;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] mm: sparsemem: use huge PMD mapping for vmemmap pages
  2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
                   ` (2 preceding siblings ...)
  2021-06-09 12:13 ` [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages Muchun Song
@ 2021-06-09 12:13 ` Muchun Song
  2021-06-10 22:49   ` Mike Kravetz
  2021-06-09 12:13 ` [PATCH 5/5] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
  2021-06-10 21:32 ` [PATCH 0/5] Split huge PMD mapping of vmemmap pages Mike Kravetz
  5 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2021-06-09 12:13 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm, Muchun Song

The preparation for splitting the huge PMD mapping of vmemmap pages is
complete, so switch the vmemmap mapping back from PTE to PMD.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 -------
 arch/x86/mm/init_64.c                           |  8 ++------
 include/linux/hugetlb.h                         | 25 ++++++-------------------
 mm/memory_hotplug.c                             |  2 +-
 4 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index db1ef6739613..a01aadafee38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1599,13 +1599,6 @@
 			enabled.
 			Allows heavy hugetlb users to free up some more
 			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
-			This feauture is not free though. Large page
-			tables are not used to back vmemmap pages which
-			can lead to a performance degradation for some
-			workloads. Also there will be memory allocation
-			required when hugetlb pages are freed from the
-			pool which can lead to corner cases under heavy
-			memory pressure.
 			Format: { on | off (default) }
 
 			on:  enable the feature
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9d9d18d0c2a1..65ea58527176 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -34,7 +34,6 @@
 #include <linux/gfp.h>
 #include <linux/kcore.h>
 #include <linux/bootmem_info.h>
-#include <linux/hugetlb.h>
 
 #include <asm/processor.h>
 #include <asm/bios_ebda.h>
@@ -1610,8 +1609,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
 	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
 
-	if ((is_hugetlb_free_vmemmap_enabled()  && !altmap) ||
-	    end - start < PAGES_PER_SECTION * sizeof(struct page))
+	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1639,8 +1637,6 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 	pmd_t *pmd;
 	unsigned int nr_pmd_pages;
 	struct page *page;
-	bool base_mapping = !boot_cpu_has(X86_FEATURE_PSE) ||
-			    is_hugetlb_free_vmemmap_enabled();
 
 	for (; addr < end; addr = next) {
 		pte_t *pte = NULL;
@@ -1666,7 +1662,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		}
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
 
-		if (base_mapping) {
+		if (!boot_cpu_has(X86_FEATURE_PSE)) {
 			next = (addr + PAGE_SIZE) & PAGE_MASK;
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c27a299c4211..2b46e6494114 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -907,20 +907,6 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return hugetlb_free_vmemmap_enabled;
-}
-#else
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
-#endif
-
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
@@ -1080,13 +1066,14 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+#else
+#define hugetlb_free_vmemmap_enabled	false
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d96a3c7551c8..9d8a551c08d5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1056,7 +1056,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !is_hugetlb_free_vmemmap_enabled() &&
+	       !hugetlb_free_vmemmap_enabled &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
  2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
                   ` (3 preceding siblings ...)
  2021-06-09 12:13 ` [PATCH 4/5] mm: sparsemem: use huge PMD mapping for " Muchun Song
@ 2021-06-09 12:13 ` Muchun Song
  2021-06-10 21:32 ` [PATCH 0/5] Split huge PMD mapping of vmemmap pages Mike Kravetz
  5 siblings, 0 replies; 14+ messages in thread
From: Muchun Song @ 2021-06-09 12:13 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm, Muchun Song

When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing of unused vmemmap
pages associated with each HugeTLB page is off by default. Now that the
vmemmap is PMD mapped, there is no side effect when this feature is
enabled while there are no HugeTLB pages in the system. Some users may
want to enable this feature at compile time instead of via the boot
command line, so add a config option that makes it default to on for
those who do not want to enable it on the command line.
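
For example (a configuration sketch, not part of the patch): a
distribution kernel could select

    CONFIG_HUGETLB_PAGE_FREE_VMEMMAP=y
    CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y

and a machine that does not want the optimization could still boot with

    hugetlb_free_vmemmap=off

as the documentation hunk below describes.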

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 fs/Kconfig                                      | 10 ++++++++++
 mm/hugetlb_vmemmap.c                            |  6 ++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a01aadafee38..8eee439d943c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1604,6 +1604,9 @@
 			on:  enable the feature
 			off: disable the feature
 
+			Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
+			the default is on.
+
 			This is not compatible with memory_hotplug.memmap_on_memory.
 			If both parameters are enabled, hugetlb_free_vmemmap takes
 			precedence over memory_hotplug.memmap_on_memory.
diff --git a/fs/Kconfig b/fs/Kconfig
index f40b5b98f7ba..e78bc5daf7b0 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -245,6 +245,16 @@ config HUGETLB_PAGE_FREE_VMEMMAP
 	depends on X86_64
 	depends on SPARSEMEM_VMEMMAP
 
+config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
+	bool "Default freeing vmemmap pages of HugeTLB to on"
+	default n
+	depends on HUGETLB_PAGE_FREE_VMEMMAP
+	help
+	  When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing of unused vmemmap
+	  pages associated with each HugeTLB page is off by default. Say Y here
+	  to enable freeing vmemmap pages of HugeTLB by default. It can then
+	  be disabled on the command line via hugetlb_free_vmemmap=off.
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS
 
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 01f3652fa359..b5f4f29e042a 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -186,7 +186,7 @@
 #define RESERVE_VMEMMAP_NR		2U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
-bool hugetlb_free_vmemmap_enabled;
+bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
 
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -201,7 +201,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
 
 	if (!strcmp(buf, "on"))
 		hugetlb_free_vmemmap_enabled = true;
-	else if (strcmp(buf, "off"))
+	else if (!strcmp(buf, "off"))
+		hugetlb_free_vmemmap_enabled = false;
+	else
 		return -EINVAL;
 
 	return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Split huge PMD mapping of vmemmap pages
  2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
                   ` (4 preceding siblings ...)
  2021-06-09 12:13 ` [PATCH 5/5] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
@ 2021-06-10 21:32 ` Mike Kravetz
  2021-06-11  3:23   ` [External] " Muchun Song
  5 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2021-06-10 21:32 UTC (permalink / raw)
  To: Muchun Song, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm

On 6/9/21 5:13 AM, Muchun Song wrote:
> In order to reduce the difficulty of code review in series[1]. We disable
> huge PMD mapping of vmemmap pages when that feature is enabled. In this
> series, we do not disable huge PMD mapping of vmemmap pages anymore. We
> will split huge PMD mapping when needed.

Thank you Muchun!

Adding this functionality should reduce the decisions a sys admin needs
to make WRT vmemmap reduction for hugetlb pages.  There should be no
downside to enabling vmemmap reduction as moving from PMD to PTE mapping
happens 'on demand' as hugetlb pages are added to the pool.

I just want to clarify something for myself and possibly other
reviewers.   At hugetlb page allocation time, we move to PTE mappings.
When hugetlb pages are freed from the pool we do not attempt to coalesce
and move back to a PMD mapping.  Correct?  I am not suggesting we do
this and I suspect it is much more complex.  Just want to make sure I
understand the functionality of this series.

BTW - Just before you sent this series I had worked up a version of
hugetlb page demote [2] with vmemmap optimizations.  That code will need
to be reworked.  However, if we never coalesce and move back to PMD
mappings it might make that effort easier.

[2] https://lore.kernel.org/linux-mm/20210309001855.142453-1-mike.kravetz@oracle.com/
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables
  2021-06-09 12:13 ` [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables Muchun Song
@ 2021-06-10 21:49   ` Mike Kravetz
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Kravetz @ 2021-06-10 21:49 UTC (permalink / raw)
  To: Muchun Song, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm

On 6/9/21 5:13 AM, Muchun Song wrote:
> On some architectures (e.g. x86_64 and arm64), vmemmap pages are usually
> mapped with huge pmd. We will disable the huge pmd mapping of vmemmap
> pages when the feature of "Free vmemmap pages of HugeTLB page" is enabled.
> This can affect the non-HugeTLB pages. What we want is only mapping the
> vmemmap pages associated with HugeTLB pages with base page. We can split
> the huge pmd mapping of vmemmap pages when freeing vmemmap pages of
> HugeTLB page. But we need to preallocate page tables. In this patch, we
> introduce page table allocation/freeing helpers.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb_vmemmap.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/hugetlb_vmemmap.h | 12 ++++++++++++
>  2 files changed, 66 insertions(+)

These helper routines are pretty straight forward.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator
  2021-06-09 12:13 ` [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator Muchun Song
@ 2021-06-10 22:13   ` Mike Kravetz
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Kravetz @ 2021-06-10 22:13 UTC (permalink / raw)
  To: Muchun Song, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm

On 6/9/21 5:13 AM, Muchun Song wrote:
> If we want to split the huge PMD of vmemmap pages associated with each
> gigantic page allocated from bootmem allocator, we should pre-allocate
> the page tables from bootmem allocator.

Just curious why this is necessary and a good idea?  Why not wait until
the gigantic pages allocated from bootmem are added to the pool to
allocate any necessary vmemmap pages?

> the page tables from bootmem allocator. In this patch, we introduce
> some helpers to preallocate page tables for gigantic pages.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/hugetlb.h |  3 +++
>  mm/hugetlb_vmemmap.c    | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/hugetlb_vmemmap.h    | 13 ++++++++++
>  3 files changed, 79 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 03ca83db0a3e..c27a299c4211 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -622,6 +622,9 @@ struct hstate {
>  struct huge_bootmem_page {
>  	struct list_head list;
>  	struct hstate *hstate;
> +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> +	pte_t *vmemmap_pte;
> +#endif
>  };
>  
>  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 628e2752714f..6f3a47b4ebd3 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -171,6 +171,7 @@
>  #define pr_fmt(fmt)	"HugeTLB: " fmt
>  
>  #include <linux/list.h>
> +#include <linux/memblock.h>
>  #include <asm/pgalloc.h>
>  
>  #include "hugetlb_vmemmap.h"
> @@ -263,6 +264,68 @@ int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables)
>  	return -ENOMEM;
>  }
>  
> +unsigned long __init gigantic_vmemmap_pgtable_prealloc(void)
> +{
> +	struct huge_bootmem_page *m, *tmp;
> +	unsigned long nr_free = 0;
> +
> +	list_for_each_entry_safe(m, tmp, &huge_boot_pages, list) {
> +		struct hstate *h = m->hstate;
> +		unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> +		unsigned long size;
> +
> +		if (!nr)
> +			continue;
> +
> +		size = nr << PAGE_SHIFT;
> +		m->vmemmap_pte = memblock_alloc_try_nid(size, PAGE_SIZE, 0,
> +							MEMBLOCK_ALLOC_ACCESSIBLE,
> +							NUMA_NO_NODE);
> +		if (!m->vmemmap_pte) {
> +			nr_free++;
> +			list_del(&m->list);
> +			memblock_free_early(__pa(m), huge_page_size(h));

If we cannot allocate the vmemmap pages to split the PMD, then we will
not add the huge page to the pool.  Correct?

Perhaps I am thinking about this incorrectly, but this seems wrong.  We
already have everything we need to add the page to the pool.  vmemmap
reduction is an optimization.  So, the allocation failure is associated
with an optimization.  In this case, it seems like we should just skip
the optimization (vmemmap reduction) and proceed to add the page to the
pool?  It seems we do the same thing in subsequent patches.

Again, I could be thinking about this incorrectly.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-09 12:13 ` [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages Muchun Song
@ 2021-06-10 22:35   ` Mike Kravetz
  2021-06-11  7:52     ` [External] " Muchun Song
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2021-06-10 22:35 UTC (permalink / raw)
  To: Muchun Song, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm

On 6/9/21 5:13 AM, Muchun Song wrote:
> If the vmemmap is huge PMD mapped, we should split the huge PMD firstly
> and then we can change the PTE page table entry. In this patch, we add
> the ability of splitting the huge PMD mapping of vmemmap pages.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/mm.h   |  2 +-
>  mm/hugetlb.c         | 42 ++++++++++++++++++++++++++++++++++--
>  mm/hugetlb_vmemmap.c |  3 ++-
>  mm/sparse-vmemmap.c  | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
>  4 files changed, 96 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cadc8cc2c715..b97e1486c5c1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3056,7 +3056,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
>  #endif
>  
>  void vmemmap_remap_free(unsigned long start, unsigned long end,
> -			unsigned long reuse);
> +			unsigned long reuse, struct list_head *pgtables);
>  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
>  			unsigned long reuse, gfp_t gfp_mask);
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c3b2a8a494d6..3137c72d9cc7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1609,6 +1609,13 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
>  static void __prep_new_huge_page(struct hstate *h, struct page *page)
>  {
>  	free_huge_page_vmemmap(h, page);
> +	/*
> +	 * Because we store preallocated pages on @page->lru,
> +	 * vmemmap_pgtable_free() must be called before the
> +	 * initialization of @page->lru in INIT_LIST_HEAD().
> +	 */
> +	vmemmap_pgtable_free(&page->lru);
> +
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>  	hugetlb_set_page_subpool(page, NULL);
> @@ -1775,14 +1782,29 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
>  		nodemask_t *node_alloc_noretry)
>  {
>  	struct page *page;
> +	LIST_HEAD(pgtables);
> +
> +	if (vmemmap_pgtable_prealloc(h, &pgtables))
> +		return NULL;

In the previous two patches I asked:
- Can we wait until later to prealloc vmemmap pages for gigantic pages
  allocated from bootmem?
- Should we fail to add a hugetlb page to the pool if we can not do
  vmemmap optimization?


Depending on the answers to those questions, we may be able to eliminate
these vmemmap_pgtable_prealloc/vmemmap_pgtable_free calls in hugetlb.c.
What about adding the calls to free_huge_page_vmemmap?
At the beginning of free_huge_page_vmemmap, allocate any vmemmap pgtable
pages.  If it fails, skip optimization.  We can free any pages before
returning to the caller.

Since we also know the page/address in the page table can we check to see
if it is already PTE mapped.  If so, can we then skip allocation?
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm: sparsemem: use huge PMD mapping for vmemmap pages
  2021-06-09 12:13 ` [PATCH 4/5] mm: sparsemem: use huge PMD mapping for " Muchun Song
@ 2021-06-10 22:49   ` Mike Kravetz
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Kravetz @ 2021-06-10 22:49 UTC (permalink / raw)
  To: Muchun Song, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet
  Cc: duanxiongchun, fam.zheng, zhengqi.arch, linux-doc, linux-kernel,
	linux-mm

On 6/9/21 5:13 AM, Muchun Song wrote:
> The preparation of splitting huge PMD mapping of vmemmap pages is ready,
> so switch the mapping from PTE to PMD.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  7 -------
>  arch/x86/mm/init_64.c                           |  8 ++------
>  include/linux/hugetlb.h                         | 25 ++++++-------------------
>  mm/memory_hotplug.c                             |  2 +-
>  4 files changed, 9 insertions(+), 33 deletions(-)

This pretty much removes all the code previously added to disable PMD
mapping if vmemmap optimizations were requested.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 0/5] Split huge PMD mapping of vmemmap pages
  2021-06-10 21:32 ` [PATCH 0/5] Split huge PMD mapping of vmemmap pages Mike Kravetz
@ 2021-06-11  3:23   ` Muchun Song
  0 siblings, 0 replies; 14+ messages in thread
From: Muchun Song @ 2021-06-11  3:23 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Oscar Salvador, Michal Hocko,
	Song Bao Hua (Barry Song),
	David Hildenbrand, Chen Huang, Bodeddula, Balasubramaniam,
	Jonathan Corbet, Xiongchun duan, fam.zheng, zhengqi.arch,
	linux-doc, LKML, Linux Memory Management List

On Fri, Jun 11, 2021 at 5:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 6/9/21 5:13 AM, Muchun Song wrote:
> > In order to reduce the difficulty of code review in series[1]. We disable
> > huge PMD mapping of vmemmap pages when that feature is enabled. In this
> > series, we do not disable huge PMD mapping of vmemmap pages anymore. We
> > will split huge PMD mapping when needed.
>
> Thank you Muchun!
>
> Adding this functionality should reduce the decisions a sys admin needs
> to make WRT vmemmap reduction for hugetlb pages.  There should be no
> downside to enabling vmemmap reduction as moving from PMD to PTE mapping
> happens 'on demand' as hugetlb pages are added to the pool.

Agree.

>
> I just want to clarify something for myself and possibly other
> reviewers.   At hugetlb page allocation time, we move to PTE mappings.
> When hugetlb pages are freed from the pool we do not attempt to coalesce
> and move back to a PMD mapping.  Correct?  I am not suggesting we do
> this and I suspect it is much more complex.  Just want to make sure I
> understand the functionality of this series.

Totally right. Coalescing is very complex. So I do not do this in this
series.

>
> BTW - Just before you sent this series I had worked up a version of
> hugetlb page demote [2] with vmemmap optimizations.  That code will need
> to be reworked.  However, if we never coalesce and move back to PMD
> mappings it might make that effort easier.
>
> [2] https://lore.kernel.org/linux-mm/20210309001855.142453-1-mike.kravetz@oracle.com/

I've not looked at this deeply. I will go take a look.

Thanks Mike.

> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-10 22:35   ` Mike Kravetz
@ 2021-06-11  7:52     ` Muchun Song
  2021-06-11 12:35       ` Muchun Song
  0 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2021-06-11  7:52 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Oscar Salvador, Michal Hocko,
	Song Bao Hua (Barry Song),
	David Hildenbrand, Chen Huang, Bodeddula, Balasubramaniam,
	Jonathan Corbet, Xiongchun duan, fam.zheng, zhengqi.arch,
	linux-doc, LKML, Linux Memory Management List

On Fri, Jun 11, 2021 at 6:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 6/9/21 5:13 AM, Muchun Song wrote:
> > If the vmemmap is huge PMD mapped, we should split the huge PMD firstly
> > and then we can change the PTE page table entry. In this patch, we add
> > the ability of splitting the huge PMD mapping of vmemmap pages.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/mm.h   |  2 +-
> >  mm/hugetlb.c         | 42 ++++++++++++++++++++++++++++++++++--
> >  mm/hugetlb_vmemmap.c |  3 ++-
> >  mm/sparse-vmemmap.c  | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
> >  4 files changed, 96 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index cadc8cc2c715..b97e1486c5c1 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3056,7 +3056,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
> >  #endif
> >
> >  void vmemmap_remap_free(unsigned long start, unsigned long end,
> > -                     unsigned long reuse);
> > +                     unsigned long reuse, struct list_head *pgtables);
> >  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> >                       unsigned long reuse, gfp_t gfp_mask);
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c3b2a8a494d6..3137c72d9cc7 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1609,6 +1609,13 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
> >  static void __prep_new_huge_page(struct hstate *h, struct page *page)
> >  {
> >       free_huge_page_vmemmap(h, page);
> > +     /*
> > +      * Because we store preallocated pages on @page->lru,
> > +      * vmemmap_pgtable_free() must be called before the
> > +      * initialization of @page->lru in INIT_LIST_HEAD().
> > +      */
> > +     vmemmap_pgtable_free(&page->lru);
> > +
> >       INIT_LIST_HEAD(&page->lru);
> >       set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> >       hugetlb_set_page_subpool(page, NULL);
> > @@ -1775,14 +1782,29 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
> >               nodemask_t *node_alloc_noretry)
> >  {
> >       struct page *page;
> > +     LIST_HEAD(pgtables);
> > +
> > +     if (vmemmap_pgtable_prealloc(h, &pgtables))
> > +             return NULL;
>
> In the previous two patches I asked:
> - Can we wait until later to prealloc vmemmap pages for gigantic pages
>   allocated from bootmem?
> - Should we fail to add a hugetlb page to the pool if we can not do
>   vmemmap optimization?
>
>
> Depending on the answers to those questions, we may be able to eliminate
> these vmemmap_pgtable_prealloc/vmemmap_pgtable_free calls in hugetlb.c.
> What about adding the calls to free_huge_page_vmemmap?
> At the beginning of free_huge_page_vmemmap, allocate any vmemmap pgtable
> pages.  If it fails, skip optimization.  We can free any pages before
> returning to the caller.

You are right; because we've introduced the HPageVmemmapOptimized flag,
it can be useful here. If failing to optimize the vmemmap is allowed, we
can eliminate the page table allocating/freeing helpers. Thanks for your
reminder.

>
> Since we also know the page/address in the page table can we check to see
> if it is already PTE mapped.  If so, can we then skip allocation?

Good point. We need to allocate 512 page tables when splitting a
1 GB huge page. If we fail to allocate page tables in the middle of
the remapping, we should restore the previous mapping. I just want
to clarify something for myself.

Thanks, Mike. I'll try in the next version.


> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages
  2021-06-11  7:52     ` [External] " Muchun Song
@ 2021-06-11 12:35       ` Muchun Song
  0 siblings, 0 replies; 14+ messages in thread
From: Muchun Song @ 2021-06-11 12:35 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Oscar Salvador, Michal Hocko,
	Song Bao Hua (Barry Song),
	David Hildenbrand, Chen Huang, Bodeddula, Balasubramaniam,
	Jonathan Corbet, Xiongchun duan, fam.zheng, zhengqi.arch,
	linux-doc, LKML, Linux Memory Management List

On Fri, Jun 11, 2021 at 3:52 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Fri, Jun 11, 2021 at 6:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 6/9/21 5:13 AM, Muchun Song wrote:
> > > If the vmemmap is huge PMD mapped, we should split the huge PMD firstly
> > > and then we can change the PTE page table entry. In this patch, we add
> > > the ability of splitting the huge PMD mapping of vmemmap pages.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  include/linux/mm.h   |  2 +-
> > >  mm/hugetlb.c         | 42 ++++++++++++++++++++++++++++++++++--
> > >  mm/hugetlb_vmemmap.c |  3 ++-
> > >  mm/sparse-vmemmap.c  | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
> > >  4 files changed, 96 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index cadc8cc2c715..b97e1486c5c1 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -3056,7 +3056,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
> > >  #endif
> > >
> > >  void vmemmap_remap_free(unsigned long start, unsigned long end,
> > > -                     unsigned long reuse);
> > > +                     unsigned long reuse, struct list_head *pgtables);
> > >  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> > >                       unsigned long reuse, gfp_t gfp_mask);
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index c3b2a8a494d6..3137c72d9cc7 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1609,6 +1609,13 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
> > >  static void __prep_new_huge_page(struct hstate *h, struct page *page)
> > >  {
> > >       free_huge_page_vmemmap(h, page);
> > > +     /*
> > > +      * Because we store preallocated pages on @page->lru,
> > > +      * vmemmap_pgtable_free() must be called before the
> > > +      * initialization of @page->lru in INIT_LIST_HEAD().
> > > +      */
> > > +     vmemmap_pgtable_free(&page->lru);
> > > +
> > >       INIT_LIST_HEAD(&page->lru);
> > >       set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> > >       hugetlb_set_page_subpool(page, NULL);
> > > @@ -1775,14 +1782,29 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
> > >               nodemask_t *node_alloc_noretry)
> > >  {
> > >       struct page *page;
> > > +     LIST_HEAD(pgtables);
> > > +
> > > +     if (vmemmap_pgtable_prealloc(h, &pgtables))
> > > +             return NULL;
> >
> > In the previous two patches I asked:
> > - Can we wait until later to prealloc vmemmap pages for gigantic pages
> >   allocated from bootmem?
> > - Should we fail to add a hugetlb page to the pool if we can not do
> >   vmemmap optimization?
> >
> >
> > Depending on the answers to those questions, we may be able to eliminate
> > these vmemmap_pgtable_prealloc/vmemmap_pgtable_free calls in hugetlb.c.
> > What about adding the calls to free_huge_page_vmemmap?
> > At the beginning of free_huge_page_vmemmap, allocate any vmemmap pgtable
> > pages.  If it fails, skip optimization.  We can free any pages before
> > returning to the caller.
>
> You are right because we've introduced HPageVmemmapOptimized flag.
> It can be useful here. If failing to optimize vmemmap is allowed, we can
> eliminate allocating/freeing page table helpers. Thanks for your reminder.
>
> >
> > Since we also know the page/address in the page table can we check to see
> > if it is already PTE mapped.  If so, can we then skip allocation?
>
> Good point. We need to allocate 512 page tables when splitting

Sorry, it is 7 page tables here.

> 1 GB huge page. If we fail to allocate page tables in the middle
> of processing of remapping, we should restore the previous
> mapping. I just want to clarify something for myself.
>
> Thanks, Mike. I'll try in the next version.
>
>
> > --
> > Mike Kravetz

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14+ messages
2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
2021-06-09 12:13 ` [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables Muchun Song
2021-06-10 21:49   ` Mike Kravetz
2021-06-09 12:13 ` [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator Muchun Song
2021-06-10 22:13   ` Mike Kravetz
2021-06-09 12:13 ` [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages Muchun Song
2021-06-10 22:35   ` Mike Kravetz
2021-06-11  7:52     ` [External] " Muchun Song
2021-06-11 12:35       ` Muchun Song
2021-06-09 12:13 ` [PATCH 4/5] mm: sparsemem: use huge PMD mapping for " Muchun Song
2021-06-10 22:49   ` Mike Kravetz
2021-06-09 12:13 ` [PATCH 5/5] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
2021-06-10 21:32 ` [PATCH 0/5] Split huge PMD mapping of vmemmap pages Mike Kravetz
2021-06-11  3:23   ` [External] " Muchun Song
