* [v3 0/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
@ 2023-08-25 11:18 Usama Arif
  2023-08-25 11:18 ` [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Usama Arif @ 2023-08-25 11:18 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, songmuchun, fam.zheng, liangma, punit.agrawal, Usama Arif

This series moves the boot time initialization of tail struct pages of a
gigantic page to after HVO is attempted. If HVO is successful, only
HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) struct pages need to
be initialized. For a 1G hugepage, this series avoids initialization of
262144 - 63 = 262081 struct pages per hugepage.

When tested on a 512G system (which can allocate a maximum of 500 1G
hugepages), the kexec-boot time to running init, with HVO and
DEFERRED_STRUCT_PAGE_INIT enabled, is 3.9 seconds without this patch
series and 1.2 seconds with it.
This represents an approximately 70% reduction in boot time and will
significantly reduce server downtime when using a large number of
gigantic pages.

Thanks,
Usama

[v2->v3]:
- (Muchun Song) skip prep of struct pages backing gigantic hugepages
at boot time only.
- (Muchun Song) move initialization of tail struct pages to after
HVO is attempted. This also means that the hugetlb command line parsing
does not need to be changed.

[v1->v2]:
- (Mike Rapoport) Code quality improvements (function names, arguments,
comments).

[RFC->v1]:
- (Mike Rapoport) Change from passing hugepage_size in
memblock_alloc_try_nid_raw for skipping struct page initialization to
using MEMBLOCK_RSRV_NOINIT flag

Usama Arif (4):
  mm: hugetlb_vmemmap: Use nid of the head page to reallocate it
  memblock: pass memblock_type to memblock_setclr_flag
  memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  mm: hugetlb: Skip initialization of gigantic tail struct pages if
    freed by HVO

 include/linux/memblock.h | 10 ++++++++
 mm/hugetlb.c             | 52 ++++++++++++++++++++++++++++++++--------
 mm/hugetlb_vmemmap.c     |  2 +-
 mm/hugetlb_vmemmap.h     |  8 +++----
 mm/internal.h            |  3 +++
 mm/memblock.c            | 47 ++++++++++++++++++++++++++----------
 mm/mm_init.c             |  2 +-
 7 files changed, 95 insertions(+), 29 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it
  2023-08-25 11:18 [v3 0/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
@ 2023-08-25 11:18 ` Usama Arif
  2023-08-28  7:15   ` Muchun Song
  2023-08-25 11:18 ` [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Usama Arif @ 2023-08-25 11:18 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, songmuchun, fam.zheng, liangma, punit.agrawal, Usama Arif

If tail page prep and initialization is skipped, then the "start"
page will not contain the correct nid. Use the nid from the first
vmemmap page.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..208907f2c5e1 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -324,7 +324,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 		.reuse_addr	= reuse,
 		.vmemmap_pages	= &vmemmap_pages,
 	};
-	int nid = page_to_nid((struct page *)start);
+	int nid = page_to_nid((struct page *)reuse);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY |
 			__GFP_NOWARN;
 
-- 
2.25.1



* [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag
  2023-08-25 11:18 [v3 0/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
  2023-08-25 11:18 ` [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
@ 2023-08-25 11:18 ` Usama Arif
  2023-08-28  7:16   ` Muchun Song
                     ` (2 more replies)
  2023-08-25 11:18 ` [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag Usama Arif
  2023-08-25 11:18 ` [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
  3 siblings, 3 replies; 25+ messages in thread
From: Usama Arif @ 2023-08-25 11:18 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, songmuchun, fam.zheng, liangma, punit.agrawal, Usama Arif

This allows setting flags on both memblock types and is in preparation
for setting flags (e.g. to not initialize struct pages) on reserved
memory regions.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/memblock.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index f9e61e565a53..43cb4404d94c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -896,10 +896,9 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
  *
  * Return: 0 on success, -errno on failure.
  */
-static int __init_memblock memblock_setclr_flag(phys_addr_t base,
-				phys_addr_t size, int set, int flag)
+static int __init_memblock memblock_setclr_flag(struct memblock_type *type,
+				phys_addr_t base, phys_addr_t size, int set, int flag)
 {
-	struct memblock_type *type = &memblock.memory;
 	int i, ret, start_rgn, end_rgn;
 
 	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
@@ -928,7 +927,7 @@ static int __init_memblock memblock_setclr_flag(phys_addr_t base,
  */
 int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_setclr_flag(base, size, 1, MEMBLOCK_HOTPLUG);
+	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_HOTPLUG);
 }
 
 /**
@@ -940,7 +939,7 @@ int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
  */
 int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_setclr_flag(base, size, 0, MEMBLOCK_HOTPLUG);
+	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_HOTPLUG);
 }
 
 /**
@@ -957,7 +956,7 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
 
 	system_has_some_mirror = true;
 
-	return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
+	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_MIRROR);
 }
 
 /**
@@ -977,7 +976,7 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
  */
 int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
+	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_NOMAP);
 }
 
 /**
@@ -989,7 +988,7 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
  */
 int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
+	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
 }
 
 static bool should_skip_region(struct memblock_type *type,
-- 
2.25.1



* [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  2023-08-25 11:18 [v3 0/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
  2023-08-25 11:18 ` [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
  2023-08-25 11:18 ` [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
@ 2023-08-25 11:18 ` Usama Arif
  2023-08-28  7:26   ` Muchun Song
  2023-08-28  7:47   ` Mike Rapoport
  2023-08-25 11:18 ` [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
  3 siblings, 2 replies; 25+ messages in thread
From: Usama Arif @ 2023-08-25 11:18 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, songmuchun, fam.zheng, liangma, punit.agrawal, Usama Arif

For reserved memory regions marked with this flag,
reserve_bootmem_region is not called during memmap_init_reserved_pages.
This can be used to avoid initializing struct pages for regions that
will not need them, e.g. hugepages with HVO enabled.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 include/linux/memblock.h | 10 ++++++++++
 mm/memblock.c            | 32 +++++++++++++++++++++++++++-----
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f71ff9f0ec81..6d681d053880 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -40,6 +40,8 @@ extern unsigned long long max_possible_pfn;
  * via a driver, and never indicated in the firmware-provided memory map as
  * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
  * kernel resource tree.
+ * @MEMBLOCK_RSRV_NOINIT_VMEMMAP: memory region for which struct pages are
+ * not initialized (only for reserved regions).
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
@@ -47,6 +49,8 @@ enum memblock_flags {
 	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
 	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
 	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
+	/* don't initialize struct pages associated with this reserved memory block */
+	MEMBLOCK_RSRV_NOINIT_VMEMMAP	= 0x10,
 };
 
 /**
@@ -125,6 +129,7 @@ int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
 int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
+int memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size);
 
 void memblock_free_all(void);
 void memblock_free(void *ptr, size_t size);
@@ -259,6 +264,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
 	return m->flags & MEMBLOCK_NOMAP;
 }
 
+static inline bool memblock_is_noinit_vmemmap(struct memblock_region *m)
+{
+	return m->flags & MEMBLOCK_RSRV_NOINIT_VMEMMAP;
+}
+
 static inline bool memblock_is_driver_managed(struct memblock_region *m)
 {
 	return m->flags & MEMBLOCK_DRIVER_MANAGED;
diff --git a/mm/memblock.c b/mm/memblock.c
index 43cb4404d94c..a9782228c840 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -991,6 +991,23 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
 	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
 }
 
+/**
+ * memblock_reserved_mark_noinit_vmemmap - Mark a reserved memory region with flag
+ * MEMBLOCK_RSRV_NOINIT_VMEMMAP.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * struct pages will not be initialized for reserved memory regions marked with
+ * %MEMBLOCK_RSRV_NOINIT_VMEMMAP.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(&memblock.reserved, base, size, 1,
+				    MEMBLOCK_RSRV_NOINIT_VMEMMAP);
+}
+
 static bool should_skip_region(struct memblock_type *type,
 			       struct memblock_region *m,
 			       int nid, int flags)
@@ -2107,13 +2124,18 @@ static void __init memmap_init_reserved_pages(void)
 		memblock_set_node(start, end, &memblock.reserved, nid);
 	}
 
-	/* initialize struct pages for the reserved regions */
+	/*
+	 * initialize struct pages for reserved regions that don't have
+	 * the MEMBLOCK_RSRV_NOINIT_VMEMMAP flag set
+	 */
 	for_each_reserved_mem_region(region) {
-		nid = memblock_get_region_node(region);
-		start = region->base;
-		end = start + region->size;
+		if (!memblock_is_noinit_vmemmap(region)) {
+			nid = memblock_get_region_node(region);
+			start = region->base;
+			end = start + region->size;
 
-		reserve_bootmem_region(start, end, nid);
+			reserve_bootmem_region(start, end, nid);
+		}
 	}
 }
 
-- 
2.25.1



* [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-25 11:18 [v3 0/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
                   ` (2 preceding siblings ...)
  2023-08-25 11:18 ` [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag Usama Arif
@ 2023-08-25 11:18 ` Usama Arif
  2023-08-28 11:33   ` Muchun Song
  3 siblings, 1 reply; 25+ messages in thread
From: Usama Arif @ 2023-08-25 11:18 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, songmuchun, fam.zheng, liangma, punit.agrawal, Usama Arif

The new boot flow when it comes to initialization of gigantic pages
is as follows:
- At boot time, for a gigantic page during __alloc_bootmem_huge_page,
the region after the first struct page is marked as noinit.
- This results in only the first struct page being initialized in
reserve_bootmem_region. As the tail struct pages are not initialized
at this point, there can be a significant saving in boot time if HVO
succeeds later on.
- Later on in the boot, HVO is attempted. If it is successful, only the
first HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct
pages after the head struct page are initialized. If it is not
successful, then all of the tail struct pages are initialized.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/hugetlb.c         | 52 +++++++++++++++++++++++++++++++++++---------
 mm/hugetlb_vmemmap.h |  8 +++----
 mm/internal.h        |  3 +++
 mm/mm_init.c         |  2 +-
 4 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6da626bfb52e..964f7a2b693e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1953,7 +1953,6 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 
 static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
-	hugetlb_vmemmap_optimize(h, &folio->page);
 	INIT_LIST_HEAD(&folio->lru);
 	folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
 	hugetlb_set_folio_subpool(folio, NULL);
@@ -2225,6 +2224,7 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 			return NULL;
 		}
 	}
+	hugetlb_vmemmap_optimize(h, &folio->page);
 	prep_new_hugetlb_folio(h, folio, folio_nid(folio));
 
 	return folio;
@@ -2943,6 +2943,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
 	new_folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid, NULL, NULL);
 	if (!new_folio)
 		return -ENOMEM;
+	hugetlb_vmemmap_optimize(h, &new_folio->page);
 	__prep_new_hugetlb_folio(h, new_folio);
 
 retry:
@@ -3206,6 +3207,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	}
 
 found:
+
+	/*
+	 * Only initialize the head struct page in memmap_init_reserved_pages,
+	 * rest of the struct pages will be initialized by the HugeTLB subsystem itself.
+	 * The head struct page is used to get folio information by the HugeTLB
+	 * subsystem like zone id and node id.
+	 */
+	memblock_reserved_mark_noinit_vmemmap(virt_to_phys((void *)m + PAGE_SIZE),
+		huge_page_size(h) - PAGE_SIZE);
 	/* Put them into a private list first because mem_map is not up yet */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages);
@@ -3213,6 +3223,27 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	return 1;
 }
 
+static void __init hugetlb_folio_init_vmemmap(struct hstate *h, struct folio *folio,
+					       unsigned long nr_pages)
+{
+	enum zone_type zone = zone_idx(folio_zone(folio));
+	int nid = folio_nid(folio);
+	unsigned long head_pfn = folio_pfn(folio);
+	unsigned long pfn, end_pfn = head_pfn + nr_pages;
+
+	__folio_clear_reserved(folio);
+	__folio_set_head(folio);
+
+	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		__init_single_page(page, pfn, zone, nid);
+		prep_compound_tail((struct page *)folio, pfn - head_pfn);
+		set_page_count(page, 0);
+	}
+	prep_compound_head((struct page *)folio, huge_page_order(h));
+}
+
 /*
  * Put bootmem huge pages into the standard lists after mem_map is up.
  * Note: This only applies to gigantic (order > MAX_ORDER) pages.
@@ -3223,19 +3254,19 @@ static void __init gather_bootmem_prealloc(void)
 
 	list_for_each_entry(m, &huge_boot_pages, list) {
 		struct page *page = virt_to_page(m);
-		struct folio *folio = page_folio(page);
+		struct folio *folio = (void *)page;
 		struct hstate *h = m->hstate;
+		unsigned long nr_pages = pages_per_huge_page(h);
 
 		VM_BUG_ON(!hstate_is_gigantic(h));
 		WARN_ON(folio_ref_count(folio) != 1);
-		if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
-			WARN_ON(folio_test_reserved(folio));
-			prep_new_hugetlb_folio(h, folio, folio_nid(folio));
-			free_huge_page(page); /* add to the hugepage allocator */
-		} else {
-			/* VERY unlikely inflated ref count on a tail page */
-			free_gigantic_folio(folio, huge_page_order(h));
-		}
+
+		hugetlb_vmemmap_optimize(h, &folio->page);
+		if (HPageVmemmapOptimized(&folio->page))
+			nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
+		hugetlb_folio_init_vmemmap(h, folio, nr_pages);
+		prep_new_hugetlb_folio(h, folio, folio_nid(folio));
+		free_huge_page(page); /* add to the hugepage allocator */
 
 		/*
 		 * We need to restore the 'stolen' pages to totalram_pages
@@ -3656,6 +3687,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 		else
 			prep_compound_page(subpage, target_hstate->order);
 		folio_change_private(inner_folio, NULL);
+		hugetlb_vmemmap_optimize(h, &folio->page);
 		prep_new_hugetlb_folio(target_hstate, inner_folio, nid);
 		free_huge_page(subpage);
 	}
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 25bd0e002431..d30aff8f3573 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -10,16 +10,16 @@
 #define _LINUX_HUGETLB_VMEMMAP_H
 #include <linux/hugetlb.h>
 
-#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
-int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
-void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
-
 /*
  * Reserve one vmemmap page, all vmemmap addresses are mapped to it. See
  * Documentation/vm/vmemmap_dedup.rst.
  */
 #define HUGETLB_VMEMMAP_RESERVE_SIZE	PAGE_SIZE
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
+void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
+
 static inline unsigned int hugetlb_vmemmap_size(const struct hstate *h)
 {
 	return pages_per_huge_page(h) * sizeof(struct page);
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..31b3d45f4609 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1102,4 +1102,7 @@ struct vma_prepare {
 	struct vm_area_struct *remove;
 	struct vm_area_struct *remove2;
 };
+
+void __meminit __init_single_page(struct page *page, unsigned long pfn,
+				unsigned long zone, int nid);
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a1963c3322af..3d4ab595ca7d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -551,7 +551,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	node_states[N_MEMORY] = saved_node_state;
 }
 
-static void __meminit __init_single_page(struct page *page, unsigned long pfn,
+void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
 	mm_zero_struct_page(page);
-- 
2.25.1



* Re: [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it
  2023-08-25 11:18 ` [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
@ 2023-08-28  7:15   ` Muchun Song
  2023-08-28 18:25     ` Mike Kravetz
  0 siblings, 1 reply; 25+ messages in thread
From: Muchun Song @ 2023-08-28  7:15 UTC (permalink / raw)
  To: Usama Arif
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal



> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> 
> If tail page prep and initialization is skipped, then the "start"
> page will not contain the correct nid. Use the nid from the first
> vmemmap page.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>

Makes sense even without your optimization. Because the nid is used
for the allocation of the "reuse" page, it really should be extracted
from the "reuse" page instead of the "start" page.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>



* Re: [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag
  2023-08-25 11:18 ` [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
@ 2023-08-28  7:16   ` Muchun Song
  2023-08-28  7:37   ` Mike Rapoport
  2023-08-28 18:39   ` Mike Kravetz
  2 siblings, 0 replies; 25+ messages in thread
From: Muchun Song @ 2023-08-28  7:16 UTC (permalink / raw)
  To: Usama Arif
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, linux-kernel, Muchun Song,
	fam.zheng, liangma, punit.agrawal



> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> 
> This allows setting flags to both memblock types and is in preparation for
> setting flags (for e.g. to not initialize struct pages) on reserved
> memory region.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>




* Re: [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  2023-08-25 11:18 ` [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag Usama Arif
@ 2023-08-28  7:26   ` Muchun Song
  2023-08-28  7:47   ` Mike Rapoport
  1 sibling, 0 replies; 25+ messages in thread
From: Muchun Song @ 2023-08-28  7:26 UTC (permalink / raw)
  To: Usama Arif
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, linux-kernel, Muchun Song,
	fam.zheng, liangma, punit.agrawal



> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> 
> For reserved memory regions marked with this flag,
> reserve_bootmem_region is not called during memmap_init_reserved_pages.
> This can be used to avoid struct page initialization for
> regions which won't need them, for e.g. hugepages with
> HVO enabled.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

One nit below.

> ---
> include/linux/memblock.h | 10 ++++++++++
> mm/memblock.c            | 32 +++++++++++++++++++++++++++-----
> 2 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index f71ff9f0ec81..6d681d053880 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -40,6 +40,8 @@ extern unsigned long long max_possible_pfn;
>  * via a driver, and never indicated in the firmware-provided memory map as
>  * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
>  * kernel resource tree.
> + * @MEMBLOCK_RSRV_NOINIT_VMEMMAP: memory region for which struct pages are
> + * not initialized (only for reserved regions).

We already have a more detailed explanation here.

>  */
> enum memblock_flags {
> 	MEMBLOCK_NONE = 0x0, /* No special request */
> @@ -47,6 +49,8 @@ enum memblock_flags {
> 	MEMBLOCK_MIRROR = 0x2, /* mirrored region */
> 	MEMBLOCK_NOMAP = 0x4, /* don't add to kernel direct mapping */
> 	MEMBLOCK_DRIVER_MANAGED = 0x8, /* always detected via a driver */
> +	/* don't initialize struct pages associated with this reserver memory block */

The comments right after the flags here are meant to be brief. To keep
consistent with the others, maybe "don't initialize struct pages" is
enough? At least, a detailed comment here is redundant and repetitive
compared with the kernel-doc one above.

> +	MEMBLOCK_RSRV_NOINIT_VMEMMAP = 0x10,
> };



* Re: [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag
  2023-08-25 11:18 ` [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
  2023-08-28  7:16   ` Muchun Song
@ 2023-08-28  7:37   ` Mike Rapoport
  2023-08-28 18:39   ` Mike Kravetz
  2 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2023-08-28  7:37 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-mm, muchun.song, mike.kravetz, linux-kernel, songmuchun,
	fam.zheng, liangma, punit.agrawal

On Fri, Aug 25, 2023 at 12:18:34PM +0100, Usama Arif wrote:
> This allows setting flags to both memblock types and is in preparation for
> setting flags (for e.g. to not initialize struct pages) on reserved
> memory region.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  mm/memblock.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index f9e61e565a53..43cb4404d94c 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -896,10 +896,9 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
>   *
>   * Return: 0 on success, -errno on failure.
>   */
> -static int __init_memblock memblock_setclr_flag(phys_addr_t base,
> -				phys_addr_t size, int set, int flag)
> +static int __init_memblock memblock_setclr_flag(struct memblock_type *type,
> +				phys_addr_t base, phys_addr_t size, int set, int flag)
>  {
> -	struct memblock_type *type = &memblock.memory;
>  	int i, ret, start_rgn, end_rgn;
>  
>  	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
> @@ -928,7 +927,7 @@ static int __init_memblock memblock_setclr_flag(phys_addr_t base,
>   */
>  int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
>  {
> -	return memblock_setclr_flag(base, size, 1, MEMBLOCK_HOTPLUG);
> +	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_HOTPLUG);
>  }
>  
>  /**
> @@ -940,7 +939,7 @@ int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
>   */
>  int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
>  {
> -	return memblock_setclr_flag(base, size, 0, MEMBLOCK_HOTPLUG);
> +	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_HOTPLUG);
>  }
>  
>  /**
> @@ -957,7 +956,7 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>  
>  	system_has_some_mirror = true;
>  
> -	return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
> +	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_MIRROR);
>  }
>  
>  /**
> @@ -977,7 +976,7 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
>  {
> -	return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> +	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_NOMAP);
>  }
>  
>  /**
> @@ -989,7 +988,7 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
>   */
>  int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
>  {
> -	return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
> +	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
>  }
>  
>  static bool should_skip_region(struct memblock_type *type,
> -- 
> 2.25.1
> 

-- 
Sincerely yours,
Mike.


* Re: [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  2023-08-25 11:18 ` [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag Usama Arif
  2023-08-28  7:26   ` Muchun Song
@ 2023-08-28  7:47   ` Mike Rapoport
  2023-08-28  8:52     ` Muchun Song
  1 sibling, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2023-08-28  7:47 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-mm, muchun.song, mike.kravetz, linux-kernel, songmuchun,
	fam.zheng, liangma, punit.agrawal

On Fri, Aug 25, 2023 at 12:18:35PM +0100, Usama Arif wrote:
> For reserved memory regions marked with this flag,
> reserve_bootmem_region is not called during memmap_init_reserved_pages.
> This can be used to avoid struct page initialization for
> regions which won't need them, for e.g. hugepages with
> HVO enabled.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> ---
>  include/linux/memblock.h | 10 ++++++++++
>  mm/memblock.c            | 32 +++++++++++++++++++++++++++-----
>  2 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index f71ff9f0ec81..6d681d053880 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -40,6 +40,8 @@ extern unsigned long long max_possible_pfn;
>   * via a driver, and never indicated in the firmware-provided memory map as
>   * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
>   * kernel resource tree.
> + * @MEMBLOCK_RSRV_NOINIT_VMEMMAP: memory region for which struct pages are
> + * not initialized (only for reserved regions).
>   */
>  enum memblock_flags {
>  	MEMBLOCK_NONE		= 0x0,	/* No special request */
> @@ -47,6 +49,8 @@ enum memblock_flags {
>  	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
>  	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
>  	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
> +	/* don't initialize struct pages associated with this reserver memory block */
> +	MEMBLOCK_RSRV_NOINIT_VMEMMAP	= 0x10,

The flag means that struct pages shouldn't be initialized; it may be
used by more than just vmemmap optimizations.
Please drop _VMEMMAP.

And I agree with Muchun's remarks about the comments.



>  };
>  
>  /**
> @@ -125,6 +129,7 @@ int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
>  int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
>  int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
>  int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
> +int memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size);

memblock does not care about vmemmap, please drop _vmemmap here and below as well.
  
>  void memblock_free_all(void);
>  void memblock_free(void *ptr, size_t size);
> @@ -259,6 +264,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
>  	return m->flags & MEMBLOCK_NOMAP;
>  }
>  
> +static inline bool memblock_is_noinit_vmemmap(struct memblock_region *m)

memblock_is_reserved_noinit please.

> +{
> +	return m->flags & MEMBLOCK_RSRV_NOINIT_VMEMMAP;
> +}
> +
>  static inline bool memblock_is_driver_managed(struct memblock_region *m)
>  {
>  	return m->flags & MEMBLOCK_DRIVER_MANAGED;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 43cb4404d94c..a9782228c840 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -991,6 +991,23 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
>  	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
>  }
>  
> +/**
> + * memblock_reserved_mark_noinit_vmemmap - Mark a reserved memory region with flag
> + * MEMBLOCK_RSRV_NOINIT_VMEMMAP.

this should be about what marking RSRV_NOINIT does, not what flag it uses

> + * @base: the base phys addr of the region
> + * @size: the size of the region
> + *
> + * struct pages will not be initialized for reserved memory regions marked with
> + * %MEMBLOCK_RSRV_NOINIT_VMEMMAP.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +int __init_memblock memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size)
> +{
> +	return memblock_setclr_flag(&memblock.reserved, base, size, 1,
> +				    MEMBLOCK_RSRV_NOINIT_VMEMMAP);
> +}
> +
>  static bool should_skip_region(struct memblock_type *type,
>  			       struct memblock_region *m,
>  			       int nid, int flags)
> @@ -2107,13 +2124,18 @@ static void __init memmap_init_reserved_pages(void)
>  		memblock_set_node(start, end, &memblock.reserved, nid);
>  	}
>  
> -	/* initialize struct pages for the reserved regions */
> +	/*
> +	 * initialize struct pages for reserved regions that don't have
> +	 * the MEMBLOCK_RSRV_NOINIT_VMEMMAP flag set
> +	 */
>  	for_each_reserved_mem_region(region) {
> -		nid = memblock_get_region_node(region);
> -		start = region->base;
> -		end = start + region->size;
> +		if (!memblock_is_noinit_vmemmap(region)) {
> +			nid = memblock_get_region_node(region);
> +			start = region->base;
> +			end = start + region->size;
>  
> -		reserve_bootmem_region(start, end, nid);
> +			reserve_bootmem_region(start, end, nid);
> +		}
>  	}
>  }
>  
> -- 
> 2.25.1
> 

-- 
Sincerely yours,
Mike.
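If it helps to see the requested renames in one place, the API with the
_VMEMMAP suffix dropped would look roughly like the following toy model
(an illustrative sketch only: the real kernel functions operate on a
(base, size) range inside memblock.reserved, while this model flags a
single region struct directly):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reviewed memblock flags, with the suffix dropped. */
enum memblock_flags {
	MEMBLOCK_NONE		= 0x0,	/* no special request */
	MEMBLOCK_HOTPLUG	= 0x1,	/* hotpluggable region */
	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
	MEMBLOCK_DRIVER_MANAGED	= 0x8,	/* always detected via a driver */
	/* don't initialize struct pages for this reserved region */
	MEMBLOCK_RSRV_NOINIT	= 0x10,
};

struct memblock_region {
	unsigned long base;
	unsigned long size;
	unsigned long flags;
};

/* Mark a reserved region so its struct pages are left uninitialized. */
static int memblock_reserved_mark_noinit(struct memblock_region *m)
{
	m->flags |= MEMBLOCK_RSRV_NOINIT;
	return 0;
}

static bool memblock_is_reserved_noinit(const struct memblock_region *m)
{
	return m->flags & MEMBLOCK_RSRV_NOINIT;
}
```

memmap_init_reserved_pages() would then call reserve_bootmem_region() only
for regions where memblock_is_reserved_noinit() is false, as in the hunk
quoted above.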

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  2023-08-28  7:47   ` Mike Rapoport
@ 2023-08-28  8:52     ` Muchun Song
  2023-08-28  9:09       ` Mike Rapoport
  0 siblings, 1 reply; 25+ messages in thread
From: Muchun Song @ 2023-08-28  8:52 UTC (permalink / raw)
  To: Mike Rapoport, Usama Arif
  Cc: linux-mm, mike.kravetz, linux-kernel, songmuchun, fam.zheng,
	liangma, punit.agrawal



On 2023/8/28 15:47, Mike Rapoport wrote:
> On Fri, Aug 25, 2023 at 12:18:35PM +0100, Usama Arif wrote:
>> For reserved memory regions marked with this flag,
>> reserve_bootmem_region is not called during memmap_init_reserved_pages.
>> This can be used to avoid struct page initialization for
>> regions which won't need them, for e.g. hugepages with
>> HVO enabled.
>>
>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>> ---
>>   include/linux/memblock.h | 10 ++++++++++
>>   mm/memblock.c            | 32 +++++++++++++++++++++++++++-----
>>   2 files changed, 37 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f71ff9f0ec81..6d681d053880 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -40,6 +40,8 @@ extern unsigned long long max_possible_pfn;
>>    * via a driver, and never indicated in the firmware-provided memory map as
>>    * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
>>    * kernel resource tree.
>> + * @MEMBLOCK_RSRV_NOINIT_VMEMMAP: memory region for which struct pages are
>> + * not initialized (only for reserved regions).
>>    */
>>   enum memblock_flags {
>>   	MEMBLOCK_NONE		= 0x0,	/* No special request */
>> @@ -47,6 +49,8 @@ enum memblock_flags {
>>   	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
>>   	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
>>   	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
>> +	/* don't initialize struct pages associated with this reserver memory block */
>> +	MEMBLOCK_RSRV_NOINIT_VMEMMAP	= 0x10,
> The flag means that struct page shouldn't be initialized, it may be used
> not only by vmemmap optimizations.
> Please drop _VMEMMAP.

The area where the struct pages are located is vmemmap. I think the
"vmemmap" suffix does not mean that it is only for the "vmemmap
optimization"; it specifies the target that will not be initialized. For me,
MEMBLOCK_RSRV_NOINIT does not tell me what should not be initialized:
the memblock itself or its struct pages (aka vmemmap pages)? So maybe
the suffix is better kept?

>
> And I agree with Muchun's remarks about the comments.
>
>
>
>>   };
>>   
>>   /**
>> @@ -125,6 +129,7 @@ int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
>>   int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
>>   int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
>>   int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
>> +int memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size);
> memblock does not care about vmemmap, please drop _vmemmap here and below as well.
>    
>>   void memblock_free_all(void);
>>   void memblock_free(void *ptr, size_t size);
>> @@ -259,6 +264,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
>>   	return m->flags & MEMBLOCK_NOMAP;
>>   }
>>   
>> +static inline bool memblock_is_noinit_vmemmap(struct memblock_region *m)
> memblock_is_reserved_noinit please.
>
>> +{
>> +	return m->flags & MEMBLOCK_RSRV_NOINIT_VMEMMAP;
>> +}
>> +
>>   static inline bool memblock_is_driver_managed(struct memblock_region *m)
>>   {
>>   	return m->flags & MEMBLOCK_DRIVER_MANAGED;
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 43cb4404d94c..a9782228c840 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -991,6 +991,23 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
>>   	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
>>   }
>>   
>> +/**
>> + * memblock_reserved_mark_noinit_vmemmap - Mark a reserved memory region with flag
>> + * MEMBLOCK_RSRV_NOINIT_VMEMMAP.
> this should be about what marking RSRV_NOINIT does, not what flag it uses
>
>> + * @base: the base phys addr of the region
>> + * @size: the size of the region
>> + *
>> + * struct pages will not be initialized for reserved memory regions marked with
>> + * %MEMBLOCK_RSRV_NOINIT_VMEMMAP.
>> + *
>> + * Return: 0 on success, -errno on failure.
>> + */
>> +int __init_memblock memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size)
>> +{
>> +	return memblock_setclr_flag(&memblock.reserved, base, size, 1,
>> +				    MEMBLOCK_RSRV_NOINIT_VMEMMAP);
>> +}
>> +
>>   static bool should_skip_region(struct memblock_type *type,
>>   			       struct memblock_region *m,
>>   			       int nid, int flags)
>> @@ -2107,13 +2124,18 @@ static void __init memmap_init_reserved_pages(void)
>>   		memblock_set_node(start, end, &memblock.reserved, nid);
>>   	}
>>   
>> -	/* initialize struct pages for the reserved regions */
>> +	/*
>> +	 * initialize struct pages for reserved regions that don't have
>> +	 * the MEMBLOCK_RSRV_NOINIT_VMEMMAP flag set
>> +	 */
>>   	for_each_reserved_mem_region(region) {
>> -		nid = memblock_get_region_node(region);
>> -		start = region->base;
>> -		end = start + region->size;
>> +		if (!memblock_is_noinit_vmemmap(region)) {
>> +			nid = memblock_get_region_node(region);
>> +			start = region->base;
>> +			end = start + region->size;
>>   
>> -		reserve_bootmem_region(start, end, nid);
>> +			reserve_bootmem_region(start, end, nid);
>> +		}
>>   	}
>>   }
>>   
>> -- 
>> 2.25.1
>>



* Re: [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  2023-08-28  8:52     ` Muchun Song
@ 2023-08-28  9:09       ` Mike Rapoport
  2023-08-28  9:18         ` Muchun Song
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2023-08-28  9:09 UTC (permalink / raw)
  To: Muchun Song
  Cc: Usama Arif, linux-mm, mike.kravetz, linux-kernel, songmuchun,
	fam.zheng, liangma, punit.agrawal

On Mon, Aug 28, 2023 at 04:52:10PM +0800, Muchun Song wrote:
> 
> 
> On 2023/8/28 15:47, Mike Rapoport wrote:
> > On Fri, Aug 25, 2023 at 12:18:35PM +0100, Usama Arif wrote:
> > > For reserved memory regions marked with this flag,
> > > reserve_bootmem_region is not called during memmap_init_reserved_pages.
> > > This can be used to avoid struct page initialization for
> > > regions which won't need them, for e.g. hugepages with
> > > HVO enabled.
> > > 
> > > Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> > > ---
> > >   include/linux/memblock.h | 10 ++++++++++
> > >   mm/memblock.c            | 32 +++++++++++++++++++++++++++-----
> > >   2 files changed, 37 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > > index f71ff9f0ec81..6d681d053880 100644
> > > --- a/include/linux/memblock.h
> > > +++ b/include/linux/memblock.h
> > > @@ -40,6 +40,8 @@ extern unsigned long long max_possible_pfn;
> > >    * via a driver, and never indicated in the firmware-provided memory map as
> > >    * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
> > >    * kernel resource tree.
> > > + * @MEMBLOCK_RSRV_NOINIT_VMEMMAP: memory region for which struct pages are
> > > + * not initialized (only for reserved regions).
> > >    */
> > >   enum memblock_flags {
> > >   	MEMBLOCK_NONE		= 0x0,	/* No special request */
> > > @@ -47,6 +49,8 @@ enum memblock_flags {
> > >   	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
> > >   	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
> > >   	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
> > > +	/* don't initialize struct pages associated with this reserver memory block */
> > > +	MEMBLOCK_RSRV_NOINIT_VMEMMAP	= 0x10,
> > The flag means that struct page shouldn't be initialized, it may be used
> > not only by vmemmap optimizations.
> > Please drop _VMEMMAP.
> 
> The area where the struct pages are located is vmemmap. I think the
> "vmemmap" suffix does not mean that it is only for the "vmemmap
> optimization"; it specifies the target that will not be initialized. For me,
> MEMBLOCK_RSRV_NOINIT does not tell me what should not be initialized:
> the memblock itself or its struct pages (aka vmemmap pages)? So maybe
> the suffix is better kept?

In the general case the area is memmap rather than vmemmap, so a better suffix
then would be _MEMMAP. I'm not too fond of that either, but I cannot think
of a better name.
 
> > 
> > And I agree with Muchun's remarks about the comments.
> > 
> > 
> > 
> > >   };
> > >   /**
> > > @@ -125,6 +129,7 @@ int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
> > >   int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
> > >   int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
> > >   int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
> > > +int memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size);
> > memblock does not care about vmemmap, please drop _vmemmap here and below as well.
> > >   void memblock_free_all(void);
> > >   void memblock_free(void *ptr, size_t size);
> > > @@ -259,6 +264,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
> > >   	return m->flags & MEMBLOCK_NOMAP;
> > >   }
> > > +static inline bool memblock_is_noinit_vmemmap(struct memblock_region *m)
> > memblock_is_reserved_noinit please.
> > 
> > > +{
> > > +	return m->flags & MEMBLOCK_RSRV_NOINIT_VMEMMAP;
> > > +}
> > > +
> > >   static inline bool memblock_is_driver_managed(struct memblock_region *m)
> > >   {
> > >   	return m->flags & MEMBLOCK_DRIVER_MANAGED;
> > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > index 43cb4404d94c..a9782228c840 100644
> > > --- a/mm/memblock.c
> > > +++ b/mm/memblock.c
> > > @@ -991,6 +991,23 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
> > >   	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
> > >   }
> > > +/**
> > > + * memblock_reserved_mark_noinit_vmemmap - Mark a reserved memory region with flag
> > > + * MEMBLOCK_RSRV_NOINIT_VMEMMAP.
> > this should be about what marking RSRV_NOINIT does, not what flag it uses
> > 
> > > + * @base: the base phys addr of the region
> > > + * @size: the size of the region
> > > + *
> > > + * struct pages will not be initialized for reserved memory regions marked with
> > > + * %MEMBLOCK_RSRV_NOINIT_VMEMMAP.
> > > + *
> > > + * Return: 0 on success, -errno on failure.
> > > + */
> > > +int __init_memblock memblock_reserved_mark_noinit_vmemmap(phys_addr_t base, phys_addr_t size)
> > > +{
> > > +	return memblock_setclr_flag(&memblock.reserved, base, size, 1,
> > > +				    MEMBLOCK_RSRV_NOINIT_VMEMMAP);
> > > +}
> > > +
> > >   static bool should_skip_region(struct memblock_type *type,
> > >   			       struct memblock_region *m,
> > >   			       int nid, int flags)
> > > @@ -2107,13 +2124,18 @@ static void __init memmap_init_reserved_pages(void)
> > >   		memblock_set_node(start, end, &memblock.reserved, nid);
> > >   	}
> > > -	/* initialize struct pages for the reserved regions */
> > > +	/*
> > > +	 * initialize struct pages for reserved regions that don't have
> > > +	 * the MEMBLOCK_RSRV_NOINIT_VMEMMAP flag set
> > > +	 */
> > >   	for_each_reserved_mem_region(region) {
> > > -		nid = memblock_get_region_node(region);
> > > -		start = region->base;
> > > -		end = start + region->size;
> > > +		if (!memblock_is_noinit_vmemmap(region)) {
> > > +			nid = memblock_get_region_node(region);
> > > +			start = region->base;
> > > +			end = start + region->size;
> > > -		reserve_bootmem_region(start, end, nid);
> > > +			reserve_bootmem_region(start, end, nid);
> > > +		}
> > >   	}
> > >   }
> > > -- 
> > > 2.25.1
> > > 
> 

-- 
Sincerely yours,
Mike.


* Re: [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag
  2023-08-28  9:09       ` Mike Rapoport
@ 2023-08-28  9:18         ` Muchun Song
  0 siblings, 0 replies; 25+ messages in thread
From: Muchun Song @ 2023-08-28  9:18 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Usama Arif, Linux-MM, Mike Kravetz, linux-kernel, Muchun Song,
	fam.zheng, liangma, punit.agrawal



> On Aug 28, 2023, at 17:09, Mike Rapoport <rppt@kernel.org> wrote:
> 
> On Mon, Aug 28, 2023 at 04:52:10PM +0800, Muchun Song wrote:
>> 
>> 
>> On 2023/8/28 15:47, Mike Rapoport wrote:
>>> On Fri, Aug 25, 2023 at 12:18:35PM +0100, Usama Arif wrote:
>>>> For reserved memory regions marked with this flag,
>>>> reserve_bootmem_region is not called during memmap_init_reserved_pages.
>>>> This can be used to avoid struct page initialization for
>>>> regions which won't need them, for e.g. hugepages with
>>>> HVO enabled.
>>>> 
>>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>>>> ---
>>>>  include/linux/memblock.h | 10 ++++++++++
>>>>  mm/memblock.c            | 32 +++++++++++++++++++++++++++-----
>>>>  2 files changed, 37 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>>> index f71ff9f0ec81..6d681d053880 100644
>>>> --- a/include/linux/memblock.h
>>>> +++ b/include/linux/memblock.h
>>>> @@ -40,6 +40,8 @@ extern unsigned long long max_possible_pfn;
>>>>   * via a driver, and never indicated in the firmware-provided memory map as
>>>>   * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
>>>>   * kernel resource tree.
>>>> + * @MEMBLOCK_RSRV_NOINIT_VMEMMAP: memory region for which struct pages are
>>>> + * not initialized (only for reserved regions).
>>>>   */
>>>>  enum memblock_flags {
>>>>   MEMBLOCK_NONE = 0x0, /* No special request */
>>>> @@ -47,6 +49,8 @@ enum memblock_flags {
>>>>   MEMBLOCK_MIRROR = 0x2, /* mirrored region */
>>>>   MEMBLOCK_NOMAP = 0x4, /* don't add to kernel direct mapping */
>>>>   MEMBLOCK_DRIVER_MANAGED = 0x8, /* always detected via a driver */
>>>> + /* don't initialize struct pages associated with this reserver memory block */
>>>> + MEMBLOCK_RSRV_NOINIT_VMEMMAP = 0x10,
>>> The flag means that struct page shouldn't be initialized, it may be used
>>> not only by vmemmap optimizations.
>>> Please drop _VMEMMAP.
>> 
>> The area where the struct pages are located is vmemmap. I think the
>> "vmemmap" suffix does not mean that it is only for the "vmemmap
>> optimization"; it specifies the target that will not be initialized. For me,
>> MEMBLOCK_RSRV_NOINIT does not tell me what should not be initialized:
>> the memblock itself or its struct pages (aka vmemmap pages)? So maybe
>> the suffix is better kept?
> In the general case the area is memmap rather than vmemmap, so a better suffix

Right. memmap

> then would be _MEMMAP. I'm not too fond of that either, but I cannot think
> of a better name.

I have no strong opinion; if we cannot think of a better name, just drop the
suffix as you suggested and make the comments more specific. :-)

Thanks.



* Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-25 11:18 ` [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
@ 2023-08-28 11:33   ` Muchun Song
  2023-08-28 21:04     ` Mike Kravetz
  2023-08-30 10:27     ` [External] " Usama Arif
  0 siblings, 2 replies; 25+ messages in thread
From: Muchun Song @ 2023-08-28 11:33 UTC (permalink / raw)
  To: Usama Arif
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal



> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> 
> The new boot flow when it comes to initialization of gigantic pages
> is as follows:
> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> the region after the first struct page is marked as noinit.
> - This results in only the first struct page to be
> initialized in reserve_bootmem_region. As the tail struct pages are
> not initialized at this point, there can be a significant saving
> in boot time if HVO succeeds later on.
> - Later on in the boot, HVO is attempted. If it's successful, only the first
> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> after the head struct page are initialized. If it is not successful,
> then all of the tail struct pages are initialized.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>

This edition is simpler than ever before, thanks for your work.

There is a premise for your optimization that other subsystems do not
access vmemmap pages before the initialization of the vmemmap pages
associated with HugeTLB pages allocated from bootmem. However, IIUC, the
compaction path could access arbitrary struct pages when memory fails
to be allocated via the buddy allocator. So we should make sure that
those struct pages are not referenced in this routine. And I know that
if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
the same issue, but I don't find any code to prevent this from
happening. I need more time to confirm this; if someone already knows,
please let me know, thanks. So I think HugeTLB should adopt a similar
way to prevent this.

Thanks.
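For scale, the saving at stake in this discussion can be reproduced with a
back-of-the-envelope sketch (the 4 KiB base page, 64-byte struct page, and
one-page HUGETLB_VMEMMAP_RESERVE_SIZE below are assumed x86-64 values, not
taken from the patch itself):

```c
#include <assert.h>

/* Assumed x86-64 constants; illustrative only. */
#define PAGE_SIZE			4096UL
#define STRUCT_PAGE_SIZE		64UL	/* stand-in for sizeof(struct page) */
#define HUGETLB_VMEMMAP_RESERVE_SIZE	PAGE_SIZE
#define GIGANTIC_PAGE_SIZE		(1024UL * 1024UL * 1024UL)

/* struct pages backing one 1G hugepage */
static unsigned long total_struct_pages(void)
{
	return GIGANTIC_PAGE_SIZE / PAGE_SIZE;
}

/* struct pages that still need initializing when HVO succeeds */
static unsigned long retained_struct_pages(void)
{
	return HUGETLB_VMEMMAP_RESERVE_SIZE / STRUCT_PAGE_SIZE;
}

/* struct pages whose initialization the series skips per 1G hugepage */
static unsigned long skipped_struct_pages(void)
{
	return total_struct_pages() - retained_struct_pages();
}
```

That is roughly 262 thousand struct page initializations avoided per 1G page
when HVO succeeds, in line with the cover letter; multiplied by hundreds of
gigantic pages, this is where the measured boot-time win comes from.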



* Re: [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it
  2023-08-28  7:15   ` Muchun Song
@ 2023-08-28 18:25     ` Mike Kravetz
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2023-08-28 18:25 UTC (permalink / raw)
  To: Muchun Song
  Cc: Usama Arif, Linux-MM, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal

On 08/28/23 15:15, Muchun Song wrote:
> 
> 
> > On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> > 
> > If tail page prep and initialization is skipped, then the "start"
> > page will not contain the correct nid. Use the nid from the first
> > vmemmap page.
> > 
> > Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> 
> Makes sense even without your optimization. Because the nid is
> used for the allocation of the "reuse" page, it really should be extracted
> from the "reuse" page instead of the "start" page.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Agree that using reuse is better than start here.  However, IIUC there is no
way start and reuse could be on different nodes today.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz
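The point under review, taking the nid from the "reuse" page rather than from
"start", can be sketched with a toy (all names here are invented; only the
choice of which page supplies the node mirrors the review):

```c
#include <assert.h>

/* Toy struct-page stand-in that records the node it lives on.
 * NID_UNSET models a tail struct page whose init was skipped. */
#define NID_UNSET	(-1)

struct toy_page {
	int nid;
};

/* Choose the node for reallocating vmemmap pages: trust the "reuse"
 * page, which is always initialized, never the possibly-skipped
 * "start" page. */
static int toy_realloc_nid(const struct toy_page *reuse,
			   const struct toy_page *start)
{
	(void)start;	/* may hold garbage when tail init is skipped */
	return reuse->nid;
}
```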


* Re: [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag
  2023-08-25 11:18 ` [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
  2023-08-28  7:16   ` Muchun Song
  2023-08-28  7:37   ` Mike Rapoport
@ 2023-08-28 18:39   ` Mike Kravetz
  2 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2023-08-28 18:39 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-mm, muchun.song, rppt, linux-kernel, songmuchun, fam.zheng,
	liangma, punit.agrawal

On 08/25/23 12:18, Usama Arif wrote:
> This allows setting flags to both memblock types and is in preparation for
> setting flags (for e.g. to not initialize struct pages) on reserved
> memory region.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> ---
>  mm/memblock.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz


* Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-28 11:33   ` Muchun Song
@ 2023-08-28 21:04     ` Mike Kravetz
  2023-08-29  3:33       ` Muchun Song
  2023-08-30 10:27     ` [External] " Usama Arif
  1 sibling, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2023-08-28 21:04 UTC (permalink / raw)
  To: Muchun Song
  Cc: Usama Arif, Linux-MM, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal

On 08/28/23 19:33, Muchun Song wrote:
> 
> 
> > On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> > 
> > The new boot flow when it comes to initialization of gigantic pages
> > is as follows:
> > - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> > the region after the first struct page is marked as noinit.
> > - This results in only the first struct page to be
> > initialized in reserve_bootmem_region. As the tail struct pages are
> > not initialized at this point, there can be a significant saving
> > in boot time if HVO succeeds later on.
> > - Later on in the boot, HVO is attempted. If it's successful, only the first
> > HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> > after the head struct page are initialized. If it is not successful,
> > then all of the tail struct pages are initialized.
> > 
> > Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> 
> This edition is simpler than ever before, thanks for your work.
> 
> There is a premise for your optimization that other subsystems do not
> access vmemmap pages before the initialization of the vmemmap pages
> associated with HugeTLB pages allocated from bootmem. However, IIUC, the
> compaction path could access arbitrary struct pages when memory fails
> to be allocated via the buddy allocator. So we should make sure that
> those struct pages are not referenced in this routine. And I know that
> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
> the same issue, but I don't find any code to prevent this from
> happening. I need more time to confirm this; if someone already knows,
> please let me know, thanks. So I think HugeTLB should adopt a similar
> way to prevent this.

In this patch, the call to hugetlb_vmemmap_optimize() is moved BEFORE
__prep_new_hugetlb_folio or prep_new_hugetlb_folio in all code paths.
The prep_new_hugetlb_folio routine(s) are what set the destructor (soon
to be a flag) that identifies the set of pages as a hugetlb page.  So,
there is now a window where a set of pages not identified as hugetlb
will not have vmemmap pages.

Recently, I closed the same window in the hugetlb freeing code paths with
commit 32c877191e02 'hugetlb: do not clear hugetlb dtor until allocating'.
This patch needs to be reworked so that this window is not opened in the
allocation paths.
-- 
Mike Kravetz
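The window described above can be made concrete with a toy state machine
(all names are invented; only the call ordering mirrors the discussion):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the allocation-path ordering under discussion. */
struct toy_folio {
	bool vmemmap_optimized;	/* tail vmemmap pages freed by HVO */
	bool marked_hugetlb;	/* destructor/flag identifying hugetlb */
};

static void toy_hugetlb_vmemmap_optimize(struct toy_folio *f)
{
	f->vmemmap_optimized = true;
}

static void toy_prep_new_hugetlb_folio(struct toy_folio *f)
{
	f->marked_hugetlb = true;
}

/* Between the two calls, code that keys off the hugetlb marker cannot
 * tell that these pages lack backing vmemmap: that is the window. */
static bool in_window(const struct toy_folio *f)
{
	return f->vmemmap_optimized && !f->marked_hugetlb;
}
```

Calling toy_hugetlb_vmemmap_optimize() first opens the window, and it stays
open until toy_prep_new_hugetlb_folio() runs.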


* Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-28 21:04     ` Mike Kravetz
@ 2023-08-29  3:33       ` Muchun Song
  2023-08-29  3:47         ` Mike Kravetz
  0 siblings, 1 reply; 25+ messages in thread
From: Muchun Song @ 2023-08-29  3:33 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Usama Arif, Linux-MM, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal



> On Aug 29, 2023, at 05:04, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> On 08/28/23 19:33, Muchun Song wrote:
>> 
>> 
>>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
>>> 
>>> The new boot flow when it comes to initialization of gigantic pages
>>> is as follows:
>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>> the region after the first struct page is marked as noinit.
>>> - This results in only the first struct page to be
>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>> not initialized at this point, there can be a significant saving
>>> in boot time if HVO succeeds later on.
>>> - Later on in the boot, HVO is attempted. If it's successful, only the first
>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>> after the head struct page are initialized. If it is not successful,
>>> then all of the tail struct pages are initialized.
>>> 
>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>> 
>> This edition is simpler than ever before, thanks for your work.
>> 
>> There is a premise for your optimization that other subsystems do not
>> access vmemmap pages before the initialization of the vmemmap pages
>> associated with HugeTLB pages allocated from bootmem. However, IIUC, the
>> compaction path could access arbitrary struct pages when memory fails
>> to be allocated via the buddy allocator. So we should make sure that
>> those struct pages are not referenced in this routine. And I know that
>> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
>> the same issue, but I don't find any code to prevent this from
>> happening. I need more time to confirm this; if someone already knows,
>> please let me know, thanks. So I think HugeTLB should adopt a similar
>> way to prevent this.
> 
> In this patch, the call to hugetlb_vmemmap_optimize() is moved BEFORE
> __prep_new_hugetlb_folio or prep_new_hugetlb_folio in all code paths.
> The prep_new_hugetlb_folio routine(s) are what set the destructor (soon
> to be a flag) that identifies the set of pages as a hugetlb page.  So,
> there is now a window where a set of pages not identified as hugetlb
> will not have vmemmap pages.

Thanks for pointing it out.

It seems this issue is not related to this change? hugetlb_vmemmap_optimize()
has been called before the setting of the destructor since the initial commit
f41f2ed43ca5. Right?

> 
> Recently, I closed the same window in the hugetlb freeing code paths with
> commit 32c877191e02 'hugetlb: do not clear hugetlb dtor until allocating'.

Yes, I saw it. 

> This patch needs to be reworked so that this window is not opened in the
> allocation paths.

So I think the fix should be a separate series.

Thanks.




* Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-29  3:33       ` Muchun Song
@ 2023-08-29  3:47         ` Mike Kravetz
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2023-08-29  3:47 UTC (permalink / raw)
  To: Muchun Song
  Cc: Usama Arif, Linux-MM, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal

On 08/29/23 11:33, Muchun Song wrote:
> 
> 
> > On Aug 29, 2023, at 05:04, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > 
> > On 08/28/23 19:33, Muchun Song wrote:
> >> 
> >> 
> >>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> >>> 
> >>> The new boot flow when it comes to initialization of gigantic pages
> >>> is as follows:
> >>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> >>> the region after the first struct page is marked as noinit.
> >>> - This results in only the first struct page to be
> >>> initialized in reserve_bootmem_region. As the tail struct pages are
> >>> not initialized at this point, there can be a significant saving
> >>> in boot time if HVO succeeds later on.
> >>> - Later on in the boot, HVO is attempted. If it's successful, only the first
> >>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> >>> after the head struct page are initialized. If it is not successful,
> >>> then all of the tail struct pages are initialized.
> >>> 
> >>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> >> 
> >> This edition is simpler than ever before, thanks for your work.
> >> 
> >> There is a premise for your optimization that other subsystems do not
> >> access vmemmap pages before the initialization of the vmemmap pages
> >> associated with HugeTLB pages allocated from bootmem. However, IIUC, the
> >> compaction path could access arbitrary struct pages when memory fails
> >> to be allocated via the buddy allocator. So we should make sure that
> >> those struct pages are not referenced in this routine. And I know that
> >> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
> >> the same issue, but I don't find any code to prevent this from
> >> happening. I need more time to confirm this; if someone already knows,
> >> please let me know, thanks. So I think HugeTLB should adopt a similar
> >> way to prevent this.
> > 
> > In this patch, the call to hugetlb_vmemmap_optimize() is moved BEFORE
> > __prep_new_hugetlb_folio or prep_new_hugetlb_folio in all code paths.
> > The prep_new_hugetlb_folio routine(s) are what set the destructor (soon
> > to be a flag) that identifies the set of pages as a hugetlb page.  So,
> > there is now a window where a set of pages not identified as hugetlb
> > will not have vmemmap pages.
> 
> Thanks for pointing it out.
> 
> It seems this issue is not related to this change? hugetlb_vmemmap_optimize()
> has been called before the setting of the destructor since the initial commit
> f41f2ed43ca5. Right?
> 

Thanks Muchun!

Yes, this issue exists today.  It was the further separation of the calls in
this patch which pointed out the issue to me.

I overlooked the fact that the issue already exists. :(

> > 
> > Recently, I closed the same window in the hugetlb freeing code paths with
> > commit 32c877191e02 'hugetlb: do not clear hugetlb dtor until allocating'.
> 
> Yes, I saw it. 
> 
> > This patch needs to be reworked so that this window is not opened in the
> > allocation paths.
> 
> So I think the fix should be a separate series.
> 

Right.  I can fix that up separately.
-- 
Mike Kravetz


* Re: [External] Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-28 11:33   ` Muchun Song
  2023-08-28 21:04     ` Mike Kravetz
@ 2023-08-30 10:27     ` Usama Arif
  2023-08-31  6:21       ` [External] " Muchun Song
  2023-08-31  7:33       ` [External] " Mike Rapoport
  1 sibling, 2 replies; 25+ messages in thread
From: Usama Arif @ 2023-08-30 10:27 UTC (permalink / raw)
  To: Muchun Song
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal, mgorman, akpm



On 28/08/2023 12:33, Muchun Song wrote:
> 
> 
>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
>>
>> The new boot flow when it comes to initialization of gigantic pages
>> is as follows:
>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>> the region after the first struct page is marked as noinit.
>> - This results in only the first struct page to be
>> initialized in reserve_bootmem_region. As the tail struct pages are
>> not initialized at this point, there can be a significant saving
>> in boot time if HVO succeeds later on.
>> - Later on in the boot, HVO is attempted. If it's successful, only the first
>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>> after the head struct page are initialized. If it is not successful,
>> then all of the tail struct pages are initialized.
>>
>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> 
> This edition is simpler than ever before, thanks for your work.
> 
> There is a premise for your optimization that other subsystems do not
> access vmemmap pages before the initialization of the vmemmap pages
> associated with HugeTLB pages allocated from bootmem. However, IIUC, the
> compaction path could access arbitrary struct pages when memory fails
> to be allocated via the buddy allocator. So we should make sure that
> those struct pages are not referenced in this routine. And I know that
> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
> the same issue, but I don't find any code to prevent this from
> happening. I need more time to confirm this; if someone already knows,
> please let me know, thanks. So I think HugeTLB should adopt a similar
> way to prevent this.
> 
> Thanks.
> 

Thanks for the reviews.

So if I understand correctly, the uninitialized pages due to the 
optimization in this patch and those due to DEFERRED_STRUCT_PAGE_INIT 
should be treated the same way during compaction. I see that in 
isolate_freepages during compaction there is a check for the PageBuddy 
flag, and there are also calls like __pageblock_pfn_to_page to check 
whether the pageblock is valid.

But if the struct pages are uninitialized then they would contain 
random data, and these checks could pass if certain bits happened to 
be set?

Compaction is done on the free lists. I think the uninitialized struct 
pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of 
a free list, so I think their pfns would be considered for compaction.

Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and 
compaction confirm how the uninitialized struct pages are handled when 
compaction happens? Thanks!

Usama


* Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-30 10:27     ` [External] " Usama Arif
@ 2023-08-31  6:21       ` Muchun Song
  2023-08-31  9:58         ` Mel Gorman
  2023-08-31  7:33       ` [External] " Mike Rapoport
  1 sibling, 1 reply; 25+ messages in thread
From: Muchun Song @ 2023-08-31  6:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal, Andrew Morton, Usama Arif



> On Aug 30, 2023, at 18:27, Usama Arif <usama.arif@bytedance.com> wrote:
> On 28/08/2023 12:33, Muchun Song wrote:
>>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
>>> 
>>> The new boot flow when it comes to initialization of gigantic pages
>>> is as follows:
>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>> the region after the first struct page is marked as noinit.
>>> - This results in only the first struct page to be
>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>> not initialized at this point, there can be a significant saving
>>> in boot time if HVO succeeds later on.
>>> - Later on in the boot, HVO is attempted. If it's successful, only the first
>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>> after the head struct page are initialized. If it is not successful,
>>> then all of the tail struct pages are initialized.
>>> 
>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>> This version is simpler than ever before, thanks for your work.
>> There is a premise behind your optimization: other subsystems must not
>> access vmemmap pages before the initialization of the vmemmap pages
>> associated with the HugeTLB pages allocated from bootmem. However,
>> IIUC, the compaction path could access an arbitrary struct page when
>> memory fails to be allocated via the buddy allocator. So we should
>> make sure that those struct pages are not referenced in that routine.
>> And I know that if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it
>> will encounter the same issue, but I can't find any code to prevent
>> this from happening. I need more time to confirm this; if someone
>> already knows, please let me know, thanks. So I think HugeTLB should
>> adopt a similar way to prevent this.
>> Thanks.
> 
> Thanks for the reviews.
> 
> So if I understand correctly, the uninitialized pages due to the optimization in this patch and those due to DEFERRED_STRUCT_PAGE_INIT should be treated the same way during compaction. I see that in isolate_freepages during compaction there is a check for the PageBuddy flag, and there are also calls like __pageblock_pfn_to_page to check whether the pageblock is valid.
> 
> But if the struct pages are uninitialized then they would contain random data, and these checks could pass if certain bits happened to be set?
> 
> Compaction is done on the free lists. I think the uninitialized struct pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of a free list, so I think their pfns would be considered for compaction.
> 
> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction confirm how uninitialized struct pages are handled when compaction happens? Thanks!

Hi Mel,

Could you help us answer this question? I think you must be the expert on
CONFIG_DEFERRED_STRUCT_PAGE_INIT, so let me summarize the context here. Some
struct pages are uninitialized when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
enabled. If someone allocates a larger chunk of memory (e.g. order 4) via the
buddy allocator and the allocation fails, then we go into the compaction
routine, which traverses pfns and uses pfn_to_page to access each struct
page; however, those struct pages may be uninitialized (so they contain
arbitrary data). Our question is: how do we prevent the compaction routine
from accessing those uninitialized struct pages? We would appreciate it if
you know the answer.

Thanks.




* Re: [External] Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-30 10:27     ` [External] " Usama Arif
  2023-08-31  6:21       ` [External] " Muchun Song
@ 2023-08-31  7:33       ` Mike Rapoport
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2023-08-31  7:33 UTC (permalink / raw)
  To: Usama Arif
  Cc: Muchun Song, Linux-MM, Mike Kravetz, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal, mgorman, akpm

On Wed, Aug 30, 2023 at 11:27:42AM +0100, Usama Arif wrote:
> 
> On 28/08/2023 12:33, Muchun Song wrote:
> > 
> > 
> > > On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> > > 
> > > The new boot flow when it comes to initialization of gigantic pages
> > > is as follows:
> > > - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> > > the region after the first struct page is marked as noinit.
> > > - This results in only the first struct page to be
> > > initialized in reserve_bootmem_region. As the tail struct pages are
> > > not initialized at this point, there can be a significant saving
> > > in boot time if HVO succeeds later on.
> > > - Later on in the boot, HVO is attempted. If it's successful, only the first
> > > HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> > > after the head struct page are initialized. If it is not successful,
> > > then all of the tail struct pages are initialized.
> > > 
> > > Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> > 
> > This version is simpler than ever before, thanks for your work.
> > 
> > There is a premise behind your optimization: other subsystems must not
> > access vmemmap pages before the initialization of the vmemmap pages
> > associated with the HugeTLB pages allocated from bootmem. However,
> > IIUC, the compaction path could access an arbitrary struct page when
> > memory fails to be allocated via the buddy allocator. So we should
> > make sure that those struct pages are not referenced in that routine.
> > And I know that if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it
> > will encounter the same issue, but I can't find any code to prevent
> > this from happening. I need more time to confirm this; if someone
> > already knows, please let me know, thanks. So I think HugeTLB should
> > adopt a similar way to prevent this.
> > 
> > Thanks.
> > 
> 
> Thanks for the reviews.
> 
> So if I understand correctly, the uninitialized pages due to the
> optimization in this patch and those due to DEFERRED_STRUCT_PAGE_INIT
> should be treated the same way during compaction. I see that in
> isolate_freepages during compaction there is a check for the PageBuddy
> flag, and there are also calls like __pageblock_pfn_to_page to check
> whether the pageblock is valid.
> 
> But if the struct pages are uninitialized then they would contain random
> data, and these checks could pass if certain bits happened to be set?
> 
> Compaction is done on the free lists. I think the uninitialized struct
> pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of a
> free list, so I think their pfns would be considered for compaction.
> 
> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction
> confirm how uninitialized struct pages are handled when compaction
> happens? Thanks!

I'm not familiar enough with compaction to confirm that it only touches
pages on the free lists, but DEFERRED_STRUCT_PAGE_INIT makes sure a struct
page is initialized before it's put on a free list.
 
> Usama

-- 
Sincerely yours,
Mike.


* Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-31  6:21       ` [External] " Muchun Song
@ 2023-08-31  9:58         ` Mel Gorman
  2023-08-31 10:01           ` Muchun Song
  0 siblings, 1 reply; 25+ messages in thread
From: Mel Gorman @ 2023-08-31  9:58 UTC (permalink / raw)
  To: Muchun Song
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal, Andrew Morton, Usama Arif

On Thu, Aug 31, 2023 at 02:21:06PM +0800, Muchun Song wrote:
> 
> 
> > On Aug 30, 2023, at 18:27, Usama Arif <usama.arif@bytedance.com> wrote:
> > On 28/08/2023 12:33, Muchun Song wrote:
> >>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> >>> 
> >>> The new boot flow when it comes to initialization of gigantic pages
> >>> is as follows:
> >>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> >>> the region after the first struct page is marked as noinit.
> >>> - This results in only the first struct page to be
> >>> initialized in reserve_bootmem_region. As the tail struct pages are
> >>> not initialized at this point, there can be a significant saving
> >>> in boot time if HVO succeeds later on.
> >>> - Later on in the boot, HVO is attempted. If it's successful, only the first
> >>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> >>> after the head struct page are initialized. If it is not successful,
> >>> then all of the tail struct pages are initialized.
> >>> 
> >>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> >> This version is simpler than ever before, thanks for your work.
> >> There is a premise behind your optimization: other subsystems must not
> >> access vmemmap pages before the initialization of the vmemmap pages
> >> associated with the HugeTLB pages allocated from bootmem. However,
> >> IIUC, the compaction path could access an arbitrary struct page when
> >> memory fails to be allocated via the buddy allocator. So we should
> >> make sure that those struct pages are not referenced in that routine.
> >> And I know that if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it
> >> will encounter the same issue, but I can't find any code to prevent
> >> this from happening. I need more time to confirm this; if someone
> >> already knows, please let me know, thanks. So I think HugeTLB should
> >> adopt a similar way to prevent this.
> >> Thanks.
> > 
> > Thanks for the reviews.
> > 
> > So if I understand correctly, the uninitialized pages due to the optimization in this patch and those due to DEFERRED_STRUCT_PAGE_INIT should be treated the same way during compaction. I see that in isolate_freepages during compaction there is a check for the PageBuddy flag, and there are also calls like __pageblock_pfn_to_page to check whether the pageblock is valid.
> > 
> > But if the struct pages are uninitialized then they would contain random data, and these checks could pass if certain bits happened to be set?
> > 
> > Compaction is done on the free lists. I think the uninitialized struct pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of a free list, so I think their pfns would be considered for compaction.
> > 
> > Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction confirm how uninitialized struct pages are handled when compaction happens? Thanks!
> 
> Hi Mel,
> 
> Could you help us answer this question? I think you must be the expert on
> CONFIG_DEFERRED_STRUCT_PAGE_INIT, so let me summarize the context here. Some
> struct pages are uninitialized when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> enabled. If someone allocates a larger chunk of memory (e.g. order 4) via the
> buddy allocator and the allocation fails, then we go into the compaction
> routine, which traverses pfns and uses pfn_to_page to access each struct
> page; however, those struct pages may be uninitialized (so they contain
> arbitrary data). Our question is: how do we prevent the compaction routine
> from accessing those uninitialized struct pages? We would appreciate it if
> you know the answer.
> 

I didn't check the code but IIRC, the struct pages should be at least
valid and not contain arbitrary data once page_alloc_init_late finishes.

-- 
Mel Gorman
SUSE Labs


* Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-31  9:58         ` Mel Gorman
@ 2023-08-31 10:01           ` Muchun Song
  2023-08-31 10:28             ` Mel Gorman
  0 siblings, 1 reply; 25+ messages in thread
From: Muchun Song @ 2023-08-31 10:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal, Andrew Morton, Usama Arif



> On Aug 31, 2023, at 17:58, Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> On Thu, Aug 31, 2023 at 02:21:06PM +0800, Muchun Song wrote:
>> 
>> 
>>> On Aug 30, 2023, at 18:27, Usama Arif <usama.arif@bytedance.com> wrote:
>>> On 28/08/2023 12:33, Muchun Song wrote:
>>>>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
>>>>> 
>>>>> The new boot flow when it comes to initialization of gigantic pages
>>>>> is as follows:
>>>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>>>> the region after the first struct page is marked as noinit.
>>>>> - This results in only the first struct page to be
>>>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>>>> not initialized at this point, there can be a significant saving
>>>>> in boot time if HVO succeeds later on.
>>>>> - Later on in the boot, HVO is attempted. If it's successful, only the first
>>>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>>>> after the head struct page are initialized. If it is not successful,
>>>>> then all of the tail struct pages are initialized.
>>>>> 
>>>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>>>> This version is simpler than ever before, thanks for your work.
>>>> There is a premise behind your optimization: other subsystems must not
>>>> access vmemmap pages before the initialization of the vmemmap pages
>>>> associated with the HugeTLB pages allocated from bootmem. However,
>>>> IIUC, the compaction path could access an arbitrary struct page when
>>>> memory fails to be allocated via the buddy allocator. So we should
>>>> make sure that those struct pages are not referenced in that routine.
>>>> And I know that if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it
>>>> will encounter the same issue, but I can't find any code to prevent
>>>> this from happening. I need more time to confirm this; if someone
>>>> already knows, please let me know, thanks. So I think HugeTLB should
>>>> adopt a similar way to prevent this.
>>>> Thanks.
>>> 
>>> Thanks for the reviews.
>>> 
>>> So if I understand correctly, the uninitialized pages due to the optimization in this patch and those due to DEFERRED_STRUCT_PAGE_INIT should be treated the same way during compaction. I see that in isolate_freepages during compaction there is a check for the PageBuddy flag, and there are also calls like __pageblock_pfn_to_page to check whether the pageblock is valid.
>>> 
>>> But if the struct pages are uninitialized then they would contain random data, and these checks could pass if certain bits happened to be set?
>>> 
>>> Compaction is done on the free lists. I think the uninitialized struct pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of a free list, so I think their pfns would be considered for compaction.
>>> 
>>> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction confirm how uninitialized struct pages are handled when compaction happens? Thanks!
>> 
>> Hi Mel,
>> 
>> Could you help us answer this question? I think you must be the expert on
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT, so let me summarize the context here. Some
>> struct pages are uninitialized when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> enabled. If someone allocates a larger chunk of memory (e.g. order 4) via the
>> buddy allocator and the allocation fails, then we go into the compaction
>> routine, which traverses pfns and uses pfn_to_page to access each struct
>> page; however, those struct pages may be uninitialized (so they contain
>> arbitrary data). Our question is: how do we prevent the compaction routine
>> from accessing those uninitialized struct pages? We would appreciate it if
>> you know the answer.
>> 
> 
> I didn't check the code but IIRC, the struct pages should be at least
> valid and not contain arbitrary data once page_alloc_init_late finishes.

However, the buddy allocator is ready before page_alloc_init_late(), so the
compaction routine may still access arbitrary data before then, right?

> 
> -- 
> Mel Gorman
> SUSE Labs



* Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
  2023-08-31 10:01           ` Muchun Song
@ 2023-08-31 10:28             ` Mel Gorman
  0 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2023-08-31 10:28 UTC (permalink / raw)
  To: Muchun Song
  Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
	fam.zheng, liangma, punit.agrawal, Andrew Morton, Usama Arif

On Thu, Aug 31, 2023 at 06:01:08PM +0800, Muchun Song wrote:
> 
> 
> > On Aug 31, 2023, at 17:58, Mel Gorman <mgorman@techsingularity.net> wrote:
> > 
> > On Thu, Aug 31, 2023 at 02:21:06PM +0800, Muchun Song wrote:
> >> 
> >> 
> >>> On Aug 30, 2023, at 18:27, Usama Arif <usama.arif@bytedance.com> wrote:
> >>> On 28/08/2023 12:33, Muchun Song wrote:
> >>>>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@bytedance.com> wrote:
> >>>>> 
> >>>>> The new boot flow when it comes to initialization of gigantic pages
> >>>>> is as follows:
> >>>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> >>>>> the region after the first struct page is marked as noinit.
> >>>>> - This results in only the first struct page to be
> >>>>> initialized in reserve_bootmem_region. As the tail struct pages are
> >>>>> not initialized at this point, there can be a significant saving
> >>>>> in boot time if HVO succeeds later on.
> >>>>> - Later on in the boot, HVO is attempted. If it's successful, only the first
> >>>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> >>>>> after the head struct page are initialized. If it is not successful,
> >>>>> then all of the tail struct pages are initialized.
> >>>>> 
> >>>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> >>>> This version is simpler than ever before, thanks for your work.
> >>>> There is a premise behind your optimization: other subsystems must not
> >>>> access vmemmap pages before the initialization of the vmemmap pages
> >>>> associated with the HugeTLB pages allocated from bootmem. However,
> >>>> IIUC, the compaction path could access an arbitrary struct page when
> >>>> memory fails to be allocated via the buddy allocator. So we should
> >>>> make sure that those struct pages are not referenced in that routine.
> >>>> And I know that if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it
> >>>> will encounter the same issue, but I can't find any code to prevent
> >>>> this from happening. I need more time to confirm this; if someone
> >>>> already knows, please let me know, thanks. So I think HugeTLB should
> >>>> adopt a similar way to prevent this.
> >>>> Thanks.
> >>> 
> >>> Thanks for the reviews.
> >>> 
> >>> So if I understand correctly, the uninitialized pages due to the optimization in this patch and those due to DEFERRED_STRUCT_PAGE_INIT should be treated the same way during compaction. I see that in isolate_freepages during compaction there is a check for the PageBuddy flag, and there are also calls like __pageblock_pfn_to_page to check whether the pageblock is valid.
> >>> 
> >>> But if the struct pages are uninitialized then they would contain random data, and these checks could pass if certain bits happened to be set?
> >>> 
> >>> Compaction is done on the free lists. I think the uninitialized struct pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of a free list, so I think their pfns would be considered for compaction.
> >>> 
> >>> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction confirm how uninitialized struct pages are handled when compaction happens? Thanks!
> >> 
> >> Hi Mel,
> >> 
> >> Could you help us answer this question? I think you must be the expert on
> >> CONFIG_DEFERRED_STRUCT_PAGE_INIT, so let me summarize the context here. Some
> >> struct pages are uninitialized when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> >> enabled. If someone allocates a larger chunk of memory (e.g. order 4) via the
> >> buddy allocator and the allocation fails, then we go into the compaction
> >> routine, which traverses pfns and uses pfn_to_page to access each struct
> >> page; however, those struct pages may be uninitialized (so they contain
> >> arbitrary data). Our question is: how do we prevent the compaction routine
> >> from accessing those uninitialized struct pages? We would appreciate it if
> >> you know the answer.
> >> 
> > 
> > I didn't check the code but IIRC, the struct pages should be at least
> > valid and not contain arbitrary data once page_alloc_init_late finishes.
> 
> However, the buddy allocator is ready before page_alloc_init_late(), so the
> compaction routine may still access arbitrary data before then, right?
> 

Again, I didn't check the code but given that there is a minimum amount of
the zone that must be initialised and only the highest zone is deferred
for initialisation (again, didn't check this), there may be an implicit
assumption that compaction is not required in early boot. Even if it were
attempted, it would likely have nothing to do as fragmentation-related events
that can be resolved by compaction should not occur that early in boot.

-- 
Mel Gorman
SUSE Labs


end of thread, other threads:[~2023-08-31 10:28 UTC | newest]

Thread overview: 25+ messages
2023-08-25 11:18 [v3 0/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
2023-08-25 11:18 ` [v3 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
2023-08-28  7:15   ` Muchun Song
2023-08-28 18:25     ` Mike Kravetz
2023-08-25 11:18 ` [v3 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
2023-08-28  7:16   ` Muchun Song
2023-08-28  7:37   ` Mike Rapoport
2023-08-28 18:39   ` Mike Kravetz
2023-08-25 11:18 ` [v3 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT_VMEMMAP flag Usama Arif
2023-08-28  7:26   ` Muchun Song
2023-08-28  7:47   ` Mike Rapoport
2023-08-28  8:52     ` Muchun Song
2023-08-28  9:09       ` Mike Rapoport
2023-08-28  9:18         ` Muchun Song
2023-08-25 11:18 ` [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
2023-08-28 11:33   ` Muchun Song
2023-08-28 21:04     ` Mike Kravetz
2023-08-29  3:33       ` Muchun Song
2023-08-29  3:47         ` Mike Kravetz
2023-08-30 10:27     ` [External] " Usama Arif
2023-08-31  6:21       ` [External] " Muchun Song
2023-08-31  9:58         ` Mel Gorman
2023-08-31 10:01           ` Muchun Song
2023-08-31 10:28             ` Mel Gorman
2023-08-31  7:33       ` [External] " Mike Rapoport
