* [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO
@ 2023-07-24 13:46 Usama Arif
  2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Usama Arif @ 2023-07-24 13:46 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, fam.zheng, liangma, simon.evans, punit.agrawal, Usama Arif

If a memblock region is reserved for gigantic hugepages and HVO is enabled,
then the struct pages which will be freed later by HVO don't need to be
prepared and initialized. This can save significant time when a large number
of hugepages are allocated at boot time.

For a 1G hugepage, this series avoids the initialization and preparation of
262144 - 64 = 262080 struct pages per hugepage.
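
As a rough breakdown of where these numbers come from (assuming 4K base
pages and a 64-byte struct page, as on x86_64, and that HVO keeps one
base page worth of vmemmap per hugepage):

  struct pages per 1G hugepage:  1G / 4K        = 262144
  struct pages kept by HVO:      4096 / 64      = 64
  struct pages skipped:          262144 - 64    = 262080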

When tested on a 512G system (which can allocate at most 500 1G hugepages),
the time from kexec boot to running init, with HVO and
DEFERRED_STRUCT_PAGE_INIT enabled, is 3.9 seconds without this patch series
and 1.2 seconds with it. This represents an approximately 70% reduction in
boot time and will significantly reduce server downtime when using a large
number of gigantic pages.

Thanks,
Usama

Usama Arif (4):
  mm/hugetlb: Skip prep of tail pages when HVO is enabled
  mm/memblock: Add hugepage_size member to struct memblock_region
  mm/hugetlb_vmemmap: Use nid of the head page to reallocate it
  mm/memblock: Skip initialization of struct pages freed later by HVO

 arch/arm64/mm/kasan_init.c                   |  2 +-
 arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
 arch/powerpc/platforms/pseries/setup.c       |  4 +-
 arch/powerpc/sysdev/dart_iommu.c             |  2 +-
 include/linux/memblock.h                     |  8 +-
 mm/cma.c                                     |  4 +-
 mm/hugetlb.c                                 | 36 +++++---
 mm/hugetlb_vmemmap.c                         |  6 +-
 mm/hugetlb_vmemmap.h                         |  4 +
 mm/memblock.c                                | 87 +++++++++++++-------
 mm/mm_init.c                                 |  2 +-
 mm/sparse-vmemmap.c                          |  2 +-
 tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
 13 files changed, 106 insertions(+), 55 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled
  2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
@ 2023-07-24 13:46 ` Usama Arif
  2023-07-24 17:33   ` kernel test robot
  2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2023-07-24 13:46 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, fam.zheng, liangma, simon.evans, punit.agrawal, Usama Arif

When the vmemmap is optimizable, hugetlb_vmemmap_optimize will free all
the duplicated tail struct pages while the new hugepage is being
prepared. Hence, there is no need to prepare them in the first place.

For 1G x86 hugepages, it avoids preparing
262144 - 64 = 262080 struct pages per hugepage.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/hugetlb.c         | 30 +++++++++++++++++++++---------
 mm/hugetlb_vmemmap.c |  2 +-
 mm/hugetlb_vmemmap.h |  1 +
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64a3239b6407..24352abbb9e5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1943,13 +1943,22 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
 }
 
 static bool __prep_compound_gigantic_folio(struct folio *folio,
-					unsigned int order, bool demote)
+					unsigned int order, bool demote,
+					bool hugetlb_vmemmap_optimizable)
 {
 	int i, j;
 	int nr_pages = 1 << order;
 	struct page *p;
 
 	__folio_clear_reserved(folio);
+
+	/*
+	 * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize
+	 * in prep_new_huge_page. Hence, reduce nr_pages to the pages that will be kept.
+	 */
+	if (hugetlb_vmemmap_optimizable)
+		nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
+
 	for (i = 0; i < nr_pages; i++) {
 		p = folio_page(folio, i);
 
@@ -2020,15 +2029,15 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 }
 
 static bool prep_compound_gigantic_folio(struct folio *folio,
-							unsigned int order)
+							unsigned int order, bool hugetlb_vmemmap_optimizable)
 {
-	return __prep_compound_gigantic_folio(folio, order, false);
+	return __prep_compound_gigantic_folio(folio, order, false, hugetlb_vmemmap_optimizable);
 }
 
 static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
-							unsigned int order)
+							unsigned int order, bool hugetlb_vmemmap_optimizable)
 {
-	return __prep_compound_gigantic_folio(folio, order, true);
+	return __prep_compound_gigantic_folio(folio, order, true, hugetlb_vmemmap_optimizable);
 }
 
 /*
@@ -2185,7 +2194,8 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 	if (!folio)
 		return NULL;
 	if (hstate_is_gigantic(h)) {
-		if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) {
+		if (!prep_compound_gigantic_folio(folio, huge_page_order(h),
+							 vmemmap_should_optimize(h, &folio->page))) {
 			/*
 			 * Rare failure to convert pages to compound page.
 			 * Free pages and try again - ONCE!
@@ -3201,7 +3211,8 @@ static void __init gather_bootmem_prealloc(void)
 
 		VM_BUG_ON(!hstate_is_gigantic(h));
 		WARN_ON(folio_ref_count(folio) != 1);
-		if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
+		if (prep_compound_gigantic_folio(folio, huge_page_order(h),
+						vmemmap_should_optimize(h, page))) {
 			WARN_ON(folio_test_reserved(folio));
 			prep_new_hugetlb_folio(h, folio, folio_nid(folio));
 			free_huge_page(page); /* add to the hugepage allocator */
@@ -3624,8 +3635,9 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 		subpage = folio_page(folio, i);
 		inner_folio = page_folio(subpage);
 		if (hstate_is_gigantic(target_hstate))
-			prep_compound_gigantic_folio_for_demote(inner_folio,
-							target_hstate->order);
+			prep_compound_gigantic_folio_for_demote(folio,
+							target_hstate->order,
+							vmemmap_should_optimize(target_hstate, subpage));
 		else
 			prep_compound_page(subpage, target_hstate->order);
 		folio_change_private(inner_folio, NULL);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..b721e87de2b3 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -486,7 +486,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
 }
 
 /* Return true iff a HugeTLB whose vmemmap should and can be optimized. */
-static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
+bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
 {
 	if (!READ_ONCE(vmemmap_optimize_enabled))
 		return false;
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 25bd0e002431..3525c514c061 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -57,4 +57,5 @@ static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
 {
 	return hugetlb_vmemmap_optimizable_size(h) != 0;
 }
+bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
 #endif /* _LINUX_HUGETLB_VMEMMAP_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
  2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
@ 2023-07-24 13:46 ` Usama Arif
  2023-07-24 17:33   ` kernel test robot
                     ` (2 more replies)
  2023-07-24 13:46 ` [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 14+ messages in thread
From: Usama Arif @ 2023-07-24 13:46 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, fam.zheng, liangma, simon.evans, punit.agrawal, Usama Arif

This propagates the hugepage size from the memblock APIs
(memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
so that it can be stored in struct memblock_region. This does not
introduce any functional change and hugepage_size is not used in
this commit. It is just setup for the next commit, where hugepage_size
is used to skip initialization of struct pages that will be freed later
when HVO is enabled.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 arch/arm64/mm/kasan_init.c                   |  2 +-
 arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
 arch/powerpc/platforms/pseries/setup.c       |  4 +-
 arch/powerpc/sysdev/dart_iommu.c             |  2 +-
 include/linux/memblock.h                     |  8 ++-
 mm/cma.c                                     |  4 +-
 mm/hugetlb.c                                 |  6 +-
 mm/memblock.c                                | 60 ++++++++++++--------
 mm/mm_init.c                                 |  2 +-
 mm/sparse-vmemmap.c                          |  2 +-
 tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
 11 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index f17d066e85eb..39992a418891 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -50,7 +50,7 @@ static phys_addr_t __init kasan_alloc_raw_page(int node)
 	void *p = memblock_alloc_try_nid_raw(PAGE_SIZE, PAGE_SIZE,
 						__pa(MAX_DMA_ADDRESS),
 						MEMBLOCK_ALLOC_NOLEAKTRACE,
-						node);
+						node, 0);
 	if (!p)
 		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%llx\n",
 		      __func__, PAGE_SIZE, PAGE_SIZE, node,
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index 375487cba874..6963cdf76bce 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -201,7 +201,7 @@ static int __init iob_init(struct device_node *dn)
 	/* For 2G space, 8x64 pages (2^21 bytes) is max total l2 size */
 	iob_l2_base = memblock_alloc_try_nid_raw(1UL << 21, 1UL << 21,
 					MEMBLOCK_LOW_LIMIT, 0x80000000,
-					NUMA_NO_NODE);
+					NUMA_NO_NODE, 0);
 	if (!iob_l2_base)
 		panic("%s: Failed to allocate %lu bytes align=0x%lx max_addr=%x\n",
 		      __func__, 1UL << 21, 1UL << 21, 0x80000000);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e2a57cfa6c83..cec7198b59d2 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -160,7 +160,7 @@ static void __init fwnmi_init(void)
 	 */
 	mce_data_buf = memblock_alloc_try_nid_raw(RTAS_ERROR_LOG_MAX * nr_cpus,
 					RTAS_ERROR_LOG_MAX, MEMBLOCK_LOW_LIMIT,
-					ppc64_rma_size, NUMA_NO_NODE);
+					ppc64_rma_size, NUMA_NO_NODE, 0);
 	if (!mce_data_buf)
 		panic("Failed to allocate %d bytes below %pa for MCE buffer\n",
 		      RTAS_ERROR_LOG_MAX * nr_cpus, &ppc64_rma_size);
@@ -176,7 +176,7 @@ static void __init fwnmi_init(void)
 		size = sizeof(struct slb_entry) * mmu_slb_size * nr_cpus;
 		slb_ptr = memblock_alloc_try_nid_raw(size,
 				sizeof(struct slb_entry), MEMBLOCK_LOW_LIMIT,
-				ppc64_rma_size, NUMA_NO_NODE);
+				ppc64_rma_size, NUMA_NO_NODE, 0);
 		if (!slb_ptr)
 			panic("Failed to allocate %zu bytes below %pa for slb area\n",
 			      size, &ppc64_rma_size);
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 98096bbfd62e..86c676b61899 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -239,7 +239,7 @@ static void __init allocate_dart(void)
 	 */
 	dart_tablebase = memblock_alloc_try_nid_raw(SZ_16M, SZ_16M,
 					MEMBLOCK_LOW_LIMIT, SZ_2G,
-					NUMA_NO_NODE);
+					NUMA_NO_NODE, 0);
 	if (!dart_tablebase)
 		panic("Failed to allocate 16MB below 2GB for DART table\n");
 
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f71ff9f0ec81..bb8019540d73 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -63,6 +63,7 @@ struct memblock_region {
 #ifdef CONFIG_NUMA
 	int nid;
 #endif
+	phys_addr_t hugepage_size;
 };
 
 /**
@@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
 				      phys_addr_t start, phys_addr_t end);
 phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
 				      phys_addr_t align, phys_addr_t start,
-				      phys_addr_t end, int nid, bool exact_nid);
+				      phys_addr_t end, int nid, bool exact_nid,
+				      phys_addr_t hugepage_size);
 phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
 
 static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
@@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
 				 int nid);
 void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
 				 phys_addr_t min_addr, phys_addr_t max_addr,
-				 int nid);
+				 int nid, phys_addr_t hugepage_size);
 void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
 			     phys_addr_t min_addr, phys_addr_t max_addr,
 			     int nid);
@@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
 {
 	return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
 					  MEMBLOCK_ALLOC_ACCESSIBLE,
-					  NUMA_NO_NODE);
+					  NUMA_NO_NODE, 0);
 }
 
 static inline void *memblock_alloc_from(phys_addr_t size,
diff --git a/mm/cma.c b/mm/cma.c
index a4cfe995e11e..a270905aa7f2 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -334,7 +334,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 		if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
 			memblock_set_bottom_up(true);
 			addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
-							limit, nid, true);
+							limit, nid, true, 0);
 			memblock_set_bottom_up(false);
 		}
 #endif
@@ -353,7 +353,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 
 		if (!addr) {
 			addr = memblock_alloc_range_nid(size, alignment, base,
-					limit, nid, true);
+					limit, nid, true, 0);
 			if (!addr) {
 				ret = -ENOMEM;
 				goto err;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 24352abbb9e5..5ba7fd702458 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3168,7 +3168,8 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
 		m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
-				0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+				0, MEMBLOCK_ALLOC_ACCESSIBLE, nid,
+				hugetlb_vmemmap_optimizable(h) ? huge_page_size(h) : 0);
 		if (!m)
 			return 0;
 		goto found;
@@ -3177,7 +3178,8 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
-				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
+				0, MEMBLOCK_ALLOC_ACCESSIBLE, node,
+				hugetlb_vmemmap_optimizable(h) ? huge_page_size(h) : 0);
 		/*
 		 * Use the beginning of the huge page to store the
 		 * huge_bootmem_page struct (until gather_bootmem
diff --git a/mm/memblock.c b/mm/memblock.c
index f9e61e565a53..e92d437bcb51 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -549,7 +549,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 						   int idx, phys_addr_t base,
 						   phys_addr_t size,
 						   int nid,
-						   enum memblock_flags flags)
+						   enum memblock_flags flags,
+						   phys_addr_t hugepage_size)
 {
 	struct memblock_region *rgn = &type->regions[idx];
 
@@ -558,6 +559,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	rgn->base = base;
 	rgn->size = size;
 	rgn->flags = flags;
+	rgn->hugepage_size = hugepage_size;
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
@@ -581,7 +583,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  */
 static int __init_memblock memblock_add_range(struct memblock_type *type,
 				phys_addr_t base, phys_addr_t size,
-				int nid, enum memblock_flags flags)
+				int nid, enum memblock_flags flags, phys_addr_t hugepage_size)
 {
 	bool insert = false;
 	phys_addr_t obase = base;
@@ -598,6 +600,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
 		type->regions[0].base = base;
 		type->regions[0].size = size;
 		type->regions[0].flags = flags;
+		type->regions[0].hugepage_size = hugepage_size;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
 		return 0;
@@ -646,7 +649,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
 				end_rgn = idx + 1;
 				memblock_insert_region(type, idx++, base,
 						       rbase - base, nid,
-						       flags);
+						       flags, hugepage_size);
 			}
 		}
 		/* area below @rend is dealt with, forget about it */
@@ -661,7 +664,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
 				start_rgn = idx;
 			end_rgn = idx + 1;
 			memblock_insert_region(type, idx, base, end - base,
-					       nid, flags);
+					       nid, flags, hugepage_size);
 		}
 	}
 
@@ -705,7 +708,7 @@ int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
 	memblock_dbg("%s: [%pa-%pa] nid=%d flags=%x %pS\n", __func__,
 		     &base, &end, nid, flags, (void *)_RET_IP_);
 
-	return memblock_add_range(&memblock.memory, base, size, nid, flags);
+	return memblock_add_range(&memblock.memory, base, size, nid, flags, 0);
 }
 
 /**
@@ -726,7 +729,7 @@ int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
-	return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
+	return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0, 0);
 }
 
 /**
@@ -782,7 +785,7 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			type->total_size -= base - rbase;
 			memblock_insert_region(type, idx, rbase, base - rbase,
 					       memblock_get_region_node(rgn),
-					       rgn->flags);
+					       rgn->flags, 0);
 		} else if (rend > end) {
 			/*
 			 * @rgn intersects from above.  Split and redo the
@@ -793,7 +796,7 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			type->total_size -= end - rbase;
 			memblock_insert_region(type, idx--, rbase, end - rbase,
 					       memblock_get_region_node(rgn),
-					       rgn->flags);
+					       rgn->flags, 0);
 		} else {
 			/* @rgn is fully contained, record it */
 			if (!*end_rgn)
@@ -863,14 +866,20 @@ int __init_memblock memblock_phys_free(phys_addr_t base, phys_addr_t size)
 	return memblock_remove_range(&memblock.reserved, base, size);
 }
 
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
+					   phys_addr_t hugepage_size)
 {
 	phys_addr_t end = base + size - 1;
 
 	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
-	return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0);
+	return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0, hugepage_size);
+}
+
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_reserve_huge(base, size, 0);
 }
 
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
@@ -881,7 +890,7 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
 	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
-	return memblock_add_range(&physmem, base, size, MAX_NUMNODES, 0);
+	return memblock_add_range(&physmem, base, size, MAX_NUMNODES, 0, 0);
 }
 #endif
 
@@ -1365,6 +1374,7 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
  * @end: the upper bound of the memory region to allocate (phys address)
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  * @exact_nid: control the allocation fall back to other nodes
+ * @hugepage_size: size of the hugepages in bytes
  *
  * The allocation is performed from memory region limited by
  * memblock.current_limit if @end == %MEMBLOCK_ALLOC_ACCESSIBLE.
@@ -1385,7 +1395,7 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
 phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 					phys_addr_t align, phys_addr_t start,
 					phys_addr_t end, int nid,
-					bool exact_nid)
+					bool exact_nid, phys_addr_t hugepage_size)
 {
 	enum memblock_flags flags = choose_memblock_flags();
 	phys_addr_t found;
@@ -1402,14 +1412,14 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 again:
 	found = memblock_find_in_range_node(size, align, start, end, nid,
 					    flags);
-	if (found && !memblock_reserve(found, size))
+	if (found && !memblock_reserve_huge(found, size, hugepage_size))
 		goto done;
 
 	if (nid != NUMA_NO_NODE && !exact_nid) {
 		found = memblock_find_in_range_node(size, align, start,
 						    end, NUMA_NO_NODE,
 						    flags);
-		if (found && !memblock_reserve(found, size))
+		if (found && !memblock_reserve_huge(found, size, hugepage_size))
 			goto done;
 	}
 
@@ -1469,7 +1479,7 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
 		     __func__, (u64)size, (u64)align, &start, &end,
 		     (void *)_RET_IP_);
 	return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
-					false);
+					false, 0);
 }
 
 /**
@@ -1488,7 +1498,7 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
 phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 {
 	return memblock_alloc_range_nid(size, align, 0,
-					MEMBLOCK_ALLOC_ACCESSIBLE, nid, false);
+					MEMBLOCK_ALLOC_ACCESSIBLE, nid, false, 0);
 }
 
 /**
@@ -1514,7 +1524,7 @@ phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t ali
 static void * __init memblock_alloc_internal(
 				phys_addr_t size, phys_addr_t align,
 				phys_addr_t min_addr, phys_addr_t max_addr,
-				int nid, bool exact_nid)
+				int nid, bool exact_nid, phys_addr_t hugepage_size)
 {
 	phys_addr_t alloc;
 
@@ -1530,12 +1540,12 @@ static void * __init memblock_alloc_internal(
 		max_addr = memblock.current_limit;
 
 	alloc = memblock_alloc_range_nid(size, align, min_addr, max_addr, nid,
-					exact_nid);
+					exact_nid, hugepage_size);
 
 	/* retry allocation without lower limit */
 	if (!alloc && min_addr)
 		alloc = memblock_alloc_range_nid(size, align, 0, max_addr, nid,
-						exact_nid);
+						exact_nid, hugepage_size);
 
 	if (!alloc)
 		return NULL;
@@ -1571,7 +1581,7 @@ void * __init memblock_alloc_exact_nid_raw(
 		     &max_addr, (void *)_RET_IP_);
 
 	return memblock_alloc_internal(size, align, min_addr, max_addr, nid,
-				       true);
+				       true, 0);
 }
 
 /**
@@ -1585,25 +1595,29 @@ void * __init memblock_alloc_exact_nid_raw(
  *	      is preferred (phys address), or %MEMBLOCK_ALLOC_ACCESSIBLE to
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @hugepage_size: size of the hugepages in bytes
  *
  * Public function, provides additional debug information (including caller
  * info), if enabled. Does not zero allocated memory, does not panic if request
  * cannot be satisfied.
  *
+ * If hugepage_size is not 0 and HVO is enabled, then only the struct pages
+ * that are not freed by HVO are initialized using the hugepage_size parameter.
+ *
  * Return:
  * Virtual address of allocated memory block on success, NULL on failure.
  */
 void * __init memblock_alloc_try_nid_raw(
 			phys_addr_t size, phys_addr_t align,
 			phys_addr_t min_addr, phys_addr_t max_addr,
-			int nid)
+			int nid, phys_addr_t hugepage_size)
 {
 	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
 		     __func__, (u64)size, (u64)align, nid, &min_addr,
 		     &max_addr, (void *)_RET_IP_);
 
 	return memblock_alloc_internal(size, align, min_addr, max_addr, nid,
-				       false);
+				       false, hugepage_size);
 }
 
 /**
@@ -1634,7 +1648,7 @@ void * __init memblock_alloc_try_nid(
 		     __func__, (u64)size, (u64)align, nid, &min_addr,
 		     &max_addr, (void *)_RET_IP_);
 	ptr = memblock_alloc_internal(size, align,
-					   min_addr, max_addr, nid, false);
+					   min_addr, max_addr, nid, false, 0);
 	if (ptr)
 		memset(ptr, 0, size);
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a1963c3322af..c36d768bb671 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1615,7 +1615,7 @@ void __init *memmap_alloc(phys_addr_t size, phys_addr_t align,
 	else
 		ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
 						 MEMBLOCK_ALLOC_ACCESSIBLE,
-						 nid);
+						 nid, 0);
 
 	if (ptr && size > 0)
 		page_init_poison(ptr, size);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a044a130405b..56b8b8e684df 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -43,7 +43,7 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long goal)
 {
 	return memblock_alloc_try_nid_raw(size, align, goal,
-					       MEMBLOCK_ALLOC_ACCESSIBLE, node);
+					       MEMBLOCK_ALLOC_ACCESSIBLE, node, 0);
 }
 
 void * __meminit vmemmap_alloc_block(unsigned long size, int node)
diff --git a/tools/testing/memblock/tests/alloc_nid_api.c b/tools/testing/memblock/tests/alloc_nid_api.c
index 49bb416d34ff..225044366fbb 100644
--- a/tools/testing/memblock/tests/alloc_nid_api.c
+++ b/tools/testing/memblock/tests/alloc_nid_api.c
@@ -43,7 +43,7 @@ static inline void *run_memblock_alloc_nid(phys_addr_t size,
 						    max_addr, nid);
 	if (alloc_nid_test_flags & TEST_F_RAW)
 		return memblock_alloc_try_nid_raw(size, align, min_addr,
-						  max_addr, nid);
+						  max_addr, nid, 0);
 	return memblock_alloc_try_nid(size, align, min_addr, max_addr, nid);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it
  2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
  2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
  2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
@ 2023-07-24 13:46 ` Usama Arif
  2023-07-24 13:46 ` [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO Usama Arif
  2023-07-26 10:34 ` [RFC 0/4] mm/memblock: Skip prep and " Usama Arif
  4 siblings, 0 replies; 14+ messages in thread
From: Usama Arif @ 2023-07-24 13:46 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, fam.zheng, liangma, simon.evans, punit.agrawal, Usama Arif

If tail page prep and initialization is skipped, then the struct page
at "start" will not contain the correct nid. Use the nid from the first
vmemmap page instead.
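
For context, a rough sketch (illustrative only, not part of this patch)
of how the HVO code picks these addresses; names follow
hugetlb_vmemmap_optimize in hugetlb_vmemmap.c only loosely:

	/*
	 * Illustrative only: the reuse address covers the head page's
	 * vmemmap, whose struct pages are initialized, while the range
	 * starting at vmemmap_start maps the tail struct pages that
	 * this series leaves uninitialized, so its nid can't be trusted.
	 */
	vmemmap_reuse = vmemmap_start;			/* head struct page, nid valid */
	vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;	/* tail struct pages start here */
	vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse);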

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index b721e87de2b3..bdf750a4786b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -324,7 +324,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 		.reuse_addr	= reuse,
 		.vmemmap_pages	= &vmemmap_pages,
 	};
-	int nid = page_to_nid((struct page *)start);
+	int nid = page_to_nid((struct page *)reuse);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY |
 			__GFP_NOWARN;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO
  2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
                   ` (2 preceding siblings ...)
  2023-07-24 13:46 ` [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
@ 2023-07-24 13:46 ` Usama Arif
  2023-07-24 18:26   ` kernel test robot
  2023-07-26 10:34 ` [RFC 0/4] mm/memblock: Skip prep and " Usama Arif
  4 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2023-07-24 13:46 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, fam.zheng, liangma, simon.evans, punit.agrawal, Usama Arif

If the region is for hugepages and if HVO is enabled, then those
struct pages which will be freed later don't need to be initialized.
This can save significant time when a large number of hugepages are
allocated at boot time. As memmap_init_reserved_pages is only called at
boot time, we don't need to worry about memory hotplug.

Hugepage regions are kept separate from non-hugepage regions in
memblock_merge_regions so that initialization of the unused struct pages
can be skipped for the entire region.
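
As a rough worked example (assuming 4K base pages, a 64-byte struct
page, and HUGETLB_VMEMMAP_RESERVE_SIZE == PAGE_SIZE): for each 1G
hugepage in such a region, only the physical range

  [start, start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page))
  = [start, start + 4096 * 64) = [start, start + 256K)

is passed to reserve_bootmem_region, i.e. only the 64 struct pages that
HVO keeps are initialized; the remaining 262080 struct pages of that
hugepage are never touched.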

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/hugetlb_vmemmap.c |  2 +-
 mm/hugetlb_vmemmap.h |  3 +++
 mm/memblock.c        | 27 ++++++++++++++++++++++-----
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index bdf750a4786b..b5b7834e0f42 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -443,7 +443,7 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
-static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
+bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
 core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);
 
 /**
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 3525c514c061..8b9a1563f7b9 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -58,4 +58,7 @@ static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
 	return hugetlb_vmemmap_optimizable_size(h) != 0;
 }
 bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
+
+extern bool vmemmap_optimize_enabled;
+
 #endif /* _LINUX_HUGETLB_VMEMMAP_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index e92d437bcb51..62072a0226de 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -21,6 +21,7 @@
 #include <linux/io.h>
 
 #include "internal.h"
+#include "hugetlb_vmemmap.h"
 
 #define INIT_MEMBLOCK_REGIONS			128
 #define INIT_PHYSMEM_REGIONS			4
@@ -519,7 +520,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type,
 		if (this->base + this->size != next->base ||
 		    memblock_get_region_node(this) !=
 		    memblock_get_region_node(next) ||
-		    this->flags != next->flags) {
+		    this->flags != next->flags ||
+		    this->hugepage_size != next->hugepage_size) {
 			BUG_ON(this->base + this->size > next->base);
 			i++;
 			continue;
@@ -2125,10 +2127,25 @@ static void __init memmap_init_reserved_pages(void)
 	/* initialize struct pages for the reserved regions */
 	for_each_reserved_mem_region(region) {
 		nid = memblock_get_region_node(region);
-		start = region->base;
-		end = start + region->size;
-
-		reserve_bootmem_region(start, end, nid);
+		/*
+		 * If the region is for hugepages and if HVO is enabled, then those
+		 * struct pages which will be freed later don't need to be initialized.
+		 * This can save significant time when a large number of hugepages are
+		 * allocated at boot time. As this is at boot time, we don't need to
+		 * worry about memory hotplug.
+		 */
+		if (region->hugepage_size && vmemmap_optimize_enabled) {
+			for (start = region->base;
+			    start < region->base + region->size;
+			    start += region->hugepage_size) {
+				end = start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
+				reserve_bootmem_region(start, end, nid);
+			}
+		} else {
+			start = region->base;
+			end = start + region->size;
+			reserve_bootmem_region(start, end, nid);
+		}
 	}
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled
  2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
@ 2023-07-24 17:33   ` kernel test robot
  0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-07-24 17:33 UTC (permalink / raw)
  To: Usama Arif; +Cc: oe-kbuild-all

Hi Usama,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230724-214832
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230724134644.1299963-2-usama.arif%40bytedance.com
patch subject: [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled
config: sparc-randconfig-r036-20230724 (https://download.01.org/0day-ci/archive/20230725/202307250114.8KlXie9d-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230725/202307250114.8KlXie9d-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307250114.8KlXie9d-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/hugetlb.c: In function '__prep_compound_gigantic_folio':
>> mm/hugetlb.c:1988:28: error: 'HUGETLB_VMEMMAP_RESERVE_SIZE' undeclared (first use in this function)
    1988 |                 nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
         |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/hugetlb.c:1988:28: note: each undeclared identifier is reported only once for each function it appears in


vim +/HUGETLB_VMEMMAP_RESERVE_SIZE +1988 mm/hugetlb.c

  1972	
  1973	static bool __prep_compound_gigantic_folio(struct folio *folio,
  1974						unsigned int order, bool demote,
  1975						bool hugetlb_vmemmap_optimizable)
  1976	{
  1977		int i, j;
  1978		int nr_pages = 1 << order;
  1979		struct page *p;
  1980	
  1981		__folio_clear_reserved(folio);
  1982	
  1983		/*
  1984		 * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize
  1985		 * in prep_new_huge_page. Hence, reduce nr_pages to the pages that will be kept.
  1986		 */
  1987		if (hugetlb_vmemmap_optimizable)
> 1988			nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
  1989	
  1990		for (i = 0; i < nr_pages; i++) {
  1991			p = folio_page(folio, i);
  1992	
  1993			/*
  1994			 * For gigantic hugepages allocated through bootmem at
  1995			 * boot, it's safer to be consistent with the not-gigantic
  1996			 * hugepages and clear the PG_reserved bit from all tail pages
  1997			 * too.  Otherwise drivers using get_user_pages() to access tail
  1998			 * pages may get the reference counting wrong if they see
  1999			 * PG_reserved set on a tail page (despite the head page not
  2000			 * having PG_reserved set).  Enforcing this consistency between
  2001			 * head and tail pages allows drivers to optimize away a check
  2002			 * on the head page when they need know if put_page() is needed
  2003			 * after get_user_pages().
  2004			 */
  2005			if (i != 0)	/* head page cleared above */
  2006				__ClearPageReserved(p);
  2007			/*
  2008			 * Subtle and very unlikely
  2009			 *
  2010			 * Gigantic 'page allocators' such as memblock or cma will
  2011			 * return a set of pages with each page ref counted.  We need
  2012			 * to turn this set of pages into a compound page with tail
  2013			 * page ref counts set to zero.  Code such as speculative page
  2014			 * cache adding could take a ref on a 'to be' tail page.
  2015			 * We need to respect any increased ref count, and only set
  2016			 * the ref count to zero if count is currently 1.  If count
  2017			 * is not 1, we return an error.  An error return indicates
  2018			 * the set of pages can not be converted to a gigantic page.
  2019			 * The caller who allocated the pages should then discard the
  2020			 * pages using the appropriate free interface.
  2021			 *
  2022			 * In the case of demote, the ref count will be zero.
  2023			 */
  2024			if (!demote) {
  2025				if (!page_ref_freeze(p, 1)) {
  2026					pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
  2027					goto out_error;
  2028				}
  2029			} else {
  2030				VM_BUG_ON_PAGE(page_count(p), p);
  2031			}
  2032			if (i != 0)
  2033				set_compound_head(p, &folio->page);
  2034		}
  2035		__folio_set_head(folio);
  2036		/* we rely on prep_new_hugetlb_folio to set the destructor */
  2037		folio_set_order(folio, order);
  2038		atomic_set(&folio->_entire_mapcount, -1);
  2039		atomic_set(&folio->_nr_pages_mapped, 0);
  2040		atomic_set(&folio->_pincount, 0);
  2041		return true;
  2042	
  2043	out_error:
  2044		/* undo page modifications made above */
  2045		for (j = 0; j < i; j++) {
  2046			p = folio_page(folio, j);
  2047			if (j != 0)
  2048				clear_compound_head(p);
  2049			set_page_refcounted(p);
  2050		}
  2051		/* need to clear PG_reserved on remaining tail pages  */
  2052		for (; j < nr_pages; j++) {
  2053			p = folio_page(folio, j);
  2054			__ClearPageReserved(p);
  2055		}
  2056		return false;
  2057	}
  2058	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
@ 2023-07-24 17:33   ` kernel test robot
  2023-07-24 17:44   ` kernel test robot
  2023-07-26 11:01   ` Mike Rapoport
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-07-24 17:33 UTC (permalink / raw)
  To: Usama Arif; +Cc: llvm, oe-kbuild-all

Hi Usama,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230724-214832
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230724134644.1299963-3-usama.arif%40bytedance.com
patch subject: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20230725/202307250139.EDNyQbWQ-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230725/202307250139.EDNyQbWQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307250139.EDNyQbWQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from mm/memblock.c:18:
   In file included from include/linux/memblock.h:13:
   In file included from arch/um/include/asm/dma.h:5:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     547 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     560 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from mm/memblock.c:18:
   In file included from include/linux/memblock.h:13:
   In file included from arch/um/include/asm/dma.h:5:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from mm/memblock.c:18:
   In file included from include/linux/memblock.h:13:
   In file included from arch/um/include/asm/dma.h:5:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     692 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     700 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     708 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     717 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     726 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     735 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
>> mm/memblock.c:869:21: warning: no previous prototype for function 'memblock_reserve_huge' [-Wmissing-prototypes]
     869 | int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
         |                     ^
   mm/memblock.c:869:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     869 | int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
         | ^
         | static 
   13 warnings generated.


vim +/memblock_reserve_huge +869 mm/memblock.c

   868	
 > 869	int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
   870						   phys_addr_t hugepage_size)
   871	{
   872		phys_addr_t end = base + size - 1;
   873	
   874		memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
   875			     &base, &end, (void *)_RET_IP_);
   876	
   877		return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0, hugepage_size);
   878	}
   879	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
  2023-07-24 17:33   ` kernel test robot
@ 2023-07-24 17:44   ` kernel test robot
  2023-07-26 11:01   ` Mike Rapoport
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-07-24 17:44 UTC (permalink / raw)
  To: Usama Arif; +Cc: llvm, oe-kbuild-all

Hi Usama,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230724-214832
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230724134644.1299963-3-usama.arif%40bytedance.com
patch subject: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
config: riscv-randconfig-r042-20230724 (https://download.01.org/0day-ci/archive/20230725/202307250135.1cG4zZN4-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230725/202307250135.1cG4zZN4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307250135.1cG4zZN4-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> mm/cma.c:350:37: error: too few arguments to function call, expected 7, have 6
     350 |                         addr = memblock_alloc_range_nid(size, alignment,
         |                                ~~~~~~~~~~~~~~~~~~~~~~~~
     351 |                                         highmem_start, limit, nid, true);
         |                                                                        ^
   include/linux/memblock.h:402:13: note: 'memblock_alloc_range_nid' declared here
     402 | phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
         |             ^
   1 error generated.
--
>> mm/memblock.c:869:21: warning: no previous prototype for function 'memblock_reserve_huge' [-Wmissing-prototypes]
     869 | int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
         |                     ^
   mm/memblock.c:869:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     869 | int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
         | ^
         | static 
   1 warning generated.


vim +350 mm/cma.c

df2ff39e78da74d Roman Gushchin   2021-02-25  341  
148aa87e4f631e9 Levi Yun         2023-01-18  342  		/*
148aa87e4f631e9 Levi Yun         2023-01-18  343  		 * All pages in the reserved area must come from the same zone.
148aa87e4f631e9 Levi Yun         2023-01-18  344  		 * If the requested region crosses the low/high memory boundary,
148aa87e4f631e9 Levi Yun         2023-01-18  345  		 * try allocating from high memory first and fall back to low
148aa87e4f631e9 Levi Yun         2023-01-18  346  		 * memory in case of failure.
148aa87e4f631e9 Levi Yun         2023-01-18  347  		 */
148aa87e4f631e9 Levi Yun         2023-01-18  348  		if (!addr && base < highmem_start && limit > highmem_start) {
148aa87e4f631e9 Levi Yun         2023-01-18  349  			addr = memblock_alloc_range_nid(size, alignment,
148aa87e4f631e9 Levi Yun         2023-01-18 @350  					highmem_start, limit, nid, true);
148aa87e4f631e9 Levi Yun         2023-01-18  351  			limit = highmem_start;
148aa87e4f631e9 Levi Yun         2023-01-18  352  		}
148aa87e4f631e9 Levi Yun         2023-01-18  353  
16195ddd4ebcc10 Laurent Pinchart 2014-10-24  354  		if (!addr) {
8676af1ff2d28e6 Aslan Bakirov    2020-04-10  355  			addr = memblock_alloc_range_nid(size, alignment, base,
7a3e1836a1ea68e Usama Arif       2023-07-24  356  					limit, nid, true, 0);
a254129e8686bff Joonsoo Kim      2014-08-06  357  			if (!addr) {
a254129e8686bff Joonsoo Kim      2014-08-06  358  				ret = -ENOMEM;
a254129e8686bff Joonsoo Kim      2014-08-06  359  				goto err;
a254129e8686bff Joonsoo Kim      2014-08-06  360  			}
a254129e8686bff Joonsoo Kim      2014-08-06  361  		}
a254129e8686bff Joonsoo Kim      2014-08-06  362  
620951e2745750d Thierry Reding   2014-12-12  363  		/*
620951e2745750d Thierry Reding   2014-12-12  364  		 * kmemleak scans/reads tracked objects for pointers to other
620951e2745750d Thierry Reding   2014-12-12  365  		 * objects but this address isn't mapped and accessible
620951e2745750d Thierry Reding   2014-12-12  366  		 */
9099daed9c6991a Catalin Marinas  2016-10-11  367  		kmemleak_ignore_phys(addr);
16195ddd4ebcc10 Laurent Pinchart 2014-10-24  368  		base = addr;
16195ddd4ebcc10 Laurent Pinchart 2014-10-24  369  	}
16195ddd4ebcc10 Laurent Pinchart 2014-10-24  370  
f318dd083c8128c Laura Abbott     2017-04-18  371  	ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma);
de9e14eebf33a60 Marek Szyprowski 2014-10-13  372  	if (ret)
0d3bd18a5efd660 Peng Fan         2019-03-05  373  		goto free_mem;
a254129e8686bff Joonsoo Kim      2014-08-06  374  
56fa4f609badbe4 Laurent Pinchart 2014-10-24  375  	pr_info("Reserved %ld MiB at %pa\n", (unsigned long)size / SZ_1M,
56fa4f609badbe4 Laurent Pinchart 2014-10-24  376  		&base);
a254129e8686bff Joonsoo Kim      2014-08-06  377  	return 0;
a254129e8686bff Joonsoo Kim      2014-08-06  378  
0d3bd18a5efd660 Peng Fan         2019-03-05  379  free_mem:
3ecc68349bbab6b Mike Rapoport    2021-11-05  380  	memblock_phys_free(base, size);
a254129e8686bff Joonsoo Kim      2014-08-06  381  err:
0de9d2ebe590f92 Joonsoo Kim      2014-08-06  382  	pr_err("Failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
a254129e8686bff Joonsoo Kim      2014-08-06  383  	return ret;
a254129e8686bff Joonsoo Kim      2014-08-06  384  }
a254129e8686bff Joonsoo Kim      2014-08-06  385  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO
  2023-07-24 13:46 ` [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO Usama Arif
@ 2023-07-24 18:26   ` kernel test robot
  0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-07-24 18:26 UTC (permalink / raw)
  To: Usama Arif; +Cc: llvm, oe-kbuild-all

Hi Usama,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230724-214832
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230724134644.1299963-5-usama.arif%40bytedance.com
patch subject: [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20230725/202307250246.x2VHKOEo-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230725/202307250246.x2VHKOEo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307250246.x2VHKOEo-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from mm/memblock.c:18:
   In file included from include/linux/memblock.h:13:
   In file included from arch/um/include/asm/dma.h:5:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     547 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     560 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from mm/memblock.c:18:
   In file included from include/linux/memblock.h:13:
   In file included from arch/um/include/asm/dma.h:5:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from mm/memblock.c:18:
   In file included from include/linux/memblock.h:13:
   In file included from arch/um/include/asm/dma.h:5:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     692 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     700 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     708 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     717 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     726 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     735 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   mm/memblock.c:871:21: warning: no previous prototype for function 'memblock_reserve_huge' [-Wmissing-prototypes]
     871 | int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
         |                     ^
   mm/memblock.c:871:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     871 | int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
         | ^
         | static 
>> mm/memblock.c:2141:19: error: use of undeclared identifier 'HUGETLB_VMEMMAP_RESERVE_SIZE'
    2141 |                                 end = start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
         |                                               ^
   13 warnings and 1 error generated.


vim +/HUGETLB_VMEMMAP_RESERVE_SIZE +2141 mm/memblock.c

  2105	
  2106	static void __init memmap_init_reserved_pages(void)
  2107	{
  2108		struct memblock_region *region;
  2109		phys_addr_t start, end;
  2110		int nid;
  2111	
  2112		/*
  2113		 * set nid on all reserved pages and also treat struct
  2114		 * pages for the NOMAP regions as PageReserved
  2115		 */
  2116		for_each_mem_region(region) {
  2117			nid = memblock_get_region_node(region);
  2118			start = region->base;
  2119			end = start + region->size;
  2120	
  2121			if (memblock_is_nomap(region))
  2122				reserve_bootmem_region(start, end, nid);
  2123	
  2124			memblock_set_node(start, end, &memblock.reserved, nid);
  2125		}
  2126	
  2127		/* initialize struct pages for the reserved regions */
  2128		for_each_reserved_mem_region(region) {
  2129			nid = memblock_get_region_node(region);
  2130			/*
  2131			 * If the region is for hugepages and if HVO is enabled, then those
  2132			 * struct pages which will be freed later don't need to be initialized.
  2133			 * This can save significant time when a large number of hugepages are
  2134			 * allocated at boot time. As this is at boot time, we don't need to
  2135			 * worry about memory hotplug.
  2136			 */
  2137			if (region->hugepage_size && vmemmap_optimize_enabled) {
  2138				for (start = region->base;
  2139				    start < region->base + region->size;
  2140				    start += region->hugepage_size) {
> 2141					end = start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
  2142					reserve_bootmem_region(start, end, nid);
  2143				}
  2144			} else {
  2145				start = region->base;
  2146				end = start + region->size;
  2147				reserve_bootmem_region(start, end, nid);
  2148			}
  2149		}
  2150	}
  2151	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO
  2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
                   ` (3 preceding siblings ...)
  2023-07-24 13:46 ` [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO Usama Arif
@ 2023-07-26 10:34 ` Usama Arif
  4 siblings, 0 replies; 14+ messages in thread
From: Usama Arif @ 2023-07-26 10:34 UTC (permalink / raw)
  To: linux-mm, muchun.song, mike.kravetz, rppt
  Cc: linux-kernel, fam.zheng, liangma, simon.evans, punit.agrawal



On 24/07/2023 14:46, Usama Arif wrote:
> If the region is for gigantic hugepages and if HVO is enabled, then those
> struct pages which will be freed later by HVO don't need to be prepared and
> initialized. This can save significant time when a large number of hugepages
> are allocated at boot time.
> 
> For a 1G hugepage, this series avoid initialization and preparation of
> 262144 - 64 = 262080 struct pages per hugepage.
> 
> When tested on a 512G system (which can allocate max 500 1G hugepages), the
> kexec-boot time with HVO and DEFERRED_STRUCT_PAGE_INIT enabled without this
> patchseries to running init is 3.9 seconds. With this patch it is 1.2 seconds.
> This represents an approximately 70% reduction in boot time and will
> significantly reduce server downtime when using a large number of
> gigantic pages.

There were a few errors reported by the kernel test robot when different
config options were used (CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP disabled,
CONFIG_CMA enabled). I will fix these in the next revision if the
community thinks the general approach in the patches is good enough to
start review.
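
As a rough sketch (not the actual fix I will post), the
HUGETLB_VMEMMAP_RESERVE_SIZE build error could be avoided by compiling
the HVO fast path out of memmap_init_reserved_pages() when the config
option is off, e.g.:

        /*
         * Illustrative only: compile the HVO fast path out entirely when
         * CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is off, so that
         * HUGETLB_VMEMMAP_RESERVE_SIZE and vmemmap_optimize_enabled are
         * never referenced in such a build. This runs inside the
         * for_each_reserved_mem_region() loop, so "continue" skips the
         * full-region init below for the HVO case.
         */
        #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
                if (region->hugepage_size && vmemmap_optimize_enabled) {
                        for (start = region->base;
                             start < region->base + region->size;
                             start += region->hugepage_size) {
                                end = start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
                                reserve_bootmem_region(start, end, nid);
                        }
                        continue;
                }
        #endif
                start = region->base;
                end = start + region->size;
                reserve_bootmem_region(start, end, nid);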

Thanks,
Usama

> 
> Thanks,
> Usama
> 
> Usama Arif (4):
>    mm/hugetlb: Skip prep of tail pages when HVO is enabled
>    mm/memblock: Add hugepage_size member to struct memblock_region
>    mm/hugetlb_vmemmap: Use nid of the head page to reallocate it
>    mm/memblock: Skip initialization of struct pages freed later by HVO
> 
>   arch/arm64/mm/kasan_init.c                   |  2 +-
>   arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
>   arch/powerpc/platforms/pseries/setup.c       |  4 +-
>   arch/powerpc/sysdev/dart_iommu.c             |  2 +-
>   include/linux/memblock.h                     |  8 +-
>   mm/cma.c                                     |  4 +-
>   mm/hugetlb.c                                 | 36 +++++---
>   mm/hugetlb_vmemmap.c                         |  6 +-
>   mm/hugetlb_vmemmap.h                         |  4 +
>   mm/memblock.c                                | 87 +++++++++++++-------
>   mm/mm_init.c                                 |  2 +-
>   mm/sparse-vmemmap.c                          |  2 +-
>   tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
>   13 files changed, 106 insertions(+), 55 deletions(-)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
  2023-07-24 17:33   ` kernel test robot
  2023-07-24 17:44   ` kernel test robot
@ 2023-07-26 11:01   ` Mike Rapoport
  2023-07-26 15:02     ` [External] " Usama Arif
  2 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2023-07-26 11:01 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-mm, muchun.song, mike.kravetz, linux-kernel, fam.zheng,
	liangma, simon.evans, punit.agrawal

On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
> This propagates the hugepage size from the memblock APIs
> (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
> so that it can be stored in struct memblock region. This does not
> introduce any functional change and hugepage_size is not used in
> this commit. It is just a setup for the next commit where huge_pagesize
> is used to skip initialization of struct pages that will be freed later
> when HVO is enabled.
> 
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> ---
>  arch/arm64/mm/kasan_init.c                   |  2 +-
>  arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
>  arch/powerpc/platforms/pseries/setup.c       |  4 +-
>  arch/powerpc/sysdev/dart_iommu.c             |  2 +-
>  include/linux/memblock.h                     |  8 ++-
>  mm/cma.c                                     |  4 +-
>  mm/hugetlb.c                                 |  6 +-
>  mm/memblock.c                                | 60 ++++++++++++--------
>  mm/mm_init.c                                 |  2 +-
>  mm/sparse-vmemmap.c                          |  2 +-
>  tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
>  11 files changed, 56 insertions(+), 38 deletions(-)
> 

[ snip ]

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index f71ff9f0ec81..bb8019540d73 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -63,6 +63,7 @@ struct memblock_region {
>  #ifdef CONFIG_NUMA
>  	int nid;
>  #endif
> +	phys_addr_t hugepage_size;
>  };
>  
>  /**
> @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
>  				      phys_addr_t start, phys_addr_t end);
>  phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
>  				      phys_addr_t align, phys_addr_t start,
> -				      phys_addr_t end, int nid, bool exact_nid);
> +				      phys_addr_t end, int nid, bool exact_nid,
> +				      phys_addr_t hugepage_size);

Rather than adding yet another parameter to memblock_phys_alloc_range() we
can have an API that sets a flag on the reserved regions.
With this the hugetlb reservation code can set a flag when HVO is
enabled and memmap_init_reserved_pages() will skip regions with this flag
set.

>  phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
>  
>  static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
> @@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>  				 int nid);
>  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>  				 phys_addr_t min_addr, phys_addr_t max_addr,
> -				 int nid);
> +				 int nid, phys_addr_t hugepage_size);
>  void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
>  			     phys_addr_t min_addr, phys_addr_t max_addr,
>  			     int nid);
> @@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
>  {
>  	return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE,
> -					  NUMA_NO_NODE);
> +					  NUMA_NO_NODE, 0);
>  }
>  
>  static inline void *memblock_alloc_from(phys_addr_t size,

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-26 11:01   ` Mike Rapoport
@ 2023-07-26 15:02     ` Usama Arif
  2023-07-27  4:30       ` Mike Rapoport
  0 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2023-07-26 15:02 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, muchun.song, mike.kravetz, linux-kernel, fam.zheng,
	liangma, simon.evans, punit.agrawal



On 26/07/2023 12:01, Mike Rapoport wrote:
> On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
>> This propagates the hugepage size from the memblock APIs
>> (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
>> so that it can be stored in struct memblock region. This does not
>> introduce any functional change and hugepage_size is not used in
>> this commit. It is just a setup for the next commit where huge_pagesize
>> is used to skip initialization of struct pages that will be freed later
>> when HVO is enabled.
>>
>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>> ---
>>   arch/arm64/mm/kasan_init.c                   |  2 +-
>>   arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
>>   arch/powerpc/platforms/pseries/setup.c       |  4 +-
>>   arch/powerpc/sysdev/dart_iommu.c             |  2 +-
>>   include/linux/memblock.h                     |  8 ++-
>>   mm/cma.c                                     |  4 +-
>>   mm/hugetlb.c                                 |  6 +-
>>   mm/memblock.c                                | 60 ++++++++++++--------
>>   mm/mm_init.c                                 |  2 +-
>>   mm/sparse-vmemmap.c                          |  2 +-
>>   tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
>>   11 files changed, 56 insertions(+), 38 deletions(-)
>>
> 
> [ snip ]
> 
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f71ff9f0ec81..bb8019540d73 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -63,6 +63,7 @@ struct memblock_region {
>>   #ifdef CONFIG_NUMA
>>   	int nid;
>>   #endif
>> +	phys_addr_t hugepage_size;
>>   };
>>   
>>   /**
>> @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
>>   				      phys_addr_t start, phys_addr_t end);
>>   phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
>>   				      phys_addr_t align, phys_addr_t start,
>> -				      phys_addr_t end, int nid, bool exact_nid);
>> +				      phys_addr_t end, int nid, bool exact_nid,
>> +				      phys_addr_t hugepage_size);
> 
> Rather than adding yet another parameter to memblock_phys_alloc_range() we
> can have an API that sets a flag on the reserved regions.
> With this the hugetlb reservation code can set a flag when HVO is
> enabled and memmap_init_reserved_pages() will skip regions with this flag
> set.
> 

Hi,

Thanks for the review.

I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw 
and not memblock_phys_alloc_range?

My initial approach was to use flags, but I think it looks worse than
what I have done in this RFC. I have pushed the flags prototype at
https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO
(top 4 commits) for reference; the main differences are in patches 2 and
4 of the RFC. The major points are below (the bigger issue is in patch 4):

- (RFC vs flags, patch 2 comparison) In the RFC, hugepage_size is
propagated from memblock_alloc_try_nid_raw through function calls. When
using flags, the "no_init" boolean is propagated from
memblock_alloc_try_nid_raw through function calls until the region flags
are available in memblock_add_range and the new MEMBLOCK_NOINIT flag is
set. I think it is trickier to introduce a new function to set the flag
in the region AFTER the call to memblock_alloc_try_nid_raw has finished,
as the memblock_region cannot be found at that point.
So something (hugepage_size/flag information) still has to be propagated
through function calls and a new argument needs to be added.

- (RFC vs flags, patch 4 comparison) We can't skip initialization of the
whole region, only of the tail pages. We still need to initialize the
HUGETLB_VMEMMAP_RESERVE_SIZE (i.e. PAGE_SIZE) worth of struct pages for
each gigantic page.
In the RFC, hugepage_size from patch 2 was used in the for loop in
memmap_init_reserved_pages in patch 4 to reserve
HUGETLB_VMEMMAP_RESERVE_SIZE worth of struct pages for every
hugepage_size-sized chunk. This looks very simple and not hacky.
If we use a flag, there are 2 ways to initialize the 
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:

1. (implemented in patch 4 of the github branch) memmap_init_reserved_pages
skips the region during initialization as you suggested, and then we
initialize the HUGETLB_VMEMMAP_RESERVE_SIZE worth of struct pages per
hugepage somewhere later (I did it in gather_bootmem_prealloc). When
calling reserve_bootmem_region in gather_bootmem_prealloc, we need to
skip early_page_uninitialised, and this makes it look a bit hacky.

2. We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE worth of struct pages
per hugepage in memmap_init_reserved_pages itself. As we have used a
flag and haven't passed hugepage_size, we need to get the gigantic page
size somehow. There doesn't seem to be a nice way to determine the
gigantic page size in that function, since it is architecture-dependent.
I think the gigantic page size can be given by
PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), but I am not sure whether this is
OK for all architectures. If we can use
PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), it will look much better than
point 1.
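
As an illustration only (assuming PUD-mapped gigantic pages, and using
the MEMBLOCK_NOINIT name from the prototype branch, which is not a
merged API), option 2 inside memmap_init_reserved_pages would look
roughly like:

        /*
         * Sketch of option 2, inside the for_each_reserved_mem_region()
         * loop. Assumes gigantic pages are PUD-mapped and that the
         * prototype MEMBLOCK_NOINIT flag was set on the hugetlb
         * reservations.
         */
        if (region->flags & MEMBLOCK_NOINIT) {
                phys_addr_t gigantic_size = PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT);

                for (start = region->base;
                     start < region->base + region->size;
                     start += gigantic_size) {
                        /* init only the struct pages that HVO will keep */
                        end = start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
                        reserve_bootmem_region(start, end, nid);
                }
                continue;
        }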

Both the RFC patches and the github flags implementation work, but I
think the RFC patches look much cleaner. If there is a strong preference
for the github patches, I can send them to the mailing list.

Thanks,
Usama


>>   phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
>>   
>>   static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
>> @@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>>   				 int nid);
>>   void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>>   				 phys_addr_t min_addr, phys_addr_t max_addr,
>> -				 int nid);
>> +				 int nid, phys_addr_t hugepage_size);
>>   void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
>>   			     phys_addr_t min_addr, phys_addr_t max_addr,
>>   			     int nid);
>> @@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
>>   {
>>   	return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
>>   					  MEMBLOCK_ALLOC_ACCESSIBLE,
>> -					  NUMA_NO_NODE);
>> +					  NUMA_NO_NODE, 0);
>>   }
>>   
>>   static inline void *memblock_alloc_from(phys_addr_t size,
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-26 15:02     ` [External] " Usama Arif
@ 2023-07-27  4:30       ` Mike Rapoport
  2023-07-27 20:56         ` Usama Arif
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2023-07-27  4:30 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-mm, muchun.song, mike.kravetz, linux-kernel, fam.zheng,
	liangma, simon.evans, punit.agrawal

On Wed, Jul 26, 2023 at 04:02:21PM +0100, Usama Arif wrote:
> 
> On 26/07/2023 12:01, Mike Rapoport wrote:
> > On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
> > > This propagates the hugepage size from the memblock APIs
> > > (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
> > > so that it can be stored in struct memblock region. This does not
> > > introduce any functional change and hugepage_size is not used in
> > > this commit. It is just a setup for the next commit where huge_pagesize
> > > is used to skip initialization of struct pages that will be freed later
> > > when HVO is enabled.
> > > 
> > > Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> > > ---
> > >   arch/arm64/mm/kasan_init.c                   |  2 +-
> > >   arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
> > >   arch/powerpc/platforms/pseries/setup.c       |  4 +-
> > >   arch/powerpc/sysdev/dart_iommu.c             |  2 +-
> > >   include/linux/memblock.h                     |  8 ++-
> > >   mm/cma.c                                     |  4 +-
> > >   mm/hugetlb.c                                 |  6 +-
> > >   mm/memblock.c                                | 60 ++++++++++++--------
> > >   mm/mm_init.c                                 |  2 +-
> > >   mm/sparse-vmemmap.c                          |  2 +-
> > >   tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
> > >   11 files changed, 56 insertions(+), 38 deletions(-)
> > > 
> > 
> > [ snip ]
> > 
> > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > > index f71ff9f0ec81..bb8019540d73 100644
> > > --- a/include/linux/memblock.h
> > > +++ b/include/linux/memblock.h
> > > @@ -63,6 +63,7 @@ struct memblock_region {
> > >   #ifdef CONFIG_NUMA
> > >   	int nid;
> > >   #endif
> > > +	phys_addr_t hugepage_size;
> > >   };
> > >   /**
> > > @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
> > >   				      phys_addr_t start, phys_addr_t end);
> > >   phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
> > >   				      phys_addr_t align, phys_addr_t start,
> > > -				      phys_addr_t end, int nid, bool exact_nid);
> > > +				      phys_addr_t end, int nid, bool exact_nid,
> > > +				      phys_addr_t hugepage_size);
> > 
> > Rather than adding yet another parameter to memblock_phys_alloc_range() we
> > can have an API that sets a flag on the reserved regions.
> > With this the hugetlb reservation code can set a flag when HVO is
> > enabled and memmap_init_reserved_pages() will skip regions with this flag
> > set.
> > 
> 
> Hi,
> 
> Thanks for the review.
> 
> I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw and
> not memblock_phys_alloc_range?

Yes.
 
> My initial approach was to use flags, but I think it looks worse than what I
> have done in this RFC (I have pushed the flags prototype at
> https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO,
> top 4 commits for reference (the main difference is patch 2 and 4 from
> RFC)). The major points are (the bigger issue is in patch 4):
> 
> - (RFC vs flags patch 2 comparison) In the RFC, hugepage_size is propagated
> from memblock_alloc_try_nid_raw through function calls. When using flags,
> the "no_init" boolean is propogated from memblock_alloc_try_nid_raw through
> function calls until the region flags are available in memblock_add_range
> and the new MEMBLOCK_NOINIT flag is set. I think its a bit more tricky to
> introduce a new function to set the flag in the region AFTER the call to
> memblock_alloc_try_nid_raw has finished as the memblock_region can not be
> found.
> So something (hugepage_size/flag information) still has to be propagated
> through function calls and a new argument needs to be added.

Sorry if I wasn't clear. I didn't mean adding a flags parameter; I meant
adding a flag and a function that sets this flag for a range. So for
MEMBLOCK_NOINIT there would be

int memblock_mark_noinit(phys_addr_t base, phys_addr_t size);

I'd just name this flag MEMBLOCK_RSRV_NOINIT to make it clear it controls
the reserved regions.

This won't require updating all call sites of memblock_alloc_range_nid()
and memblock_alloc_try_nid_raw() but only a small refactoring of
memblock_setclr_flag() and its callers.
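
Something along these lines, as a sketch only (it assumes
memblock_setclr_flag() is refactored to take the memblock_type to
operate on, which is the small refactoring mentioned above):

/*
 * Sketch, not a final patch: set MEMBLOCK_RSRV_NOINIT on a reserved
 * range so that memmap_init_reserved_pages() can skip it. Assumes
 * memblock_setclr_flag() gains a struct memblock_type parameter
 * (today it only operates on memblock.memory).
 */
int __init_memblock memblock_mark_noinit(phys_addr_t base, phys_addr_t size)
{
        return memblock_setclr_flag(&memblock.reserved, base, size, 1,
                                    MEMBLOCK_RSRV_NOINIT);
}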

> - (RFC vs flags patch 4 comparison) We can't skip initialization of the
> whole region, only the tail pages. We still need to initialize the
> HUGETLB_VMEMMAP_RESERVE_SIZE (PAGE_SIZE) struct pages for each gigantic
> page.
> In the RFC, hugepage_size from patch 2 was used in the for loop in
> memmap_init_reserved_pages in patch 4 to reserve
> HUGETLB_VMEMMAP_RESERVE_SIZE struct pages for every hugepage_size. This
> looks very simple and not hacky.

But this requires having hugetlb details in memblock, which feels backwards
to me.

With memblock_mark_noinit() you can decide in __alloc_bootmem_huge_page()
what parts of a gigantic page should be initialized and mark only the
relevant range as NOINIT.
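
For example, very roughly (a sketch assuming the helper above exists;
which exact range to keep initialized is for the hugetlb code to decide,
the PAGE_SIZE split here is only illustrative):

        /*
         * Sketch of the hugetlb side, inside __alloc_bootmem_huge_page():
         * after the gigantic page has been allocated from memblock, tell
         * memblock not to initialize the struct pages that HVO will free
         * anyway, keeping a small head portion initialized.
         */
        m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
                                       0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
        if (m && IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP))
                memblock_mark_noinit(virt_to_phys(m) + PAGE_SIZE,
                                     huge_page_size(h) - PAGE_SIZE);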

> If we use a flag, there are 2 ways to initialize the
> HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:
> 
> 1. (implemented in github link patch 4) memmap_init_reserved_pages skips the
> region for initialization as you suggested, and then we initialize
> HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage somewhere later (I
> did it in gather_bootmem_prealloc). When calling reserve_bootmem_region in
> gather_bootmem_prealloc, we need to skip early_page_uninitialised and this
> makes it look a bit hacky.
> 
> 2. We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage
> in memmap_init_reserved_pages itself. As we have used a flag and havent
> passed hugepage_size, we need to get the gigantic page size somehow. There
> doesnt seem to be a nice way to determine the gigantic page size in that
> function which is architecture dependent. I think gigantic page size can be
> given by PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), but not sure if this is ok
> for all architectures? If we can use PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT)
> it will look much better than point 1.
> 
> Both the RFC patches and the github flags implementation work, but I think
> RFC patches look much cleaner. If there is a strong preference for the the
> github patches I can send it to mailing list?
> 
> Thanks,
> Usama

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
  2023-07-27  4:30       ` Mike Rapoport
@ 2023-07-27 20:56         ` Usama Arif
  0 siblings, 0 replies; 14+ messages in thread
From: Usama Arif @ 2023-07-27 20:56 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, muchun.song, mike.kravetz, linux-kernel, fam.zheng,
	liangma, simon.evans, punit.agrawal



On 27/07/2023 05:30, Mike Rapoport wrote:
> On Wed, Jul 26, 2023 at 04:02:21PM +0100, Usama Arif wrote:
>>
>> On 26/07/2023 12:01, Mike Rapoport wrote:
>>> On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
>>>> This propagates the hugepage size from the memblock APIs
>>>> (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
>>>> so that it can be stored in struct memblock region. This does not
>>>> introduce any functional change and hugepage_size is not used in
>>>> this commit. It is just a setup for the next commit where huge_pagesize
>>>> is used to skip initialization of struct pages that will be freed later
>>>> when HVO is enabled.
>>>>
>>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>>>> ---
>>>>    arch/arm64/mm/kasan_init.c                   |  2 +-
>>>>    arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
>>>>    arch/powerpc/platforms/pseries/setup.c       |  4 +-
>>>>    arch/powerpc/sysdev/dart_iommu.c             |  2 +-
>>>>    include/linux/memblock.h                     |  8 ++-
>>>>    mm/cma.c                                     |  4 +-
>>>>    mm/hugetlb.c                                 |  6 +-
>>>>    mm/memblock.c                                | 60 ++++++++++++--------
>>>>    mm/mm_init.c                                 |  2 +-
>>>>    mm/sparse-vmemmap.c                          |  2 +-
>>>>    tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
>>>>    11 files changed, 56 insertions(+), 38 deletions(-)
>>>>
>>>
>>> [ snip ]
>>>
>>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>>> index f71ff9f0ec81..bb8019540d73 100644
>>>> --- a/include/linux/memblock.h
>>>> +++ b/include/linux/memblock.h
>>>> @@ -63,6 +63,7 @@ struct memblock_region {
>>>>    #ifdef CONFIG_NUMA
>>>>    	int nid;
>>>>    #endif
>>>> +	phys_addr_t hugepage_size;
>>>>    };
>>>>    /**
>>>> @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
>>>>    				      phys_addr_t start, phys_addr_t end);
>>>>    phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
>>>>    				      phys_addr_t align, phys_addr_t start,
>>>> -				      phys_addr_t end, int nid, bool exact_nid);
>>>> +				      phys_addr_t end, int nid, bool exact_nid,
>>>> +				      phys_addr_t hugepage_size);
>>>
>>> Rather than adding yet another parameter to memblock_phys_alloc_range() we
>>> can have an API that sets a flag on the reserved regions.
>>> With this the hugetlb reservation code can set a flag when HVO is
>>> enabled and memmap_init_reserved_pages() will skip regions with this flag
>>> set.
>>>
>>
>> Hi,
>>
>> Thanks for the review.
>>
>> I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw and
>> not memblock_phys_alloc_range?
> 
> Yes.
>   
>> My initial approach was to use flags, but I think it looks worse than what I
>> have done in this RFC (I have pushed the flags prototype at
>> https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO,
>> top 4 commits for reference (the main difference is patch 2 and 4 from
>> RFC)). The major points are (the bigger issue is in patch 4):
>>
>> - (RFC vs flags patch 2 comparison) In the RFC, hugepage_size is propagated
>> from memblock_alloc_try_nid_raw through function calls. When using flags,
>> the "no_init" boolean is propogated from memblock_alloc_try_nid_raw through
>> function calls until the region flags are available in memblock_add_range
>> and the new MEMBLOCK_NOINIT flag is set. I think its a bit more tricky to
>> introduce a new function to set the flag in the region AFTER the call to
>> memblock_alloc_try_nid_raw has finished as the memblock_region can not be
>> found.
>> So something (hugepage_size/flag information) still has to be propagated
>> through function calls and a new argument needs to be added.
> 
> Sorry if I wasn't clear. I didn't mean to add flags parameter, I meant to
> add a flag and a function that sets this flag for a range. So for
> MEMBLOCK_NOINIT there would be
> 
> int memblock_mark_noinit(phys_addr_t base, phys_addr_t size);
> 
> I'd just name this flag MEMBLOCK_RSRV_NOINIT to make it clear it controls
> the reserved regions.
> 
> This won't require updating all call sites of memblock_alloc_range_nid()
> and memblock_alloc_try_nid_raw() but only a small refactoring of
> memblock_setclr_flag() and its callers.
> 

Thanks for this, it's much cleaner done the way you described. I have
sent v1 implementing this:
https://lore.kernel.org/all/20230727204624.1942372-1-usama.arif@bytedance.com/

Regards,
Usama


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-07-27 20:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
2023-07-24 17:33   ` kernel test robot
2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
2023-07-24 17:33   ` kernel test robot
2023-07-24 17:44   ` kernel test robot
2023-07-26 11:01   ` Mike Rapoport
2023-07-26 15:02     ` [External] " Usama Arif
2023-07-27  4:30       ` Mike Rapoport
2023-07-27 20:56         ` Usama Arif
2023-07-24 13:46 ` [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
2023-07-24 13:46 ` [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO Usama Arif
2023-07-24 18:26   ` kernel test robot
2023-07-26 10:34 ` [RFC 0/4] mm/memblock: Skip prep and " Usama Arif
