* [PATCH v4 0/2] make hugetlb_optimize_vmemmap compatible with memmap_on_memory
@ 2022-06-19 13:38 Muchun Song
  2022-06-19 13:38 ` [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags Muchun Song
  2022-06-19 13:38 ` [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
  0 siblings, 2 replies; 11+ messages in thread
From: Muchun Song @ 2022-06-19 13:38 UTC (permalink / raw)
  To: akpm, corbet, david, mike.kravetz, osalvador, paulmck
  Cc: linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun, Muchun Song

This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory
and is based on mm-stable.  The rationale is explained in patch 2's commit log.
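
For illustration, with this series applied both parameters can be
enabled at the same time, e.g. on the kernel command line (syntax as
documented in kernel-parameters.txt; shown here only as an example):

    memory_hotplug.memmap_on_memory=on hugetlb_free_vmemmap=on

Vmemmap pages that are self-hosted on a hotplugged memory block are
simply skipped by the HugeTLB optimization; all other vmemmap pages
are still optimized.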

v4:
 - Fix a compile error reported by the kernel test robot when CONFIG_MEMORY_HOTPLUG is disabled.
 - Fix a bug when memory_block_size_bytes() is not equal to the section size.

v3:
 - Switch complicated enumeration magic (David).
 - Introduce PageVmemmapSelfHosted to make both parameters compatible (David and Oscar).

v2:
 - Fix a compile error reported by the kernel test robot when !CONFIG_ZONE_DEVICE.

Muchun Song (2):
  mm: memory_hotplug: enumerate all supported section flags
  mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with
    memmap_on_memory

 Documentation/admin-guide/kernel-parameters.txt | 22 +++++------
 Documentation/admin-guide/sysctl/vm.rst         |  5 +--
 include/linux/memory_hotplug.h                  |  9 -----
 include/linux/mmzone.h                          | 44 ++++++++++++++++-----
 include/linux/page-flags.h                      | 11 ++++++
 mm/hugetlb_vmemmap.c                            | 52 +++++++++++++++++++++----
 mm/memory_hotplug.c                             | 33 ++++++++--------
 mm/sparse.c                                     |  2 +-
 8 files changed, 121 insertions(+), 57 deletions(-)

-- 
2.11.0



* [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags
  2022-06-19 13:38 [PATCH v4 0/2] make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
@ 2022-06-19 13:38 ` Muchun Song
  2022-06-20  7:51   ` David Hildenbrand
  2022-06-19 13:38 ` [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
  1 sibling, 1 reply; 11+ messages in thread
From: Muchun Song @ 2022-06-19 13:38 UTC (permalink / raw)
  To: akpm, corbet, david, mike.kravetz, osalvador, paulmck
  Cc: linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun, Muchun Song

We are almost running out of section flags; only one bit is available
in the worst case (powerpc with 256k pages).  However, there are still
some free bits (in ->section_mem_map) on other architectures (e.g.
x86_64 has 10 bits available, arm64 has 8 bits available in the worst
case of 64K pages).  Those numbers are hard-coded, which makes it
inconvenient to use the spare bits on architectures other than powerpc.
So convert the section flags to an enumeration to make it easy to add
new section flags in the future.  Also, move SECTION_TAINT_ZONE_DEVICE
into the scope of CONFIG_ZONE_DEVICE to save a bit in the
non-zone-device case.
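
The pattern is illustrated by the following minimal userspace sketch
(not part of the patch; BIT() is open-coded because linux/bits.h is
kernel-only, and CONFIG_ZONE_DEVICE here stands in for the real
Kconfig option):

#include <stdio.h>

#define BIT(nr)	(1UL << (nr))

/* Bit positions are auto-assigned; a conditionally compiled flag
 * renumbers everything after it and frees its bit when disabled. */
enum {
	SECTION_MARKED_PRESENT_BIT,
	SECTION_HAS_MEM_MAP_BIT,
	SECTION_IS_ONLINE_BIT,
	SECTION_IS_EARLY_BIT,
#ifdef CONFIG_ZONE_DEVICE
	SECTION_TAINT_ZONE_DEVICE_BIT,
#endif
	SECTION_MAP_LAST_BIT,
};

#define SECTION_MAP_MASK	(~(BIT(SECTION_MAP_LAST_BIT) - 1))
#define SECTION_NID_SHIFT	SECTION_MAP_LAST_BIT

int main(void)
{
	/* Without CONFIG_ZONE_DEVICE: 4 flag bits, nid shift 4. */
	printf("flag bits: %d, nid shift: %d, mask: %#lx\n",
	       SECTION_MAP_LAST_BIT, SECTION_NID_SHIFT,
	       SECTION_MAP_MASK);
	return 0;
}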

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/mmzone.h | 44 +++++++++++++++++++++++++++++++++++---------
 mm/memory_hotplug.c    |  6 ++++++
 mm/sparse.c            |  2 +-
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..932843c6459b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1418,16 +1418,35 @@ extern size_t mem_section_usage_size(void);
  *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
  *      worst combination is powerpc with 256k pages,
  *      which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available.
+ * To sum it up, at least 6 bits are available on all architectures.
+ * However, more than 6 bits can be used on architectures other than
+ * powerpc (e.g. 15 bits are available on x86_64, and 13 bits in the
+ * worst case of 64K pages on arm64), provided the extra bits are not
+ * used on powerpc.
  */
-#define SECTION_MARKED_PRESENT		(1UL<<0)
-#define SECTION_HAS_MEM_MAP		(1UL<<1)
-#define SECTION_IS_ONLINE		(1UL<<2)
-#define SECTION_IS_EARLY		(1UL<<3)
-#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
-#define SECTION_MAP_LAST_BIT		(1UL<<5)
-#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT		6
+enum {
+	SECTION_MARKED_PRESENT_BIT,
+	SECTION_HAS_MEM_MAP_BIT,
+	SECTION_IS_ONLINE_BIT,
+	SECTION_IS_EARLY_BIT,
+#ifdef CONFIG_ZONE_DEVICE
+	SECTION_TAINT_ZONE_DEVICE_BIT,
+#endif
+	SECTION_MAP_LAST_BIT,
+};
+
+enum {
+	SECTION_MARKED_PRESENT		= BIT(SECTION_MARKED_PRESENT_BIT),
+	SECTION_HAS_MEM_MAP		= BIT(SECTION_HAS_MEM_MAP_BIT),
+	SECTION_IS_ONLINE		= BIT(SECTION_IS_ONLINE_BIT),
+	SECTION_IS_EARLY		= BIT(SECTION_IS_EARLY_BIT),
+#ifdef CONFIG_ZONE_DEVICE
+	SECTION_TAINT_ZONE_DEVICE	= BIT(SECTION_TAINT_ZONE_DEVICE_BIT),
+#endif
+};
+
+#define SECTION_MAP_MASK		(~(BIT(SECTION_MAP_LAST_BIT) - 1))
+#define SECTION_NID_SHIFT		SECTION_MAP_LAST_BIT
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
@@ -1466,12 +1485,19 @@ static inline int online_section(struct mem_section *section)
 	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static inline int online_device_section(struct mem_section *section)
 {
 	unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE;
 
 	return section && ((section->section_mem_map & flags) == flags);
 }
+#else
+static inline int online_device_section(struct mem_section *section)
+{
+	return 0;
+}
+#endif
 
 static inline int online_section_nr(unsigned long nr)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1f1a730c4499..6662b86e9e64 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -670,12 +670,18 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static void section_taint_zone_device(unsigned long pfn)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
 }
+#else
+static inline void section_taint_zone_device(unsigned long pfn)
+{
+}
+#endif
 
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
diff --git a/mm/sparse.c b/mm/sparse.c
index cb3bfae64036..e5a8a3a0edd7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -281,7 +281,7 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
 {
 	unsigned long coded_mem_map =
 		(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
-	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > (1UL<<PFN_SECTION_SHIFT));
+	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
 	BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
 	return coded_mem_map;
 }
-- 
2.11.0



* [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-19 13:38 [PATCH v4 0/2] make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
  2022-06-19 13:38 ` [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags Muchun Song
@ 2022-06-19 13:38 ` Muchun Song
  2022-06-20  7:22   ` Muchun Song
  1 sibling, 1 reply; 11+ messages in thread
From: Muchun Song @ 2022-06-19 13:38 UTC (permalink / raw)
  To: akpm, corbet, david, mike.kravetz, osalvador, paulmck
  Cc: linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun, Muchun Song

For now, hugetlb_free_vmemmap is not compatible with
memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.  However, some users
want memory_hotplug.memmap_on_memory to take precedence over
hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
likely to succeed in close-to-OOM situations.  So making
hugetlb_free_vmemmap take precedence is neither wise nor elegant.  The
proper approach is to have hugetlb_vmemmap.c check whether the section
that the HugeTLB pages belong to can be optimized: if the section's
vmemmap pages are allocated from the added memory block itself,
hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise,
do the optimization.  Then both kernel parameters are compatible.  So
this patch introduces VmemmapSelfHosted to mark any non-optimizable
vmemmap pages; hugetlb_vmemmap can use this flag to detect whether a
vmemmap page can be optimized.
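
Condensed, the check this patch adds (the full version, with comments
and a layout diagram, is in the hugetlb_vmemmap.c hunk below) amounts
to the following sketch; vmemmap_self_hosted() is a name made up here,
the patch open-codes this inside vmemmap_optimizable_pages():

static bool vmemmap_self_hosted(struct page *head)
{
	unsigned long pfn = page_to_pfn(head);

	/* Move to the first pfn of the memory block @head belongs to. */
	pfn = ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()));
	/*
	 * With memmap_on_memory, a hotplugged block starts with its own
	 * vmemmap pages, whose struct pages carry the new page flag.
	 */
	return pfn_valid(pfn) && PageVmemmapSelfHosted(pfn_to_page(pfn));
}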

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Co-developed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 +++++------
 Documentation/admin-guide/sysctl/vm.rst         |  5 +--
 include/linux/memory_hotplug.h                  |  9 -----
 include/linux/page-flags.h                      | 11 ++++++
 mm/hugetlb_vmemmap.c                            | 52 +++++++++++++++++++++----
 mm/memory_hotplug.c                             | 27 ++++++-------
 6 files changed, 79 insertions(+), 47 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8090130b544b..d740e2ed0e61 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1722,9 +1722,11 @@
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
 
-			This is not compatible with memory_hotplug.memmap_on_memory.
-			If both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
+			Note that when memory_hotplug.memmap_on_memory is enabled,
+			the vmemmap pages may be allocated from the added memory
+			block itself; those vmemmap pages cannot be optimized even
+			if this feature is enabled.  Other vmemmap pages not
+			allocated from the added memory block itself are unaffected.
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
@@ -3069,10 +3071,12 @@
 			[KNL,X86,ARM] Boolean flag to enable this feature.
 			Format: {on | off (default)}
 			When enabled, runtime hotplugged memory will
-			allocate its internal metadata (struct pages)
-			from the hotadded memory which will allow to
-			hotadd a lot of memory without requiring
-			additional memory to do so.
+			allocate its internal metadata (struct pages;
+			those vmemmap pages cannot be optimized even
+			if hugetlb_free_vmemmap is enabled) from the
+			hotadded memory which will allow to hotadd a
+			lot of memory without requiring additional
+			memory to do so.
 			This feature is disabled by default because it
 			has some implication on large (e.g. GB)
 			allocations in some configurations (e.g. small
@@ -3082,10 +3086,6 @@
 			Note that even when enabled, there are a few cases where
 			the feature is not effective.
 
-			This is not compatible with hugetlb_free_vmemmap. If
-			both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
-
 	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
 			Format: <integer>
 			default : 0 <disable>
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 5c9aa171a0d3..d7374a1e8ac9 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
 hugetlb_optimize_vmemmap
 ========================
 
-This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
-is configured or the size of 'struct page' (a structure defined in
-include/linux/mm_types.h) is not power of two (an unusual system config could
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not a power of two (an unusual system config could
 result in this).
 
 Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20d7edf62a6a..e0b2209ab71c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
 extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-bool mhp_memmap_on_memory(void);
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
-#endif
-
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e66f7aa3191d..2aa5dcbfe468 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -193,6 +193,11 @@ enum pageflags {
 
 	/* Only valid for buddy pages. Used to track pages that are reported */
 	PG_reported = PG_uptodate,
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/* For self-hosted memmap pages */
+	PG_vmemmap_self_hosted = PG_owner_priv_1,
+#endif
 };
 
 #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
@@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
  */
 __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
+#else
+PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 1089ea8a9c98..73bfbb47f6a4 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -10,7 +10,7 @@
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
-#include <linux/memory_hotplug.h>
+#include <linux/memory.h>
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -97,18 +97,54 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	return ret;
 }
 
+static unsigned int vmemmap_optimizable_pages(struct hstate *h,
+					      struct page *head)
+{
+	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
+		return 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+		unsigned long pfn = page_to_pfn(head);
+
+		/*
+		 * Due to HugeTLB alignment requirements and the vmemmap
+		 * pages being at the start of the hotplugged memory region
+		 * in the memory_hotplug.memmap_on_memory case, checking
+		 * whether the first vmemmap page's vmemmap is marked
+		 * VmemmapSelfHosted is sufficient.
+		 *
+		 * [                  hotplugged memory                  ]
+		 * [        section        ][...][        section        ]
+		 * [ vmemmap ][              usable memory               ]
+		 *   ^   |     |                                        |
+		 *   +---+     |                                        |
+		 *     ^       |                                        |
+		 *     +-------+                                        |
+		 *          ^                                           |
+		 *          +-------------------------------------------+
+		 *
+		 * Hotplugged memory block never has non-present sections, while
+		 * boot memory block can have one or more. So pfn_valid() is
+		 * used to filter out the non-present section which also cannot
+		 * be memmap_on_memory.
+		 */
+		pfn = ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()));
+		if (pfn_valid(pfn) && PageVmemmapSelfHosted(pfn_to_page(pfn)))
+			return 0;
+	}
+
+	return hugetlb_optimize_vmemmap_pages(h);
+}
+
 void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
 	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
+	vmemmap_pages = vmemmap_optimizable_pages(h, head);
 	if (!vmemmap_pages)
 		return;
 
-	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
-		return;
-
 	static_branch_inc(&hugetlb_optimize_vmemmap_key);
 
 	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
@@ -199,10 +235,10 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 static __init int hugetlb_vmemmap_sysctls_init(void)
 {
 	/*
-	 * If "memory_hotplug.memmap_on_memory" is enabled or "struct page"
-	 * crosses page boundaries, the vmemmap pages cannot be optimized.
+	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
+	 * be optimized.
 	 */
-	if (!mhp_memmap_on_memory() && is_power_of_2(sizeof(struct page)))
+	if (is_power_of_2(sizeof(struct page)))
 		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
 
 	return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6662b86e9e64..3a59d4e97c03 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -43,30 +43,22 @@
 #include "shuffle.h"
 
 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
-{
-	if (hugetlb_optimize_vmemmap_enabled())
-		return 0;
-	return param_set_bool(val, kp);
-}
-
-static const struct kernel_param_ops memmap_on_memory_ops = {
-	.flags	= KERNEL_PARAM_OPS_FL_NOARG,
-	.set	= memmap_on_memory_set,
-	.get	= param_get_bool,
-};
-
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
 static bool memmap_on_memory __ro_after_init;
-module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
+module_param(memmap_on_memory, bool, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
 
-bool mhp_memmap_on_memory(void)
+static inline bool mhp_memmap_on_memory(void)
 {
 	return memmap_on_memory;
 }
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
 #endif
 
 enum {
@@ -1035,7 +1027,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 			      struct zone *zone)
 {
 	unsigned long end_pfn = pfn + nr_pages;
-	int ret;
+	int ret, i;
 
 	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
 	if (ret)
@@ -1043,6 +1035,9 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
 
+	for (i = 0; i < nr_pages; i++)
+		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
+
 	/*
 	 * It might be that the vmemmap_pages fully span sections. If that is
 	 * the case, mark those sections online here as otherwise they will be
-- 
2.11.0



* Re: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-19 13:38 ` [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
@ 2022-06-20  7:22   ` Muchun Song
  2022-06-20  7:47     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Muchun Song @ 2022-06-20  7:22 UTC (permalink / raw)
  To: akpm, corbet, david, mike.kravetz, osalvador, paulmck
  Cc: linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun

On Sun, Jun 19, 2022 at 09:38:51PM +0800, Muchun Song wrote:
> For now, hugetlb_free_vmemmap is not compatible with
> memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
> precedence over memory_hotplug.memmap_on_memory.  However, some users
> want memory_hotplug.memmap_on_memory to take precedence over
> hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
> likely to succeed in close-to-OOM situations.  So making
> hugetlb_free_vmemmap take precedence is neither wise nor elegant.  The
> proper approach is to have hugetlb_vmemmap.c check whether the section
> that the HugeTLB pages belong to can be optimized: if the section's
> vmemmap pages are allocated from the added memory block itself,
> hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise,
> do the optimization.  Then both kernel parameters are compatible.  So
> this patch introduces VmemmapSelfHosted to mark any non-optimizable
> vmemmap pages; hugetlb_vmemmap can use this flag to detect whether a
> vmemmap page can be optimized.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Co-developed-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 22 +++++------
>  Documentation/admin-guide/sysctl/vm.rst         |  5 +--
>  include/linux/memory_hotplug.h                  |  9 -----
>  include/linux/page-flags.h                      | 11 ++++++
>  mm/hugetlb_vmemmap.c                            | 52 +++++++++++++++++++++----
>  mm/memory_hotplug.c                             | 27 ++++++-------
>  6 files changed, 79 insertions(+), 47 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8090130b544b..d740e2ed0e61 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1722,9 +1722,11 @@
>  			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
>  			the default is on.
>  
> -			This is not compatible with memory_hotplug.memmap_on_memory.
> -			If both parameters are enabled, hugetlb_free_vmemmap takes
> -			precedence over memory_hotplug.memmap_on_memory.
> +			Note that when memory_hotplug.memmap_on_memory is enabled,
> +			the vmemmap pages may be allocated from the added memory
> +			block itself; those vmemmap pages cannot be optimized even
> +			if this feature is enabled.  Other vmemmap pages not
> +			allocated from the added memory block itself are unaffected.
>  
>  	hung_task_panic=
>  			[KNL] Should the hung task detector generate panics.
> @@ -3069,10 +3071,12 @@
>  			[KNL,X86,ARM] Boolean flag to enable this feature.
>  			Format: {on | off (default)}
>  			When enabled, runtime hotplugged memory will
> -			allocate its internal metadata (struct pages)
> -			from the hotadded memory which will allow to
> -			hotadd a lot of memory without requiring
> -			additional memory to do so.
> +			allocate its internal metadata (struct pages;
> +			those vmemmap pages cannot be optimized even
> +			if hugetlb_free_vmemmap is enabled) from the
> +			hotadded memory which will allow to hotadd a
> +			lot of memory without requiring additional
> +			memory to do so.
>  			This feature is disabled by default because it
>  			has some implication on large (e.g. GB)
>  			allocations in some configurations (e.g. small
> @@ -3082,10 +3086,6 @@
>  			Note that even when enabled, there are a few cases where
>  			the feature is not effective.
>  
> -			This is not compatible with hugetlb_free_vmemmap. If
> -			both parameters are enabled, hugetlb_free_vmemmap takes
> -			precedence over memory_hotplug.memmap_on_memory.
> -
>  	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
>  			Format: <integer>
>  			default : 0 <disable>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 5c9aa171a0d3..d7374a1e8ac9 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
>  hugetlb_optimize_vmemmap
>  ========================
>  
> -This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
> -is configured or the size of 'struct page' (a structure defined in
> -include/linux/mm_types.h) is not power of two (an unusual system config could
> +This knob is not available when the size of 'struct page' (a structure defined
> +in include/linux/mm_types.h) is not a power of two (an unusual system config could
>  result in this).
>  
>  Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 20d7edf62a6a..e0b2209ab71c 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
>  extern bool mhp_supports_memmap_on_memory(unsigned long size);
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
> -#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
> -bool mhp_memmap_on_memory(void);
> -#else
> -static inline bool mhp_memmap_on_memory(void)
> -{
> -	return false;
> -}
> -#endif
> -
>  #endif /* __LINUX_MEMORY_HOTPLUG_H */
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index e66f7aa3191d..2aa5dcbfe468 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -193,6 +193,11 @@ enum pageflags {
>  
>  	/* Only valid for buddy pages. Used to track pages that are reported */
>  	PG_reported = PG_uptodate,
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +	/* For self-hosted memmap pages */
> +	PG_vmemmap_self_hosted = PG_owner_priv_1,
> +#endif
>  };
>  
>  #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
> @@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
>   */
>  __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
>  
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
> +#else
> +PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
> +#endif
> +
>  /*
>   * On an anonymous page mapped into a user virtual memory area,
>   * page->mapping points to its anon_vma, not to a struct address_space;
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 1089ea8a9c98..73bfbb47f6a4 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -10,7 +10,7 @@
>   */
>  #define pr_fmt(fmt)	"HugeTLB: " fmt
>  
> -#include <linux/memory_hotplug.h>
> +#include <linux/memory.h>
>  #include "hugetlb_vmemmap.h"
>  
>  /*
> @@ -97,18 +97,54 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>  	return ret;
>  }
>  
> +static unsigned int vmemmap_optimizable_pages(struct hstate *h,
> +					      struct page *head)
> +{
> +	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
> +		return 0;
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
> +		unsigned long pfn = page_to_pfn(head);
> +
> +		 * Due to HugeTLB alignment requirements and the vmemmap
> +		 * pages being at the start of the hotplugged memory region
> +		 * in the memory_hotplug.memmap_on_memory case, checking
> +		 * whether the first vmemmap page's vmemmap is marked
> +		 * VmemmapSelfHosted is sufficient.
> +		 * is sufficient.
> +		 *
> +		 * [                  hotplugged memory                  ]
> +		 * [        section        ][...][        section        ]
> +		 * [ vmemmap ][              usable memory               ]
> +		 *   ^   |     |                                        |
> +		 *   +---+     |                                        |
> +		 *     ^       |                                        |
> +		 *     +-------+                                        |
> +		 *          ^                                           |
> +		 *          +-------------------------------------------+
> +		 *
> +		 * Hotplugged memory block never has non-present sections, while
> +		 * boot memory block can have one or more. So pfn_valid() is
> +		 * used to filter out the non-present section which also cannot
> +		 * be memmap_on_memory.
> +		 */
> +		pfn = ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()));
> +		if (pfn_valid(pfn) && PageVmemmapSelfHosted(pfn_to_page(pfn)))

Although it works, I think the PageVmemmapSelfHosted() check on the 1st
pfn's vmemmap page is not always reliable.  Since we reused
PG_owner_priv_1 as PG_vmemmap_self_hosted, the test is only reliable for
a vmemmap page's vmemmap page.  Other non-vmemmap pages can be flagged
with PG_owner_priv_1, so this check can be a false positive.  Maybe the
following code snippet is the solution.

Any thoughts? Oscar or David.

+       if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+               pmd_t *pmdp, pmd;
+               struct page *vmemmap_page;
+               unsigned long vaddr = (unsigned long)head;
+
+               pmdp = pmd_off_k(vaddr);
+               /*
+                * The READ_ONCE() is used to stabilize *pmdp in a register or
+                * on the stack so that it will stop changing under the code.
+                */
+               pmd = READ_ONCE(*pmdp);
+               if (pmd_leaf(pmd))
+                       vmemmap_page = pmd_page(pmd) + pte_index(vaddr);
+               else
+                       vmemmap_page = pte_page(*pte_offset_kernel(pmdp, vaddr));
+               /*
+                * Due to HugeTLB alignment requirements and the vmemmap
+                * pages being at the start of the hotplugged memory region
+                * in the memory_hotplug.memmap_on_memory case, checking
+                * whether any vmemmap page's vmemmap page is marked
+                * VmemmapSelfHosted is sufficient.
+                *
+                * [                  hotplugged memory                  ]
+                * [        section        ][...][        section        ]
+                * [ vmemmap ][              usable memory               ]
+                *   ^   |     |                                        |
+                *   +---+     |                                        |
+                *     ^       |                                        |
+                *     +-------+                                        |
+                *          ^                                           |
+                *          +-------------------------------------------+
+                */
+               if (PageVmemmapSelfHosted(vmemmap_page))
+                       return 0;
+       }

> +			return 0;
> +	}
> +
> +	return hugetlb_optimize_vmemmap_pages(h);
> +}
> +
>  void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>  {
>  	unsigned long vmemmap_addr = (unsigned long)head;
>  	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
>  
> -	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
> +	vmemmap_pages = vmemmap_optimizable_pages(h, head);
>  	if (!vmemmap_pages)
>  		return;
>  
> -	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
> -		return;
> -
>  	static_branch_inc(&hugetlb_optimize_vmemmap_key);
>  
>  	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
> @@ -199,10 +235,10 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = {
>  static __init int hugetlb_vmemmap_sysctls_init(void)
>  {
>  	/*
> -	 * If "memory_hotplug.memmap_on_memory" is enabled or "struct page"
> -	 * crosses page boundaries, the vmemmap pages cannot be optimized.
> +	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
> +	 * be optimized.
>  	 */
> -	if (!mhp_memmap_on_memory() && is_power_of_2(sizeof(struct page)))
> +	if (is_power_of_2(sizeof(struct page)))
>  		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
>  
>  	return 0;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6662b86e9e64..3a59d4e97c03 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -43,30 +43,22 @@
>  #include "shuffle.h"
>  
>  #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
> -static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
> -{
> -	if (hugetlb_optimize_vmemmap_enabled())
> -		return 0;
> -	return param_set_bool(val, kp);
> -}
> -
> -static const struct kernel_param_ops memmap_on_memory_ops = {
> -	.flags	= KERNEL_PARAM_OPS_FL_NOARG,
> -	.set	= memmap_on_memory_set,
> -	.get	= param_get_bool,
> -};
> -
>  /*
>   * memory_hotplug.memmap_on_memory parameter
>   */
>  static bool memmap_on_memory __ro_after_init;
> -module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
> +module_param(memmap_on_memory, bool, 0444);
>  MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
>  
> -bool mhp_memmap_on_memory(void)
> +static inline bool mhp_memmap_on_memory(void)
>  {
>  	return memmap_on_memory;
>  }
> +#else
> +static inline bool mhp_memmap_on_memory(void)
> +{
> +	return false;
> +}
>  #endif
>  
>  enum {
> @@ -1035,7 +1027,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>  			      struct zone *zone)
>  {
>  	unsigned long end_pfn = pfn + nr_pages;
> -	int ret;
> +	int ret, i;
>  
>  	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
>  	if (ret)
> @@ -1043,6 +1035,9 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>  
>  	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>  
> +	for (i = 0; i < nr_pages; i++)
> +		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
> +
>  	/*
>  	 * It might be that the vmemmap_pages fully span sections. If that is
>  	 * the case, mark those sections online here as otherwise they will be
> -- 
> 2.11.0
> 
> 


* Re: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-20  7:22   ` Muchun Song
@ 2022-06-20  7:47     ` David Hildenbrand
  2022-06-20  8:29       ` Muchun Song
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-06-20  7:47 UTC (permalink / raw)
  To: Muchun Song, akpm, corbet, mike.kravetz, osalvador, paulmck
  Cc: linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun

On 20.06.22 09:22, Muchun Song wrote:
> On Sun, Jun 19, 2022 at 09:38:51PM +0800, Muchun Song wrote:
>> For now, hugetlb_free_vmemmap is not compatible with
>> memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
>> precedence over memory_hotplug.memmap_on_memory.  However, some users
>> want memory_hotplug.memmap_on_memory to take precedence over
>> hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
>> likely to succeed in close-to-OOM situations.  So making
>> hugetlb_free_vmemmap take precedence is neither wise nor elegant.  The
>> proper approach is to have hugetlb_vmemmap.c check whether the section
>> that the HugeTLB pages belong to can be optimized: if the section's
>> vmemmap pages are allocated from the added memory block itself,
>> hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise,
>> do the optimization.  Then both kernel parameters are compatible.  So
>> this patch introduces VmemmapSelfHosted to mark any non-optimizable
>> vmemmap pages; hugetlb_vmemmap can use this flag to detect whether a
>> vmemmap page can be optimized.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Co-developed-by: Oscar Salvador <osalvador@suse.de>
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt | 22 +++++------
>>  Documentation/admin-guide/sysctl/vm.rst         |  5 +--
>>  include/linux/memory_hotplug.h                  |  9 -----
>>  include/linux/page-flags.h                      | 11 ++++++
>>  mm/hugetlb_vmemmap.c                            | 52 +++++++++++++++++++++----
>>  mm/memory_hotplug.c                             | 27 ++++++-------
>>  6 files changed, 79 insertions(+), 47 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 8090130b544b..d740e2ed0e61 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1722,9 +1722,11 @@
>>  			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
>>  			the default is on.
>>  
>> -			This is not compatible with memory_hotplug.memmap_on_memory.
>> -			If both parameters are enabled, hugetlb_free_vmemmap takes
>> -			precedence over memory_hotplug.memmap_on_memory.
>> +			Note that when memory_hotplug.memmap_on_memory is enabled,
>> +			the vmemmap pages may be allocated from the added memory
>> +			block itself; those vmemmap pages cannot be optimized even
>> +			if this feature is enabled.  Other vmemmap pages not
>> +			allocated from the added memory block itself are unaffected.
>>  
>>  	hung_task_panic=
>>  			[KNL] Should the hung task detector generate panics.
>> @@ -3069,10 +3071,12 @@
>>  			[KNL,X86,ARM] Boolean flag to enable this feature.
>>  			Format: {on | off (default)}
>>  			When enabled, runtime hotplugged memory will
>> -			allocate its internal metadata (struct pages)
>> -			from the hotadded memory which will allow to
>> -			hotadd a lot of memory without requiring
>> -			additional memory to do so.
>> +			allocate its internal metadata (struct pages;
>> +			those vmemmap pages cannot be optimized even
>> +			if hugetlb_free_vmemmap is enabled) from the
>> +			hotadded memory which will allow to hotadd a
>> +			lot of memory without requiring additional
>> +			memory to do so.
>>  			This feature is disabled by default because it
>>  			has some implication on large (e.g. GB)
>>  			allocations in some configurations (e.g. small
>> @@ -3082,10 +3086,6 @@
>>  			Note that even when enabled, there are a few cases where
>>  			the feature is not effective.
>>  
>> -			This is not compatible with hugetlb_free_vmemmap. If
>> -			both parameters are enabled, hugetlb_free_vmemmap takes
>> -			precedence over memory_hotplug.memmap_on_memory.
>> -
>>  	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
>>  			Format: <integer>
>>  			default : 0 <disable>
>> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
>> index 5c9aa171a0d3..d7374a1e8ac9 100644
>> --- a/Documentation/admin-guide/sysctl/vm.rst
>> +++ b/Documentation/admin-guide/sysctl/vm.rst
>> @@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
>>  hugetlb_optimize_vmemmap
>>  ========================
>>  
>> -This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
>> -is configured or the size of 'struct page' (a structure defined in
>> -include/linux/mm_types.h) is not power of two (an unusual system config could
>> +This knob is not available when the size of 'struct page' (a structure defined
>> +in include/linux/mm_types.h) is not a power of two (an unusual system config could
>>  result in this).
>>  
>>  Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index 20d7edf62a6a..e0b2209ab71c 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
>>  extern bool mhp_supports_memmap_on_memory(unsigned long size);
>>  #endif /* CONFIG_MEMORY_HOTPLUG */
>>  
>> -#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>> -bool mhp_memmap_on_memory(void);
>> -#else
>> -static inline bool mhp_memmap_on_memory(void)
>> -{
>> -	return false;
>> -}
>> -#endif
>> -
>>  #endif /* __LINUX_MEMORY_HOTPLUG_H */
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index e66f7aa3191d..2aa5dcbfe468 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -193,6 +193,11 @@ enum pageflags {
>>  
>>  	/* Only valid for buddy pages. Used to track pages that are reported */
>>  	PG_reported = PG_uptodate,
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +	/* For self-hosted memmap pages */
>> +	PG_vmemmap_self_hosted = PG_owner_priv_1,
>> +#endif
>>  };
>>  
>>  #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
>> @@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
>>   */
>>  __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
>>  
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
>> +#else
>> +PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
>> +#endif
>> +
>>  /*
>>   * On an anonymous page mapped into a user virtual memory area,
>>   * page->mapping points to its anon_vma, not to a struct address_space;
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 1089ea8a9c98..73bfbb47f6a4 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -10,7 +10,7 @@
>>   */
>>  #define pr_fmt(fmt)	"HugeTLB: " fmt
>>  
>> -#include <linux/memory_hotplug.h>
>> +#include <linux/memory.h>
>>  #include "hugetlb_vmemmap.h"
>>  
>>  /*
>> @@ -97,18 +97,54 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>  	return ret;
>>  }
>>  
>> +static unsigned int vmemmap_optimizable_pages(struct hstate *h,
>> +					      struct page *head)
>> +{
>> +	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
>> +		return 0;
>> +
>> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
>> +		unsigned long pfn = page_to_pfn(head);
>> +
>> +		/*
>> +		 * Due to HugeTLB alignment requirements and the vmemmap
>> +		 * pages being at the start of the hotplugged memory region
>> +		 * in the memory_hotplug.memmap_on_memory case, checking
>> +		 * whether the first vmemmap page's vmemmap is marked
>> +		 * VmemmapSelfHosted is sufficient.
>> +		 *
>> +		 * [                  hotplugged memory                  ]
>> +		 * [        section        ][...][        section        ]
>> +		 * [ vmemmap ][              usable memory               ]
>> +		 *   ^   |     |                                        |
>> +		 *   +---+     |                                        |
>> +		 *     ^       |                                        |
>> +		 *     +-------+                                        |
>> +		 *          ^                                           |
>> +		 *          +-------------------------------------------+
>> +		 *
>> +		 * Hotplugged memory block never has non-present sections, while
>> +		 * boot memory block can have one or more. So pfn_valid() is
>> +		 * used to filter out the non-present section which also cannot
>> +		 * be memmap_on_memory.
>> +		 */
>> +		pfn = ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()));
>> +		if (pfn_valid(pfn) && PageVmemmapSelfHosted(pfn_to_page(pfn)))
> 
> Although it works, I think the PageVmemmapSelfHosted() check on the 1st
> pfn's vmemmap page is not always reliable.  Since we reused
> PG_owner_priv_1 as PG_vmemmap_self_hosted, the test is only reliable for
> a vmemmap page's vmemmap page.  Other non-vmemmap pages can be flagged
> with PG_owner_priv_1, so this check can be a false positive.  Maybe the
> following code snippet is the solution.

How could that happen for pages used for backing a vmemmap?

> 
> Any thoughts? Oscar or David.

First of all, I think you should really avoid using
memory_block_size_bytes(); when using memory_block_size_bytes(), you
wouldn't need PageVmemmapSelfHosted() at all, because you could just
check whether the vmemmap of the page is the page itself.  But I think
we should try to make this independent of the memory block size.

If virt_to_page(page) doesn't work, maybe just traverse the direct map
to find the page backing the vmemmap directly?

-- 
Thanks,

David / dhildenb



* Re: [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags
  2022-06-19 13:38 ` [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags Muchun Song
@ 2022-06-20  7:51   ` David Hildenbrand
  2022-06-20  8:16     ` Muchun Song
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-06-20  7:51 UTC (permalink / raw)
  To: Muchun Song, akpm, corbet, mike.kravetz, osalvador, paulmck
  Cc: linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun

On 19.06.22 15:38, Muchun Song wrote:
> We are almost running out of section flags; only one bit is available
> in the worst case (powerpc with 256k pages).  However, there are still
> some free bits (in ->section_mem_map) on other architectures (e.g.
> x86_64 has 10 bits available, arm64 has 8 bits available in the worst
> case of 64K pages).  Those numbers are hard-coded, which makes it
> inconvenient to use the spare bits on architectures other than powerpc.
> So convert the section flags to an enumeration to make it easy to add
> new section flags in the future.  Also, move SECTION_TAINT_ZONE_DEVICE
> into the scope of CONFIG_ZONE_DEVICE to save a bit in the
> non-zone-device case.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/mmzone.h | 44 +++++++++++++++++++++++++++++++++++---------
>  mm/memory_hotplug.c    |  6 ++++++
>  mm/sparse.c            |  2 +-
>  3 files changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index aab70355d64f..932843c6459b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1418,16 +1418,35 @@ extern size_t mem_section_usage_size(void);
>   *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
>   *      worst combination is powerpc with 256k pages,
>   *      which results in PFN_SECTION_SHIFT equal 6.
> - * To sum it up, at least 6 bits are available.
> + * To sum it up, at least 6 bits are available on all architectures.
> + * However, more than 6 bits can be used on architectures other than
> + * powerpc (e.g. 15 bits are available on x86_64, and 13 bits in the
> + * worst case of 64K pages on arm64), provided the extra bits are not
> + * used on powerpc.
>   */
> -#define SECTION_MARKED_PRESENT		(1UL<<0)
> -#define SECTION_HAS_MEM_MAP		(1UL<<1)
> -#define SECTION_IS_ONLINE		(1UL<<2)
> -#define SECTION_IS_EARLY		(1UL<<3)
> -#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
> -#define SECTION_MAP_LAST_BIT		(1UL<<5)
> -#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
> -#define SECTION_NID_SHIFT		6
> +enum {
> +	SECTION_MARKED_PRESENT_BIT,
> +	SECTION_HAS_MEM_MAP_BIT,
> +	SECTION_IS_ONLINE_BIT,
> +	SECTION_IS_EARLY_BIT,
> +#ifdef CONFIG_ZONE_DEVICE
> +	SECTION_TAINT_ZONE_DEVICE_BIT,
> +#endif
> +	SECTION_MAP_LAST_BIT,
> +};
> +
> +enum {
> +	SECTION_MARKED_PRESENT		= BIT(SECTION_MARKED_PRESENT_BIT),
> +	SECTION_HAS_MEM_MAP		= BIT(SECTION_HAS_MEM_MAP_BIT),
> +	SECTION_IS_ONLINE		= BIT(SECTION_IS_ONLINE_BIT),
> +	SECTION_IS_EARLY		= BIT(SECTION_IS_EARLY_BIT),
> +#ifdef CONFIG_ZONE_DEVICE
> +	SECTION_TAINT_ZONE_DEVICE	= BIT(SECTION_TAINT_ZONE_DEVICE_BIT),
> +#endif
> +};

I can understand the reason for the other enum: to auto-assign the bit
numbers.  What's the underlying reason for the enum here?  Personally,
I'd just stay with defines, so I'm curious :)

LGTM

-- 
Thanks,

David / dhildenb



* Re: [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags
  2022-06-20  7:51   ` David Hildenbrand
@ 2022-06-20  8:16     ` Muchun Song
  0 siblings, 0 replies; 11+ messages in thread
From: Muchun Song @ 2022-06-20  8:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, corbet, mike.kravetz, osalvador, paulmck, linux-doc,
	linux-kernel, linux-mm, duanxiongchun, smuchun

On Mon, Jun 20, 2022 at 09:51:42AM +0200, David Hildenbrand wrote:
> On 19.06.22 15:38, Muchun Song wrote:
> > We are almost running out of section flags; only one bit is available
> > in the worst case (powerpc with 256k pages).  However, there are still
> > some free bits (in ->section_mem_map) on other architectures (e.g.
> > x86_64 has 10 bits available, arm64 has 8 bits available in the worst
> > case of 64K pages).  Those numbers are hard-coded, which makes it
> > inconvenient to use the spare bits on architectures other than powerpc.
> > So convert the section flags to an enumeration to make it easy to add
> > new section flags in the future.  Also, move SECTION_TAINT_ZONE_DEVICE
> > into the scope of CONFIG_ZONE_DEVICE to save a bit in the
> > non-zone-device case.
> > 
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/mmzone.h | 44 +++++++++++++++++++++++++++++++++++---------
> >  mm/memory_hotplug.c    |  6 ++++++
> >  mm/sparse.c            |  2 +-
> >  3 files changed, 42 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index aab70355d64f..932843c6459b 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1418,16 +1418,35 @@ extern size_t mem_section_usage_size(void);
> >   *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
> >   *      worst combination is powerpc with 256k pages,
> >   *      which results in PFN_SECTION_SHIFT equal 6.
> > - * To sum it up, at least 6 bits are available.
> > + * To sum it up, at least 6 bits are available on all architectures.
> > + * However, more than 6 bits can be used on architectures other than
> > + * powerpc (e.g. 15 bits are available on x86_64, and 13 bits in the
> > + * worst case of 64K pages on arm64), provided the extra bits are not
> > + * used on powerpc.
> >   */
> > -#define SECTION_MARKED_PRESENT		(1UL<<0)
> > -#define SECTION_HAS_MEM_MAP		(1UL<<1)
> > -#define SECTION_IS_ONLINE		(1UL<<2)
> > -#define SECTION_IS_EARLY		(1UL<<3)
> > -#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
> > -#define SECTION_MAP_LAST_BIT		(1UL<<5)
> > -#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
> > -#define SECTION_NID_SHIFT		6
> > +enum {
> > +	SECTION_MARKED_PRESENT_BIT,
> > +	SECTION_HAS_MEM_MAP_BIT,
> > +	SECTION_IS_ONLINE_BIT,
> > +	SECTION_IS_EARLY_BIT,
> > +#ifdef CONFIG_ZONE_DEVICE
> > +	SECTION_TAINT_ZONE_DEVICE_BIT,
> > +#endif
> > +	SECTION_MAP_LAST_BIT,
> > +};
> > +
> > +enum {
> > +	SECTION_MARKED_PRESENT		= BIT(SECTION_MARKED_PRESENT_BIT),
> > +	SECTION_HAS_MEM_MAP		= BIT(SECTION_HAS_MEM_MAP_BIT),
> > +	SECTION_IS_ONLINE		= BIT(SECTION_IS_ONLINE_BIT),
> > +	SECTION_IS_EARLY		= BIT(SECTION_IS_EARLY_BIT),
> > +#ifdef CONFIG_ZONE_DEVICE
> > +	SECTION_TAINT_ZONE_DEVICE	= BIT(SECTION_TAINT_ZONE_DEVICE_BIT),
> > +#endif
> > +};
> 
> I can understand the reason for the other enum: to auto-assign the bit
> numbers.  What's the underlying reason for the enum here?  Personally,
> I'd just stay with defines, so I'm curious :)
>

Oh, just my personal preference.  I can replace those with defines in
the next version if you prefer. :-)
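
For reference, the defines-based form would keep the first enum (for
the auto-assigned bit positions) and replace the second one roughly
like this (just a sketch of the alternative):

#define SECTION_MARKED_PRESENT		BIT(SECTION_MARKED_PRESENT_BIT)
#define SECTION_HAS_MEM_MAP		BIT(SECTION_HAS_MEM_MAP_BIT)
#define SECTION_IS_ONLINE		BIT(SECTION_IS_ONLINE_BIT)
#define SECTION_IS_EARLY		BIT(SECTION_IS_EARLY_BIT)
#ifdef CONFIG_ZONE_DEVICE
#define SECTION_TAINT_ZONE_DEVICE	BIT(SECTION_TAINT_ZONE_DEVICE_BIT)
#endif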

Thanks.

> LGTM
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
> 


* Re: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-20  7:47     ` David Hildenbrand
@ 2022-06-20  8:29       ` Muchun Song
  2022-06-20  8:44         ` Oscar Salvador
  0 siblings, 1 reply; 11+ messages in thread
From: Muchun Song @ 2022-06-20  8:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, corbet, mike.kravetz, osalvador, paulmck, linux-doc,
	linux-kernel, linux-mm, duanxiongchun, smuchun

On Mon, Jun 20, 2022 at 09:47:22AM +0200, David Hildenbrand wrote:
> On 20.06.22 09:22, Muchun Song wrote:
> > On Sun, Jun 19, 2022 at 09:38:51PM +0800, Muchun Song wrote:
> >> For now, hugetlb_free_vmemmap is not compatible with
> >> memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
> >> precedence over memory_hotplug.memmap_on_memory.  However, some users
> >> want memory_hotplug.memmap_on_memory to take precedence over
> >> hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
> >> likely to succeed in close-to-OOM situations.  So making
> >> hugetlb_free_vmemmap take precedence is neither wise nor elegant.  The
> >> proper approach is to have hugetlb_vmemmap.c check whether the section
> >> that the HugeTLB pages belong to can be optimized: if the section's
> >> vmemmap pages are allocated from the added memory block itself,
> >> hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise,
> >> do the optimization.  Then both kernel parameters are compatible.  So
> >> this patch introduces VmemmapSelfHosted to mark any non-optimizable
> >> vmemmap pages; hugetlb_vmemmap can use this flag to detect whether a
> >> vmemmap page can be optimized.
> >>
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> Co-developed-by: Oscar Salvador <osalvador@suse.de>
> >> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> >> ---
> >>  Documentation/admin-guide/kernel-parameters.txt | 22 +++++------
> >>  Documentation/admin-guide/sysctl/vm.rst         |  5 +--
> >>  include/linux/memory_hotplug.h                  |  9 -----
> >>  include/linux/page-flags.h                      | 11 ++++++
> >>  mm/hugetlb_vmemmap.c                            | 52 +++++++++++++++++++++----
> >>  mm/memory_hotplug.c                             | 27 ++++++-------
> >>  6 files changed, 79 insertions(+), 47 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 8090130b544b..d740e2ed0e61 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -1722,9 +1722,11 @@
> >>  			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
> >>  			the default is on.
> >>  
> >> -			This is not compatible with memory_hotplug.memmap_on_memory.
> >> -			If both parameters are enabled, hugetlb_free_vmemmap takes
> >> -			precedence over memory_hotplug.memmap_on_memory.
> >> +			Note that when memory_hotplug.memmap_on_memory is enabled,
> >> +			the vmemmap pages may be allocated from the added memory
> >> +			block itself; those vmemmap pages cannot be optimized even
> >> +			if this feature is enabled.  Other vmemmap pages not
> >> +			allocated from the added memory block itself are unaffected.
> >>  
> >>  	hung_task_panic=
> >>  			[KNL] Should the hung task detector generate panics.
> >> @@ -3069,10 +3071,12 @@
> >>  			[KNL,X86,ARM] Boolean flag to enable this feature.
> >>  			Format: {on | off (default)}
> >>  			When enabled, runtime hotplugged memory will
> >> -			allocate its internal metadata (struct pages)
> >> -			from the hotadded memory which will allow to
> >> -			hotadd a lot of memory without requiring
> >> -			additional memory to do so.
> >> +			allocate its internal metadata (struct pages;
> >> +			those vmemmap pages cannot be optimized even
> >> +			if hugetlb_free_vmemmap is enabled) from the
> >> +			hotadded memory which will allow to hotadd a
> >> +			lot of memory without requiring additional
> >> +			memory to do so.
> >>  			This feature is disabled by default because it
> >>  			has some implication on large (e.g. GB)
> >>  			allocations in some configurations (e.g. small
> >> @@ -3082,10 +3086,6 @@
> >>  			Note that even when enabled, there are a few cases where
> >>  			the feature is not effective.
> >>  
> >> -			This is not compatible with hugetlb_free_vmemmap. If
> >> -			both parameters are enabled, hugetlb_free_vmemmap takes
> >> -			precedence over memory_hotplug.memmap_on_memory.
> >> -
> >>  	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
> >>  			Format: <integer>
> >>  			default : 0 <disable>
> >> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> >> index 5c9aa171a0d3..d7374a1e8ac9 100644
> >> --- a/Documentation/admin-guide/sysctl/vm.rst
> >> +++ b/Documentation/admin-guide/sysctl/vm.rst
> >> @@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
> >>  hugetlb_optimize_vmemmap
> >>  ========================
> >>  
> >> -This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
> >> -is configured or the size of 'struct page' (a structure defined in
> >> -include/linux/mm_types.h) is not power of two (an unusual system config could
> >> +This knob is not available when the size of 'struct page' (a structure defined
> >> +in include/linux/mm_types.h) is not a power of two (an unusual system config could
> >>  result in this).
> >>  
> >>  Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
> >> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> >> index 20d7edf62a6a..e0b2209ab71c 100644
> >> --- a/include/linux/memory_hotplug.h
> >> +++ b/include/linux/memory_hotplug.h
> >> @@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
> >>  extern bool mhp_supports_memmap_on_memory(unsigned long size);
> >>  #endif /* CONFIG_MEMORY_HOTPLUG */
> >>  
> >> -#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
> >> -bool mhp_memmap_on_memory(void);
> >> -#else
> >> -static inline bool mhp_memmap_on_memory(void)
> >> -{
> >> -	return false;
> >> -}
> >> -#endif
> >> -
> >>  #endif /* __LINUX_MEMORY_HOTPLUG_H */
> >> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> >> index e66f7aa3191d..2aa5dcbfe468 100644
> >> --- a/include/linux/page-flags.h
> >> +++ b/include/linux/page-flags.h
> >> @@ -193,6 +193,11 @@ enum pageflags {
> >>  
> >>  	/* Only valid for buddy pages. Used to track pages that are reported */
> >>  	PG_reported = PG_uptodate,
> >> +
> >> +#ifdef CONFIG_MEMORY_HOTPLUG
> >> +	/* For self-hosted memmap pages */
> >> +	PG_vmemmap_self_hosted = PG_owner_priv_1,
> >> +#endif
> >>  };
> >>  
> >>  #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
> >> @@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
> >>   */
> >>  __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
> >>  
> >> +#ifdef CONFIG_MEMORY_HOTPLUG
> >> +PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
> >> +#else
> >> +PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
> >> +#endif
> >> +
> >>  /*
> >>   * On an anonymous page mapped into a user virtual memory area,
> >>   * page->mapping points to its anon_vma, not to a struct address_space;
> >> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> >> index 1089ea8a9c98..73bfbb47f6a4 100644
> >> --- a/mm/hugetlb_vmemmap.c
> >> +++ b/mm/hugetlb_vmemmap.c
> >> @@ -10,7 +10,7 @@
> >>   */
> >>  #define pr_fmt(fmt)	"HugeTLB: " fmt
> >>  
> >> -#include <linux/memory_hotplug.h>
> >> +#include <linux/memory.h>
> >>  #include "hugetlb_vmemmap.h"
> >>  
> >>  /*
> >> @@ -97,18 +97,54 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
> >>  	return ret;
> >>  }
> >>  
> >> +static unsigned int vmemmap_optimizable_pages(struct hstate *h,
> >> +					      struct page *head)
> >> +{
> >> +	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
> >> +		return 0;
> >> +
> >> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
> >> +		unsigned long pfn = page_to_pfn(head);
> >> +
> >> +		/*
> >> +		 * Due to HugeTLB alignment requirements and the vmemmap
> >> +		 * pages being at the start of the hotplugged memory region
> >> +		 * in the memory_hotplug.memmap_on_memory case, it is
> >> +		 * sufficient to check whether the first page of the memory
> >> +		 * block is marked as VmemmapSelfHosted.
> >> +		 *
> >> +		 * [                  hotplugged memory                  ]
> >> +		 * [        section        ][...][        section        ]
> >> +		 * [ vmemmap ][              usable memory               ]
> >> +		 *   ^   |     |                                        |
> >> +		 *   +---+     |                                        |
> >> +		 *     ^       |                                        |
> >> +		 *     +-------+                                        |
> >> +		 *          ^                                           |
> >> +		 *          +-------------------------------------------+
> >> +		 *
> >> +		 * A hotplugged memory block never has non-present sections,
> >> +		 * while a boot memory block can have one or more. So
> >> +		 * pfn_valid() is used to filter out non-present sections,
> >> +		 * which cannot be memmap_on_memory either.
> >> +		 */
> >> +		pfn = ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()));
> >> +		if (pfn_valid(pfn) && PageVmemmapSelfHosted(pfn_to_page(pfn)))
> > 
> > Although it works, I think the PageVmemmapSelfHosted() check for the 1st pfn's
> > vmemmap page is not always reliable.  Since we reused PG_owner_priv_1
> > as PG_vmemmap_self_hosted, the test is only reliable for a vmemmap page's
> > vmemmap page.  Other non-vmemmap pages can be flagged with PG_owner_priv_1.
> > So this check can be a false positive. Maybe the following code snippet is
> > the solution.
> 
> How could that happen for pages used for backing a vmemmap?
>

It cannot happen in the memmap_on_memory case. However, it can happen in other
cases. E.g. the vmemmap page of the 1st pfn (of a boot memory block) may be
flagged with PG_owner_priv_1 (if PG_swapcache is set). Then the check is a
false positive.
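
For reference, the bit aliasing that makes this possible looks roughly like
this (abbreviated from include/linux/page-flags.h; a sketch only, with the
surrounding enumerators elided):

	enum pageflags {
		/* ... */
		PG_swapcache = PG_owner_priv_1,	/* swap cache reuses this bit */
		/* ... */
	#ifdef CONFIG_MEMORY_HOTPLUG
		/* For self-hosted memmap pages */
		PG_vmemmap_self_hosted = PG_owner_priv_1, /* same bit as PG_swapcache */
	#endif
	};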
 
> > 
> > Any thoughts? Oscar or David.
> 
> First of all, I think you should really avoid using
> memory_block_size_bytes(); when using memory_block_size_bytes(), you
> wouldn't need PageVmemmapSelfHosted(), you can just check if the vmemmap
> of the page is itself. But I think we should try making this independent
> of the memory block size.
> 

Agree.

> If virt_to_page(page) doesn't work, maybe just traverse the direct map
> to find the page's backing page directly?
>

Yeah, now I have tried to walk page tables to get the backing page.
I'll update a new version.
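
Something like the following rough sketch (untested and only a sketch: it
assumes the vmemmap range is mapped with base pages or a PMD leaf, and the
helper name is made up here, not the final patch):

	#include <linux/pgtable.h>

	/* Find the struct page that provides the memory backing @vaddr. */
	static struct page *vmemmap_backing_page(unsigned long vaddr)
	{
		pmd_t *pmdp, pmd;
		pte_t *ptep;

		pmdp = pmd_off_k(vaddr);	/* walks pgd/p4d/pud for us */
		pmd = READ_ONCE(*pmdp);
		if (pmd_leaf(pmd))		/* PMD-mapped vmemmap */
			return pmd_page(pmd) + pte_index(vaddr);

		ptep = pte_offset_kernel(pmdp, vaddr);
		return pte_page(*ptep);
	}

PageVmemmapSelfHosted() can then be tested on the returned page, which is
guaranteed to actually back the vmemmap.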

Thanks.

> -- 
> Thanks,
> 
> David / dhildenb
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-20  8:29       ` Muchun Song
@ 2022-06-20  8:44         ` Oscar Salvador
  2022-06-20  8:56           ` David Hildenbrand
  2022-06-20  9:05           ` Muchun Song
  0 siblings, 2 replies; 11+ messages in thread
From: Oscar Salvador @ 2022-06-20  8:44 UTC (permalink / raw)
  To: Muchun Song
  Cc: David Hildenbrand, akpm, corbet, mike.kravetz, paulmck,
	linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun

On Mon, Jun 20, 2022 at 04:29:11PM +0800, Muchun Song wrote:
> > > Although it works, I think the PageVmemmapSelfHosted() check for the 1st pfn's
> > > vmemmap page is not always reliable.  Since we reused PG_owner_priv_1
> > > as PG_vmemmap_self_hosted, the test is only reliable for a vmemmap page's
> > > vmemmap page.  Other non-vmemmap pages can be flagged with PG_owner_priv_1.
> > > So this check can be a false positive. Maybe the following code snippet is
> > > the solution.
> > 
> > How could that happen for pages used for backing a vmemmap?
> >
> 
> > It cannot happen in the memmap_on_memory case. However, it can happen in other
> > cases. E.g. the vmemmap page of the 1st pfn (of a boot memory block) may be
> > flagged with PG_owner_priv_1 (if PG_swapcache is set). Then the check is a
> > false positive.

If this can really happen, which I am not that sure about tbh, maybe a way out
would be to just define a new page type, as we did in previous versions of
memmap_on_memory. In that way we would not check for flags, but for its type.

But as I said, I am not entirely sure about the potential fallout of what you mention.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-20  8:44         ` Oscar Salvador
@ 2022-06-20  8:56           ` David Hildenbrand
  2022-06-20  9:05           ` Muchun Song
  1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2022-06-20  8:56 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song
  Cc: akpm, corbet, mike.kravetz, paulmck, linux-doc, linux-kernel,
	linux-mm, duanxiongchun, smuchun

On 20.06.22 10:44, Oscar Salvador wrote:
> On Mon, Jun 20, 2022 at 04:29:11PM +0800, Muchun Song wrote:
>>>> Although it works, I think the PageVmemmapSelfHosted() check for the 1st pfn's
>>>> vmemmap page is not always reliable.  Since we reused PG_owner_priv_1
>>>> as PG_vmemmap_self_hosted, the test is only reliable for a vmemmap page's
>>>> vmemmap page.  Other non-vmemmap pages can be flagged with PG_owner_priv_1.
>>>> So this check can be a false positive. Maybe the following code snippet is
>>>> the solution.
>>>
>>> How could that happen for pages used for backing a vmemmap?
>>>
>>
>> It cannot happen in the memmap_on_memory case. However, it can happen in other
>> cases. E.g. the vmemmap page of the 1st pfn (of a boot memory block) may be
>> flagged with PG_owner_priv_1 (if PG_swapcache is set). Then the check is a
>> false positive.
> 
> If this can really happen, which I am not that sure about tbh, maybe a way out
> would be to just define a new page type, as we did in previous versions of
> memmap_on_memory. In that way we would not check for flags, but for its type.
> 
> But as I said, I am not entirely sure about the potential fallout of what you mention.

We are talking about the memmap of a page, whose page content is the
memmap of pages actually exposed to other users (file-backed, anonymous,
whatsoever).

In other words, while setting PG_swapcache on the memmap of a page
exposed to the user is possible, it shouldn't be possible for the memmap
of the page "hosting" these memmaps.

I know, it's confusing and I keep confusing myself. I tried creating a
picture and it doesn't really clarify the situation :D

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
  2022-06-20  8:44         ` Oscar Salvador
  2022-06-20  8:56           ` David Hildenbrand
@ 2022-06-20  9:05           ` Muchun Song
  1 sibling, 0 replies; 11+ messages in thread
From: Muchun Song @ 2022-06-20  9:05 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, akpm, corbet, mike.kravetz, paulmck,
	linux-doc, linux-kernel, linux-mm, duanxiongchun, smuchun

On Mon, Jun 20, 2022 at 10:44:40AM +0200, Oscar Salvador wrote:
> On Mon, Jun 20, 2022 at 04:29:11PM +0800, Muchun Song wrote:
> > > > Although it works, I think the PageVmemmapSelfHosted() check for the 1st pfn's
> > > > vmemmap page is not always reliable.  Since we reused PG_owner_priv_1
> > > > as PG_vmemmap_self_hosted, the test is only reliable for a vmemmap page's
> > > > vmemmap page.  Other non-vmemmap pages can be flagged with PG_owner_priv_1.
> > > > So this check can be a false positive. Maybe the following code snippet is
> > > > the solution.
> > > 
> > > How could that happen for pages used for backing a vmemmap?
> > >
> > 
> > It cannot happen in the memmap_on_memory case. However, it can happen in other
> > cases. E.g. the vmemmap page of the 1st pfn (of a boot memory block) may be
> > flagged with PG_owner_priv_1 (if PG_swapcache is set). Then the check is a
> > false positive.
> 
> If this can really happen, which I am not that sure about tbh, maybe a way out would be

I need to clarify that this can only happen with the approach implemented in
this patch.

For a boot memory block, the vmemmap pages are not self-hosted.  So the 1st pfn
(of this memory block) can be allocated to other users, e.g. an anonymous page
with PG_swapcache set.  In this patch, ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()))
will land on this anonymous page, and then the check is a false positive.

[                  boot memory block                  ]
[        section        ][...][        section        ]
[                   usable memory                     ]
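
To make the false positive concrete, assume (hypothetically) a 128 MiB memory
block with 4 KiB pages, i.e. PHYS_PFN(memory_block_size_bytes()) == 0x8000:

	pfn = ALIGN_DOWN(0x12345, 0x8000);	/* == 0x10000, the block start */

For a boot memory block, pfn 0x10000 can belong to any user, e.g. an anonymous
page in the swap cache, whose struct page then has PG_owner_priv_1
(PG_swapcache) set, so testing PageVmemmapSelfHosted() on it proves nothing.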

> to just define a new page type, as we did in previous versions of memmap_on_memory.
> In that way we would not check for flags, but for its type.
>

I think we do not need to introduce a new flag; we just need to make sure the
page passed to PageVmemmapSelfHosted() is a backing page of the vmemmap. Then
we cannot incur a false positive.  The feasible solution is walking the page
tables to find a vmemmap page's backing page.

Thanks.
 
> But as I said, I am not entirely sure about the potential fallout of what you mention.
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-06-20  9:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-19 13:38 [PATCH v4 0/2] make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
2022-06-19 13:38 ` [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags Muchun Song
2022-06-20  7:51   ` David Hildenbrand
2022-06-20  8:16     ` Muchun Song
2022-06-19 13:38 ` [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory Muchun Song
2022-06-20  7:22   ` Muchun Song
2022-06-20  7:47     ` David Hildenbrand
2022-06-20  8:29       ` Muchun Song
2022-06-20  8:44         ` Oscar Salvador
2022-06-20  8:56           ` David Hildenbrand
2022-06-20  9:05           ` Muchun Song
