* [PATCH v2 0/5] Add support for memmap on memory feature on ppc64
@ 2023-07-06  8:50 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06  8:50 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

This patch series updates the memmap-on-memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64, where the alignment rules differ with a 64K page size.
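
The fallback itself is implemented in patch 2 of the series; in
add_memory_resource() it takes roughly the following shape (a simplified
sketch of that hunk, not the exact code):

	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
		if (mhp_supports_memmap_on_memory(size)) {
			/* carve the memmap out of the hotplugged range itself */
			mhp_altmap.free = PHYS_PFN(size) - mhp_altmap.reserve;
			params.altmap = &mhp_altmap;
		}
		/* otherwise continue without an altmap; the memmap is then
		 * allocated from regular memory instead of failing hot-add */
	}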

This patch series depends on the dax vmemmap optimization series
posted here: https://lore.kernel.org/linux-mm/20230616110826.344417-1-aneesh.kumar@linux.ibm.com

Changes from v1:
* Update the memory block to store the vmem_altmap details. This is
required so that when we remove the memory we can find the altmap
details, which are needed on some architectures.
* Rebase to the latest Linus tree.



Aneesh Kumar K.V (5):
  mm/hotplug: Embed vmem_altmap details in memory block
  mm/hotplug: Allow architecture override for memmap on memory feature
  mm/hotplug: Simplify the handling of MHP_MEMMAP_ON_MEMORY flag
  mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  powerpc/book3s64/memhotplug: Enable memmap on memory for radix

 arch/arm64/Kconfig                            |  4 +-
 arch/arm64/mm/mmu.c                           |  5 +
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 ++++++
 .../platforms/pseries/hotplug-memory.c        |  4 +-
 arch/x86/Kconfig                              |  4 +-
 arch/x86/mm/init_64.c                         |  6 ++
 drivers/acpi/acpi_memhotplug.c                |  3 +-
 drivers/base/memory.c                         | 28 ++++--
 include/linux/memory.h                        | 25 +++--
 include/linux/memory_hotplug.h                | 17 +++-
 include/linux/memremap.h                      | 18 +---
 mm/Kconfig                                    |  3 +
 mm/memory_hotplug.c                           | 94 +++++++++----------
 14 files changed, 151 insertions(+), 89 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06  8:50 ` Aneesh Kumar K.V
@ 2023-07-06  8:50   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06  8:50 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

With memmap on memory, some architectures need more details w.r.t. the
altmap, such as base_pfn, end_pfn, etc., to unmap the vmemmap memory.

Embed the vmem_altmap data structure in struct memory_block and use that
instead of nr_vmemmap_pages.

On memory unplug, if the kernel finds any memory block in the range
using a vmem_altmap, it fails to unplug the memory unless the request
is a single memory block unplug.
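
The removal-side rule described above condenses to the following (see the
mm/memory_hotplug.c hunk below); the walk hands back the embedded altmap so
the architecture can later free the vmemmap pages carved out of the block:

	struct vmem_altmap *altmap = NULL;

	/* Did any block in the range place its memmap in an altmap? */
	if (walk_memory_blocks(start, size, &altmap, get_vmemmap_altmap_cb)) {
		/* Then only a single memory block may be removed at a time. */
		if (size != memory_block_size_bytes())
			return -EINVAL;
	}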

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c    | 28 +++++++++++++++++++---------
 include/linux/memory.h   | 25 +++++++++++++++++++------
 include/linux/memremap.h | 18 +-----------------
 mm/memory_hotplug.c      | 31 ++++++++++++++-----------------
 4 files changed, 53 insertions(+), 49 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..523cc1d37c81 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	WARN(mem->altmap.alloc, "Altmap not fully unmapped");
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap.alloc)
+		nr_vmemmap_pages = mem->altmap.alloc + mem->altmap.reserve;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap.alloc)
+		nr_vmemmap_pages = mem->altmap.alloc + mem->altmap.reserve;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,10 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	if (altmap)
+		memcpy(&mem->altmap, altmap, sizeof(*altmap));
+	else
+		mem->altmap.alloc = 0;
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +793,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +828,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +842,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..87f12924250f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -64,6 +64,23 @@ struct memory_group {
 	};
 };
 
+/**
+ * struct vmem_altmap - pre-allocated storage for vmemmap_populate
+ * @base_pfn: base of the entire dev_pagemap mapping
+ * @reserve: pages mapped, but reserved for driver use (relative to @base)
+ * @free: free pages set aside in the mapping for memmap storage
+ * @align: pages reserved to meet allocation alignments
+ * @alloc: track pages consumed, private to vmemmap_populate()
+ */
+struct vmem_altmap {
+	unsigned long base_pfn;
+	const unsigned long end_pfn;
+	const unsigned long reserve;
+	unsigned long free;
+	unsigned long align;
+	unsigned long alloc;
+};
+
 struct memory_block {
 	unsigned long start_section_nr;
 	unsigned long state;		/* serialized by the dev->lock */
@@ -77,11 +94,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +160,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1314d9c5f05b..4cb326f85302 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_MEMREMAP_H_
 #define _LINUX_MEMREMAP_H_
 
+#include <linux/memory.h>
 #include <linux/mmzone.h>
 #include <linux/range.h>
 #include <linux/ioport.h>
@@ -10,23 +11,6 @@
 struct resource;
 struct device;
 
-/**
- * struct vmem_altmap - pre-allocated storage for vmemmap_populate
- * @base_pfn: base of the entire dev_pagemap mapping
- * @reserve: pages mapped, but reserved for driver use (relative to @base)
- * @free: free pages set aside in the mapping for memmap storage
- * @align: pages reserved to meet allocation alignments
- * @alloc: track pages consumed, private to vmemmap_populate()
- */
-struct vmem_altmap {
-	unsigned long base_pfn;
-	const unsigned long end_pfn;
-	const unsigned long reserve;
-	unsigned long free;
-	unsigned long align;
-	unsigned long alloc;
-};
-
 /*
  * Specialize ZONE_DEVICE memory into multiple types each has a different
  * usage.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f231cf1b410..c4bac38cc147 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1354,7 +1354,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
+	ret = create_memory_block_devices(start, size, &mhp_altmap,
 					  group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
@@ -1956,12 +1956,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct vmem_altmap **altmap = (struct vmem_altmap **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * If we have any pages allocated from altmap
+	 * return the altmap details and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap.alloc) {
+		*altmap = &mem->altmap;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2036,9 +2042,8 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
+	int ret;
 	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2060,24 +2065,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
+
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &altmap, get_vmemmap_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
-			 */
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
 		}
 	}
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 2/5] mm/hotplug: Allow architecture override for memmap on memory feature
  2023-07-06  8:50 ` Aneesh Kumar K.V
@ 2023-07-06  8:50   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06  8:50 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Some architectures like ppc64 want to enable this feature only with radix
translation, and their vmemmap mappings have different alignment
requirements. Add an override for mhp_supports_memmap_on_memory() and also
use the altmap.reserve feature to adjust the pageblock alignment requirement.

The patch also falls back to allocating the memmap outside the memory block
if the alignment rules for memmap-on-memory allocation are not met. This
allows the feature to be used more widely, allocating as much of the memmap
as possible within the memory block being added.

A follow-up patch enabling memmap on memory for ppc64 will use this.
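
The override hook uses the usual weak-symbol pattern: generic code provides
a __weak mhp_supports_memmap_on_memory() that returns false, and an
architecture opts in by supplying its own definition, typically wrapping the
generic alignment helper (this is what the arm64 and x86 hunks below do):

	/* mm/memory_hotplug.c: default when an architecture does not opt in */
	bool __weak mhp_supports_memmap_on_memory(unsigned long size)
	{
		return false;
	}

	/* arch code (arm64/x86 below): reuse the generic alignment checks */
	bool mhp_supports_memmap_on_memory(unsigned long size)
	{
		return __mhp_supports_memmap_on_memory(size);
	}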

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/mm/mmu.c            |  5 +++++
 arch/x86/mm/init_64.c          |  6 ++++++
 include/linux/memory_hotplug.h |  3 ++-
 mm/memory_hotplug.c            | 36 ++++++++++++++++++++++++----------
 4 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 95d360805f8a..0ff2ec634a58 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1340,6 +1340,11 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
 
+bool mhp_supports_memmap_on_memory(unsigned long size)
+{
+	return __mhp_supports_memmap_on_memory(size);
+}
+
 /*
  * This memory hotplug notifier helps prevent boot memory from being
  * inadvertently removed as it blocks pfn range offlining process in
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a190aae8ceaf..b318d26a70d4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1264,6 +1264,12 @@ void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	__remove_pages(start_pfn, nr_pages, altmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
+
+bool mhp_supports_memmap_on_memory(unsigned long size)
+{
+	return __mhp_supports_memmap_on_memory(size);
+}
+
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static struct kcore_list kcore_vsyscall;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 013c69753c91..a769f44b8368 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -354,7 +354,8 @@ extern struct zone *zone_for_pfn_range(int online_type, int nid,
 extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
 				      struct mhp_params *params);
 void arch_remove_linear_mapping(u64 start, u64 size);
-extern bool mhp_supports_memmap_on_memory(unsigned long size);
+bool mhp_supports_memmap_on_memory(unsigned long size);
+bool __mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c4bac38cc147..423f96dd5481 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,7 +1247,8 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-bool mhp_supports_memmap_on_memory(unsigned long size)
+/* Helper function for architecture to use. */
+bool __mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
@@ -1285,6 +1286,20 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
 }
 
+bool __weak mhp_supports_memmap_on_memory(unsigned long size)
+{
+	return false;
+}
+
+/*
+ * Architectures may want to override the altmap reserve details based
+ * on the alignment requirement for vmemmap mapping.
+ */
+unsigned __weak long memory_block_align_base(struct resource *res)
+{
+	return 0;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1295,7 +1310,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap mhp_altmap = {
+		.base_pfn =  PHYS_PFN(res->start),
+		.end_pfn  =  PHYS_PFN(res->end),
+		.reserve  = memory_block_align_base(res),
+	};
 	struct memory_group *group = NULL;
 	u64 start, size;
 	bool new_node = false;
@@ -1339,13 +1358,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 * Self hosted memmap array
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-		if (!mhp_supports_memmap_on_memory(size)) {
-			ret = -EINVAL;
-			goto error;
+		if (mhp_supports_memmap_on_memory(size)) {
+			mhp_altmap.free	= PHYS_PFN(size) - mhp_altmap.reserve;
+			params.altmap = &mhp_altmap;
 		}
-		mhp_altmap.free = PHYS_PFN(size);
-		mhp_altmap.base_pfn = PHYS_PFN(start);
-		params.altmap = &mhp_altmap;
+		/* if not supported don't use altmap */
 	}
 
 	/* call arch's memory hotadd */
@@ -1354,8 +1371,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, &mhp_altmap,
-					  group);
+	ret = create_memory_block_devices(start, size, &mhp_altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 3/5] mm/hotplug: Simplify the handling of MHP_MEMMAP_ON_MEMORY flag
  2023-07-06  8:50 ` Aneesh Kumar K.V
@ 2023-07-06  8:50   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06  8:50 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Instead of checking for memmap-on-memory feature enablement within the
functions that check alignment, use the kernel parameter to control the
memory hotplug flags. The generic kernel now enables the memmap-on-memory
feature if the hotplug flags request it.

The ACPI code can now pass the flag unconditionally because the kernel will
fall back to not using the feature if the alignment rules are not met.
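
Callers no longer perform the feature check themselves; they simply OR in
whatever get_memmap_on_memory_flags() returns, as the ACPI hunk below (and
the pseries change in patch 5) does. Condensed:

	mhp_t mhp_flags = MHP_NONE;

	/* MHP_MEMMAP_ON_MEMORY when memory_hotplug.memmap_on_memory is set, else 0 */
	mhp_flags |= get_memmap_on_memory_flags();
	result = __add_memory(mgid, info->start_addr, info->length, mhp_flags);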

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/acpi/acpi_memhotplug.c |  3 +--
 include/linux/memory_hotplug.h | 14 ++++++++++++++
 mm/memory_hotplug.c            | 35 +++++++++++-----------------------
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24f662d8bd39..4d0096fc4cc2 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (!info->length)
 			continue;
 
-		if (mhp_supports_memmap_on_memory(info->length))
-			mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+		mhp_flags |= get_memmap_on_memory_flags();
 		result = __add_memory(mgid, info->start_addr, info->length,
 				      mhp_flags);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index a769f44b8368..af7017122506 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -358,4 +358,18 @@ bool mhp_supports_memmap_on_memory(unsigned long size);
 bool __mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
+extern bool memmap_on_memory;
+static inline unsigned long get_memmap_on_memory_flags(void)
+{
+	if (memmap_on_memory)
+		return MHP_MEMMAP_ON_MEMORY;
+	return 0;
+}
+#else
+static inline unsigned long get_memmap_on_memory_flags(void)
+{
+	return 0;
+}
+#endif
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 423f96dd5481..a1542b8f64e6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -45,19 +45,9 @@
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
-static bool memmap_on_memory __ro_after_init;
+bool memmap_on_memory __ro_after_init;
 module_param(memmap_on_memory, bool, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
-
-static inline bool mhp_memmap_on_memory(void)
-{
-	return memmap_on_memory;
-}
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
 #endif
 
 enum {
@@ -1280,10 +1270,9 @@ bool __mhp_supports_memmap_on_memory(unsigned long size)
 	 *       altmap as an alternative source of memory, and we do not exactly
 	 *       populate a single PMD.
 	 */
-	return mhp_memmap_on_memory() &&
-	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+	return size == memory_block_size_bytes() &&
+		IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
+		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
 }
 
 bool __weak mhp_supports_memmap_on_memory(unsigned long size)
@@ -2082,15 +2071,13 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 
-	if (mhp_memmap_on_memory()) {
-		ret = walk_memory_blocks(start, size, &altmap, get_vmemmap_altmap_cb);
-		if (ret) {
-			if (size != memory_block_size_bytes()) {
-				pr_warn("Refuse to remove %#llx - %#llx,"
-					"wrong granularity\n",
-					start, start + size);
-				return -EINVAL;
-			}
+	ret = walk_memory_blocks(start, size, &altmap, get_vmemmap_altmap_cb);
+	if (ret) {
+		if (size != memory_block_size_bytes()) {
+			pr_warn("Refuse to remove %#llx - %#llx,"
+				"wrong granularity\n",
+				start, start + size);
+			return -EINVAL;
 		}
 	}
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 4/5] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  2023-07-06  8:50 ` Aneesh Kumar K.V
@ 2023-07-06  8:50   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06  8:50 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Instead of defining the Kconfig symbol in every supported architecture, add
an mm/Kconfig variable and select it from the supported architectures.

No functional change in this patch.
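
With the symbol defined once in mm/Kconfig, an architecture now opts in with
a single select line; for example, patch 5 of this series does the following
for powerpc:

	config PPC
		select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU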

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/Kconfig | 4 +---
 arch/x86/Kconfig   | 4 +---
 mm/Kconfig         | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7856c3a3e35a..7e5985c018f8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_GNU_PROPERTY
 	select ARCH_USE_MEMTEST
@@ -346,9 +347,6 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 config SMP
 	def_bool y
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78224aa76409..d0258e92a8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
 	def_bool y
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git a/mm/Kconfig b/mm/Kconfig
index 923bd35f81f2..2f9d28fee75d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -570,6 +570,9 @@ config MHP_MEMMAP_ON_MEMORY
 	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
 	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+       bool
+
 endif # MEMORY_HOTPLUG
 
 # Heavily threaded applications may benefit from splitting the mm-wide
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 5/5] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-06  8:50 ` Aneesh Kumar K.V
@ 2023-07-06  8:50   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-06  8:50 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

The radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks. We also use the
altmap.reserve feature to align things correctly at pageblock granularity.
We can end up losing some pages in memory with this. For example, with a
256MB memory block size, we require 4 pages to map the vmemmap pages; in
order to align things correctly we end up adding a reserve of 28 pages.
That is, for every 4096 pages, 28 pages get reserved.
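
The 4 + 28 figure follows from a 64K page size and a 64-byte struct page;
a sketch of the arithmetic (the 32-page pageblock size is implied by those
numbers):

	pages per 256MB block = 256MB / 64KB               = 4096
	vmemmap for the block = 4096 * sizeof(struct page)
	                      = 4096 * 64B = 256KB         = 4 pages
	altmap reserve        = roundup(4, 32) - 4         = 28 pages

so 28 of every 4096 hotplugged pages are set aside purely to keep the
remaining usable range aligned to a pageblock.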

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
 .../platforms/pseries/hotplug-memory.c        |  4 ++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 116d6add0bb0..f890907e5bbf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@ config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index a62729f70f2a..c0bd60b5fb64 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1678,3 +1678,31 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 	return 1;
 }
+
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
+ * some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fallback correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+bool mhp_supports_memmap_on_memory(unsigned long size)
+{
+	if (!radix_enabled())
+		return false;
+	/*
+	 * The pageblock alignment requirement is met by using
+	 * reserve blocks in altmap.
+	 */
+	return size == memory_block_size_bytes();
+}
+
+unsigned long memory_block_align_base(struct resource *res)
+{
+	unsigned long base_pfn = PHYS_PFN(res->start);
+	unsigned long align, size = resource_size(res);
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
+
+	align = pageblock_align(base_pfn + vmemmap_size) - (base_pfn + vmemmap_size);
+	return align;
+}
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..326db26d773e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -617,6 +617,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
+	mhp_t mhp_flags = MHP_NONE;
 	unsigned long block_sz;
 	int nid, rc;
 
@@ -637,7 +638,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		nid = first_online_node;
 
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+	mhp_flags |= get_memmap_on_memory_flags();
+	rc = __add_memory(nid, lmb->base_addr, block_sz, mhp_flags);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 4/5] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  2023-07-06  8:50   ` Aneesh Kumar K.V
@ 2023-07-06  8:53     ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06  8:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 10:50, Aneesh Kumar K.V wrote:
> Instead of adding menu entry with all supported architectures, add
> mm/Kconfig variable and select the same from supported architectures.
> 
> No functional change in this patch.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/arm64/Kconfig | 4 +---
>   arch/x86/Kconfig   | 4 +---
>   mm/Kconfig         | 3 +++
>   3 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7856c3a3e35a..7e5985c018f8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -78,6 +78,7 @@ config ARM64
>   	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
>   	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
>   	select ARCH_KEEP_MEMBLOCK
> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   	select ARCH_USE_CMPXCHG_LOCKREF
>   	select ARCH_USE_GNU_PROPERTY
>   	select ARCH_USE_MEMTEST
> @@ -346,9 +347,6 @@ config GENERIC_CSUM
>   config GENERIC_CALIBRATE_DELAY
>   	def_bool y
>   
> -config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> -	def_bool y
> -
>   config SMP
>   	def_bool y
>   
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 78224aa76409..d0258e92a8af 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -102,6 +102,7 @@ config X86
>   	select ARCH_HAS_DEBUG_WX
>   	select ARCH_HAS_ZONE_DMA_SET if EXPERT
>   	select ARCH_HAVE_NMI_SAFE_CMPXCHG
> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
>   	select ARCH_MIGHT_HAVE_PC_PARPORT
>   	select ARCH_MIGHT_HAVE_PC_SERIO
> @@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
>   	def_bool y
>   	depends on ARCH_ENABLE_MEMORY_HOTPLUG
>   
> -config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> -	def_bool y
> -
>   menu "Power management and ACPI options"
>   
>   config ARCH_HIBERNATION_HEADER
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 923bd35f81f2..2f9d28fee75d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -570,6 +570,9 @@ config MHP_MEMMAP_ON_MEMORY
>   	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>   	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   
> +config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> +       bool
> +
>   endif # MEMORY_HOTPLUG
>   
>   # Heavily threaded applications may benefit from splitting the mm-wide

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 5/5] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-06  8:50   ` Aneesh Kumar K.V
@ 2023-07-06  9:07     ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06  9:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 10:50, Aneesh Kumar K.V wrote:
> Radix vmemmap mapping can map things correctly at the PMD level or PTE
> level based on different device boundary checks. We also use altmap.reserve
> feature to align things correctly at pageblock granularity. We can end up
> losing some pages in memory with this. For ex: with 256MB memory block
> size, we require 4 pages to map vmemmap pages, In order to align things
> correctly we end up adding a reserve of 28 pages. ie, for every 4096 pages
> 28 pages get reserved.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/Kconfig                          |  1 +
>   arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
>   .../platforms/pseries/hotplug-memory.c        |  4 ++-
>   3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 116d6add0bb0..f890907e5bbf 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -157,6 +157,7 @@ config PPC
>   	select ARCH_HAS_UBSAN_SANITIZE_ALL
>   	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   	select ARCH_KEEP_MEMBLOCK
> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
>   	select ARCH_MIGHT_HAVE_PC_PARPORT
>   	select ARCH_MIGHT_HAVE_PC_SERIO
>   	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index a62729f70f2a..c0bd60b5fb64 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1678,3 +1678,31 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>   
>   	return 1;
>   }
> +
> +/*
> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
> + * some of the restrictions. We don't check for PMD_SIZE because our
> + * vmemmap allocation code can fallback correctly. The pageblock

x86 also has the fallback IIRC; the concern is rather that you end up 
with a PTE-mapped vmemmap, which is slower at runtime than a PMD-mapped 
vmemmap.

> + * alignment requirement is met using altmap->reserve blocks.
> + */
> +bool mhp_supports_memmap_on_memory(unsigned long size)
> +{
> +	if (!radix_enabled())
> +		return false;
> +	/*
> +	 * The pageblock alignment requirement is met by using
> +	 * reserve blocks in altmap.
> +	 */
> +	return size == memory_block_size_bytes();
> +}

I really dislike such arch overrides.

I think the flow should be something like that, having a generic:

bool mhp_supports_memmap_on_memory(unsigned long size)
{
	...
	return arch_mhp_supports_memmap_on_memory(size) &&
	       size == memory_block_size_bytes() &&
	       ...
}

where we'll also keep the pageblock check here.

And a ppc specific

bool arch_mhp_supports_memmap_on_memory(unsigned long size)
{
	/*
          * Don't check for the vmemmap covering PMD_SIZE, we accept that
          * we might get a PTE-mapped vmemmap.
          */
	return radix_enabled();
}

We can then move the PMD_SIZE check into arch-specific code (x86 and aarch64).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread
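
Taking the suggestion above together with the checks currently in mm/memory_hotplug.c, the resulting split could look roughly like the sketch below. The arch_mhp_supports_memmap_on_memory() name and the exact placement of the checks are assumptions based on this thread, not an actual implementation.

/* mm/memory_hotplug.c (sketch): generic checks stay here */
bool mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
	unsigned long remaining_size = size - vmemmap_size;

	return mhp_memmap_on_memory() &&
	       arch_mhp_supports_memmap_on_memory(size) &&
	       size == memory_block_size_bytes() &&
	       IS_ALIGNED(remaining_size, pageblock_nr_pages << PAGE_SHIFT);
}

/* x86/arm64 (sketch): keep requiring a PMD-mappable vmemmap */
bool arch_mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size / PAGE_SIZE) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}

/* ppc64 (sketch): accept a PTE-mapped vmemmap with radix */
bool arch_mhp_supports_memmap_on_memory(unsigned long size)
{
	return radix_enabled();
}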

* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06  8:50   ` Aneesh Kumar K.V
@ 2023-07-06  9:18     ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06  9:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 10:50, Aneesh Kumar K.V wrote:
> With memmap on memory, some architecture needs more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory.

Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?

IOW, why can't ppc64 simply allocate the vmemmap from the start of the 
memblock (-> base_pfn) and use the stored number of vmemmap pages to 
calculate the end_pfn?

To rephrase: if the vmemmap is not at the beginning and doesn't cover 
full pageblocks, memory onlining/offlining would be broken.

[...]

>   
> +/**
> + * struct vmem_altmap - pre-allocated storage for vmemmap_populate
> + * @base_pfn: base of the entire dev_pagemap mapping
> + * @reserve: pages mapped, but reserved for driver use (relative to @base)
> + * @free: free pages set aside in the mapping for memmap storage
> + * @align: pages reserved to meet allocation alignments
> + * @alloc: track pages consumed, private to vmemmap_populate()
> + */
> +struct vmem_altmap {
> +	unsigned long base_pfn;
> +	const unsigned long end_pfn;
> +	const unsigned long reserve;
> +	unsigned long free;
> +	unsigned long align;
> +	unsigned long alloc;
> +};

Instead of embedding that, what about conditionally allocating it and 
store a pointer to it in the "struct memory_block"?

In the general case as of today, we don't have an altmap.

> +
>   struct memory_block {
>   	unsigned long start_section_nr;
>   	unsigned long state;		/* serialized by the dev->lock */
> @@ -77,11 +94,7 @@ struct memory_block {
>   	 */
>   	struct zone *zone;
>   	struct device dev;
> -	/*
> -	 * Number of vmemmap pages. These pages
> -	 * lay at the beginning of the memory block.
> -	 */
> -	unsigned long nr_vmemmap_pages;
> +	struct vmem_altmap altmap;
>   	struct memory_group *group;	/* group (if any) for this block */
>   	struct list_head group_next;	/* next block inside memory group */
>   #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> @@ -147,7 +160,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
>   extern int register_memory_notifier(struct notifier_block *nb);
>   extern void unregister_memory_notifier(struct notifier_block *nb);
>   int create_memory_block_devices(unsigned long start, unsigned long size,

[...]

>   static int check_cpu_on_node(int nid)
> @@ -2036,9 +2042,8 @@ EXPORT_SYMBOL(try_offline_node);
>   
>   static int __ref try_remove_memory(u64 start, u64 size)
>   {
> -	struct vmem_altmap mhp_altmap = {};
> +	int ret;
>   	struct vmem_altmap *altmap = NULL;
> -	unsigned long nr_vmemmap_pages;
>   	int rc = 0, nid = NUMA_NO_NODE;
>   
>   	BUG_ON(check_hotplug_memory_range(start, size));
> @@ -2060,24 +2065,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
>   	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
>   	 * the same granularity it was added - a single memory block.
>   	 */
> +

^ unrelated change?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/5] mm/hotplug: Allow architecture override for memmap on memory feature
  2023-07-06  8:50   ` Aneesh Kumar K.V
@ 2023-07-06  9:19     ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06  9:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 10:50, Aneesh Kumar K.V wrote:
> Some architectures like ppc64 want to enable this feature only with radix
> translation, and their vmemmap mappings have different alignment
> requirements. Add an override for mhp_supports_memmap_on_memory() and also
> use the altmap.reserve feature to adjust the pageblock alignment requirement.
> 
> The patch also falls back to allocating the memmap outside the memblock if the
> alignment rules are not met for the memmap on memory allocation. This allows
> the feature to be used more widely, allocating the memmap as much as possible
> within the memory block being added.
> 
> A follow-up patch to enable memmap on memory for ppc64 will use this.

See my other reply. I'd rather have another arch check than overriding it
completely.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/5] mm/hotplug: Simplify the handling of MHP_MEMMAP_ON_MEMORY flag
  2023-07-06  8:50   ` Aneesh Kumar K.V
@ 2023-07-06  9:24     ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06  9:24 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 10:50, Aneesh Kumar K.V wrote:
> Instead of checking for memmap on memory feature enablement within the
> functions checking for alignment, use the kernel parameter to control the
> memory hotplug flags. The generic kernel now enables the memmap on memory
> feature if the hotplug flag requests it.
> 
> The ACPI code can now pass the flag unconditionally because the kernel will
> fall back to not using the feature if the alignment rules are not met.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   drivers/acpi/acpi_memhotplug.c |  3 +--
>   include/linux/memory_hotplug.h | 14 ++++++++++++++
>   mm/memory_hotplug.c            | 35 +++++++++++-----------------------
>   3 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 24f662d8bd39..4d0096fc4cc2 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>   		if (!info->length)
>   			continue;
>   
> -		if (mhp_supports_memmap_on_memory(info->length))
> -			mhp_flags |= MHP_MEMMAP_ON_MEMORY;
> +		mhp_flags |= get_memmap_on_memory_flags();
>   		result = __add_memory(mgid, info->start_addr, info->length,
>   				      mhp_flags);
>   
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index a769f44b8368..af7017122506 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -358,4 +358,18 @@ bool mhp_supports_memmap_on_memory(unsigned long size);
>   bool __mhp_supports_memmap_on_memory(unsigned long size);
>   #endif /* CONFIG_MEMORY_HOTPLUG */
>   
> +#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
> +extern bool memmap_on_memory;
> +static inline unsigned long get_memmap_on_memory_flags(void)
> +{
> +	if (memmap_on_memory)
> +		return MHP_MEMMAP_ON_MEMORY;
> +	return 0;
> +}
> +#else
> +static inline unsigned long get_memmap_on_memory_flags(void)
> +{
> +	return 0;
> +}
> +#endif

That's kind-of ugly TBH.


Why do we need this change?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 5/5] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-06  9:07     ` David Hildenbrand
@ 2023-07-06  9:27       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-06  9:27 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/6/23 2:37 PM, David Hildenbrand wrote:
> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>> level based on different device boundary checks. We also use altmap.reserve
>> feature to align things correctly at pageblock granularity. We can end up
>> losing some pages in memory with this. For ex: with 256MB memory block
>> size, we require 4 pages to map vmemmap pages, In order to align things
>> correctly we end up adding a reserve of 28 pages. ie, for every 4096 pages
>> 28 pages get reserved.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/Kconfig                          |  1 +
>>   arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
>>   .../platforms/pseries/hotplug-memory.c        |  4 ++-
>>   3 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 116d6add0bb0..f890907e5bbf 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -157,6 +157,7 @@ config PPC
>>       select ARCH_HAS_UBSAN_SANITIZE_ALL
>>       select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>       select ARCH_KEEP_MEMBLOCK
>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>       select ARCH_MIGHT_HAVE_PC_PARPORT
>>       select ARCH_MIGHT_HAVE_PC_SERIO
>>       select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index a62729f70f2a..c0bd60b5fb64 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -1678,3 +1678,31 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>>         return 1;
>>   }
>> +
>> +/*
>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>> + * some of the restrictions. We don't check for PMD_SIZE because our
>> + * vmemmap allocation code can fallback correctly. The pageblock
> 
> x86 also has the fallback IIRC; the concern is rather that you end up with a PTE-mapped vmemmap, which is slower at runtime than a PMD-mapped vmemmap.


On power, the memory block size depends on a hypervisor config option, and we can have values ranging from 16MB to 512MB.
With these different memory block sizes, I was thinking that relaxing these checks makes the feature more useful. Also, with a page size
of 64K, using a 2M vmemmap mapping requires a larger memory block size.

> 
>> + * alignment requirement is met using altmap->reserve blocks.
>> + */
>> +bool mhp_supports_memmap_on_memory(unsigned long size)
>> +{
>> +    if (!radix_enabled())
>> +        return false;
>> +    /*
>> +     * The pageblock alignment requirement is met by using
>> +     * reserve blocks in altmap.
>> +     */
>> +    return size == memory_block_size_bytes();
>> +}
> 
> I really dislike such arch overrides.
> 
> I think the flow should be something like that, having a generic:
> 
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
>     ...
>     return arch_mhp_supports_memmap_on_memory(size)) &&
>            size == memory_block_size_bytes() &&
>            ...
> }
> 

Sure we can do that. But for ppc64 we also want to skip the PMD_SIZE and pageblock
alignment restrictions. 

> where we'll also keep the pageblock check here.

For ppc64, I converted that pageblock rule into a reserve-blocks allocation with the altmap.
I am not sure whether we want to do that outside ppc64?

> 
> And a ppc specific
> 
> bool arch_mhp_supports_memmap_on_memory(unsigned long size)
> {
>     /*
>          * Don't check for the vmemmap covering PMD_SIZE, we accept that
>          * we might get a PTE-mapped vmemmap.
>          */
>     return radix_enabled();
> }
> 
> We can then move the PMD_SIZE check into arch specific code (x86-aarch64).
> 

Sure, will rework the changes accordingly.

-aneesh


^ permalink raw reply	[flat|nested] 48+ messages in thread
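
As a rough back-of-the-envelope check of that last point (assuming a 64-byte struct page and the 2M PMD mapping mentioned above):

    vmemmap size for a block of size S with 64K pages
        = (S / 64K) * 64 bytes = S / 1024

    vmemmap size >= PMD_SIZE (2MB)  =>  S >= 2MB * 1024 = 2GB

so with 64K base pages a PMD-sized vmemmap only materialises for memory blocks of 2GB and up, well above the 16MB-512MB range mentioned above, which is why the PMD_SIZE check is of little use on ppc64 with 64K pages.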

* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06  9:18     ` David Hildenbrand
@ 2023-07-06  9:36       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-06  9:36 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/6/23 2:48 PM, David Hildenbrand wrote:
> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>> With memmap on memory, some architecture needs more details w.r.t altmap
>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
> 
> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
> 
> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
> 
> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
> 
> [...]


With ppc64, a 64K page size and the different memory block sizes, we can end up allocating vmemmap backing memory from outside the altmap,
because a single vmemmap page can cover 1024 pages (64 * 1024 / sizeof(struct page)) and can therefore point to pages outside the dev_pagemap range.
So on free we check

vmemmap_free() {
...
	if (altmap) {
		alt_start = altmap->base_pfn;
		alt_end = altmap->base_pfn + altmap->reserve +
			  altmap->free + altmap->alloc + altmap->align;
	}

...
		if (base_pfn >= alt_start && base_pfn < alt_end) {
			vmem_altmap_free(altmap, nr_pages);

to see whether we did use altmap for the vmemmap allocation. 

> 
>>   +/**
>> + * struct vmem_altmap - pre-allocated storage for vmemmap_populate
>> + * @base_pfn: base of the entire dev_pagemap mapping
>> + * @reserve: pages mapped, but reserved for driver use (relative to @base)
>> + * @free: free pages set aside in the mapping for memmap storage
>> + * @align: pages reserved to meet allocation alignments
>> + * @alloc: track pages consumed, private to vmemmap_populate()
>> + */
>> +struct vmem_altmap {
>> +    unsigned long base_pfn;
>> +    const unsigned long end_pfn;
>> +    const unsigned long reserve;
>> +    unsigned long free;
>> +    unsigned long align;
>> +    unsigned long alloc;
>> +};
> 
> Instead of embedding that, what about conditionally allocating it and store a pointer to it in the "struct memory_block"?
> 
> In the general case as of today, we don't have an altmap.
> 

Sure, but with the memmap on memory option it is essentially adding that, right? Is the concern related to the increase in the size of
struct memory_block?

>> +
>>   struct memory_block {
>>       unsigned long start_section_nr;
>>       unsigned long state;        /* serialized by the dev->lock */
>> @@ -77,11 +94,7 @@ struct memory_block {
>>        */
>>       struct zone *zone;
>>       struct device dev;
>> -    /*
>> -     * Number of vmemmap pages. These pages
>> -     * lay at the beginning of the memory block.
>> -     */
>> -    unsigned long nr_vmemmap_pages;
>> +    struct vmem_altmap altmap;
>>       struct memory_group *group;    /* group (if any) for this block */
>>       struct list_head group_next;    /* next block inside memory group */
>>   #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
>> @@ -147,7 +160,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
>>   extern int register_memory_notifier(struct notifier_block *nb);
>>   extern void unregister_memory_notifier(struct notifier_block *nb);
>>   int create_memory_block_devices(unsigned long start, unsigned long size,
> 
> [...]
> 
>>   static int check_cpu_on_node(int nid)
>> @@ -2036,9 +2042,8 @@ EXPORT_SYMBOL(try_offline_node);
>>     static int __ref try_remove_memory(u64 start, u64 size)
>>   {
>> -    struct vmem_altmap mhp_altmap = {};
>> +    int ret;
>>       struct vmem_altmap *altmap = NULL;
>> -    unsigned long nr_vmemmap_pages;
>>       int rc = 0, nid = NUMA_NO_NODE;
>>         BUG_ON(check_hotplug_memory_range(start, size));
>> @@ -2060,24 +2065,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>        * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
>>        * the same granularity it was added - a single memory block.
>>        */
>> +
> 
> ^ unrealted change?
> 



^ permalink raw reply	[flat|nested] 48+ messages in thread
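
To put rough numbers on that (64K base pages, a 64-byte struct page, and the block sizes discussed elsewhere in this thread; the figures are illustrative only):

    one 64K vmemmap page holds 64K / 64 = 1024 struct pages,
    i.e. it describes 1024 * 64K = 64MB of memory

    256MB block: 4096 struct pages = 256KB of vmemmap = 4 full 64K pages,
                 which can be taken entirely from the block's altmap
     16MB block: 256 struct pages = 16KB of vmemmap, a fraction of one
                 64K page; that vmemmap page also covers neighbouring
                 blocks and may already have been allocated from regular
                 memory, outside this block's altmap

which is why vmemmap_free() has to check the base_pfn/end_pfn range before handing pages back to the altmap.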

* Re: [PATCH v2 3/5] mm/hotplug: Simplify the handling of MHP_MEMMAP_ON_MEMORY flag
  2023-07-06  9:24     ` David Hildenbrand
@ 2023-07-06 10:04       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-06 10:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/6/23 2:54 PM, David Hildenbrand wrote:
> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>> Instead of checking for memmap on memory feature enablement within the
>> functions checking for alignment, use the kernel parameter to control the
>> memory hotplug flags. The generic kernel now enables memmap on memory
>> feature if the hotplug flag request for the same.
>>
>> The ACPI code now can pass the flag unconditionally because the kernel will
>> fallback to not using the feature if the alignment rules are not met.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   drivers/acpi/acpi_memhotplug.c |  3 +--
>>   include/linux/memory_hotplug.h | 14 ++++++++++++++
>>   mm/memory_hotplug.c            | 35 +++++++++++-----------------------
>>   3 files changed, 26 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 24f662d8bd39..4d0096fc4cc2 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>           if (!info->length)
>>               continue;
>>   -        if (mhp_supports_memmap_on_memory(info->length))
>> -            mhp_flags |= MHP_MEMMAP_ON_MEMORY;
>> +        mhp_flags |= get_memmap_on_memory_flags();
>>           result = __add_memory(mgid, info->start_addr, info->length,
>>                         mhp_flags);
>>   diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index a769f44b8368..af7017122506 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -358,4 +358,18 @@ bool mhp_supports_memmap_on_memory(unsigned long size);
>>   bool __mhp_supports_memmap_on_memory(unsigned long size);
>>   #endif /* CONFIG_MEMORY_HOTPLUG */
>>   +#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>> +extern bool memmap_on_memory;
>> +static inline unsigned long get_memmap_on_memory_flags(void)
>> +{
>> +    if (memmap_on_memory)
>> +        return MHP_MEMMAP_ON_MEMORY;
>> +    return 0;
>> +}
>> +#else
>> +static inline unsigned long get_memmap_on_memory_flags(void)
>> +{
>> +    return 0;
>> +}
>> +#endif
> 
> That's kind-of ugly TBH.
> 
> 
> Why do we need this change?
> 

I was trying to avoid the rest of the kernel doing 

if (mhp_supports_memmap_on_memory(info->length))
         mhp_flags |= MHP_MEMMAP_ON_MEMORY;

I was also following a pattern similar to 

static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
{
	return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
}


The overall goal of the patch is to handle the fallback of not using the altmap/memmap on memory
inside add_memory_resource(), so that the call sites only express the desire to use memmap on memory
if possible/enabled.

-aneesh


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06  9:36       ` Aneesh Kumar K V
@ 2023-07-06 11:14         ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06 11:14 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 11:36, Aneesh Kumar K V wrote:
> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>
>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>
>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>
>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>
>> [...]
> 
> 
> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
> So on free we  check

So you end up with a mixture of altmap and ordinarily-allocated vmemmap 
pages? That sounds wrong (and is counter-intuitive to the feature in 
general, where we *don't* want to allocate the vmemmap from outside the 
altmap).

(64 * 1024) / sizeof(struct page) -> 1024 pages

1024 pages * 64k = 64 MiB.

What's the memory block size on these systems? If it's >= 64 MiB the 
vmemmap of a single memory block fits into a single page and we should 
be fine.

Smells like you want to disable the feature on a 64k system.

> 
> vmemmap_free() {
> ...
> 	if (altmap) {
> 		alt_start = altmap->base_pfn;
> 		alt_end = altmap->base_pfn + altmap->reserve +
> 			  altmap->free + altmap->alloc + altmap->align;
> 	}
> 
> ...
> 		if (base_pfn >= alt_start && base_pfn < alt_end) {
> 			vmem_altmap_free(altmap, nr_pages);
> 
> to see whether we did use altmap for the vmemmap allocation.
> 
>>
>>>    +/**
>>> + * struct vmem_altmap - pre-allocated storage for vmemmap_populate
>>> + * @base_pfn: base of the entire dev_pagemap mapping
>>> + * @reserve: pages mapped, but reserved for driver use (relative to @base)
>>> + * @free: free pages set aside in the mapping for memmap storage
>>> + * @align: pages reserved to meet allocation alignments
>>> + * @alloc: track pages consumed, private to vmemmap_populate()
>>> + */
>>> +struct vmem_altmap {
>>> +    unsigned long base_pfn;
>>> +    const unsigned long end_pfn;
>>> +    const unsigned long reserve;
>>> +    unsigned long free;
>>> +    unsigned long align;
>>> +    unsigned long alloc;
>>> +};
>>
>> Instead of embedding that, what about conditionally allocating it and store a pointer to it in the "struct memory_block"?
>>
>> In the general case as of today, we don't have an altmap.
>>
> 
> Sure but with memmap on memory option it is essentially adding that right?.

At least on x86_64 and aarch64 only for 128 MiB DIMMs (and especially, 
not memory added by hv-balloon, virtio-mem, xen-balloon).

So in the general case it's not that frequently used. Maybe on ppc64 
once wired up.

> Is the concern related to the increase in the size of
> struct memory_block  ?

Partially. It looks cleaner to have !mem->altmap if there is no altmap.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 3/5] mm/hotplug: Simplify the handling of MHP_MEMMAP_ON_MEMORY flag
  2023-07-06 10:04       ` Aneesh Kumar K V
@ 2023-07-06 11:20         ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06 11:20 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 12:04, Aneesh Kumar K V wrote:
> On 7/6/23 2:54 PM, David Hildenbrand wrote:
>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>> Instead of checking for memmap on memory feature enablement within the
>>> functions checking for alignment, use the kernel parameter to control the
>>> memory hotplug flags. The generic kernel now enables memmap on memory
>>> feature if the hotplug flag request for the same.
>>>
>>> The ACPI code now can pass the flag unconditionally because the kernel will
>>> fallback to not using the feature if the alignment rules are not met.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>    drivers/acpi/acpi_memhotplug.c |  3 +--
>>>    include/linux/memory_hotplug.h | 14 ++++++++++++++
>>>    mm/memory_hotplug.c            | 35 +++++++++++-----------------------
>>>    3 files changed, 26 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>>> index 24f662d8bd39..4d0096fc4cc2 100644
>>> --- a/drivers/acpi/acpi_memhotplug.c
>>> +++ b/drivers/acpi/acpi_memhotplug.c
>>> @@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>>            if (!info->length)
>>>                continue;
>>>    -        if (mhp_supports_memmap_on_memory(info->length))
>>> -            mhp_flags |= MHP_MEMMAP_ON_MEMORY;
>>> +        mhp_flags |= get_memmap_on_memory_flags();
>>>            result = __add_memory(mgid, info->start_addr, info->length,
>>>                          mhp_flags);
>>>    diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>>> index a769f44b8368..af7017122506 100644
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -358,4 +358,18 @@ bool mhp_supports_memmap_on_memory(unsigned long size);
>>>    bool __mhp_supports_memmap_on_memory(unsigned long size);
>>>    #endif /* CONFIG_MEMORY_HOTPLUG */
>>>    +#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>>> +extern bool memmap_on_memory;
>>> +static inline unsigned long get_memmap_on_memory_flags(void)
>>> +{
>>> +    if (memmap_on_memory)
>>> +        return MHP_MEMMAP_ON_MEMORY;
>>> +    return 0;
>>> +}
>>> +#else
>>> +static inline unsigned long get_memmap_on_memory_flags(void)
>>> +{
>>> +    return 0;
>>> +}
>>> +#endif
>>
>> That's kind-of ugly TBH.
>>
>>
>> Why do we need this change?
>>
> 
> I was trying to avoid rest of the kernel doing
> 
> if (mhp_supports_memmap_on_memory(info->length))
>           mhp_flags |= MHP_MEMMAP_ON_MEMORY;

It would look much cleaner if you would simply have:

mhp_flags |= MHP_MEMMAP_ON_MEMORY;
result = __add_memory(mgid, info->start_addr, info->length, mhp_flags);

And modify the semantics of MHP_MEMMAP_ON_MEMORY to mean "allocate the 
memmap from hotplugged memory if supported and enabled globally".

Then, we can simply ignore the flag in __add_memory() if either the 
global toggle is off or if it's not supported for the range / by the arch.

Maybe, in the future we want more fine-grained control (as asked by 
dax/kmem) and maybe not have the global toggle. But for now, that should 
be good enough I guess.
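
(A rough, self-contained sketch of that semantic, with made-up flag values and stand-in helpers purely for illustration; it is not the actual mm/memory_hotplug.c code.)

#include <stdbool.h>
#include <stdio.h>

#define MHP_MEMMAP_ON_MEMORY	(1UL << 1)	/* illustrative value only */

static bool memmap_on_memory = true;		/* stand-in for the global toggle */

/* stand-in for the arch/range check (alignment rules etc.) */
static bool mhp_supports_memmap_on_memory(unsigned long size)
{
	return size == (128UL << 20);		/* pretend only 128 MiB blocks qualify */
}

/* callers always pass the flag; the core silently drops it when it cannot be honored */
static unsigned long sanitize_mhp_flags(unsigned long size, unsigned long flags)
{
	if ((flags & MHP_MEMMAP_ON_MEMORY) &&
	    (!memmap_on_memory || !mhp_supports_memmap_on_memory(size)))
		flags &= ~MHP_MEMMAP_ON_MEMORY;
	return flags;
}

int main(void)
{
	printf("128M: %#lx\n", sanitize_mhp_flags(128UL << 20, MHP_MEMMAP_ON_MEMORY));
	printf("256M: %#lx\n", sanitize_mhp_flags(256UL << 20, MHP_MEMMAP_ON_MEMORY));
	return 0;
}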

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06 11:14         ` David Hildenbrand
@ 2023-07-06 12:32           ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-06 12:32 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/6/23 4:44 PM, David Hildenbrand wrote:
> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>
>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>
>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>
>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>
>>> [...]
>>
>>
>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>> So on free we  check
> 
> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
> 
> (64 * 1024) / sizeof(struct page) -> 1024 pages
> 
> 1024 pages * 64k = 64 MiB.
> 
> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
> 
> Smells like you want to disable the feature on a 64k system.
>

But that part of vmemmap_free is common to dax, dax kmem and the new memmap on memory feature. i.e., ppc64 vmemmap_free has checks which require
a full altmap structure with all the details in it. So for memmap on memory to work on ppc64 we do require a similar altmap struct. Hence the idea
of adding vmemmap_altmap to struct memory_block.

 
>>
>> vmemmap_free() {
>> ...
>>     if (altmap) {
>>         alt_start = altmap->base_pfn;
>>         alt_end = altmap->base_pfn + altmap->reserve +
>>               altmap->free + altmap->alloc + altmap->align;
>>     }
>>
>> ...
>>         if (base_pfn >= alt_start && base_pfn < alt_end) {
>>             vmem_altmap_free(altmap, nr_pages);
>>
>> to see whether we did use altmap for the vmemmap allocation.
>>


-aneesh




^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06 12:32           ` Aneesh Kumar K V
@ 2023-07-06 12:59             ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-06 12:59 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 14:32, Aneesh Kumar K V wrote:
> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>
>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>
>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>
>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>
>>>> [...]
>>>
>>>
>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>> So on free we  check
>>
>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>
>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>
>> 1024 pages * 64k = 64 MiB.
>>
>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>
>> Smells like you want to disable the feature on a 64k system.
>>
> 
> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
> of adding vmemmap_altmap to  struct memory_block

I'd suggest making sure that for the memmap_on_memory case you really 
*always* allocate from the altmap (that's what the feature is about 
after all), and otherwise block the feature (i.e., arch_mhp_supports_... 
should reject it).

Then, you can reconstruct the altmap layout trivially

base_pfn: start of the range to unplug
end_pfn: base_pfn + nr_vmemmap_pages

and pass that to the removal code, which will do the right thing, no?
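
(For illustration, a standalone sketch of that reconstruction; the pfn values are hypothetical and the struct mirrors the vmem_altmap fields quoted earlier, with the const qualifiers dropped for brevity.)

#include <stdio.h>

struct vmem_altmap {
	unsigned long base_pfn;
	unsigned long end_pfn;
	unsigned long reserve;
	unsigned long free;
	unsigned long align;
	unsigned long alloc;
};

int main(void)
{
	unsigned long start_pfn = 0x40000;	/* hypothetical start of the range to unplug */
	unsigned long nr_vmemmap_pages = 4;	/* remembered at hotplug time */

	/* rebuild the altmap from the two values the memory block already knows */
	struct vmem_altmap altmap = {
		.base_pfn = start_pfn,
		.end_pfn  = start_pfn + nr_vmemmap_pages,
		.alloc    = nr_vmemmap_pages,	/* everything was allocated from it */
	};

	printf("altmap covers pfns [%#lx, %#lx)\n", altmap.base_pfn, altmap.end_pfn);
	return 0;
}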


Sure, remembering the altmap might be a potential cleanup (eventually?), 
but the basic reasoning why this is required as patch #1 IMHO is wrong: 
if you say you support memmap_on_memory for a configuration, then you 
should also properly support it (allocate from the hotplugged memory), 
not silently fall back to something else.


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06 12:59             ` David Hildenbrand
@ 2023-07-06 16:06               ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-06 16:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/6/23 6:29 PM, David Hildenbrand wrote:
> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>>
>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>
>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>
>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>>
>>>>> [...]
>>>>
>>>>
>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>>> So on free we  check
>>>
>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>
>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>
>>> 1024 pages * 64k = 64 MiB.
>>>
>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>
>>> Smells like you want to disable the feature on a 64k system.
>>>
>>
>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
>> of adding vmemmap_altmap to  struct memory_block
> 
> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>

Sure. How about?

bool mhp_supports_memmap_on_memory(unsigned long size)
{

	unsigned long nr_pages = size >> PAGE_SHIFT;
	unsigned long vmemmap_size = nr_pages * sizeof(struct page);

	if (!radix_enabled())
		return false;
	/*
	 * memmap on memory only supported with memory block size add/remove
	 */
	if (size != memory_block_size_bytes())
		return false;
	/*
	 * Also make sure the vmemmap allocation is fully contained
	 * so that we always allocate vmemmap memory from altmap area.
	 */
	if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
		return false;
	/*
	 * The pageblock alignment requirement is met by using
	 * reserve blocks in altmap.
	 */
	return true;
}
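
(As a quick sanity check of the IS_ALIGNED() condition above, a throwaway user-space sketch; the 64-byte sizeof(struct page) is an assumption.)

#include <stdio.h>

int main(void)
{
	unsigned long page_size   = 64 * 1024;		/* 64K base page */
	unsigned long struct_page = 64;			/* assumed sizeof(struct page) */
	unsigned long sizes_mb[]  = { 16, 32, 64, 256 };

	for (int i = 0; i < 4; i++) {
		unsigned long size = sizes_mb[i] << 20;
		unsigned long vmemmap_size = (size / page_size) * struct_page;

		printf("%4lu MiB block: vmemmap %4lu KiB -> %s\n",
		       sizes_mb[i], vmemmap_size >> 10,
		       (vmemmap_size % page_size) ? "rejected" : "fully contained");
	}
	return 0;
}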



 
> Then, you can reconstruct the altmap layout trivially
> 
> base_pfn: start of the range to unplug
> end_pfn: base_pfn + nr_vmemmap_pages
> 
> and pass that to the removal code, which will do the right thing, no?
> 
> 
> Sure, remembering the altmap might be a potential cleanup (eventually?), but the basic reasoning why this is required as patch #1 IMHO is wrong: if you say you support memmap_on_memory for a configuration, then you should also properly support it (allocate from the hotplugged memory), not silently fall back to something else.

I guess you want to keep the altmap introduction as a later patch in the series and not as the preparatory patch? Or are you ok with just adding the additional check I mentioned above w.r.t. the size value and keeping this patch as patch 1, as a generic cleanup (avoiding
the recomputation of altmap->alloc/base_pfn/end_pfn)?

-aneesh
 


^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-06 16:06               ` Aneesh Kumar K V
@ 2023-07-07 12:17                 ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-07 12:17 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 06.07.23 18:06, Aneesh Kumar K V wrote:
> On 7/6/23 6:29 PM, David Hildenbrand wrote:
>> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>>>
>>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>>
>>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>>
>>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>>>
>>>>>> [...]
>>>>>
>>>>>
>>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>>>> So on free we  check
>>>>
>>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>>
>>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>>
>>>> 1024 pages * 64k = 64 MiB.
>>>>
>>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>>
>>>> Smells like you want to disable the feature on a 64k system.
>>>>
>>>
>>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
>>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
>>> of adding vmemmap_altmap to  struct memory_block
>>
>> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>>
> 
> Sure. How about?
> 
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
> 
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
> 
> 	if (!radix_enabled())
> 		return false;
> 	/*
> 	 * memmap on memory only supported with memory block size add/remove
> 	 */
> 	if (size != memory_block_size_bytes())
> 		return false;
> 	/*
> 	 * Also make sure the vmemmap allocation is fully contianed
> 	 * so that we always allocate vmemmap memory from altmap area.
> 	 */
> 	if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
> 		return false;
> 	/*
> 	 * The pageblock alignment requirement is met by using
> 	 * reserve blocks in altmap.
> 	 */
> 	return true;
> }

Better, but the PAGE_SIZE check could be added to common code as well.

... but, the pageblock check in common code implies a PAGE_SIZE check, 
so why do we need any other check besides the radix_enabled() check for 
ppc64, and why not just keep all the other checks in common code as they are?

If your vmemmap does not cover full pageblocks (which implies full 
pages), the feature cannot be used *unless* we'd waste altmap space in 
the vmemmap to cover one pageblock.

Wasting hotplugged memory certainly sounds wrong?


So I'd appreciate it if you could explain why the pageblock check should not 
apply to ppc64?

> 
> 
> 
>   
>> Then, you can reconstruct the altmap layout trivially
>>
>> base_pfn: start of the range to unplug
>> end_pfn: base_pfn + nr_vmemmap_pages
>>
>> and pass that to the removal code, which will do the right thing, no?
>>
>>
>> Sure, remembering the altmap might be a potential cleanup (eventually?), but the basic reasoning why this is required as patch #1 IMHO is wrong: if you say you support memmap_on_memory for a configuration, then you should also properly support it (allocate from the hotplugged memory), not silently fall back to something else.
> 
> I guess you want to keep the altmap introduction as a later patch in the series and not the preparatory patch? Or are you ok with just adding the additional check I mentioned above w.r.t size value and keep this patch as patch 1  as a generic cleanup (avoiding
> the recomputation of altmap->alloc/base_pfn/end_pfn?

Yes, if it's not required, better to remove it completely from this 
patchset. We can later discuss whether keeping the altmap around is actually 
a cleanup or rather unnecessary.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-07 12:17                 ` David Hildenbrand
@ 2023-07-07 13:30                   ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-07 13:30 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/7/23 5:47 PM, David Hildenbrand wrote:
> On 06.07.23 18:06, Aneesh Kumar K V wrote:
>> On 7/6/23 6:29 PM, David Hildenbrand wrote:
>>> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>>>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>>>>
>>>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>>>
>>>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>>>
>>>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>>>>
>>>>>>> [...]
>>>>>>
>>>>>>
>>>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>>>>> So on free we  check
>>>>>
>>>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>>>
>>>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>>>
>>>>> 1024 pages * 64k = 64 MiB.
>>>>>
>>>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>>>
>>>>> Smells like you want to disable the feature on a 64k system.
>>>>>
>>>>
>>>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
>>>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
>>>> of adding vmemmap_altmap to  struct memory_block
>>>
>>> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>>>
>>
>> Sure. How about?
>>
>> bool mhp_supports_memmap_on_memory(unsigned long size)
>> {
>>
>>     unsigned long nr_pages = size >> PAGE_SHIFT;
>>     unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>
>>     if (!radix_enabled())
>>         return false;
>>     /*
>>      * memmap on memory only supported with memory block size add/remove
>>      */
>>     if (size != memory_block_size_bytes())
>>         return false;
>>     /*
>>      * Also make sure the vmemmap allocation is fully contianed
>>      * so that we always allocate vmemmap memory from altmap area.
>>      */
>>     if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
>>         return false;
>>     /*
>>      * The pageblock alignment requirement is met by using
>>      * reserve blocks in altmap.
>>      */
>>     return true;
>> }
> 
> Better, but the PAGE_SIZE that could be added to common code as well.
> 
> ... but, the pageblock check in common code implies a PAGE_SIZE check, so why do we need any other check besides the radix_enabled() check for arm64 and just keep all the other checks in common code as they are?
> 
> If your vmemmap does not cover full pageblocks (which implies full pages), the feature cannot be used *unless* we'd waste altmap space in the vmemmap to cover one pageblock.
> 
> Wasting hotplugged memory certainly sounds wrong?
> 
> 
> So I appreciate if you could explain why the pageblock check should not be had for ppc64?
> 

If we want things to be aligned to the pageblock size (2M) we will have to use 2M of vmemmap space, and that implies a memory block of 2G with a 64K page size. That requirement makes the feature not useful at all
on power. The compromise I came to was what I mentioned in the commit message for enabling the feature on ppc64.

We use the altmap.reserve feature to align things correctly at pageblock granularity. We can end up losing some pages of memory with this. For example: with a 256MB memory block
size, we require 4 pages to map the vmemmap pages; in order to align things correctly we end up adding a reserve of 28 pages. i.e., for every 4096 pages,
28 pages get reserved.
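
(The same numbers as a standalone sketch; the 2M pageblock and the 64-byte sizeof(struct page) are assumptions taken from the discussion above.)

#include <stdio.h>

int main(void)
{
	unsigned long page_size   = 64 * 1024;		/* 64K base page */
	unsigned long block_size  = 256UL << 20;	/* 256 MiB memory block */
	unsigned long pageblock   = 2UL << 20;		/* assumed 2 MiB pageblock */
	unsigned long struct_page = 64;			/* assumed sizeof(struct page) */

	unsigned long nr_pages      = block_size / page_size;		/* 4096 */
	unsigned long vmemmap_bytes = nr_pages * struct_page;		/* 256 KiB */
	unsigned long vmemmap_pages = vmemmap_bytes / page_size;	/* 4 */
	unsigned long aligned_pages = pageblock / page_size;		/* 32 */

	printf("vmemmap pages: %lu, extra reserve pages: %lu\n",
	       vmemmap_pages, aligned_pages - vmemmap_pages);		/* 4, 28 */
	return 0;
}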


-aneesh






^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-07 13:30                   ` Aneesh Kumar K V
@ 2023-07-07 15:42                     ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-07 15:42 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 07.07.23 15:30, Aneesh Kumar K V wrote:
> On 7/7/23 5:47 PM, David Hildenbrand wrote:
>> On 06.07.23 18:06, Aneesh Kumar K V wrote:
>>> On 7/6/23 6:29 PM, David Hildenbrand wrote:
>>>> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>>>>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>>>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>>>>>
>>>>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>>>>
>>>>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>>>>
>>>>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>>>>>
>>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>>>>>> So on free we  check
>>>>>>
>>>>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>>>>
>>>>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>>>>
>>>>>> 1024 pages * 64k = 64 MiB.
>>>>>>
>>>>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>>>>
>>>>>> Smells like you want to disable the feature on a 64k system.
>>>>>>
>>>>>
>>>>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
>>>>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
>>>>> of adding vmemmap_altmap to  struct memory_block
>>>>
>>>> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>>>>
>>>
>>> Sure. How about?
>>>
>>> bool mhp_supports_memmap_on_memory(unsigned long size)
>>> {
>>>
>>>      unsigned long nr_pages = size >> PAGE_SHIFT;
>>>      unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>>
>>>      if (!radix_enabled())
>>>          return false;
>>>      /*
>>>       * memmap on memory only supported with memory block size add/remove
>>>       */
>>>      if (size != memory_block_size_bytes())
>>>          return false;
>>>      /*
>>>       * Also make sure the vmemmap allocation is fully contianed
>>>       * so that we always allocate vmemmap memory from altmap area.
>>>       */
>>>      if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
>>>          return false;
>>>      /*
>>>       * The pageblock alignment requirement is met by using
>>>       * reserve blocks in altmap.
>>>       */
>>>      return true;
>>> }
>>
>> Better, but the PAGE_SIZE that could be added to common code as well.
>>
>> ... but, the pageblock check in common code implies a PAGE_SIZE check, so why do we need any other check besides the radix_enabled() check for arm64 and just keep all the other checks in common code as they are?
>>
>> If your vmemmap does not cover full pageblocks (which implies full pages), the feature cannot be used *unless* we'd waste altmap space in the vmemmap to cover one pageblock.
>>
>> Wasting hotplugged memory certainly sounds wrong?
>>
>>
>> So I appreciate if you could explain why the pageblock check should not be had for ppc64?
>>
> 
> If we want things to be aligned to a pageblock (2M) we will have to use 2M of vmemmap space, and that implies a memory block of 2G with a 64K page size. That requirement makes the feature not useful at all
> on power. The compromise I came to is what I mentioned in the commit message for enabling the feature on ppc64.

As we'll always handle a 2M pageblock, you'll end up wasting memory.

Assume a 64MiB memory block:

With 64k: 1024 pages -> 64k vmemmap, almost 2 MiB wasted. ~3.1 %
With 4k: 16384 pages -> 1 MiB vmemmap, 1 MiB wasted. ~1.5%

It gets worse with smaller memory block sizes.


> 
> We use the altmap.reserve feature to align things correctly at pageblock granularity. We can end up losing some pages in memory with this. For example, with a 256MB memory block
> size we require 4 pages to map the vmemmap pages; in order to align things correctly we end up adding a reserve of 28 pages, i.e. for every 4096 pages,
> 28 pages get reserved.
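
(For reference, a back-of-the-envelope check of those numbers as a standalone C sketch, not kernel code; the 64K base page, 64-byte struct page and 2M pageblock values are assumptions taken from the discussion above:)

#include <stdio.h>

int main(void)
{
	unsigned long page_size   = 64UL << 10;		/* 64K base page */
	unsigned long block_size  = 256UL << 20;	/* 256MB memory block */
	unsigned long pageblock   = 2UL << 20;		/* 2M pageblock */
	unsigned long struct_page = 64;			/* assumed sizeof(struct page) */

	unsigned long nr_pages      = block_size / page_size;			/* 4096 */
	unsigned long vmemmap_pages = nr_pages * struct_page / page_size;	/* 4 */
	unsigned long pageblock_pgs = pageblock / page_size;			/* 32 */
	unsigned long aligned       = (vmemmap_pages + pageblock_pgs - 1) /
				      pageblock_pgs * pageblock_pgs;		/* 32 */

	/* prints: vmemmap=4 reserve=28 per 4096 pages */
	printf("vmemmap=%lu reserve=%lu per %lu pages\n",
	       vmemmap_pages, aligned - vmemmap_pages, nr_pages);
	return 0;
}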


You can simply align nr_vmemmap_pages up to pageblocks in the
memory hotplug code (e.g., depending on a config/arch knob for whether
wasting memory is supported).

Because the pageblock granularity is a memory onlining/offlining 
limitation and should be checked+handled exactly there.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-07 15:42                     ` David Hildenbrand
@ 2023-07-07 16:25                       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 48+ messages in thread
From: Aneesh Kumar K V @ 2023-07-07 16:25 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/7/23 9:12 PM, David Hildenbrand wrote:
> On 07.07.23 15:30, Aneesh Kumar K V wrote:
>> On 7/7/23 5:47 PM, David Hildenbrand wrote:
>>> On 06.07.23 18:06, Aneesh Kumar K V wrote:
>>>> On 7/6/23 6:29 PM, David Hildenbrand wrote:
>>>>> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>>>>>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>>>>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>>>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>>>>>>
>>>>>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>>>>>
>>>>>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>>>>>
>>>>>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>
>>>>>>>>
>>>>>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>>>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>>>>>>> So on free we  check
>>>>>>>
>>>>>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>>>>>
>>>>>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>>>>>
>>>>>>> 1024 pages * 64k = 64 MiB.
>>>>>>>
>>>>>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>>>>>
>>>>>>> Smells like you want to disable the feature on a 64k system.
>>>>>>>
>>>>>>
>>>>>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
>>>>>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
>>>>>> of adding vmemmap_altmap to  struct memory_block
>>>>>
>>>>> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>>>>>
>>>>
>>>> Sure. How about?
>>>>
>>>> bool mhp_supports_memmap_on_memory(unsigned long size)
>>>> {
>>>>
>>>>      unsigned long nr_pages = size >> PAGE_SHIFT;
>>>>      unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>>>
>>>>      if (!radix_enabled())
>>>>          return false;
>>>>      /*
>>>>       * memmap on memory only supported with memory block size add/remove
>>>>       */
>>>>      if (size != memory_block_size_bytes())
>>>>          return false;
>>>>      /*
>>>>       * Also make sure the vmemmap allocation is fully contianed
>>>>       * so that we always allocate vmemmap memory from altmap area.
>>>>       */
>>>>      if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
>>>>          return false;
>>>>      /*
>>>>       * The pageblock alignment requirement is met by using
>>>>       * reserve blocks in altmap.
>>>>       */
>>>>      return true;
>>>> }
>>>
>>> Better, but the PAGE_SIZE that could be added to common code as well.
>>>
>>> ... but, the pageblock check in common code implies a PAGE_SIZE check, so why do we need any other check besides the radix_enabled() check for arm64 and just keep all the other checks in common code as they are?
>>>
>>> If your vmemmap does not cover full pageblocks (which implies full pages), the feature cannot be used *unless* we'd waste altmap space in the vmemmap to cover one pageblock.
>>>
>>> Wasting hotplugged memory certainly sounds wrong?
>>>
>>>
>>> So I appreciate if you could explain why the pageblock check should not be had for ppc64?
>>>
>>
>> If we want things to be aligned to pageblock (2M) we will have to use 2M vmemmap space and that implies a memory block of 2G with 64K page size. That requirements makes the feature not useful at all
>> on power. The compromise i came to was what i mentioned in the commit message for enabling the feature on ppc64.
> 
> As we'll always handle a 2M pageblock, you'll end up wasting memory.
> 
> Assume a 64MiB memory block:
> 
> With 64k: 1024 pages -> 64k vmemmap, almost 2 MiB wasted. ~3.1 %
> With 4k: 16384 pages -> 1 MiB vmemmap, 1 MiB wasted. ~1.5%
> 
> It gets worse with smaller memory block sizes.
> 
> 
>>
>> We  use altmap.reserve feature to align things correctly at pageblock granularity. We can end up loosing some pages in memory with this. For ex: with 256MB memory block
>> size, we require 4 pages to map vmemmap pages, In order to align things correctly we end up adding a reserve of 28 pages. ie, for every 4096 pages
>> 28 pages get reserved.
> 
> 
> You can simply align-up the nr_vmemmap_pages up to pageblocks in the memory hotplug code (e.g., depending on a config/arch knob whether wasting memory is supported).
> 
> Because the pageblock granularity is a memory onlining/offlining limitation and should be checked+handled exactly there.

That is what the changes in the patches are doing. A rewritten patch showing exactly this is below. If an arch wants to avoid
wasting pages due to this alignment, it can add the page alignment restrictions in:

static inline bool arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
	unsigned long remaining_size = size - vmemmap_size;

	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
}
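
(As a quick illustration of what such a check accepts on a 64K radix config; this is standalone C, not kernel code, and sizeof(struct page) == 64 as well as PMD_SIZE == pageblock size == 2M are assumptions rather than values taken from the patch:)

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 64UL << 10, pmd = 2UL << 20, pageblock = 2UL << 20;
	unsigned long sizes[] = { 256UL << 20, 1UL << 30, 2UL << 30 };

	for (int i = 0; i < 3; i++) {
		unsigned long vmemmap   = sizes[i] / page_size * 64;
		unsigned long remaining = sizes[i] - vmemmap;
		int ok = (vmemmap % pmd == 0) && (remaining % pageblock == 0);

		printf("%4luMB block: vmemmap %4luKB -> %s\n", sizes[i] >> 20,
		       vmemmap >> 10, ok ? "supported" : "rejected");
	}
	/* only the 2GB block passes, matching the "memory block of 2G" point above */
	return 0;
}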


modified   mm/memory_hotplug.c
@@ -1285,6 +1285,16 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
 }
 
+unsigned long memory_block_align_base(unsigned long size)
+{
+	unsigned long align;
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
+
+	align = pageblock_align(vmemmap_size) -  vmemmap_size;
+	return align;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1295,7 +1305,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap mhp_altmap = {
+		.base_pfn =  PHYS_PFN(res->start),
+		.end_pfn  =  PHYS_PFN(res->end),
+		.reserve  = memory_block_align_base(resource_size(res)),
+	};
 	struct memory_group *group = NULL;
 	u64 start, size;
 	bool new_node = false;
@@ -1340,8 +1354,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
-			mhp_altmap.free = PHYS_PFN(size);
-			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = PHYS_PFN(size) - mhp_altmap.reserve;
 			params.altmap = &mhp_altmap;
 		}
 		/* fallback to not using altmap  */
@@ -1353,7 +1366,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
+	ret = create_memory_block_devices(start, size, mhp_altmap.alloc + mhp_altmap.reserve,
 					  group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
@@ -2253,3 +2266,14 @@ int offline_and_remove_memory(u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
+
+static int __init memory_hotplug_init(void)
+{
+	if (mhp_memmap_on_memory()) {
+		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
+			memory_block_align_base(memory_block_size_bytes()));
+
+	}
+	return 0;
+}
+module_init(memory_hotplug_init);
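
(With the 256MB block / 64K page example discussed earlier in the thread, and assuming mhp_memmap_on_memory() is enabled, the new init hook above would log something along the lines of:)

	Memory hotplug will reserve 28 pages in each memory block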





^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-07 16:25                       ` Aneesh Kumar K V
@ 2023-07-07 20:26                         ` David Hildenbrand
  -1 siblings, 0 replies; 48+ messages in thread
From: David Hildenbrand @ 2023-07-07 20:26 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 07.07.23 18:25, Aneesh Kumar K V wrote:
> On 7/7/23 9:12 PM, David Hildenbrand wrote:
>> On 07.07.23 15:30, Aneesh Kumar K V wrote:
>>> On 7/7/23 5:47 PM, David Hildenbrand wrote:
>>>> On 06.07.23 18:06, Aneesh Kumar K V wrote:
>>>>> On 7/6/23 6:29 PM, David Hildenbrand wrote:
>>>>>> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>>>>>>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>>>>>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>>>>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>>>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap
>>>>>>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory.
>>>>>>>>>>
>>>>>>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>>>>>>
>>>>>>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>>>>>>
>>>>>>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken.
>>>>>>>>>>
>>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>>>>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range.
>>>>>>>>> So on free we  check
>>>>>>>>
>>>>>>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>>>>>>
>>>>>>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>>>>>>
>>>>>>>> 1024 pages * 64k = 64 MiB.
>>>>>>>>
>>>>>>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>>>>>>
>>>>>>>> Smells like you want to disable the feature on a 64k system.
>>>>>>>>
>>>>>>>
>>>>>>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require
>>>>>>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea
>>>>>>> of adding vmemmap_altmap to  struct memory_block
>>>>>>
>>>>>> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>>>>>>
>>>>>
>>>>> Sure. How about?
>>>>>
>>>>> bool mhp_supports_memmap_on_memory(unsigned long size)
>>>>> {
>>>>>
>>>>>       unsigned long nr_pages = size >> PAGE_SHIFT;
>>>>>       unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>>>>
>>>>>       if (!radix_enabled())
>>>>>           return false;
>>>>>       /*
>>>>>        * memmap on memory only supported with memory block size add/remove
>>>>>        */
>>>>>       if (size != memory_block_size_bytes())
>>>>>           return false;
>>>>>       /*
>>>>>        * Also make sure the vmemmap allocation is fully contianed
>>>>>        * so that we always allocate vmemmap memory from altmap area.
>>>>>        */
>>>>>       if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
>>>>>           return false;
>>>>>       /*
>>>>>        * The pageblock alignment requirement is met by using
>>>>>        * reserve blocks in altmap.
>>>>>        */
>>>>>       return true;
>>>>> }
>>>>
>>>> Better, but the PAGE_SIZE that could be added to common code as well.
>>>>
>>>> ... but, the pageblock check in common code implies a PAGE_SIZE check, so why do we need any other check besides the radix_enabled() check for arm64 and just keep all the other checks in common code as they are?
>>>>
>>>> If your vmemmap does not cover full pageblocks (which implies full pages), the feature cannot be used *unless* we'd waste altmap space in the vmemmap to cover one pageblock.
>>>>
>>>> Wasting hotplugged memory certainly sounds wrong?
>>>>
>>>>
>>>> So I appreciate if you could explain why the pageblock check should not be had for ppc64?
>>>>
>>>
>>> If we want things to be aligned to pageblock (2M) we will have to use 2M vmemmap space and that implies a memory block of 2G with 64K page size. That requirements makes the feature not useful at all
>>> on power. The compromise i came to was what i mentioned in the commit message for enabling the feature on ppc64.
>>
>> As we'll always handle a 2M pageblock, you'll end up wasting memory.
>>
>> Assume a 64MiB memory block:
>>
>> With 64k: 1024 pages -> 64k vmemmap, almost 2 MiB wasted. ~3.1 %
>> With 4k: 16384 pages -> 1 MiB vmemmap, 1 MiB wasted. ~1.5%
>>
>> It gets worse with smaller memory block sizes.
>>
>>
>>>
>>> We  use altmap.reserve feature to align things correctly at pageblock granularity. We can end up loosing some pages in memory with this. For ex: with 256MB memory block
>>> size, we require 4 pages to map vmemmap pages, In order to align things correctly we end up adding a reserve of 28 pages. ie, for every 4096 pages
>>> 28 pages get reserved.
>>
>>
>> You can simply align-up the nr_vmemmap_pages up to pageblocks in the memory hotplug code (e.g., depending on a config/arch knob whether wasting memory is supported).
>>
>> Because the pageblock granularity is a memory onlining/offlining limitation and should be checked+handled exactly there.
> 
> That is what the changes in the patches are doing. A rewritten patch showing this exact details is below. If arch want's to avoid
> wasting pages due to this aligment they can add the page aligment restrictions in
> 
> static inline bool arch_supports_memmap_on_memory(unsigned long size)
> {
> 	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
> 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> 	unsigned long remaining_size = size - vmemmap_size;
> 
> 	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> 		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> }

I tend towards making this a config option (something that
expresses that wasting memory is acceptable); then we can move it to
common code.

There, we simply allow aligning the vmemmap size up to the next 
pageblock (if the config allows for it).

Further, we have to make sure that a single memory block is not just a single
pageblock (in which case adding memory would only consume memmap and not
provide any usable memory).
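
(A minimal sketch of that last guard, assuming the aligned-up vmemmap is available as a page count in a local vmemmap_pages variable; the helpers used exist in current kernels, but this exact check is illustrative and not part of the series:)

	/* reject the feature if the aligned-up vmemmap would consume the whole block */
	if (PFN_PHYS(pageblock_align(vmemmap_pages)) >= memory_block_size_bytes())
		return false;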


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 48+ messages in thread


end of thread, other threads:[~2023-07-07 20:27 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-06  8:50 [PATCH v2 0/5] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-07-06  8:50 ` Aneesh Kumar K.V
2023-07-06  8:50 ` [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
2023-07-06  8:50   ` Aneesh Kumar K.V
2023-07-06  9:18   ` David Hildenbrand
2023-07-06  9:18     ` David Hildenbrand
2023-07-06  9:36     ` Aneesh Kumar K V
2023-07-06  9:36       ` Aneesh Kumar K V
2023-07-06 11:14       ` David Hildenbrand
2023-07-06 11:14         ` David Hildenbrand
2023-07-06 12:32         ` Aneesh Kumar K V
2023-07-06 12:32           ` Aneesh Kumar K V
2023-07-06 12:59           ` David Hildenbrand
2023-07-06 12:59             ` David Hildenbrand
2023-07-06 16:06             ` Aneesh Kumar K V
2023-07-06 16:06               ` Aneesh Kumar K V
2023-07-07 12:17               ` David Hildenbrand
2023-07-07 12:17                 ` David Hildenbrand
2023-07-07 13:30                 ` Aneesh Kumar K V
2023-07-07 13:30                   ` Aneesh Kumar K V
2023-07-07 15:42                   ` David Hildenbrand
2023-07-07 15:42                     ` David Hildenbrand
2023-07-07 16:25                     ` Aneesh Kumar K V
2023-07-07 16:25                       ` Aneesh Kumar K V
2023-07-07 20:26                       ` David Hildenbrand
2023-07-07 20:26                         ` David Hildenbrand
2023-07-06  8:50 ` [PATCH v2 2/5] mm/hotplug: Allow architecture override for memmap on memory feature Aneesh Kumar K.V
2023-07-06  8:50   ` Aneesh Kumar K.V
2023-07-06  9:19   ` David Hildenbrand
2023-07-06  9:19     ` David Hildenbrand
2023-07-06  8:50 ` [PATCH v2 3/5] mm/hotplug: Simplify the handling of MHP_MEMMAP_ON_MEMORY flag Aneesh Kumar K.V
2023-07-06  8:50   ` Aneesh Kumar K.V
2023-07-06  9:24   ` David Hildenbrand
2023-07-06  9:24     ` David Hildenbrand
2023-07-06 10:04     ` Aneesh Kumar K V
2023-07-06 10:04       ` Aneesh Kumar K V
2023-07-06 11:20       ` David Hildenbrand
2023-07-06 11:20         ` David Hildenbrand
2023-07-06  8:50 ` [PATCH v2 4/5] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
2023-07-06  8:50   ` Aneesh Kumar K.V
2023-07-06  8:53   ` David Hildenbrand
2023-07-06  8:53     ` David Hildenbrand
2023-07-06  8:50 ` [PATCH v2 5/5] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
2023-07-06  8:50   ` Aneesh Kumar K.V
2023-07-06  9:07   ` David Hildenbrand
2023-07-06  9:07     ` David Hildenbrand
2023-07-06  9:27     ` Aneesh Kumar K V
2023-07-06  9:27       ` Aneesh Kumar K V
