* [PATCH v6 0/7] Add support for memmap on memory feature on ppc64
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

This patch series updates the memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64, where the alignment rules differ with a 64K page size.

This patch series depends on the dax vmemmap optimization series
posted here:
https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/

Changes from v5:
* Update commit message
* Move memory alloc/free to the callers in patch 6
* Address review feedback w.r.t patch 4

Changes from v4:
* Use altmap.free instead of altmap.reserve
* Address review feedback

Changes from v3:
* Extend the module parameter memmap_on_memory to force allocation even
  though it can waste hotplug memory.

Changes from v2:
* Rebase to the latest Linus tree
* Redo the series based on review feedback. Multiple changes to the patchset.

Changes from v1:
* Update the memblock to store vmemmap_altmap details. This is required
  so that when we remove the memory we can find the altmap details,
  which are needed on some architectures.
* Rebase to the latest Linus tree


Aneesh Kumar K.V (7):
  mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
  mm/memory_hotplug: Allow architecture to override memmap on memory
    support check
  mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned
    to pageblocks
  powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  mm/memory_hotplug: Embed vmem_altmap details in memory block
  mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter

 .../admin-guide/mm/memory-hotplug.rst         |  12 +
 arch/arm64/Kconfig                            |   4 +-
 arch/powerpc/Kconfig                          |   1 +
 arch/powerpc/include/asm/pgtable.h            |  21 ++
 .../platforms/pseries/hotplug-memory.c        |   2 +-
 arch/x86/Kconfig                              |   4 +-
 drivers/acpi/acpi_memhotplug.c                |   3 +-
 drivers/base/memory.c                         |  25 ++-
 include/linux/memory.h                        |   8 +-
 include/linux/memory_hotplug.h                |   3 +-
 mm/Kconfig                                    |   3 +
 mm/memory_hotplug.c                           | 210 ++++++++++++++----
 12 files changed, 224 insertions(+), 72 deletions(-)

-- 
2.41.0




* [PATCH v6 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

Instead of adding a menu entry with all supported architectures, add an
mm/Kconfig variable and have the supported architectures select it.

No functional change in this patch.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/Kconfig | 4 +---
 arch/x86/Kconfig   | 4 +---
 mm/Kconfig         | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1573257a4d6..0f749cfab8e6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_GNU_PROPERTY
 	select ARCH_USE_MEMTEST
@@ -347,9 +348,6 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 config SMP
 	def_bool y
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78224aa76409..d0258e92a8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
 	def_bool y
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git a/mm/Kconfig b/mm/Kconfig
index 5fe49c030961..721dc88423c7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -571,6 +571,9 @@ config MHP_MEMMAP_ON_MEMORY
 
 endif # MEMORY_HOTPLUG
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+       bool
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
-- 
2.41.0



* [PATCH v6 2/7] mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

If not supported, fall back to not using memmap on memory. This avoids
the need for callers to implement the fallback themselves.
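
As a sketch, the caller shape after this change looks like the
following (mirroring the ACPI hunk below; illustrative, not a literal
copy of the driver code):

	/*
	 * Pass MHP_MEMMAP_ON_MEMORY unconditionally as a hint; the core
	 * silently falls back to a regular allocation when the checks in
	 * mhp_supports_memmap_on_memory() fail.
	 */
	mhp_flags |= MHP_MEMMAP_ON_MEMORY;
	result = __add_memory(mgid, info->start_addr, info->length,
			      mhp_flags);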

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/acpi/acpi_memhotplug.c |  3 +--
 include/linux/memory_hotplug.h |  3 ++-
 mm/memory_hotplug.c            | 13 ++++++-------
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24f662d8bd39..d0c1a71007d0 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (!info->length)
 			continue;
 
-		if (mhp_supports_memmap_on_memory(info->length))
-			mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+		mhp_flags |= MHP_MEMMAP_ON_MEMORY;
 		result = __add_memory(mgid, info->start_addr, info->length,
 				      mhp_flags);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 013c69753c91..7d2076583494 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -97,6 +97,8 @@ typedef int __bitwise mhp_t;
  * To do so, we will use the beginning of the hot-added range to build
  * the page tables for the memmap array that describes the entire range.
  * Only selected architectures support it with SPARSE_VMEMMAP.
+ * This is only a hint, the core kernel can decide to not do this based on
+ * different alignment checks.
  */
 #define MHP_MEMMAP_ON_MEMORY   ((__force mhp_t)BIT(1))
 /*
@@ -354,7 +356,6 @@ extern struct zone *zone_for_pfn_range(int online_type, int nid,
 extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
 				      struct mhp_params *params);
 void arch_remove_linear_mapping(u64 start, u64 size);
-extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7cfd13c91568..eca32ccd45cc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,7 +1247,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-bool mhp_supports_memmap_on_memory(unsigned long size)
+static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
@@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 * Self hosted memmap array
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-		if (!mhp_supports_memmap_on_memory(size)) {
-			ret = -EINVAL;
-			goto error;
+		if (mhp_supports_memmap_on_memory(size)) {
+			mhp_altmap.free = PHYS_PFN(size);
+			mhp_altmap.base_pfn = PHYS_PFN(start);
+			params.altmap = &mhp_altmap;
 		}
-		mhp_altmap.free = PHYS_PFN(size);
-		mhp_altmap.base_pfn = PHYS_PFN(start);
-		params.altmap = &mhp_altmap;
+		/* fallback to not using altmap  */
 	}
 
 	/* call arch's memory hotadd */
-- 
2.41.0



* [PATCH v6 3/7] mm/memory_hotplug: Allow architecture to override memmap on memory support check
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

Some architectures want different restrictions, so add an
architecture-specific override.

The PMD_SIZE check is moved into the default implementation of that
override.
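
For illustration, an architecture opts in to its own policy by defining
the hook, following the #ifndef pattern added below; the relaxed check
shown here is hypothetical:

#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
{
	/* hypothetical policy: require only page alignment */
	return IS_ALIGNED(vmemmap_size, PAGE_SIZE);
}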

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory_hotplug.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index eca32ccd45cc..746cb7c08c64 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,10 +1247,26 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+static inline unsigned long memory_block_memmap_size(void)
+{
+	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
+}
+
+#ifndef arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
+{
+	/*
+	 * As default, we want the vmemmap to span a complete PMD such that we
+	 * can map the vmemmap using a single PMD if supported by the
+	 * architecture.
+	 */
+	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+}
+#endif
+
 static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
-	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
-	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
+	unsigned long vmemmap_size = memory_block_memmap_size();
 	unsigned long remaining_size = size - vmemmap_size;
 
 	/*
@@ -1281,8 +1297,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 */
 	return mhp_memmap_on_memory() &&
 	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
+	       arch_supports_memmap_on_memory(vmemmap_size);
 }
 
 /*
-- 
2.41.0



* [PATCH v6 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Currently, the memmap_on_memory feature is only supported with memory
block sizes that result in vmemmap pages covering full pageblocks. This
is because the memory onlining/offlining code requires applicable
ranges to be pageblock-aligned, for example, to set the migratetypes
properly.

This patch lifts that restriction by reserving more pages than required
for vmemmap space, so that the start address is pageblock-aligned for
different memory block sizes. Using this facility implies the kernel
will reserve some pages for every memory block. This makes the memmap
on memory feature widely useful with different memory block size
values.

For example: with a 64K page size and a 256MiB memory block size, we
require 4 pages to map the vmemmap pages. To align things correctly, we
end up adding a reserve of 28 pages, i.e., for every 4096 pages, 28
pages get reserved.
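
To make the arithmetic concrete, here is a minimal standalone sketch of
the reserve calculation (illustrative userspace C, not kernel code; it
assumes a 64-byte struct page, and pageblock_nr_pages == 32, which the
4 + 28 = 32 numbers above imply):

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 64UL << 10;            /* 64K page size */
	unsigned long block_size = 256UL << 20;          /* 256MiB block */
	unsigned long struct_page_bytes = 64;            /* assumed */
	unsigned long pageblock_nr_pages = 32;           /* implied by 4+28 */

	unsigned long nr_pages = block_size / page_size;           /* 4096 */
	unsigned long memmap_bytes = nr_pages * struct_page_bytes; /* 256KiB */
	unsigned long memmap_pages =
			(memmap_bytes + page_size - 1) / page_size; /* 4 */
	/* "force" mode rounds the vmemmap reservation up to a pageblock */
	unsigned long reserved = (memmap_pages + pageblock_nr_pages - 1) /
			pageblock_nr_pages * pageblock_nr_pages;    /* 32 */

	printf("memmap pages: %lu, extra reserve: %lu\n",
	       memmap_pages, reserved - memmap_pages);             /* 4, 28 */
	return 0;
}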

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 .../admin-guide/mm/memory-hotplug.rst         |  12 ++
 mm/memory_hotplug.c                           | 120 +++++++++++++++---
 2 files changed, 113 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index bd77841041af..2994958c7ce8 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -433,6 +433,18 @@ The following module parameters are currently defined:
 				 memory in a way that huge pages in bigger
 				 granularity cannot be formed on hotplugged
 				 memory.
+
+				 With value "force" it could result in memory
+				 wastage due to memmap size limitations. For
+				 example, if the memmap for a memory block
+				 requires 1 MiB, but the pageblock size is 2
+				 MiB, 1 MiB of hotplugged memory will be wasted.
+				 Note that there are still cases where the
+				 feature cannot be enforced: for example, if the
+				 memmap is smaller than a single page, or if the
+				 architecture does not support the forced mode
+				 in all configurations.
+
 ``online_policy``		 read-write: Set the basic policy used for
 				 automatic zone selection when onlining memory
 				 blocks without specifying a target zone.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 746cb7c08c64..fe94feb32d71 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -41,17 +41,83 @@
 #include "internal.h"
 #include "shuffle.h"
 
+enum {
+	MEMMAP_ON_MEMORY_DISABLE = 0,
+	MEMMAP_ON_MEMORY_ENABLE,
+	MEMMAP_ON_MEMORY_FORCE,
+};
+
+static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
+
+static inline unsigned long memory_block_memmap_size(void)
+{
+	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
+}
+
+static inline unsigned long memory_block_memmap_on_memory_pages(void)
+{
+	unsigned long nr_pages = PFN_UP(memory_block_memmap_size());
+
+	/*
+	 * In "forced" memmap_on_memory mode, we add extra pages to align the
+	 * vmemmap size to cover full pageblocks. That way, we can add memory
+	 * even if the vmemmap size is not properly aligned, however, we might waste
+	 * memory.
+	 */
+	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
+		return pageblock_align(nr_pages);
+	return nr_pages;
+}
+
 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
-static bool memmap_on_memory __ro_after_init;
-module_param(memmap_on_memory, bool, 0444);
-MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+static int set_memmap_mode(const char *val, const struct kernel_param *kp)
+{
+	int ret, mode;
+	bool enabled;
+
+	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
+		mode = MEMMAP_ON_MEMORY_FORCE;
+	} else {
+		ret = kstrtobool(val, &enabled);
+		if (ret < 0)
+			return ret;
+		if (enabled)
+			mode = MEMMAP_ON_MEMORY_ENABLE;
+		else
+			mode = MEMMAP_ON_MEMORY_DISABLE;
+	}
+	*((int *)kp->arg) = mode;
+	if (mode == MEMMAP_ON_MEMORY_FORCE) {
+		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
+
+		pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
+			     memmap_pages - PFN_UP(memory_block_memmap_size()));
+	}
+	return 0;
+}
+
+static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
+{
+	if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE)
+		return sprintf(buffer,  "force\n");
+	return param_get_bool(buffer, kp);
+}
+
+static const struct kernel_param_ops memmap_mode_ops = {
+	.set = set_memmap_mode,
+	.get = get_memmap_mode,
+};
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
+		 "With value \"force\" it could result in memory wastage due "
+		 "to memmap size limitations (Y/N/force)");
 
 static inline bool mhp_memmap_on_memory(void)
 {
-	return memmap_on_memory;
+	return memmap_mode != MEMMAP_ON_MEMORY_DISABLE;
 }
 #else
 static inline bool mhp_memmap_on_memory(void)
@@ -1247,11 +1313,6 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-static inline unsigned long memory_block_memmap_size(void)
-{
-	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
-}
-
 #ifndef arch_supports_memmap_on_memory
 static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 {
@@ -1267,7 +1328,7 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long vmemmap_size = memory_block_memmap_size();
-	unsigned long remaining_size = size - vmemmap_size;
+	unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
 	/*
 	 * Besides having arch support and the feature enabled at runtime, we
@@ -1295,10 +1356,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       altmap as an alternative source of memory, and we do not exactly
 	 *       populate a single PMD.
 	 */
-	return mhp_memmap_on_memory() &&
-	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
-	       arch_supports_memmap_on_memory(vmemmap_size);
+	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
+		return false;
+
+	/*
+	 * Make sure the vmemmap allocation is fully contained
+	 * so that we always allocate vmemmap memory from altmap area.
+	 */
+	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
+		return false;
+
+	/*
+	 * start pfn should be pageblock_nr_pages aligned for correctly
+	 * setting migrate types
+	 */
+	if (!pageblock_aligned(memmap_pages))
+		return false;
+
+	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
+		/* No effective hotplugged memory doesn't make sense. */
+		return false;
+
+	return arch_supports_memmap_on_memory(vmemmap_size);
 }
 
 /*
@@ -1311,7 +1390,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap mhp_altmap = {
+		.base_pfn =  PHYS_PFN(res->start),
+		.end_pfn  =  PHYS_PFN(res->end),
+	};
 	struct memory_group *group = NULL;
 	u64 start, size;
 	bool new_node = false;
@@ -1356,8 +1438,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
-			mhp_altmap.free = PHYS_PFN(size);
-			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = memory_block_memmap_on_memory_pages();
 			params.altmap = &mhp_altmap;
 		}
 		/* fallback to not using altmap  */
@@ -1369,8 +1450,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
-					  group);
+	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -2096,6 +2176,8 @@ static int __ref try_remove_memory(u64 start, u64 size)
 			 * right thing if we used vmem_altmap when hot-adding
 			 * the range.
 			 */
+			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = nr_vmemmap_pages;
 			mhp_altmap.alloc = nr_vmemmap_pages;
 			altmap = &mhp_altmap;
 		}
-- 
2.41.0




* [PATCH v6 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Radix vmemmap mapping can map things correctly at the PMD level or the
PTE level based on different device boundary checks. Hence, we skip the
restriction that the vmemmap size be a multiple of PMD_SIZE. This also
makes the feature more widely useful, because using a PMD_SIZE vmemmap
area would require a memory block size of 2GiB.

We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY so that the feature
can work with a memory block size of 256MB, using the altmap.reserve
feature to align things correctly at pageblock granularity. We can end
up losing some pages in memory with this. For example: with a 256MiB
memory block size, we require 4 pages to map the vmemmap pages. In
order to align things correctly, we end up adding a reserve of 28
pages, i.e., for every 4096 pages, 28 pages get reserved.
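
For reference, a boot-time usage sketch: the reserve-based mode is
requested via the module parameter documented in patch 4 (the spelling
below is taken from that documentation hunk and is assumed here):

	memory_hotplug.memmap_on_memory=force

The current mode can then be read back from
/sys/module/memory_hotplug/parameters/memmap_on_memory.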

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/include/asm/pgtable.h            | 21 +++++++++++++++++++
 .../platforms/pseries/hotplug-memory.c        |  2 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d0497d13f5b4..938294c996dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@ config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index a4893b17705a..33464e6d6431 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -161,6 +161,27 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
 int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
 bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
 			   unsigned long page_size);
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into the
+ * details of some of the restrictions. We don't check for PMD_SIZE
+ * because our vmemmap allocation code can fall back correctly. The
+ * pageblock alignment requirement is met using altmap->reserve blocks.
+ */
+#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
+{
+	if (!radix_enabled())
+		return false;
+	/*
+	 * With 4K page size and 2M PMD_SIZE, we can align
+	 * things better with memory block size value
+	 * starting from 128MB. Hence align things with PMD_SIZE.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+		return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+	return true;
+}
+
 #endif /* CONFIG_PPC64 */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..4f3d6a2f9065 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -637,7 +637,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		nid = first_online_node;
 
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
-- 
2.41.0




* [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

With memmap on memory, some architectures need more details w.r.t. the
altmap, such as base_pfn, end_pfn, etc., to unmap the vmemmap memory.
Instead of computing them again when we remove a memory block, embed
the vmem_altmap details in struct memory_block if we are using the
memmap on memory block feature.

No functional change in this patch.
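
As a sketch, consumers can now recover the vmemmap page count from the
block itself instead of a cached counter (mirroring the online/offline
hunks below; the helper name is hypothetical):

static unsigned long memblock_nr_vmemmap_pages(struct memory_block *mem)
{
	/* The embedded altmap, when present, carries the details. */
	return mem->altmap ? mem->altmap->free : 0;
}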

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c  | 25 +++++++++++-------
 include/linux/memory.h |  8 ++----
 mm/memory_hotplug.c    | 58 +++++++++++++++++++++++++++---------------
 3 files changed, 55 insertions(+), 36 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..57ed61212277 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	WARN_ON(mem->altmap);
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,7 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	mem->altmap = altmap;
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +790,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +825,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +839,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fe94feb32d71..aa8724bd1d53 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,7 +1439,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
 			mhp_altmap.free = memory_block_memmap_on_memory_pages();
-			params.altmap = &mhp_altmap;
+			params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+			if (!params.altmap)
+				goto error;
+
+			memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
 		}
 		/* fallback to not using altmap  */
 	}
@@ -1447,13 +1451,13 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &params);
 	if (ret < 0)
-		goto error;
+		goto error_free;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
-		goto error;
+		goto error_free;
 	}
 
 	if (new_node) {
@@ -1490,6 +1494,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		walk_memory_blocks(start, size, NULL, online_memory_block);
 
 	return ret;
+error_free:
+	kfree(params.altmap);
 error:
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		memblock_remove(start, size);
@@ -2056,12 +2062,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int test_has_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct memory_block **mem_ptr = (struct memory_block **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * return the memblock if we have altmap
+	 * and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		*mem_ptr = mem;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
-	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
+	int ret;
+	struct memory_block *mem;
 	int rc = 0, nid = NUMA_NO_NODE;
+	struct vmem_altmap *altmap = NULL;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
 
@@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
+			altmap = mem->altmap;
 			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
+			 * Mark altmap NULL so that we can add a debug
+			 * check on memblock free.
 			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
+			mem->altmap = NULL;
 		}
 	}
 
@@ -2196,6 +2203,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
 
 	arch_remove_memory(start, size, altmap);
 
+	/*
+	 * Now that we are tracking alloc and free correctly
+	 * we can add check to verify altmap free pages.
+	 */
+	if (altmap) {
+		WARN(altmap->alloc, "Altmap not fully unmapped");
+		kfree(altmap);
+	}
+
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_phys_free(start, size);
 		memblock_remove(start, size);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
@ 2023-07-27  8:02   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 30+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

With memmap on memory, some architecture needs more details w.r.t altmap
such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
computing them again when we remove a memory block, embed vmem_altmap
details in struct memory_block if we are using memmap on memory block
feature.

No functional change in this patch

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c  | 25 +++++++++++-------
 include/linux/memory.h |  8 ++----
 mm/memory_hotplug.c    | 58 +++++++++++++++++++++++++++---------------
 3 files changed, 55 insertions(+), 36 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..57ed61212277 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	WARN_ON(mem->altmap);
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,7 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	mem->altmap = altmap;
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +790,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +825,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +839,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fe94feb32d71..aa8724bd1d53 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,7 +1439,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
 			mhp_altmap.free = memory_block_memmap_on_memory_pages();
-			params.altmap = &mhp_altmap;
+			params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+			if (!params.altmap)
+				goto error;
+
+			memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
 		}
 		/* fallback to not using altmap  */
 	}
@@ -1447,13 +1451,13 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &params);
 	if (ret < 0)
-		goto error;
+		goto error_free;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
-		goto error;
+		goto error_free;
 	}
 
 	if (new_node) {
@@ -1490,6 +1494,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		walk_memory_blocks(start, size, NULL, online_memory_block);
 
 	return ret;
+error_free:
+	kfree(params.altmap);
 error:
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		memblock_remove(start, size);
@@ -2056,12 +2062,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int test_has_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct memory_block **mem_ptr = (struct memory_block **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * return the memblock if we have altmap
+	 * and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		*mem_ptr = mem;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
-	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
+	int ret;
+	struct memory_block *mem;
 	int rc = 0, nid = NUMA_NO_NODE;
+	struct vmem_altmap *altmap = NULL;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
 
@@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
+			altmap = mem->altmap;
 			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
+			 * Mark altmap NULL so that we can add a debug
+			 * check on memblock free.
 			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
+			mem->altmap = NULL;
 		}
 	}
 
@@ -2196,6 +2203,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
 
 	arch_remove_memory(start, size, altmap);
 
+	/*
+	 * Now that we are tracking alloc and free correctly
+	 * we can add check to verify altmap free pages.
+	 */
+	if (altmap) {
+		WARN(altmap->alloc, "Altmap not fully unmapped");
+		kfree(altmap);
+	}
+
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_phys_free(start, size);
 		memblock_remove(start, size);
-- 
2.41.0
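
In short, the altmap lifecycle these hunks introduce (a call-flow
sketch of the diff above, not code from the series):

	add_memory_resource()
	  params.altmap = kmalloc(...);		/* heap copy of mhp_altmap */
	  create_memory_block_devices(start, size, params.altmap, group)
	    mem->altmap = altmap;		/* each block keeps the pointer */
	...
	try_remove_memory()
	  altmap = mem->altmap;
	  mem->altmap = NULL;			/* enables the leak check on release */
	  arch_remove_memory(start, size, altmap);
	  WARN(altmap->alloc, "Altmap not fully unmapped");
	  kfree(altmap);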


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v6 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
  2023-07-27  8:02 ` Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  (?)
@ 2023-07-27  8:02 ` Aneesh Kumar K.V
  2023-07-27  9:26     ` Michal Hocko
  2023-07-27 11:18     ` David Hildenbrand
  -1 siblings, 2 replies; 30+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-27  8:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory_hotplug.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index aa8724bd1d53..7c877756b363 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -89,7 +89,12 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 		else
 			mode = MEMMAP_ON_MEMORY_DISABLE;
 	}
+	/*
+	 * Avoid changing memmap mode during hotplug.
+	 */
+	get_online_mems();
 	*((int *)kp->arg) = mode;
+	put_online_mems();
 	if (mode == MEMMAP_ON_MEMORY_FORCE) {
 		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
@@ -110,7 +115,7 @@ static const struct kernel_param_ops memmap_mode_ops = {
 	.set = set_memmap_mode,
 	.get = get_memmap_mode,
 };
-module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
 		 "With value \"force\" it could result in memory wastage due "
 		 "to memmap size limitations (Y/N/force)");
@@ -2172,22 +2177,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
-	if (mhp_memmap_on_memory()) {
-		ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
-		if (ret) {
-			if (size != memory_block_size_bytes()) {
-				pr_warn("Refuse to remove %#llx - %#llx,"
-					"wrong granularity\n",
-					start, start + size);
-				return -EINVAL;
-			}
-			altmap = mem->altmap;
-			/*
-			 * Mark altmap NULL so that we can add a debug
-			 * check on memblock free.
-			 */
-			mem->altmap = NULL;
+	ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
+	if (ret) {
+		if (size != memory_block_size_bytes()) {
+			pr_warn("Refuse to remove %#llx - %#llx,"
+				"wrong granularity\n",
+				start, start + size);
+			return -EINVAL;
 		}
+		altmap = mem->altmap;
+		/*
+		 * Mark altmap NULL so that we can add a debug
+		 * check on memblock free.
+		 */
+		mem->altmap = NULL;
 	}
 
 	/* remove memmap entry */
-- 
2.41.0
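
With the parameter now 0644, the mode can be changed at runtime.
Assuming the usual sysfs path for built-in module parameters (the path
is an assumption here, not taken from this patch), a toggle would look
like:

	# illustrative only; affects memory hotplugged after the write
	echo force > /sys/module/memory_hotplug/parameters/memmap_on_memory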


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-27  8:02   ` Aneesh Kumar K.V
@ 2023-07-27  9:23     ` Michal Hocko
  -1 siblings, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2023-07-27  9:23 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

On Thu 27-07-23 13:32:29, Aneesh Kumar K.V wrote:
[...]
> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
> +
> +		pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
> +			     memmap_pages - PFN_UP(memory_block_memmap_size()));
> +	}
> +	return 0;
> +}

Why should we print this only for the forced case? Isn't that
interesting for any memmap-on-memory setup? Also, is the above
sufficient on its own? The size depends on the block size, and that can
vary. I think it would make more sense to print the block size and the
vmemmap reservation, and for the force case also any wasted amount on
top (if any).
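
Something like the following, perhaps (untested, just to illustrate the
shape; it reuses the helpers visible in this patch, and assumes
memory_block_size_bytes() is usable here):

	if (mode == MEMMAP_ON_MEMORY_FORCE) {
		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
		unsigned long wasted = memmap_pages - PFN_UP(memory_block_memmap_size());

		pr_info_once("memmap_on_memory: block size %lu, %lu vmemmap pages per block, %lu pages wasted\n",
			     memory_block_size_bytes(), memmap_pages, wasted);
	}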

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
  2023-07-27  8:02   ` Aneesh Kumar K.V
@ 2023-07-27  9:25     ` Michal Hocko
  -1 siblings, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2023-07-27  9:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

On Thu 27-07-23 13:32:31, Aneesh Kumar K.V wrote:
> With memmap on memory, some architecture needs more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using memmap on memory block
> feature.
> 
> No functional change in this patch
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/base/memory.c  | 25 +++++++++++-------
>  include/linux/memory.h |  8 ++----
>  mm/memory_hotplug.c    | 58 +++++++++++++++++++++++++++---------------
>  3 files changed, 55 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index b456ac213610..57ed61212277 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
>  {
>  	struct memory_block *mem = to_memory_block(dev);
>  
> +	WARN_ON(mem->altmap);

What is this supposed to catch? A comment would be handy so that we know
what to look at should it ever trigger.

>  	kfree(mem);
>  }
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
  2023-07-27  8:02 ` [PATCH v6 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
@ 2023-07-27  9:26     ` Michal Hocko
  2023-07-27 11:18     ` David Hildenbrand
  1 sibling, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2023-07-27  9:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

ENOCHANGELOG. Considering this is a user-visible change, one is really
due.

On Thu 27-07-23 13:32:32, Aneesh Kumar K.V wrote:
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/memory_hotplug.c | 35 +++++++++++++++++++----------------
>  1 file changed, 19 insertions(+), 16 deletions(-)

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-27  9:23     ` Michal Hocko
@ 2023-07-27  9:27       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 30+ messages in thread
From: Aneesh Kumar K V @ 2023-07-27  9:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

On 7/27/23 2:53 PM, Michal Hocko wrote:
> On Thu 27-07-23 13:32:29, Aneesh Kumar K.V wrote:
> [...]
>> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
>> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
>> +
>> +		pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
>> +			     memmap_pages - PFN_UP(memory_block_memmap_size()));
>> +	}
>> +	return 0;
>> +}
> 
> Why should we print this only for the forced case? Isn't that
> interesting for any on memory memmap? Also is this the above sufficient
> on its own? the size depends on the block size and that can vary.
> I think it would make more sense to print the block size and the vmemmap
> reservation and for the force case also any wasted amount on top (if
> any).
> 

For the other cases the space is completely used for struct page
allocation. What the information indicates here is that for each memory
block we add, we are losing/wasting that many pages.
Maybe I should have used the term "waste" instead of "reserve"?

-aneesh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
  2023-07-27  9:25     ` Michal Hocko
@ 2023-07-27  9:32       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 30+ messages in thread
From: Aneesh Kumar K V @ 2023-07-27  9:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

On 7/27/23 2:55 PM, Michal Hocko wrote:
> On Thu 27-07-23 13:32:31, Aneesh Kumar K.V wrote:
>> With memmap on memory, some architecture needs more details w.r.t altmap
>> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
>> computing them again when we remove a memory block, embed vmem_altmap
>> details in struct memory_block if we are using memmap on memory block
>> feature.
>>
>> No functional change in this patch
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  drivers/base/memory.c  | 25 +++++++++++-------
>>  include/linux/memory.h |  8 ++----
>>  mm/memory_hotplug.c    | 58 +++++++++++++++++++++++++++---------------
>>  3 files changed, 55 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index b456ac213610..57ed61212277 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
>>  {
>>  	struct memory_block *mem = to_memory_block(dev);
>>  
>> +	WARN_ON(mem->altmap);
> 
> What is this supposed to catch? A comment would be handy so that we know
> what to look at should it ever trigger.
> 

I did add a comment where we clear the altmap in try_remove_memory(). I will also add
more details here.

+			 * Mark altmap NULL so that we can add a debug
+			 * check on memblock free.
 			 */

The WARN_ON is an indication of a memory leak: if mem->altmap != NULL
here, the allocated altmap was never freed. It also indicates that the
memory block got freed without going through try_remove_memory().

>>  	kfree(mem);
>>  }
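
Something along these lines, perhaps (wording illustrative):

	/*
	 * A memory block device is only released after try_remove_memory()
	 * has taken over and freed mem->altmap. A non-NULL altmap here
	 * means the vmem_altmap allocation was leaked.
	 */
	WARN_ON(mem->altmap);
	kfree(mem);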


-aneesh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-27  9:27       ` Aneesh Kumar K V
@ 2023-07-27 10:55         ` Michal Hocko
  -1 siblings, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2023-07-27 10:55 UTC (permalink / raw)
  To: Aneesh Kumar K V
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

On Thu 27-07-23 14:57:17, Aneesh Kumar K V wrote:
> On 7/27/23 2:53 PM, Michal Hocko wrote:
> > On Thu 27-07-23 13:32:29, Aneesh Kumar K.V wrote:
> > [...]
> >> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> >> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
> >> +
> >> +		pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
> >> +			     memmap_pages - PFN_UP(memory_block_memmap_size()));
> >> +	}
> >> +	return 0;
> >> +}
> > 
> > Why should we print this only for the forced case? Isn't that
> > interesting for any on memory memmap? Also is this the above sufficient
> > on its own? the size depends on the block size and that can vary.
> > I think it would make more sense to print the block size and the vmemmap
> > reservation and for the force case also any wasted amount on top (if
> > any).
> > 
> 
> For the other cases the space is completely used by for struct page allocation. What
> the information is indicating here is that for each memblock we add we are loosing/wasting so many pages. 
> May be I should have used the term "waste" instead of "reserve" ?

OK, so I had clearly misread it, which just confirms this would benefit
from a clarification. In any case I still think that it would be
beneficial to also report how much of the memory is used for vmemmap on
the hotplugged memory. Maybe as a separate patch.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
  2023-07-27  9:32       ` Aneesh Kumar K V
@ 2023-07-27 10:56         ` Michal Hocko
  -1 siblings, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2023-07-27 10:56 UTC (permalink / raw)
  To: Aneesh Kumar K V
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Vishal Verma

On Thu 27-07-23 15:02:12, Aneesh Kumar K V wrote:
> On 7/27/23 2:55 PM, Michal Hocko wrote:
> > On Thu 27-07-23 13:32:31, Aneesh Kumar K.V wrote:
> >> With memmap on memory, some architecture needs more details w.r.t altmap
> >> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> >> computing them again when we remove a memory block, embed vmem_altmap
> >> details in struct memory_block if we are using memmap on memory block
> >> feature.
> >>
> >> No functional change in this patch
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >> ---
> >>  drivers/base/memory.c  | 25 +++++++++++-------
> >>  include/linux/memory.h |  8 ++----
> >>  mm/memory_hotplug.c    | 58 +++++++++++++++++++++++++++---------------
> >>  3 files changed, 55 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> >> index b456ac213610..57ed61212277 100644
> >> --- a/drivers/base/memory.c
> >> +++ b/drivers/base/memory.c
> >> @@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
> >>  {
> >>  	struct memory_block *mem = to_memory_block(dev);
> >>  
> >> +	WARN_ON(mem->altmap);
> > 
> > What is this supposed to catch? A comment would be handy so that we know
> > what to look at should it ever trigger.
> > 
> 
> I did add a comment where we clear the altmap in try_remove_memory(). I will also add
> more details here.
> 
> +			 * Mark altmap NULL so that we can add a debug
> +			 * check on memblock free.
>  			 */
> 
> WARN_ON is an indication of memory leak because if we have mem->altmap != NULL
> then the allocated altmap is not freed . It also indicate that memblock got freed
> without going through the try_remove_memory(). 

I think it would be better to be explicit here (i.e. say who should
have freed it but hasn't).

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
  2023-07-27  8:02   ` Aneesh Kumar K.V
@ 2023-07-27 11:17     ` David Hildenbrand
  -1 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2023-07-27 11:17 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma


> +	/*
> +	 * Now that we are tracking alloc and free correctly
> +	 * we can add check to verify altmap free pages.
> +	 */

Better remove the history lesson from the comment.

"Verify that all vmemmap pages have actually been freed."

> +	if (altmap) {
> +		WARN(altmap->alloc, "Altmap not fully unmapped");
> +		kfree(altmap);
> +	}
> +
>   	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
>   		memblock_phys_free(start, size);
>   		memblock_remove(start, size);
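
With that, the hunk would read:

	/* Verify that all vmemmap pages have actually been freed. */
	if (altmap) {
		WARN(altmap->alloc, "Altmap not fully unmapped");
		kfree(altmap);
	}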

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
  2023-07-27  8:02 ` [PATCH v6 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
@ 2023-07-27 11:18     ` David Hildenbrand
  2023-07-27 11:18     ` David Hildenbrand
  1 sibling, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2023-07-27 11:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 27.07.23 10:02, Aneesh Kumar K.V wrote:
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/memory_hotplug.c | 35 +++++++++++++++++++----------------
>   1 file changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index aa8724bd1d53..7c877756b363 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -89,7 +89,12 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
>   		else
>   			mode = MEMMAP_ON_MEMORY_DISABLE;
>   	}
> +	/*
> +	 * Avoid changing memmap mode during hotplug.
> +	 */

Nit: comment fits into a single line.
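
i.e.:

	/* Avoid changing memmap mode during hotplug. */
	get_online_mems();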

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-07-27 11:19 UTC | newest]

Thread overview: 30+ messages
2023-07-27  8:02 [PATCH v6 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-07-27  8:02 ` [PATCH v6 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
2023-07-27  8:02 ` [PATCH v6 2/7] mm/memory_hotplug: Allow memmap on memory hotplug request to fallback Aneesh Kumar K.V
2023-07-27  8:02 ` [PATCH v6 3/7] mm/memory_hotplug: Allow architecture to override memmap on memory support check Aneesh Kumar K.V
2023-07-27  8:02 ` [PATCH v6 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks Aneesh Kumar K.V
2023-07-27  9:23   ` Michal Hocko
2023-07-27  9:27     ` Aneesh Kumar K V
2023-07-27 10:55       ` Michal Hocko
2023-07-27  8:02 ` [PATCH v6 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
2023-07-27  8:02 ` [PATCH v6 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
2023-07-27  9:25   ` Michal Hocko
2023-07-27  9:32     ` Aneesh Kumar K V
2023-07-27 10:56       ` Michal Hocko
2023-07-27 11:17   ` David Hildenbrand
2023-07-27  8:02 ` [PATCH v6 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
2023-07-27  9:26   ` Michal Hocko
2023-07-27 11:18   ` David Hildenbrand
