* [PATCH v5 0/7] Add support for memmap on memory feature on ppc64
@ 2023-07-25 10:02 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

This patch series updates the memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64, where the alignment rules differ when using a 64K page size.
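
A minimal usage sketch (not part of the series itself), assuming the
documented memory_hotplug.memmap_on_memory parameter and the usual
/sys/module parameters layout:

    # kernel command line: enable the feature, or force it even when the
    # resulting layout wastes some hotplugged memory (see patch 4)
    memory_hotplug.memmap_on_memory=Y
    memory_hotplug.memmap_on_memory=force

    # read the current mode back at runtime (the parameter is 0444)
    cat /sys/module/memory_hotplug/parameters/memmap_on_memory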

This patch series depends on the dax vmemmap optimization series
posted here:
https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/

Changes from v4:
* Use altmap.free instead of altmap.reserve
* Address review feedback

Changes from v3:
* Extend the module parameter memmap_on_memory to force allocation even
  if that wastes some hotplug memory.

Changes from v2:
* Rebase to latest linus tree
* Redo the series based on review feedback. Multiple changes to the patchset.

Changes from v1:
* Update the memory block to store vmem_altmap details. This is required
  so that when we remove the memory we can find the altmap details, which
  are needed on some architectures.
* rebase to latest linus tree



Aneesh Kumar K.V (7):
  mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  mm/hotplug: Allow memmap on memory hotplug request to fallback
  mm/hotplug: Allow architecture to override memmap on memory support
    check
  mm/hotplug: Support memmap_on_memory when memmap is not aligned to
    pageblocks
  powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  mm/hotplug: Embed vmem_altmap details in memory block
  mm/hotplug: Enable runtime update of memmap_on_memory parameter

 .../admin-guide/mm/memory-hotplug.rst         |  12 ++
 arch/arm64/Kconfig                            |   4 +-
 arch/powerpc/Kconfig                          |   1 +
 arch/powerpc/include/asm/pgtable.h            |  24 +++
 .../platforms/pseries/hotplug-memory.c        |   3 +-
 arch/x86/Kconfig                              |   4 +-
 drivers/acpi/acpi_memhotplug.c                |   3 +-
 drivers/base/memory.c                         |  32 ++-
 include/linux/memory.h                        |   8 +-
 include/linux/memory_hotplug.h                |   3 +-
 mm/Kconfig                                    |   3 +
 mm/memory_hotplug.c                           | 201 ++++++++++++++----
 12 files changed, 229 insertions(+), 69 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v5 1/7] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Instead of adding a menu entry with all supported architectures, add an
mm/Kconfig variable and select it from the supported architectures.

No functional change in this patch.
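
As an illustration of the pattern (FOOARCH is hypothetical; the Kconfig
symbol is the real one introduced here), an architecture opts in with a
single select, while mm/Kconfig owns the non-visible switch:

    config FOOARCH
            ...
            select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

    # mm/Kconfig (this patch):
    config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
            bool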

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/Kconfig | 4 +---
 arch/x86/Kconfig   | 4 +---
 mm/Kconfig         | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1573257a4d6..0f749cfab8e6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_GNU_PROPERTY
 	select ARCH_USE_MEMTEST
@@ -347,9 +348,6 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 config SMP
 	def_bool y
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78224aa76409..d0258e92a8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
 	def_bool y
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git a/mm/Kconfig b/mm/Kconfig
index 5fe49c030961..721dc88423c7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -571,6 +571,9 @@ config MHP_MEMMAP_ON_MEMORY
 
 endif # MEMORY_HOTPLUG
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+       bool
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

If not supported, fall back to not using memmap on memory. This avoids
the need for callers to handle the fallback themselves.
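
A rough sketch of the resulting caller pattern (mirroring the ACPI
change below; the names are the existing hotplug API):

    mhp_t mhp_flags = MHP_NONE | MHP_MEMMAP_ON_MEMORY;	/* now only a hint */

    /*
     * No mhp_supports_memmap_on_memory() check in the caller any more:
     * the core silently falls back to a regular hot-add (no altmap)
     * when the request cannot be satisfied.
     */
    rc = __add_memory(nid, start, size, mhp_flags);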

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/acpi/acpi_memhotplug.c |  3 +--
 include/linux/memory_hotplug.h |  3 ++-
 mm/memory_hotplug.c            | 13 ++++++-------
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24f662d8bd39..d0c1a71007d0 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (!info->length)
 			continue;
 
-		if (mhp_supports_memmap_on_memory(info->length))
-			mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+		mhp_flags |= MHP_MEMMAP_ON_MEMORY;
 		result = __add_memory(mgid, info->start_addr, info->length,
 				      mhp_flags);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 013c69753c91..7d2076583494 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -97,6 +97,8 @@ typedef int __bitwise mhp_t;
  * To do so, we will use the beginning of the hot-added range to build
  * the page tables for the memmap array that describes the entire range.
  * Only selected architectures support it with SPARSE_VMEMMAP.
+ * This is only a hint, the core kernel can decide to not do this based on
+ * different alignment checks.
  */
 #define MHP_MEMMAP_ON_MEMORY   ((__force mhp_t)BIT(1))
 /*
@@ -354,7 +356,6 @@ extern struct zone *zone_for_pfn_range(int online_type, int nid,
 extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
 				      struct mhp_params *params);
 void arch_remove_linear_mapping(u64 start, u64 size);
-extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7cfd13c91568..eca32ccd45cc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,7 +1247,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-bool mhp_supports_memmap_on_memory(unsigned long size)
+static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
@@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 * Self hosted memmap array
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-		if (!mhp_supports_memmap_on_memory(size)) {
-			ret = -EINVAL;
-			goto error;
+		if (mhp_supports_memmap_on_memory(size)) {
+			mhp_altmap.free = PHYS_PFN(size);
+			mhp_altmap.base_pfn = PHYS_PFN(start);
+			params.altmap = &mhp_altmap;
 		}
-		mhp_altmap.free = PHYS_PFN(size);
-		mhp_altmap.base_pfn = PHYS_PFN(start);
-		params.altmap = &mhp_altmap;
+		/* fallback to not using altmap  */
 	}
 
 	/* call arch's memory hotadd */
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Some architectures would want different restrictions. Hence add an
architecture-specific override.

The PMD_SIZE check is moved there.
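
For illustration, an architecture override would look roughly like the
following; the hook name and the default policy shown are the ones
introduced in this patch, but treat this as a sketch rather than any
particular architecture's code:

    /* in the architecture's <asm/pgtable.h> (illustrative) */
    #define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
    static inline bool arch_supports_memmap_on_memory(unsigned long size)
    {
    	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

    	/* e.g. require the vmemmap to be mappable with a single PMD */
    	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
    }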

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory_hotplug.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index eca32ccd45cc..457824a6ecb8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,9 +1247,24 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+#ifndef arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long size)
+{
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
+
+	/*
+	 * As default, we want the vmemmap to span a complete PMD such that we
+	 * can map the vmemmap using a single PMD if supported by the
+	 * architecture.
+	 */
+	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+}
+#endif
+
 static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
-	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
 	unsigned long remaining_size = size - vmemmap_size;
 
@@ -1281,8 +1296,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 */
 	return mhp_memmap_on_memory() &&
 	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
+	       arch_supports_memmap_on_memory(size);
 }
 
 /*
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Currently, the memmap_on_memory feature is only supported with memory
block sizes that result in vmemmap pages covering full pageblocks. This
is because the memory onlining/offlining code requires applicable ranges
to be pageblock-aligned, for example, to set the migratetypes properly.

This patch lifts that restriction by reserving more pages than required
for vmemmap space, so that the start of the usable hotplugged memory
remains pageblock-aligned across different memory block sizes. Using
this facility implies the kernel will reserve some pages in every memory
block, which makes the memmap on memory feature usable with a wider
range of memory block sizes.

For example, with a 64K page size and a 256MiB memory block size, we
require 4 pages to map the vmemmap pages. To align things correctly, we
end up adding a reserve of 28 pages, i.e., for every 4096 pages, 28
pages get reserved.
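
Spelling out those numbers (assuming a 64-byte struct page; the 32-page
pageblock follows from the 4 + 28 figure above):

    256 MiB block          = 4096 pages of 64K
    vmemmap                = 4096 * 64 bytes = 256 KiB = 4 pages of 64K
    pageblock (64K pages)  = 32 pages
    altmap carve-out       = pageblock_align(4) = 32 pages
    extra reserve          = 32 - 4 = 28 pages per 4096-page memory block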

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 .../admin-guide/mm/memory-hotplug.rst         |  12 ++
 mm/memory_hotplug.c                           | 121 ++++++++++++++++--
 2 files changed, 119 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index bd77841041af..2994958c7ce8 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -433,6 +433,18 @@ The following module parameters are currently defined:
 				 memory in a way that huge pages in bigger
 				 granularity cannot be formed on hotplugged
 				 memory.
+
+				 With value "force" it could result in memory
+				 wastage due to memmap size limitations. For
+				 example, if the memmap for a memory block
+				 requires 1 MiB, but the pageblock size is 2
+				 MiB, 1 MiB of hotplugged memory will be wasted.
+				 Note that there are still cases where the
+				 feature cannot be enforced: for example, if the
+				 memmap is smaller than a single page, or if the
+				 architecture does not support the forced mode
+				 in all configurations.
+
 ``online_policy``		 read-write: Set the basic policy used for
 				 automatic zone selection when onlining memory
 				 blocks without specifying a target zone.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 457824a6ecb8..5b472e137898 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -41,17 +41,89 @@
 #include "internal.h"
 #include "shuffle.h"
 
+enum {
+	MEMMAP_ON_MEMORY_DISABLE = 0,
+	MEMMAP_ON_MEMORY_ENABLE,
+	MEMMAP_ON_MEMORY_FORCE,
+};
+
+static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
+
+static inline unsigned long memory_block_memmap_pages(void)
+{
+	unsigned long memmap_size;
+
+	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
+	return memmap_size >> PAGE_SHIFT;
+}
+
+static inline unsigned long memory_block_memmap_on_memory_pages(void)
+{
+	unsigned long nr_pages = memory_block_memmap_pages();
+
+	/*
+	 * In "forced" memmap_on_memory mode, we add extra pages to align the
+	 * vmemmap size to cover full pageblocks. That way, we can add memory
+	 * even if the vmemmap size is not properly aligned, however, we might waste
+	 * memory.
+	 */
+	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
+		return pageblock_align(nr_pages);
+	return nr_pages;
+}
+
 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
-static bool memmap_on_memory __ro_after_init;
-module_param(memmap_on_memory, bool, 0444);
-MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+static int set_memmap_mode(const char *val, const struct kernel_param *kp)
+{
+	int ret, mode;
+	bool enabled;
+
+	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
+		mode =  MEMMAP_ON_MEMORY_FORCE;
+		goto matched;
+	}
+
+	ret = kstrtobool(val, &enabled);
+	if (ret < 0)
+		return ret;
+	if (enabled)
+		mode =  MEMMAP_ON_MEMORY_ENABLE;
+	else
+		mode =  MEMMAP_ON_MEMORY_DISABLE;
+
+matched:
+	*((int *)kp->arg) =  mode;
+	if (mode == MEMMAP_ON_MEMORY_FORCE) {
+		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
+
+		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
+			memmap_pages - memory_block_memmap_pages());
+	}
+	return 0;
+}
+
+static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
+{
+	if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE)
+		return sprintf(buffer,  "force\n");
+	return param_get_bool(buffer, kp);
+}
+
+static const struct kernel_param_ops memmap_mode_ops = {
+	.set = set_memmap_mode,
+	.get = get_memmap_mode,
+};
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
+		 "With value \"force\" it could result in memory wastage due "
+		 "to memmap size limitations (Y/N/force)");
 
 static inline bool mhp_memmap_on_memory(void)
 {
-	return memmap_on_memory;
+	return memmap_mode != MEMMAP_ON_MEMORY_DISABLE;
 }
 #else
 static inline bool mhp_memmap_on_memory(void)
@@ -1266,7 +1338,7 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
-	unsigned long remaining_size = size - vmemmap_size;
+	unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
 	/*
 	 * Besides having arch support and the feature enabled at runtime, we
@@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       altmap as an alternative source of memory, and we do not exactly
 	 *       populate a single PMD.
 	 */
-	return mhp_memmap_on_memory() &&
-	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
-	       arch_supports_memmap_on_memory(size);
+	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
+		return false;
+
+	/*
+	 * Make sure the vmemmap allocation is fully contained
+	 * so that we always allocate vmemmap memory from altmap area.
+	 */
+	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
+		return false;
+
+	/*
+	 * start pfn should be pageblock_nr_pages aligned for correctly
+	 * setting migrate types
+	 */
+	if (!pageblock_aligned(memmap_pages))
+		return false;
+
+	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
+		/* No effective hotplugged memory doesn't make sense. */
+		return false;
+
+	return arch_supports_memmap_on_memory(size);
 }
 
 /*
@@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap mhp_altmap = {
+		.base_pfn =  PHYS_PFN(res->start),
+		.end_pfn  =  PHYS_PFN(res->end),
+	};
 	struct memory_group *group = NULL;
 	u64 start, size;
 	bool new_node = false;
@@ -1355,8 +1448,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
-			mhp_altmap.free = PHYS_PFN(size);
-			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = memory_block_memmap_on_memory_pages();
 			params.altmap = &mhp_altmap;
 		}
 		/* fallback to not using altmap  */
@@ -1368,8 +1460,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
-					  group);
+	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -2095,6 +2186,8 @@ static int __ref try_remove_memory(u64 start, u64 size)
 			 * right thing if we used vmem_altmap when hot-adding
 			 * the range.
 			 */
+			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = nr_vmemmap_pages;
 			mhp_altmap.alloc = nr_vmemmap_pages;
 			altmap = &mhp_altmap;
 		}
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

The radix vmemmap mapping can map things correctly at the PMD or PTE
level based on different device boundary checks. Hence we skip the
restriction that the vmemmap size be a multiple of PMD_SIZE. This also
makes the feature more widely useful, because using a PMD_SIZE vmemmap
area would require a memory block size of 2GiB.

We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY so that the feature
can work with a memory block size of 256MiB, using the altmap.reserve
feature to align things correctly at pageblock granularity. We can end
up losing some pages of memory with this. For example, with a 256MiB
memory block size, we require 4 pages to map the vmemmap pages; to align
things correctly, we end up adding a reserve of 28 pages, i.e., for
every 4096 pages, 28 pages get reserved.
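
For the 4K-page case handled in the new check below, the same arithmetic
(again assuming a 64-byte struct page) gives:

    128 MiB block  = 32768 pages of 4K
    vmemmap        = 32768 * 64 bytes = 2 MiB, which is PMD_SIZE-aligned

So 4K-page configurations pass the PMD_SIZE alignment check for memory
block sizes that are multiples of 128 MiB, while 64K-page configurations
rely on the altmap reserve described above instead.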

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/include/asm/pgtable.h            | 24 +++++++++++++++++++
 .../platforms/pseries/hotplug-memory.c        |  3 ++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d0497d13f5b4..938294c996dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@ config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index a4893b17705a..9b4a1fd24025 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -161,6 +161,30 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
 int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
 bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
 			   unsigned long page_size);
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
+ * some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fallback correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long size)
+{
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
+
+	if (!radix_enabled())
+		return false;
+	/*
+	 * With 4K page size and 2M PMD_SIZE, we can align
+	 * things better with memory block size value
+	 * starting from 128MB. Hence align things with PMD_SIZE.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+		return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+	return true;
+}
+
 #endif /* CONFIG_PPC64 */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..1447509357a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -617,6 +617,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
+	mhp_t mhp_flags = MHP_NONE | MHP_MEMMAP_ON_MEMORY;
 	unsigned long block_sz;
 	int nid, rc;
 
@@ -637,7 +638,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		nid = first_online_node;
 
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, mhp_flags);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 6/7] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

With memmap on memory, some architectures need more details w.r.t. the
altmap, such as base_pfn, end_pfn, etc., to unmap the vmemmap memory.
Instead of computing them again when we remove a memory block, embed the
vmem_altmap details in struct memory_block if we are using the memmap on
memory feature.

No functional change in this patch.
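
The structural change (see the include/linux/memory.h hunk below)
amounts to replacing the cached page count with the full altmap
description, roughly:

    struct memory_block {
    	...
    	/* was: unsigned long nr_vmemmap_pages; */
    	struct vmem_altmap *altmap;	/* non-NULL only for memmap-on-memory blocks */
    	...
    };

    /* the online/offline paths then derive the count on demand: */
    nr_vmemmap_pages = mem->altmap ? mem->altmap->free : 0;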

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c  | 32 +++++++++++++++++++++++---------
 include/linux/memory.h |  8 ++------
 mm/memory_hotplug.c    | 41 ++++++++++++++++++++++-------------------
 3 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..0210ed7b7696 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	kfree(mem->altmap);
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	if (altmap) {
+		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+		if (!mem->altmap) {
+			kfree(mem);
+			return -ENOMEM;
+		}
+		memcpy(mem->altmap, altmap, sizeof(*altmap));
+	}
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +797,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +832,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +846,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5b472e137898..96e794f39313 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1460,7 +1460,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -2066,12 +2066,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct vmem_altmap *altmap = (struct vmem_altmap *)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * If we have any pages allocated from altmap
+	 * return the altmap details and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		memcpy(altmap, mem->altmap, sizeof(struct vmem_altmap));
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2146,9 +2152,8 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
-	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
+	int ret;
+	struct vmem_altmap mhp_altmap, *altmap = NULL;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2171,24 +2176,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &mhp_altmap,
+					 get_vmemmap_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
-			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
 			altmap = &mhp_altmap;
 		}
 	}
@@ -2206,6 +2202,13 @@ static int __ref try_remove_memory(u64 start, u64 size)
 
 	arch_remove_memory(start, size, altmap);
 
+	/*
+	 * Now that we track alloc and free correctly, we can
+	 * add a check to verify that all altmap pages were freed.
+	 */
+	if (altmap)
+		WARN(altmap->alloc, "Altmap not fully unmapped");
+
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_phys_free(start, size);
 		memblock_remove(start, size);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 6/7] mm/hotplug: Embed vmem_altmap details in memory block
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

With memmap on memory, some architectures need more details w.r.t. the
altmap, such as base_pfn, end_pfn, etc., to unmap vmemmap memory. Instead
of computing them again when we remove a memory block, embed the
vmem_altmap details in struct memory_block if we are using the memmap on
memory feature.

No functional change in this patch.
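
(For reference, the vmem_altmap fields being embedded look roughly like the
sketch below, based on include/linux/memremap.h around this kernel version;
the field comments are paraphrased and are not part of the patch.)

struct vmem_altmap {
	unsigned long base_pfn;		/* first pfn of the range backing the altmap */
	const unsigned long end_pfn;	/* end pfn of the range (used for the ppc64 boundary check) */
	const unsigned long reserve;	/* pages set aside at the start, never used for the memmap */
	unsigned long free;		/* pages available for memmap allocation */
	unsigned long alloc;		/* pages already handed out by vmemmap_populate() */
	unsigned long align;		/* pages consumed purely for alignment */
};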

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c  | 32 +++++++++++++++++++++++---------
 include/linux/memory.h |  8 ++------
 mm/memory_hotplug.c    | 41 ++++++++++++++++++++++-------------------
 3 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..0210ed7b7696 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	kfree(mem->altmap);
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	if (altmap) {
+		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+		if (!mem->altmap) {
+			kfree(mem);
+			return -ENOMEM;
+		}
+		memcpy(mem->altmap, altmap, sizeof(*altmap));
+	}
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +797,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +832,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +846,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5b472e137898..96e794f39313 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1460,7 +1460,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -2066,12 +2066,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct vmem_altmap *altmap = (struct vmem_altmap *)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * If we have any pages allocated from the altmap,
+	 * return the altmap details and stop the walk.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		memcpy(altmap, mem->altmap, sizeof(struct vmem_altmap));
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2146,9 +2152,8 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
-	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
+	int ret;
+	struct vmem_altmap mhp_altmap, *altmap = NULL;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2171,24 +2176,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &mhp_altmap,
+					 get_vmemmap_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
-			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
 			altmap = &mhp_altmap;
 		}
 	}
@@ -2206,6 +2202,13 @@ static int __ref try_remove_memory(u64 start, u64 size)
 
 	arch_remove_memory(start, size, altmap);
 
+	/*
+	 * Now that we track alloc and free correctly, we can
+	 * add a check to verify that all altmap pages were freed.
+	 */
+	if (altmap)
+		WARN(altmap->alloc, "Altmap not fully unmapped");
+
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_phys_free(start, size);
 		memblock_remove(start, size);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 7/7] mm/hotplug: Enable runtime update of memmap_on_memory parameter
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory_hotplug.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 96e794f39313..6cb6eac1aee5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,7 +95,12 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 		mode =  MEMMAP_ON_MEMORY_DISABLE;
 
 matched:
+	/*
+	 * Avoid changing memmap mode during hotplug.
+	 */
+	get_online_mems();
 	*((int *)kp->arg) =  mode;
+	put_online_mems();
 	if (mode == MEMMAP_ON_MEMORY_FORCE) {
 		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
@@ -116,7 +121,7 @@ static const struct kernel_param_ops memmap_mode_ops = {
 	.set = set_memmap_mode,
 	.get = get_memmap_mode,
 };
-module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
 		 "With value \"force\" it could result in memory wastage due "
 		 "to memmap size limitations (Y/N/force)");
@@ -2175,18 +2180,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
-	if (mhp_memmap_on_memory()) {
-		ret = walk_memory_blocks(start, size, &mhp_altmap,
-					 get_vmemmap_altmap_cb);
-		if (ret) {
-			if (size != memory_block_size_bytes()) {
-				pr_warn("Refuse to remove %#llx - %#llx,"
-					"wrong granularity\n",
-					start, start + size);
-				return -EINVAL;
-			}
-			altmap = &mhp_altmap;
+	ret = walk_memory_blocks(start, size, &mhp_altmap,
+				 get_vmemmap_altmap_cb);
+	if (ret) {
+		if (size != memory_block_size_bytes()) {
+			pr_warn("Refuse to remove %#llx - %#llx,"
+				"wrong granularity\n",
+				start, start + size);
+			return -EINVAL;
 		}
+		altmap = &mhp_altmap;
 	}
 
 	/* remove memmap entry */
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 7/7] mm/hotplug: Enable runtime update of memmap_on_memory parameter
@ 2023-07-25 10:02   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-25 10:02 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory_hotplug.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 96e794f39313..6cb6eac1aee5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,7 +95,12 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 		mode =  MEMMAP_ON_MEMORY_DISABLE;
 
 matched:
+	/*
+	 * Avoid changing memmap mode during hotplug.
+	 */
+	get_online_mems();
 	*((int *)kp->arg) =  mode;
+	put_online_mems();
 	if (mode == MEMMAP_ON_MEMORY_FORCE) {
 		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
@@ -116,7 +121,7 @@ static const struct kernel_param_ops memmap_mode_ops = {
 	.set = set_memmap_mode,
 	.get = get_memmap_mode,
 };
-module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
 		 "With value \"force\" it could result in memory wastage due "
 		 "to memmap size limitations (Y/N/force)");
@@ -2175,18 +2180,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
-	if (mhp_memmap_on_memory()) {
-		ret = walk_memory_blocks(start, size, &mhp_altmap,
-					 get_vmemmap_altmap_cb);
-		if (ret) {
-			if (size != memory_block_size_bytes()) {
-				pr_warn("Refuse to remove %#llx - %#llx,"
-					"wrong granularity\n",
-					start, start + size);
-				return -EINVAL;
-			}
-			altmap = &mhp_altmap;
+	ret = walk_memory_blocks(start, size, &mhp_altmap,
+				 get_vmemmap_altmap_cb);
+	if (ret) {
+		if (size != memory_block_size_bytes()) {
+			pr_warn("Refuse to remove %#llx - %#llx,"
+				"wrong granularity\n",
+				start, start + size);
+			return -EINVAL;
 		}
+		altmap = &mhp_altmap;
 	}
 
 	/* remove memmap entry */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 0/7] Add support for memmap on memory feature on ppc64
  2023-07-25 10:02 ` Aneesh Kumar K.V
@ 2023-07-25 10:06   ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 10:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> This patch series update memmap on memory feature to fall back to
> memmap allocation outside the memory block if the alignment rules are
> not met. This makes the feature more useful on architectures like
> ppc64 where alignment rules are different with 64K page size.
> 
> This patch series is dependent on dax vmemmap optimization series
> posted here
> https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/
> 
> Changes from v4:
> * Use altmap.free instead of altmap.reserve
> * Address review feedback
> 
> Changes from v3:
> * Extend the module parameter memmap_on_memory to force allocation even
>    though we can waste hotplug memory.
> 
> Changes from v2:
> * Rebase to latest linus tree
> * Redo the series based on review feedback. Multiple changes to the patchset.
> 
> Changes from v1:
> * update the memblock to store vmemmap_altmap details. This is required
> so that when we remove the memory we can find the altmap details which
> is needed on some architectures.
> * rebase to latest linus tree
> 
> 
> 
> Aneesh Kumar K.V (7):
>    mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
>    mm/hotplug: Allow memmap on memory hotplug request to fallback
>    mm/hotplug: Allow architecture to override memmap on memory support
>      check
>    mm/hotplug: Support memmap_on_memory when memmap is not aligned to
>      pageblocks
>    powerpc/book3s64/memhotplug: Enable memmap on memory for radix
>    mm/hotplug: Embed vmem_altmap details in memory block
>    mm/hotplug: Enable runtime update of memmap_on_memory parameter

Nit for all patches: we use "mm/memory_hotplug" as prefix. (I somehow 
missed that earlier :D )

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 0/7] Add support for memmap on memory feature on ppc64
@ 2023-07-25 10:06   ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 10:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> This patch series update memmap on memory feature to fall back to
> memmap allocation outside the memory block if the alignment rules are
> not met. This makes the feature more useful on architectures like
> ppc64 where alignment rules are different with 64K page size.
> 
> This patch series is dependent on dax vmemmap optimization series
> posted here
> https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/
> 
> Changes from v4:
> * Use altmap.free instead of altmap.reserve
> * Address review feedback
> 
> Changes from v3:
> * Extend the module parameter memmap_on_memory to force allocation even
>    though we can waste hotplug memory.
> 
> Changes from v2:
> * Rebase to latest linus tree
> * Redo the series based on review feedback. Multiple changes to the patchset.
> 
> Changes from v1:
> * update the memblock to store vmemmap_altmap details. This is required
> so that when we remove the memory we can find the altmap details which
> is needed on some architectures.
> * rebase to latest linus tree
> 
> 
> 
> Aneesh Kumar K.V (7):
>    mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
>    mm/hotplug: Allow memmap on memory hotplug request to fallback
>    mm/hotplug: Allow architecture to override memmap on memory support
>      check
>    mm/hotplug: Support memmap_on_memory when memmap is not aligned to
>      pageblocks
>    powerpc/book3s64/memhotplug: Enable memmap on memory for radix
>    mm/hotplug: Embed vmem_altmap details in memory block
>    mm/hotplug: Enable runtime update of memmap_on_memory parameter

Nit for all patches: we use "mm/memory_hotplug" as prefix. (I somehow 
missed that earlier :D )

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-25 10:02   ` Aneesh Kumar K.V
@ 2023-07-25 10:09     ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 10:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

>   #endif /* CONFIG_PPC64 */
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 9c62c2c3b3d0..1447509357a7 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -617,6 +617,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
>   
>   static int dlpar_add_lmb(struct drmem_lmb *lmb)
>   {
> +	mhp_t mhp_flags = MHP_NONE | MHP_MEMMAP_ON_MEMORY;

MHP_NONE is a placeholder, so no need for that and ...

>   	unsigned long block_sz;
>   	int nid, rc;
>   
> @@ -637,7 +638,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>   		nid = first_online_node;
>   
>   	/* Add the memory */
> -	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
> +	rc = __add_memory(nid, lmb->base_addr, block_sz, mhp_flags);
... this becomes

	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);

With that

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
@ 2023-07-25 10:09     ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 10:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

>   #endif /* CONFIG_PPC64 */
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 9c62c2c3b3d0..1447509357a7 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -617,6 +617,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
>   
>   static int dlpar_add_lmb(struct drmem_lmb *lmb)
>   {
> +	mhp_t mhp_flags = MHP_NONE | MHP_MEMMAP_ON_MEMORY;

MHP_NONE is a placeholder, so no need for that and ...

>   	unsigned long block_sz;
>   	int nid, rc;
>   
> @@ -637,7 +638,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>   		nid = first_online_node;
>   
>   	/* Add the memory */
> -	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
> +	rc = __add_memory(nid, lmb->base_addr, block_sz, mhp_flags);
... this becomes

	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);

With that

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/7] mm/hotplug: Enable runtime update of memmap_on_memory parameter
  2023-07-25 10:02   ` Aneesh Kumar K.V
@ 2023-07-25 17:52     ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 17:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/memory_hotplug.c | 27 +++++++++++++++------------
>   1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 96e794f39313..6cb6eac1aee5 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -95,7 +95,12 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
>   		mode =  MEMMAP_ON_MEMORY_DISABLE;
>   
>   matched:
> +	/*
> +	 * Avoid changing memmap mode during hotplug.
> +	 */
> +	get_online_mems();
>   	*((int *)kp->arg) =  mode;
> +	put_online_mems();
>   	if (mode == MEMMAP_ON_MEMORY_FORCE) {
>   		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
>   
> @@ -116,7 +121,7 @@ static const struct kernel_param_ops memmap_mode_ops = {
>   	.set = set_memmap_mode,
>   	.get = get_memmap_mode,
>   };
> -module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
> +module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
>   MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
>   		 "With value \"force\" it could result in memory wastage due "
>   		 "to memmap size limitations (Y/N/force)");
> @@ -2175,18 +2180,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
>   	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
>   	 * the same granularity it was added - a single memory block.
>   	 */
> -	if (mhp_memmap_on_memory()) {
> -		ret = walk_memory_blocks(start, size, &mhp_altmap,
> -					 get_vmemmap_altmap_cb);
> -		if (ret) {
> -			if (size != memory_block_size_bytes()) {
> -				pr_warn("Refuse to remove %#llx - %#llx,"
> -					"wrong granularity\n",
> -					start, start + size);
> -				return -EINVAL;
> -			}
> -			altmap = &mhp_altmap;
> +	ret = walk_memory_blocks(start, size, &mhp_altmap,
> +				 get_vmemmap_altmap_cb);
> +	if (ret) {
> +		if (size != memory_block_size_bytes()) {
> +			pr_warn("Refuse to remove %#llx - %#llx,"
> +				"wrong granularity\n",
> +				start, start + size);
> +			return -EINVAL;
>   		}
> +		altmap = &mhp_altmap;
>   	}
>   
>   	/* remove memmap entry */


Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/7] mm/hotplug: Enable runtime update of memmap_on_memory parameter
@ 2023-07-25 17:52     ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 17:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/memory_hotplug.c | 27 +++++++++++++++------------
>   1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 96e794f39313..6cb6eac1aee5 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -95,7 +95,12 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
>   		mode =  MEMMAP_ON_MEMORY_DISABLE;
>   
>   matched:
> +	/*
> +	 * Avoid changing memmap mode during hotplug.
> +	 */
> +	get_online_mems();
>   	*((int *)kp->arg) =  mode;
> +	put_online_mems();
>   	if (mode == MEMMAP_ON_MEMORY_FORCE) {
>   		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
>   
> @@ -116,7 +121,7 @@ static const struct kernel_param_ops memmap_mode_ops = {
>   	.set = set_memmap_mode,
>   	.get = get_memmap_mode,
>   };
> -module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
> +module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
>   MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
>   		 "With value \"force\" it could result in memory wastage due "
>   		 "to memmap size limitations (Y/N/force)");
> @@ -2175,18 +2180,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
>   	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
>   	 * the same granularity it was added - a single memory block.
>   	 */
> -	if (mhp_memmap_on_memory()) {
> -		ret = walk_memory_blocks(start, size, &mhp_altmap,
> -					 get_vmemmap_altmap_cb);
> -		if (ret) {
> -			if (size != memory_block_size_bytes()) {
> -				pr_warn("Refuse to remove %#llx - %#llx,"
> -					"wrong granularity\n",
> -					start, start + size);
> -				return -EINVAL;
> -			}
> -			altmap = &mhp_altmap;
> +	ret = walk_memory_blocks(start, size, &mhp_altmap,
> +				 get_vmemmap_altmap_cb);
> +	if (ret) {
> +		if (size != memory_block_size_bytes()) {
> +			pr_warn("Refuse to remove %#llx - %#llx,"
> +				"wrong granularity\n",
> +				start, start + size);
> +			return -EINVAL;
>   		}
> +		altmap = &mhp_altmap;
>   	}
>   
>   	/* remove memmap entry */


Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-25 10:02   ` Aneesh Kumar K.V
@ 2023-07-25 18:06     ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 18:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> Currently, memmap_on_memory feature is only supported with memory block
> sizes that result in vmemmap pages covering full page blocks. This is
> because memory onlining/offlining code requires applicable ranges to be
> pageblock-aligned, for example, to set the migratetypes properly.
> 
> This patch helps to lift that restriction by reserving more pages than
> required for vmemmap space. This helps the start address to be page
> block aligned with different memory block sizes. Using this facility
> implies the kernel will be reserving some pages for every memoryblock.
> This allows the memmap on memory feature to be widely useful with
> different memory block size values.
> 
> For ex: with 64K page size and 256MiB memory block size, we require 4
> pages to map vmemmap pages, To align things correctly we end up adding a
> reserve of 28 pages. ie, for every 4096 pages 28 pages get reserved.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   .../admin-guide/mm/memory-hotplug.rst         |  12 ++
>   mm/memory_hotplug.c                           | 121 ++++++++++++++++--
>   2 files changed, 119 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index bd77841041af..2994958c7ce8 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -433,6 +433,18 @@ The following module parameters are currently defined:
>   				 memory in a way that huge pages in bigger
>   				 granularity cannot be formed on hotplugged
>   				 memory.
> +
> +				 With value "force" it could result in memory
> +				 wastage due to memmap size limitations. For
> +				 example, if the memmap for a memory block
> +				 requires 1 MiB, but the pageblock size is 2
> +				 MiB, 1 MiB of hotplugged memory will be wasted.
> +				 Note that there are still cases where the
> +				 feature cannot be enforced: for example, if the
> +				 memmap is smaller than a single page, or if the
> +				 architecture does not support the forced mode
> +				 in all configurations.
> +
>   ``online_policy``		 read-write: Set the basic policy used for
>   				 automatic zone selection when onlining memory
>   				 blocks without specifying a target zone.
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 457824a6ecb8..5b472e137898 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -41,17 +41,89 @@
>   #include "internal.h"
>   #include "shuffle.h"
>   
> +enum {
> +	MEMMAP_ON_MEMORY_DISABLE = 0,
> +	MEMMAP_ON_MEMORY_ENABLE,
> +	MEMMAP_ON_MEMORY_FORCE,
> +};
> +
> +static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
> +
> +static inline unsigned long memory_block_memmap_pages(void)
> +{
> +	unsigned long memmap_size;
> +
> +	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
> +	return memmap_size >> PAGE_SHIFT;

I'd really move a !page variant (memory_block_memmap_size()) to the 
previous patch and use it in mhp_supports_memmap_on_memory() and 
arch_supports_memmap_on_memory().

Then, in this patch, reuse that function in 
memory_block_memmap_on_memory_pages() and ...
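
A sketch of the suggested helper (the follow-up diff later in the thread
adds it in exactly this form):

	static inline unsigned long memory_block_memmap_size(void)
	{
		return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
	}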

> +}
> +
> +static inline unsigned long memory_block_memmap_on_memory_pages(void)
> +{
> +	unsigned long nr_pages = memory_block_memmap_pages();

... do here a

nr_pages = PHYS_PFN(memory_block_memmap_size());


Conceptually, it would be even cleaner to have here

nr_pages = PFN_UP(memory_block_memmap_size());

even though one can argue that mhp_supports_memmap_on_memory() will make 
sure that the unaligned value (memory_block_memmap_size()) covers full 
pages, but at least to me it looks cleaner that way. No strong opinion.
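
To illustrate the distinction with a hypothetical unaligned size (PHYS_PFN()
truncates, PFN_UP() rounds up, so only the latter always covers the whole
memmap):

	/* hypothetical: memory_block_memmap_size() == 3 * PAGE_SIZE + 512 */
	nr_pages = PHYS_PFN(memory_block_memmap_size());	/* -> 3, may under-cover */
	nr_pages = PFN_UP(memory_block_memmap_size());		/* -> 4, covers the full memmap */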


> +
> +	/*
> +	 * In "forced" memmap_on_memory mode, we add extra pages to align the
> +	 * vmemmap size to cover full pageblocks. That way, we can add memory
> +	 * even if the vmemmap size is not properly aligned, however, we might waste
> +	 * memory.
> +	 */
> +	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
> +		return pageblock_align(nr_pages);
> +	return nr_pages;
> +}
> +
>   #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>   /*
>    * memory_hotplug.memmap_on_memory parameter
>    */
> -static bool memmap_on_memory __ro_after_init;
> -module_param(memmap_on_memory, bool, 0444);
> -MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
> +static int set_memmap_mode(const char *val, const struct kernel_param *kp)
> +{
> +	int ret, mode;
> +	bool enabled;
> +
> +	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
> +		mode =  MEMMAP_ON_MEMORY_FORCE;
> +		goto matched;
> +	}

Avoid the goto + label

} else {
	ret = kstrtobool(val, &enabled);
	...
}

*((int *)kp->arg) =  mode;
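
Spelled out, the goto-free variant would look roughly like the below; the
follow-up diff later in the thread restructures set_memmap_mode() the same
way:

	if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
		mode = MEMMAP_ON_MEMORY_FORCE;
	} else {
		ret = kstrtobool(val, &enabled);
		if (ret < 0)
			return ret;
		if (enabled)
			mode = MEMMAP_ON_MEMORY_ENABLE;
		else
			mode = MEMMAP_ON_MEMORY_DISABLE;
	}
	*((int *)kp->arg) = mode;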

> +
> +	ret = kstrtobool(val, &enabled);
> +	if (ret < 0)
> +		return ret;
> +	if (enabled)
> +		mode =  MEMMAP_ON_MEMORY_ENABLE;
> +	else
> +		mode =  MEMMAP_ON_MEMORY_DISABLE;
> +
> +matched:
> +	*((int *)kp->arg) =  mode;
> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
> +
> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
> +			memmap_pages - memory_block_memmap_pages());

pr_info_once() ?

> +	}
> +	return 0;
> +}
> +

[...]

>   	/*
>   	 * Besides having arch support and the feature enabled at runtime, we
> @@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 *       altmap as an alternative source of memory, and we do not exactly
>   	 *       populate a single PMD.
>   	 */
> -	return mhp_memmap_on_memory() &&
> -	       size == memory_block_size_bytes() &&
> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
> -	       arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +
> +	/*
> +	 * Make sure the vmemmap allocation is fully contained
> +	 * so that we always allocate vmemmap memory from altmap area.
> +	 */
> +	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
> +		return false;
> +
> +	/*
> +	 * start pfn should be pageblock_nr_pages aligned for correctly
> +	 * setting migrate types
> +	 */
> +	if (!pageblock_aligned(memmap_pages))
> +		return false;
> +
> +	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
> +		/* No effective hotplugged memory doesn't make sense. */
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
>   }
>   
>   /*
> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>   {
>   	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>   	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> -	struct vmem_altmap mhp_altmap = {};
> +	struct vmem_altmap mhp_altmap = {
> +		.base_pfn =  PHYS_PFN(res->start),
> +		.end_pfn  =  PHYS_PFN(res->end),

Is it required to set .end_pfn, and if so, shouldn't we also set it to 
base_pfn + memory_block_memmap_on_memory_pages()?

We also don't set it on the try_remove_memory() part.



With these things addressed, feel free to add

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
@ 2023-07-25 18:06     ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-25 18:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> Currently, memmap_on_memory feature is only supported with memory block
> sizes that result in vmemmap pages covering full page blocks. This is
> because memory onlining/offlining code requires applicable ranges to be
> pageblock-aligned, for example, to set the migratetypes properly.
> 
> This patch helps to lift that restriction by reserving more pages than
> required for vmemmap space. This helps the start address to be page
> block aligned with different memory block sizes. Using this facility
> implies the kernel will be reserving some pages for every memoryblock.
> This allows the memmap on memory feature to be widely useful with
> different memory block size values.
> 
> For ex: with 64K page size and 256MiB memory block size, we require 4
> pages to map vmemmap pages, To align things correctly we end up adding a
> reserve of 28 pages. ie, for every 4096 pages 28 pages get reserved.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   .../admin-guide/mm/memory-hotplug.rst         |  12 ++
>   mm/memory_hotplug.c                           | 121 ++++++++++++++++--
>   2 files changed, 119 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index bd77841041af..2994958c7ce8 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -433,6 +433,18 @@ The following module parameters are currently defined:
>   				 memory in a way that huge pages in bigger
>   				 granularity cannot be formed on hotplugged
>   				 memory.
> +
> +				 With value "force" it could result in memory
> +				 wastage due to memmap size limitations. For
> +				 example, if the memmap for a memory block
> +				 requires 1 MiB, but the pageblock size is 2
> +				 MiB, 1 MiB of hotplugged memory will be wasted.
> +				 Note that there are still cases where the
> +				 feature cannot be enforced: for example, if the
> +				 memmap is smaller than a single page, or if the
> +				 architecture does not support the forced mode
> +				 in all configurations.
> +
>   ``online_policy``		 read-write: Set the basic policy used for
>   				 automatic zone selection when onlining memory
>   				 blocks without specifying a target zone.
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 457824a6ecb8..5b472e137898 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -41,17 +41,89 @@
>   #include "internal.h"
>   #include "shuffle.h"
>   
> +enum {
> +	MEMMAP_ON_MEMORY_DISABLE = 0,
> +	MEMMAP_ON_MEMORY_ENABLE,
> +	MEMMAP_ON_MEMORY_FORCE,
> +};
> +
> +static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
> +
> +static inline unsigned long memory_block_memmap_pages(void)
> +{
> +	unsigned long memmap_size;
> +
> +	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
> +	return memmap_size >> PAGE_SHIFT;

I'd really move a !page variant (memory_block_memmap_size()) to the 
previous patch and use it in mhp_supports_memmap_on_memory() and 
arch_supports_memmap_on_memory().

Then, in this patch, reuse that function in 
memory_block_memmap_on_memory_pages() and ...

> +}
> +
> +static inline unsigned long memory_block_memmap_on_memory_pages(void)
> +{
> +	unsigned long nr_pages = memory_block_memmap_pages();

... do here a

nr_pages = PHYS_PFN(memory_block_memmap_size());


Conceptually, it would be even cleaner to have here

nr_pages = PFN_UP(memory_block_memmap_size());

even though one can argue that mhp_supports_memmap_on_memory() will make 
sure that the unaligned value (memory_block_memmap_size()) covers full 
pages, but at least to me it looks cleaner that way. No strong opinion.


> +
> +	/*
> +	 * In "forced" memmap_on_memory mode, we add extra pages to align the
> +	 * vmemmap size to cover full pageblocks. That way, we can add memory
> +	 * even if the vmemmap size is not properly aligned, however, we might waste
> +	 * memory.
> +	 */
> +	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
> +		return pageblock_align(nr_pages);
> +	return nr_pages;
> +}
> +
>   #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>   /*
>    * memory_hotplug.memmap_on_memory parameter
>    */
> -static bool memmap_on_memory __ro_after_init;
> -module_param(memmap_on_memory, bool, 0444);
> -MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
> +static int set_memmap_mode(const char *val, const struct kernel_param *kp)
> +{
> +	int ret, mode;
> +	bool enabled;
> +
> +	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
> +		mode =  MEMMAP_ON_MEMORY_FORCE;
> +		goto matched;
> +	}

Avoid the goto + label

} else {
	ret = kstrtobool(val, &enabled);
	...
}

*((int *)kp->arg) =  mode;

> +
> +	ret = kstrtobool(val, &enabled);
> +	if (ret < 0)
> +		return ret;
> +	if (enabled)
> +		mode =  MEMMAP_ON_MEMORY_ENABLE;
> +	else
> +		mode =  MEMMAP_ON_MEMORY_DISABLE;
> +
> +matched:
> +	*((int *)kp->arg) =  mode;
> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
> +
> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
> +			memmap_pages - memory_block_memmap_pages());

pr_info_once() ?

> +	}
> +	return 0;
> +}
> +

[...]

>   	/*
>   	 * Besides having arch support and the feature enabled at runtime, we
> @@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 *       altmap as an alternative source of memory, and we do not exactly
>   	 *       populate a single PMD.
>   	 */
> -	return mhp_memmap_on_memory() &&
> -	       size == memory_block_size_bytes() &&
> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
> -	       arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +
> +	/*
> +	 * Make sure the vmemmap allocation is fully contained
> +	 * so that we always allocate vmemmap memory from altmap area.
> +	 */
> +	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
> +		return false;
> +
> +	/*
> +	 * start pfn should be pageblock_nr_pages aligned for correctly
> +	 * setting migrate types
> +	 */
> +	if (!pageblock_aligned(memmap_pages))
> +		return false;
> +
> +	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
> +		/* No effective hotplugged memory doesn't make sense. */
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
>   }
>   
>   /*
> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>   {
>   	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>   	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> -	struct vmem_altmap mhp_altmap = {};
> +	struct vmem_altmap mhp_altmap = {
> +		.base_pfn =  PHYS_PFN(res->start),
> +		.end_pfn  =  PHYS_PFN(res->end),

Is it required to set .end_pfn, and if so, shouldn't we also set it to 
base_pfn + memory_block_memmap_on_memory_pages()?

We also don't set it on the try_remove_memory() part.



With these things addressed, feel free to add

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-25 18:06     ` David Hildenbrand
@ 2023-07-26  4:25       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-26  4:25 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

David Hildenbrand <david@redhat.com> writes:

> On 25.07.23 12:02, Aneesh Kumar K.V wrote:
>> Currently, memmap_on_memory feature is only supported with memory block
>> sizes that result in vmemmap pages covering full page blocks. This is
>> because memory onlining/offlining code requires applicable ranges to be
>> pageblock-aligned, for example, to set the migratetypes properly.
>> 
>> This patch helps to lift that restriction by reserving more pages than
>> required for vmemmap space. This helps the start address to be page
>> block aligned with different memory block sizes. Using this facility
>> implies the kernel will be reserving some pages for every memoryblock.
>> This allows the memmap on memory feature to be widely useful with
>> different memory block size values.
>> 
>> For ex: with 64K page size and 256MiB memory block size, we require 4
>> pages to map vmemmap pages, To align things correctly we end up adding a
>> reserve of 28 pages. ie, for every 4096 pages 28 pages get reserved.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   .../admin-guide/mm/memory-hotplug.rst         |  12 ++
>>   mm/memory_hotplug.c                           | 121 ++++++++++++++++--
>>   2 files changed, 119 insertions(+), 14 deletions(-)
>> 
>> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
>> index bd77841041af..2994958c7ce8 100644
>> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
>> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
>> @@ -433,6 +433,18 @@ The following module parameters are currently defined:
>>   				 memory in a way that huge pages in bigger
>>   				 granularity cannot be formed on hotplugged
>>   				 memory.
>> +
>> +				 With value "force" it could result in memory
>> +				 wastage due to memmap size limitations. For
>> +				 example, if the memmap for a memory block
>> +				 requires 1 MiB, but the pageblock size is 2
>> +				 MiB, 1 MiB of hotplugged memory will be wasted.
>> +				 Note that there are still cases where the
>> +				 feature cannot be enforced: for example, if the
>> +				 memmap is smaller than a single page, or if the
>> +				 architecture does not support the forced mode
>> +				 in all configurations.
>> +
>>   ``online_policy``		 read-write: Set the basic policy used for
>>   				 automatic zone selection when onlining memory
>>   				 blocks without specifying a target zone.
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 457824a6ecb8..5b472e137898 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -41,17 +41,89 @@
>>   #include "internal.h"
>>   #include "shuffle.h"
>>   
>> +enum {
>> +	MEMMAP_ON_MEMORY_DISABLE = 0,
>> +	MEMMAP_ON_MEMORY_ENABLE,
>> +	MEMMAP_ON_MEMORY_FORCE,
>> +};
>> +
>> +static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
>> +
>> +static inline unsigned long memory_block_memmap_pages(void)
>> +{
>> +	unsigned long memmap_size;
>> +
>> +	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
>> +	return memmap_size >> PAGE_SHIFT;
>
> I'd really move a !page variant (memory_block_memmap_size()) to the 
> previous patch and use it in mhp_supports_memmap_on_memory() and 
> arch_supports_memmap_on_memory().
>
> Then, in this patch, reuse that function in 
> memory_block_memmap_on_memory_pages() and ...
>
>> +}
>> +
>> +static inline unsigned long memory_block_memmap_on_memory_pages(void)
>> +{
>> +	unsigned long nr_pages = memory_block_memmap_pages();
>
> ... do here a
>
> nr_pages = PHYS_PFN(memory_block_memmap_size());
>
>
> Conceptually, it would be even cleaner to have here
>
> nr_pages = PFN_UP(memory_block_memmap_size());
>
> even though one can argue that mhp_supports_memmap_on_memory() will make 
> sure that the unaligned value (memory_block_memmap_size()) covers full 
> pages, but at least to me it looks cleaner that way. No strong opinion.
>
>
>> +
>> +	/*
>> +	 * In "forced" memmap_on_memory mode, we add extra pages to align the
>> +	 * vmemmap size to cover full pageblocks. That way, we can add memory
>> +	 * even if the vmemmap size is not properly aligned, however, we might waste
>> +	 * memory.
>> +	 */
>> +	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
>> +		return pageblock_align(nr_pages);
>> +	return nr_pages;
>> +}
>> +
>>   #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>>   /*
>>    * memory_hotplug.memmap_on_memory parameter
>>    */
>> -static bool memmap_on_memory __ro_after_init;
>> -module_param(memmap_on_memory, bool, 0444);
>> -MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
>> +static int set_memmap_mode(const char *val, const struct kernel_param *kp)
>> +{
>> +	int ret, mode;
>> +	bool enabled;
>> +
>> +	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
>> +		mode =  MEMMAP_ON_MEMORY_FORCE;
>> +		goto matched;
>> +	}
>
> Avoid the goto + label
>
> } else {
> 	ret = kstrtobool(val, &enabled);
> 	...
> }
>
> *((int *)kp->arg) =  mode;
>
>> +
>> +	ret = kstrtobool(val, &enabled);
>> +	if (ret < 0)
>> +		return ret;
>> +	if (enabled)
>> +		mode =  MEMMAP_ON_MEMORY_ENABLE;
>> +	else
>> +		mode =  MEMMAP_ON_MEMORY_DISABLE;
>> +
>> +matched:
>> +	*((int *)kp->arg) =  mode;
>> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
>> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
>> +
>> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
>> +			memmap_pages - memory_block_memmap_pages());
>
> pr_info_once() ?
>
>> +	}
>> +	return 0;
>> +}
>> +
>
> [...]
>
>>   	/*
>>   	 * Besides having arch support and the feature enabled at runtime, we
>> @@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>   	 *       altmap as an alternative source of memory, and we do not exactly
>>   	 *       populate a single PMD.
>>   	 */
>> -	return mhp_memmap_on_memory() &&
>> -	       size == memory_block_size_bytes() &&
>> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
>> -	       arch_supports_memmap_on_memory(size);
>> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>> +		return false;
>> +
>> +	/*
>> +	 * Make sure the vmemmap allocation is fully contained
>> +	 * so that we always allocate vmemmap memory from altmap area.
>> +	 */
>> +	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
>> +		return false;
>> +
>> +	/*
>> +	 * start pfn should be pageblock_nr_pages aligned for correctly
>> +	 * setting migrate types
>> +	 */
>> +	if (!pageblock_aligned(memmap_pages))
>> +		return false;
>> +
>> +	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
>> +		/* No effective hotplugged memory doesn't make sense. */
>> +		return false;
>> +
>> +	return arch_supports_memmap_on_memory(size);
>>   }
>>   
>>   /*
>> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>   {
>>   	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>   	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>> -	struct vmem_altmap mhp_altmap = {};
>> +	struct vmem_altmap mhp_altmap = {
>> +		.base_pfn =  PHYS_PFN(res->start),
>> +		.end_pfn  =  PHYS_PFN(res->end),
>
> Is it required to set .end_pfn, and if so, shouldn't we also set it to 
> base_pfn + memory_block_memmap_on_memory_pages()?
>

We use that on ppc64 for checking the altmap boundary condition. As we
discussed earlier, due to its vmemmap mapping size restrictions, ppc64 can't
always allocate vmemmap pages from the altmap area even if requested. We
fall back to regular memory allocation in that case (currently only hit with
pmem). We use altmap.end_pfn for that boundary check. You can refer to
altmap_cross_boundary() for more details.
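
Roughly, the allocation-path fallback looks like the sketch below; this is a
simplified rendition of the ppc64 vmemmap_populate() logic, not a verbatim
excerpt:

	p = NULL;
	if (altmap && !altmap_cross_boundary(altmap, start, page_size))
		/* try to back the memmap from the altmap range first */
		p = vmemmap_alloc_block_buf(page_size, node, altmap);
	if (!p)
		/* altmap exhausted or range crosses the boundary: use regular memory */
		p = vmemmap_alloc_block_buf(page_size, node, NULL);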

>
> We also don't set it on the try_remove_memory() part.
>

We use altmap.end_pfn only in the allocation path. This confusion is
also another motivation to embed the altmap in the memory block structure,
as done in patch 6.

>
>
> With these things addressed, feel free to add
>
> Acked-by: David Hildenbrand <david@redhat.com>
>

Thanks for all your help with the review. Attaching the diff against the
previous version, addressing the feedback above.

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 7e46369d57c9..2209d66a034e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -176,11 +176,8 @@ bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
  * alignment requirement is met using altmap->reserve blocks.
  */
 #define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
-static inline bool arch_supports_memmap_on_memory(unsigned long size)
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 {
-	unsigned long nr_pages = size >> PAGE_SHIFT;
-	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
-
 	if (!radix_enabled())
 		return false;
 	/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1c9069cc0263..cadf4209d7c4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -49,17 +49,14 @@ enum {
 
 static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
 
-static inline unsigned long memory_block_memmap_pages(void)
+static inline unsigned long memory_block_memmap_size(void)
 {
-	unsigned long memmap_size;
-
-	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
-	return memmap_size >> PAGE_SHIFT;
+	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
 }
 
 static inline unsigned long memory_block_memmap_on_memory_pages(void)
 {
-	unsigned long nr_pages = memory_block_memmap_pages();
+	unsigned long nr_pages = PFN_UP(memory_block_memmap_size());
 
 	/*
 	 * In "forced" memmap_on_memory mode, we add extra pages to align the
@@ -83,18 +80,15 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 
 	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
 		mode =  MEMMAP_ON_MEMORY_FORCE;
-		goto matched;
+	} else {
+		ret = kstrtobool(val, &enabled);
+		if (ret < 0)
+			return ret;
+		if (enabled)
+			mode = MEMMAP_ON_MEMORY_ENABLE;
+		else
+			mode = MEMMAP_ON_MEMORY_DISABLE;
 	}
-
-	ret = kstrtobool(val, &enabled);
-	if (ret < 0)
-		return ret;
-	if (enabled)
-		mode =  MEMMAP_ON_MEMORY_ENABLE;
-	else
-		mode =  MEMMAP_ON_MEMORY_DISABLE;
-
-matched:
 	/*
 	 * Avoid changing memmap mode during hotplug.
 	 */
@@ -104,8 +98,8 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 	if (mode == MEMMAP_ON_MEMORY_FORCE) {
 		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
-		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
-			memmap_pages - memory_block_memmap_pages());
+		pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
+			     memmap_pages - PFN_UP(memory_block_memmap_size()));
 	}
 	return 0;
 }
@@ -1327,11 +1321,8 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 }
 
 #ifndef arch_supports_memmap_on_memory
-static inline bool arch_supports_memmap_on_memory(unsigned long size)
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 {
-	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
-	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
-
 	/*
 	 * As default, we want the vmemmap to span a complete PMD such that we
 	 * can map the vmemmap using a single PMD if supported by the
@@ -1343,8 +1334,7 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
 
 static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
-	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
-	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
+	unsigned long vmemmap_size = memory_block_memmap_size();
 	unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
 	/*
@@ -1394,7 +1384,7 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 		/* No effective hotplugged memory doesn't make sense. */
 		return false;
 
-	return arch_supports_memmap_on_memory(size);
+	return arch_supports_memmap_on_memory(vmemmap_size);
 }
 
 /*



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
@ 2023-07-26  4:25       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-26  4:25 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

David Hildenbrand <david@redhat.com> writes:

> On 25.07.23 12:02, Aneesh Kumar K.V wrote:
>> Currently, memmap_on_memory feature is only supported with memory block
>> sizes that result in vmemmap pages covering full page blocks. This is
>> because memory onlining/offlining code requires applicable ranges to be
>> pageblock-aligned, for example, to set the migratetypes properly.
>> 
>> This patch helps to lift that restriction by reserving more pages than
>> required for vmemmap space. This helps the start address to be page
>> block aligned with different memory block sizes. Using this facility
>> implies the kernel will be reserving some pages for every memoryblock.
>> This allows the memmap on memory feature to be widely useful with
>> different memory block size values.
>> 
>> For example: with a 64K page size and 256MiB memory block size, we require 4
>> pages to map vmemmap pages. To align things correctly we end up adding a
>> reserve of 28 pages, i.e., for every 4096 pages, 28 pages get reserved.
>> 
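
A rough sketch of the arithmetic behind the 4-page/28-page example above,
assuming a 64-byte struct page and a 32-page (2 MiB) pageblock at 64K page
size (both figures are assumptions for illustration, not taken from the
patch):

/*
 * pages per 256 MiB memory block = 256 MiB / 64 KiB = 4096
 * memmap size                    = 4096 * 64 bytes  = 256 KiB
 * vmemmap pages                  = 256 KiB / 64 KiB = 4
 * pageblock_align(4)             = 32 pages
 * extra reserve                  = 32 - 4           = 28 pages
 */
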
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   .../admin-guide/mm/memory-hotplug.rst         |  12 ++
>>   mm/memory_hotplug.c                           | 121 ++++++++++++++++--
>>   2 files changed, 119 insertions(+), 14 deletions(-)
>> 
>> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
>> index bd77841041af..2994958c7ce8 100644
>> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
>> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
>> @@ -433,6 +433,18 @@ The following module parameters are currently defined:
>>   				 memory in a way that huge pages in bigger
>>   				 granularity cannot be formed on hotplugged
>>   				 memory.
>> +
>> +				 With value "force" it could result in memory
>> +				 wastage due to memmap size limitations. For
>> +				 example, if the memmap for a memory block
>> +				 requires 1 MiB, but the pageblock size is 2
>> +				 MiB, 1 MiB of hotplugged memory will be wasted.
>> +				 Note that there are still cases where the
>> +				 feature cannot be enforced: for example, if the
>> +				 memmap is smaller than a single page, or if the
>> +				 architecture does not support the forced mode
>> +				 in all configurations.
>> +
>>   ``online_policy``		 read-write: Set the basic policy used for
>>   				 automatic zone selection when onlining memory
>>   				 blocks without specifying a target zone.
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 457824a6ecb8..5b472e137898 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -41,17 +41,89 @@
>>   #include "internal.h"
>>   #include "shuffle.h"
>>   
>> +enum {
>> +	MEMMAP_ON_MEMORY_DISABLE = 0,
>> +	MEMMAP_ON_MEMORY_ENABLE,
>> +	MEMMAP_ON_MEMORY_FORCE,
>> +};
>> +
>> +static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
>> +
>> +static inline unsigned long memory_block_memmap_pages(void)
>> +{
>> +	unsigned long memmap_size;
>> +
>> +	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
>> +	return memmap_size >> PAGE_SHIFT;
>
> I'd really move a !page variant (memory_block_memmap_size()) to the 
> previous patch and use it in mhp_supports_memmap_on_memory() and 
> arch_supports_memmap_on_memory().
>
> Then, in this patch, reuse that function in 
> memory_block_memmap_on_memory_pages() and ...
>
>> +}
>> +
>> +static inline unsigned long memory_block_memmap_on_memory_pages(void)
>> +{
>> +	unsigned long nr_pages = memory_block_memmap_pages();
>
> ... do here a
>
> nr_pages = PHYS_PFN(memory_block_memmap_size());
>
>
> Conceptually, it would be even cleaner to have here
>
> nr_pages = PFN_UP(memory_block_memmap_size());
>
> even though one can argue that mhp_supports_memmap_on_memory() will make 
> sure that the unaligned value (memory_block_memmap_size()) covers full 
> pages, but at least to me it looks cleaner that way. No strong opinion.
>
>
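As a reminder of the PHYS_PFN()/PFN_UP() distinction being discussed here
(the definitions below are what I'd expect from include/linux/pfn.h, written
from memory rather than taken from this series):

/*
 * PHYS_PFN(x) == (x) >> PAGE_SHIFT                    -> rounds down
 * PFN_UP(x)   == ((x) + PAGE_SIZE - 1) >> PAGE_SHIFT  -> rounds up
 *
 * They only differ when memory_block_memmap_size() is not a multiple of
 * PAGE_SIZE, which is exactly the case discussed above.
 */
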
>> +
>> +	/*
>> +	 * In "forced" memmap_on_memory mode, we add extra pages to align the
>> +	 * vmemmap size to cover full pageblocks. That way, we can add memory
>> +	 * even if the vmemmap size is not properly aligned, however, we might waste
>> +	 * memory.
>> +	 */
>> +	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
>> +		return pageblock_align(nr_pages);
>> +	return nr_pages;
>> +}
>> +
>>   #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>>   /*
>>    * memory_hotplug.memmap_on_memory parameter
>>    */
>> -static bool memmap_on_memory __ro_after_init;
>> -module_param(memmap_on_memory, bool, 0444);
>> -MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
>> +static int set_memmap_mode(const char *val, const struct kernel_param *kp)
>> +{
>> +	int ret, mode;
>> +	bool enabled;
>> +
>> +	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
>> +		mode =  MEMMAP_ON_MEMORY_FORCE;
>> +		goto matched;
>> +	}
>
> Avoid the goto + label
>
> } else {
> 	ret = kstrtobool(val, &enabled);
> 	...
> }
>
> *((int *)kp->arg) =  mode;
>
>> +
>> +	ret = kstrtobool(val, &enabled);
>> +	if (ret < 0)
>> +		return ret;
>> +	if (enabled)
>> +		mode =  MEMMAP_ON_MEMORY_ENABLE;
>> +	else
>> +		mode =  MEMMAP_ON_MEMORY_DISABLE;
>> +
>> +matched:
>> +	*((int *)kp->arg) =  mode;
>> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
>> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
>> +
>> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
>> +			memmap_pages - memory_block_memmap_pages());
>
> pr_info_once() ?
>
>> +	}
>> +	return 0;
>> +}
>> +
>
> [...]
>
>>   	/*
>>   	 * Besides having arch support and the feature enabled at runtime, we
>> @@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>   	 *       altmap as an alternative source of memory, and we do not exactly
>>   	 *       populate a single PMD.
>>   	 */
>> -	return mhp_memmap_on_memory() &&
>> -	       size == memory_block_size_bytes() &&
>> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
>> -	       arch_supports_memmap_on_memory(size);
>> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>> +		return false;
>> +
>> +	/*
>> +	 * Make sure the vmemmap allocation is fully contained
>> +	 * so that we always allocate vmemmap memory from altmap area.
>> +	 */
>> +	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
>> +		return false;
>> +
>> +	/*
>> +	 * start pfn should be pageblock_nr_pages aligned for correctly
>> +	 * setting migrate types
>> +	 */
>> +	if (!pageblock_aligned(memmap_pages))
>> +		return false;
>> +
>> +	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
>> +		/* No effective hotplugged memory doesn't make sense. */
>> +		return false;
>> +
>> +	return arch_supports_memmap_on_memory(size);
>>   }
>>   
>>   /*
>> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>   {
>>   	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>   	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>> -	struct vmem_altmap mhp_altmap = {};
>> +	struct vmem_altmap mhp_altmap = {
>> +		.base_pfn =  PHYS_PFN(res->start),
>> +		.end_pfn  =  PHYS_PFN(res->end),
>
> Is it required to set .end_pfn, and if so, shouldn't we also set it to 
> base_pfn + memory_block_memmap_on_memory_pages()) ?
>

We use that in ppc64 for checking the altmap boundary condition. As we
discussed earlier, ppc64, due to vmemmap mapping size restrictions, can't
always allocate vmemmap pages from the altmap area even if requested. We
fall back to regular memory allocation in that case (only used now with
pmem). We use altmap.end_pfn for that boundary check; you can refer to
altmap_cross_boundary() for more details.
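
For illustration, a minimal sketch of the kind of boundary check described
here (this is not the actual ppc64 altmap_cross_boundary() implementation;
it only shows the idea, treating end_pfn as the last pfn of the resource):

static bool outside_altmap_range(struct vmem_altmap *altmap,
				 unsigned long start_pfn,
				 unsigned long nr_pfns)
{
	/* true if the allocation would not fit inside the altmap range */
	return start_pfn < altmap->base_pfn ||
	       start_pfn + nr_pfns - 1 > altmap->end_pfn;
}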

>
> We also don't set it on the try_remove_memory() part,.
>

We use altmap.end_pfn only in the allocation path. This confusion is
another motivation to embed the altmap details in the memory block
structure, as done in patch 6.

>
>
> With these things addressed, feel free to add
>
> Acked-by: David Hildenbrand <david@redhat.com>
>

Thanks for all your help with the review. Attached is the diff against the
previous version, taking care of the feedback above.

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 7e46369d57c9..2209d66a034e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -176,11 +176,8 @@ bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
  * alignment requirement is met using altmap->reserve blocks.
  */
 #define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
-static inline bool arch_supports_memmap_on_memory(unsigned long size)
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 {
-	unsigned long nr_pages = size >> PAGE_SHIFT;
-	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
-
 	if (!radix_enabled())
 		return false;
 	/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1c9069cc0263..cadf4209d7c4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -49,17 +49,14 @@ enum {
 
 static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
 
-static inline unsigned long memory_block_memmap_pages(void)
+static inline unsigned long memory_block_memmap_size(void)
 {
-	unsigned long memmap_size;
-
-	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
-	return memmap_size >> PAGE_SHIFT;
+	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
 }
 
 static inline unsigned long memory_block_memmap_on_memory_pages(void)
 {
-	unsigned long nr_pages = memory_block_memmap_pages();
+	unsigned long nr_pages = PFN_UP(memory_block_memmap_size());
 
 	/*
 	 * In "forced" memmap_on_memory mode, we add extra pages to align the
@@ -83,18 +80,15 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 
 	if (sysfs_streq(val, "force") ||  sysfs_streq(val, "FORCE")) {
 		mode =  MEMMAP_ON_MEMORY_FORCE;
-		goto matched;
+	} else {
+		ret = kstrtobool(val, &enabled);
+		if (ret < 0)
+			return ret;
+		if (enabled)
+			mode = MEMMAP_ON_MEMORY_ENABLE;
+		else
+			mode = MEMMAP_ON_MEMORY_DISABLE;
 	}
-
-	ret = kstrtobool(val, &enabled);
-	if (ret < 0)
-		return ret;
-	if (enabled)
-		mode =  MEMMAP_ON_MEMORY_ENABLE;
-	else
-		mode =  MEMMAP_ON_MEMORY_DISABLE;
-
-matched:
 	/*
 	 * Avoid changing memmap mode during hotplug.
 	 */
@@ -104,8 +98,8 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
 	if (mode == MEMMAP_ON_MEMORY_FORCE) {
 		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
-		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
-			memmap_pages - memory_block_memmap_pages());
+		pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
+			     memmap_pages - PFN_UP(memory_block_memmap_size()));
 	}
 	return 0;
 }
@@ -1327,11 +1321,8 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 }
 
 #ifndef arch_supports_memmap_on_memory
-static inline bool arch_supports_memmap_on_memory(unsigned long size)
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 {
-	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
-	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
-
 	/*
 	 * As default, we want the vmemmap to span a complete PMD such that we
 	 * can map the vmemmap using a single PMD if supported by the
@@ -1343,8 +1334,7 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
 
 static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
-	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
-	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
+	unsigned long vmemmap_size = memory_block_memmap_size();
 	unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
 	/*
@@ -1394,7 +1384,7 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 		/* No effective hotplugged memory doesn't make sense. */
 		return false;
 
-	return arch_supports_memmap_on_memory(size);
+	return arch_supports_memmap_on_memory(vmemmap_size);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-26  4:25       ` Aneesh Kumar K.V
@ 2023-07-26  9:04         ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-26  9:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma


>>>    /*
>>> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>>    {
>>>    	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>>    	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>>> -	struct vmem_altmap mhp_altmap = {};
>>> +	struct vmem_altmap mhp_altmap = {
>>> +		.base_pfn =  PHYS_PFN(res->start),
>>> +		.end_pfn  =  PHYS_PFN(res->end),
>>
>> Is it required to set .end_pfn, and if so, shouldn't we also set it to
>> base_pfn + memory_block_memmap_on_memory_pages()) ?
>>
> 
> We use that in ppc64 for checking altmap boundary condition. As we
> discussed earlier, ppc64 due to vmemmap mapping size restrictions can't
> always allocate vmemmap pages from altmap area even if requested. We
> fallback to regular memory allocation in that case (only used now with
> pmem). We use altmap.end_pfn for that boundary check. You can refer to
> altmap_cross_boundary() for more details.

But even then, setting the end to the end of the resource size is wrong, 
no? We don't want anybody to allocate beyond base_pfn + 
memory_block_memmap_on_memory_pages().
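
Spelled out, the initialization argued for above would look roughly like
this (sketch only, not a posted patch):

	struct vmem_altmap mhp_altmap = {
		.base_pfn = PHYS_PFN(res->start),
		.end_pfn  = PHYS_PFN(res->start) +
			    memory_block_memmap_on_memory_pages(),
	};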


Apart from that, LGTM.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 6/7] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-25 10:02   ` Aneesh Kumar K.V
@ 2023-07-26  9:41     ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-26  9:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> With memmap on memory, some architectures need more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using memmap on memory block
> feature.
> 
> No functional change in this patch
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---

[...]

>   
>   static int add_memory_block(unsigned long block_id, unsigned long state,
> -			    unsigned long nr_vmemmap_pages,
> +			    struct vmem_altmap *altmap,
>   			    struct memory_group *group)
>   {
>   	struct memory_block *mem;
> @@ -744,7 +751,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>   	mem->start_section_nr = block_id * sections_per_block;
>   	mem->state = state;
>   	mem->nid = NUMA_NO_NODE;
> -	mem->nr_vmemmap_pages = nr_vmemmap_pages;
> +	if (altmap) {
> +		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
> +		if (!mem->altmap) {
> +			kfree(mem);
> +			return -ENOMEM;
> +		}
> +		memcpy(mem->altmap, altmap, sizeof(*altmap));
> +	}

I'm wondering if we should instead let the caller do the alloc/free. So we would alloc
in the caller and would only store the pointer.

Before removing the memory block, we would clear the pointer and free it in the caller.

IOW, when removing a memory block and we still have an altmap set, something would be wrong.

See below on try_remove_memory() handling.
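
For the add side, a rough sketch of "alloc in the caller and only store the
pointer" (hypothetical and simplified; proper error unwinding and the
fallback path are omitted, and kmemdup() is just shorthand for alloc + copy;
the helper names come from the patch under discussion):

	/* in add_memory_resource(), before creating the memory block devices */
	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
	    mhp_supports_memmap_on_memory(size)) {
		mhp_altmap.free = memory_block_memmap_on_memory_pages();
		params.altmap = kmemdup(&mhp_altmap, sizeof(mhp_altmap),
					GFP_KERNEL);
		if (!params.altmap)
			return -ENOMEM;
	}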

[...]

> -static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
> +static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
>   {
> +	struct vmem_altmap *altmap = (struct vmem_altmap *)arg;
>   	/*
> -	 * If not set, continue with the next block.
> +	 * If we have any pages allocated from altmap
> +	 * return the altmap details and break callback.
>   	 */
> -	return mem->nr_vmemmap_pages;
> +	if (mem->altmap) {
> +		memcpy(altmap, mem->altmap, sizeof(struct vmem_altmap));
> +		return 1;
> +	}
> +	return 0;
>   }
>   
>   static int check_cpu_on_node(int nid)
> @@ -2146,9 +2152,8 @@ EXPORT_SYMBOL(try_offline_node);
>   
>   static int __ref try_remove_memory(u64 start, u64 size)
>   {
> -	struct vmem_altmap mhp_altmap = {};
> -	struct vmem_altmap *altmap = NULL;
> -	unsigned long nr_vmemmap_pages;
> +	int ret;
> +	struct vmem_altmap mhp_altmap, *altmap = NULL;
>   	int rc = 0, nid = NUMA_NO_NODE;
>   
>   	BUG_ON(check_hotplug_memory_range(start, size));
> @@ -2171,24 +2176,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
>   	 * the same granularity it was added - a single memory block.
>   	 */
>   	if (mhp_memmap_on_memory()) {
> -		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> -						      get_nr_vmemmap_pages_cb);
> -		if (nr_vmemmap_pages) {
> +		ret = walk_memory_blocks(start, size, &mhp_altmap,
> +					 get_vmemmap_altmap_cb);
> +		if (ret) {
>   			if (size != memory_block_size_bytes()) {
>   				pr_warn("Refuse to remove %#llx - %#llx,"
>   					"wrong granularity\n",
>   					start, start + size);
>   				return -EINVAL;
>   			}
> -
> -			/*
> -			 * Let remove_pmd_table->free_hugepage_table do the
> -			 * right thing if we used vmem_altmap when hot-adding
> -			 * the range.
> -			 */
> -			mhp_altmap.base_pfn = PHYS_PFN(start);
> -			mhp_altmap.free = nr_vmemmap_pages;
> -			mhp_altmap.alloc = nr_vmemmap_pages;
>   			altmap = &mhp_altmap;
>   		}


Instead of that, I suggest (whitespace damage expected):

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f231cf1b410..f6860df64549 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1956,12 +1956,19 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
         return 0;
  }
  
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int test_has_altmap_cb(struct memory_block *mem, void *arg)
  {
-       /*
-        * If not set, continue with the next block.
-        */
-       return mem->nr_vmemmap_pages;
+       struct memory_block **mem_ptr = (struct memory_block **)arg;
+
+       if (mem->altmap) {
+               /*
+                * We're not taking a reference on the memory block; it
+                * cannot vanish while we're about to remove that memory ourselves.
+                */
+               *mem_ptr = mem;
+               return 1;
+       }
+       return 0;
  }
  
  static int check_cpu_on_node(int nid)
@@ -2036,9 +2043,7 @@ EXPORT_SYMBOL(try_offline_node);
  
  static int __ref try_remove_memory(u64 start, u64 size)
  {
-       struct vmem_altmap mhp_altmap = {};
         struct vmem_altmap *altmap = NULL;
-       unsigned long nr_vmemmap_pages;
         int rc = 0, nid = NUMA_NO_NODE;
  
         BUG_ON(check_hotplug_memory_range(start, size));
@@ -2061,9 +2066,9 @@ static int __ref try_remove_memory(u64 start, u64 size)
          * the same granularity it was added - a single memory block.
          */
         if (mhp_memmap_on_memory()) {
-               nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-                                                     get_nr_vmemmap_pages_cb);
-               if (nr_vmemmap_pages) {
+               struct memory_block *mem;
+
+               if (walk_memory_blocks(start, size, &mem, test_has_altmap_cb)) {
                         if (size != memory_block_size_bytes()) {
                                 pr_warn("Refuse to remove %#llx - %#llx,"
                                         "wrong granularity\n",
@@ -2072,12 +2077,11 @@ static int __ref try_remove_memory(u64 start, u64 size)
                         }
  
                         /*
-                        * Let remove_pmd_table->free_hugepage_table do the
-                        * right thing if we used vmem_altmap when hot-adding
-                        * the range.
+                        * Clear the altmap from the memory block before we
+                        * remove it; we'll take care of freeing the altmap.
                          */
-                       mhp_altmap.alloc = nr_vmemmap_pages;
-                       altmap = &mhp_altmap;
+                       altmap = mem->altmap;
+                       mem->altmap = NULL;
                 }
         }
  
@@ -2094,6 +2098,9 @@ static int __ref try_remove_memory(u64 start, u64 size)
  
         arch_remove_memory(start, size, altmap);
  
+       if (altmap)
+               kfree(altmap);
+
         if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
                 memblock_phys_free(start, size);
                 memblock_remove(start, size);


-- 
Cheers,

David / dhildenb



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-26  9:04         ` David Hildenbrand
  (?)
@ 2023-07-26  9:57         ` Aneesh Kumar K V
  2023-07-26 16:39             ` David Hildenbrand
  -1 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K V @ 2023-07-26  9:57 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/26/23 2:34 PM, David Hildenbrand wrote:
> 
>>>>    /*
>>>> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>>>    {
>>>>        struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>>>        enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>>>> -    struct vmem_altmap mhp_altmap = {};
>>>> +    struct vmem_altmap mhp_altmap = {
>>>> +        .base_pfn =  PHYS_PFN(res->start),
>>>> +        .end_pfn  =  PHYS_PFN(res->end),
>>>
>>> Is it required to set .end_pfn, and if so, shouldn't we also set it to
>>> base_pfn + memory_block_memmap_on_memory_pages()) ?
>>>
>>
>> We use that in ppc64 for checking altmap boundary condition. As we
>> discussed earlier, ppc64 due to vmemmap mapping size restrictions can't
>> always allocate vmemmap pages from altmap area even if requested. We
>> fallback to regular memory allocation in that case (only used now with
>> pmem). We use altmap.end_pfn for that boundary check. You can refer to
>> altmap_cross_boundary() for more details.
> 
> But even then, setting the end to the end of the resource size is wrong, no? We don't want anybody to allocate beyond base_pfn + memory_block_memmap_on_memory_pages().
> 

altmap.end_pfn is the end pfn of the resource

__nvdimm_setup_pfn()
...
        resource_size_t end = nsio->res.end - end_trunc;
	struct vmem_altmap __altmap = {
		.base_pfn = init_altmap_base(base),
		.reserve = init_altmap_reserve(base),
		.end_pfn = PHYS_PFN(end),
	};


And we use it to verify that a page allocated from the altmap does not map, via page_to_pfn, to a pfn outside the resource range.

-aneesh

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 6/7] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-26  9:41     ` David Hildenbrand
@ 2023-07-26 10:31       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-26 10:31 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

David Hildenbrand <david@redhat.com> writes:

> On 25.07.23 12:02, Aneesh Kumar K.V wrote:
>> With memmap on memory, some architectures need more details w.r.t altmap
>> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
>> computing them again when we remove a memory block, embed vmem_altmap
>> details in struct memory_block if we are using memmap on memory block
>> feature.
>> 
>> No functional change in this patch
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>
> [...]
>
>>   
>>   static int add_memory_block(unsigned long block_id, unsigned long state,
>> -			    unsigned long nr_vmemmap_pages,
>> +			    struct vmem_altmap *altmap,
>>   			    struct memory_group *group)
>>   {
>>   	struct memory_block *mem;
>> @@ -744,7 +751,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>>   	mem->start_section_nr = block_id * sections_per_block;
>>   	mem->state = state;
>>   	mem->nid = NUMA_NO_NODE;
>> -	mem->nr_vmemmap_pages = nr_vmemmap_pages;
>> +	if (altmap) {
>> +		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
>> +		if (!mem->altmap) {
>> +			kfree(mem);
>> +			return -ENOMEM;
>> +		}
>> +		memcpy(mem->altmap, altmap, sizeof(*altmap));
>> +	}
>
> I'm wondering if we should instead let the caller do the alloc/free. So we would alloc
> in the caller and would only store the pointer.
>
> Before removing the memory block, we would clear the pointer and free it in the caller.
>
> IOW, when removing a memory block and we still have an altmap set, something would be wrong.
>
> See below on try_remove_memory() handling.
>
> [...]
>
>> -static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
>> +static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
>>   {
>> +	struct vmem_altmap *altmap = (struct vmem_altmap *)arg;
>>   	/*
>> -	 * If not set, continue with the next block.
>> +	 * If we have any pages allocated from altmap
>> +	 * return the altmap details and break callback.
>>   	 */
>> -	return mem->nr_vmemmap_pages;
>> +	if (mem->altmap) {
>> +		memcpy(altmap, mem->altmap, sizeof(struct vmem_altmap));
>> +		return 1;
>> +	}
>> +	return 0;
>>   }
>>   
>>   static int check_cpu_on_node(int nid)
>> @@ -2146,9 +2152,8 @@ EXPORT_SYMBOL(try_offline_node);
>>   
>>   static int __ref try_remove_memory(u64 start, u64 size)
>>   {
>> -	struct vmem_altmap mhp_altmap = {};
>> -	struct vmem_altmap *altmap = NULL;
>> -	unsigned long nr_vmemmap_pages;
>> +	int ret;
>> +	struct vmem_altmap mhp_altmap, *altmap = NULL;
>>   	int rc = 0, nid = NUMA_NO_NODE;
>>   
>>   	BUG_ON(check_hotplug_memory_range(start, size));
>> @@ -2171,24 +2176,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>   	 * the same granularity it was added - a single memory block.
>>   	 */
>>   	if (mhp_memmap_on_memory()) {
>> -		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
>> -						      get_nr_vmemmap_pages_cb);
>> -		if (nr_vmemmap_pages) {
>> +		ret = walk_memory_blocks(start, size, &mhp_altmap,
>> +					 get_vmemmap_altmap_cb);
>> +		if (ret) {
>>   			if (size != memory_block_size_bytes()) {
>>   				pr_warn("Refuse to remove %#llx - %#llx,"
>>   					"wrong granularity\n",
>>   					start, start + size);
>>   				return -EINVAL;
>>   			}
>> -
>> -			/*
>> -			 * Let remove_pmd_table->free_hugepage_table do the
>> -			 * right thing if we used vmem_altmap when hot-adding
>> -			 * the range.
>> -			 */
>> -			mhp_altmap.base_pfn = PHYS_PFN(start);
>> -			mhp_altmap.free = nr_vmemmap_pages;
>> -			mhp_altmap.alloc = nr_vmemmap_pages;
>>   			altmap = &mhp_altmap;
>>   		}
>
>
> Instead of that, I suggest (whitespace damage expected):
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3f231cf1b410..f6860df64549 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1956,12 +1956,19 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
>          return 0;
>   }
>   
> -static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
> +static int test_has_altmap_cb(struct memory_block *mem, void *arg)
>   {
> -       /*
> -        * If not set, continue with the next block.
> -        */
> -       return mem->nr_vmemmap_pages;
> +       struct memory_block **mem_ptr = (struct memory_block **)arg;
> +
> +       if (mem->altmap) {
> +               /*
> +                * We're not taking a reference on the memory block; it
> +                * cannot vanish while we're about to remove that memory ourselves.
> +                */
> +               *mem_ptr = mem;
> +               return 1;
> +       }
> +       return 0;
>   }
>   
>   static int check_cpu_on_node(int nid)
> @@ -2036,9 +2043,7 @@ EXPORT_SYMBOL(try_offline_node);
>   
>   static int __ref try_remove_memory(u64 start, u64 size)
>   {
> -       struct vmem_altmap mhp_altmap = {};
>          struct vmem_altmap *altmap = NULL;
> -       unsigned long nr_vmemmap_pages;
>          int rc = 0, nid = NUMA_NO_NODE;
>   
>          BUG_ON(check_hotplug_memory_range(start, size));
> @@ -2061,9 +2066,9 @@ static int __ref try_remove_memory(u64 start, u64 size)
>           * the same granularity it was added - a single memory block.
>           */
>          if (mhp_memmap_on_memory()) {
> -               nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> -                                                     get_nr_vmemmap_pages_cb);
> -               if (nr_vmemmap_pages) {
> +               struct memory_block *mem;
> +
> +               if (walk_memory_blocks(start, size, &mem, test_has_altmap_cb)) {
>                          if (size != memory_block_size_bytes()) {
>                                  pr_warn("Refuse to remove %#llx - %#llx,"
>                                          "wrong granularity\n",
> @@ -2072,12 +2077,11 @@ static int __ref try_remove_memory(u64 start, u64 size)
>                          }
>   
>                          /*
> -                        * Let remove_pmd_table->free_hugepage_table do the
> -                        * right thing if we used vmem_altmap when hot-adding
> -                        * the range.
> +                        * Clear the altmap from the memory block before we
> +                        * remove it; we'll take care of freeing the altmap.
>                           */
> -                       mhp_altmap.alloc = nr_vmemmap_pages;
> -                       altmap = &mhp_altmap;
> +                       altmap = mem->altmap;
> +                       mem->altmap = NULL;
>                  }
>          }
>   
> @@ -2094,6 +2098,9 @@ static int __ref try_remove_memory(u64 start, u64 size)
>   
>          arch_remove_memory(start, size, altmap);
>   
> +       if (altmap)
> +               kfree(altmap);
> +
>          if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
>                  memblock_phys_free(start, size);
>                  memblock_remove(start, size);
>

Is this any better? Any specific reason we want the alloc and free in
the caller?

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0210ed7b7696..271cfdf8f6b6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,7 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
-	kfree(mem->altmap);
+	WARN_ON(mem->altmap);
 	kfree(mem);
 }
 
@@ -751,14 +751,8 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	if (altmap) {
-		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
-		if (!mem->altmap) {
-			kfree(mem);
-			return -ENOMEM;
-		}
-		memcpy(mem->altmap, altmap, sizeof(*altmap));
-	}
+	if (altmap)
+		mem->altmap = altmap;
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2bad1bf0e9e3..1c7d88332e0e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1445,8 +1445,13 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
+
 			mhp_altmap.free = memory_block_memmap_on_memory_pages();
-			params.altmap = &mhp_altmap;
+			params.altmap = kzalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+			if (!params.altmap)
+				goto error;
+
+			memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
 		}
 		/* fallback to not using altmap  */
 	}
@@ -2067,13 +2072,14 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 
 static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
-	struct vmem_altmap *altmap = (struct vmem_altmap *)arg;
+	struct vmem_altmap **altmap = (struct vmem_altmap **)arg;
 	/*
 	 * If we have any pages allocated from altmap
 	 * return the altmap details and break callback.
 	 */
 	if (mem->altmap) {
-		memcpy(altmap, mem->altmap, sizeof(struct vmem_altmap));
+		*altmap = mem->altmap;
+		mem->altmap = NULL;
 		return 1;
 	}
 	return 0;
@@ -2152,7 +2158,7 @@ EXPORT_SYMBOL(try_offline_node);
 static int __ref try_remove_memory(u64 start, u64 size)
 {
 	int ret;
-	struct vmem_altmap mhp_altmap, *altmap = NULL;
+	struct vmem_altmap *altmap = NULL;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2174,7 +2180,7 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
-	ret = walk_memory_blocks(start, size, &mhp_altmap,
+	ret = walk_memory_blocks(start, size, &altmap,
 				 get_vmemmap_altmap_cb);
 	if (ret) {
 		if (size != memory_block_size_bytes()) {
@@ -2183,7 +2189,6 @@ static int __ref try_remove_memory(u64 start, u64 size)
 				start, start + size);
 			return -EINVAL;
 		}
-		altmap = &mhp_altmap;
 	}
 
 	/* remove memmap entry */
@@ -2203,8 +2208,10 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * Now that we are tracking alloc and free correctly
 	 * we can add check to verify altmap free pages.
 	 */
-	if (altmap)
+	if (altmap) {
 		WARN(altmap->alloc, "Altmap not fully unmapped");
+		kfree(altmap);
+	}
 
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_phys_free(start, size);


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
  2023-07-26  9:57         ` Aneesh Kumar K V
@ 2023-07-26 16:39             ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-26 16:39 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 26.07.23 11:57, Aneesh Kumar K V wrote:
> On 7/26/23 2:34 PM, David Hildenbrand wrote:
>>
>>>>>     /*
>>>>> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>>>>     {
>>>>>         struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>>>>         enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>>>>> -    struct vmem_altmap mhp_altmap = {};
>>>>> +    struct vmem_altmap mhp_altmap = {
>>>>> +        .base_pfn =  PHYS_PFN(res->start),
>>>>> +        .end_pfn  =  PHYS_PFN(res->end),
>>>>
>>>> Is it required to set .end_pfn, and if so, shouldn't we also set it to
>>>> base_pfn + memory_block_memmap_on_memory_pages()) ?
>>>>
>>>
>>> We use that in ppc64 for checking altmap boundary condition. As we
>>> discussed earlier, ppc64 due to vmemmap mapping size restrictions can't
>>> always allocate vmemmap pages from altmap area even if requested. We
>>> fallback to regular memory allocation in that case (only used now with
>>> pmem). We use altmap.end_pfn for that boundary check. You can refer to
>>> altmap_cross_boundary() for more details.
>>
>> But even then, setting the end to the end of the resource size is wrong, no? We don't want anybody to allocate beyond base_pfn + memory_block_memmap_on_memory_pages().
>>
> 
> altmap.end is the end pfn of the resource

Oh, thanks for pointing that out. I wonder why the altmap even has to 
care about that ...

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 6/7] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-26 10:31       ` Aneesh Kumar K.V
@ 2023-07-26 16:43         ` David Hildenbrand
  -1 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-07-26 16:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 26.07.23 12:31, Aneesh Kumar K.V wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 25.07.23 12:02, Aneesh Kumar K.V wrote:
>>> With memmap on memory, some architectures need more details w.r.t. the
>>> altmap, such as base_pfn, end_pfn, etc., to unmap vmemmap memory. Instead
>>> of computing them again when we remove a memory block, embed the
>>> vmem_altmap details in struct memory_block if we are using the memmap on
>>> memory feature.
>>>
>>> No functional change in this patch
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>
>> [...]
>>
>>>    
>>>    static int add_memory_block(unsigned long block_id, unsigned long state,
>>> -			    unsigned long nr_vmemmap_pages,
>>> +			    struct vmem_altmap *altmap,
>>>    			    struct memory_group *group)
>>>    {
>>>    	struct memory_block *mem;
>>> @@ -744,7 +751,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>>>    	mem->start_section_nr = block_id * sections_per_block;
>>>    	mem->state = state;
>>>    	mem->nid = NUMA_NO_NODE;
>>> -	mem->nr_vmemmap_pages = nr_vmemmap_pages;
>>> +	if (altmap) {
>>> +		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
>>> +		if (!mem->altmap) {
>>> +			kfree(mem);
>>> +			return -ENOMEM;
>>> +		}
>>> +		memcpy(mem->altmap, altmap, sizeof(*altmap));
>>> +	}
>>
>> I'm wondering if we should instead let the caller do the alloc/free. So we would alloc
>> in the caller and only store the pointer.
>>
>> Before removing the memory block, we would clear the pointer and free it in the caller.
>>
>> IOW, when removing a memory block and we still have an altmap set, something would be wrong.
>>
>> See below on try_remove_memory() handling.
>>
>> [...]
>>
>>> -static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
>>> +static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
>>>    {
>>> +	struct vmem_altmap *altmap = (struct vmem_altmap *)arg;
>>>    	/*
>>> -	 * If not set, continue with the next block.
>>> +	 * If we have any pages allocated from altmap
>>> +	 * return the altmap details and break callback.
>>>    	 */
>>> -	return mem->nr_vmemmap_pages;
>>> +	if (mem->altmap) {
>>> +		memcpy(altmap, mem->altmap, sizeof(struct vmem_altmap));
>>> +		return 1;
>>> +	}
>>> +	return 0;
>>>    }
>>>    
>>>    static int check_cpu_on_node(int nid)
>>> @@ -2146,9 +2152,8 @@ EXPORT_SYMBOL(try_offline_node);
>>>    
>>>    static int __ref try_remove_memory(u64 start, u64 size)
>>>    {
>>> -	struct vmem_altmap mhp_altmap = {};
>>> -	struct vmem_altmap *altmap = NULL;
>>> -	unsigned long nr_vmemmap_pages;
>>> +	int ret;
>>> +	struct vmem_altmap mhp_altmap, *altmap = NULL;
>>>    	int rc = 0, nid = NUMA_NO_NODE;
>>>    
>>>    	BUG_ON(check_hotplug_memory_range(start, size));
>>> @@ -2171,24 +2176,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>>    	 * the same granularity it was added - a single memory block.
>>>    	 */
>>>    	if (mhp_memmap_on_memory()) {
>>> -		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
>>> -						      get_nr_vmemmap_pages_cb);
>>> -		if (nr_vmemmap_pages) {
>>> +		ret = walk_memory_blocks(start, size, &mhp_altmap,
>>> +					 get_vmemmap_altmap_cb);
>>> +		if (ret) {
>>>    			if (size != memory_block_size_bytes()) {
>>>    				pr_warn("Refuse to remove %#llx - %#llx,"
>>>    					"wrong granularity\n",
>>>    					start, start + size);
>>>    				return -EINVAL;
>>>    			}
>>> -
>>> -			/*
>>> -			 * Let remove_pmd_table->free_hugepage_table do the
>>> -			 * right thing if we used vmem_altmap when hot-adding
>>> -			 * the range.
>>> -			 */
>>> -			mhp_altmap.base_pfn = PHYS_PFN(start);
>>> -			mhp_altmap.free = nr_vmemmap_pages;
>>> -			mhp_altmap.alloc = nr_vmemmap_pages;
>>>    			altmap = &mhp_altmap;
>>>    		}
>>
>>
>> Instead of that, I suggest (whitespace damage expected):
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 3f231cf1b410..f6860df64549 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1956,12 +1956,19 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
>>           return 0;
>>    }
>>    
>> -static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
>> +static int test_has_altmap_cb(struct memory_block *mem, void *arg)
>>    {
>> -       /*
>> -        * If not set, continue with the next block.
>> -        */
>> -       return mem->nr_vmemmap_pages;
>> +       struct memory_block **mem_ptr = (struct memory_block **)arg;
>> +
>> +       if (mem->altmap) {
>> +               /*
>> +                * We're not taking a reference on the memory block; it
>> +                * cannot vanish while we're about to remove that memory ourselves.
>> +                */
>> +               *mem_ptr = mem;
>> +               return 1;
>> +       }
>> +       return 0;
>>    }
>>    
>>    static int check_cpu_on_node(int nid)
>> @@ -2036,9 +2043,7 @@ EXPORT_SYMBOL(try_offline_node);
>>    
>>    static int __ref try_remove_memory(u64 start, u64 size)
>>    {
>> -       struct vmem_altmap mhp_altmap = {};
>>           struct vmem_altmap *altmap = NULL;
>> -       unsigned long nr_vmemmap_pages;
>>           int rc = 0, nid = NUMA_NO_NODE;
>>    
>>           BUG_ON(check_hotplug_memory_range(start, size));
>> @@ -2061,9 +2066,9 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>            * the same granularity it was added - a single memory block.
>>            */
>>           if (mhp_memmap_on_memory()) {
>> -               nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
>> -                                                     get_nr_vmemmap_pages_cb);
>> -               if (nr_vmemmap_pages) {
>> +               struct memory_block *mem;
>> +
>> +               if (walk_memory_blocks(start, size, &mem, test_has_altmap_cb)) {
>>                           if (size != memory_block_size_bytes()) {
>>                                   pr_warn("Refuse to remove %#llx - %#llx,"
>>                                           "wrong granularity\n",
>> @@ -2072,12 +2077,11 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>                           }
>>    
>>                           /*
>> -                        * Let remove_pmd_table->free_hugepage_table do the
>> -                        * right thing if we used vmem_altmap when hot-adding
>> -                        * the range.
>> +                        * Clear the altmap from the memory block before we
>> +                        * remove it; we'll take care of freeing the altmap.
>>                            */
>> -                       mhp_altmap.alloc = nr_vmemmap_pages;
>> -                       altmap = &mhp_altmap;
>> +                       altmap = mem->altmap;
>> +                       mem->altmap = NULL;
>>                   }
>>           }
>>    
>> @@ -2094,6 +2098,9 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>    
>>           arch_remove_memory(start, size, altmap);
>>    
>> +       if (altmap)
>> +               kfree(altmap);
>> +
>>           if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
>>                   memblock_phys_free(start, size);
>>                   memblock_remove(start, size);
>>
> 
> Is this any better? Any specific reason we want the alloc and free in
> the caller?


IMHO if you only have a single instance you don't have to worry about 
any inconsistencies. At least to me it looks cleaner than this copying 
back and forth.

Below is better, but as you pointed out, your get_vmemmap_altmap_cb() 
change is problematic/insufficient.
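
To make the suggestion above concrete, here is a hypothetical sketch of
the caller-managed alloc/free being discussed: the altmap is duplicated
once when the memory is added, the memory block only stores the pointer,
and try_remove_memory() frees it after teardown. The exact placement and
the surrounding error handling are assumptions; only the fragments
relevant to ownership are shown.

/* In add_memory_resource(), once memmap_on_memory has been selected: */
params.altmap = kmemdup(&mhp_altmap, sizeof(mhp_altmap), GFP_KERNEL);
if (!params.altmap)
	return -ENOMEM;	/* real code would unwind earlier setup here */
/* add_memory_block() then only stores this pointer in mem->altmap. */

/* In try_remove_memory(), after walk_memory_blocks() located the block: */
altmap = mem->altmap;
mem->altmap = NULL;	/* the block no longer owns it */
arch_remove_memory(start, size, altmap);
kfree(altmap);

Keeping the allocation and the free in the hotplug core, with the memory
block holding only a pointer, is the single-owner argument made above:
there is exactly one struct vmem_altmap per block and no copying back and
forth.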

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-07-26 16:44 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-25 10:02 [PATCH v5 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-07-25 10:02 ` Aneesh Kumar K.V
2023-07-25 10:02 ` [PATCH v5 1/7] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-25 10:02 ` [PATCH v5 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-25 10:02 ` [PATCH v5 3/7] mm/hotplug: Allow architecture to override memmap on memory support check Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-25 10:02 ` [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-25 18:06   ` David Hildenbrand
2023-07-25 18:06     ` David Hildenbrand
2023-07-26  4:25     ` Aneesh Kumar K.V
2023-07-26  4:25       ` Aneesh Kumar K.V
2023-07-26  9:04       ` David Hildenbrand
2023-07-26  9:04         ` David Hildenbrand
2023-07-26  9:57         ` Aneesh Kumar K V
2023-07-26 16:39           ` David Hildenbrand
2023-07-26 16:39             ` David Hildenbrand
2023-07-25 10:02 ` [PATCH v5 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-25 10:09   ` David Hildenbrand
2023-07-25 10:09     ` David Hildenbrand
2023-07-25 10:02 ` [PATCH v5 6/7] mm/hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-26  9:41   ` David Hildenbrand
2023-07-26  9:41     ` David Hildenbrand
2023-07-26 10:31     ` Aneesh Kumar K.V
2023-07-26 10:31       ` Aneesh Kumar K.V
2023-07-26 16:43       ` David Hildenbrand
2023-07-26 16:43         ` David Hildenbrand
2023-07-25 10:02 ` [PATCH v5 7/7] mm/hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
2023-07-25 10:02   ` Aneesh Kumar K.V
2023-07-25 17:52   ` David Hildenbrand
2023-07-25 17:52     ` David Hildenbrand
2023-07-25 10:06 ` [PATCH v5 0/7] Add support for memmap on memory feature on ppc64 David Hildenbrand
2023-07-25 10:06   ` David Hildenbrand
