* [PATCH v3 0/7] Add support for memmap on memory feature on ppc64
@ 2023-07-11  4:48 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

This patch series updates the memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64, where the alignment rules are different with a 64K page size.
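
(Rough arithmetic, assuming sizeof(struct page) is 64 bytes and a 2MiB
PMD: with a 64K page size a 256MiB memory block is 4096 pages, so its
memmap needs 4096 * 64 = 256KiB of vmemmap, which is not PMD_SIZE
aligned; the current rules are only met once the block size reaches
2GiB, where the vmemmap is exactly 2MiB. The fallback added here keeps
hotplug working for the common 256MiB block size.)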

This patch series depends on the dax vmemmap optimization series
posted here:
https://lore.kernel.org/linux-mm/20230710160842.56300-1-aneesh.kumar@linux.ibm.com


Changes from v2:
* Rebase to the latest Linus tree
* Redo the series based on review feedback; multiple changes to the patchset.

Changes from v1:
* Update the memory block to store vmem_altmap details. This is required
  so that when we remove the memory we can find the altmap details, which
  are needed on some architectures.
* Rebase to the latest Linus tree


Aneesh Kumar K.V (6):
  mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  mm/hotplug: Allow memmap on memory hotplug request to fallback
  mm/hotplug: Allow architecture to override memmap on memory support
    check
  mm/hotplug: Allow pageblock alignment via altmap reservation
  powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  mm/hotplug: Embed vmem_altmap details in memory block

Vishal Verma (1):
  dax/kmem: Always enroll hotplugged memory for memmap_on_memory

 arch/arm64/Kconfig                            |   4 +-
 arch/powerpc/Kconfig                          |   1 +
 arch/powerpc/include/asm/pgtable.h            |  28 +++++
 .../platforms/pseries/hotplug-memory.c        |   3 +-
 arch/x86/Kconfig                              |   4 +-
 drivers/acpi/acpi_memhotplug.c                |   3 +-
 drivers/base/memory.c                         |  35 ++++--
 drivers/dax/kmem.c                            |  81 ++++++++++----
 include/linux/memory.h                        |   8 +-
 include/linux/memory_hotplug.h                |   1 -
 mm/Kconfig                                    |  12 ++
 mm/memory_hotplug.c                           | 103 ++++++++++++------
 12 files changed, 205 insertions(+), 78 deletions(-)

-- 
2.41.0




* [PATCH v3 1/7] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Instead of defining the config entry in each supported architecture, add
an mm/Kconfig variable and have the supported architectures select it.

No functional change in this patch.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/Kconfig | 4 +---
 arch/x86/Kconfig   | 4 +---
 mm/Kconfig         | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7856c3a3e35a..7e5985c018f8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_GNU_PROPERTY
 	select ARCH_USE_MEMTEST
@@ -346,9 +347,6 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 config SMP
 	def_bool y
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78224aa76409..d0258e92a8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
 	def_bool y
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git a/mm/Kconfig b/mm/Kconfig
index 923bd35f81f2..932349271e28 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -572,6 +572,9 @@ config MHP_MEMMAP_ON_MEMORY
 
 endif # MEMORY_HOTPLUG
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+       bool
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
-- 
2.41.0




* [PATCH v3 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

If not supported, fall back to not using memmap on memory. This avoids
the need for callers to handle the fallback themselves.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/acpi/acpi_memhotplug.c |  3 +--
 include/linux/memory_hotplug.h |  1 -
 mm/memory_hotplug.c            | 13 ++++++-------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24f662d8bd39..d0c1a71007d0 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (!info->length)
 			continue;
 
-		if (mhp_supports_memmap_on_memory(info->length))
-			mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+		mhp_flags |= MHP_MEMMAP_ON_MEMORY;
 		result = __add_memory(mgid, info->start_addr, info->length,
 				      mhp_flags);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 013c69753c91..96f6127f197f 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -354,7 +354,6 @@ extern struct zone *zone_for_pfn_range(int online_type, int nid,
 extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
 				      struct mhp_params *params);
 void arch_remove_linear_mapping(u64 start, u64 size);
-extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f231cf1b410..1b19462f4e72 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,7 +1247,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-bool mhp_supports_memmap_on_memory(unsigned long size)
+static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
@@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 * Self hosted memmap array
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-		if (!mhp_supports_memmap_on_memory(size)) {
-			ret = -EINVAL;
-			goto error;
+		if (mhp_supports_memmap_on_memory(size)) {
+			mhp_altmap.free = PHYS_PFN(size);
+			mhp_altmap.base_pfn = PHYS_PFN(start);
+			params.altmap = &mhp_altmap;
 		}
-		mhp_altmap.free = PHYS_PFN(size);
-		mhp_altmap.base_pfn = PHYS_PFN(start);
-		params.altmap = &mhp_altmap;
+		/* fallback to not using altmap  */
 	}
 
 	/* call arch's memory hotadd */
-- 
2.41.0




* [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Some architectures want different restrictions, so add an
architecture-specific override.

Both the PMD_SIZE check and the pageblock alignment check are moved
there.
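
A minimal sketch of how an architecture opts in (the real check is
arch-specific; the powerpc patch later in this series does exactly this
in asm/pgtable.h, and the placeholder body below is illustrative only):

	#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
	static inline bool arch_supports_memmap_on_memory(unsigned long size)
	{
		/* apply whatever restrictions this architecture needs */
		return true;
	}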

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory_hotplug.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1b19462f4e72..07c99b0cc371 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-static bool mhp_supports_memmap_on_memory(unsigned long size)
+#ifndef arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long size)
 {
-	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
 	unsigned long remaining_size = size - vmemmap_size;
 
+	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
+		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+}
+#endif
+
+static bool mhp_supports_memmap_on_memory(unsigned long size)
+{
 	/*
 	 * Besides having arch support and the feature enabled at runtime, we
 	 * need a few more assumptions to hold true:
@@ -1280,9 +1288,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       populate a single PMD.
 	 */
 	return mhp_memmap_on_memory() &&
-	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+		size == memory_block_size_bytes() &&
+		arch_supports_memmap_on_memory(size);
 }
 
 /*
-- 
2.41.0




* [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Add a new Kconfig option that can be selected if we want to allow
pageblock alignment by reserving pages in the vmemmap altmap area.
This implies we will be reserving some pages for every memory block.
This also allows the memmap on memory feature to be widely useful
with different memory block sizes.
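
The reserved count is just the gap between the vmemmap pages and the
next pageblock boundary; in pages, a sketch of what
memory_block_align_base() below computes:

	nr_vmemmap_pages = ((size >> PAGE_SHIFT) * sizeof(struct page)) >> PAGE_SHIFT;
	reserve = pageblock_align(nr_vmemmap_pages) - nr_vmemmap_pages;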

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/Kconfig          |  9 +++++++
 mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 932349271e28..88a1472b2086 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
 	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
 	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 
+config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
+       bool "Allow reserving pages for pageblock alignment"
+       depends on MHP_MEMMAP_ON_MEMORY
+       help
+	This option allows memmap on memory feature to be more useful
+	with different memory block sizes. This is achieved by marking some pages
+	in each memory block as reserved so that we can get page-block alignment
+	for the remaining pages.
+
 endif # MEMORY_HOTPLUG
 
 config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 07c99b0cc371..f36aec1f7626 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
-	unsigned long remaining_size = size - vmemmap_size;
 
-	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
-		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
 }
 #endif
 
 static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
+	unsigned long remaining_size = size - vmemmap_size;
+
 	/*
 	 * Besides having arch support and the feature enabled at runtime, we
 	 * need a few more assumptions to hold true:
@@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       altmap as an alternative source of memory, and we do not exactly
 	 *       populate a single PMD.
 	 */
-	return mhp_memmap_on_memory() &&
-		size == memory_block_size_bytes() &&
-		arch_supports_memmap_on_memory(size);
+	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
+		return false;
+	 /*
+	  * Without page reservation remaining pages should be pageblock aligned.
+	  */
+	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
+	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
+		return false;
+
+	return arch_supports_memmap_on_memory(size);
+}
+
+static inline unsigned long memory_block_align_base(unsigned long size)
+{
+	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
+		unsigned long align;
+		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+		unsigned long vmemmap_size;
+
+		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
+		align = pageblock_align(vmemmap_size) - vmemmap_size;
+		return align;
+	} else
+		return 0;
 }
 
 /*
@@ -1302,7 +1325,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap mhp_altmap = {
+		.base_pfn =  PHYS_PFN(res->start),
+		.end_pfn  =  PHYS_PFN(res->end),
+		.reserve  = memory_block_align_base(resource_size(res)),
+	};
 	struct memory_group *group = NULL;
 	u64 start, size;
 	bool new_node = false;
@@ -1347,8 +1374,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
-			mhp_altmap.free = PHYS_PFN(size);
-			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = PHYS_PFN(size) - mhp_altmap.reserve;
 			params.altmap = &mhp_altmap;
 		}
 		/* fallback to not using altmap  */
@@ -1360,7 +1386,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
+	ret = create_memory_block_devices(start, size, mhp_altmap.alloc + mhp_altmap.reserve,
 					  group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
@@ -2260,3 +2286,16 @@ int offline_and_remove_memory(u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
+
+static int __init memory_hotplug_init(void)
+{
+
+	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
+	    mhp_memmap_on_memory()) {
+		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
+			memory_block_align_base(memory_block_size_bytes()));
+
+	}
+	return 0;
+}
+module_init(memory_hotplug_init);
-- 
2.41.0




* [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

Radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks. Hence we skip the
restriction that the vmemmap size be a multiple of PMD_SIZE. This also
makes the feature more widely useful, because using a PMD_SIZE vmemmap
area would require a memory block size of 2GiB.

We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY so that the feature
works with a memory block size of 256MiB, using the altmap.reserve
feature to align things correctly at pageblock granularity. We can end
up losing some pages in memory with this. For example, with a 256MiB
memory block size we require 4 pages to map the vmemmap pages; in order
to align things correctly we end up adding a reserve of 28 pages, i.e.
for every 4096 pages, 28 pages get reserved.
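
(The 28 falls out of the pageblock rounding: assuming a 2MiB pageblock
with 64K pages, pageblock_nr_pages is 32, so the 4 vmemmap pages round
up to 32 and 32 - 4 = 28 pages are reserved; 4096 is simply
256MiB / 64KiB.)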

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
 .../platforms/pseries/hotplug-memory.c        |  3 +-
 mm/memory_hotplug.c                           |  2 ++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 116d6add0bb0..f890907e5bbf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@ config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 68817ea7f994..8e6c92dde6ad 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
 int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
 bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
 			   unsigned long page_size);
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
+ * some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fallback correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long size)
+{
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
+
+	if (!radix_enabled())
+		return false;
+
+#ifdef CONFIG_PPC_4K_PAGES
+	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+#else
+	/*
+	 * Make sure the vmemmap allocation is fully contained
+	 * so that we always allocate vmemmap memory from altmap area.
+	 * The pageblock alignment requirement is met by using
+	 * reserve blocks in altmap.
+	 */
+	return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
+#endif
+}
+
 #endif /* CONFIG_PPC64 */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..1447509357a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -617,6 +617,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
+	mhp_t mhp_flags = MHP_NONE | MHP_MEMMAP_ON_MEMORY;
 	unsigned long block_sz;
 	int nid, rc;
 
@@ -637,7 +638,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		nid = first_online_node;
 
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, mhp_flags);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f36aec1f7626..0c4d3fdd31a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -2108,6 +2108,8 @@ static int __ref try_remove_memory(u64 start, u64 size)
 			 * right thing if we used vmem_altmap when hot-adding
 			 * the range.
 			 */
+			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = PHYS_PFN(size) - nr_vmemmap_pages;
 			mhp_altmap.alloc = nr_vmemmap_pages;
 			altmap = &mhp_altmap;
 		}
-- 
2.41.0




* [PATCH v3 6/7] dax/kmem: Always enroll hotplugged memory for memmap_on_memory
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Rafael J. Wysocki, Len Brown, Dan Williams, Dave Jiang,
	Dave Hansen, Huang Ying, Aneesh Kumar K . V

From: Vishal Verma <vishal.l.verma@intel.com>

With DAX memory regions originating from CXL memory expanders or
NVDIMMs, the kmem driver may be hot-adding huge amounts of system memory
on a system without enough 'regular' main memory to support the memmap
for it. To avoid this, ensure that all kmem managed hotplugged memory is
added with the MHP_MEMMAP_ON_MEMORY flag to place the memmap on the
new memory region being hot added.

To do this, call add_memory() in chunks of memory_block_size_bytes() as
that is a requirement for memmap_on_memory.
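
As an illustration with made-up numbers: a 4GiB dax range with a 256MiB
memory_block_size_bytes() now results in 16 add_memory_driver_managed()
calls, each covering exactly one memory block and therefore each
individually eligible for MHP_MEMMAP_ON_MEMORY, instead of the previous
single call over the whole range.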

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/dax/kmem.c | 81 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 59 insertions(+), 22 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 898ca9505754..840bf7b40a44 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -12,6 +12,7 @@
 #include <linux/mm.h>
 #include <linux/mman.h>
 #include <linux/memory-tiers.h>
+#include <linux/memory_hotplug.h>
 #include "dax-private.h"
 #include "bus.h"
 
@@ -105,6 +106,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	data->mgid = rc;
 
 	for (i = 0; i < dev_dax->nr_range; i++) {
+		u64 cur_start, cur_len, remaining;
 		struct resource *res;
 		struct range range;
 
@@ -137,21 +139,42 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 		res->flags = IORESOURCE_SYSTEM_RAM;
 
 		/*
-		 * Ensure that future kexec'd kernels will not treat
-		 * this as RAM automatically.
+		 * Add memory in chunks of memory_block_size_bytes() so that
+		 * it is considered for MHP_MEMMAP_ON_MEMORY
+		 * @range has already been aligned to memory_block_size_bytes(),
+		 * so the following loop will always break it down cleanly.
 		 */
-		rc = add_memory_driver_managed(data->mgid, range.start,
-				range_len(&range), kmem_name, MHP_NID_IS_MGID);
-
-		if (rc) {
-			dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
-					i, range.start, range.end);
-			remove_resource(res);
-			kfree(res);
-			data->res[i] = NULL;
-			if (mapped)
-				continue;
-			goto err_request_mem;
+		cur_start = range.start;
+		cur_len = memory_block_size_bytes();
+		remaining = range_len(&range);
+		while (remaining) {
+			/*
+			 * If alignment rules are not satisfied we will
+			 * fall back to normal memmap allocation.
+			 */
+			mhp_t mhp_flags = MHP_NID_IS_MGID | MHP_MEMMAP_ON_MEMORY;
+			/*
+			 * Ensure that future kexec'd kernels will not treat
+			 * this as RAM automatically.
+			 */
+			rc = add_memory_driver_managed(data->mgid, cur_start,
+						       cur_len, kmem_name,
+						       mhp_flags);
+
+			if (rc) {
+				dev_warn(dev,
+					 "mapping%d: %#llx-%#llx memory add failed\n",
+					 i, cur_start, cur_start + cur_len - 1);
+				remove_resource(res);
+				kfree(res);
+				data->res[i] = NULL;
+				if (mapped)
+					continue;
+				goto err_request_mem;
+			}
+
+			cur_start += cur_len;
+			remaining -= cur_len;
 		}
 		mapped++;
 	}
@@ -186,25 +209,39 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
 	 * unbind will succeed even if we return failure.
 	 */
 	for (i = 0; i < dev_dax->nr_range; i++) {
+
+		u64 cur_start, cur_len, remaining;
 		struct range range;
+		bool resource_remove;
 		int rc;
 
 		rc = dax_kmem_range(dev_dax, i, &range);
 		if (rc)
 			continue;
 
-		rc = remove_memory(range.start, range_len(&range));
-		if (rc == 0) {
+		resource_remove = true;
+		cur_start = range.start;
+		cur_len = memory_block_size_bytes();
+		remaining = range_len(&range);
+		while (remaining) {
+
+			rc = remove_memory(cur_start, cur_len);
+			if (rc) {
+				resource_remove = false;
+				dev_err(dev,
+					"mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n",
+					i, cur_start, cur_len);
+			}
+			cur_start += cur_len;
+			remaining -= cur_len;
+		}
+		if (resource_remove) {
 			remove_resource(data->res[i]);
 			kfree(data->res[i]);
 			data->res[i] = NULL;
 			success++;
-			continue;
-		}
-		any_hotremove_failed = true;
-		dev_err(dev,
-			"mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n",
-				i, range.start, range.end);
+		} else
+			any_hotremove_failed = true;
 	}
 
 	if (success >= dev_dax->nr_range) {
-- 
2.41.0




* [PATCH v3 7/7] mm/hotplug: Embed vmem_altmap details in memory block
  2023-07-11  4:48 ` Aneesh Kumar K.V
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	Aneesh Kumar K.V

With memmap on memory, some architectures need more altmap details such
as base_pfn, end_pfn, etc. to unmap vmemmap memory. Instead of computing
them again when we remove a memory block, embed the vmem_altmap details
in struct memory_block if we are using the memmap on memory feature.

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c  | 35 ++++++++++++++++++++++++++---------
 include/linux/memory.h |  8 ++------
 mm/memory_hotplug.c    | 34 ++++++++++++++--------------------
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..10aacaecf8de 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,10 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	if (mem->altmap) {
+		WARN(mem->altmap->alloc, "Altmap not fully unmapped");
+		kfree(mem->altmap);
+	}
 	kfree(mem);
 }
 
@@ -183,7 +187,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +204,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->alloc + mem->altmap->reserve;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +237,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +247,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->alloc + mem->altmap->reserve;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +736,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +754,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	if (altmap) {
+		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+		if (!mem->altmap) {
+			kfree(mem);
+			return -ENOMEM;
+		}
+		memcpy(mem->altmap, altmap, sizeof(*altmap));
+	}
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +800,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +835,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +849,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c4d3fdd31a2..f22831eaa93f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1386,8 +1386,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc + mhp_altmap.reserve,
-					  group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -1988,12 +1987,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct vmem_altmap **altmap = (struct vmem_altmap **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * If we have any pages allocated from altmap
+	 * return the altmap details and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		*altmap = mem->altmap;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2068,9 +2073,8 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
+	int ret;
 	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2093,25 +2097,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &altmap,
+					 get_vmemmap_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
-			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = PHYS_PFN(size) - nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
 		}
 	}
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v3 7/7] mm/hotplug: Embed vmem_altmap details in memory block
@ 2023-07-11  4:48   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-11  4:48 UTC (permalink / raw)
  To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
  Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
	Oscar Salvador

With memmap on memory, some architectures need more details w.r.t. the
altmap, such as base_pfn, end_pfn, etc., to unmap the vmemmap memory.
Instead of computing them again when we remove a memory block, embed the
vmem_altmap details in struct memory_block if we are using the memmap on
memory feature.

No functional change in this patch
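
The intended remove-time flow, condensed into a short sketch (assembled from
the diff below; how the recovered altmap is finally handed to the arch
removal path is outside the visible hunks and is inferred from the
description above):

	/* Stop at the first block that carries an embedded altmap. */
	static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
	{
		struct vmem_altmap **altmap = arg;

		if (mem->altmap) {
			*altmap = mem->altmap;
			return 1;	/* non-zero return ends the walk */
		}
		return 0;
	}

	/* try_remove_memory(): recover the stored details instead of recomputing them */
	walk_memory_blocks(start, size, &altmap, get_vmemmap_altmap_cb);
	/* altmap->base_pfn/alloc/reserve are then available for freeing the vmemmap */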

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/memory.c  | 35 ++++++++++++++++++++++++++---------
 include/linux/memory.h |  8 ++------
 mm/memory_hotplug.c    | 34 ++++++++++++++--------------------
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..10aacaecf8de 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,10 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	if (mem->altmap) {
+		WARN(mem->altmap->alloc, "Altmap not fully unmapped");
+		kfree(mem->altmap);
+	}
 	kfree(mem);
 }
 
@@ -183,7 +187,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +204,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->alloc + mem->altmap->reserve;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +237,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +247,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->alloc + mem->altmap->reserve;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +736,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +754,14 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	if (altmap) {
+		mem->altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+		if (!mem->altmap) {
+			kfree(mem);
+			return -ENOMEM;
+		}
+		memcpy(mem->altmap, altmap, sizeof(*altmap));
+	}
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +800,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0,  NULL);
+				MEM_ONLINE, NULL,  NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +835,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +849,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c4d3fdd31a2..f22831eaa93f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1386,8 +1386,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc + mhp_altmap.reserve,
-					  group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -1988,12 +1987,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct vmem_altmap **altmap = (struct vmem_altmap **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * If we have any pages allocated from altmap
+	 * return the altmap details and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		*altmap = mem->altmap;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2068,9 +2073,8 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
+	int ret;
 	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2093,25 +2097,15 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &altmap,
+					 get_vmemmap_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
-			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = PHYS_PFN(size) - nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
 		}
 	}
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-11  4:48   ` Aneesh Kumar K.V
@ 2023-07-11  6:21     ` Huang, Ying
  -1 siblings, 0 replies; 56+ messages in thread
From: Huang, Ying @ 2023-07-11  6:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Add a new kconfig option that can be selected if we want to allow
> pageblock alignment by reserving pages in the vmemmap altmap area.
> This implies we will be reserving some pages for every memoryblock
> This also allows the memmap on memory feature to be widely useful
> with different memory block size values.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/Kconfig          |  9 +++++++
>  mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 58 insertions(+), 10 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 932349271e28..88a1472b2086 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
>  	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>  	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>  
> +config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
> +       bool "Allow Reserving pages for page block aligment"
> +       depends on MHP_MEMMAP_ON_MEMORY
> +       help
> +	This option allows memmap on memory feature to be more useful
> +	with different memory block sizes. This is achieved by marking some pages
> +	in each memory block as reserved so that we can get page-block alignment
> +	for the remaining pages.
> +
>  endif # MEMORY_HOTPLUG
>  
>  config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 07c99b0cc371..f36aec1f7626 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
>  {
>  	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>  	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> -	unsigned long remaining_size = size - vmemmap_size;
>  
> -	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> -		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>  }
>  #endif
>  
>  static bool mhp_supports_memmap_on_memory(unsigned long size)
>  {
> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> +	unsigned long remaining_size = size - vmemmap_size;
> +
>  	/*
>  	 * Besides having arch support and the feature enabled at runtime, we
>  	 * need a few more assumptions to hold true:
> @@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>  	 *       altmap as an alternative source of memory, and we do not exactly
>  	 *       populate a single PMD.
>  	 */
> -	return mhp_memmap_on_memory() &&
> -		size == memory_block_size_bytes() &&
> -		arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +	 /*
> +	  * Without page reservation remaining pages should be pageblock aligned.
> +	  */
> +	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
> +	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
> +}
> +
> +static inline unsigned long memory_block_align_base(unsigned long size)
> +{
> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
> +		unsigned long align;
> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +		unsigned long vmemmap_size;
> +
> +		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;

DIV_ROUND_UP()?

> +		align = pageblock_align(vmemmap_size) - vmemmap_size;
> +		return align;
> +	} else
> +		return 0;
>  }
>  
>  /*
> @@ -1302,7 +1325,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  {
>  	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>  	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> -	struct vmem_altmap mhp_altmap = {};
> +	struct vmem_altmap mhp_altmap = {
> +		.base_pfn =  PHYS_PFN(res->start),
> +		.end_pfn  =  PHYS_PFN(res->end),
> +		.reserve  = memory_block_align_base(resource_size(res)),
> +	};
>  	struct memory_group *group = NULL;
>  	u64 start, size;
>  	bool new_node = false;
> @@ -1347,8 +1374,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  	 */
>  	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
>  		if (mhp_supports_memmap_on_memory(size)) {
> -			mhp_altmap.free = PHYS_PFN(size);
> -			mhp_altmap.base_pfn = PHYS_PFN(start);
> +			mhp_altmap.free = PHYS_PFN(size) - mhp_altmap.reserve;
>  			params.altmap = &mhp_altmap;
>  		}
>  		/* fallback to not using altmap  */
> @@ -1360,7 +1386,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  		goto error;
>  
>  	/* create memory block devices after memory was added */
> -	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
> +	ret = create_memory_block_devices(start, size, mhp_altmap.alloc + mhp_altmap.reserve,
>  					  group);
>  	if (ret) {
>  		arch_remove_memory(start, size, NULL);
> @@ -2260,3 +2286,16 @@ int offline_and_remove_memory(u64 start, u64 size)
>  }
>  EXPORT_SYMBOL_GPL(offline_and_remove_memory);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> +
> +static int __init memory_hotplug_init(void)
> +{
> +
> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
> +	    mhp_memmap_on_memory()) {
> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
> +			memory_block_align_base(memory_block_size_bytes()));
> +
> +	}
> +	return 0;
> +}
> +module_init(memory_hotplug_init);


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
@ 2023-07-11  6:21     ` Huang, Ying
  0 siblings, 0 replies; 56+ messages in thread
From: Huang, Ying @ 2023-07-11  6:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michal Hocko, David Hildenbrand, linux-mm, npiggin, Vishal Verma,
	akpm, linuxppc-dev, Oscar Salvador

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Add a new kconfig option that can be selected if we want to allow
> pageblock alignment by reserving pages in the vmemmap altmap area.
> This implies we will be reserving some pages for every memoryblock
> This also allows the memmap on memory feature to be widely useful
> with different memory block size values.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/Kconfig          |  9 +++++++
>  mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 58 insertions(+), 10 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 932349271e28..88a1472b2086 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
>  	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>  	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>  
> +config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
> +       bool "Allow Reserving pages for page block aligment"
> +       depends on MHP_MEMMAP_ON_MEMORY
> +       help
> +	This option allows memmap on memory feature to be more useful
> +	with different memory block sizes. This is achieved by marking some pages
> +	in each memory block as reserved so that we can get page-block alignment
> +	for the remaining pages.
> +
>  endif # MEMORY_HOTPLUG
>  
>  config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 07c99b0cc371..f36aec1f7626 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
>  {
>  	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>  	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> -	unsigned long remaining_size = size - vmemmap_size;
>  
> -	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> -		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>  }
>  #endif
>  
>  static bool mhp_supports_memmap_on_memory(unsigned long size)
>  {
> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> +	unsigned long remaining_size = size - vmemmap_size;
> +
>  	/*
>  	 * Besides having arch support and the feature enabled at runtime, we
>  	 * need a few more assumptions to hold true:
> @@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>  	 *       altmap as an alternative source of memory, and we do not exactly
>  	 *       populate a single PMD.
>  	 */
> -	return mhp_memmap_on_memory() &&
> -		size == memory_block_size_bytes() &&
> -		arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +	 /*
> +	  * Without page reservation remaining pages should be pageblock aligned.
> +	  */
> +	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
> +	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
> +}
> +
> +static inline unsigned long memory_block_align_base(unsigned long size)
> +{
> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
> +		unsigned long align;
> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +		unsigned long vmemmap_size;
> +
> +		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;

DIV_ROUND_UP()?

> +		align = pageblock_align(vmemmap_size) - vmemmap_size;
> +		return align;
> +	} else
> +		return 0;
>  }
>  
>  /*
> @@ -1302,7 +1325,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  {
>  	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>  	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> -	struct vmem_altmap mhp_altmap = {};
> +	struct vmem_altmap mhp_altmap = {
> +		.base_pfn =  PHYS_PFN(res->start),
> +		.end_pfn  =  PHYS_PFN(res->end),
> +		.reserve  = memory_block_align_base(resource_size(res)),
> +	};
>  	struct memory_group *group = NULL;
>  	u64 start, size;
>  	bool new_node = false;
> @@ -1347,8 +1374,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  	 */
>  	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
>  		if (mhp_supports_memmap_on_memory(size)) {
> -			mhp_altmap.free = PHYS_PFN(size);
> -			mhp_altmap.base_pfn = PHYS_PFN(start);
> +			mhp_altmap.free = PHYS_PFN(size) - mhp_altmap.reserve;
>  			params.altmap = &mhp_altmap;
>  		}
>  		/* fallback to not using altmap  */
> @@ -1360,7 +1386,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  		goto error;
>  
>  	/* create memory block devices after memory was added */
> -	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
> +	ret = create_memory_block_devices(start, size, mhp_altmap.alloc + mhp_altmap.reserve,
>  					  group);
>  	if (ret) {
>  		arch_remove_memory(start, size, NULL);
> @@ -2260,3 +2286,16 @@ int offline_and_remove_memory(u64 start, u64 size)
>  }
>  EXPORT_SYMBOL_GPL(offline_and_remove_memory);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> +
> +static int __init memory_hotplug_init(void)
> +{
> +
> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
> +	    mhp_memmap_on_memory()) {
> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
> +			memory_block_align_base(memory_block_size_bytes()));
> +
> +	}
> +	return 0;
> +}
> +module_init(memory_hotplug_init);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-11  6:21     ` Huang, Ying
@ 2023-07-11  8:20       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11  8:20 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
	Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma

On 7/11/23 11:51 AM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> 
>> Add a new kconfig option that can be selected if we want to allow
>> pageblock alignment by reserving pages in the vmemmap altmap area.
>> This implies we will be reserving some pages for every memoryblock
>> This also allows the memmap on memory feature to be widely useful
>> with different memory block size values.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  mm/Kconfig          |  9 +++++++
>>  mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
>>  2 files changed, 58 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 932349271e28..88a1472b2086 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
>>  	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>>  	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>>  
>> +config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
>> +       bool "Allow Reserving pages for page block aligment"
>> +       depends on MHP_MEMMAP_ON_MEMORY
>> +       help
>> +	This option allows memmap on memory feature to be more useful
>> +	with different memory block sizes. This is achieved by marking some pages
>> +	in each memory block as reserved so that we can get page-block alignment
>> +	for the remaining pages.
>> +
>>  endif # MEMORY_HOTPLUG
>>  
>>  config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 07c99b0cc371..f36aec1f7626 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>  {
>>  	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>>  	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> -	unsigned long remaining_size = size - vmemmap_size;
>>  
>> -	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
>> -		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
>> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>>  }
>>  #endif
>>  
>>  static bool mhp_supports_memmap_on_memory(unsigned long size)
>>  {
>> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>> +	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> +	unsigned long remaining_size = size - vmemmap_size;
>> +
>>  	/*
>>  	 * Besides having arch support and the feature enabled at runtime, we
>>  	 * need a few more assumptions to hold true:
>> @@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>  	 *       altmap as an alternative source of memory, and we do not exactly
>>  	 *       populate a single PMD.
>>  	 */
>> -	return mhp_memmap_on_memory() &&
>> -		size == memory_block_size_bytes() &&
>> -		arch_supports_memmap_on_memory(size);
>> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>> +		return false;
>> +	 /*
>> +	  * Without page reservation remaining pages should be pageblock aligned.
>> +	  */
>> +	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
>> +	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
>> +		return false;
>> +
>> +	return arch_supports_memmap_on_memory(size);
>> +}
>> +
>> +static inline unsigned long memory_block_align_base(unsigned long size)
>> +{
>> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
>> +		unsigned long align;
>> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>> +		unsigned long vmemmap_size;
>> +
>> +		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
> 
> DIV_ROUND_UP()?
> 
>

yes. Will update.

		vmemmap_size = DIV_ROUND_UP(nr_vmemmap_pages * sizeof(struct page), PAGE_SIZE);

-aneesh
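
As a quick illustration of why the rounding matters (assuming 64K pages and
sizeof(struct page) == 64; the 16 MiB block size is hypothetical, chosen only
to make the truncation visible):

	nr_vmemmap_pages = 16 MiB / 64 KiB   = 256
	vmemmap bytes    = 256 * 64          = 16 KiB
	16 KiB >> PAGE_SHIFT                 = 0   (partial page dropped)
	DIV_ROUND_UP(16 KiB, PAGE_SIZE)      = 1   (partial page counted)

With the shift, any vmemmap that does not fill its last page is
under-counted, so the reserve derived from pageblock_align() comes out too
small.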



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
@ 2023-07-11  8:20       ` Aneesh Kumar K V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11  8:20 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Michal Hocko, David Hildenbrand, linux-mm, npiggin, Vishal Verma,
	akpm, linuxppc-dev, Oscar Salvador

On 7/11/23 11:51 AM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> 
>> Add a new kconfig option that can be selected if we want to allow
>> pageblock alignment by reserving pages in the vmemmap altmap area.
>> This implies we will be reserving some pages for every memoryblock
>> This also allows the memmap on memory feature to be widely useful
>> with different memory block size values.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  mm/Kconfig          |  9 +++++++
>>  mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
>>  2 files changed, 58 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 932349271e28..88a1472b2086 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
>>  	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>>  	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>>  
>> +config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
>> +       bool "Allow Reserving pages for page block aligment"
>> +       depends on MHP_MEMMAP_ON_MEMORY
>> +       help
>> +	This option allows memmap on memory feature to be more useful
>> +	with different memory block sizes. This is achieved by marking some pages
>> +	in each memory block as reserved so that we can get page-block alignment
>> +	for the remaining pages.
>> +
>>  endif # MEMORY_HOTPLUG
>>  
>>  config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 07c99b0cc371..f36aec1f7626 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>  {
>>  	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>>  	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> -	unsigned long remaining_size = size - vmemmap_size;
>>  
>> -	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
>> -		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
>> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>>  }
>>  #endif
>>  
>>  static bool mhp_supports_memmap_on_memory(unsigned long size)
>>  {
>> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>> +	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> +	unsigned long remaining_size = size - vmemmap_size;
>> +
>>  	/*
>>  	 * Besides having arch support and the feature enabled at runtime, we
>>  	 * need a few more assumptions to hold true:
>> @@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>  	 *       altmap as an alternative source of memory, and we do not exactly
>>  	 *       populate a single PMD.
>>  	 */
>> -	return mhp_memmap_on_memory() &&
>> -		size == memory_block_size_bytes() &&
>> -		arch_supports_memmap_on_memory(size);
>> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>> +		return false;
>> +	 /*
>> +	  * Without page reservation remaining pages should be pageblock aligned.
>> +	  */
>> +	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
>> +	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
>> +		return false;
>> +
>> +	return arch_supports_memmap_on_memory(size);
>> +}
>> +
>> +static inline unsigned long memory_block_align_base(unsigned long size)
>> +{
>> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
>> +		unsigned long align;
>> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>> +		unsigned long vmemmap_size;
>> +
>> +		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
> 
> DIV_ROUND_UP()?
> 
>

yes. Will update.

		vmemmap_size = DIV_ROUND_UP(nr_vmemmap_pages * sizeof(struct page), PAGE_SIZE);

-aneesh


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 6/7] dax/kmem: Always enroll hotplugged memory for memmap_on_memory
  2023-07-11  4:48   ` Aneesh Kumar K.V
@ 2023-07-11 10:21     ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 10:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma, Rafael J. Wysocki,
	Len Brown, Dan Williams, Dave Jiang, Dave Hansen, Huang Ying

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> From: Vishal Verma <vishal.l.verma@intel.com>
> 
> With DAX memory regions originating from CXL memory expanders or
> NVDIMMs, the kmem driver may be hot-adding huge amounts of system memory
> on a system without enough 'regular' main memory to support the memmap
> for it. To avoid this, ensure that all kmem managed hotplugged memory is
> added with the MHP_MEMMAP_ON_MEMORY flag to place the memmap on the
> new memory region being hot added.
> 
> To do this, call add_memory() in chunks of memory_block_size_bytes() as
> that is a requirement for memmap_on_memory.
> 
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---


See my feedback in the reply to the original patch.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 6/7] dax/kmem: Always enroll hotplugged memory for memmap_on_memory
@ 2023-07-11 10:21     ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 10:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Michal Hocko, Dave Jiang, Rafael J. Wysocki, Vishal Verma,
	Dave Hansen, Huang Ying, Dan Williams, Oscar Salvador, Len Brown

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> From: Vishal Verma <vishal.l.verma@intel.com>
> 
> With DAX memory regions originating from CXL memory expanders or
> NVDIMMs, the kmem driver may be hot-adding huge amounts of system memory
> on a system without enough 'regular' main memory to support the memmap
> for it. To avoid this, ensure that all kmem managed hotplugged memory is
> added with the MHP_MEMMAP_ON_MEMORY flag to place the memmap on the
> new memory region being hot added.
> 
> To do this, call add_memory() in chunks of memory_block_size_bytes() as
> that is a requirement for memmap_on_memory.
> 
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---


See my feedback in the reply to the original patch.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback
  2023-07-11  4:48   ` Aneesh Kumar K.V
@ 2023-07-11 10:23     ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 10:23 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

> -bool mhp_supports_memmap_on_memory(unsigned long size)
> +static bool mhp_supports_memmap_on_memory(unsigned long size)
>   {
>   	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
>   	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> @@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>   	 * Self hosted memmap array
>   	 */
>   	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
> -		if (!mhp_supports_memmap_on_memory(size)) {
> -			ret = -EINVAL;
> -			goto error;
> +		if (mhp_supports_memmap_on_memory(size)) {
> +			mhp_altmap.free = PHYS_PFN(size);
> +			mhp_altmap.base_pfn = PHYS_PFN(start);
> +			params.altmap = &mhp_altmap;
>   		}
> -		mhp_altmap.free = PHYS_PFN(size);
> -		mhp_altmap.base_pfn = PHYS_PFN(start);
> -		params.altmap = &mhp_altmap;
> +		/* fallback to not using altmap  */
>   	}
>   
>   	/* call arch's memory hotadd */

In general, LGTM, but please extend the documentation of the parameter 
in memory_hotplug.h, stating that this is just a hint and that the core 
can decide not to do that.
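
Something along these lines next to the MHP_MEMMAP_ON_MEMORY definition in
include/linux/memory_hotplug.h would cover it (suggested wording only, not
text taken from the patch):

	/*
	 * MHP_MEMMAP_ON_MEMORY is only a hint: if the size/alignment checks in
	 * mhp_supports_memmap_on_memory() are not met, the core silently falls
	 * back to allocating the memmap from regular memory instead of failing
	 * the hot-add request.
	 */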

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback
@ 2023-07-11 10:23     ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 10:23 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

> -bool mhp_supports_memmap_on_memory(unsigned long size)
> +static bool mhp_supports_memmap_on_memory(unsigned long size)
>   {
>   	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
>   	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> @@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>   	 * Self hosted memmap array
>   	 */
>   	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
> -		if (!mhp_supports_memmap_on_memory(size)) {
> -			ret = -EINVAL;
> -			goto error;
> +		if (mhp_supports_memmap_on_memory(size)) {
> +			mhp_altmap.free = PHYS_PFN(size);
> +			mhp_altmap.base_pfn = PHYS_PFN(start);
> +			params.altmap = &mhp_altmap;
>   		}
> -		mhp_altmap.free = PHYS_PFN(size);
> -		mhp_altmap.base_pfn = PHYS_PFN(start);
> -		params.altmap = &mhp_altmap;
> +		/* fallback to not using altmap  */
>   	}
>   
>   	/* call arch's memory hotadd */

In general, LGTM, but please extend the documentation of the parameter 
in memory_hotplug.h, stating that this is just a hint and that the core 
can decide not to do that.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-11  4:48   ` Aneesh Kumar K.V
@ 2023-07-11 10:36     ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 10:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> Some architectures would want different restrictions. Hence add an
> architecture-specific override.
> 
> Both the PMD_SIZE check and pageblock alignment check are moved there.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/memory_hotplug.c | 17 ++++++++++++-----
>   1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1b19462f4e72..07c99b0cc371 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   	return device_online(&mem->dev);
>   }
>   
> -static bool mhp_supports_memmap_on_memory(unsigned long size)
> +#ifndef arch_supports_memmap_on_memory

Can we make that a __weak function instead?
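
A rough sketch of that alternative (the exact placement and the body of the
default are assumptions for illustration, not what the patch implements):

	/* mm/memory_hotplug.c: weak default, overridable by the architecture */
	bool __weak arch_supports_memmap_on_memory(unsigned long size)
	{
		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
		unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);

		/* By default, require the vmemmap to span complete PMDs. */
		return IS_ALIGNED(vmemmap_size, PMD_SIZE);
	}

An architecture would then provide a strong (non-weak) definition of the same
symbol instead of defining the arch_supports_memmap_on_memory macro; unlike
the #ifndef + static inline variant, the weak default cannot be inlined and
needs an extern declaration in memory_hotplug.h.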

> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>   {
> -	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>   	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>   	unsigned long remaining_size = size - vmemmap_size;
>   
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> +		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));

You're moving that check back to mhp_supports_memmap_on_memory() in the 
following patch, where it actually belongs. So this check should stay in 
mhp_supports_memmap_on_memory(). Might be reasonable to factor out the 
vmemmap_size calculation.


Also, let's add a comment

/*
  * As default, we want the vmemmap to span a complete PMD such that we
  * can map the vmemmap using a single PMD if supported by the
  * architecture.
  */
return IS_ALIGNED(vmemmap_size, PMD_SIZE);

> +}
> +#endif
> +
> +static bool mhp_supports_memmap_on_memory(unsigned long size)
> +{
>   	/*
>   	 * Besides having arch support and the feature enabled at runtime, we
>   	 * need a few more assumptions to hold true:
> @@ -1280,9 +1288,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 *       populate a single PMD.
>   	 */
>   	return mhp_memmap_on_memory() &&
> -	       size == memory_block_size_bytes() &&
> -	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> +		size == memory_block_size_bytes() &&

If you keep the properly aligned indentation, this will not be detected 
as a change by git.

> +		arch_supports_memmap_on_memory(size);
>   }
>   
>   /*

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
@ 2023-07-11 10:36     ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 10:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> Some architectures would want different restrictions. Hence add an
> architecture-specific override.
> 
> Both the PMD_SIZE check and pageblock alignment check are moved there.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/memory_hotplug.c | 17 ++++++++++++-----
>   1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1b19462f4e72..07c99b0cc371 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   	return device_online(&mem->dev);
>   }
>   
> -static bool mhp_supports_memmap_on_memory(unsigned long size)
> +#ifndef arch_supports_memmap_on_memory

Can we make that a __weak function instead?

> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>   {
> -	unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>   	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>   	unsigned long remaining_size = size - vmemmap_size;
>   
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> +		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));

You're moving that check back to mhp_supports_memmap_on_memory() in the 
following patch, where it actually belongs. So this check should stay in 
mhp_supports_memmap_on_memory(). Might be reasonable to factor out the 
vmemmap_size calculation.


Also, let's add a comment

/*
  * As default, we want the vmemmap to span a complete PMD such that we
  * can map the vmemmap using a single PMD if supported by the
  * architecture.
  */
return IS_ALIGNED(vmemmap_size, PMD_SIZE);

> +}
> +#endif
> +
> +static bool mhp_supports_memmap_on_memory(unsigned long size)
> +{
>   	/*
>   	 * Besides having arch support and the feature enabled at runtime, we
>   	 * need a few more assumptions to hold true:
> @@ -1280,9 +1288,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 *       populate a single PMD.
>   	 */
>   	return mhp_memmap_on_memory() &&
> -	       size == memory_block_size_bytes() &&
> -	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> +		size == memory_block_size_bytes() &&

If you keep the properly aligned indentation, this will not be detected 
as a change by git.

> +		arch_supports_memmap_on_memory(size);
>   }
>   
>   /*

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-11  4:48   ` Aneesh Kumar K.V
@ 2023-07-11 15:26     ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 15:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> Radix vmemmap mapping can map things correctly at the PMD level or PTE
> level based on different device boundary checks. Hence we skip the
> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
> makes the feature widely useful because to use PMD_SIZE vmemmap area we
> require a memory block size of 2GiB
> 
> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
> can work with a memory block size of 256MB. Using altmap.reserve feature
> to align things correctly at pageblock granularity. We can end up
> losing some pages in memory with this. For ex: with a 256MiB memory block
> size, we require 4 pages to map vmemmap pages, In order to align things
> correctly we end up adding a reserve of 28 pages. ie, for every 4096
> pages 28 pages get reserved.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/Kconfig                          |  1 +
>   arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>   .../platforms/pseries/hotplug-memory.c        |  3 +-
>   mm/memory_hotplug.c                           |  2 ++
>   4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 116d6add0bb0..f890907e5bbf 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -157,6 +157,7 @@ config PPC
>   	select ARCH_HAS_UBSAN_SANITIZE_ALL
>   	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   	select ARCH_KEEP_MEMBLOCK
> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
>   	select ARCH_MIGHT_HAVE_PC_PARPORT
>   	select ARCH_MIGHT_HAVE_PC_SERIO
>   	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 68817ea7f994..8e6c92dde6ad 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>   int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>   bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>   			   unsigned long page_size);
> +/*
> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
> + * some of the restrictions. We don't check for PMD_SIZE because our
> + * vmemmap allocation code can fallback correctly. The pageblock
> + * alignment requirement is met using altmap->reserve blocks.
> + */
> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
> +{
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
> +
> +	if (!radix_enabled())
> +		return false;
> +
> +#ifdef CONFIG_PPC_4K_PAGES
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
> +#else
> +	/*
> +	 * Make sure the vmemmap allocation is fully contianed
> +	 * so that we always allocate vmemmap memory from altmap area.
> +	 * The pageblock alignment requirement is met by using
> +	 * reserve blocks in altmap.
> +	 */
> +	return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);

Can we move that check into common code as well?

If our (original) vmemmap size would not fit into a single page, we
would be in trouble on any architecture. I did not check whether it would
be an issue for arm64 as well if we eventually allow wasting memory there.
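
For scale, the numbers from the description of patch 5/7 work out roughly as
follows (assuming 64K pages, sizeof(struct page) == 64 and a 32-page
pageblock):

	256 MiB block   = 4096 pages of 64 KiB
	vmemmap         = 4096 * 64 B = 256 KiB = 4 pages
	pageblock pad   = 32 - 4 = 28 reserved pages
	lost to padding = 28 / 4096, i.e. ~0.7% of each hot-added block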

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
@ 2023-07-11 15:26     ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 15:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> Radix vmemmap mapping can map things correctly at the PMD level or PTE
> level based on different device boundary checks. Hence we skip the
> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
> makes the feature widely useful because to use PMD_SIZE vmemmap area we
> require a memory block size of 2GiB
> 
> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
> can work with a memory block size of 256MB. Using altmap.reserve feature
> to align things correctly at pageblock granularity. We can end up
> losing some pages in memory with this. For ex: with a 256MiB memory block
> size, we require 4 pages to map vmemmap pages, In order to align things
> correctly we end up adding a reserve of 28 pages. ie, for every 4096
> pages 28 pages get reserved.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/Kconfig                          |  1 +
>   arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>   .../platforms/pseries/hotplug-memory.c        |  3 +-
>   mm/memory_hotplug.c                           |  2 ++
>   4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 116d6add0bb0..f890907e5bbf 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -157,6 +157,7 @@ config PPC
>   	select ARCH_HAS_UBSAN_SANITIZE_ALL
>   	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   	select ARCH_KEEP_MEMBLOCK
> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
>   	select ARCH_MIGHT_HAVE_PC_PARPORT
>   	select ARCH_MIGHT_HAVE_PC_SERIO
>   	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 68817ea7f994..8e6c92dde6ad 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>   int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>   bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>   			   unsigned long page_size);
> +/*
> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
> + * some of the restrictions. We don't check for PMD_SIZE because our
> + * vmemmap allocation code can fallback correctly. The pageblock
> + * alignment requirement is met using altmap->reserve blocks.
> + */
> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
> +{
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	unsigned long vmemmap_size = nr_pages * sizeof(struct page);
> +
> +	if (!radix_enabled())
> +		return false;
> +
> +#ifdef CONFIG_PPC_4K_PAGES
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
> +#else
> +	/*
> +	 * Make sure the vmemmap allocation is fully contianed
> +	 * so that we always allocate vmemmap memory from altmap area.
> +	 * The pageblock alignment requirement is met by using
> +	 * reserve blocks in altmap.
> +	 */
> +	return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);

Can we move that check into common code as well?

If our (original) vmemmap size would not fit into a single page, we
would be in trouble on any architecture. I did not check whether it would
be an issue for arm64 as well if we eventually allow wasting memory there.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-11 15:26     ` David Hildenbrand
@ 2023-07-11 15:40       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 15:40 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/11/23 8:56 PM, David Hildenbrand wrote:
> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>> level based on different device boundary checks. Hence we skip the
>> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
>> makes the feature widely useful because to use PMD_SIZE vmemmap area we
>> require a memory block size of 2GiB
>>
>> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
>> can work with a memory block size of 256MB. Using altmap.reserve feature
>> to align things correctly at pageblock granularity. We can end up
>> losing some pages in memory with this. For ex: with a 256MiB memory block
>> size, we require 4 pages to map vmemmap pages, In order to align things
>> correctly we end up adding a reserve of 28 pages. ie, for every 4096
>> pages 28 pages get reserved.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/Kconfig                          |  1 +
>>   arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>>   .../platforms/pseries/hotplug-memory.c        |  3 +-
>>   mm/memory_hotplug.c                           |  2 ++
>>   4 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 116d6add0bb0..f890907e5bbf 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -157,6 +157,7 @@ config PPC
>>       select ARCH_HAS_UBSAN_SANITIZE_ALL
>>       select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>       select ARCH_KEEP_MEMBLOCK
>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>       select ARCH_MIGHT_HAVE_PC_PARPORT
>>       select ARCH_MIGHT_HAVE_PC_SERIO
>>       select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>> index 68817ea7f994..8e6c92dde6ad 100644
>> --- a/arch/powerpc/include/asm/pgtable.h
>> +++ b/arch/powerpc/include/asm/pgtable.h
>> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>>   int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>>   bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>>                  unsigned long page_size);
>> +/*
>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>> + * some of the restrictions. We don't check for PMD_SIZE because our
>> + * vmemmap allocation code can fallback correctly. The pageblock
>> + * alignment requirement is met using altmap->reserve blocks.
>> + */
>> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>> +{
>> +    unsigned long nr_pages = size >> PAGE_SHIFT;
>> +    unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>> +
>> +    if (!radix_enabled())
>> +        return false;
>> +
>> +#ifdef CONFIG_PPC_4K_PAGES
>> +    return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>> +#else
>> +    /*
>> +     * Make sure the vmemmap allocation is fully contianed
>> +     * so that we always allocate vmemmap memory from altmap area.
>> +     * The pageblock alignment requirement is met by using
>> +     * reserve blocks in altmap.
>> +     */
>> +    return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
> 
> Can we move that check into common code as well?
> 
> If our (original) vmemmap size would not fit into a single page, we would be in trouble on any architecture. Did not check if it would be an issue for arm64 as well in case we would allow eventually wasting memory.
> 


For x86 and arm we already do IS_ALIGNED(vmemmap_size, PMD_SIZE) in arch_supports_memmap_on_memory(), and that should imply PAGE_SIZE alignment.
If arm64 allows the usage of altmap.reserve, I would expect arch_supports_memmap_on_memory() to carry the PAGE_SIZE check.

Doesn't adding the PAGE_SIZE check in mhp_supports_memmap_on_memory() make it a redundant check for x86 and arm currently?

modified   mm/memory_hotplug.c
@@ -1293,6 +1293,13 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 */
 	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
 		return false;
+
+	/*
+	 * Make sure the vmemmap allocation is fully contianed
+	 * so that we always allocate vmemmap memory from altmap area.
+	 */
+	if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
+		return false;
 	 /*
 	  * Without page reservation remaining pages should be pageblock aligned.
 	  */
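
To make the proposed hunk easier to read in context, here is a rough sketch of the resulting function with the vmemmap size computed locally. It is trimmed to the checks discussed here, so the exact shape is an assumption rather than the posted code:

static bool mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);

	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
		return false;

	/*
	 * Make sure the vmemmap allocation is fully contained in a page
	 * so that we always allocate vmemmap memory from the altmap area.
	 */
	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
		return false;

	return arch_supports_memmap_on_memory(size);
}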




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
@ 2023-07-11 15:40       ` Aneesh Kumar K V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 15:40 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/11/23 8:56 PM, David Hildenbrand wrote:
> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>> level based on different device boundary checks. Hence we skip the
>> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
>> makes the feature widely useful because to use PMD_SIZE vmemmap area we
>> require a memory block size of 2GiB
>>
>> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
>> can work with a memory block size of 256MB. Using altmap.reserve feature
>> to align things correctly at pageblock granularity. We can end up
>> losing some pages in memory with this. For ex: with a 256MiB memory block
>> size, we require 4 pages to map vmemmap pages, In order to align things
>> correctly we end up adding a reserve of 28 pages. ie, for every 4096
>> pages 28 pages get reserved.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/Kconfig                          |  1 +
>>   arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>>   .../platforms/pseries/hotplug-memory.c        |  3 +-
>>   mm/memory_hotplug.c                           |  2 ++
>>   4 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 116d6add0bb0..f890907e5bbf 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -157,6 +157,7 @@ config PPC
>>       select ARCH_HAS_UBSAN_SANITIZE_ALL
>>       select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>       select ARCH_KEEP_MEMBLOCK
>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>       select ARCH_MIGHT_HAVE_PC_PARPORT
>>       select ARCH_MIGHT_HAVE_PC_SERIO
>>       select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>> index 68817ea7f994..8e6c92dde6ad 100644
>> --- a/arch/powerpc/include/asm/pgtable.h
>> +++ b/arch/powerpc/include/asm/pgtable.h
>> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>>   int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>>   bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>>                  unsigned long page_size);
>> +/*
>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>> + * some of the restrictions. We don't check for PMD_SIZE because our
>> + * vmemmap allocation code can fallback correctly. The pageblock
>> + * alignment requirement is met using altmap->reserve blocks.
>> + */
>> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>> +{
>> +    unsigned long nr_pages = size >> PAGE_SHIFT;
>> +    unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>> +
>> +    if (!radix_enabled())
>> +        return false;
>> +
>> +#ifdef CONFIG_PPC_4K_PAGES
>> +    return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>> +#else
>> +    /*
>> +     * Make sure the vmemmap allocation is fully contianed
>> +     * so that we always allocate vmemmap memory from altmap area.
>> +     * The pageblock alignment requirement is met by using
>> +     * reserve blocks in altmap.
>> +     */
>> +    return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
> 
> Can we move that check into common code as well?
> 
> If our (original) vmemmap size would not fit into a single page, we would be in trouble on any architecture. Did not check if it would be an issue for arm64 as well in case we would allow eventually wasting memory.
> 


For x86 and arm we already do IS_ALIGNED(vmemmap_size, PMD_SIZE) in arch_supports_memmap_on_memory(), and that should imply PAGE_SIZE alignment.
If arm64 allows the usage of altmap.reserve, I would expect arch_supports_memmap_on_memory() to carry the PAGE_SIZE check.

Doesn't adding the PAGE_SIZE check in mhp_supports_memmap_on_memory() make it a redundant check for x86 and arm currently?

modified   mm/memory_hotplug.c
@@ -1293,6 +1293,13 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 */
 	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
 		return false;
+
+	/*
+	 * Make sure the vmemmap allocation is fully contianed
+	 * so that we always allocate vmemmap memory from altmap area.
+	 */
+	if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
+		return false;
 	 /*
 	  * Without page reservation remaining pages should be pageblock aligned.
 	  */
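
To make the proposed hunk easier to read in context, here is a rough sketch of the resulting function with the vmemmap size computed locally. It is trimmed to the checks discussed here, so the exact shape is an assumption rather than the posted code:

static bool mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);

	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
		return false;

	/*
	 * Make sure the vmemmap allocation is fully contained in a page
	 * so that we always allocate vmemmap memory from the altmap area.
	 */
	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
		return false;

	return arch_supports_memmap_on_memory(size);
}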



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-11 15:40       ` Aneesh Kumar K V
@ 2023-07-11 15:44         ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 15:44 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 11.07.23 17:40, Aneesh Kumar K V wrote:
> On 7/11/23 8:56 PM, David Hildenbrand wrote:
>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>>> level based on different device boundary checks. Hence we skip the
>>> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
>>> makes the feature widely useful because to use PMD_SIZE vmemmap area we
>>> require a memory block size of 2GiB
>>>
>>> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
>>> can work with a memory block size of 256MB. Using altmap.reserve feature
>>> to align things correctly at pageblock granularity. We can end up
>>> losing some pages in memory with this. For ex: with a 256MiB memory block
>>> size, we require 4 pages to map vmemmap pages, In order to align things
>>> correctly we end up adding a reserve of 28 pages. ie, for every 4096
>>> pages 28 pages get reserved.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>    arch/powerpc/Kconfig                          |  1 +
>>>    arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>>>    .../platforms/pseries/hotplug-memory.c        |  3 +-
>>>    mm/memory_hotplug.c                           |  2 ++
>>>    4 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index 116d6add0bb0..f890907e5bbf 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -157,6 +157,7 @@ config PPC
>>>        select ARCH_HAS_UBSAN_SANITIZE_ALL
>>>        select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>        select ARCH_KEEP_MEMBLOCK
>>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>>        select ARCH_MIGHT_HAVE_PC_PARPORT
>>>        select ARCH_MIGHT_HAVE_PC_SERIO
>>>        select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>>> index 68817ea7f994..8e6c92dde6ad 100644
>>> --- a/arch/powerpc/include/asm/pgtable.h
>>> +++ b/arch/powerpc/include/asm/pgtable.h
>>> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>>>    int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>>>    bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>>>                   unsigned long page_size);
>>> +/*
>>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>>> + * some of the restrictions. We don't check for PMD_SIZE because our
>>> + * vmemmap allocation code can fallback correctly. The pageblock
>>> + * alignment requirement is met using altmap->reserve blocks.
>>> + */
>>> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
>>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>> +{
>>> +    unsigned long nr_pages = size >> PAGE_SHIFT;
>>> +    unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>> +
>>> +    if (!radix_enabled())
>>> +        return false;
>>> +
>>> +#ifdef CONFIG_PPC_4K_PAGES
>>> +    return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>>> +#else
>>> +    /*
>>> +     * Make sure the vmemmap allocation is fully contianed
>>> +     * so that we always allocate vmemmap memory from altmap area.
>>> +     * The pageblock alignment requirement is met by using
>>> +     * reserve blocks in altmap.
>>> +     */
>>> +    return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
>>
>> Can we move that check into common code as well?
>>
>> If our (original) vmemmap size would not fit into a single page, we would be in trouble on any architecture. Did not check if it would be an issue for arm64 as well in case we would allow eventually wasting memory.
>>
> 
> 
> For x86 and arm we already do IS_ALIGNED(vmemmap_size, PMD_SIZE); in arch_supports_memmap_on_memory(). That should imply PAGE_SIZE alignment.
> If arm64 allow the usage of altmap.reserve, I would expect the arch_supports_memmap_on_memory to have the PAGE_SIZE check.
> 
> Adding the PAGE_SIZE check in  mhp_supports_memmap_on_memory() makes it redundant check for x86 and arm currently?

IMHO not an issue. The common code check is a bit weaker and the arch 
check a bit stronger.

> 
> modified   mm/memory_hotplug.c
> @@ -1293,6 +1293,13 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 */
>   	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>   		return false;
> +
> +	/*
> +	 * Make sure the vmemmap allocation is fully contianed

s/contianed/contained/

> +	 * so that we always allocate vmemmap memory from altmap area.

In theory, it's not only the vmemmap size, but also the vmemmap start
(that it doesn't start somewhere in between a page, crossing a page). I
suspect the start is always guaranteed to be aligned (if the vmemmap
size is aligned), correct?

> +	 */
> +	if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
> +		return false;
>   	 /*
>   	  * Without page reservation remaining pages should be pageblock aligned.
>   	  */
> 
> 

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
@ 2023-07-11 15:44         ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 15:44 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 11.07.23 17:40, Aneesh Kumar K V wrote:
> On 7/11/23 8:56 PM, David Hildenbrand wrote:
>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>>> level based on different device boundary checks. Hence we skip the
>>> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
>>> makes the feature widely useful because to use PMD_SIZE vmemmap area we
>>> require a memory block size of 2GiB
>>>
>>> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
>>> can work with a memory block size of 256MB. Using altmap.reserve feature
>>> to align things correctly at pageblock granularity. We can end up
>>> losing some pages in memory with this. For ex: with a 256MiB memory block
>>> size, we require 4 pages to map vmemmap pages, In order to align things
>>> correctly we end up adding a reserve of 28 pages. ie, for every 4096
>>> pages 28 pages get reserved.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>    arch/powerpc/Kconfig                          |  1 +
>>>    arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>>>    .../platforms/pseries/hotplug-memory.c        |  3 +-
>>>    mm/memory_hotplug.c                           |  2 ++
>>>    4 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index 116d6add0bb0..f890907e5bbf 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -157,6 +157,7 @@ config PPC
>>>        select ARCH_HAS_UBSAN_SANITIZE_ALL
>>>        select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>        select ARCH_KEEP_MEMBLOCK
>>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>>        select ARCH_MIGHT_HAVE_PC_PARPORT
>>>        select ARCH_MIGHT_HAVE_PC_SERIO
>>>        select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>>> index 68817ea7f994..8e6c92dde6ad 100644
>>> --- a/arch/powerpc/include/asm/pgtable.h
>>> +++ b/arch/powerpc/include/asm/pgtable.h
>>> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>>>    int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>>>    bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>>>                   unsigned long page_size);
>>> +/*
>>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>>> + * some of the restrictions. We don't check for PMD_SIZE because our
>>> + * vmemmap allocation code can fallback correctly. The pageblock
>>> + * alignment requirement is met using altmap->reserve blocks.
>>> + */
>>> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
>>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>> +{
>>> +    unsigned long nr_pages = size >> PAGE_SHIFT;
>>> +    unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>> +
>>> +    if (!radix_enabled())
>>> +        return false;
>>> +
>>> +#ifdef CONFIG_PPC_4K_PAGES
>>> +    return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>>> +#else
>>> +    /*
>>> +     * Make sure the vmemmap allocation is fully contianed
>>> +     * so that we always allocate vmemmap memory from altmap area.
>>> +     * The pageblock alignment requirement is met by using
>>> +     * reserve blocks in altmap.
>>> +     */
>>> +    return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
>>
>> Can we move that check into common code as well?
>>
>> If our (original) vmemmap size would not fit into a single page, we would be in trouble on any architecture. Did not check if it would be an issue for arm64 as well in case we would allow eventually wasting memory.
>>
> 
> 
> For x86 and arm we already do IS_ALIGNED(vmemmap_size, PMD_SIZE); in arch_supports_memmap_on_memory(). That should imply PAGE_SIZE alignment.
> If arm64 allow the usage of altmap.reserve, I would expect the arch_supports_memmap_on_memory to have the PAGE_SIZE check.
> 
> Adding the PAGE_SIZE check in  mhp_supports_memmap_on_memory() makes it redundant check for x86 and arm currently?

IMHO not an issue. The common code check is a bit weaker and the arch 
check a bit stronger.

> 
> modified   mm/memory_hotplug.c
> @@ -1293,6 +1293,13 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 */
>   	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>   		return false;
> +
> +	/*
> +	 * Make sure the vmemmap allocation is fully contianed

s/contianed/contained/

> +	 * so that we always allocate vmemmap memory from altmap area.

In theory, it's not only the vmemmap size, but also the vmemmap start
(that it doesn't start somewhere in between a page, crossing a page). I
suspect the start is always guaranteed to be aligned (if the vmemmap
size is aligned), correct?

> +	 */
> +	if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
> +		return false;
>   	 /*
>   	  * Without page reservation remaining pages should be pageblock aligned.
>   	  */
> 
> 

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
  2023-07-11 15:44         ` David Hildenbrand
@ 2023-07-11 15:46           ` Aneesh Kumar K V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 15:46 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/11/23 9:14 PM, David Hildenbrand wrote:
> On 11.07.23 17:40, Aneesh Kumar K V wrote:
>> On 7/11/23 8:56 PM, David Hildenbrand wrote:
>>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>>>> level based on different device boundary checks. Hence we skip the
>>>> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
>>>> makes the feature widely useful because to use PMD_SIZE vmemmap area we
>>>> require a memory block size of 2GiB
>>>>
>>>> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
>>>> can work with a memory block size of 256MB. Using altmap.reserve feature
>>>> to align things correctly at pageblock granularity. We can end up
>>>> losing some pages in memory with this. For ex: with a 256MiB memory block
>>>> size, we require 4 pages to map vmemmap pages, In order to align things
>>>> correctly we end up adding a reserve of 28 pages. ie, for every 4096
>>>> pages 28 pages get reserved.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>    arch/powerpc/Kconfig                          |  1 +
>>>>    arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>>>>    .../platforms/pseries/hotplug-memory.c        |  3 +-
>>>>    mm/memory_hotplug.c                           |  2 ++
>>>>    4 files changed, 33 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>>> index 116d6add0bb0..f890907e5bbf 100644
>>>> --- a/arch/powerpc/Kconfig
>>>> +++ b/arch/powerpc/Kconfig
>>>> @@ -157,6 +157,7 @@ config PPC
>>>>        select ARCH_HAS_UBSAN_SANITIZE_ALL
>>>>        select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>        select ARCH_KEEP_MEMBLOCK
>>>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>>>        select ARCH_MIGHT_HAVE_PC_PARPORT
>>>>        select ARCH_MIGHT_HAVE_PC_SERIO
>>>>        select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>>>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>>>> index 68817ea7f994..8e6c92dde6ad 100644
>>>> --- a/arch/powerpc/include/asm/pgtable.h
>>>> +++ b/arch/powerpc/include/asm/pgtable.h
>>>> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>>>>    int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>>>>    bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>>>>                   unsigned long page_size);
>>>> +/*
>>>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>>>> + * some of the restrictions. We don't check for PMD_SIZE because our
>>>> + * vmemmap allocation code can fallback correctly. The pageblock
>>>> + * alignment requirement is met using altmap->reserve blocks.
>>>> + */
>>>> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
>>>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>>> +{
>>>> +    unsigned long nr_pages = size >> PAGE_SHIFT;
>>>> +    unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>>> +
>>>> +    if (!radix_enabled())
>>>> +        return false;
>>>> +
>>>> +#ifdef CONFIG_PPC_4K_PAGES
>>>> +    return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>>>> +#else
>>>> +    /*
>>>> +     * Make sure the vmemmap allocation is fully contianed
>>>> +     * so that we always allocate vmemmap memory from altmap area.
>>>> +     * The pageblock alignment requirement is met by using
>>>> +     * reserve blocks in altmap.
>>>> +     */
>>>> +    return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
>>>
>>> Can we move that check into common code as well?
>>>
>>> If our (original) vmemmap size would not fit into a single page, we would be in trouble on any architecture. Did not check if it would be an issue for arm64 as well in case we would allow eventually wasting memory.
>>>
>>
>>
>> For x86 and arm we already do IS_ALIGNED(vmemmap_size, PMD_SIZE); in arch_supports_memmap_on_memory(). That should imply PAGE_SIZE alignment.
>> If arm64 allow the usage of altmap.reserve, I would expect the arch_supports_memmap_on_memory to have the PAGE_SIZE check.
>>
>> Adding the PAGE_SIZE check in  mhp_supports_memmap_on_memory() makes it redundant check for x86 and arm currently?
> 
> IMHO not an issue. The common code check is a bit weaker and the arch check a bit stronger.
> 
>>

ok will update

>> modified   mm/memory_hotplug.c
>> @@ -1293,6 +1293,13 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>        */
>>       if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>>           return false;
>> +
>> +    /*
>> +     * Make sure the vmemmap allocation is fully contianed
> 
> s/contianed/contained/
> 
>> +     * so that we always allocate vmemmap memory from altmap area.
> 
> In theory, it's not only the vmemmap size, but also the vmemmap start (that it doesn't start somewhere in between a page, crossing a page). I suspect the start is always guaranteed to be aligned (if the vmemmap size is aligned), correct?
> 

That is correct.

>> +     */
>> +    if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
>> +        return false;
>>        /*
>>         * Without page reservation remaining pages should be pageblock aligned.
>>         */
>>


-aneesh
> 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
@ 2023-07-11 15:46           ` Aneesh Kumar K V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 15:46 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/11/23 9:14 PM, David Hildenbrand wrote:
> On 11.07.23 17:40, Aneesh Kumar K V wrote:
>> On 7/11/23 8:56 PM, David Hildenbrand wrote:
>>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>>>> level based on different device boundary checks. Hence we skip the
>>>> restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
>>>> makes the feature widely useful because to use PMD_SIZE vmemmap area we
>>>> require a memory block size of 2GiB
>>>>
>>>> We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature
>>>> can work with a memory block size of 256MB. Using altmap.reserve feature
>>>> to align things correctly at pageblock granularity. We can end up
>>>> losing some pages in memory with this. For ex: with a 256MiB memory block
>>>> size, we require 4 pages to map vmemmap pages, In order to align things
>>>> correctly we end up adding a reserve of 28 pages. ie, for every 4096
>>>> pages 28 pages get reserved.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>    arch/powerpc/Kconfig                          |  1 +
>>>>    arch/powerpc/include/asm/pgtable.h            | 28 +++++++++++++++++++
>>>>    .../platforms/pseries/hotplug-memory.c        |  3 +-
>>>>    mm/memory_hotplug.c                           |  2 ++
>>>>    4 files changed, 33 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>>> index 116d6add0bb0..f890907e5bbf 100644
>>>> --- a/arch/powerpc/Kconfig
>>>> +++ b/arch/powerpc/Kconfig
>>>> @@ -157,6 +157,7 @@ config PPC
>>>>        select ARCH_HAS_UBSAN_SANITIZE_ALL
>>>>        select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>        select ARCH_KEEP_MEMBLOCK
>>>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>>>        select ARCH_MIGHT_HAVE_PC_PARPORT
>>>>        select ARCH_MIGHT_HAVE_PC_SERIO
>>>>        select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>>>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>>>> index 68817ea7f994..8e6c92dde6ad 100644
>>>> --- a/arch/powerpc/include/asm/pgtable.h
>>>> +++ b/arch/powerpc/include/asm/pgtable.h
>>>> @@ -169,6 +169,34 @@ static inline bool is_ioremap_addr(const void *x)
>>>>    int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
>>>>    bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
>>>>                   unsigned long page_size);
>>>> +/*
>>>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>>>> + * some of the restrictions. We don't check for PMD_SIZE because our
>>>> + * vmemmap allocation code can fallback correctly. The pageblock
>>>> + * alignment requirement is met using altmap->reserve blocks.
>>>> + */
>>>> +#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
>>>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>>> +{
>>>> +    unsigned long nr_pages = size >> PAGE_SHIFT;
>>>> +    unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>>> +
>>>> +    if (!radix_enabled())
>>>> +        return false;
>>>> +
>>>> +#ifdef CONFIG_PPC_4K_PAGES
>>>> +    return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>>>> +#else
>>>> +    /*
>>>> +     * Make sure the vmemmap allocation is fully contianed
>>>> +     * so that we always allocate vmemmap memory from altmap area.
>>>> +     * The pageblock alignment requirement is met by using
>>>> +     * reserve blocks in altmap.
>>>> +     */
>>>> +    return IS_ALIGNED(vmemmap_size,  PAGE_SIZE);
>>>
>>> Can we move that check into common code as well?
>>>
>>> If our (original) vmemmap size would not fit into a single page, we would be in trouble on any architecture. Did not check if it would be an issue for arm64 as well in case we would allow eventually wasting memory.
>>>
>>
>>
>> For x86 and arm we already do IS_ALIGNED(vmemmap_size, PMD_SIZE); in arch_supports_memmap_on_memory(). That should imply PAGE_SIZE alignment.
>> If arm64 allow the usage of altmap.reserve, I would expect the arch_supports_memmap_on_memory to have the PAGE_SIZE check.
>>
>> Adding the PAGE_SIZE check in  mhp_supports_memmap_on_memory() makes it redundant check for x86 and arm currently?
> 
> IMHO not an issue. The common code check is a bit weaker and the arch check a bit stronger.
> 
>>

ok will update

>> modified   mm/memory_hotplug.c
>> @@ -1293,6 +1293,13 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>        */
>>       if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
>>           return false;
>> +
>> +    /*
>> +     * Make sure the vmemmap allocation is fully contianed
> 
> s/contianed/contained/
> 
>> +     * so that we always allocate vmemmap memory from altmap area.
> 
> In theory, it's not only the vmemmap size, but also the vmemmap start (that it doesn't start somewhere in between a page, crossing a page). I suspect the start is always guaranteed to be aligned (if the vmemmap size is aligned), correct?
> 

That is correct.

>> +     */
>> +    if (!IS_ALIGNED(vmemmap_size,  PAGE_SIZE))
>> +        return false;
>>        /*
>>         * Without page reservation remaining pages should be pageblock aligned.
>>         */
>>


-aneesh
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback
  2023-07-11 10:23     ` David Hildenbrand
@ 2023-07-11 15:58       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 15:58 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/11/23 3:53 PM, David Hildenbrand wrote:
>> -bool mhp_supports_memmap_on_memory(unsigned long size)
>> +static bool mhp_supports_memmap_on_memory(unsigned long size)
>>   {
>>       unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
>>       unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> @@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>        * Self hosted memmap array
>>        */
>>       if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
>> -        if (!mhp_supports_memmap_on_memory(size)) {
>> -            ret = -EINVAL;
>> -            goto error;
>> +        if (mhp_supports_memmap_on_memory(size)) {
>> +            mhp_altmap.free = PHYS_PFN(size);
>> +            mhp_altmap.base_pfn = PHYS_PFN(start);
>> +            params.altmap = &mhp_altmap;
>>           }
>> -        mhp_altmap.free = PHYS_PFN(size);
>> -        mhp_altmap.base_pfn = PHYS_PFN(start);
>> -        params.altmap = &mhp_altmap;
>> +        /* fallback to not using altmap  */
>>       }
>>         /* call arch's memory hotadd */
> 
> In general, LGTM, but please extend the documentation of the parameter in memory_hotplug.h, stating that this is just a hint and that the core can decide to not do that.
> 

will update

modified   include/linux/memory_hotplug.h
@@ -97,6 +97,8 @@ typedef int __bitwise mhp_t;
  * To do so, we will use the beginning of the hot-added range to build
  * the page tables for the memmap array that describes the entire range.
  * Only selected architectures support it with SPARSE_VMEMMAP.
+ * This is only a hint, core kernel can decide to not do this based on
+ * different alignment checks.
  */
 #define MHP_MEMMAP_ON_MEMORY   ((__force mhp_t)BIT(1))



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback
@ 2023-07-11 15:58       ` Aneesh Kumar K V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 15:58 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/11/23 3:53 PM, David Hildenbrand wrote:
>> -bool mhp_supports_memmap_on_memory(unsigned long size)
>> +static bool mhp_supports_memmap_on_memory(unsigned long size)
>>   {
>>       unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
>>       unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> @@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>        * Self hosted memmap array
>>        */
>>       if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
>> -        if (!mhp_supports_memmap_on_memory(size)) {
>> -            ret = -EINVAL;
>> -            goto error;
>> +        if (mhp_supports_memmap_on_memory(size)) {
>> +            mhp_altmap.free = PHYS_PFN(size);
>> +            mhp_altmap.base_pfn = PHYS_PFN(start);
>> +            params.altmap = &mhp_altmap;
>>           }
>> -        mhp_altmap.free = PHYS_PFN(size);
>> -        mhp_altmap.base_pfn = PHYS_PFN(start);
>> -        params.altmap = &mhp_altmap;
>> +        /* fallback to not using altmap  */
>>       }
>>         /* call arch's memory hotadd */
> 
> In general, LGTM, but please extend the documentation of the parameter in memory_hotplug.h, stating that this is just a hint and that the core can decide to not do that.
> 

will update

modified   include/linux/memory_hotplug.h
@@ -97,6 +97,8 @@ typedef int __bitwise mhp_t;
  * To do so, we will use the beginning of the hot-added range to build
  * the page tables for the memmap array that describes the entire range.
  * Only selected architectures support it with SPARSE_VMEMMAP.
+ * This is only a hint, core kernel can decide to not do this based on
+ * different alignment checks.
  */
 #define MHP_MEMMAP_ON_MEMORY   ((__force mhp_t)BIT(1))


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-11 10:36     ` David Hildenbrand
@ 2023-07-11 16:07       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 16:07 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/11/23 4:06 PM, David Hildenbrand wrote:
> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>> Some architectures would want different restrictions. Hence add an
>> architecture-specific override.
>>
>> Both the PMD_SIZE check and pageblock alignment check are moved there.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   mm/memory_hotplug.c | 17 ++++++++++++-----
>>   1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1b19462f4e72..07c99b0cc371 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>       return device_online(&mem->dev);
>>   }
>>   -static bool mhp_supports_memmap_on_memory(unsigned long size)
>> +#ifndef arch_supports_memmap_on_memory
> 
> Can we make that a __weak function instead?


We can. It is confusing because we do have these two patterns within the kernel where we use 

#ifndef x
#endif 

vs

__weak x 

What is the recommended way to override? I have mostly been using #ifndef for most of the arch overrides till now.


> 
>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>   {
>> -    unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
>> +    unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>>       unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>>       unsigned long remaining_size = size - vmemmap_size;
>>   +    return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
>> +        IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> 
> You're moving that check back to mhp_supports_memmap_on_memory() in the following patch, where it actually belongs. So this check should stay in mhp_supports_memmap_on_memory(). Might be reasonable to factor out the vmemmap_size calculation.
> 
> 
> Also, let's add a comment
> 
> /*
>  * As default, we want the vmemmap to span a complete PMD such that we
>  * can map the vmemmap using a single PMD if supported by the
>  * architecture.
>  */
> return IS_ALIGNED(vmemmap_size, PMD_SIZE);
> 
>> +}
>> +#endif
>> +
>> +static bool mhp_supports_memmap_on_memory(unsigned long size)
>> +{
>>       /*
>>        * Besides having arch support and the feature enabled at runtime, we
>>        * need a few more assumptions to hold true:
>> @@ -1280,9 +1288,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>        *       populate a single PMD.
>>        */
>>       return mhp_memmap_on_memory() &&
>> -           size == memory_block_size_bytes() &&
>> -           IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
>> -           IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
>> +        size == memory_block_size_bytes() &&
> 
> If you keep the properly aligned indentation, this will not be detected as a change by git.
> 
>> +        arch_supports_memmap_on_memory(size);
>>   }
>>     /*
> 

Will update the code based on the above feedback.

-aneesh


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
@ 2023-07-11 16:07       ` Aneesh Kumar K V
  0 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-11 16:07 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 7/11/23 4:06 PM, David Hildenbrand wrote:
> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>> Some architectures would want different restrictions. Hence add an
>> architecture-specific override.
>>
>> Both the PMD_SIZE check and pageblock alignment check are moved there.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   mm/memory_hotplug.c | 17 ++++++++++++-----
>>   1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1b19462f4e72..07c99b0cc371 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>       return device_online(&mem->dev);
>>   }
>>   -static bool mhp_supports_memmap_on_memory(unsigned long size)
>> +#ifndef arch_supports_memmap_on_memory
> 
> Can we make that a __weak function instead?


We can. It is confusing because we do have these two patterns within the kernel where we use 

#ifndef x
#endif 

vs

__weak x 

What is the recommended way to override? I have mostly been using #ifndef for most of the arch overrides till now.


> 
>> +static inline bool arch_supports_memmap_on_memory(unsigned long size)
>>   {
>> -    unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
>> +    unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>>       unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>>       unsigned long remaining_size = size - vmemmap_size;
>>   +    return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
>> +        IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> 
> You're moving that check back to mhp_supports_memmap_on_memory() in the following patch, where it actually belongs. So this check should stay in mhp_supports_memmap_on_memory(). Might be reasonable to factor out the vmemmap_size calculation.
> 
> 
> Also, let's add a comment
> 
> /*
>  * As default, we want the vmemmap to span a complete PMD such that we
>  * can map the vmemmap using a single PMD if supported by the
>  * architecture.
>  */
> return IS_ALIGNED(vmemmap_size, PMD_SIZE);
> 
>> +}
>> +#endif
>> +
>> +static bool mhp_supports_memmap_on_memory(unsigned long size)
>> +{
>>       /*
>>        * Besides having arch support and the feature enabled at runtime, we
>>        * need a few more assumptions to hold true:
>> @@ -1280,9 +1288,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>>        *       populate a single PMD.
>>        */
>>       return mhp_memmap_on_memory() &&
>> -           size == memory_block_size_bytes() &&
>> -           IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
>> -           IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
>> +        size == memory_block_size_bytes() &&
> 
> If you keep the properly aligned indentation, this will not be detected as a change by git.
> 
>> +        arch_supports_memmap_on_memory(size);
>>   }
>>     /*
> 

Will update the code based on the above feedback.

-aneesh

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-11 16:07       ` Aneesh Kumar K V
@ 2023-07-11 16:09         ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 16:09 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 11.07.23 18:07, Aneesh Kumar K V wrote:
> On 7/11/23 4:06 PM, David Hildenbrand wrote:
>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>> Some architectures would want different restrictions. Hence add an
>>> architecture-specific override.
>>>
>>> Both the PMD_SIZE check and pageblock alignment check are moved there.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>    mm/memory_hotplug.c | 17 ++++++++++++-----
>>>    1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 1b19462f4e72..07c99b0cc371 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>>        return device_online(&mem->dev);
>>>    }
>>>    -static bool mhp_supports_memmap_on_memory(unsigned long size)
>>> +#ifndef arch_supports_memmap_on_memory
>>
>> Can we make that a __weak function instead?
> 
> 
> We can. It is confusing because we do have these two patterns within the kernel where we use
> 
> #ifndef x
> #endif
> 
> vs
> 
> __weak x
> 
> What is the recommended way to override ? I have mostly been using #ifndef for most of the arch overrides till now.
> 

I think when placing the implementation in a C file, it's __weak. But 
don't ask me :)

We do this already for arch_get_mappable_range() in mm/memory_hotplug.c 
and IMHO it looks quite nice.
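
For comparison, the two override patterns look roughly like this. This is only a minimal sketch using the function under discussion, not code from the series:

/*
 * Alternative A: #ifndef guard. The arch header provides its own definition
 * and "#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory"
 * to suppress the generic fallback below.
 */
#ifndef arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}
#endif

/*
 * Alternative B: __weak default in a C file, as done for
 * arch_get_mappable_range(); an architecture overrides it simply by
 * providing a non-weak definition of the same symbol.
 */
bool __weak arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}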


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
@ 2023-07-11 16:09         ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 16:09 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 11.07.23 18:07, Aneesh Kumar K V wrote:
> On 7/11/23 4:06 PM, David Hildenbrand wrote:
>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>> Some architectures would want different restrictions. Hence add an
>>> architecture-specific override.
>>>
>>> Both the PMD_SIZE check and pageblock alignment check are moved there.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>    mm/memory_hotplug.c | 17 ++++++++++++-----
>>>    1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 1b19462f4e72..07c99b0cc371 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1247,12 +1247,20 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>>        return device_online(&mem->dev);
>>>    }
>>>    -static bool mhp_supports_memmap_on_memory(unsigned long size)
>>> +#ifndef arch_supports_memmap_on_memory
>>
>> Can we make that a __weak function instead?
> 
> 
> We can. It is confusing because we do have these two patterns within the kernel where we use
> 
> #ifndef x
> #endif
> 
> vs
> 
> __weak x
> 
> What is the recommended way to override ? I have mostly been using #ifndef for most of the arch overrides till now.
> 

I think when placing the implementation in a C file, it's __weak. But 
don't ask me :)

We do this already for arch_get_mappable_range() in mm/memory_hotplug.c 
and IMHO it looks quite nice.
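
For comparison, the two override patterns look roughly like this. This is only a minimal sketch using the function under discussion, not code from the series:

/*
 * Alternative A: #ifndef guard. The arch header provides its own definition
 * and "#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory"
 * to suppress the generic fallback below.
 */
#ifndef arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}
#endif

/*
 * Alternative B: __weak default in a C file, as done for
 * arch_get_mappable_range(); an architecture overrides it simply by
 * providing a non-weak definition of the same symbol.
 */
bool __weak arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}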


-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-11  4:48   ` Aneesh Kumar K.V
@ 2023-07-11 17:19     ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 17:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> Add a new kconfig option that can be selected if we want to allow
> pageblock alignment by reserving pages in the vmemmap altmap area.
> This implies we will be reserving some pages for every memoryblock
> This also allows the memmap on memory feature to be widely useful
> with different memory block size values.

"reserving pages" is a nice way of saying "wasting memory". :) Let's 
spell that out.

I think we have to find a better name for this, and I think we should 
have a toggle similar to memory_hotplug.memmap_on_memory. This should be 
an admin decision, not some kernel config option.


memory_hotplug.force_memmap_on_memory

"Enable the memmap on memory feature even if it could result in memory 
waste due to memmap size limitations. For example, if the memmap for a 
memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB
of hotplugged memory will be wasted. Note that there are still cases 
where the feature cannot be enforced: for example, if the memmap is 
smaller than a single page, or if the architecture does not support the 
forced mode in all configurations."

Thoughts?
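
To make the suggestion concrete, such a toggle could be declared next to the existing memmap_on_memory parameter. The name and description below simply restate the proposal above as code and are not an existing interface:

/*
 * Sketch only: a second parameter next to memory_hotplug.memmap_on_memory,
 * named as suggested above.
 */
static bool force_memmap_on_memory __ro_after_init;
module_param(force_memmap_on_memory, bool, 0444);
MODULE_PARM_DESC(force_memmap_on_memory,
		 "Enable memmap on memory even if hotplugged memory is wasted for alignment");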

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/Kconfig          |  9 +++++++
>   mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
>   2 files changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 932349271e28..88a1472b2086 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
>   	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>   	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   
> +config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
> +       bool "Allow Reserving pages for page block aligment"
> +       depends on MHP_MEMMAP_ON_MEMORY
> +       help
> +	This option allows memmap on memory feature to be more useful
> +	with different memory block sizes. This is achieved by marking some pages
> +	in each memory block as reserved so that we can get page-block alignment
> +	for the remaining pages.
> +



>   endif # MEMORY_HOTPLUG
>   
>   config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 07c99b0cc371..f36aec1f7626 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
>   {
>   	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>   	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> -	unsigned long remaining_size = size - vmemmap_size;
>   
> -	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> -		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>   }
>   #endif
>   
>   static bool mhp_supports_memmap_on_memory(unsigned long size)
>   {
> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> +	unsigned long remaining_size = size - vmemmap_size;
> +
>   	/*
>   	 * Besides having arch support and the feature enabled at runtime, we
>   	 * need a few more assumptions to hold true:
> @@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 *       altmap as an alternative source of memory, and we do not exactly
>   	 *       populate a single PMD.
>   	 */
> -	return mhp_memmap_on_memory() &&
> -		size == memory_block_size_bytes() &&
> -		arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +	 /*
> +	  * Without page reservation remaining pages should be pageblock aligned.
> +	  */
> +	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
> +	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
> +}
> +
> +static inline unsigned long memory_block_align_base(unsigned long size)
> +{
> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
> +		unsigned long align;
> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +		unsigned long vmemmap_size;
> +
> +		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
> +		align = pageblock_align(vmemmap_size) - vmemmap_size;

We should probably have a helper to calculate

a) the unaligned vmemmap size, for example used in 
arch_supports_memmap_on_memory()

b) the pageblock-aligned vmemmap size.
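
One possible shape for these two helpers, purely as a sketch; the names and placement are made up for the sake of discussion, not taken from the posted series:

/* Sketch: unaligned memmap size (in bytes) for one memory block. */
static unsigned long memory_block_memmap_size(void)
{
	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
}

/* Sketch: the same size expressed in pages, padded to pageblock alignment. */
static unsigned long memory_block_memmap_on_memory_pages(void)
{
	unsigned long nr_pages = PFN_UP(memory_block_memmap_size());

	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY))
		return pageblock_align(nr_pages);
	return nr_pages;
}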


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
@ 2023-07-11 17:19     ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-11 17:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Vishal Verma, Michal Hocko, Oscar Salvador

On 11.07.23 06:48, Aneesh Kumar K.V wrote:
> Add a new kconfig option that can be selected if we want to allow
> pageblock alignment by reserving pages in the vmemmap altmap area.
> This implies we will be reserving some pages for every memoryblock
> This also allows the memmap on memory feature to be widely useful
> with different memory block size values.

"reserving pages" is a nice way of saying "wasting memory". :) Let's 
spell that out.

I think we have to find a better name for this, and I think we should 
have a toggle similar to memory_hotplug.memmap_on_memory. This should be 
an admin decision, not some kernel config option.


memory_hotplug.force_memmap_on_memory

"Enable the memmap on memory feature even if it could result in memory 
waste due to memmap size limitations. For example, if the memmap for a 
memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB
of hotplugged memory will be wasted. Note that there are still cases 
where the feature cannot be enforced: for example, if the memmap is 
smaller than a single page, or if the architecture does not support the 
forced mode in all configurations."

Thoughts?
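
To make the suggestion concrete, such a toggle could be declared next to the existing memmap_on_memory parameter. The name and description below simply restate the proposal above as code and are not an existing interface:

/*
 * Sketch only: a second parameter next to memory_hotplug.memmap_on_memory,
 * named as suggested above.
 */
static bool force_memmap_on_memory __ro_after_init;
module_param(force_memmap_on_memory, bool, 0444);
MODULE_PARM_DESC(force_memmap_on_memory,
		 "Enable memmap on memory even if hotplugged memory is wasted for alignment");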

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/Kconfig          |  9 +++++++
>   mm/memory_hotplug.c | 59 +++++++++++++++++++++++++++++++++++++--------
>   2 files changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 932349271e28..88a1472b2086 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -570,6 +570,15 @@ config MHP_MEMMAP_ON_MEMORY
>   	depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
>   	depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   
> +config MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY
> +       bool "Allow Reserving pages for page block aligment"
> +       depends on MHP_MEMMAP_ON_MEMORY
> +       help
> +	This option allows memmap on memory feature to be more useful
> +	with different memory block sizes. This is achieved by marking some pages
> +	in each memory block as reserved so that we can get page-block alignment
> +	for the remaining pages.
> +



>   endif # MEMORY_HOTPLUG
>   
>   config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 07c99b0cc371..f36aec1f7626 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1252,15 +1252,17 @@ static inline bool arch_supports_memmap_on_memory(unsigned long size)
>   {
>   	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
>   	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> -	unsigned long remaining_size = size - vmemmap_size;
>   
> -	return IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
> -		IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
> +	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
>   }
>   #endif
>   
>   static bool mhp_supports_memmap_on_memory(unsigned long size)
>   {
> +	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> +	unsigned long remaining_size = size - vmemmap_size;
> +
>   	/*
>   	 * Besides having arch support and the feature enabled at runtime, we
>   	 * need a few more assumptions to hold true:
> @@ -1287,9 +1289,30 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>   	 *       altmap as an alternative source of memory, and we do not exactly
>   	 *       populate a single PMD.
>   	 */
> -	return mhp_memmap_on_memory() &&
> -		size == memory_block_size_bytes() &&
> -		arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +	 /*
> +	  * Without page reservation remaining pages should be pageblock aligned.
> +	  */
> +	if (!IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY) &&
> +	    !IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)))
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
> +}
> +
> +static inline unsigned long memory_block_align_base(unsigned long size)
> +{
> +	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY)) {
> +		unsigned long align;
> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +		unsigned long vmemmap_size;
> +
> +		vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
> +		align = pageblock_align(vmemmap_size) - vmemmap_size;

We should probably have a helper to calculate

a) the unaligned vmemmap size, for example used in 
arch_supports_memmap_on_memory()

b) the pageblock-aligned vmemmap size.
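
One possible shape for these two helpers, purely as a sketch; the names and placement are made up for the sake of discussion, not taken from the posted series:

/* Sketch: unaligned memmap size (in bytes) for one memory block. */
static unsigned long memory_block_memmap_size(void)
{
	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
}

/* Sketch: the same size expressed in pages, padded to pageblock alignment. */
static unsigned long memory_block_memmap_on_memory_pages(void)
{
	unsigned long nr_pages = PFN_UP(memory_block_memmap_size());

	if (IS_ENABLED(CONFIG_MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY))
		return pageblock_align(nr_pages);
	return nr_pages;
}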


-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-11 17:19     ` David Hildenbrand
@ 2023-07-12  3:16       ` Aneesh Kumar K V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K V @ 2023-07-12  3:16 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/11/23 10:49 PM, David Hildenbrand wrote:
> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>> Add a new kconfig option that can be selected if we want to allow
>> pageblock alignment by reserving pages in the vmemmap altmap area.
>> This implies we will be reserving some pages for every memoryblock
>> This also allows the memmap on memory feature to be widely useful
>> with different memory block size values.
> 
> "reserving pages" is a nice way of saying "wasting memory". :) Let's spell that out.
> 
> I think we have to find a better name for this, and I think we should have a toggle similar to memory_hotplug.memmap_on_memory. This should be an admin decision, not some kernel config option.
> 
> 
> memory_hotplug.force_memmap_on_memory
> 
> "Enable the memmap on memory feature even if it could result in memory waste due to memmap size limitations. For example, if the memmap for a memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB
> of hotplugged memory will be wasted. Note that there are still cases where the feature cannot be enforced: for example, if the memmap is smaller than a single page, or if the architecture does not support the forced mode in all configurations."
> 
> Thoughts?
> 

With a module parameter, do we still need the Kconfig option?

-aneesh



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-12  3:16       ` Aneesh Kumar K V
@ 2023-07-12  7:22         ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-12  7:22 UTC (permalink / raw)
  To: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 12.07.23 05:16, Aneesh Kumar K V wrote:
> On 7/11/23 10:49 PM, David Hildenbrand wrote:
>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>> Add a new kconfig option that can be selected if we want to allow
>>> pageblock alignment by reserving pages in the vmemmap altmap area.
>>> This implies we will be reserving some pages for every memory block.
>>> This also allows the memmap on memory feature to be widely useful
>>> with different memory block size values.
>>
>> "reserving pages" is a nice way of saying "wasting memory". :) Let's spell that out.
>>
>> I think we have to find a better name for this, and I think we should have a toggle similar to memory_hotplug.memmap_on_memory. This should be an admin decision, not some kernel config option.
>>
>>
>> memory_hotplug.force_memmap_on_memory
>>
>> "Enable the memmap on memory feature even if it could result in memory waste due to memmap size limitations. For example, if the memmap for a memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB
>> of hotplugged memory will be wasted. Note that there are still cases where the feature cannot be enforced: for example, if the memmap is smaller than a single page, or if the architecture does not support the forced mode in all configurations."
>>
>> Thoughts?
>>
> 
> With module parameter, do we still need the Kconfig option?

No.

Sleeping over this, maybe we can convert the existing 
memory_hotplug.memmap_on_memory parameter to also accept "force".

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-12  7:22         ` David Hildenbrand
@ 2023-07-12 13:50           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 56+ messages in thread
From: Aneesh Kumar K.V @ 2023-07-12 13:50 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

David Hildenbrand <david@redhat.com> writes:

> On 12.07.23 05:16, Aneesh Kumar K V wrote:
>> On 7/11/23 10:49 PM, David Hildenbrand wrote:
>>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>>> Add a new kconfig option that can be selected if we want to allow
>>>> pageblock alignment by reserving pages in the vmemmap altmap area.
>>>> This implies we will be reserving some pages for every memory block.
>>>> This also allows the memmap on memory feature to be widely useful
>>>> with different memory block size values.
>>>
>>> "reserving pages" is a nice way of saying "wasting memory". :) Let's spell that out.
>>>
>>> I think we have to find a better name for this, and I think we should have a toggle similar to memory_hotplug.memmap_on_memory. This should be an admin decision, not some kernel config option.
>>>
>>>
>>> memory_hotplug.force_memmap_on_memory
>>>
>>> "Enable the memmap on memory feature even if it could result in memory waste due to memmap size limitations. For example, if the memmap for a memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB
>>> of hotplugged memory will be wasted. Note that there are still cases where the feature cannot be enforced: for example, if the memmap is smaller than a single page, or if the architecture does not support the forced mode in all configurations."
>>>
>>> Thoughts?
>>>
>> 
>> With module parameter, do we still need the Kconfig option?
>
> No.
>
> Sleeping over this, maybe we can convert the existing 
> memory_hotplug.memmap_on_memory parameter to also accept "force".
>

How about this? 

modified   mm/memory_hotplug.c
@@ -45,13 +45,67 @@
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
-static bool memmap_on_memory __ro_after_init;
-module_param(memmap_on_memory, bool, 0444);
-MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+enum {
+	MEMMAP_ON_MEMORY_DISABLE = 0,
+	MEMMAP_ON_MEMORY_ENABLE,
+	FORCE_MEMMAP_ON_MEMORY,
+};
+static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
+static const char *memmap_on_memory_to_str[] = {
+	[MEMMAP_ON_MEMORY_DISABLE]  = "disable",
+	[MEMMAP_ON_MEMORY_ENABLE]   = "enable",
+	[FORCE_MEMMAP_ON_MEMORY]    = "force",
+};
+
+static inline unsigned long memory_block_align_base(unsigned long size)
+{
+	if (memmap_mode == FORCE_MEMMAP_ON_MEMORY) {
+		unsigned long align;
+		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+		unsigned long vmemmap_size;
+
+		vmemmap_size = DIV_ROUND_UP(nr_vmemmap_pages * sizeof(struct page), PAGE_SIZE);
+		align = pageblock_align(vmemmap_size) - vmemmap_size;
+		return align;
+	} else
+		return 0;
+}
+
+static int set_memmap_mode(const char *val, const struct kernel_param *kp)
+{
+	int ret = sysfs_match_string(memmap_on_memory_to_str, val);
+
+	if (ret < 0)
+		return ret;
+	*((int *)kp->arg) = ret;
+	if (ret == FORCE_MEMMAP_ON_MEMORY) {
+		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
+			memory_block_align_base(memory_block_size_bytes()));
+	}
+	return 0;
+}
+
+static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%s\n", memmap_on_memory_to_str[*((int *)kp->arg)]);
+}
+
+static const struct kernel_param_ops memmap_mode_ops = {
+	.set = set_memmap_mode,
+	.get = get_memmap_mode,
+};
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
+MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
+	"With value \"force\" it could result in memory waste due to memmap size limitations \n"
+	"For example, if the memmap for a memory block requires 1 MiB, but the pageblock \n"
+	"size is 2 MiB, 1 MiB of hotplugged memory will be wasted. Note that there are \n"
+	"still cases where the feature cannot be enforced: for example, if the memmap is \n"
+	"smaller than a single page, or if the architecture does not support the forced \n"
+	"mode in all configurations. (disable/enable/force)");
 
 static inline bool mhp_memmap_on_memory(void)
 {
-	return memmap_on_memory;
+	return !!memmap_mode;
 }
 #else

We can also allow the feature to be enabled/disabled/forced at runtime. We just
need to make sure that on try_remove_memory() we look up the altmap correctly.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation
  2023-07-12 13:50           ` Aneesh Kumar K.V
@ 2023-07-12 19:06             ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-12 19:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
	christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 12.07.23 15:50, Aneesh Kumar K.V wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 12.07.23 05:16, Aneesh Kumar K V wrote:
>>> On 7/11/23 10:49 PM, David Hildenbrand wrote:
>>>> On 11.07.23 06:48, Aneesh Kumar K.V wrote:
>>>>> Add a new kconfig option that can be selected if we want to allow
>>>>> pageblock alignment by reserving pages in the vmemmap altmap area.
>>>>> This implies we will be reserving some pages for every memory block.
>>>>> This also allows the memmap on memory feature to be widely useful
>>>>> with different memory block size values.
>>>>
>>>> "reserving pages" is a nice way of saying "wasting memory". :) Let's spell that out.
>>>>
>>>> I think we have to find a better name for this, and I think we should have a toggle similar to memory_hotplug.memmap_on_memory. This should be an admin decision, not some kernel config option.
>>>>
>>>>
>>>> memory_hotplug.force_memmap_on_memory
>>>>
>>>> "Enable the memmap on memory feature even if it could result in memory waste due to memmap size limitations. For example, if the memmap for a memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB
>>>> of hotplugged memory will be wasted. Note that there are still cases where the feature cannot be enforced: for example, if the memmap is smaller than a single page, or if the architecture does not support the forced mode in all configurations."
>>>>
>>>> Thoughts?
>>>>
>>>
>>> With module parameter, do we still need the Kconfig option?
>>
>> No.
>>
>> Sleeping over this, maybe we can convert the existing
>> memory_hotplug.memmap_on_memory parameter to also accept "force".
>>
> 
> How about this?
> 
> modified   mm/memory_hotplug.c
> @@ -45,13 +45,67 @@
>   /*
>    * memory_hotplug.memmap_on_memory parameter
>    */
> -static bool memmap_on_memory __ro_after_init;
> -module_param(memmap_on_memory, bool, 0444);
> -MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
> +enum {
> +	MEMMAP_ON_MEMORY_DISABLE = 0,
> +	MEMMAP_ON_MEMORY_ENABLE,
> +	FORCE_MEMMAP_ON_MEMORY,

MEMMAP_ON_MEMORY_FORCE ?

> +};
> +static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
> +static const char *memmap_on_memory_to_str[] = {
> +	[MEMMAP_ON_MEMORY_DISABLE]  = "disable",
> +	[MEMMAP_ON_MEMORY_ENABLE]   = "enable",
> +	[FORCE_MEMMAP_ON_MEMORY]    = "force",
> +};
> +
> +static inline unsigned long memory_block_align_base(unsigned long size)
> +{
> +	if (memmap_mode == FORCE_MEMMAP_ON_MEMORY) {
> +		unsigned long align;
> +		unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
> +		unsigned long vmemmap_size;
> +
> +		vmemmap_size = DIV_ROUND_UP(nr_vmemmap_pages * sizeof(struct page), PAGE_SIZE);
> +		align = pageblock_align(vmemmap_size) - vmemmap_size;
> +		return align;
> +	} else
> +		return 0;

^ have to see that in action :)

> +}
> +
> +static int set_memmap_mode(const char *val, const struct kernel_param *kp)
> +{
> +	int ret = sysfs_match_string(memmap_on_memory_to_str, val);
> +

That would break existing cmdlines that eat Y/N/0/..., no?

Maybe try parsing "force/FORCE" first and then fall back to the common
bool parsing (kstrtobool).

Same when printing: handle "force" separately and then just print Y/N 
like param_get_bool() used to do.

So you'd end up with Y/N/FORCE as output and Y/N/0/.../FORCE/force as input.
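
An untested sketch of that, reusing the enum values from the snippet
above (illustration only, not a tested patch):

static int set_memmap_mode(const char *val, const struct kernel_param *kp)
{
	int mode;
	bool enabled;

	if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
		mode = FORCE_MEMMAP_ON_MEMORY;
	} else {
		/* Fall back to the usual Y/N/0/1/... bool parsing. */
		int ret = kstrtobool(val, &enabled);

		if (ret < 0)
			return ret;
		mode = enabled ? MEMMAP_ON_MEMORY_ENABLE : MEMMAP_ON_MEMORY_DISABLE;
	}
	*((int *)kp->arg) = mode;
	return 0;
}

static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
{
	int mode = *((int *)kp->arg);

	if (mode == FORCE_MEMMAP_ON_MEMORY)
		return sprintf(buffer, "FORCE\n");
	return sprintf(buffer, "%c\n", mode ? 'Y' : 'N');
}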

But I'm happy to hear about alternatives. Maybe a second parameter is 
better ... but what name should it have? "memmap_on_memory_force" sounds
wrong. We'd need a name that expresses that we might be wasting memory, 
hm ...

> +	if (ret < 0)
> +		return ret;
> +	*((int *)kp->arg) = ret;
> +	if (ret == FORCE_MEMMAP_ON_MEMORY) {
> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
> +			memory_block_align_base(memory_block_size_bytes()));
> +	}
> +	return 0;
> +}
> +
> +static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
> +{
> +	return sprintf(buffer, "%s\n", memmap_on_memory_to_str[*((int *)kp->arg)]);
> +}
> +
> +static const struct kernel_param_ops memmap_mode_ops = {
> +	.set = set_memmap_mode,
> +	.get = get_memmap_mode,
> +};
> +module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
> +MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
> +	"With value \"force\" it could result in memory waste due to memmap size limitations \n"
> +	"For example, if the memmap for a memory block requires 1 MiB, but the pageblock \n"
> +	"size is 2 MiB, 1 MiB of hotplugged memory will be wasted. Note that there are \n"
> +	"still cases where the feature cannot be enforced: for example, if the memmap is \n"
> +	"smaller than a single page, or if the architecture does not support the forced \n"
> +	"mode in all configurations. (disable/enable/force)");
>   
>   static inline bool mhp_memmap_on_memory(void)
>   {
> -	return memmap_on_memory;
> +	return !!memmap_mode;
>   }
>   #else
> 
> We can also allow the feature to be enabled/disabled/forced at runtime. We just
> need to make sure that on try_remove_memory() we look up the altmap correctly.
> 

Yes, that's already been asked for. But let's do that as a separate 
change later.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-11 16:09         ` David Hildenbrand
@ 2023-07-12 20:07           ` John Hubbard
  -1 siblings, 0 replies; 56+ messages in thread
From: John Hubbard @ 2023-07-12 20:07 UTC (permalink / raw)
  To: David Hildenbrand, Aneesh Kumar K V, linux-mm, akpm, mpe,
	linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/11/23 09:09, David Hildenbrand wrote:
...
>>> Can we make that a __weak function instead?
>>
>> We can. It is confusing because we do have these two patterns within the kernel where we use
>>
>> #ifndef x
>> #endif
>>
>> vs
>>
>> __weak x
>>
>> What is the recommended way to override ? I have mostly been using #ifndef for most of the arch overrides till now.
>>
> 
> I think when placing the implementation in a C file, it's __weak. But don't ask me :)
> 
> We do this already for arch_get_mappable_range() in mm/memory_hotplug.c and IMHO it looks quite nice.
> 
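
For reference, the two patterns being compared look roughly like this
(purely illustrative, not code from this series):

/*
 * Pattern 1: generic default in a header, overridden via an arch #define
 * (the arch header defines its own version and then
 * #define arch_supports_memmap_on_memory arch_supports_memmap_on_memory).
 */
#ifndef arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	/* Generic default: the memmap must span complete PMDs. */
	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}
#endif

/*
 * Pattern 2: __weak default in a C file, overridden at link time by a
 * normal (non-weak) definition provided by the architecture.
 */
bool __weak arch_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}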

It does look nice. I always forget which parts are supposed to be
__weak, so I went to check Documentation/ , and it was quite
entertaining. There are only two search hits: one trivial reference in
Documentation/conf.py, and the other in checkpatch.rst, which says:

   **WEAK_DECLARATION**
     Using weak declarations like __attribute__((weak)) or __weak
     can have unintended link defects.  Avoid using them.

...which seems deeply out of touch with how arch layers work these days,
doesn't it? (This is not rhetorical; I'm asking in order to get an
opinion or two on the topic.)


thanks,
-- 
John Hubbard
NVIDIA



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-12 20:07           ` John Hubbard
@ 2023-07-13  9:08             ` David Hildenbrand
  -1 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2023-07-13  9:08 UTC (permalink / raw)
  To: John Hubbard, Aneesh Kumar K V, linux-mm, akpm, mpe,
	linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 12.07.23 22:07, John Hubbard wrote:
> On 7/11/23 09:09, David Hildenbrand wrote:
> ...
>>>> Can we make that a __weak function instead?
>>>
>>> We can. It is confusing because we do have these two patterns within the kernel where we use
>>>
>>> #ifndef x
>>> #endif
>>>
>>> vs
>>>
>>> __weak x
>>>
>>> What is the recommended way to override ? I have mostly been using #ifndef for most of the arch overrides till now.
>>>
>>
>> I think when placing the implementation in a C file, it's __weak. But don't ask me :)
>>
>> We do this already for arch_get_mappable_range() in mm/memory_hotplug.c and IMHO it looks quite nice.
>>
> 
> It does look nice. I always forget which parts are supposed to be
> __weak, so I went to check Documentation/ , and it was quite
> entertaining. There are only two search hits: one trivial reference in
> Documentation/conf.py, and the other in checkpatch.rst, which says:
> 
>     **WEAK_DECLARATION**
>       Using weak declarations like __attribute__((weak)) or __weak
>       can have unintended link defects.  Avoid using them.
> 
> ...which seems deeply out of touch with how arch layers work these days,
> doesn't it? (This is not rhetorical; I'm asking in order to get an
> opinion or two on the topic.)

Did some digging:

commit 65d9a9a60fd71be964effb2e94747a6acb6e7015
Author: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Date:   Fri Jul 1 13:04:04 2022 +0530

     kexec_file: drop weak attribute from functions
     
     As requested
     (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org),
     this series converts weak functions in kexec to use the #ifdef approach.
     
     Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from
     arch_kexec_apply_relocations[_add]") changelog:
     
     : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols")
     : [1], binutils (v2.36+) started dropping section symbols that it thought
     : were unused.  This isn't an issue in general, but with kexec_file.c, gcc
     : is placing kexec_arch_apply_relocations[_add] into a separate
     : .text.unlikely section and the section symbol ".text.unlikely" is being
     : dropped.  Due to this, recordmcount is unable to find a non-weak symbol in
     : .text.unlikely to generate a relocation record against.
     
     This patch (of 2);
     
     Drop __weak attribute from functions in kexec_file.c:
     - arch_kexec_kernel_image_probe()
     - arch_kimage_file_post_load_cleanup()
     - arch_kexec_kernel_image_load()
     - arch_kexec_locate_mem_hole()
     - arch_kexec_kernel_verify_sig()
     
     arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so
     drop the static attribute for the latter.
     
     arch_kexec_kernel_verify_sig() is not overridden by any architecture, so
     drop the __weak attribute.
     
     Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
     Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
     Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
     Suggested-by: Eric Biederman <ebiederm@xmission.com>
     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
     Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>


So, in general, its use seems to be fine (unless some tool actually bails out).

https://lore.kernel.org/all/87ee0q7b92.fsf@email.froward.int.ebiederm.org/T/#u

It also mentions that __weak and non-__weak variants can end up in the vmlinux. Did not
check if that's actually (still) the case.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check
  2023-07-13  9:08             ` David Hildenbrand
@ 2023-07-14 23:14               ` John Hubbard
  -1 siblings, 0 replies; 56+ messages in thread
From: John Hubbard @ 2023-07-14 23:14 UTC (permalink / raw)
  To: David Hildenbrand, Aneesh Kumar K V, linux-mm, akpm, mpe,
	linuxppc-dev, npiggin, christophe.leroy
  Cc: Oscar Salvador, Michal Hocko, Vishal Verma

On 7/13/23 02:08, David Hildenbrand wrote:
...
>>     **WEAK_DECLARATION**
>>       Using weak declarations like __attribute__((weak)) or __weak
>>       can have unintended link defects.  Avoid using them.
>>
>> ...which seems deeply out of touch with how arch layers work these days,
>> doesn't it? (This is not rhetorical; I'm asking in order to get an
>> opinion or two on the topic.)
> 
> Did some digging:
> 
> commit 65d9a9a60fd71be964effb2e94747a6acb6e7015
> Author: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Date:   Fri Jul 1 13:04:04 2022 +0530
> 
>      kexec_file: drop weak attribute from functions
>      As requested
>      (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org),
>      this series converts weak functions in kexec to use the #ifdef approach.
>      Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from
>      arch_kexec_apply_relocations[_add]") changelog:
>      : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols")
>      : [1], binutils (v2.36+) started dropping section symbols that it thought
>      : were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>      : is placing kexec_arch_apply_relocations[_add] into a separate
>      : .text.unlikely section and the section symbol ".text.unlikely" is being
>      : dropped.  Due to this, recordmcount is unable to find a non-weak symbol in
>      : .text.unlikely to generate a relocation record against.
>      This patch (of 2);
>      Drop __weak attribute from functions in kexec_file.c:
>      - arch_kexec_kernel_image_probe()
>      - arch_kimage_file_post_load_cleanup()
>      - arch_kexec_kernel_image_load()
>      - arch_kexec_locate_mem_hole()
>      - arch_kexec_kernel_verify_sig()
>      arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so
>      drop the static attribute for the latter.
>      arch_kexec_kernel_verify_sig() is not overridden by any architecture, so
>      drop the __weak attribute.
>      Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
>      Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
>      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>      Suggested-by: Eric Biederman <ebiederm@xmission.com>
>      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
> 
> 
> So, in general, its use seems to be fine (unless some tool actually bails out).
> 
> https://lore.kernel.org/all/87ee0q7b92.fsf@email.froward.int.ebiederm.org/T/#u
> 
> It also mentions that __weak and non-__weak variants can end up in the vmlinux. Did not
> check if that's actually (still) the case.
> 

OK, I looked at that commit and the associated discussion, and now have a
pretty clear picture of the preferred ways to do arch overrides.

Thanks for taking the time to look into it, and also to explain it.
Much appreciated!


thanks,
-- 
John Hubbard
NVIDIA



^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2023-07-14 23:15 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-11  4:48 [PATCH v3 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-07-11  4:48 ` Aneesh Kumar K.V
2023-07-11  4:48 ` [PATCH v3 1/7] mm/hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
2023-07-11  4:48 ` [PATCH v3 2/7] mm/hotplug: Allow memmap on memory hotplug request to fallback Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
2023-07-11 10:23   ` David Hildenbrand
2023-07-11 10:23     ` David Hildenbrand
2023-07-11 15:58     ` Aneesh Kumar K V
2023-07-11 15:58       ` Aneesh Kumar K V
2023-07-11  4:48 ` [PATCH v3 3/7] mm/hotplug: Allow architecture to override memmap on memory support check Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
2023-07-11 10:36   ` David Hildenbrand
2023-07-11 10:36     ` David Hildenbrand
2023-07-11 16:07     ` Aneesh Kumar K V
2023-07-11 16:07       ` Aneesh Kumar K V
2023-07-11 16:09       ` David Hildenbrand
2023-07-11 16:09         ` David Hildenbrand
2023-07-12 20:07         ` John Hubbard
2023-07-12 20:07           ` John Hubbard
2023-07-13  9:08           ` David Hildenbrand
2023-07-13  9:08             ` David Hildenbrand
2023-07-14 23:14             ` John Hubbard
2023-07-14 23:14               ` John Hubbard
2023-07-11  4:48 ` [PATCH v3 4/7] mm/hotplug: Allow pageblock alignment via altmap reservation Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
2023-07-11  6:21   ` Huang, Ying
2023-07-11  6:21     ` Huang, Ying
2023-07-11  8:20     ` Aneesh Kumar K V
2023-07-11  8:20       ` Aneesh Kumar K V
2023-07-11 17:19   ` David Hildenbrand
2023-07-11 17:19     ` David Hildenbrand
2023-07-12  3:16     ` Aneesh Kumar K V
2023-07-12  3:16       ` Aneesh Kumar K V
2023-07-12  7:22       ` David Hildenbrand
2023-07-12  7:22         ` David Hildenbrand
2023-07-12 13:50         ` Aneesh Kumar K.V
2023-07-12 13:50           ` Aneesh Kumar K.V
2023-07-12 19:06           ` David Hildenbrand
2023-07-12 19:06             ` David Hildenbrand
2023-07-11  4:48 ` [PATCH v3 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
2023-07-11 15:26   ` David Hildenbrand
2023-07-11 15:26     ` David Hildenbrand
2023-07-11 15:40     ` Aneesh Kumar K V
2023-07-11 15:40       ` Aneesh Kumar K V
2023-07-11 15:44       ` David Hildenbrand
2023-07-11 15:44         ` David Hildenbrand
2023-07-11 15:46         ` Aneesh Kumar K V
2023-07-11 15:46           ` Aneesh Kumar K V
2023-07-11  4:48 ` [PATCH v3 6/7] dax/kmem: Always enroll hotplugged memory for memmap_on_memory Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
2023-07-11 10:21   ` David Hildenbrand
2023-07-11 10:21     ` David Hildenbrand
2023-07-11  4:48 ` [PATCH v3 7/7] mm/hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
2023-07-11  4:48   ` Aneesh Kumar K.V
