linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug
@ 2021-04-30 14:36 Leonardo Bras
  2021-04-30 14:36 ` [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Leonardo Bras @ 2021-04-30 14:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Sandipan Das, Mike Rapoport, Andrew Morton,
	Aneesh Kumar K.V, Nicholas Piggin, Nathan Lynch,
	David Hildenbrand, Scott Cheloha, Laurent Dufour
  Cc: linuxppc-dev, linux-kernel

This patchset intends to reduce time needed for processing memory
hotplug/hotunplug in hash guests.

The first one, makes sure guests with pagesize over 4k don't need to
go through HPT resize-downs after memory hotplug.

The second and third patches make hotplug / hotunplug perform a single
HPT resize per operation, instead of one for each shift change, or one
for each LMB in case of resize-down error.

Why haven't the same mechanism used for both memory hotplug and hotunplug?
They both have different requirements:

Memory hotplug causes (usually) HPT resize-ups, which are fine happening
at the start of hotplug, but resize-ups should not ever be disabled, as
other mechanisms may try to increase memory, hitting issues with a HPT
that is too small.

Memory hotunplug causes HPT resize-downs, which can be disabled (HPT will
just remain larger for a while), but need to happen at the end of an
hotunplug operation. If we want to batch it, we need to disable
resize-downs and perform it only at the end.

Tests done with this patchset in the same machine / guest config:
Starting memory: 129GB, DIMM: 256GB
Before patchset: hotplug = 710s, hotunplug = 621s.
After patchset: hotplug  = 21s, hotunplug = 100s.

Any feedback will be appreciated!

Changes since v1:
- Atomic used to disable resize was replaced by a mutex
- Removed wrappers, testing for !radix directly in hot(un)plug routine
- Added bounds to HPT resize loop
- Removed batching from dlpar_memory_*_by_index, as it adds a single LMB 

Best regards,
Leonardo Bras (3):
  powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug
  powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

 arch/powerpc/include/asm/book3s/64/hash.h     |  4 +
 arch/powerpc/mm/book3s64/hash_utils.c         | 95 ++++++++++++++++---
 .../platforms/pseries/hotplug-memory.c        | 35 +++++++
 3 files changed, 119 insertions(+), 15 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  2021-04-30 14:36 [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
@ 2021-04-30 14:36 ` Leonardo Bras
  2021-06-07  5:02   ` David Gibson
  2021-04-30 14:36 ` [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on " Leonardo Bras
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Leonardo Bras @ 2021-04-30 14:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Sandipan Das, Mike Rapoport, Andrew Morton,
	Aneesh Kumar K.V, Nicholas Piggin, Nathan Lynch,
	David Hildenbrand, Scott Cheloha, Laurent Dufour
  Cc: linuxppc-dev, linux-kernel

Because hypervisors may need to create HPTs without knowing the guest
page size, the smallest used page-size (4k) may be chosen, resulting in
a HPT that is possibly bigger than needed.

On a guest with bigger page-sizes, the amount of entries for HTP may be
too high, causing the guest to ask for a HPT resize-down on the first
hotplug.

This becomes a problem when HPT resize-down fails, and causes the
HPT resize to be performed on every LMB added, until HPT size is
compatible to guest memory size, causing a major slowdown.

So, avoiding HPT resizing-down on hot-add significantly improves memory
hotplug times.

As an example, hotplugging 256GB on a 129GB guest took 710s without this
patch, and 21s after applied.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/mm/book3s64/hash_utils.c | 36 ++++++++++++++++-----------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 581b20a2feaf..608e4ed397a9 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -795,7 +795,7 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int resize_hpt_for_hotplug(unsigned long new_mem_size)
+static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 {
 	unsigned target_hpt_shift;
 
@@ -804,19 +804,25 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size)
 
 	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
 
-	/*
-	 * To avoid lots of HPT resizes if memory size is fluctuating
-	 * across a boundary, we deliberately have some hysterisis
-	 * here: we immediately increase the HPT size if the target
-	 * shift exceeds the current shift, but we won't attempt to
-	 * reduce unless the target shift is at least 2 below the
-	 * current shift
-	 */
-	if (target_hpt_shift > ppc64_pft_size ||
-	    target_hpt_shift < ppc64_pft_size - 1)
-		return mmu_hash_ops.resize_hpt(target_hpt_shift);
+	if (shrinking) {
 
-	return 0;
+		/*
+		 * To avoid lots of HPT resizes if memory size is fluctuating
+		 * across a boundary, we deliberately have some hysterisis
+		 * here: we immediately increase the HPT size if the target
+		 * shift exceeds the current shift, but we won't attempt to
+		 * reduce unless the target shift is at least 2 below the
+		 * current shift
+		 */
+
+		if (target_hpt_shift >= ppc64_pft_size - 1)
+			return 0;
+
+	} else if (target_hpt_shift <= ppc64_pft_size) {
+		return 0;
+	}
+
+	return mmu_hash_ops.resize_hpt(target_hpt_shift);
 }
 
 int hash__create_section_mapping(unsigned long start, unsigned long end,
@@ -829,7 +835,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 		return -1;
 	}
 
-	resize_hpt_for_hotplug(memblock_phys_mem_size());
+	resize_hpt_for_hotplug(memblock_phys_mem_size(), false);
 
 	rc = htab_bolt_mapping(start, end, __pa(start),
 			       pgprot_val(prot), mmu_linear_psize,
@@ -848,7 +854,7 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
 	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
 				     mmu_kernel_ssize);
 
-	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
+	if (resize_hpt_for_hotplug(memblock_phys_mem_size(), true) == -ENOSPC)
 		pr_warn("Hash collision while resizing HPT\n");
 
 	return rc;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug
  2021-04-30 14:36 [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
  2021-04-30 14:36 ` [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
@ 2021-04-30 14:36 ` Leonardo Bras
  2021-06-07  5:10   ` David Gibson
  2021-04-30 14:36 ` [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug Leonardo Bras
  2021-04-30 14:41 ` [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
  3 siblings, 1 reply; 15+ messages in thread
From: Leonardo Bras @ 2021-04-30 14:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Sandipan Das, Mike Rapoport, Andrew Morton,
	Aneesh Kumar K.V, Nicholas Piggin, Nathan Lynch,
	David Hildenbrand, Scott Cheloha, Laurent Dufour
  Cc: linuxppc-dev, linux-kernel

Every time a memory hotplug happens, and the memory limit crosses a 2^n
value, it may be necessary to perform HPT resizing-up, which can take
some time (over 100ms in my tests).

It usually is not an issue, but it can take some time if a lot of memory
is added to a guest with little starting memory:
Adding 256G to a 2GB guest, for example will require 8 HPT resizes.

Perform an HPT resize before memory hotplug, updating HPT to its
final size (considering a successful hotplug), taking the number of
HPT resizes to at most one per memory hotplug action.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/hash.h     |  2 ++
 arch/powerpc/mm/book3s64/hash_utils.c         | 20 +++++++++++++++++++
 .../platforms/pseries/hotplug-memory.c        |  9 +++++++++
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index d959b0195ad9..fad4af8b8543 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -255,6 +255,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 				 int nid, pgprot_t prot);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
+void hash_batch_expand_prepare(unsigned long newsize);
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 608e4ed397a9..3fa395b3fe57 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -859,6 +859,26 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
 
 	return rc;
 }
+
+void hash_batch_expand_prepare(unsigned long newsize)
+{
+	const u64 starting_size = ppc64_pft_size;
+
+	/*
+	 * Resizing-up HPT should never fail, but there are some cases system starts with higher
+	 * SHIFT than required, and we go through the funny case of resizing HPT down while
+	 * adding memory
+	 */
+
+	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
+		newsize *= 2;
+		pr_warn("Hash collision while resizing HPT\n");
+
+		/* Do not try to resize to the starting size, or bigger value */
+		if (htab_shift_for_mem_size(newsize) >= starting_size)
+			break;
+	}
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static void __init hash_init_partition_table(phys_addr_t hash_table,
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 8377f1f7c78e..48b2cfe4ce69 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -13,6 +13,7 @@
 #include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
+#include <linux/pgtable.h>
 
 #include <asm/firmware.h>
 #include <asm/machdep.h>
@@ -671,6 +672,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 	if (lmbs_available < lmbs_to_add)
 		return -EINVAL;
 
+	if (!radix_enabled())
+		hash_batch_expand_prepare(memblock_phys_mem_size() +
+						 lmbs_to_add * drmem_lmb_size());
+
 	for_each_drmem_lmb(lmb) {
 		if (lmb->flags & DRCONF_MEM_ASSIGNED)
 			continue;
@@ -788,6 +793,10 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 	if (lmbs_available < lmbs_to_add)
 		return -EINVAL;
 
+	if (!radix_enabled())
+		hash_batch_expand_prepare(memblock_phys_mem_size() +
+					  lmbs_to_add * drmem_lmb_size());
+
 	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
 		if (lmb->flags & DRCONF_MEM_ASSIGNED)
 			continue;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug
  2021-04-30 14:36 [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
  2021-04-30 14:36 ` [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
  2021-04-30 14:36 ` [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on " Leonardo Bras
@ 2021-04-30 14:36 ` Leonardo Bras
  2021-06-07  5:20   ` David Gibson
  2021-04-30 14:41 ` [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
  3 siblings, 1 reply; 15+ messages in thread
From: Leonardo Bras @ 2021-04-30 14:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Sandipan Das, Mike Rapoport, Andrew Morton,
	Aneesh Kumar K.V, Nicholas Piggin, Nathan Lynch,
	David Hildenbrand, Scott Cheloha, Laurent Dufour
  Cc: linuxppc-dev, linux-kernel

During memory hotunplug, after each LMB is removed, the HPT may be
resized-down if it would map a max of 4 times the current amount of memory.
(2 shifts, due to introduced histeresis)

It usually is not an issue, but it can take a lot of time if HPT
resizing-down fails. This happens  because resize-down failures
usually repeat at each LMB removal, until there are no more bolted entries
conflict, which can take a while to happen.

This can be solved by doing a single HPT resize at the end of memory
hotunplug, after all requested entries are removed.

To make this happen, it's necessary to temporarily disable all HPT
resize-downs before hotunplug, re-enable them after hotunplug ends,
and then resize-down HPT to the current memory size.

As an example, hotunplugging 256GB from a 385GB guest took 621s without
this patch, and 100s after applied.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/hash.h     |  2 +
 arch/powerpc/mm/book3s64/hash_utils.c         | 45 +++++++++++++++++--
 .../platforms/pseries/hotplug-memory.c        | 26 +++++++++++
 3 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index fad4af8b8543..6cd66e7e98c9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -256,6 +256,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
 void hash_batch_expand_prepare(unsigned long newsize);
+void hash_batch_shrink_begin(void);
+void hash_batch_shrink_end(void);
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 3fa395b3fe57..73ecd0f61acd 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -795,6 +795,9 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+
+static DEFINE_MUTEX(hpt_resize_down_lock);
+
 static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 {
 	unsigned target_hpt_shift;
@@ -805,7 +808,7 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
 
 	if (shrinking) {
-
+		int ret;
 		/*
 		 * To avoid lots of HPT resizes if memory size is fluctuating
 		 * across a boundary, we deliberately have some hysterisis
@@ -818,10 +821,20 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 		if (target_hpt_shift >= ppc64_pft_size - 1)
 			return 0;
 
-	} else if (target_hpt_shift <= ppc64_pft_size) {
-		return 0;
+		/* When batch removing entries, only resizes HPT at the end. */
+
+		if (!mutex_trylock(&hpt_resize_down_lock))
+			return 0;
+
+		ret = mmu_hash_ops.resize_hpt(target_hpt_shift);
+
+		mutex_unlock(&hpt_resize_down_lock);
+		return ret;
 	}
 
+	if (target_hpt_shift <= ppc64_pft_size)
+		return 0;
+
 	return mmu_hash_ops.resize_hpt(target_hpt_shift);
 }
 
@@ -879,6 +892,32 @@ void hash_batch_expand_prepare(unsigned long newsize)
 			break;
 	}
 }
+
+void hash_batch_shrink_begin(void)
+{
+	/* Disable HPT resize-down during hot-unplug */
+	mutex_lock(&hpt_resize_down_lock);
+}
+
+void hash_batch_shrink_end(void)
+{
+	const u64 starting_size = ppc64_pft_size;
+	unsigned long newsize;
+
+	newsize = memblock_phys_mem_size();
+	/* Resize to smallest SHIFT possible */
+	while (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
+		newsize *= 2;
+		pr_warn("Hash collision while resizing HPT\n");
+
+		/* Do not try to resize to the starting size, or bigger value */
+		if (htab_shift_for_mem_size(newsize) >= starting_size)
+			break;
+	}
+
+	/* Re-enables HPT resize-down after hot-unplug */
+	mutex_unlock(&hpt_resize_down_lock);
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static void __init hash_init_partition_table(phys_addr_t hash_table,
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 48b2cfe4ce69..44bc50d72353 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -426,6 +426,9 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
 		return -EINVAL;
 	}
 
+	if (!radix_enabled())
+		hash_batch_shrink_begin();
+
 	for_each_drmem_lmb(lmb) {
 		rc = dlpar_remove_lmb(lmb);
 		if (rc)
@@ -471,6 +474,9 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
 		rc = 0;
 	}
 
+	if (!radix_enabled())
+		hash_batch_shrink_end();
+
 	return rc;
 }
 
@@ -533,6 +539,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 	if (lmbs_available < lmbs_to_remove)
 		return -EINVAL;
 
+	if (!radix_enabled())
+		hash_batch_shrink_begin();
+
 	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
 		if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
 			continue;
@@ -573,6 +582,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 		}
 	}
 
+	if (!radix_enabled())
+		hash_batch_shrink_end();
+
 	return rc;
 }
 
@@ -703,6 +715,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 	if (lmbs_added != lmbs_to_add) {
 		pr_err("Memory hot-add failed, removing any added LMBs\n");
 
+		if (!radix_enabled())
+			hash_batch_shrink_begin();
+
 		for_each_drmem_lmb(lmb) {
 			if (!drmem_lmb_reserved(lmb))
 				continue;
@@ -716,6 +731,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 
 			drmem_remove_lmb_reservation(lmb);
 		}
+
+		if (!radix_enabled())
+			hash_batch_shrink_end();
+
 		rc = -EINVAL;
 	} else {
 		for_each_drmem_lmb(lmb) {
@@ -817,6 +836,9 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 	if (rc) {
 		pr_err("Memory indexed-count-add failed, removing any added LMBs\n");
 
+		if (!radix_enabled())
+			hash_batch_shrink_begin();
+
 		for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
 			if (!drmem_lmb_reserved(lmb))
 				continue;
@@ -830,6 +852,10 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 
 			drmem_remove_lmb_reservation(lmb);
 		}
+
+		if (!radix_enabled())
+			hash_batch_shrink_end();
+
 		rc = -EINVAL;
 	} else {
 		for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug
  2021-04-30 14:36 [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
                   ` (2 preceding siblings ...)
  2021-04-30 14:36 ` [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug Leonardo Bras
@ 2021-04-30 14:41 ` Leonardo Bras
  3 siblings, 0 replies; 15+ messages in thread
From: Leonardo Bras @ 2021-04-30 14:41 UTC (permalink / raw)
  To: avid Gibson, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Sandipan Das, Mike Rapoport, Andrew Morton,
	Aneesh Kumar K.V, Nicholas Piggin, Nathan Lynch,
	David Hildenbrand, Scott Cheloha, Laurent Dufour
  Cc: linuxppc-dev, linux-kernel

CC: David Gibson

http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=241574&state=%2A&archive=both


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  2021-04-30 14:36 ` [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
@ 2021-06-07  5:02   ` David Gibson
  2021-06-09  0:52     ` Leonardo Brás
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2021-06-07  5:02 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4162 bytes --]

On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> Because hypervisors may need to create HPTs without knowing the guest
> page size, the smallest used page-size (4k) may be chosen, resulting in
> a HPT that is possibly bigger than needed.
> 
> On a guest with bigger page-sizes, the amount of entries for HTP may be
> too high, causing the guest to ask for a HPT resize-down on the first
> hotplug.
> 
> This becomes a problem when HPT resize-down fails, and causes the
> HPT resize to be performed on every LMB added, until HPT size is
> compatible to guest memory size, causing a major slowdown.
> 
> So, avoiding HPT resizing-down on hot-add significantly improves memory
> hotplug times.
> 
> As an example, hotplugging 256GB on a 129GB guest took 710s without this
> patch, and 21s after applied.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Sorry it's taken me so long to look at these

I don't love the extra statefulness that the 'shrinking' parameter
adds, but I can't see an elegant way to avoid it, so:

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 36 ++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 581b20a2feaf..608e4ed397a9 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -795,7 +795,7 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -static int resize_hpt_for_hotplug(unsigned long new_mem_size)
> +static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  {
>  	unsigned target_hpt_shift;
>  
> @@ -804,19 +804,25 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size)
>  
>  	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
> -	/*
> -	 * To avoid lots of HPT resizes if memory size is fluctuating
> -	 * across a boundary, we deliberately have some hysterisis
> -	 * here: we immediately increase the HPT size if the target
> -	 * shift exceeds the current shift, but we won't attempt to
> -	 * reduce unless the target shift is at least 2 below the
> -	 * current shift
> -	 */
> -	if (target_hpt_shift > ppc64_pft_size ||
> -	    target_hpt_shift < ppc64_pft_size - 1)
> -		return mmu_hash_ops.resize_hpt(target_hpt_shift);
> +	if (shrinking) {
>  
> -	return 0;
> +		/*
> +		 * To avoid lots of HPT resizes if memory size is fluctuating
> +		 * across a boundary, we deliberately have some hysterisis
> +		 * here: we immediately increase the HPT size if the target
> +		 * shift exceeds the current shift, but we won't attempt to
> +		 * reduce unless the target shift is at least 2 below the
> +		 * current shift
> +		 */
> +
> +		if (target_hpt_shift >= ppc64_pft_size - 1)
> +			return 0;
> +
> +	} else if (target_hpt_shift <= ppc64_pft_size) {
> +		return 0;
> +	}
> +
> +	return mmu_hash_ops.resize_hpt(target_hpt_shift);
>  }
>  
>  int hash__create_section_mapping(unsigned long start, unsigned long end,
> @@ -829,7 +835,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
>  		return -1;
>  	}
>  
> -	resize_hpt_for_hotplug(memblock_phys_mem_size());
> +	resize_hpt_for_hotplug(memblock_phys_mem_size(), false);
>  
>  	rc = htab_bolt_mapping(start, end, __pa(start),
>  			       pgprot_val(prot), mmu_linear_psize,
> @@ -848,7 +854,7 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
>  	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
>  				     mmu_kernel_ssize);
>  
> -	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> +	if (resize_hpt_for_hotplug(memblock_phys_mem_size(), true) == -ENOSPC)
>  		pr_warn("Hash collision while resizing HPT\n");
>  
>  	return rc;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug
  2021-04-30 14:36 ` [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on " Leonardo Bras
@ 2021-06-07  5:10   ` David Gibson
  2021-06-09  3:09     ` Leonardo Brás
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2021-06-07  5:10 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4235 bytes --]

On Fri, Apr 30, 2021 at 11:36:08AM -0300, Leonardo Bras wrote:
> Every time a memory hotplug happens, and the memory limit crosses a 2^n
> value, it may be necessary to perform HPT resizing-up, which can take
> some time (over 100ms in my tests).
> 
> It usually is not an issue, but it can take some time if a lot of memory
> is added to a guest with little starting memory:
> Adding 256G to a 2GB guest, for example will require 8 HPT resizes.
> 
> Perform an HPT resize before memory hotplug, updating HPT to its
> final size (considering a successful hotplug), taking the number of
> HPT resizes to at most one per memory hotplug action.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/include/asm/book3s/64/hash.h     |  2 ++
>  arch/powerpc/mm/book3s64/hash_utils.c         | 20 +++++++++++++++++++
>  .../platforms/pseries/hotplug-memory.c        |  9 +++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index d959b0195ad9..fad4af8b8543 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -255,6 +255,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
>  				 int nid, pgprot_t prot);
>  int hash__remove_section_mapping(unsigned long start, unsigned long end);
>  
> +void hash_batch_expand_prepare(unsigned long newsize);
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 608e4ed397a9..3fa395b3fe57 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -859,6 +859,26 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
>  
>  	return rc;
>  }
> +
> +void hash_batch_expand_prepare(unsigned long newsize)
> +{
> +	const u64 starting_size = ppc64_pft_size;
> +
> +	/*
> +	 * Resizing-up HPT should never fail, but there are some cases system starts with higher
> +	 * SHIFT than required, and we go through the funny case of resizing HPT down while
> +	 * adding memory
> +	 */
> +
> +	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> +		newsize *= 2;
> +		pr_warn("Hash collision while resizing HPT\n");
> +
> +		/* Do not try to resize to the starting size, or bigger value */
> +		if (htab_shift_for_mem_size(newsize) >= starting_size)
> +			break;
> +	}
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  static void __init hash_init_partition_table(phys_addr_t hash_table,
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 8377f1f7c78e..48b2cfe4ce69 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -13,6 +13,7 @@
>  #include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
>  #include <linux/slab.h>
> +#include <linux/pgtable.h>
>  
>  #include <asm/firmware.h>
>  #include <asm/machdep.h>
> @@ -671,6 +672,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>  	if (lmbs_available < lmbs_to_add)
>  		return -EINVAL;
>  
> +	if (!radix_enabled())
> +		hash_batch_expand_prepare(memblock_phys_mem_size() +
> +						 lmbs_to_add * drmem_lmb_size());
> +
>  	for_each_drmem_lmb(lmb) {
>  		if (lmb->flags & DRCONF_MEM_ASSIGNED)
>  			continue;
> @@ -788,6 +793,10 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
>  	if (lmbs_available < lmbs_to_add)
>  		return -EINVAL;
>  
> +	if (!radix_enabled())
> +		hash_batch_expand_prepare(memblock_phys_mem_size() +
> +					  lmbs_to_add * drmem_lmb_size());
> +
>  	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
>  		if (lmb->flags & DRCONF_MEM_ASSIGNED)
>  			continue;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug
  2021-04-30 14:36 ` [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug Leonardo Bras
@ 2021-06-07  5:20   ` David Gibson
  2021-06-09  5:30     ` Leonardo Brás
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2021-06-07  5:20 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7633 bytes --]

On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> During memory hotunplug, after each LMB is removed, the HPT may be
> resized-down if it would map a max of 4 times the current amount of memory.
> (2 shifts, due to introduced histeresis)
> 
> It usually is not an issue, but it can take a lot of time if HPT
> resizing-down fails. This happens  because resize-down failures
> usually repeat at each LMB removal, until there are no more bolted entries
> conflict, which can take a while to happen.
> 
> This can be solved by doing a single HPT resize at the end of memory
> hotunplug, after all requested entries are removed.
> 
> To make this happen, it's necessary to temporarily disable all HPT
> resize-downs before hotunplug, re-enable them after hotunplug ends,
> and then resize-down HPT to the current memory size.
> 
> As an example, hotunplugging 256GB from a 385GB guest took 621s without
> this patch, and 100s after applied.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Hrm.  This looks correct, but it seems overly complicated.

AFAICT, the resize calls that this adds should in practice be the
*only* times we call resize, all the calls from the lower level code
should be suppressed.  In which case can't we just remove those calls
entirely, and not deal with the clunky locking and exclusion here.
That should also remove the need for the 'shrinking' parameter in 1/3.

> ---
>  arch/powerpc/include/asm/book3s/64/hash.h     |  2 +
>  arch/powerpc/mm/book3s64/hash_utils.c         | 45 +++++++++++++++++--
>  .../platforms/pseries/hotplug-memory.c        | 26 +++++++++++
>  3 files changed, 70 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index fad4af8b8543..6cd66e7e98c9 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -256,6 +256,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
>  int hash__remove_section_mapping(unsigned long start, unsigned long end);
>  
>  void hash_batch_expand_prepare(unsigned long newsize);
> +void hash_batch_shrink_begin(void);
> +void hash_batch_shrink_end(void);
>  
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __KERNEL__ */
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 3fa395b3fe57..73ecd0f61acd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -795,6 +795,9 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +
> +static DEFINE_MUTEX(hpt_resize_down_lock);
> +
>  static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  {
>  	unsigned target_hpt_shift;
> @@ -805,7 +808,7 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
>  	if (shrinking) {
> -
> +		int ret;
>  		/*
>  		 * To avoid lots of HPT resizes if memory size is fluctuating
>  		 * across a boundary, we deliberately have some hysterisis
> @@ -818,10 +821,20 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  		if (target_hpt_shift >= ppc64_pft_size - 1)
>  			return 0;
>  
> -	} else if (target_hpt_shift <= ppc64_pft_size) {
> -		return 0;
> +		/* When batch removing entries, only resizes HPT at the end. */
> +
> +		if (!mutex_trylock(&hpt_resize_down_lock))
> +			return 0;
> +
> +		ret = mmu_hash_ops.resize_hpt(target_hpt_shift);
> +
> +		mutex_unlock(&hpt_resize_down_lock);
> +		return ret;
>  	}
>  
> +	if (target_hpt_shift <= ppc64_pft_size)
> +		return 0;
> +
>  	return mmu_hash_ops.resize_hpt(target_hpt_shift);
>  }
>  
> @@ -879,6 +892,32 @@ void hash_batch_expand_prepare(unsigned long newsize)
>  			break;
>  	}
>  }
> +
> +void hash_batch_shrink_begin(void)
> +{
> +	/* Disable HPT resize-down during hot-unplug */
> +	mutex_lock(&hpt_resize_down_lock);
> +}
> +
> +void hash_batch_shrink_end(void)
> +{
> +	const u64 starting_size = ppc64_pft_size;
> +	unsigned long newsize;
> +
> +	newsize = memblock_phys_mem_size();
> +	/* Resize to smallest SHIFT possible */
> +	while (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
> +		newsize *= 2;
> +		pr_warn("Hash collision while resizing HPT\n");
> +
> +		/* Do not try to resize to the starting size, or bigger value */
> +		if (htab_shift_for_mem_size(newsize) >= starting_size)
> +			break;
> +	}
> +
> +	/* Re-enables HPT resize-down after hot-unplug */
> +	mutex_unlock(&hpt_resize_down_lock);
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  static void __init hash_init_partition_table(phys_addr_t hash_table,
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 48b2cfe4ce69..44bc50d72353 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -426,6 +426,9 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
>  		return -EINVAL;
>  	}
>  
> +	if (!radix_enabled())
> +		hash_batch_shrink_begin();
> +
>  	for_each_drmem_lmb(lmb) {
>  		rc = dlpar_remove_lmb(lmb);
>  		if (rc)
> @@ -471,6 +474,9 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
>  		rc = 0;
>  	}
>  
> +	if (!radix_enabled())
> +		hash_batch_shrink_end();
> +
>  	return rc;
>  }
>  
> @@ -533,6 +539,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
>  	if (lmbs_available < lmbs_to_remove)
>  		return -EINVAL;
>  
> +	if (!radix_enabled())
> +		hash_batch_shrink_begin();
> +
>  	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
>  		if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>  			continue;
> @@ -573,6 +582,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
>  		}
>  	}
>  
> +	if (!radix_enabled())
> +		hash_batch_shrink_end();
> +
>  	return rc;
>  }
>  
> @@ -703,6 +715,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>  	if (lmbs_added != lmbs_to_add) {
>  		pr_err("Memory hot-add failed, removing any added LMBs\n");
>  
> +		if (!radix_enabled())
> +			hash_batch_shrink_begin();
> +
>  		for_each_drmem_lmb(lmb) {
>  			if (!drmem_lmb_reserved(lmb))
>  				continue;
> @@ -716,6 +731,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>  
>  			drmem_remove_lmb_reservation(lmb);
>  		}
> +
> +		if (!radix_enabled())
> +			hash_batch_shrink_end();
> +
>  		rc = -EINVAL;
>  	} else {
>  		for_each_drmem_lmb(lmb) {
> @@ -817,6 +836,9 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
>  	if (rc) {
>  		pr_err("Memory indexed-count-add failed, removing any added LMBs\n");
>  
> +		if (!radix_enabled())
> +			hash_batch_shrink_begin();
> +
>  		for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
>  			if (!drmem_lmb_reserved(lmb))
>  				continue;
> @@ -830,6 +852,10 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
>  
>  			drmem_remove_lmb_reservation(lmb);
>  		}
> +
> +		if (!radix_enabled())
> +			hash_batch_shrink_end();
> +
>  		rc = -EINVAL;
>  	} else {
>  		for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  2021-06-07  5:02   ` David Gibson
@ 2021-06-09  0:52     ` Leonardo Brás
  2021-06-09  4:40       ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Leonardo Brás @ 2021-06-09  0:52 UTC (permalink / raw)
  To: David Gibson
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote:
> On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> > Because hypervisors may need to create HPTs without knowing the
> > guest
> > page size, the smallest used page-size (4k) may be chosen,
> > resulting in
> > a HPT that is possibly bigger than needed.
> > 
> > On a guest with bigger page-sizes, the amount of entries for HTP
> > may be
> > too high, causing the guest to ask for a HPT resize-down on the
> > first
> > hotplug.
> > 
> > This becomes a problem when HPT resize-down fails, and causes the
> > HPT resize to be performed on every LMB added, until HPT size is
> > compatible to guest memory size, causing a major slowdown.
> > 
> > So, avoiding HPT resizing-down on hot-add significantly improves
> > memory
> > hotplug times.
> > 
> > As an example, hotplugging 256GB on a 129GB guest took 710s without
> > this
> > patch, and 21s after applied.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> 
> Sorry it's taken me so long to look at these
> 
> I don't love the extra statefulness that the 'shrinking' parameter
> adds, but I can't see an elegant way to avoid it, so:
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

np, thanks for reviewing!

Best regards,
Leonardo Bras


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug
  2021-06-07  5:10   ` David Gibson
@ 2021-06-09  3:09     ` Leonardo Brás
  0 siblings, 0 replies; 15+ messages in thread
From: Leonardo Brás @ 2021-06-09  3:09 UTC (permalink / raw)
  To: David Gibson
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

On Mon, 2021-06-07 at 15:10 +1000, David Gibson wrote:
> On Fri, Apr 30, 2021 at 11:36:08AM -0300, Leonardo Bras wrote:
> > Every time a memory hotplug happens, and the memory limit crosses a
> > 2^n
> > value, it may be necessary to perform HPT resizing-up, which can
> > take
> > some time (over 100ms in my tests).
> > 
> > It usually is not an issue, but it can take some time if a lot of
> > memory
> > is added to a guest with little starting memory:
> > Adding 256G to a 2GB guest, for example will require 8 HPT resizes.
> > 
> > Perform an HPT resize before memory hotplug, updating HPT to its
> > final size (considering a successful hotplug), taking the number of
> > HPT resizes to at most one per memory hotplug action.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Thanks David!

> 
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash.h     |  2 ++
> >  arch/powerpc/mm/book3s64/hash_utils.c         | 20
> > +++++++++++++++++++
> >  .../platforms/pseries/hotplug-memory.c        |  9 +++++++++
> >  3 files changed, 31 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash.h
> > b/arch/powerpc/include/asm/book3s/64/hash.h
> > index d959b0195ad9..fad4af8b8543 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> > @@ -255,6 +255,8 @@ int hash__create_section_mapping(unsigned long
> > start, unsigned long end,
> >                                  int nid, pgprot_t prot);
> >  int hash__remove_section_mapping(unsigned long start, unsigned
> > long end);
> >  
> > +void hash_batch_expand_prepare(unsigned long newsize);
> > +
> >  #endif /* !__ASSEMBLY__ */
> >  #endif /* __KERNEL__ */
> >  #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
> > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c
> > b/arch/powerpc/mm/book3s64/hash_utils.c
> > index 608e4ed397a9..3fa395b3fe57 100644
> > --- a/arch/powerpc/mm/book3s64/hash_utils.c
> > +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> > @@ -859,6 +859,26 @@ int hash__remove_section_mapping(unsigned long
> > start, unsigned long end)
> >  
> >         return rc;
> >  }
> > +
> > +void hash_batch_expand_prepare(unsigned long newsize)
> > +{
> > +       const u64 starting_size = ppc64_pft_size;
> > +
> > +       /*
> > +        * Resizing-up HPT should never fail, but there are some
> > cases system starts with higher
> > +        * SHIFT than required, and we go through the funny case of
> > resizing HPT down while
> > +        * adding memory
> > +        */
> > +
> > +       while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > +               newsize *= 2;
> > +               pr_warn("Hash collision while resizing HPT\n");
> > +
> > +               /* Do not try to resize to the starting size, or
> > bigger value */
> > +               if (htab_shift_for_mem_size(newsize) >=
> > starting_size)
> > +                       break;
> > +       }
> > +}
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >  
> >  static void __init hash_init_partition_table(phys_addr_t
> > hash_table,
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index 8377f1f7c78e..48b2cfe4ce69 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/memory.h>
> >  #include <linux/memory_hotplug.h>
> >  #include <linux/slab.h>
> > +#include <linux/pgtable.h>
> >  
> >  #include <asm/firmware.h>
> >  #include <asm/machdep.h>
> > @@ -671,6 +672,10 @@ static int dlpar_memory_add_by_count(u32
> > lmbs_to_add)
> >         if (lmbs_available < lmbs_to_add)
> >                 return -EINVAL;
> >  
> > +       if (!radix_enabled())
> > +               hash_batch_expand_prepare(memblock_phys_mem_size()
> > +
> > +                                                lmbs_to_add *
> > drmem_lmb_size());
> > +
> >         for_each_drmem_lmb(lmb) {
> >                 if (lmb->flags & DRCONF_MEM_ASSIGNED)
> >                         continue;
> > @@ -788,6 +793,10 @@ static int dlpar_memory_add_by_ic(u32
> > lmbs_to_add, u32 drc_index)
> >         if (lmbs_available < lmbs_to_add)
> >                 return -EINVAL;
> >  
> > +       if (!radix_enabled())
> > +               hash_batch_expand_prepare(memblock_phys_mem_size()
> > +
> > +                                         lmbs_to_add *
> > drmem_lmb_size());
> > +
> >         for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> >                 if (lmb->flags & DRCONF_MEM_ASSIGNED)
> >                         continue;
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my
> code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_
> _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  2021-06-09  0:52     ` Leonardo Brás
@ 2021-06-09  4:40       ` David Gibson
  2021-06-09  5:51         ` Leonardo Brás
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2021-06-09  4:40 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3445 bytes --]

On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote:
> On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote:
> > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> > > Because hypervisors may need to create HPTs without knowing the
> > > guest
> > > page size, the smallest used page-size (4k) may be chosen,
> > > resulting in
> > > a HPT that is possibly bigger than needed.
> > > 
> > > On a guest with bigger page-sizes, the amount of entries for HTP
> > > may be
> > > too high, causing the guest to ask for a HPT resize-down on the
> > > first
> > > hotplug.
> > > 
> > > This becomes a problem when HPT resize-down fails, and causes the
> > > HPT resize to be performed on every LMB added, until HPT size is
> > > compatible to guest memory size, causing a major slowdown.
> > > 
> > > So, avoiding HPT resizing-down on hot-add significantly improves
> > > memory
> > > hotplug times.
> > > 
> > > As an example, hotplugging 256GB on a 129GB guest took 710s without
> > > this
> > > patch, and 21s after applied.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > 
> > Sorry it's taken me so long to look at these
> > 
> > I don't love the extra statefulness that the 'shrinking' parameter
> > adds, but I can't see an elegant way to avoid it, so:
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> np, thanks for reviewing!

Actually... I take that back.  With the subsequent patches my
discomfort with the complexity of implementing the batching grew.

I think I can see a simpler way - although it wasn't as clear as I
thought it might be, without some deep history on this feature.

What's going on here is pretty hard to follow, because it starts in
arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c)
where it processes the add/remove requests, then goes into generic
code __add_memory() which eventually emerges back in arch specific
code (hash__create_section_mapping()).

The HPT resizing calls are in the "inner" arch specific section,
whereas it's only the outer arch section that has the information to
batch properly.  The mutex and 'shrinking' parameter in Leonardo's
code are all about conveying information from the outer to inner
section.

Now, I think the reason I had the resize calls in the inner section
was to accomodate the notion that a) pHyp might support resizing in
future, and it could come in through a different path with its drmgr
thingy and/or b) bare metal hash architectures might want to implement
hash resizing, and this would make at least part of the path common.

Given the decreasing relevance of hash MMUs, I think we can now safely
say neither of these is ever going to happen.

Therefore, we can simplify things by moving the HPT resize calls into
the pseries LMB code, instead of create/remove_section_mapping.  Then
to do batching without extra complications we just need this logic for
all resizes (both add and remove):

	let new_hpt_order = expected HPT size for new mem size;

	if (new_hpt_order > current_hpt_order)
		resize to new_hpt_order

	add/remove memory

	if (new_hpt_order < current_hpt_order - 1)
		resize to new_hpt_order


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug
  2021-06-07  5:20   ` David Gibson
@ 2021-06-09  5:30     ` Leonardo Brás
  2021-06-09  6:08       ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Leonardo Brás @ 2021-06-09  5:30 UTC (permalink / raw)
  To: David Gibson
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > During memory hotunplug, after each LMB is removed, the HPT may be
> > resized-down if it would map a max of 4 times the current amount of
> > memory.
> > (2 shifts, due to introduced histeresis)
> > 
> > It usually is not an issue, but it can take a lot of time if HPT
> > resizing-down fails. This happens  because resize-down failures
> > usually repeat at each LMB removal, until there are no more bolted
> > entries
> > conflict, which can take a while to happen.
> > 
> > This can be solved by doing a single HPT resize at the end of
> > memory
> > hotunplug, after all requested entries are removed.
> > 
> > To make this happen, it's necessary to temporarily disable all HPT
> > resize-downs before hotunplug, re-enable them after hotunplug ends,
> > and then resize-down HPT to the current memory size.
> > 
> > As an example, hotunplugging 256GB from a 385GB guest took 621s
> > without
> > this patch, and 100s after applied.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> 
> Hrm.  This looks correct, but it seems overly complicated.
> 
> AFAICT, the resize calls that this adds should in practice be the
> *only* times we call resize, all the calls from the lower level code
> should be suppressed. 

That's correct.

>  In which case can't we just remove those calls
> entirely, and not deal with the clunky locking and exclusion here.
> That should also remove the need for the 'shrinking' parameter in
> 1/3.


If I get your suggestion correctly, you suggest something like:
1 - Never calling resize_hpt_for_hotplug() in
hash__remove_section_mapping(), thus not needing the srinking
parameter.
2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would in
fact call another function to do the batch resize_hpt_for_hotplug() for
them

If so, that assumes that no other function that currently calls
resize_hpt_for_hotplug() under another path, or if they do, it does not
need to actually resize the HPT.

Is the above correct?

There are some examples of functions that currently call
resize_hpt_for_hotplug() by another path:

add_memory_driver_managed
	virtio_mem_add_memory
	dev_dax_kmem_probe

reserve_additional_memory
	balloon_process
	add_ballooned_pages

__add_memory
	probe_store

__remove_memory
	pseries_remove_memblock

remove_memory
	dev_dax_kmem_remove
	virtio_mem_remove_memory

memunmap_pages
	pci_p2pdma_add_resource
	virtio_fs_setup_dax


Best regards,
Leonardo Bras


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  2021-06-09  4:40       ` David Gibson
@ 2021-06-09  5:51         ` Leonardo Brás
  2021-06-09  6:59           ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Leonardo Brás @ 2021-06-09  5:51 UTC (permalink / raw)
  To: David Gibson
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

On Wed, 2021-06-09 at 14:40 +1000, David Gibson wrote:
> On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote:
> > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote:
> > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> > > > Because hypervisors may need to create HPTs without knowing the
> > > > guest
> > > > page size, the smallest used page-size (4k) may be chosen,
> > > > resulting in
> > > > a HPT that is possibly bigger than needed.
> > > > 
> > > > On a guest with bigger page-sizes, the amount of entries for
> > > > HTP
> > > > may be
> > > > too high, causing the guest to ask for a HPT resize-down on the
> > > > first
> > > > hotplug.
> > > > 
> > > > This becomes a problem when HPT resize-down fails, and causes
> > > > the
> > > > HPT resize to be performed on every LMB added, until HPT size
> > > > is
> > > > compatible to guest memory size, causing a major slowdown.
> > > > 
> > > > So, avoiding HPT resizing-down on hot-add significantly
> > > > improves
> > > > memory
> > > > hotplug times.
> > > > 
> > > > As an example, hotplugging 256GB on a 129GB guest took 710s
> > > > without
> > > > this
> > > > patch, and 21s after applied.
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > > 
> > > Sorry it's taken me so long to look at these
> > > 
> > > I don't love the extra statefulness that the 'shrinking'
> > > parameter
> > > adds, but I can't see an elegant way to avoid it, so:
> > > 
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > 
> > np, thanks for reviewing!
> 
> Actually... I take that back.  With the subsequent patches my
> discomfort with the complexity of implementing the batching grew.
> 
> I think I can see a simpler way - although it wasn't as clear as I
> thought it might be, without some deep history on this feature.
> 
> What's going on here is pretty hard to follow, because it starts in
> arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c)
> where it processes the add/remove requests, then goes into generic
> code __add_memory() which eventually emerges back in arch specific
> code (hash__create_section_mapping()).
> 
> The HPT resizing calls are in the "inner" arch specific section,
> whereas it's only the outer arch section that has the information to
> batch properly.  The mutex and 'shrinking' parameter in Leonardo's
> code are all about conveying information from the outer to inner
> section.
> 
> Now, I think the reason I had the resize calls in the inner section
> was to accomodate the notion that a) pHyp might support resizing in
> future, and it could come in through a different path with its drmgr
> thingy and/or b) bare metal hash architectures might want to
> implement
> hash resizing, and this would make at least part of the path common.
> 
> Given the decreasing relevance of hash MMUs, I think we can now
> safely
> say neither of these is ever going to happen.
> 
> Therefore, we can simplify things by moving the HPT resize calls into
> the pseries LMB code, instead of create/remove_section_mapping.  Then
> to do batching without extra complications we just need this logic
> for
> all resizes (both add and remove):
> 
>         let new_hpt_order = expected HPT size for new mem size;
> 
>         if (new_hpt_order > current_hpt_order)
>                 resize to new_hpt_order
> 
>         add/remove memory
> 
>         if (new_hpt_order < current_hpt_order - 1)
>                 resize to new_hpt_order
> 
> 


Ok, that really does seem to simplify a lot the batching.

Question:
by LMB code, you mean dlpar_memory_{add,remove}_by_* ?
(dealing only with dlpar_{add,remove}_lmb() would not be enough to deal
with batching)

In my 3/3 repĺy I sent you some other examples of functions that
currently end up calling resize_hpt_for_hotplug() without comming from 
hotplug-memory.c. Is that ok that they do not call it anymore?


Best regards,
Leonardo Bras


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug
  2021-06-09  5:30     ` Leonardo Brás
@ 2021-06-09  6:08       ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2021-06-09  6:08 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4851 bytes --]

On Wed, Jun 09, 2021 at 02:30:36AM -0300, Leonardo Brás wrote:
> On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> > On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > > During memory hotunplug, after each LMB is removed, the HPT may be
> > > resized-down if it would map a max of 4 times the current amount of
> > > memory.
> > > (2 shifts, due to introduced histeresis)
> > > 
> > > It usually is not an issue, but it can take a lot of time if HPT
> > > resizing-down fails. This happens  because resize-down failures
> > > usually repeat at each LMB removal, until there are no more bolted
> > > entries
> > > conflict, which can take a while to happen.
> > > 
> > > This can be solved by doing a single HPT resize at the end of
> > > memory
> > > hotunplug, after all requested entries are removed.
> > > 
> > > To make this happen, it's necessary to temporarily disable all HPT
> > > resize-downs before hotunplug, re-enable them after hotunplug ends,
> > > and then resize-down HPT to the current memory size.
> > > 
> > > As an example, hotunplugging 256GB from a 385GB guest took 621s
> > > without
> > > this patch, and 100s after applied.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > 
> > Hrm.  This looks correct, but it seems overly complicated.
> > 
> > AFAICT, the resize calls that this adds should in practice be the
> > *only* times we call resize, all the calls from the lower level code
> > should be suppressed. 
> 
> That's correct.
> 
> >  In which case can't we just remove those calls
> > entirely, and not deal with the clunky locking and exclusion here.
> > That should also remove the need for the 'shrinking' parameter in
> > 1/3.
> 
> 
> If I get your suggestion correctly, you suggest something like:
> 1 - Never calling resize_hpt_for_hotplug() in
> hash__remove_section_mapping(), thus not needing the srinking
> parameter.
> 2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would in
> fact call another function to do the batch resize_hpt_for_hotplug() for
> them

Basically, yes.

> If so, that assumes that no other function that currently calls
> resize_hpt_for_hotplug() under another path, or if they do, it does not
> need to actually resize the HPT.
> 
> Is the above correct?
> 
> There are some examples of functions that currently call
> resize_hpt_for_hotplug() by another path:
> 
> add_memory_driver_managed
> 	virtio_mem_add_memory
> 	dev_dax_kmem_probe

Oh... virtio-mem.  I didn't think of that.


> reserve_additional_memory
> 	balloon_process
> 	add_ballooned_pages

AFAICT this comes from drivers/xen, and Xen has never been a thing on
POWER.

> __add_memory
> 	probe_store

So this is a sysfs triggered memory add.  If the user is doing this
manually, then I think it's reasonable for them to manually manage the
HPT size as well, which they can do through debugfs.  I think it might
also be used my drmgr under pHyp, but pHyp doesn't support HPT
resizing.

> __remove_memory
> 	pseries_remove_memblock

Huh, this one comes through OF_RECONFIG_DETACH_NODE.  I don't really
know when those happen, but I strongly suspect it's only under pHyp
again.

> remove_memory
> 	dev_dax_kmem_remove
> 	virtio_mem_remove_memory

virtio-mem again.

> memunmap_pages
> 	pci_p2pdma_add_resource
> 	virtio_fs_setup_dax

And virtio-fs in dax mode.  Didn't think of that either.


Ugh, yeah, I'm used to the world where the platform provides the only
way of hotplugging memory, but virtio-mem does indeed provide another
one, and we could indeed need to manage the HPT size based on that.
Drat, so moving all the HPT resizing handling up into
pseries/hotplug-memory.c won't work.

I still think we can simplify the communication between the stuff in
the pseries hotplug code and the actual hash resizing.  In your draft
there are kind of 3 ways the information is conveyed: the mutex
suppresses HPT shrinks, pre-growing past what we need prevents HPT
grows, and the 'shrinking' flag handles some edge cases.

I suggest instead a single flag that will suppress all the current
resizes.  Not sure it technically has to be an atomic mutex, but
that's probably the obvious safe choice.  Then have a "resize up to
target" and "resize down to target" that ignore that suppression and
are no-ops if the target is in the other direction.
Then you should be able to make the path for pseries hotplugs be:

	suppress other resizes

	resize up to target

	do the actual adds or removes

	resize down to target

	unsuppress other resizes


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
  2021-06-09  5:51         ` Leonardo Brás
@ 2021-06-09  6:59           ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2021-06-09  6:59 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Sandipan Das, Mike Rapoport, Andrew Morton, Aneesh Kumar K.V,
	Nicholas Piggin, Nathan Lynch, David Hildenbrand, Scott Cheloha,
	Laurent Dufour, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4892 bytes --]

On Wed, Jun 09, 2021 at 02:51:49AM -0300, Leonardo Brás wrote:
> On Wed, 2021-06-09 at 14:40 +1000, David Gibson wrote:
> > On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote:
> > > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote:
> > > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> > > > > Because hypervisors may need to create HPTs without knowing the
> > > > > guest
> > > > > page size, the smallest used page-size (4k) may be chosen,
> > > > > resulting in
> > > > > a HPT that is possibly bigger than needed.
> > > > > 
> > > > > On a guest with bigger page-sizes, the amount of entries for
> > > > > HTP
> > > > > may be
> > > > > too high, causing the guest to ask for a HPT resize-down on the
> > > > > first
> > > > > hotplug.
> > > > > 
> > > > > This becomes a problem when HPT resize-down fails, and causes
> > > > > the
> > > > > HPT resize to be performed on every LMB added, until HPT size
> > > > > is
> > > > > compatible to guest memory size, causing a major slowdown.
> > > > > 
> > > > > So, avoiding HPT resizing-down on hot-add significantly
> > > > > improves
> > > > > memory
> > > > > hotplug times.
> > > > > 
> > > > > As an example, hotplugging 256GB on a 129GB guest took 710s
> > > > > without
> > > > > this
> > > > > patch, and 21s after applied.
> > > > > 
> > > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > > > 
> > > > Sorry it's taken me so long to look at these
> > > > 
> > > > I don't love the extra statefulness that the 'shrinking'
> > > > parameter
> > > > adds, but I can't see an elegant way to avoid it, so:
> > > > 
> > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > 
> > > np, thanks for reviewing!
> > 
> > Actually... I take that back.  With the subsequent patches my
> > discomfort with the complexity of implementing the batching grew.
> > 
> > I think I can see a simpler way - although it wasn't as clear as I
> > thought it might be, without some deep history on this feature.
> > 
> > What's going on here is pretty hard to follow, because it starts in
> > arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c)
> > where it processes the add/remove requests, then goes into generic
> > code __add_memory() which eventually emerges back in arch specific
> > code (hash__create_section_mapping()).
> > 
> > The HPT resizing calls are in the "inner" arch specific section,
> > whereas it's only the outer arch section that has the information to
> > batch properly.  The mutex and 'shrinking' parameter in Leonardo's
> > code are all about conveying information from the outer to inner
> > section.
> > 
> > Now, I think the reason I had the resize calls in the inner section
> > was to accomodate the notion that a) pHyp might support resizing in
> > future, and it could come in through a different path with its drmgr
> > thingy and/or b) bare metal hash architectures might want to
> > implement
> > hash resizing, and this would make at least part of the path common.
> > 
> > Given the decreasing relevance of hash MMUs, I think we can now
> > safely
> > say neither of these is ever going to happen.
> > 
> > Therefore, we can simplify things by moving the HPT resize calls into
> > the pseries LMB code, instead of create/remove_section_mapping.  Then
> > to do batching without extra complications we just need this logic
> > for
> > all resizes (both add and remove):
> > 
> >         let new_hpt_order = expected HPT size for new mem size;
> > 
> >         if (new_hpt_order > current_hpt_order)
> >                 resize to new_hpt_order
> > 
> >         add/remove memory
> > 
> >         if (new_hpt_order < current_hpt_order - 1)
> >                 resize to new_hpt_order
> > 
> > 
> 
> 
> Ok, that really does seem to simplify a lot the batching.
> 
> Question:
> by LMB code, you mean dlpar_memory_{add,remove}_by_* ?
> (dealing only with dlpar_{add,remove}_lmb() would not be enough to deal
> with batching)

I was thinking of a two stage process.  First moving the resizes to
dlpar_{add,remote}_lmb() (not changing behaviour for the pseries dlpar
path), then implementing the batching by moving to the {add,remove}_by
functions.

But..

> In my 3/3 repĺy I sent you some other examples of functions that
> currently end up calling resize_hpt_for_hotplug() without comming from 
> hotplug-memory.c. Is that ok that they do not call it anymore?

..as I replied there, I was wrong about it being safe to move the
resizes all to the pseries dlpar code.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-09  7:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 14:36 [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
2021-04-30 14:36 ` [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
2021-06-07  5:02   ` David Gibson
2021-06-09  0:52     ` Leonardo Brás
2021-06-09  4:40       ` David Gibson
2021-06-09  5:51         ` Leonardo Brás
2021-06-09  6:59           ` David Gibson
2021-04-30 14:36 ` [PATCH v2 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on " Leonardo Bras
2021-06-07  5:10   ` David Gibson
2021-06-09  3:09     ` Leonardo Brás
2021-04-30 14:36 ` [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug Leonardo Bras
2021-06-07  5:20   ` David Gibson
2021-06-09  5:30     ` Leonardo Brás
2021-06-09  6:08       ` David Gibson
2021-04-30 14:41 ` [PATCH v2 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).