linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
@ 2020-07-09 13:19 Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings Aneesh Kumar K.V
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-09 13:19 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Bharata B Rao

This is the next version of the fixes for memory unplug on radix.
The issues and the fix are described in the actual patches.

Changes from v2:
- Address review feedback

Changes from v1:
- Added back patch to drop split_kernel_mapping
- Most of the split_kernel_mapping related issues are now described
  in the removal patch
- drop pte fragment change
- use lmb size as the max mapping size.
- Radix baremetal now use memory block size of 1G.


Changes from v0:
- Rebased to latest kernel.
- Took care of p4d changes.
- Addressed Aneesh's review feedback:
 - Added comments.
 - Indentation fixed.
- Dropped the 1st patch (setting DRCONF_MEM_HOTREMOVABLE lmb flags) as
  it is debatable if this flag should be set in the device tree by OS
  and not by platform in case of hotplug. This can be looked at separately.
  (The fixes in this patchset remain valid without the dropped patch)
- Dropped the last patch that removed split_kernel_mapping() to ensure
  that spilitting code is available for any radix guest running on
  platforms that don't set DRCONF_MEM_HOTREMOVABLE.



Aneesh Kumar K.V (2):
  powerpc/mm/radix: Fix PTE/PMD fragment count for early page table
    mappings
  powerpc/mm/radix: Create separate mappings for hot-plugged memory

Bharata B Rao (2):
  powerpc/mm/radix: Free PUD table when freeing pagetable
  powerpc/mm/radix: Remove split_kernel_mapping()

 arch/powerpc/include/asm/book3s/64/mmu.h     |   5 +
 arch/powerpc/include/asm/book3s/64/pgalloc.h |  16 +-
 arch/powerpc/mm/book3s64/pgtable.c           |   5 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c     | 197 +++++++++++--------
 arch/powerpc/mm/pgtable-frag.c               |   3 +
 arch/powerpc/platforms/powernv/setup.c       |  10 +-
 6 files changed, 147 insertions(+), 89 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
  2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
@ 2020-07-09 13:19 ` Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-09 13:19 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Bharata B Rao

We can hit the following BUG_ON during memory unplug:

kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
NIP [c000000000093308] pmd_fragment_free+0x48/0xc0
LR [c00000000147bfec] remove_pagetable+0x578/0x60c
Call Trace:
0xc000008050000000 (unreliable)
remove_pagetable+0x384/0x60c
radix__remove_section_mapping+0x18/0x2c
remove_section_mapping+0x1c/0x3c
arch_remove_memory+0x11c/0x180
try_remove_memory+0x120/0x1b0
__remove_memory+0x20/0x40
dlpar_remove_lmb+0xc0/0x114
dlpar_memory+0x8b0/0xb20
handle_dlpar_errorlog+0xc0/0x190
pseries_hp_work_fn+0x2c/0x60
process_one_work+0x30c/0x810
worker_thread+0x98/0x540
kthread+0x1c4/0x1d0
ret_from_kernel_thread+0x5c/0x74

This occurs when unplug is attempted for such memory which has
been mapped using memblock pages as part of early kernel page
table setup. We wouldn't have initialized the PMD or PTE fragment
count for those PMD or PTE pages.

This can be fixed by allocating memory in PAGE_SIZE granularity
during early page table allocation. This makes sure a specific
page is not shared for another memblock allocation and we can
free them correctly on removing page-table pages.

Since we now do PAGE_SIZE allocations for both PUD table and
PMD table (Note that PTE table allocation is already of PAGE_SIZE),
we end up allocating more memory for the same amount of system RAM.
Here is a comparision of how much more we need for a 64T and 2G
system after this patch:

1. 64T system
-------------
64T RAM would need 64G for vmemmap with struct page size being 64B.

128 PUD tables for 64T memory (1G mappings)
1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K

2. 2G system
------------
2G RAM would need 2M for vmemmap with struct page size being 64B.

1 PUD table for 2G memory (1G mapping)
1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 16 +++++++++++++++-
 arch/powerpc/mm/book3s64/pgtable.c           |  5 ++++-
 arch/powerpc/mm/book3s64/radix_pgtable.c     | 15 +++++++++++----
 arch/powerpc/mm/pgtable-frag.c               |  3 +++
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 69c5b051734f..e1af0b394ceb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -107,9 +107,23 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 	return pud;
 }
 
+static inline void __pud_free(pud_t *pud)
+{
+	struct page *page = virt_to_page(pud);
+
+	/*
+	 * Early pud pages allocated via memblock allocator
+	 * can't be directly freed to slab
+	 */
+	if (PageReserved(page))
+		free_reserved_page(page);
+	else
+		kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
+}
+
 static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 {
-	kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
+	return __pud_free(pud);
 }
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index c58ad1049909..85de5b574dd7 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -339,6 +339,9 @@ void pmd_fragment_free(unsigned long *pmd)
 {
 	struct page *page = virt_to_page(pmd);
 
+	if (PageReserved(page))
+		return free_reserved_page(page);
+
 	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
 	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
 		pgtable_pmd_page_dtor(page);
@@ -356,7 +359,7 @@ static inline void pgtable_free(void *table, int index)
 		pmd_fragment_free(table);
 		break;
 	case PUD_INDEX:
-		kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
+		__pud_free(table);
 		break;
 #if defined(CONFIG_PPC_4K_PAGES) && defined(CONFIG_HUGETLB_PAGE)
 		/* 16M hugepd directory at pud level */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index bb00e0cba119..85806a6bed4d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -56,6 +56,13 @@ static __ref void *early_alloc_pgtable(unsigned long size, int nid,
 	return ptr;
 }
 
+/*
+ * When allocating pud or pmd pointers, we allocate a complete page
+ * of PAGE_SIZE rather than PUD_TABLE_SIZE or PMD_TABLE_SIZE. This
+ * is to ensure that the page obtained from the memblock allocator
+ * can be completely used as page table page and can be freed
+ * correctly when the page table entries are removed.
+ */
 static int early_map_kernel_page(unsigned long ea, unsigned long pa,
 			  pgprot_t flags,
 			  unsigned int map_page_size,
@@ -72,8 +79,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
 	pgdp = pgd_offset_k(ea);
 	p4dp = p4d_offset(pgdp, ea);
 	if (p4d_none(*p4dp)) {
-		pudp = early_alloc_pgtable(PUD_TABLE_SIZE, nid,
-						region_start, region_end);
+		pudp = early_alloc_pgtable(PAGE_SIZE, nid,
+					   region_start, region_end);
 		p4d_populate(&init_mm, p4dp, pudp);
 	}
 	pudp = pud_offset(p4dp, ea);
@@ -82,8 +89,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
 		goto set_the_pte;
 	}
 	if (pud_none(*pudp)) {
-		pmdp = early_alloc_pgtable(PMD_TABLE_SIZE, nid,
-						region_start, region_end);
+		pmdp = early_alloc_pgtable(PAGE_SIZE, nid, region_start,
+					   region_end);
 		pud_populate(&init_mm, pudp, pmdp);
 	}
 	pmdp = pmd_offset(pudp, ea);
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index ee4bd6d38602..97ae4935da79 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -110,6 +110,9 @@ void pte_fragment_free(unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
 
+	if (PageReserved(page))
+		return free_reserved_page(page);
+
 	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
 	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
 		if (!kernel)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable
  2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings Aneesh Kumar K.V
@ 2020-07-09 13:19 ` Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 3/4] powerpc/mm/radix: Remove split_kernel_mapping() Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-09 13:19 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K . V, Bharata B Rao

From: Bharata B Rao <bharata@linux.ibm.com>

remove_pagetable() isn't freeing PUD table. This causes memory
leak during memory unplug. Fix this.

Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 85806a6bed4d..46ad2da3087a 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -707,6 +707,21 @@ static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 	pud_clear(pud);
 }
 
+static void free_pud_table(pud_t *pud_start, p4d_t *p4d)
+{
+	pud_t *pud;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (!pud_none(*pud))
+			return;
+	}
+
+	pud_free(&init_mm, pud_start);
+	p4d_clear(p4d);
+}
+
 struct change_mapping_params {
 	pte_t *pte;
 	unsigned long start;
@@ -881,6 +896,7 @@ static void __meminit remove_pagetable(unsigned long start, unsigned long end)
 
 		pud_base = (pud_t *)p4d_page_vaddr(*p4d);
 		remove_pud_table(pud_base, addr, next);
+		free_pud_table(pud_base, p4d);
 	}
 
 	spin_unlock(&init_mm.page_table_lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 3/4] powerpc/mm/radix: Remove split_kernel_mapping()
  2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable Aneesh Kumar K.V
@ 2020-07-09 13:19 ` Aneesh Kumar K.V
  2020-07-09 13:19 ` [PATCH v3 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-09 13:19 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K . V, Bharata B Rao

From: Bharata B Rao <bharata@linux.ibm.com>

We split the page table mapping on memory unplug if the
linear range was mapped with huge page mapping (for ex: 1G)
The page table splitting code has a few issues:

1. Recursive locking
--------------------
Memory unplug path takes cpu_hotplug_lock and calls stop_machine()
for splitting the mappings. However stop_machine() takes
cpu_hotplug_lock again causing deadlock.

2. BUG: sleeping function called from in_atomic() context
---------------------------------------------------------
Memory unplug path (remove_pagetable) takes init_mm.page_table_lock
spinlock and later calls stop_machine() which does wait_for_completion()

3. Bad unlock unbalance
-----------------------
Memory unplug path takes init_mm.page_table_lock spinlock and calls
stop_machine(). The stop_machine thread function runs in a different
thread context (migration thread) which tries to release and reaquire
ptl. Releasing ptl from a different thread than which acquired it
causes bad unlock unbalance.

These problems can be avoided if we avoid mapping hot-plugged memory
with 1G mapping, thereby removing the need for splitting them during
unplug. The kernel always make sure the minimum unplug request is
SUBSECTION_SIZE for device memory and SECTION_SIZE for regular memory.

In preparation for such a change remove page table splitting support.

This essentially is a revert of
commit 4dd5f8a99e791 ("powerpc/mm/radix: Split linear mapping on hot-unplug")

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 95 +++++-------------------
 1 file changed, 19 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 46ad2da3087a..d5a01b9aadc9 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -15,7 +15,6 @@
 #include <linux/mm.h>
 #include <linux/hugetlb.h>
 #include <linux/string_helpers.h>
-#include <linux/stop_machine.h>
 
 #include <asm/pgalloc.h>
 #include <asm/mmu_context.h>
@@ -722,32 +721,6 @@ static void free_pud_table(pud_t *pud_start, p4d_t *p4d)
 	p4d_clear(p4d);
 }
 
-struct change_mapping_params {
-	pte_t *pte;
-	unsigned long start;
-	unsigned long end;
-	unsigned long aligned_start;
-	unsigned long aligned_end;
-};
-
-static int __meminit stop_machine_change_mapping(void *data)
-{
-	struct change_mapping_params *params =
-			(struct change_mapping_params *)data;
-
-	if (!data)
-		return -1;
-
-	spin_unlock(&init_mm.page_table_lock);
-	pte_clear(&init_mm, params->aligned_start, params->pte);
-	create_physical_mapping(__pa(params->aligned_start),
-				__pa(params->start), -1, PAGE_KERNEL);
-	create_physical_mapping(__pa(params->end), __pa(params->aligned_end),
-				-1, PAGE_KERNEL);
-	spin_lock(&init_mm.page_table_lock);
-	return 0;
-}
-
 static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 			     unsigned long end)
 {
@@ -776,52 +749,6 @@ static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 	}
 }
 
-/*
- * clear the pte and potentially split the mapping helper
- */
-static void __meminit split_kernel_mapping(unsigned long addr, unsigned long end,
-				unsigned long size, pte_t *pte)
-{
-	unsigned long mask = ~(size - 1);
-	unsigned long aligned_start = addr & mask;
-	unsigned long aligned_end = addr + size;
-	struct change_mapping_params params;
-	bool split_region = false;
-
-	if ((end - addr) < size) {
-		/*
-		 * We're going to clear the PTE, but not flushed
-		 * the mapping, time to remap and flush. The
-		 * effects if visible outside the processor or
-		 * if we are running in code close to the
-		 * mapping we cleared, we are in trouble.
-		 */
-		if (overlaps_kernel_text(aligned_start, addr) ||
-			overlaps_kernel_text(end, aligned_end)) {
-			/*
-			 * Hack, just return, don't pte_clear
-			 */
-			WARN_ONCE(1, "Linear mapping %lx->%lx overlaps kernel "
-				  "text, not splitting\n", addr, end);
-			return;
-		}
-		split_region = true;
-	}
-
-	if (split_region) {
-		params.pte = pte;
-		params.start = addr;
-		params.end = end;
-		params.aligned_start = addr & ~(size - 1);
-		params.aligned_end = min_t(unsigned long, aligned_end,
-				(unsigned long)__va(memblock_end_of_DRAM()));
-		stop_machine(stop_machine_change_mapping, &params, NULL);
-		return;
-	}
-
-	pte_clear(&init_mm, addr, pte);
-}
-
 static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
 			     unsigned long end)
 {
@@ -837,7 +764,12 @@ static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
 			continue;
 
 		if (pmd_is_leaf(*pmd)) {
-			split_kernel_mapping(addr, end, PMD_SIZE, (pte_t *)pmd);
+			if (!IS_ALIGNED(addr, PMD_SIZE) ||
+			    !IS_ALIGNED(next, PMD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+			pte_clear(&init_mm, addr, (pte_t *)pmd);
 			continue;
 		}
 
@@ -862,7 +794,12 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
 			continue;
 
 		if (pud_is_leaf(*pud)) {
-			split_kernel_mapping(addr, end, PUD_SIZE, (pte_t *)pud);
+			if (!IS_ALIGNED(addr, PUD_SIZE) ||
+			    !IS_ALIGNED(next, PUD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+			pte_clear(&init_mm, addr, (pte_t *)pud);
 			continue;
 		}
 
@@ -890,7 +827,13 @@ static void __meminit remove_pagetable(unsigned long start, unsigned long end)
 			continue;
 
 		if (p4d_is_leaf(*p4d)) {
-			split_kernel_mapping(addr, end, P4D_SIZE, (pte_t *)p4d);
+			if (!IS_ALIGNED(addr, P4D_SIZE) ||
+			    !IS_ALIGNED(next, P4D_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
+			pte_clear(&init_mm, addr, (pte_t *)pgd);
 			continue;
 		}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory
  2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2020-07-09 13:19 ` [PATCH v3 3/4] powerpc/mm/radix: Remove split_kernel_mapping() Aneesh Kumar K.V
@ 2020-07-09 13:19 ` Aneesh Kumar K.V
  2020-07-16 14:00 ` [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Nathan Lynch
  2020-07-24 13:24 ` Michael Ellerman
  5 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-09 13:19 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Bharata B Rao

To enable memory unplug without splitting kernel page table
mapping, we force the max mapping size to the LMB size. LMB
size is the unit in which hypervisor will do memory add/remove
operation.

Pseries systems supports max LMB size of 256MB. Hence on pseries,
we now end up mapping memory with 2M page size instead of 1G. To improve
that we want hypervisor to hint the kernel about the hotplug
memory range. That was added that as part of

commit b6eca183e23e ("powerpc/kernel: Enables memory
hot-remove after reboot on pseries guests")

But PowerVM doesn't provide that hint yet. Once we get PowerVM
updated, we can then force the 2M mapping only to hot-pluggable
memory region using memblock_is_hotpluggable(). Till then
let's depend on LMB size for finding the mapping page size
for linear range.

With this change KVM guest will also be doing linear mapping with
2M page size.

The actual TLB benefit of mapping guest page table entries with
hugepage size can only be materialized if the partition scoped
entries are also using the same or higher page size. A guest using
1G hugetlbfs backing guest memory can have a performance impact with
the above change.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  5 ++
 arch/powerpc/mm/book3s64/radix_pgtable.c | 81 ++++++++++++++++++++----
 arch/powerpc/platforms/powernv/setup.c   | 10 ++-
 3 files changed, 84 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5393a535240c..15aae924f41c 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -82,6 +82,11 @@ extern unsigned int mmu_pid_bits;
 /* Base PID to allocate from */
 extern unsigned int mmu_base_pid;
 
+/*
+ * memory block size used with radix translation.
+ */
+extern unsigned int __ro_after_init radix_mem_block_size;
+
 #define PRTB_SIZE_SHIFT	(mmu_pid_bits + 4)
 #define PRTB_ENTRIES	(1ul << mmu_pid_bits)
 
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index d5a01b9aadc9..bba45fc0b7b2 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -15,6 +15,7 @@
 #include <linux/mm.h>
 #include <linux/hugetlb.h>
 #include <linux/string_helpers.h>
+#include <linux/memory.h>
 
 #include <asm/pgalloc.h>
 #include <asm/mmu_context.h>
@@ -33,6 +34,7 @@
 
 unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
+unsigned int radix_mem_block_size __ro_after_init;
 
 static __ref void *early_alloc_pgtable(unsigned long size, int nid,
 			unsigned long region_start, unsigned long region_end)
@@ -265,6 +267,7 @@ static unsigned long next_boundary(unsigned long addr, unsigned long end)
 
 static int __meminit create_physical_mapping(unsigned long start,
 					     unsigned long end,
+					     unsigned long max_mapping_size,
 					     int nid, pgprot_t _prot)
 {
 	unsigned long vaddr, addr, mapping_size = 0;
@@ -278,6 +281,8 @@ static int __meminit create_physical_mapping(unsigned long start,
 		int rc;
 
 		gap = next_boundary(addr, end) - addr;
+		if (gap > max_mapping_size)
+			gap = max_mapping_size;
 		previous_size = mapping_size;
 		prev_exec = exec;
 
@@ -328,8 +333,9 @@ static void __init radix_init_pgtable(void)
 
 	/* We don't support slb for radix */
 	mmu_slb_size = 0;
+
 	/*
-	 * Create the linear mapping, using standard page size for now
+	 * Create the linear mapping
 	 */
 	for_each_memblock(memory, reg) {
 		/*
@@ -345,6 +351,7 @@ static void __init radix_init_pgtable(void)
 
 		WARN_ON(create_physical_mapping(reg->base,
 						reg->base + reg->size,
+						radix_mem_block_size,
 						-1, PAGE_KERNEL));
 	}
 
@@ -485,6 +492,47 @@ static int __init radix_dt_scan_page_sizes(unsigned long node,
 	return 1;
 }
 
+static int __init probe_memory_block_size(unsigned long node, const char *uname, int
+					  depth, void *data)
+{
+	unsigned long *mem_block_size = (unsigned long *)data;
+	const __be64 *prop;
+	int len;
+
+	if (depth != 1)
+		return 0;
+
+	if (strcmp(uname, "ibm,dynamic-reconfiguration-memory"))
+		return 0;
+
+	prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
+	if (!prop || len < sizeof(__be64))
+		/*
+		 * Nothing in the device tree
+		 */
+		*mem_block_size = MIN_MEMORY_BLOCK_SIZE;
+	else
+		*mem_block_size = be64_to_cpup(prop);
+	return 1;
+}
+
+static unsigned long radix_memory_block_size(void)
+{
+	unsigned long mem_block_size = MIN_MEMORY_BLOCK_SIZE;
+
+	/*
+	 * OPAL firmware feature is set by now. Hence we are ok
+	 * to test OPAL feature.
+	 */
+	if (firmware_has_feature(FW_FEATURE_OPAL))
+		mem_block_size = 1UL * 1024 * 1024 * 1024;
+	else
+		of_scan_flat_dt(probe_memory_block_size, &mem_block_size);
+
+	return mem_block_size;
+}
+
+
 void __init radix__early_init_devtree(void)
 {
 	int rc;
@@ -493,17 +541,27 @@ void __init radix__early_init_devtree(void)
 	 * Try to find the available page sizes in the device-tree
 	 */
 	rc = of_scan_flat_dt(radix_dt_scan_page_sizes, NULL);
-	if (rc != 0)  /* Found */
-		goto found;
+	if (!rc) {
+		/*
+		 * No page size details found in device tree.
+		 * Let's assume we have page 4k and 64k support
+		 */
+		mmu_psize_defs[MMU_PAGE_4K].shift = 12;
+		mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
+
+		mmu_psize_defs[MMU_PAGE_64K].shift = 16;
+		mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
+	}
+
 	/*
-	 * let's assume we have page 4k and 64k support
+	 * Max mapping size used when mapping pages. We don't use
+	 * ppc_md.memory_block_size() here because this get called
+	 * early and we don't have machine probe called yet. Also
+	 * the pseries implementation only check for ibm,lmb-size.
+	 * All hypervisor supporting radix do expose that device
+	 * tree node.
 	 */
-	mmu_psize_defs[MMU_PAGE_4K].shift = 12;
-	mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
-
-	mmu_psize_defs[MMU_PAGE_64K].shift = 16;
-	mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
-found:
+	radix_mem_block_size = radix_memory_block_size();
 	return;
 }
 
@@ -855,7 +913,8 @@ int __meminit radix__create_section_mapping(unsigned long start,
 		return -1;
 	}
 
-	return create_physical_mapping(__pa(start), __pa(end), nid, prot);
+	return create_physical_mapping(__pa(start), __pa(end),
+				       radix_mem_block_size, nid, prot);
 }
 
 int __meminit radix__remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 3bc188da82ba..7fcb88623081 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -399,7 +399,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 static unsigned long pnv_memory_block_size(void)
 {
-	return 256UL * 1024 * 1024;
+	/*
+	 * We map the kernel linear region with 1GB large pages on radix. For
+	 * memory hot unplug to work our memory block size must be at least
+	 * this size.
+	 */
+	if (radix_enabled())
+		return radix_mem_block_size;
+	else
+		return 256UL * 1024 * 1024;
 }
 #endif
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2020-07-09 13:19 ` [PATCH v3 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory Aneesh Kumar K.V
@ 2020-07-16 14:00 ` Nathan Lynch
  2020-07-21  1:45   ` Michael Ellerman
  2020-07-24 13:24 ` Michael Ellerman
  5 siblings, 1 reply; 16+ messages in thread
From: Nathan Lynch @ 2020-07-16 14:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Bharata B Rao

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> This is the next version of the fixes for memory unplug on radix.
> The issues and the fix are described in the actual patches.

I guess this isn't actually causing problems at runtime right now, but I
notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
arch_remove_memory(), which ought to be mmu-agnostic:

int __ref arch_add_memory(int nid, u64 start, u64 size,
			  struct mhp_params *params)
{
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;
	int rc;

	resize_hpt_for_hotplug(memblock_phys_mem_size());

	start = (unsigned long)__va(start);
	rc = create_section_mapping(start, start + size, nid,
				    params->pgprot);
...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-16 14:00 ` [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Nathan Lynch
@ 2020-07-21  1:45   ` Michael Ellerman
  2020-07-21  3:29     ` Bharata B Rao
  2020-07-21  4:42     ` Aneesh Kumar K.V
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Ellerman @ 2020-07-21  1:45 UTC (permalink / raw)
  To: Nathan Lynch, Aneesh Kumar K.V; +Cc: linuxppc-dev, Bharata B Rao

Nathan Lynch <nathanl@linux.ibm.com> writes:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> This is the next version of the fixes for memory unplug on radix.
>> The issues and the fix are described in the actual patches.
>
> I guess this isn't actually causing problems at runtime right now, but I
> notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> arch_remove_memory(), which ought to be mmu-agnostic:
>
> int __ref arch_add_memory(int nid, u64 start, u64 size,
> 			  struct mhp_params *params)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int rc;
>
> 	resize_hpt_for_hotplug(memblock_phys_mem_size());
>
> 	start = (unsigned long)__va(start);
> 	rc = create_section_mapping(start, start + size, nid,
> 				    params->pgprot);
> ...

Hmm well spotted.

That does return early if the ops are not setup:

int resize_hpt_for_hotplug(unsigned long new_mem_size)
{
	unsigned target_hpt_shift;

	if (!mmu_hash_ops.resize_hpt)
		return 0;


And:

void __init hpte_init_pseries(void)
{
	...
	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;

And that comes in via ibm,hypertas-functions:

	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},


But firmware is not necessarily going to add/remove that call based on
whether we're using hash/radix.

So I think a follow-up patch is needed to make this more robust.

Aneesh/Bharata what platform did you test this series on? I'm curious
how this didn't break.

cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-21  1:45   ` Michael Ellerman
@ 2020-07-21  3:29     ` Bharata B Rao
  2020-07-21 12:25       ` Michael Ellerman
  2020-07-21  4:42     ` Aneesh Kumar K.V
  1 sibling, 1 reply; 16+ messages in thread
From: Bharata B Rao @ 2020-07-21  3:29 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev

On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >> This is the next version of the fixes for memory unplug on radix.
> >> The issues and the fix are described in the actual patches.
> >
> > I guess this isn't actually causing problems at runtime right now, but I
> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> > arch_remove_memory(), which ought to be mmu-agnostic:
> >
> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> > 			  struct mhp_params *params)
> > {
> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
> > 	int rc;
> >
> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
> >
> > 	start = (unsigned long)__va(start);
> > 	rc = create_section_mapping(start, start + size, nid,
> > 				    params->pgprot);
> > ...
> 
> Hmm well spotted.
> 
> That does return early if the ops are not setup:
> 
> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> {
> 	unsigned target_hpt_shift;
> 
> 	if (!mmu_hash_ops.resize_hpt)
> 		return 0;
> 
> 
> And:
> 
> void __init hpte_init_pseries(void)
> {
> 	...
> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> 
> And that comes in via ibm,hypertas-functions:
> 
> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> 
> 
> But firmware is not necessarily going to add/remove that call based on
> whether we're using hash/radix.

Correct but hpte_init_pseries() will not be called for radix guests.

> 
> So I think a follow-up patch is needed to make this more robust.
> 
> Aneesh/Bharata what platform did you test this series on? I'm curious
> how this didn't break.

I have tested memory hotplug/unplug for radix guest on zz platform and
sanity-tested this for hash guest on P8.

As noted above, mmu_hash_ops.resize_hpt will not be set for radix
guest and hence we won't see any breakage.

However a separate patch to fix this will be good.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-21  1:45   ` Michael Ellerman
  2020-07-21  3:29     ` Bharata B Rao
@ 2020-07-21  4:42     ` Aneesh Kumar K.V
  1 sibling, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-21  4:42 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Lynch; +Cc: linuxppc-dev, Bharata B Rao

On 7/21/20 7:15 AM, Michael Ellerman wrote:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>> This is the next version of the fixes for memory unplug on radix.
>>> The issues and the fix are described in the actual patches.
>>
>> I guess this isn't actually causing problems at runtime right now, but I
>> notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
>> arch_remove_memory(), which ought to be mmu-agnostic:
>>
>> int __ref arch_add_memory(int nid, u64 start, u64 size,
>> 			  struct mhp_params *params)
>> {
>> 	unsigned long start_pfn = start >> PAGE_SHIFT;
>> 	unsigned long nr_pages = size >> PAGE_SHIFT;
>> 	int rc;
>>
>> 	resize_hpt_for_hotplug(memblock_phys_mem_size());
>>
>> 	start = (unsigned long)__va(start);
>> 	rc = create_section_mapping(start, start + size, nid,
>> 				    params->pgprot);
>> ...
> 
> Hmm well spotted.
> 
> That does return early if the ops are not setup:
> 
> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> {
> 	unsigned target_hpt_shift;
> 
> 	if (!mmu_hash_ops.resize_hpt)
> 		return 0;
> 
> 
> And:
> 
> void __init hpte_init_pseries(void)
> {
> 	...
> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> 
> And that comes in via ibm,hypertas-functions:
> 
> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> 
> 
> But firmware is not necessarily going to add/remove that call based on
> whether we're using hash/radix.
> 


We are good there because hpte_init_pseries is only called for hash 
translation.

early_init_mmu()
-> hash__early_init_mmu
    -> hpte_init_pseries
       -> mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;

> So I think a follow-up patch is needed to make this more robust.
> 
> Aneesh/Bharata what platform did you test this series on? I'm curious
> how this didn't break.

All the changes are tested with kvm.

-aneesh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-21  3:29     ` Bharata B Rao
@ 2020-07-21 12:25       ` Michael Ellerman
  2020-07-22  6:05         ` Bharata B Rao
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Ellerman @ 2020-07-21 12:25 UTC (permalink / raw)
  To: bharata; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev

Bharata B Rao <bharata@linux.ibm.com> writes:
> On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> >> This is the next version of the fixes for memory unplug on radix.
>> >> The issues and the fix are described in the actual patches.
>> >
>> > I guess this isn't actually causing problems at runtime right now, but I
>> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
>> > arch_remove_memory(), which ought to be mmu-agnostic:
>> >
>> > int __ref arch_add_memory(int nid, u64 start, u64 size,
>> > 			  struct mhp_params *params)
>> > {
>> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
>> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
>> > 	int rc;
>> >
>> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
>> >
>> > 	start = (unsigned long)__va(start);
>> > 	rc = create_section_mapping(start, start + size, nid,
>> > 				    params->pgprot);
>> > ...
>> 
>> Hmm well spotted.
>> 
>> That does return early if the ops are not setup:
>> 
>> int resize_hpt_for_hotplug(unsigned long new_mem_size)
>> {
>> 	unsigned target_hpt_shift;
>> 
>> 	if (!mmu_hash_ops.resize_hpt)
>> 		return 0;
>> 
>> 
>> And:
>> 
>> void __init hpte_init_pseries(void)
>> {
>> 	...
>> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
>> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
>> 
>> And that comes in via ibm,hypertas-functions:
>> 
>> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
>> 
>> 
>> But firmware is not necessarily going to add/remove that call based on
>> whether we're using hash/radix.
>
> Correct but hpte_init_pseries() will not be called for radix guests.

Yeah, duh. You'd think the function name would have been a sufficient
clue for me :)

>> So I think a follow-up patch is needed to make this more robust.
>> 
>> Aneesh/Bharata what platform did you test this series on? I'm curious
>> how this didn't break.
>
> I have tested memory hotplug/unplug for radix guest on zz platform and
> sanity-tested this for hash guest on P8.
>
> As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> guest and hence we won't see any breakage.

OK.

That's probably fine as it is then. Or maybe just a comment in
resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
we're using radix.

cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-21 12:25       ` Michael Ellerman
@ 2020-07-22  6:05         ` Bharata B Rao
  2020-07-22  7:51           ` David Gibson
  2020-07-24 11:52           ` Michael Ellerman
  0 siblings, 2 replies; 16+ messages in thread
From: Bharata B Rao @ 2020-07-22  6:05 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev, david

On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> >> Nathan Lynch <nathanl@linux.ibm.com> writes:
> >> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >> >> This is the next version of the fixes for memory unplug on radix.
> >> >> The issues and the fix are described in the actual patches.
> >> >
> >> > I guess this isn't actually causing problems at runtime right now, but I
> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> >> > arch_remove_memory(), which ought to be mmu-agnostic:
> >> >
> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >> > 			  struct mhp_params *params)
> >> > {
> >> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
> >> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
> >> > 	int rc;
> >> >
> >> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
> >> >
> >> > 	start = (unsigned long)__va(start);
> >> > 	rc = create_section_mapping(start, start + size, nid,
> >> > 				    params->pgprot);
> >> > ...
> >> 
> >> Hmm well spotted.
> >> 
> >> That does return early if the ops are not setup:
> >> 
> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> >> {
> >> 	unsigned target_hpt_shift;
> >> 
> >> 	if (!mmu_hash_ops.resize_hpt)
> >> 		return 0;
> >> 
> >> 
> >> And:
> >> 
> >> void __init hpte_init_pseries(void)
> >> {
> >> 	...
> >> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> >> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> >> 
> >> And that comes in via ibm,hypertas-functions:
> >> 
> >> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> >> 
> >> 
> >> But firmware is not necessarily going to add/remove that call based on
> >> whether we're using hash/radix.
> >
> > Correct but hpte_init_pseries() will not be called for radix guests.
> 
> Yeah, duh. You'd think the function name would have been a sufficient
> clue for me :)
> 
> >> So I think a follow-up patch is needed to make this more robust.
> >> 
> >> Aneesh/Bharata what platform did you test this series on? I'm curious
> >> how this didn't break.
> >
> > I have tested memory hotplug/unplug for radix guest on zz platform and
> > sanity-tested this for hash guest on P8.
> >
> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> > guest and hence we won't see any breakage.
> 
> OK.
> 
> That's probably fine as it is then. Or maybe just a comment in
> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> we're using radix.

Or we could move these calls to hpt-only routines like below?

David - Do you remember if there was any particular reason to have
these two hpt-resize calls within powerpc-generic memory hotplug code?

diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index c89b32443cff..1e6fa371cc38 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -17,12 +17,6 @@ extern int create_section_mapping(unsigned long start, unsigned long end,
 				  int nid, pgprot_t prot);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
-#else
-static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { return 0; }
-#endif
-
 #ifdef CONFIG_NUMA
 extern int hot_add_scn_to_nid(unsigned long scn_addr);
 #else
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index eec6f4e5e481..5daf53ec7600 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -787,7 +787,7 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int resize_hpt_for_hotplug(unsigned long new_mem_size)
+static int resize_hpt_for_hotplug(unsigned long new_mem_size)
 {
 	unsigned target_hpt_shift;
 
@@ -821,6 +821,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 		return -1;
 	}
 
+	resize_hpt_for_hotplug(memblock_phys_mem_size());
+
 	rc = htab_bolt_mapping(start, end, __pa(start),
 			       pgprot_val(prot), mmu_linear_psize,
 			       mmu_kernel_ssize);
@@ -838,6 +840,10 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
 	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
 				     mmu_kernel_ssize);
 	WARN_ON(rc < 0);
+
+	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
+		pr_warn("Hash collision while resizing HPT\n");
+
 	return rc;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c2c11eb8dcfc..9dafc636588f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -127,8 +127,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int rc;
 
-	resize_hpt_for_hotplug(memblock_phys_mem_size());
-
 	start = (unsigned long)__va(start);
 	rc = create_section_mapping(start, start + size, nid,
 				    params->pgprot);
@@ -161,9 +159,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	 * hit that section of memory
 	 */
 	vm_unmap_aliases();
-
-	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
-		pr_warn("Hash collision while resizing HPT\n");
 }
 #endif
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-22  6:05         ` Bharata B Rao
@ 2020-07-22  7:51           ` David Gibson
  2020-07-24 11:52           ` Michael Ellerman
  1 sibling, 0 replies; 16+ messages in thread
From: David Gibson @ 2020-07-22  7:51 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: Nathan Lynch, linuxppc-dev, Aneesh Kumar K.V

[-- Attachment #1: Type: text/plain, Size: 3314 bytes --]

On Wed, Jul 22, 2020 at 11:35:06AM +0530, Bharata B Rao wrote:
> On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> > Bharata B Rao <bharata@linux.ibm.com> writes:
> > > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> > >> Nathan Lynch <nathanl@linux.ibm.com> writes:
> > >> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> > >> >> This is the next version of the fixes for memory unplug on radix.
> > >> >> The issues and the fix are described in the actual patches.
> > >> >
> > >> > I guess this isn't actually causing problems at runtime right now, but I
> > >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> > >> > arch_remove_memory(), which ought to be mmu-agnostic:
> > >> >
> > >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> > >> > 			  struct mhp_params *params)
> > >> > {
> > >> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
> > >> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
> > >> > 	int rc;
> > >> >
> > >> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
> > >> >
> > >> > 	start = (unsigned long)__va(start);
> > >> > 	rc = create_section_mapping(start, start + size, nid,
> > >> > 				    params->pgprot);
> > >> > ...
> > >> 
> > >> Hmm well spotted.
> > >> 
> > >> That does return early if the ops are not setup:
> > >> 
> > >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> > >> {
> > >> 	unsigned target_hpt_shift;
> > >> 
> > >> 	if (!mmu_hash_ops.resize_hpt)
> > >> 		return 0;
> > >> 
> > >> 
> > >> And:
> > >> 
> > >> void __init hpte_init_pseries(void)
> > >> {
> > >> 	...
> > >> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> > >> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> > >> 
> > >> And that comes in via ibm,hypertas-functions:
> > >> 
> > >> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> > >> 
> > >> 
> > >> But firmware is not necessarily going to add/remove that call based on
> > >> whether we're using hash/radix.
> > >
> > > Correct but hpte_init_pseries() will not be called for radix guests.
> > 
> > Yeah, duh. You'd think the function name would have been a sufficient
> > clue for me :)
> > 
> > >> So I think a follow-up patch is needed to make this more robust.
> > >> 
> > >> Aneesh/Bharata what platform did you test this series on? I'm curious
> > >> how this didn't break.
> > >
> > > I have tested memory hotplug/unplug for radix guest on zz platform and
> > > sanity-tested this for hash guest on P8.
> > >
> > > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> > > guest and hence we won't see any breakage.
> > 
> > OK.
> > 
> > That's probably fine as it is then. Or maybe just a comment in
> > resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> > we're using radix.
> 
> Or we could move these calls to hpt-only routines like below?
> 
> David - Do you remember if there was any particular reason to have
> these two hpt-resize calls within powerpc-generic memory hotplug code?

I don't remember, sorry.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-22  6:05         ` Bharata B Rao
  2020-07-22  7:51           ` David Gibson
@ 2020-07-24 11:52           ` Michael Ellerman
  2020-07-24 12:17             ` Bharata B Rao
  2020-07-25  7:37             ` David Gibson
  1 sibling, 2 replies; 16+ messages in thread
From: Michael Ellerman @ 2020-07-24 11:52 UTC (permalink / raw)
  To: bharata; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev, david

Bharata B Rao <bharata@linux.ibm.com> writes:
> On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
>> Bharata B Rao <bharata@linux.ibm.com> writes:
>> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
>> >> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> >> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> >> >> This is the next version of the fixes for memory unplug on radix.
>> >> >> The issues and the fix are described in the actual patches.
>> >> >
>> >> > I guess this isn't actually causing problems at runtime right now, but I
>> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
>> >> > arch_remove_memory(), which ought to be mmu-agnostic:
>> >> >
>> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
>> >> > 			  struct mhp_params *params)
>> >> > {
>> >> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
>> >> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
>> >> > 	int rc;
>> >> >
>> >> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
>> >> >
>> >> > 	start = (unsigned long)__va(start);
>> >> > 	rc = create_section_mapping(start, start + size, nid,
>> >> > 				    params->pgprot);
>> >> > ...
>> >> 
>> >> Hmm well spotted.
>> >> 
>> >> That does return early if the ops are not setup:
>> >> 
>> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
>> >> {
>> >> 	unsigned target_hpt_shift;
>> >> 
>> >> 	if (!mmu_hash_ops.resize_hpt)
>> >> 		return 0;
>> >> 
>> >> 
>> >> And:
>> >> 
>> >> void __init hpte_init_pseries(void)
>> >> {
>> >> 	...
>> >> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
>> >> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
>> >> 
>> >> And that comes in via ibm,hypertas-functions:
>> >> 
>> >> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
>> >> 
>> >> 
>> >> But firmware is not necessarily going to add/remove that call based on
>> >> whether we're using hash/radix.
>> >
>> > Correct but hpte_init_pseries() will not be called for radix guests.
>> 
>> Yeah, duh. You'd think the function name would have been a sufficient
>> clue for me :)
>> 
>> >> So I think a follow-up patch is needed to make this more robust.
>> >> 
>> >> Aneesh/Bharata what platform did you test this series on? I'm curious
>> >> how this didn't break.
>> >
>> > I have tested memory hotplug/unplug for radix guest on zz platform and
>> > sanity-tested this for hash guest on P8.
>> >
>> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
>> > guest and hence we won't see any breakage.
>> 
>> OK.
>> 
>> That's probably fine as it is then. Or maybe just a comment in
>> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
>> we're using radix.
>
> Or we could move these calls to hpt-only routines like below?

That looks like it would be equivalent, and would nicely isolate those
calls in hash specific code. So yeah I think that's worth sending as a
proper patch, even better if you can test it.

> David - Do you remember if there was any particular reason to have
> these two hpt-resize calls within powerpc-generic memory hotplug code?

I think the HPT resizing was developed before or concurrently with the
radix support, so I would guess it was just not something we thought
about at the time.

cheers

> diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> index c89b32443cff..1e6fa371cc38 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -17,12 +17,6 @@ extern int create_section_mapping(unsigned long start, unsigned long end,
>  				  int nid, pgprot_t prot);
>  extern int remove_section_mapping(unsigned long start, unsigned long end);
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> -extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
> -#else
> -static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { return 0; }
> -#endif
> -
>  #ifdef CONFIG_NUMA
>  extern int hot_add_scn_to_nid(unsigned long scn_addr);
>  #else
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index eec6f4e5e481..5daf53ec7600 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -787,7 +787,7 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -int resize_hpt_for_hotplug(unsigned long new_mem_size)
> +static int resize_hpt_for_hotplug(unsigned long new_mem_size)
>  {
>  	unsigned target_hpt_shift;
>  
> @@ -821,6 +821,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
>  		return -1;
>  	}
>  
> +	resize_hpt_for_hotplug(memblock_phys_mem_size());
> +
>  	rc = htab_bolt_mapping(start, end, __pa(start),
>  			       pgprot_val(prot), mmu_linear_psize,
>  			       mmu_kernel_ssize);
> @@ -838,6 +840,10 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
>  	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
>  				     mmu_kernel_ssize);
>  	WARN_ON(rc < 0);
> +
> +	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> +		pr_warn("Hash collision while resizing HPT\n");
> +
>  	return rc;
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index c2c11eb8dcfc..9dafc636588f 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -127,8 +127,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  	int rc;
>  
> -	resize_hpt_for_hotplug(memblock_phys_mem_size());
> -
>  	start = (unsigned long)__va(start);
>  	rc = create_section_mapping(start, start + size, nid,
>  				    params->pgprot);
> @@ -161,9 +159,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  	 * hit that section of memory
>  	 */
>  	vm_unmap_aliases();
> -
> -	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> -		pr_warn("Hash collision while resizing HPT\n");
>  }
>  #endif
>  
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-24 11:52           ` Michael Ellerman
@ 2020-07-24 12:17             ` Bharata B Rao
  2020-07-25  7:37             ` David Gibson
  1 sibling, 0 replies; 16+ messages in thread
From: Bharata B Rao @ 2020-07-24 12:17 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev, david

On Fri, Jul 24, 2020 at 09:52:14PM +1000, Michael Ellerman wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> >> Bharata B Rao <bharata@linux.ibm.com> writes:
> >> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> >> >> Nathan Lynch <nathanl@linux.ibm.com> writes:
> >> >> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >> >> >> This is the next version of the fixes for memory unplug on radix.
> >> >> >> The issues and the fix are described in the actual patches.
> >> >> >
> >> >> > I guess this isn't actually causing problems at runtime right now, but I
> >> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> >> >> > arch_remove_memory(), which ought to be mmu-agnostic:
> >> >> >
> >> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >> >> > 			  struct mhp_params *params)
> >> >> > {
> >> >> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
> >> >> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
> >> >> > 	int rc;
> >> >> >
> >> >> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
> >> >> >
> >> >> > 	start = (unsigned long)__va(start);
> >> >> > 	rc = create_section_mapping(start, start + size, nid,
> >> >> > 				    params->pgprot);
> >> >> > ...
> >> >> 
> >> >> Hmm well spotted.
> >> >> 
> >> >> That does return early if the ops are not setup:
> >> >> 
> >> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> >> >> {
> >> >> 	unsigned target_hpt_shift;
> >> >> 
> >> >> 	if (!mmu_hash_ops.resize_hpt)
> >> >> 		return 0;
> >> >> 
> >> >> 
> >> >> And:
> >> >> 
> >> >> void __init hpte_init_pseries(void)
> >> >> {
> >> >> 	...
> >> >> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> >> >> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> >> >> 
> >> >> And that comes in via ibm,hypertas-functions:
> >> >> 
> >> >> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> >> >> 
> >> >> 
> >> >> But firmware is not necessarily going to add/remove that call based on
> >> >> whether we're using hash/radix.
> >> >
> >> > Correct but hpte_init_pseries() will not be called for radix guests.
> >> 
> >> Yeah, duh. You'd think the function name would have been a sufficient
> >> clue for me :)
> >> 
> >> >> So I think a follow-up patch is needed to make this more robust.
> >> >> 
> >> >> Aneesh/Bharata what platform did you test this series on? I'm curious
> >> >> how this didn't break.
> >> >
> >> > I have tested memory hotplug/unplug for radix guest on zz platform and
> >> > sanity-tested this for hash guest on P8.
> >> >
> >> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> >> > guest and hence we won't see any breakage.
> >> 
> >> OK.
> >> 
> >> That's probably fine as it is then. Or maybe just a comment in
> >> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> >> we're using radix.
> >
> > Or we could move these calls to hpt-only routines like below?
> 
> That looks like it would be equivalent, and would nicely isolate those
> calls in hash specific code. So yeah I think that's worth sending as a
> proper patch, even better if you can test it.

Sure I will send it as a proper patch. I did test minimal hotplug/unplug
for hash guest with that patch, will do more extensive test and resend.

> 
> > David - Do you remember if there was any particular reason to have
> > these two hpt-resize calls within powerpc-generic memory hotplug code?
> 
> I think the HPT resizing was developed before or concurrently with the
> radix support, so I would guess it was just not something we thought
> about at the time.

Right.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2020-07-16 14:00 ` [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Nathan Lynch
@ 2020-07-24 13:24 ` Michael Ellerman
  5 siblings, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2020-07-24 13:24 UTC (permalink / raw)
  To: linuxppc-dev, Aneesh Kumar K.V, mpe; +Cc: Bharata B Rao

On Thu, 9 Jul 2020 18:49:21 +0530, Aneesh Kumar K.V wrote:
> This is the next version of the fixes for memory unplug on radix.
> The issues and the fix are described in the actual patches.
> 
> Changes from v2:
> - Address review feedback
> 
> Changes from v1:
> - Added back patch to drop split_kernel_mapping
> - Most of the split_kernel_mapping related issues are now described
>   in the removal patch
> - drop pte fragment change
> - use lmb size as the max mapping size.
> - Radix baremetal now use memory block size of 1G.
> 
> [...]

Applied to powerpc/next.

[1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
      https://git.kernel.org/powerpc/c/645d5ce2f7d6cb4dcf6a4e087fb550e238d24283
[2/4] powerpc/mm/radix: Free PUD table when freeing pagetable
      https://git.kernel.org/powerpc/c/9ce8853b4a735c8115f55ac0e9c2b27a4c8f80b5
[3/4] powerpc/mm/radix: Remove split_kernel_mapping()
      https://git.kernel.org/powerpc/c/d6d6ebfc5dbb4008be21baa4ec2ad45606578966
[4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory
      https://git.kernel.org/powerpc/c/af9d00e93a4f062c5f160325d7b8f33336f6744e

cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
  2020-07-24 11:52           ` Michael Ellerman
  2020-07-24 12:17             ` Bharata B Rao
@ 2020-07-25  7:37             ` David Gibson
  1 sibling, 0 replies; 16+ messages in thread
From: David Gibson @ 2020-07-25  7:37 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev, bharata

[-- Attachment #1: Type: text/plain, Size: 3847 bytes --]

On Fri, Jul 24, 2020 at 09:52:14PM +1000, Michael Ellerman wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> >> Bharata B Rao <bharata@linux.ibm.com> writes:
> >> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> >> >> Nathan Lynch <nathanl@linux.ibm.com> writes:
> >> >> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >> >> >> This is the next version of the fixes for memory unplug on radix.
> >> >> >> The issues and the fix are described in the actual patches.
> >> >> >
> >> >> > I guess this isn't actually causing problems at runtime right now, but I
> >> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> >> >> > arch_remove_memory(), which ought to be mmu-agnostic:
> >> >> >
> >> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >> >> > 			  struct mhp_params *params)
> >> >> > {
> >> >> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
> >> >> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
> >> >> > 	int rc;
> >> >> >
> >> >> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
> >> >> >
> >> >> > 	start = (unsigned long)__va(start);
> >> >> > 	rc = create_section_mapping(start, start + size, nid,
> >> >> > 				    params->pgprot);
> >> >> > ...
> >> >> 
> >> >> Hmm well spotted.
> >> >> 
> >> >> That does return early if the ops are not setup:
> >> >> 
> >> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> >> >> {
> >> >> 	unsigned target_hpt_shift;
> >> >> 
> >> >> 	if (!mmu_hash_ops.resize_hpt)
> >> >> 		return 0;
> >> >> 
> >> >> 
> >> >> And:
> >> >> 
> >> >> void __init hpte_init_pseries(void)
> >> >> {
> >> >> 	...
> >> >> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> >> >> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> >> >> 
> >> >> And that comes in via ibm,hypertas-functions:
> >> >> 
> >> >> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> >> >> 
> >> >> 
> >> >> But firmware is not necessarily going to add/remove that call based on
> >> >> whether we're using hash/radix.
> >> >
> >> > Correct but hpte_init_pseries() will not be called for radix guests.
> >> 
> >> Yeah, duh. You'd think the function name would have been a sufficient
> >> clue for me :)
> >> 
> >> >> So I think a follow-up patch is needed to make this more robust.
> >> >> 
> >> >> Aneesh/Bharata what platform did you test this series on? I'm curious
> >> >> how this didn't break.
> >> >
> >> > I have tested memory hotplug/unplug for radix guest on zz platform and
> >> > sanity-tested this for hash guest on P8.
> >> >
> >> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> >> > guest and hence we won't see any breakage.
> >> 
> >> OK.
> >> 
> >> That's probably fine as it is then. Or maybe just a comment in
> >> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> >> we're using radix.
> >
> > Or we could move these calls to hpt-only routines like below?
> 
> That looks like it would be equivalent, and would nicely isolate those
> calls in hash specific code. So yeah I think that's worth sending as a
> proper patch, even better if you can test it.
> 
> > David - Do you remember if there was any particular reason to have
> > these two hpt-resize calls within powerpc-generic memory hotplug code?
> 
> I think the HPT resizing was developed before or concurrently with the
> radix support, so I would guess it was just not something we thought
> about at the time.

Sounds about right; I don't remember for certain.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-07-25  7:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-09 13:19 [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Aneesh Kumar K.V
2020-07-09 13:19 ` [PATCH v3 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings Aneesh Kumar K.V
2020-07-09 13:19 ` [PATCH v3 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable Aneesh Kumar K.V
2020-07-09 13:19 ` [PATCH v3 3/4] powerpc/mm/radix: Remove split_kernel_mapping() Aneesh Kumar K.V
2020-07-09 13:19 ` [PATCH v3 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory Aneesh Kumar K.V
2020-07-16 14:00 ` [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes Nathan Lynch
2020-07-21  1:45   ` Michael Ellerman
2020-07-21  3:29     ` Bharata B Rao
2020-07-21 12:25       ` Michael Ellerman
2020-07-22  6:05         ` Bharata B Rao
2020-07-22  7:51           ` David Gibson
2020-07-24 11:52           ` Michael Ellerman
2020-07-24 12:17             ` Bharata B Rao
2020-07-25  7:37             ` David Gibson
2020-07-21  4:42     ` Aneesh Kumar K.V
2020-07-24 13:24 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).