* [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix
@ 2016-12-15 19:50 Reza Arbab
  2016-12-15 19:50 ` [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size Reza Arbab
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-15 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Memory hotplug is leading to hash page table calls, even on radix:

...
        arch_add_memory
                create_section_mapping
                        htab_bolt_mapping
                                BUG_ON(!ppc_md.hpte_insert);

To fix, refactor {create,remove}_section_mapping() into hash__ and radix__
variants. Implement the radix versions by borrowing from existing vmemmap
and x86 code.

This passes basic verification of plugging and removing memory, but this 
stuff is tricky and I'd appreciate extra scrutiny of the series for 
correctness--in particular, the adaptation of remove_pagetable() from x86.
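
[Editorial sketch, not part of the original posting: the dispatch pattern the series converges on (patch 2/5 introduces the wrappers, patches 3/5 and 4/5 fill in the radix side) can be summarized as below. radix_enabled() and the hash__/radix__ implementations are stubbed for illustration and are not the kernel's actual definitions.]

```c
#include <assert.h>

/* Stub: stands in for the kernel's radix_enabled() MMU-feature check. */
static int mmu_radix = 1;
static int radix_enabled(void) { return mmu_radix; }

/* Stubs standing in for the real hash__/radix__ variants. */
static int hash__create_section_mapping(unsigned long start, unsigned long end)
{
	return start < end ? 0 : -1;	/* pretend success on a sane range */
}

static int radix__create_section_mapping(unsigned long start, unsigned long end)
{
	return start < end ? 0 : -1;
}

/*
 * Generic entry point: pick the variant for the active MMU instead of
 * unconditionally calling into the hash code (which hits
 * BUG_ON(!ppc_md.hpte_insert) on radix).
 */
static int create_section_mapping(unsigned long start, unsigned long end)
{
	if (radix_enabled())
		return radix__create_section_mapping(start, end);

	return hash__create_section_mapping(start, end);
}
```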

/* changelog */

v3:
* Port remove_pagetable() et al. from x86 for unmapping.

* [RFC] -> [PATCH]

v2:
* https://lkml.kernel.org/r/1471449083-15931-1-git-send-email-arbab@linux.vnet.ibm.com

* Do not simply fall through to vmemmap_{create,remove}_mapping(). As Aneesh
  and Michael pointed out, they are tied to CONFIG_SPARSEMEM_VMEMMAP and only
  did what I needed by luck anyway.

v1:
* https://lkml.kernel.org/r/1466699962-22412-1-git-send-email-arbab@linux.vnet.ibm.com

Reza Arbab (5):
  powerpc/mm: set the radix linear page mapping size
  powerpc/mm: refactor {create,remove}_section_mapping()
  powerpc/mm: add radix__create_section_mapping()
  powerpc/mm: add radix__remove_section_mapping()
  powerpc/mm: unstub radix__vmemmap_remove_mapping()

 arch/powerpc/include/asm/book3s/64/hash.h  |   5 +
 arch/powerpc/include/asm/book3s/64/radix.h |   5 +
 arch/powerpc/mm/hash_utils_64.c            |   4 +-
 arch/powerpc/mm/pgtable-book3s64.c         |  18 +++
 arch/powerpc/mm/pgtable-radix.c            | 207 ++++++++++++++++++++++++++++-
 5 files changed, 236 insertions(+), 3 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size
  2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
@ 2016-12-15 19:50 ` Reza Arbab
  2016-12-19  8:58   ` Aneesh Kumar K.V
  2016-12-15 19:50 ` [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping() Reza Arbab
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Reza Arbab @ 2016-12-15 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

This was defaulting to 4K, regardless of PAGE_SIZE.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 623a0dc..54bd70e 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -351,8 +351,10 @@ void __init radix__early_init_mmu(void)
 #ifdef CONFIG_PPC_64K_PAGES
 	/* PAGE_SIZE mappings */
 	mmu_virtual_psize = MMU_PAGE_64K;
+	mmu_linear_psize = MMU_PAGE_64K;
 #else
 	mmu_virtual_psize = MMU_PAGE_4K;
+	mmu_linear_psize = MMU_PAGE_4K;
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-- 
1.8.3.1


* [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping()
  2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
  2016-12-15 19:50 ` [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size Reza Arbab
@ 2016-12-15 19:50 ` Reza Arbab
  2016-12-19  9:00   ` Aneesh Kumar K.V
  2016-12-20  5:26   ` Balbir Singh
  2016-12-15 19:50 ` [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping() Reza Arbab
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-15 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Change {create,remove}_section_mapping() to be wrappers around functions
prefixed with "hash__".

This is preparation for the addition of their "radix__" variants. No
functional change.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash.h |  5 +++++
 arch/powerpc/mm/hash_utils_64.c           |  4 ++--
 arch/powerpc/mm/pgtable-book3s64.c        | 18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index f61cad3..dd90574 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -201,6 +201,11 @@ extern int __meminit hash__vmemmap_create_mapping(unsigned long start,
 					      unsigned long phys);
 extern void hash__vmemmap_remove_mapping(unsigned long start,
 				     unsigned long page_size);
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int hash__create_section_mapping(unsigned long start, unsigned long end);
+int hash__remove_section_mapping(unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index b9a062f..96a4fb7 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -743,7 +743,7 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int create_section_mapping(unsigned long start, unsigned long end)
+int hash__create_section_mapping(unsigned long start, unsigned long end)
 {
 	int rc = htab_bolt_mapping(start, end, __pa(start),
 				   pgprot_val(PAGE_KERNEL), mmu_linear_psize,
@@ -757,7 +757,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
 	return rc;
 }
 
-int remove_section_mapping(unsigned long start, unsigned long end)
+int hash__remove_section_mapping(unsigned long start, unsigned long end)
 {
 	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
 				     mmu_kernel_ssize);
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index ebf9782..653ff6c 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -126,3 +126,21 @@ void mmu_cleanup_all(void)
 	else if (mmu_hash_ops.hpte_clear_all)
 		mmu_hash_ops.hpte_clear_all();
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int create_section_mapping(unsigned long start, unsigned long end)
+{
+	if (radix_enabled())
+		return -ENODEV;
+
+	return hash__create_section_mapping(start, end);
+}
+
+int remove_section_mapping(unsigned long start, unsigned long end)
+{
+	if (radix_enabled())
+		return -ENODEV;
+
+	return hash__remove_section_mapping(start, end);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
-- 
1.8.3.1


* [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping()
  2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
  2016-12-15 19:50 ` [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size Reza Arbab
  2016-12-15 19:50 ` [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping() Reza Arbab
@ 2016-12-15 19:50 ` Reza Arbab
  2016-12-19  9:04   ` Aneesh Kumar K.V
  2016-12-20  6:28   ` Balbir Singh
  2016-12-15 19:50 ` [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-15 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Add the linear page mapping function for radix, used by memory hotplug.
This is similar to vmemmap_populate().

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h |  4 ++++
 arch/powerpc/mm/pgtable-book3s64.c         |  2 +-
 arch/powerpc/mm/pgtable-radix.c            | 19 +++++++++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index b4d1302..43c2571 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -291,5 +291,9 @@ static inline unsigned long radix__get_tree_size(void)
 	}
 	return rts_field;
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int radix__create_section_mapping(unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 653ff6c..2b13f6b 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -131,7 +131,7 @@ void mmu_cleanup_all(void)
 int create_section_mapping(unsigned long start, unsigned long end)
 {
 	if (radix_enabled())
-		return -ENODEV;
+		return radix__create_section_mapping(start, end);
 
 	return hash__create_section_mapping(start, end);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 54bd70e..8201d1f 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -465,6 +465,25 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	memblock_set_current_limit(first_memblock_base + first_memblock_size);
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+int radix__create_section_mapping(unsigned long start, unsigned long end)
+{
+	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
+
+	/* Align to the page size of the linear mapping. */
+	start = _ALIGN_DOWN(start, page_size);
+
+	for (; start < end; start += page_size) {
+		int rc = radix__map_kernel_page(start, __pa(start),
+						PAGE_KERNEL, page_size);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int __meminit radix__vmemmap_create_mapping(unsigned long start,
 				      unsigned long page_size,
-- 
1.8.3.1


* [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping()
  2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
                   ` (2 preceding siblings ...)
  2016-12-15 19:50 ` [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping() Reza Arbab
@ 2016-12-15 19:50 ` Reza Arbab
  2016-12-19  9:48   ` Aneesh Kumar K.V
  2016-12-15 19:50 ` [PATCH v3 5/5] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
  2016-12-16 14:38 ` [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Balbir Singh
  5 siblings, 1 reply; 23+ messages in thread
From: Reza Arbab @ 2016-12-15 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Tear down and free the four-level page tables of the linear mapping
during memory hotremove.

We borrow the basic structure of remove_pagetable() and friends from the
identically-named x86 functions.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h |   1 +
 arch/powerpc/mm/pgtable-book3s64.c         |   2 +-
 arch/powerpc/mm/pgtable-radix.c            | 163 +++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 43c2571..0032b66 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -294,6 +294,7 @@ static inline unsigned long radix__get_tree_size(void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int radix__create_section_mapping(unsigned long start, unsigned long end);
+int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 2b13f6b..b798ff6 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -139,7 +139,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
 int remove_section_mapping(unsigned long start, unsigned long end)
 {
 	if (radix_enabled())
-		return -ENODEV;
+		return radix__remove_section_mapping(start, end);
 
 	return hash__remove_section_mapping(start, end);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 8201d1f..315237c 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -466,6 +466,159 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (!pte_none(*pte))
+			return;
+	}
+
+	pte_free_kernel(&init_mm, pte_start);
+	spin_lock(&init_mm.page_table_lock);
+	pmd_clear(pmd);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (!pmd_none(*pmd))
+			return;
+	}
+
+	pmd_free(&init_mm, pmd_start);
+	spin_lock(&init_mm.page_table_lock);
+	pud_clear(pud);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pud_table(pud_t *pud_start, pgd_t *pgd)
+{
+	pud_t *pud;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (!pud_none(*pud))
+			return;
+	}
+
+	pud_free(&init_mm, pud_start);
+	spin_lock(&init_mm.page_table_lock);
+	pgd_clear(pgd);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void remove_pte_table(pte_t *pte_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte;
+
+	pte = pte_start + pte_index(addr);
+	for (; addr < end; addr = next, pte++) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		if (next > end)
+			next = end;
+
+		if (!pte_present(*pte))
+			continue;
+
+		spin_lock(&init_mm.page_table_lock);
+		pte_clear(&init_mm, addr, pte);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	flush_tlb_mm(&init_mm);
+}
+
+static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
+			     unsigned long end, unsigned long map_page_size)
+{
+	unsigned long next;
+	pte_t *pte_base;
+	pmd_t *pmd;
+
+	pmd = pmd_start + pmd_index(addr);
+	for (; addr < end; addr = next, pmd++) {
+		next = pmd_addr_end(addr, end);
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		if (map_page_size == PMD_SIZE) {
+			spin_lock(&init_mm.page_table_lock);
+			pte_clear(&init_mm, addr, (pte_t *)pmd);
+			spin_unlock(&init_mm.page_table_lock);
+
+			continue;
+		}
+
+		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
+		remove_pte_table(pte_base, addr, next);
+		free_pte_table(pte_base, pmd);
+	}
+}
+
+static void remove_pud_table(pud_t *pud_start, unsigned long addr,
+			     unsigned long end, unsigned long map_page_size)
+{
+	unsigned long next;
+	pmd_t *pmd_base;
+	pud_t *pud;
+
+	pud = pud_start + pud_index(addr);
+	for (; addr < end; addr = next, pud++) {
+		next = pud_addr_end(addr, end);
+
+		if (!pud_present(*pud))
+			continue;
+
+		if (map_page_size == PUD_SIZE) {
+			spin_lock(&init_mm.page_table_lock);
+			pte_clear(&init_mm, addr, (pte_t *)pud);
+			spin_unlock(&init_mm.page_table_lock);
+
+			continue;
+		}
+
+		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
+		remove_pmd_table(pmd_base, addr, next, map_page_size);
+		free_pmd_table(pmd_base, pud);
+	}
+}
+
+static void remove_pagetable(unsigned long start, unsigned long end,
+			     unsigned long map_page_size)
+{
+	unsigned long next;
+	unsigned long addr;
+	pgd_t *pgd;
+	pud_t *pud;
+
+	for (addr = start; addr < end; addr = next) {
+		next = pgd_addr_end(addr, end);
+
+		pgd = pgd_offset_k(addr);
+		if (!pgd_present(*pgd))
+			continue;
+
+		pud = (pud_t *)pgd_page_vaddr(*pgd);
+		remove_pud_table(pud, addr, next, map_page_size);
+		free_pud_table(pud, pgd);
+	}
+
+	flush_tlb_mm(&init_mm);
+}
+
 int radix__create_section_mapping(unsigned long start, unsigned long end)
 {
 	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
@@ -482,6 +635,16 @@ int radix__create_section_mapping(unsigned long start, unsigned long end)
 
 	return 0;
 }
+
+int radix__remove_section_mapping(unsigned long start, unsigned long end)
+{
+	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
+
+	start = _ALIGN_DOWN(start, page_size);
+	remove_pagetable(start, end, page_size);
+
+	return 0;
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-- 
1.8.3.1


* [PATCH v3 5/5] powerpc/mm: unstub radix__vmemmap_remove_mapping()
  2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
                   ` (3 preceding siblings ...)
  2016-12-15 19:50 ` [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
@ 2016-12-15 19:50 ` Reza Arbab
  2016-12-16 14:38 ` [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Balbir Singh
  5 siblings, 0 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-15 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Use remove_pagetable() and friends for radix vmemmap removal.

We do not require the special-case handling of vmemmap done in the x86
versions of these functions. This is because vmemmap_free() has already
freed the mapped pages, and calls us with an aligned address range.

So, add a few failsafe WARNs, but otherwise the code to remove linear
mappings is already sufficient for vmemmap.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 315237c..9d1d51e 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -532,6 +532,15 @@ static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 		if (!pte_present(*pte))
 			continue;
 
+		if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(next)) {
+			/*
+			 * The vmemmap_free() and remove_section_mapping()
+			 * codepaths call us with aligned addresses.
+			 */
+			WARN_ONCE(1, "%s: unaligned range\n", __func__);
+			continue;
+		}
+
 		spin_lock(&init_mm.page_table_lock);
 		pte_clear(&init_mm, addr, pte);
 		spin_unlock(&init_mm.page_table_lock);
@@ -555,6 +564,12 @@ static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
 			continue;
 
 		if (map_page_size == PMD_SIZE) {
+			if (!IS_ALIGNED(addr, PMD_SIZE) ||
+			    !IS_ALIGNED(next, PMD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			spin_lock(&init_mm.page_table_lock);
 			pte_clear(&init_mm, addr, (pte_t *)pmd);
 			spin_unlock(&init_mm.page_table_lock);
@@ -583,6 +598,12 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
 			continue;
 
 		if (map_page_size == PUD_SIZE) {
+			if (!IS_ALIGNED(addr, PUD_SIZE) ||
+			    !IS_ALIGNED(next, PUD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			spin_lock(&init_mm.page_table_lock);
 			pte_clear(&init_mm, addr, (pte_t *)pud);
 			spin_unlock(&init_mm.page_table_lock);
@@ -662,7 +683,7 @@ int __meminit radix__vmemmap_create_mapping(unsigned long start,
 #ifdef CONFIG_MEMORY_HOTPLUG
 void radix__vmemmap_remove_mapping(unsigned long start, unsigned long page_size)
 {
-	/* FIXME!! intel does more. We should free page tables mapping vmemmap ? */
+	remove_pagetable(start, start + page_size, page_size);
 }
 #endif
 #endif
-- 
1.8.3.1


* Re: [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix
  2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
                   ` (4 preceding siblings ...)
  2016-12-15 19:50 ` [PATCH v3 5/5] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
@ 2016-12-16 14:38 ` Balbir Singh
  2016-12-19 17:58   ` Reza Arbab
  5 siblings, 1 reply; 23+ messages in thread
From: Balbir Singh @ 2016-12-16 14:38 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Alistair Popple



On 16/12/16 06:50, Reza Arbab wrote:
> Memory hotplug is leading to hash page table calls, even on radix:
> 
> ...
>         arch_add_memory
>                 create_section_mapping
>                         htab_bolt_mapping
>                                 BUG_ON(!ppc_md.hpte_insert);
> 
> To fix, refactor {create,remove}_section_mapping() into hash__ and radix__
> variants. Implement the radix versions by borrowing from existing vmemmap
> and x86 code.
> 
> This passes basic verification of plugging and removing memory, but this 
> stuff is tricky and I'd appreciate extra scrutiny of the series for 
> correctness--in particular, the adaptation of remove_pagetable() from x86.

On quick glance everything seemed alright to me. I'll review each patch and Ack/provide comments.
Do we care about alt maps yet?

Balbir Singh.


* Re: [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size
  2016-12-15 19:50 ` [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size Reza Arbab
@ 2016-12-19  8:58   ` Aneesh Kumar K.V
  2016-12-19 20:53     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: Aneesh Kumar K.V @ 2016-12-19  8:58 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> This was defaulting to 4K, regardless of PAGE_SIZE.
>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/pgtable-radix.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 623a0dc..54bd70e 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -351,8 +351,10 @@ void __init radix__early_init_mmu(void)
>  #ifdef CONFIG_PPC_64K_PAGES
>  	/* PAGE_SIZE mappings */
>  	mmu_virtual_psize = MMU_PAGE_64K;
> +	mmu_linear_psize = MMU_PAGE_64K;

That is not clearly correct, we map the linear address with either 64K,
2M or 1G depending on the memory available. Take a look at
static void __init radix_init_pgtable(void)


>  #else
>  	mmu_virtual_psize = MMU_PAGE_4K;
> +	mmu_linear_psize = MMU_PAGE_4K;
>  #endif
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -- 
> 1.8.3.1


* Re: [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping()
  2016-12-15 19:50 ` [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping() Reza Arbab
@ 2016-12-19  9:00   ` Aneesh Kumar K.V
  2016-12-19 18:00     ` [PATCH v3 2/5] powerpc/mm: refactor {create,remove}_section_mapping() Reza Arbab
  2016-12-20  5:26   ` Balbir Singh
  1 sibling, 1 reply; 23+ messages in thread
From: Aneesh Kumar K.V @ 2016-12-19  9:00 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> Change {create,remove}_section_mapping() to be wrappers around functions
> prefixed with "hash__".
>
> This is preparation for the addition of their "radix__" variants. No
> functional change.
>

I think this can go upstream now ? To fixup broken hotplug with radix ?

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/hash.h |  5 +++++
>  arch/powerpc/mm/hash_utils_64.c           |  4 ++--
>  arch/powerpc/mm/pgtable-book3s64.c        | 18 ++++++++++++++++++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index f61cad3..dd90574 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -201,6 +201,11 @@ extern int __meminit hash__vmemmap_create_mapping(unsigned long start,
>  					      unsigned long phys);
>  extern void hash__vmemmap_remove_mapping(unsigned long start,
>  				     unsigned long page_size);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int hash__create_section_mapping(unsigned long start, unsigned long end);
> +int hash__remove_section_mapping(unsigned long start, unsigned long end);
> +#endif /* CONFIG_MEMORY_HOTPLUG */
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index b9a062f..96a4fb7 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -743,7 +743,7 @@ static unsigned long __init htab_get_table_size(void)
>  }
>
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -int create_section_mapping(unsigned long start, unsigned long end)
> +int hash__create_section_mapping(unsigned long start, unsigned long end)
>  {
>  	int rc = htab_bolt_mapping(start, end, __pa(start),
>  				   pgprot_val(PAGE_KERNEL), mmu_linear_psize,
> @@ -757,7 +757,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
>  	return rc;
>  }
>
> -int remove_section_mapping(unsigned long start, unsigned long end)
> +int hash__remove_section_mapping(unsigned long start, unsigned long end)
>  {
>  	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
>  				     mmu_kernel_ssize);
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index ebf9782..653ff6c 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -126,3 +126,21 @@ void mmu_cleanup_all(void)
>  	else if (mmu_hash_ops.hpte_clear_all)
>  		mmu_hash_ops.hpte_clear_all();
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int create_section_mapping(unsigned long start, unsigned long end)
> +{
> +	if (radix_enabled())
> +		return -ENODEV;
> +
> +	return hash__create_section_mapping(start, end);
> +}
> +
> +int remove_section_mapping(unsigned long start, unsigned long end)
> +{
> +	if (radix_enabled())
> +		return -ENODEV;
> +
> +	return hash__remove_section_mapping(start, end);
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> -- 
> 1.8.3.1


* Re: [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping()
  2016-12-15 19:50 ` [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping() Reza Arbab
@ 2016-12-19  9:04   ` Aneesh Kumar K.V
  2016-12-19 18:06     ` Reza Arbab
  2016-12-21  7:03     ` Anshuman Khandual
  2016-12-20  6:28   ` Balbir Singh
  1 sibling, 2 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2016-12-19  9:04 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> Add the linear page mapping function for radix, used by memory hotplug.
> This is similar to vmemmap_populate().
>

Ok with this patch your first patch becomes useful. Can you merge that
with this and rename mmu_linear_psize to mmu_hotplug_psize or even use
mmu_virtual_psize. The linear naming is confusing.


> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |  4 ++++
>  arch/powerpc/mm/pgtable-book3s64.c         |  2 +-
>  arch/powerpc/mm/pgtable-radix.c            | 19 +++++++++++++++++++
>  3 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index b4d1302..43c2571 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -291,5 +291,9 @@ static inline unsigned long radix__get_tree_size(void)
>  	}
>  	return rts_field;
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int radix__create_section_mapping(unsigned long start, unsigned long end);
> +#endif /* CONFIG_MEMORY_HOTPLUG */
>  #endif /* __ASSEMBLY__ */
>  #endif
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 653ff6c..2b13f6b 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -131,7 +131,7 @@ void mmu_cleanup_all(void)
>  int create_section_mapping(unsigned long start, unsigned long end)
>  {
>  	if (radix_enabled())
> -		return -ENODEV;
> +		return radix__create_section_mapping(start, end);
>
>  	return hash__create_section_mapping(start, end);
>  }
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 54bd70e..8201d1f 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -465,6 +465,25 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  	memblock_set_current_limit(first_memblock_base + first_memblock_size);
>  }
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int radix__create_section_mapping(unsigned long start, unsigned long end)
> +{
> +	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> +
> +	/* Align to the page size of the linear mapping. */
> +	start = _ALIGN_DOWN(start, page_size);
> +
> +	for (; start < end; start += page_size) {
> +		int rc = radix__map_kernel_page(start, __pa(start),
> +						PAGE_KERNEL, page_size);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> +
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  int __meminit radix__vmemmap_create_mapping(unsigned long start,
>  				      unsigned long page_size,
> -- 
> 1.8.3.1


* Re: [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping()
  2016-12-15 19:50 ` [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
@ 2016-12-19  9:48   ` Aneesh Kumar K.V
  2016-12-19 18:11     ` Reza Arbab
  2016-12-19 20:59     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2016-12-19  9:48 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> Tear down and free the four-level page tables of the linear mapping
> during memory hotremove.
>
> We borrow the basic structure of remove_pagetable() and friends from the
> identically-named x86 functions.
>

Can you add more details here, which explain why we don't need to follow
the RCU page table free when doing memory hotunplug ?

> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |   1 +
>  arch/powerpc/mm/pgtable-book3s64.c         |   2 +-
>  arch/powerpc/mm/pgtable-radix.c            | 163 +++++++++++++++++++++++++++++
>  3 files changed, 165 insertions(+), 1 deletion(-)
>
....
....


> +static void remove_pte_table(pte_t *pte_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pte_t *pte;
> +
> +	pte = pte_start + pte_index(addr);
> +	for (; addr < end; addr = next, pte++) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		if (next > end)
> +			next = end;
> +
> +		if (!pte_present(*pte))
> +			continue;
> +
> +		spin_lock(&init_mm.page_table_lock);
> +		pte_clear(&init_mm, addr, pte);
> +		spin_unlock(&init_mm.page_table_lock);
> +	}
> +
> +	flush_tlb_mm(&init_mm);

Why call a flush here? We do that at the end of remove_pagetable().
Isn't that sufficient?

> +}
> +
> +static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
> +			     unsigned long end, unsigned long map_page_size)
> +{
> +	unsigned long next;
> +	pte_t *pte_base;
> +	pmd_t *pmd;
> +
> +	pmd = pmd_start + pmd_index(addr);
> +	for (; addr < end; addr = next, pmd++) {
> +		next = pmd_addr_end(addr, end);
> +
> +		if (!pmd_present(*pmd))
> +			continue;
> +
> +		if (map_page_size == PMD_SIZE) {
> +			spin_lock(&init_mm.page_table_lock);
> +			pte_clear(&init_mm, addr, (pte_t *)pmd);
> +			spin_unlock(&init_mm.page_table_lock);
> +
> +			continue;
> +		}
> +
> +		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
> +		remove_pte_table(pte_base, addr, next);
> +		free_pte_table(pte_base, pmd);
> +	}
> +}
> +
> +static void remove_pud_table(pud_t *pud_start, unsigned long addr,
> +			     unsigned long end, unsigned long map_page_size)
> +{
> +	unsigned long next;
> +	pmd_t *pmd_base;
> +	pud_t *pud;
> +
> +	pud = pud_start + pud_index(addr);
> +	for (; addr < end; addr = next, pud++) {
> +		next = pud_addr_end(addr, end);
> +
> +		if (!pud_present(*pud))
> +			continue;
> +
> +		if (map_page_size == PUD_SIZE) {
> +			spin_lock(&init_mm.page_table_lock);
> +			pte_clear(&init_mm, addr, (pte_t *)pud);
> +			spin_unlock(&init_mm.page_table_lock);
> +
> +			continue;
> +		}
> +
> +		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
> +		remove_pmd_table(pmd_base, addr, next, map_page_size);
> +		free_pmd_table(pmd_base, pud);
> +	}
> +}
> +
> +static void remove_pagetable(unsigned long start, unsigned long end,
> +			     unsigned long map_page_size)
> +{
> +	unsigned long next;
> +	unsigned long addr;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +
> +	for (addr = start; addr < end; addr = next) {
> +		next = pgd_addr_end(addr, end);
> +
> +		pgd = pgd_offset_k(addr);
> +		if (!pgd_present(*pgd))
> +			continue;
> +
> +		pud = (pud_t *)pgd_page_vaddr(*pgd);
> +		remove_pud_table(pud, addr, next, map_page_size);
> +		free_pud_table(pud, pgd);
> +	}
> +
> +	flush_tlb_mm(&init_mm);


So we want to flush the full kernel TLB when we do a hotplug?
Maybe check using flush_tlb_kernel_range(). Also, flush_tlb_mm() does
check for mm_is_thread_local(). Do we update init_mm correctly to handle
that check? I assume we want a tlbie() here instead of a tlbiel()?


> +}
> +
>  int radix__create_section_mapping(unsigned long start, unsigned long end)
>  {
>  	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> @@ -482,6 +635,16 @@ int radix__create_section_mapping(unsigned long start, unsigned long end)
>
>  	return 0;
>  }
> +
> +int radix__remove_section_mapping(unsigned long start, unsigned long end)
> +{
> +	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> +
> +	start = _ALIGN_DOWN(start, page_size);
> +	remove_pagetable(start, end, page_size);
> +
> +	return 0;
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP


-aneesh


* Re: [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix
  2016-12-16 14:38 ` [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Balbir Singh
@ 2016-12-19 17:58   ` Reza Arbab
  2016-12-21  6:54     ` Anshuman Khandual
  0 siblings, 1 reply; 23+ messages in thread
From: Reza Arbab @ 2016-12-19 17:58 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Sat, Dec 17, 2016 at 01:38:40AM +1100, Balbir Singh wrote:
>Do we care about alt maps yet?

Good question. I'll try to see if/how altmaps might need special 
consideration here.

-- 
Reza Arbab


* Re: [PATCH v3 2/5] powerpc/mm: refactor {create,remove}_section_mapping()
  2016-12-19  9:00   ` Aneesh Kumar K.V
@ 2016-12-19 18:00     ` Reza Arbab
  0 siblings, 0 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-19 18:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Balbir Singh, Alistair Popple

On Mon, Dec 19, 2016 at 02:30:28PM +0530, Aneesh Kumar K.V wrote:
>Reza Arbab <arbab@linux.vnet.ibm.com> writes:
>> Change {create,remove}_section_mapping() to be wrappers around 
>> functions prefixed with "hash__".
>>
>> This is preparation for the addition of their "radix__" variants. No
>> functional change.
>>
>
>I think this can go upstream now? To fix up the broken hotplug on radix?

Yes, this one might be worth separating as a fix for the BUG() on radix.

-- 
Reza Arbab


* Re: [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping()
  2016-12-19  9:04   ` Aneesh Kumar K.V
@ 2016-12-19 18:06     ` Reza Arbab
  2016-12-21  7:03     ` Anshuman Khandual
  1 sibling, 0 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-19 18:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Balbir Singh, Alistair Popple

On Mon, Dec 19, 2016 at 02:34:13PM +0530, Aneesh Kumar K.V wrote:
>Reza Arbab <arbab@linux.vnet.ibm.com> writes:
>> Add the linear page mapping function for radix, used by memory 
>> hotplug. This is similar to vmemmap_populate().
>>
>
>OK, with this patch your first patch becomes useful. Can you merge that
>one into this and rename mmu_linear_psize to mmu_hotplug_psize, or even
>use mmu_virtual_psize? The "linear" naming is confusing.

Thanks for pointing out radix_init_pgtable(). I think the right thing to 
do here is create these mappings the same way it does. We can probably 
factor out a common function.

-- 
Reza Arbab


* Re: [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping()
  2016-12-19  9:48   ` Aneesh Kumar K.V
@ 2016-12-19 18:11     ` Reza Arbab
  2016-12-19 20:59     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-19 18:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Balbir Singh, Alistair Popple

On Mon, Dec 19, 2016 at 03:18:07PM +0530, Aneesh Kumar K.V wrote:
>Reza Arbab <arbab@linux.vnet.ibm.com> writes:
>> +static void remove_pte_table(pte_t *pte_start, unsigned long addr,
>> +			     unsigned long end)
>> +{
>> +	unsigned long next;
>> +	pte_t *pte;
>> +
>> +	pte = pte_start + pte_index(addr);
>> +	for (; addr < end; addr = next, pte++) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		if (next > end)
>> +			next = end;
>> +
>> +		if (!pte_present(*pte))
>> +			continue;
>> +
>> +		spin_lock(&init_mm.page_table_lock);
>> +		pte_clear(&init_mm, addr, pte);
>> +		spin_unlock(&init_mm.page_table_lock);
>> +	}
>> +
>> +	flush_tlb_mm(&init_mm);
>
>Why call a flush here? We do that at the end of remove_pagetable().
>Isn't that sufficient?

This was carried over from the x86 version of the function, where they 
do flush_tlb_all(). I can experiment to make sure things work without 
it.

>> +static void remove_pagetable(unsigned long start, unsigned long end,
>> +			     unsigned long map_page_size)
>> +{
>> +	unsigned long next;
>> +	unsigned long addr;
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +
>> +	for (addr = start; addr < end; addr = next) {
>> +		next = pgd_addr_end(addr, end);
>> +
>> +		pgd = pgd_offset_k(addr);
>> +		if (!pgd_present(*pgd))
>> +			continue;
>> +
>> +		pud = (pud_t *)pgd_page_vaddr(*pgd);
>> +		remove_pud_table(pud, addr, next, map_page_size);
>> +		free_pud_table(pud, pgd);
>> +	}
>> +
>> +	flush_tlb_mm(&init_mm);
>
>
>So we want to flush the full kernel TLB when we do a hotplug?
>Maybe check using flush_tlb_kernel_range(). Also, flush_tlb_mm() does
>check for mm_is_thread_local(). Do we update init_mm correctly to handle
>that check? I assume we want a tlbie() here instead of a tlbiel()?

I'll try using flush_tlb_kernel_range() instead. That sure does seem 
more appropriate.

-- 
Reza Arbab


* Re: [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size
  2016-12-19  8:58   ` Aneesh Kumar K.V
@ 2016-12-19 20:53     ` Benjamin Herrenschmidt
  2016-12-20  2:02       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2016-12-19 20:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Reza Arbab, Michael Ellerman, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

On Mon, 2016-12-19 at 14:28 +0530, Aneesh Kumar K.V wrote:
> Reza Arbab <arbab@linux.vnet.ibm.com> writes:
> 
> > This was defaulting to 4K, regardless of PAGE_SIZE.
> > 
> > Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/mm/pgtable-radix.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/powerpc/mm/pgtable-radix.c
> > b/arch/powerpc/mm/pgtable-radix.c
> > index 623a0dc..54bd70e 100644
> > --- a/arch/powerpc/mm/pgtable-radix.c
> > +++ b/arch/powerpc/mm/pgtable-radix.c
> > @@ -351,8 +351,10 @@ void __init radix__early_init_mmu(void)
> >  #ifdef CONFIG_PPC_64K_PAGES
> >  	/* PAGE_SIZE mappings */
> >  	mmu_virtual_psize = MMU_PAGE_64K;
> > +	mmu_linear_psize = MMU_PAGE_64K;
> 
> That is not clearly correct; we map the linear address with either
> 64K, 2M or 1G depending on the memory available. Take a look at
> static void __init radix_init_pgtable(void)
> 

So should we fix that initialization regardless or take it out ?
> 
> >  #else
> >  	mmu_virtual_psize = MMU_PAGE_4K;
> > +	mmu_linear_psize = MMU_PAGE_4K;
> >  #endif
> > 
> >  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> > -- 
> > 1.8.3.1


* Re: [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping()
  2016-12-19  9:48   ` Aneesh Kumar K.V
  2016-12-19 18:11     ` Reza Arbab
@ 2016-12-19 20:59     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2016-12-19 20:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Reza Arbab, Michael Ellerman, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

On Mon, 2016-12-19 at 15:18 +0530, Aneesh Kumar K.V wrote:
> > +     pte = pte_start + pte_index(addr);
> > +     for (; addr < end; addr = next, pte++) {
> > +             next = (addr + PAGE_SIZE) & PAGE_MASK;
> > +             if (next > end)
> > +                     next = end;
> > +
> > +             if (!pte_present(*pte))
> > +                     continue;
> > +
> > +             spin_lock(&init_mm.page_table_lock);
> > +             pte_clear(&init_mm, addr, pte);
> > +             spin_unlock(&init_mm.page_table_lock);
> > +     }
> > +
> > +     flush_tlb_mm(&init_mm);
> 
> Why call a flush here? We do that at the end of remove_pagetable().
> Isn't that sufficient?

All those lock/unlock calls... what for? Can't we just do the whole page?

Also, I agree: we can delay the flush of the PTEs to the end.

Ben.


* Re: [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size
  2016-12-19 20:53     ` Benjamin Herrenschmidt
@ 2016-12-20  2:02       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2016-12-20  2:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Reza Arbab, Michael Ellerman, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2016-12-19 at 14:28 +0530, Aneesh Kumar K.V wrote:
>> Reza Arbab <arbab@linux.vnet.ibm.com> writes:
>>
>> > This was defaulting to 4K, regardless of PAGE_SIZE.
>> >
>> > Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
>> > ---
>> >  arch/powerpc/mm/pgtable-radix.c | 2 ++
>> >  1 file changed, 2 insertions(+)
>> >
>> > diff --git a/arch/powerpc/mm/pgtable-radix.c
>> > b/arch/powerpc/mm/pgtable-radix.c
>> > index 623a0dc..54bd70e 100644
>> > --- a/arch/powerpc/mm/pgtable-radix.c
>> > +++ b/arch/powerpc/mm/pgtable-radix.c
>> > @@ -351,8 +351,10 @@ void __init radix__early_init_mmu(void)
>> >  #ifdef CONFIG_PPC_64K_PAGES
>> >  	/* PAGE_SIZE mappings */
>> >  	mmu_virtual_psize = MMU_PAGE_64K;
>> > +	mmu_linear_psize = MMU_PAGE_64K;
>>
>> That is not clearly correct; we map the linear address with either
>> 64K, 2M or 1G depending on the memory available. Take a look at
>> static void __init radix_init_pgtable(void)
>>
>
> So should we fix that initialization regardless or take it out ?

We should not be using mmu_linear_psize on radix, hence we can skip
that initialization.

-aneesh


* Re: [PATCH v3 2/5] powerpc/mm: refactor {create,remove}_section_mapping()
  2016-12-15 19:50 ` [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping() Reza Arbab
  2016-12-19  9:00   ` Aneesh Kumar K.V
@ 2016-12-20  5:26   ` Balbir Singh
  1 sibling, 0 replies; 23+ messages in thread
From: Balbir Singh @ 2016-12-20  5:26 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Alistair Popple



On 12/16/2016 06:50 AM, Reza Arbab wrote:
> Change {create,remove}_section_mapping() to be wrappers around functions
> prefixed with "hash__".
>
> This is preparation for the addition of their "radix__" variants. No
> functional change.
>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>

Acked-by: Balbir Singh <bsingharora@gmail.com>


* Re: [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping()
  2016-12-15 19:50 ` [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping() Reza Arbab
  2016-12-19  9:04   ` Aneesh Kumar K.V
@ 2016-12-20  6:28   ` Balbir Singh
  2016-12-20 15:32     ` Reza Arbab
  1 sibling, 1 reply; 23+ messages in thread
From: Balbir Singh @ 2016-12-20  6:28 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int radix__create_section_mapping(unsigned long start, unsigned long end)
> +{
> +	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;

Can we refactor bits from radix_init_pgtable() and reuse?

Balbir


* Re: [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping()
  2016-12-20  6:28   ` Balbir Singh
@ 2016-12-20 15:32     ` Reza Arbab
  0 siblings, 0 replies; 23+ messages in thread
From: Reza Arbab @ 2016-12-20 15:32 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Tue, Dec 20, 2016 at 05:28:40PM +1100, Balbir Singh wrote:
>>+#ifdef CONFIG_MEMORY_HOTPLUG
>>+int radix__create_section_mapping(unsigned long start, unsigned long end)
>>+{
>>+	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
>
>Can we refactor bits from radix_init_pgtable() and reuse?

Yes, that's my plan for v4.

-- 
Reza Arbab


* Re: [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix
  2016-12-19 17:58   ` Reza Arbab
@ 2016-12-21  6:54     ` Anshuman Khandual
  0 siblings, 0 replies; 23+ messages in thread
From: Anshuman Khandual @ 2016-12-21  6:54 UTC (permalink / raw)
  To: Reza Arbab, Balbir Singh
  Cc: Alistair Popple, Paul Mackerras, Aneesh Kumar K.V, linuxppc-dev

On 12/19/2016 11:28 PM, Reza Arbab wrote:
> On Sat, Dec 17, 2016 at 01:38:40AM +1100, Balbir Singh wrote:
>> Do we care about alt maps yet?
> 
> Good question. I'll try to see if/how altmaps might need special
> consideration here.

vmemmap alt-map support should be enabled separately and should not be
mixed into this series, which tries to just get hotplug working on radix.


* Re: [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping()
  2016-12-19  9:04   ` Aneesh Kumar K.V
  2016-12-19 18:06     ` Reza Arbab
@ 2016-12-21  7:03     ` Anshuman Khandual
  1 sibling, 0 replies; 23+ messages in thread
From: Anshuman Khandual @ 2016-12-21  7:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Reza Arbab, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Alistair Popple

On 12/19/2016 02:34 PM, Aneesh Kumar K.V wrote:
> Reza Arbab <arbab@linux.vnet.ibm.com> writes:
> 
>> > Add the linear page mapping function for radix, used by memory hotplug.
>> > This is similar to vmemmap_populate().
>> >
> Ok with this patch your first patch becomes useful. Can you merge that
> with this and rename mmu_linear_psize to mmu_hotplug_psize or even use
> mmu_virtual_psize. The linear naming is confusing.

The mmu_linear_psize variable refers to the page size used to create
the kernel linear mapping (the 0xc00.... range) for the newly added memory
section. Why should the name be changed?


end of thread, other threads:[~2016-12-21  7:04 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-15 19:50 [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Reza Arbab
2016-12-15 19:50 ` [PATCH v3 1/5] powerpc/mm: set the radix linear page mapping size Reza Arbab
2016-12-19  8:58   ` Aneesh Kumar K.V
2016-12-19 20:53     ` Benjamin Herrenschmidt
2016-12-20  2:02       ` Aneesh Kumar K.V
2016-12-15 19:50 ` [PATCH v3 2/5] powerpc/mm: refactor {create, remove}_section_mapping() Reza Arbab
2016-12-19  9:00   ` Aneesh Kumar K.V
2016-12-19 18:00     ` [PATCH v3 2/5] powerpc/mm: refactor {create,remove}_section_mapping() Reza Arbab
2016-12-20  5:26   ` Balbir Singh
2016-12-15 19:50 ` [PATCH v3 3/5] powerpc/mm: add radix__create_section_mapping() Reza Arbab
2016-12-19  9:04   ` Aneesh Kumar K.V
2016-12-19 18:06     ` Reza Arbab
2016-12-21  7:03     ` Anshuman Khandual
2016-12-20  6:28   ` Balbir Singh
2016-12-20 15:32     ` Reza Arbab
2016-12-15 19:50 ` [PATCH v3 4/5] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
2016-12-19  9:48   ` Aneesh Kumar K.V
2016-12-19 18:11     ` Reza Arbab
2016-12-19 20:59     ` Benjamin Herrenschmidt
2016-12-15 19:50 ` [PATCH v3 5/5] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
2016-12-16 14:38 ` [PATCH v3 0/5] powerpc/mm: enable memory hotplug on radix Balbir Singh
2016-12-19 17:58   ` Reza Arbab
2016-12-21  6:54     ` Anshuman Khandual
