* [PATCH v4 0/4] powerpc/mm: enable memory hotplug on radix
@ 2017-01-03 20:43 Reza Arbab
  2017-01-03 20:43 ` [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Reza Arbab @ 2017-01-03 20:43 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Do the plumbing needed for memory hotplug on systems using radix pagetables,
borrowing from existing vmemmap and x86 code.

This passes basic verification of plugging and removing memory, but this 
stuff is tricky and I'd appreciate extra scrutiny of the series for 
correctness--in particular, the adaptation of remove_pagetable() from x86.

/* changelog */

v4:
* Sent patch 1 as a standalone fix. This set still depends on it:
  https://lkml.kernel.org/r/1483475991-16999-1-git-send-email-arbab@linux.vnet.ibm.com

* Extract a common function that can be used by both radix_init_pgtable() and
  radix__create_section_mapping().

* Reduce tlb flushing to one flush_tlb_kernel_range() per section, and do
  less granular locking of init_mm->page_table_lock.

v3:
* https://lkml.kernel.org/r/1481831443-22761-1-git-send-email-arbab@linux.vnet.ibm.com

* Port remove_pagetable() et al. from x86 for unmapping.

* [RFC] -> [PATCH]

v2:
* https://lkml.kernel.org/r/1471449083-15931-1-git-send-email-arbab@linux.vnet.ibm.com

* Do not simply fall through to vmemmap_{create,remove}_mapping(). As Aneesh
  and Michael pointed out, they are tied to CONFIG_SPARSEMEM_VMEMMAP and only
  did what I needed by luck anyway.

v1:
* https://lkml.kernel.org/r/1466699962-22412-1-git-send-email-arbab@linux.vnet.ibm.com

Reza Arbab (4):
  powerpc/mm: refactor radix physical page mapping
  powerpc/mm: add radix__create_section_mapping()
  powerpc/mm: add radix__remove_section_mapping()
  powerpc/mm: unstub radix__vmemmap_remove_mapping()

 arch/powerpc/include/asm/book3s/64/radix.h |   5 +
 arch/powerpc/mm/pgtable-book3s64.c         |   4 +-
 arch/powerpc/mm/pgtable-radix.c            | 254 ++++++++++++++++++++++++-----
 3 files changed, 222 insertions(+), 41 deletions(-)

-- 
1.8.3.1


* [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-03 20:43 [PATCH v4 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
@ 2017-01-03 20:43 ` Reza Arbab
  2017-01-04  5:04   ` Aneesh Kumar K.V
  2017-01-03 20:43 ` [PATCH v4 2/4] powerpc/mm: add radix__create_section_mapping() Reza Arbab
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Reza Arbab @ 2017-01-03 20:43 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Move the page mapping code in radix_init_pgtable() into a separate
function that will also be used for memory hotplug.

The current goto loop progressively decreases its mapping size as it
covers the tail of a range whose end is unaligned. Change this to a for
loop which can do the same for both ends of the range.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 69 ++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 623a0dc..5cee6d1 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -107,54 +107,47 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 	return 0;
 }
 
+static int __meminit create_physical_mapping(unsigned long start,
+					     unsigned long end)
+{
+	unsigned long mapping_size;
+
+	start = _ALIGN_UP(start, PAGE_SIZE);
+	for (; start < end; start += mapping_size) {
+		unsigned long gap = end - start;
+		int rc;
+
+		if (IS_ALIGNED(start, PUD_SIZE) && gap >= PUD_SIZE &&
+		    mmu_psize_defs[MMU_PAGE_1G].shift)
+			mapping_size = PUD_SIZE;
+		else if (IS_ALIGNED(start, PMD_SIZE) && gap >= PMD_SIZE &&
+			 mmu_psize_defs[MMU_PAGE_2M].shift)
+			mapping_size = PMD_SIZE;
+		else
+			mapping_size = PAGE_SIZE;
+
+		rc = radix__map_kernel_page((unsigned long)__va(start), start,
+					    PAGE_KERNEL_X, mapping_size);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static void __init radix_init_pgtable(void)
 {
-	int loop_count;
-	u64 base, end, start_addr;
 	unsigned long rts_field;
 	struct memblock_region *reg;
-	unsigned long linear_page_size;
 
 	/* We don't support slb for radix */
 	mmu_slb_size = 0;
 	/*
 	 * Create the linear mapping, using standard page size for now
 	 */
-	loop_count = 0;
-	for_each_memblock(memory, reg) {
-
-		start_addr = reg->base;
-
-redo:
-		if (loop_count < 1 && mmu_psize_defs[MMU_PAGE_1G].shift)
-			linear_page_size = PUD_SIZE;
-		else if (loop_count < 2 && mmu_psize_defs[MMU_PAGE_2M].shift)
-			linear_page_size = PMD_SIZE;
-		else
-			linear_page_size = PAGE_SIZE;
-
-		base = _ALIGN_UP(start_addr, linear_page_size);
-		end = _ALIGN_DOWN(reg->base + reg->size, linear_page_size);
-
-		pr_info("Mapping range 0x%lx - 0x%lx with 0x%lx\n",
-			(unsigned long)base, (unsigned long)end,
-			linear_page_size);
-
-		while (base < end) {
-			radix__map_kernel_page((unsigned long)__va(base),
-					      base, PAGE_KERNEL_X,
-					      linear_page_size);
-			base += linear_page_size;
-		}
-		/*
-		 * map the rest using lower page size
-		 */
-		if (end < reg->base + reg->size) {
-			start_addr = end;
-			loop_count++;
-			goto redo;
-		}
-	}
+	for_each_memblock(memory, reg)
+		WARN_ON(create_physical_mapping(reg->base,
+						reg->base + reg->size));
 	/*
 	 * Allocate Partition table and process table for the
 	 * host.
-- 
1.8.3.1
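
For illustration, the size-selection rule in create_physical_mapping()
can be exercised as a stand-alone user-space sketch (not kernel code;
PUD/PMD/PAGE sizes are assumed to be 1G/2M/64K and both huge page sizes
are assumed available):

/*
 * Walk [start, end) the way create_physical_mapping() does: at each
 * step pick the largest size whose alignment and remaining gap fit.
 */
#include <stdio.h>
#include <stdint.h>

#define SZ_1G  (1ULL << 30)	/* stands in for PUD_SIZE */
#define SZ_2M  (1ULL << 21)	/* stands in for PMD_SIZE */
#define SZ_64K (1ULL << 16)	/* stands in for PAGE_SIZE */

#define ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((a) - 1))
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

static void map_range(uint64_t start, uint64_t end)
{
	uint64_t mapping_size;

	start = ALIGN_UP(start, SZ_64K);
	for (; start < end; start += mapping_size) {
		uint64_t gap = end - start;

		if (IS_ALIGNED(start, SZ_1G) && gap >= SZ_1G)
			mapping_size = SZ_1G;
		else if (IS_ALIGNED(start, SZ_2M) && gap >= SZ_2M)
			mapping_size = SZ_2M;
		else
			mapping_size = SZ_64K;

		printf("map 0x%llx with 0x%llx\n",
		       (unsigned long long)start,
		       (unsigned long long)mapping_size);
	}
}

int main(void)
{
	/* An end-unaligned range: one 1G block, one 2M block, a 64K tail. */
	map_range(0, SZ_1G + SZ_2M + SZ_64K);
	return 0;
}

Both ends are handled by the same rule: a start that is not 1G- or
2M-aligned simply falls through to the smaller sizes until alignment is
reached, instead of needing a separate pass per mapping size.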


* [PATCH v4 2/4] powerpc/mm: add radix__create_section_mapping()
  2017-01-03 20:43 [PATCH v4 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
  2017-01-03 20:43 ` [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
@ 2017-01-03 20:43 ` Reza Arbab
  2017-01-03 20:43 ` [PATCH v4 3/4] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
  2017-01-03 20:43 ` [PATCH v4 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
  3 siblings, 0 replies; 9+ messages in thread
From: Reza Arbab @ 2017-01-03 20:43 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Wire up memory hotplug page mapping for radix. Share the mapping
function already used by radix_init_pgtable().

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h | 4 ++++
 arch/powerpc/mm/pgtable-book3s64.c         | 2 +-
 arch/powerpc/mm/pgtable-radix.c            | 7 +++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index b4d1302..43c2571 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -291,5 +291,9 @@ static inline unsigned long radix__get_tree_size(void)
 	}
 	return rts_field;
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int radix__create_section_mapping(unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 653ff6c..2b13f6b 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -131,7 +131,7 @@ void mmu_cleanup_all(void)
 int create_section_mapping(unsigned long start, unsigned long end)
 {
 	if (radix_enabled())
-		return -ENODEV;
+		return radix__create_section_mapping(start, end);
 
 	return hash__create_section_mapping(start, end);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 5cee6d1..3588895 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -456,6 +456,13 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	memblock_set_current_limit(first_memblock_base + first_memblock_size);
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+int __ref radix__create_section_mapping(unsigned long start, unsigned long end)
+{
+	return create_physical_mapping(start, end);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int __meminit radix__vmemmap_create_mapping(unsigned long start,
 				      unsigned long page_size,
-- 
1.8.3.1


* [PATCH v4 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-03 20:43 [PATCH v4 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
  2017-01-03 20:43 ` [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
  2017-01-03 20:43 ` [PATCH v4 2/4] powerpc/mm: add radix__create_section_mapping() Reza Arbab
@ 2017-01-03 20:43 ` Reza Arbab
  2017-01-04  5:07   ` Aneesh Kumar K.V
  2017-01-03 20:43 ` [PATCH v4 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
  3 siblings, 1 reply; 9+ messages in thread
From: Reza Arbab @ 2017-01-03 20:43 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Tear down and free the four-level page tables of physical mappings
during memory hotremove.

Borrow the basic structure of remove_pagetable() and friends from the
identically-named x86 functions. Simplify things a bit so locking and
tlb flushing are only done in the outermost function.

Memory must be offline to be removed, thus not in use. So there
shouldn't be the sort of concurrent page walking activity here that
might prompt us to use RCU.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h |   1 +
 arch/powerpc/mm/pgtable-book3s64.c         |   2 +-
 arch/powerpc/mm/pgtable-radix.c            | 149 +++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 43c2571..0032b66 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -294,6 +294,7 @@ static inline unsigned long radix__get_tree_size(void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int radix__create_section_mapping(unsigned long start, unsigned long end);
+int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 2b13f6b..b798ff6 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -139,7 +139,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
 int remove_section_mapping(unsigned long start, unsigned long end)
 {
 	if (radix_enabled())
-		return -ENODEV;
+		return radix__remove_section_mapping(start, end);
 
 	return hash__remove_section_mapping(start, end);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 3588895..f7a8e625 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -457,10 +457,159 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (!pte_none(*pte))
+			return;
+	}
+
+	pte_free_kernel(&init_mm, pte_start);
+	pmd_clear(pmd);
+}
+
+static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (!pmd_none(*pmd))
+			return;
+	}
+
+	pmd_free(&init_mm, pmd_start);
+	pud_clear(pud);
+}
+
+static void free_pud_table(pud_t *pud_start, pgd_t *pgd)
+{
+	pud_t *pud;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (!pud_none(*pud))
+			return;
+	}
+
+	pud_free(&init_mm, pud_start);
+	pgd_clear(pgd);
+}
+
+static void remove_pte_table(pte_t *pte_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte;
+
+	pte = pte_start + pte_index(addr);
+	for (; addr < end; addr = next, pte++) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		if (next > end)
+			next = end;
+
+		if (!pte_present(*pte))
+			continue;
+
+		pte_clear(&init_mm, addr, pte);
+	}
+}
+
+static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte_base;
+	pmd_t *pmd;
+
+	pmd = pmd_start + pmd_index(addr);
+	for (; addr < end; addr = next, pmd++) {
+		next = pmd_addr_end(addr, end);
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		if (pmd_huge(*pmd)) {
+			pte_clear(&init_mm, addr, (pte_t *)pmd);
+			continue;
+		}
+
+		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
+		remove_pte_table(pte_base, addr, next);
+		free_pte_table(pte_base, pmd);
+	}
+}
+
+static void remove_pud_table(pud_t *pud_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pmd_t *pmd_base;
+	pud_t *pud;
+
+	pud = pud_start + pud_index(addr);
+	for (; addr < end; addr = next, pud++) {
+		next = pud_addr_end(addr, end);
+
+		if (!pud_present(*pud))
+			continue;
+
+		if (pud_huge(*pud)) {
+			pte_clear(&init_mm, addr, (pte_t *)pud);
+			continue;
+		}
+
+		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
+		remove_pmd_table(pmd_base, addr, next);
+		free_pmd_table(pmd_base, pud);
+	}
+}
+
+static void remove_pagetable(unsigned long start, unsigned long end)
+{
+	unsigned long addr, next;
+	pud_t *pud_base;
+	pgd_t *pgd;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	for (addr = start; addr < end; addr = next) {
+		next = pgd_addr_end(addr, end);
+
+		pgd = pgd_offset_k(addr);
+		if (!pgd_present(*pgd))
+			continue;
+
+		if (pgd_huge(*pgd)) {
+			pte_clear(&init_mm, addr, (pte_t *)pgd);
+			continue;
+		}
+
+		pud_base = (pud_t *)pgd_page_vaddr(*pgd);
+		remove_pud_table(pud_base, addr, next);
+		free_pud_table(pud_base, pgd);
+	}
+
+	spin_unlock(&init_mm.page_table_lock);
+	flush_tlb_kernel_range(start, end);
+}
+
 int __ref radix__create_section_mapping(unsigned long start, unsigned long end)
 {
 	return create_physical_mapping(start, end);
 }
+
+int radix__remove_section_mapping(unsigned long start, unsigned long end)
+{
+	remove_pagetable(start, end);
+	return 0;
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-- 
1.8.3.1
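
The free_*_table() helpers above all apply one rule: after a removal
pass, a lower-level table page is freed and its upper-level entry
cleared only once every slot in it is empty. A stand-alone sketch of
that rule (user-space approximation; the slot count and entry encoding
are placeholders, not the kernel's):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define SLOTS 8				/* stands in for PTRS_PER_PTE etc. */

struct table {
	unsigned long entry[SLOTS];	/* 0 means empty/none */
};

/* Clear the entries covering [first, last] -- the remove_*_table() step. */
static void remove_entries(struct table *t, int first, int last)
{
	for (int i = first; i <= last; i++)
		t->entry[i] = 0;
}

/* Free the table only if it is now completely empty -- the free_*_table() step. */
static bool maybe_free_table(struct table **slot)
{
	struct table *t = *slot;

	for (int i = 0; i < SLOTS; i++)
		if (t->entry[i])
			return false;	/* still in use elsewhere; keep it */

	free(t);
	*slot = NULL;			/* analogous to pmd_clear()/pud_clear() */
	return true;
}

int main(void)
{
	struct table *t = calloc(1, sizeof(*t));

	for (int i = 0; i < SLOTS; i++)
		t->entry[i] = 1;	/* pretend every slot is mapped */

	remove_entries(t, 0, 3);
	printf("partial removal: freed=%d\n", maybe_free_table(&t));	/* 0 */
	remove_entries(t, 4, SLOTS - 1);
	printf("full removal:    freed=%d\n", maybe_free_table(&t));	/* 1 */
	return 0;
}

This is why a hot-remove that covers only part of the region backed by
a table leaves that table in place; it is reclaimed the first time a
removal pass empties it entirely.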


* [PATCH v4 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping()
  2017-01-03 20:43 [PATCH v4 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
                   ` (2 preceding siblings ...)
  2017-01-03 20:43 ` [PATCH v4 3/4] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
@ 2017-01-03 20:43 ` Reza Arbab
  3 siblings, 0 replies; 9+ messages in thread
From: Reza Arbab @ 2017-01-03 20:43 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Use remove_pagetable() and friends for radix vmemmap removal.

We do not require the special-case handling of vmemmap done in the x86
versions of these functions. This is because vmemmap_free() has already
freed the mapped pages, and calls us with an aligned address range.

So, add a few failsafe WARNs, but otherwise the code to remove physical
mappings is already sufficient for vmemmap.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index f7a8e625..bada0d9 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -517,6 +517,15 @@ static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 		if (!pte_present(*pte))
 			continue;
 
+		if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(next)) {
+			/*
+			 * The vmemmap_free() and remove_section_mapping()
+			 * codepaths call us with aligned addresses.
+			 */
+			WARN_ONCE(1, "%s: unaligned range\n", __func__);
+			continue;
+		}
+
 		pte_clear(&init_mm, addr, pte);
 	}
 }
@@ -536,6 +545,12 @@ static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
 			continue;
 
 		if (pmd_huge(*pmd)) {
+			if (!IS_ALIGNED(addr, PMD_SIZE) ||
+			    !IS_ALIGNED(next, PMD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			pte_clear(&init_mm, addr, (pte_t *)pmd);
 			continue;
 		}
@@ -561,6 +576,12 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
 			continue;
 
 		if (pud_huge(*pud)) {
+			if (!IS_ALIGNED(addr, PUD_SIZE) ||
+			    !IS_ALIGNED(next, PUD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			pte_clear(&init_mm, addr, (pte_t *)pud);
 			continue;
 		}
@@ -587,6 +608,12 @@ static void remove_pagetable(unsigned long start, unsigned long end)
 			continue;
 
 		if (pgd_huge(*pgd)) {
+			if (!IS_ALIGNED(addr, PGDIR_SIZE) ||
+			    !IS_ALIGNED(next, PGDIR_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			pte_clear(&init_mm, addr, (pte_t *)pgd);
 			continue;
 		}
@@ -627,7 +654,7 @@ int __meminit radix__vmemmap_create_mapping(unsigned long start,
 #ifdef CONFIG_MEMORY_HOTPLUG
 void radix__vmemmap_remove_mapping(unsigned long start, unsigned long page_size)
 {
-	/* FIXME!! intel does more. We should free page tables mapping vmemmap ? */
+	remove_pagetable(start, start + page_size);
 }
 #endif
 #endif
-- 
1.8.3.1
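
The added checks all reduce to the same test: a huge entry can only be
cleared whole, so the removal range covering it must begin and end on
that entry's boundary. A stand-alone illustration of the test (not
kernel code; 2M PMD and 1G PUD sizes assumed):

#include <stdio.h>
#include <stdint.h>

#define SZ_2M (1ULL << 21)	/* stands in for PMD_SIZE */
#define SZ_1G (1ULL << 30)	/* stands in for PUD_SIZE */

#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

static void check(uint64_t addr, uint64_t next, uint64_t size, const char *lvl)
{
	if (!IS_ALIGNED(addr, size) || !IS_ALIGNED(next, size))
		printf("%s: 0x%llx-0x%llx unaligned -- would WARN and skip\n",
		       lvl, (unsigned long long)addr, (unsigned long long)next);
	else
		printf("%s: 0x%llx-0x%llx ok -- entry cleared whole\n",
		       lvl, (unsigned long long)addr, (unsigned long long)next);
}

int main(void)
{
	check(0,         SZ_2M, SZ_2M, "pmd");	/* a whole 2M entry */
	check(SZ_2M / 2, SZ_2M, SZ_2M, "pmd");	/* half of one      */
	check(0,         SZ_1G, SZ_1G, "pud");	/* a whole 1G entry */
	return 0;
}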


* Re: [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-03 20:43 ` [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
@ 2017-01-04  5:04   ` Aneesh Kumar K.V
  2017-01-04 21:25     ` Reza Arbab
  0 siblings, 1 reply; 9+ messages in thread
From: Aneesh Kumar K.V @ 2017-01-04  5:04 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> Move the page mapping code in radix_init_pgtable() into a separate
> function that will also be used for memory hotplug.
>
> The current goto loop progressively decreases its mapping size as it
> covers the tail of a range whose end is unaligned. Change this to a for
> loop which can do the same for both ends of the range.
>

We lost the below in the change.

		pr_info("Mapping range 0x%lx - 0x%lx with 0x%lx\n",
			(unsigned long)base, (unsigned long)end,
			linear_page_size);


Is there a way to dump the range and the size with which we mapped that
range?


> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/pgtable-radix.c | 69 ++++++++++++++++++-----------------------
>  1 file changed, 31 insertions(+), 38 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 623a0dc..5cee6d1 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -107,54 +107,47 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>  	return 0;
>  }
>
> +static int __meminit create_physical_mapping(unsigned long start,
> +					     unsigned long end)
> +{
> +	unsigned long mapping_size;
> +
> +	start = _ALIGN_UP(start, PAGE_SIZE);
> +	for (; start < end; start += mapping_size) {
> +		unsigned long gap = end - start;
> +		int rc;
> +
> +		if (IS_ALIGNED(start, PUD_SIZE) && gap >= PUD_SIZE &&
> +		    mmu_psize_defs[MMU_PAGE_1G].shift)
> +			mapping_size = PUD_SIZE;
> +		else if (IS_ALIGNED(start, PMD_SIZE) && gap >= PMD_SIZE &&
> +			 mmu_psize_defs[MMU_PAGE_2M].shift)
> +			mapping_size = PMD_SIZE;
> +		else
> +			mapping_size = PAGE_SIZE;
> +
> +		rc = radix__map_kernel_page((unsigned long)__va(start), start,
> +					    PAGE_KERNEL_X, mapping_size);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
>  static void __init radix_init_pgtable(void)
>  {
> -	int loop_count;
> -	u64 base, end, start_addr;
>  	unsigned long rts_field;
>  	struct memblock_region *reg;
> -	unsigned long linear_page_size;
>
>  	/* We don't support slb for radix */
>  	mmu_slb_size = 0;
>  	/*
>  	 * Create the linear mapping, using standard page size for now
>  	 */
> -	loop_count = 0;
> -	for_each_memblock(memory, reg) {
> -
> -		start_addr = reg->base;
> -
> -redo:
> -		if (loop_count < 1 && mmu_psize_defs[MMU_PAGE_1G].shift)
> -			linear_page_size = PUD_SIZE;
> -		else if (loop_count < 2 && mmu_psize_defs[MMU_PAGE_2M].shift)
> -			linear_page_size = PMD_SIZE;
> -		else
> -			linear_page_size = PAGE_SIZE;
> -
> -		base = _ALIGN_UP(start_addr, linear_page_size);
> -		end = _ALIGN_DOWN(reg->base + reg->size, linear_page_size);
> -
> -		pr_info("Mapping range 0x%lx - 0x%lx with 0x%lx\n",
> -			(unsigned long)base, (unsigned long)end,
> -			linear_page_size);
> -
> -		while (base < end) {
> -			radix__map_kernel_page((unsigned long)__va(base),
> -					      base, PAGE_KERNEL_X,
> -					      linear_page_size);
> -			base += linear_page_size;
> -		}
> -		/*
> -		 * map the rest using lower page size
> -		 */
> -		if (end < reg->base + reg->size) {
> -			start_addr = end;
> -			loop_count++;
> -			goto redo;
> -		}
> -	}
> +	for_each_memblock(memory, reg)
> +		WARN_ON(create_physical_mapping(reg->base,
> +						reg->base + reg->size));
>  	/*
>  	 * Allocate Partition table and process table for the
>  	 * host.
> -- 
> 1.8.3.1


* Re: [PATCH v4 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-03 20:43 ` [PATCH v4 3/4] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
@ 2017-01-04  5:07   ` Aneesh Kumar K.V
  2017-01-04 21:28     ` Reza Arbab
  0 siblings, 1 reply; 9+ messages in thread
From: Aneesh Kumar K.V @ 2017-01-04  5:07 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> Tear down and free the four-level page tables of physical mappings
> during memory hotremove.
>
> Borrow the basic structure of remove_pagetable() and friends from the
> identically-named x86 functions. Simplify things a bit so locking and
> tlb flushing are only done in the outermost function.
>
> Memory must be offline to be removed, thus not in use. So there
> shouldn't be the sort of concurrent page walking activity here that
> might prompt us to use RCU.
>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |   1 +
>  arch/powerpc/mm/pgtable-book3s64.c         |   2 +-
>  arch/powerpc/mm/pgtable-radix.c            | 149 +++++++++++++++++++++++++++++
>  3 files changed, 151 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 43c2571..0032b66 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -294,6 +294,7 @@ static inline unsigned long radix__get_tree_size(void)
>
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  int radix__create_section_mapping(unsigned long start, unsigned long end);
> +int radix__remove_section_mapping(unsigned long start, unsigned long end);
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  #endif /* __ASSEMBLY__ */
>  #endif
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 2b13f6b..b798ff6 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -139,7 +139,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
>  int remove_section_mapping(unsigned long start, unsigned long end)
>  {
>  	if (radix_enabled())
> -		return -ENODEV;
> +		return radix__remove_section_mapping(start, end);
>
>  	return hash__remove_section_mapping(start, end);
>  }
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 3588895..f7a8e625 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -457,10 +457,159 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  }
>
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
> +{
> +	pte_t *pte;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++) {
> +		pte = pte_start + i;
> +		if (!pte_none(*pte))
> +			return;
> +	}
> +
> +	pte_free_kernel(&init_mm, pte_start);
> +	pmd_clear(pmd);
> +}
> +
> +static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
> +{
> +	pmd_t *pmd;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++) {
> +		pmd = pmd_start + i;
> +		if (!pmd_none(*pmd))
> +			return;
> +	}
> +
> +	pmd_free(&init_mm, pmd_start);
> +	pud_clear(pud);
> +}
> +
> +static void free_pud_table(pud_t *pud_start, pgd_t *pgd)
> +{
> +	pud_t *pud;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PUD; i++) {
> +		pud = pud_start + i;
> +		if (!pud_none(*pud))
> +			return;
> +	}
> +
> +	pud_free(&init_mm, pud_start);
> +	pgd_clear(pgd);
> +}
> +
> +static void remove_pte_table(pte_t *pte_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pte_t *pte;
> +
> +	pte = pte_start + pte_index(addr);
> +	for (; addr < end; addr = next, pte++) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		if (next > end)
> +			next = end;
> +
> +		if (!pte_present(*pte))
> +			continue;
> +
> +		pte_clear(&init_mm, addr, pte);
> +	}
> +}
> +
> +static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pte_t *pte_base;
> +	pmd_t *pmd;
> +
> +	pmd = pmd_start + pmd_index(addr);
> +	for (; addr < end; addr = next, pmd++) {
> +		next = pmd_addr_end(addr, end);
> +
> +		if (!pmd_present(*pmd))
> +			continue;
> +
> +		if (pmd_huge(*pmd)) {
> +			pte_clear(&init_mm, addr, (pte_t *)pmd);
> +			continue;
> +		}
> +
> +		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
> +		remove_pte_table(pte_base, addr, next);
> +		free_pte_table(pte_base, pmd);
> +	}
> +}
> +
> +static void remove_pud_table(pud_t *pud_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pmd_t *pmd_base;
> +	pud_t *pud;
> +
> +	pud = pud_start + pud_index(addr);
> +	for (; addr < end; addr = next, pud++) {
> +		next = pud_addr_end(addr, end);
> +
> +		if (!pud_present(*pud))
> +			continue;
> +
> +		if (pud_huge(*pud)) {
> +			pte_clear(&init_mm, addr, (pte_t *)pud);
> +			continue;
> +		}
> +
> +		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
> +		remove_pmd_table(pmd_base, addr, next);
> +		free_pmd_table(pmd_base, pud);
> +	}
> +}
> +
> +static void remove_pagetable(unsigned long start, unsigned long end)
> +{
> +	unsigned long addr, next;
> +	pud_t *pud_base;
> +	pgd_t *pgd;
> +
> +	spin_lock(&init_mm.page_table_lock);
> +
> +	for (addr = start; addr < end; addr = next) {
> +		next = pgd_addr_end(addr, end);
> +
> +		pgd = pgd_offset_k(addr);
> +		if (!pgd_present(*pgd))
> +			continue;
> +
> +		if (pgd_huge(*pgd)) {
> +			pte_clear(&init_mm, addr, (pte_t *)pgd);
> +			continue;
> +		}
> +
> +		pud_base = (pud_t *)pgd_page_vaddr(*pgd);
> +		remove_pud_table(pud_base, addr, next);
> +		free_pud_table(pud_base, pgd);
> +	}
> +
> +	spin_unlock(&init_mm.page_table_lock);

What is this lock protecting?


> +	flush_tlb_kernel_range(start, end);

We can use radix__flush_tlb_kernel_range avoiding an if
(radix_enabled()) conditional? Also if needed we could make all the
above take a radix__ prefix?


> +}
> +
>  int __ref radix__create_section_mapping(unsigned long start, unsigned long end)
>  {
>  	return create_physical_mapping(start, end);
>  }
> +
> +int radix__remove_section_mapping(unsigned long start, unsigned long end)
> +{
> +	remove_pagetable(start, end);
> +	return 0;
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -- 
> 1.8.3.1


* Re: [PATCH v4 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-04  5:04   ` Aneesh Kumar K.V
@ 2017-01-04 21:25     ` Reza Arbab
  0 siblings, 0 replies; 9+ messages in thread
From: Reza Arbab @ 2017-01-04 21:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Balbir Singh, Alistair Popple

On Wed, Jan 04, 2017 at 10:34:25AM +0530, Aneesh Kumar K.V wrote:
>We lost the below in the change.
>
>		pr_info("Mapping range 0x%lx - 0x%lx with 0x%lx\n",
>			(unsigned long)base, (unsigned long)end,
>			linear_page_size);
>
>
>Is there a way to dump the range and the size with which we mapped that
>range ?

Sure. It's a little more difficult than before, because the mapping size 
is now reselected in each iteration of the loop, but a similar print can 
be done.
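
One way it could be done despite the per-iteration selection is to
coalesce consecutive mappings of the same size and print once per run.
A stand-alone sketch of that bookkeeping (not the posted patch; the
size sequence below is just an assumed example):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Sizes the loop might pick for an end-unaligned range: 1G, 2M, 2M, 64K. */
	const uint64_t sizes[] = { 1ULL << 30, 1ULL << 21, 1ULL << 21, 1ULL << 16 };
	const int n = sizeof(sizes) / sizeof(sizes[0]);
	uint64_t addr = 0, run_start = 0, prev = 0;

	for (int i = 0; i < n; i++) {
		if (sizes[i] != prev) {
			if (prev)
				printf("Mapped 0x%llx - 0x%llx with 0x%llx\n",
				       (unsigned long long)run_start,
				       (unsigned long long)addr,
				       (unsigned long long)prev);
			run_start = addr;
			prev = sizes[i];
		}
		addr += sizes[i];
	}
	if (prev)
		printf("Mapped 0x%llx - 0x%llx with 0x%llx\n",
		       (unsigned long long)run_start,
		       (unsigned long long)addr,
		       (unsigned long long)prev);
	return 0;
}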

-- 
Reza Arbab


* Re: [PATCH v4 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-04  5:07   ` Aneesh Kumar K.V
@ 2017-01-04 21:28     ` Reza Arbab
  0 siblings, 0 replies; 9+ messages in thread
From: Reza Arbab @ 2017-01-04 21:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Balbir Singh, Alistair Popple

On Wed, Jan 04, 2017 at 10:37:58AM +0530, Aneesh Kumar K.V wrote:
>Reza Arbab <arbab@linux.vnet.ibm.com> writes:
>> +static void remove_pagetable(unsigned long start, unsigned long end)
>> +{
>> +	unsigned long addr, next;
>> +	pud_t *pud_base;
>> +	pgd_t *pgd;
>> +
>> +	spin_lock(&init_mm.page_table_lock);
>> +
>> +	for (addr = start; addr < end; addr = next) {
>> +		next = pgd_addr_end(addr, end);
>> +
>> +		pgd = pgd_offset_k(addr);
>> +		if (!pgd_present(*pgd))
>> +			continue;
>> +
>> +		if (pgd_huge(*pgd)) {
>> +			pte_clear(&init_mm, addr, (pte_t *)pgd);
>> +			continue;
>> +		}
>> +
>> +		pud_base = (pud_t *)pgd_page_vaddr(*pgd);
>> +		remove_pud_table(pud_base, addr, next);
>> +		free_pud_table(pud_base, pgd);
>> +	}
>> +
>> +	spin_unlock(&init_mm.page_table_lock);
>
>What is this lock protecting?

The more I look into it, the less sure I am. This is still an artifact
from the x86 functions, where they lock/unlock aggressively, as you and
Ben noted. I can take it out.

>> +	flush_tlb_kernel_range(start, end);
>
>We can use radix__flush_tlb_kernel_range avoiding an if
>(radix_enabled()) conditional?

Yes, good idea.

>(radix_enabled()) conditional? Also if needed we could make all the
>above take a radix__ prefix?

You mean rename all these new functions? We could, but I don't really 
see why. These functions are static to pgtable-radix.c, there aren't 
hash__ versions to differentiate from, and it seemed helpful to mirror 
the x86 names.

-- 
Reza Arbab

