* [PATCH v5 0/4] powerpc/mm: enable memory hotplug on radix
@ 2017-01-16 19:07 Reza Arbab
  2017-01-16 19:07 ` [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Reza Arbab @ 2017-01-16 19:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Do the plumbing needed for memory hotplug on systems using radix pagetables,
borrowing from existing vmemmap and x86 code.

This passes basic verification of plugging and removing memory, but this 
stuff is tricky and I'd appreciate extra scrutiny of the series for 
correctness--in particular, the adaptation of remove_pagetable() from x86.

Former patch 1 is now a separate fix. This set still depends on it:
https://lkml.kernel.org/r/1483475991-16999-1-git-send-email-arbab@linux.vnet.ibm.com

/* changelog */

v5:
* Retain pr_info() of the size used to map each address range.

* flush_tlb_kernel_range() -> radix__flush_tlb_kernel_range()

v4:
* https://lkml.kernel.org/r/1483476218-17271-1-git-send-email-arbab@linux.vnet.ibm.com

* Sent patch 1 as a standalone fix.

* Extract a common function that can be used by both radix_init_pgtable() and
  radix__create_section_mapping().

* Reduce tlb flushing to one flush_tlb_kernel_range() per section, and do
  less granular locking of init_mm->page_table_lock.

v3:
* https://lkml.kernel.org/r/1481831443-22761-1-git-send-email-arbab@linux.vnet.ibm.com

* Port remove_pagetable() et al. from x86 for unmapping.

* [RFC] -> [PATCH]

v2:
* https://lkml.kernel.org/r/1471449083-15931-1-git-send-email-arbab@linux.vnet.ibm.com

* Do not simply fall through to vmemmap_{create,remove}_mapping(). As Aneesh
  and Michael pointed out, they are tied to CONFIG_SPARSEMEM_VMEMMAP and only
  did what I needed by luck anyway.

v1:
* https://lkml.kernel.org/r/1466699962-22412-1-git-send-email-arbab@linux.vnet.ibm.com

Reza Arbab (4):
  powerpc/mm: refactor radix physical page mapping
  powerpc/mm: add radix__create_section_mapping()
  powerpc/mm: add radix__remove_section_mapping()
  powerpc/mm: unstub radix__vmemmap_remove_mapping()

 arch/powerpc/include/asm/book3s/64/radix.h |   5 +
 arch/powerpc/mm/pgtable-book3s64.c         |   4 +-
 arch/powerpc/mm/pgtable-radix.c            | 257 ++++++++++++++++++++++++-----
 3 files changed, 225 insertions(+), 41 deletions(-)

-- 
1.8.3.1

* [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-16 19:07 [PATCH v5 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
@ 2017-01-16 19:07 ` Reza Arbab
  2017-01-17  6:46   ` Balbir Singh
                     ` (2 more replies)
  2017-01-16 19:07 ` [PATCH v5 2/4] powerpc/mm: add radix__create_section_mapping() Reza Arbab
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Reza Arbab @ 2017-01-16 19:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Move the page mapping code in radix_init_pgtable() into a separate
function that will also be used for memory hotplug.

The current goto loop progressively decreases its mapping size as it
covers the tail of a range whose end is unaligned. Change this to a for
loop which can do the same for both ends of the range.
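
As an illustration (hypothetical addresses, and assuming the 1G and 2M
page sizes are available), create_physical_mapping(0x3fe00000, 0x80000000)
would select sizes like this:

  addr = 0x3fe00000: 2M-aligned, not 1G-aligned -> 2M mappings
  addr = 0x40000000: 1G-aligned, gap >= 1G      -> one 1G mapping
  addr = 0x80000000: addr == end, loop ends

and print:

  Mapped range 0x3fe00000 - 0x40000000 with 0x200000
  Mapped range 0x40000000 - 0x80000000 with 0x40000000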

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 88 +++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 623a0dc..2ce1354 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -107,54 +107,66 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 	return 0;
 }
 
+static inline void __meminit print_mapping(unsigned long start,
+					   unsigned long end,
+					   unsigned long size)
+{
+	if (end <= start)
+		return;
+
+	pr_info("Mapped range 0x%lx - 0x%lx with 0x%lx\n", start, end, size);
+}
+
+static int __meminit create_physical_mapping(unsigned long start,
+					     unsigned long end)
+{
+	unsigned long addr, mapping_size;
+
+	start = _ALIGN_UP(start, PAGE_SIZE);
+	for (addr = start; addr < end; addr += mapping_size) {
+		unsigned long gap, previous_size;
+		int rc;
+
+		gap = end - addr;
+		previous_size = mapping_size;
+
+		if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
+		    mmu_psize_defs[MMU_PAGE_1G].shift)
+			mapping_size = PUD_SIZE;
+		else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
+			 mmu_psize_defs[MMU_PAGE_2M].shift)
+			mapping_size = PMD_SIZE;
+		else
+			mapping_size = PAGE_SIZE;
+
+		if (mapping_size != previous_size) {
+			print_mapping(start, addr, previous_size);
+			start = addr;
+		}
+
+		rc = radix__map_kernel_page((unsigned long)__va(addr), addr,
+					    PAGE_KERNEL_X, mapping_size);
+		if (rc)
+			return rc;
+	}
+
+	print_mapping(start, addr, mapping_size);
+	return 0;
+}
+
 static void __init radix_init_pgtable(void)
 {
-	int loop_count;
-	u64 base, end, start_addr;
 	unsigned long rts_field;
 	struct memblock_region *reg;
-	unsigned long linear_page_size;
 
 	/* We don't support slb for radix */
 	mmu_slb_size = 0;
 	/*
 	 * Create the linear mapping, using standard page size for now
 	 */
-	loop_count = 0;
-	for_each_memblock(memory, reg) {
-
-		start_addr = reg->base;
-
-redo:
-		if (loop_count < 1 && mmu_psize_defs[MMU_PAGE_1G].shift)
-			linear_page_size = PUD_SIZE;
-		else if (loop_count < 2 && mmu_psize_defs[MMU_PAGE_2M].shift)
-			linear_page_size = PMD_SIZE;
-		else
-			linear_page_size = PAGE_SIZE;
-
-		base = _ALIGN_UP(start_addr, linear_page_size);
-		end = _ALIGN_DOWN(reg->base + reg->size, linear_page_size);
-
-		pr_info("Mapping range 0x%lx - 0x%lx with 0x%lx\n",
-			(unsigned long)base, (unsigned long)end,
-			linear_page_size);
-
-		while (base < end) {
-			radix__map_kernel_page((unsigned long)__va(base),
-					      base, PAGE_KERNEL_X,
-					      linear_page_size);
-			base += linear_page_size;
-		}
-		/*
-		 * map the rest using lower page size
-		 */
-		if (end < reg->base + reg->size) {
-			start_addr = end;
-			loop_count++;
-			goto redo;
-		}
-	}
+	for_each_memblock(memory, reg)
+		WARN_ON(create_physical_mapping(reg->base,
+						reg->base + reg->size));
 	/*
 	 * Allocate Partition table and process table for the
 	 * host.
-- 
1.8.3.1

* [PATCH v5 2/4] powerpc/mm: add radix__create_section_mapping()
  2017-01-16 19:07 [PATCH v5 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
  2017-01-16 19:07 ` [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
@ 2017-01-16 19:07 ` Reza Arbab
  2017-01-17  6:48   ` Balbir Singh
  2017-01-16 19:07 ` [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
  2017-01-16 19:07 ` [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
  3 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-16 19:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Wire up memory hotplug page mapping for radix. Share the mapping
function already used by radix_init_pgtable().

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h | 4 ++++
 arch/powerpc/mm/pgtable-book3s64.c         | 2 +-
 arch/powerpc/mm/pgtable-radix.c            | 7 +++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index b4d1302..43c2571 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -291,5 +291,9 @@ static inline unsigned long radix__get_tree_size(void)
 	}
 	return rts_field;
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int radix__create_section_mapping(unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 653ff6c..2b13f6b 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -131,7 +131,7 @@ void mmu_cleanup_all(void)
 int create_section_mapping(unsigned long start, unsigned long end)
 {
 	if (radix_enabled())
-		return -ENODEV;
+		return radix__create_section_mapping(start, end);
 
 	return hash__create_section_mapping(start, end);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 2ce1354..075b4ec 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -475,6 +475,13 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	memblock_set_current_limit(first_memblock_base + first_memblock_size);
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+int __ref radix__create_section_mapping(unsigned long start, unsigned long end)
+{
+	return create_physical_mapping(start, end);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int __meminit radix__vmemmap_create_mapping(unsigned long start,
 				      unsigned long page_size,
-- 
1.8.3.1

* [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-16 19:07 [PATCH v5 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
  2017-01-16 19:07 ` [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
  2017-01-16 19:07 ` [PATCH v5 2/4] powerpc/mm: add radix__create_section_mapping() Reza Arbab
@ 2017-01-16 19:07 ` Reza Arbab
  2017-01-17  7:22   ` Balbir Singh
  2017-01-16 19:07 ` [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
  3 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-16 19:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Tear down and free the four-level page tables of physical mappings
during memory hotremove.

Borrow the basic structure of remove_pagetable() and friends from the
identically-named x86 functions. Reduce the frequency of tlb flushes and
page_table_lock spinlocks by only doing them in the outermost function.
There was some question as to whether the locking is needed at all.
Leave it for now, but we could consider dropping it.

Memory must be offline to be removed, thus not in use. So there
shouldn't be the sort of concurrent page walking activity here that
might prompt us to use RCU.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h |   1 +
 arch/powerpc/mm/pgtable-book3s64.c         |   2 +-
 arch/powerpc/mm/pgtable-radix.c            | 133 +++++++++++++++++++++++++++++
 3 files changed, 135 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 43c2571..0032b66 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -294,6 +294,7 @@ static inline unsigned long radix__get_tree_size(void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int radix__create_section_mapping(unsigned long start, unsigned long end);
+int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 2b13f6b..b798ff6 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -139,7 +139,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
 int remove_section_mapping(unsigned long start, unsigned long end)
 {
 	if (radix_enabled())
-		return -ENODEV;
+		return radix__remove_section_mapping(start, end);
 
 	return hash__remove_section_mapping(start, end);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 075b4ec..cfba666 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -476,10 +476,143 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (!pte_none(*pte))
+			return;
+	}
+
+	pte_free_kernel(&init_mm, pte_start);
+	pmd_clear(pmd);
+}
+
+static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (!pmd_none(*pmd))
+			return;
+	}
+
+	pmd_free(&init_mm, pmd_start);
+	pud_clear(pud);
+}
+
+static void remove_pte_table(pte_t *pte_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte;
+
+	pte = pte_start + pte_index(addr);
+	for (; addr < end; addr = next, pte++) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		if (next > end)
+			next = end;
+
+		if (!pte_present(*pte))
+			continue;
+
+		pte_clear(&init_mm, addr, pte);
+	}
+}
+
+static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte_base;
+	pmd_t *pmd;
+
+	pmd = pmd_start + pmd_index(addr);
+	for (; addr < end; addr = next, pmd++) {
+		next = pmd_addr_end(addr, end);
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		if (pmd_huge(*pmd)) {
+			pte_clear(&init_mm, addr, (pte_t *)pmd);
+			continue;
+		}
+
+		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
+		remove_pte_table(pte_base, addr, next);
+		free_pte_table(pte_base, pmd);
+	}
+}
+
+static void remove_pud_table(pud_t *pud_start, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	pmd_t *pmd_base;
+	pud_t *pud;
+
+	pud = pud_start + pud_index(addr);
+	for (; addr < end; addr = next, pud++) {
+		next = pud_addr_end(addr, end);
+
+		if (!pud_present(*pud))
+			continue;
+
+		if (pud_huge(*pud)) {
+			pte_clear(&init_mm, addr, (pte_t *)pud);
+			continue;
+		}
+
+		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
+		remove_pmd_table(pmd_base, addr, next);
+		free_pmd_table(pmd_base, pud);
+	}
+}
+
+static void remove_pagetable(unsigned long start, unsigned long end)
+{
+	unsigned long addr, next;
+	pud_t *pud_base;
+	pgd_t *pgd;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	for (addr = start; addr < end; addr = next) {
+		next = pgd_addr_end(addr, end);
+
+		pgd = pgd_offset_k(addr);
+		if (!pgd_present(*pgd))
+			continue;
+
+		if (pgd_huge(*pgd)) {
+			pte_clear(&init_mm, addr, (pte_t *)pgd);
+			continue;
+		}
+
+		pud_base = (pud_t *)pgd_page_vaddr(*pgd);
+		remove_pud_table(pud_base, addr, next);
+	}
+
+	spin_unlock(&init_mm.page_table_lock);
+	radix__flush_tlb_kernel_range(start, end);
+}
+
 int __ref radix__create_section_mapping(unsigned long start, unsigned long end)
 {
 	return create_physical_mapping(start, end);
 }
+
+int radix__remove_section_mapping(unsigned long start, unsigned long end)
+{
+	remove_pagetable(start, end);
+	return 0;
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-- 
1.8.3.1

* [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping()
  2017-01-16 19:07 [PATCH v5 0/4] powerpc/mm: enable memory hotplug on radix Reza Arbab
                   ` (2 preceding siblings ...)
  2017-01-16 19:07 ` [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
@ 2017-01-16 19:07 ` Reza Arbab
  2017-01-17  7:25   ` Balbir Singh
  3 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-16 19:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Use remove_pagetable() and friends for radix vmemmap removal.

We do not require the special-case handling of vmemmap done in the x86
versions of these functions. This is because vmemmap_free() has already
freed the mapped pages, and calls us with an aligned address range.

So, add a few failsafe WARNs, but otherwise the code to remove physical
mappings is already sufficient for vmemmap.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-radix.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index cfba666..6d06171 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -521,6 +521,15 @@ static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 		if (!pte_present(*pte))
 			continue;
 
+		if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(next)) {
+			/*
+			 * The vmemmap_free() and remove_section_mapping()
+			 * codepaths call us with aligned addresses.
+			 */
+			WARN_ONCE(1, "%s: unaligned range\n", __func__);
+			continue;
+		}
+
 		pte_clear(&init_mm, addr, pte);
 	}
 }
@@ -540,6 +549,12 @@ static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
 			continue;
 
 		if (pmd_huge(*pmd)) {
+			if (!IS_ALIGNED(addr, PMD_SIZE) ||
+			    !IS_ALIGNED(next, PMD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			pte_clear(&init_mm, addr, (pte_t *)pmd);
 			continue;
 		}
@@ -565,6 +580,12 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
 			continue;
 
 		if (pud_huge(*pud)) {
+			if (!IS_ALIGNED(addr, PUD_SIZE) ||
+			    !IS_ALIGNED(next, PUD_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			pte_clear(&init_mm, addr, (pte_t *)pud);
 			continue;
 		}
@@ -591,6 +612,12 @@ static void remove_pagetable(unsigned long start, unsigned long end)
 			continue;
 
 		if (pgd_huge(*pgd)) {
+			if (!IS_ALIGNED(addr, PGDIR_SIZE) ||
+			    !IS_ALIGNED(next, PGDIR_SIZE)) {
+				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+				continue;
+			}
+
 			pte_clear(&init_mm, addr, (pte_t *)pgd);
 			continue;
 		}
@@ -630,7 +657,7 @@ int __meminit radix__vmemmap_create_mapping(unsigned long start,
 #ifdef CONFIG_MEMORY_HOTPLUG
 void radix__vmemmap_remove_mapping(unsigned long start, unsigned long page_size)
 {
-	/* FIXME!! intel does more. We should free page tables mapping vmemmap ? */
+	remove_pagetable(start, start + page_size);
 }
 #endif
 #endif
-- 
1.8.3.1

* Re: [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-16 19:07 ` [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
@ 2017-01-17  6:46   ` Balbir Singh
  2017-01-17 18:34     ` Reza Arbab
  2017-01-30  8:38   ` Michael Ellerman
  2017-02-01  1:05   ` [v5,1/4] " Michael Ellerman
  2 siblings, 1 reply; 19+ messages in thread
From: Balbir Singh @ 2017-01-17  6:46 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

On Mon, Jan 16, 2017 at 01:07:43PM -0600, Reza Arbab wrote:
> Move the page mapping code in radix_init_pgtable() into a separate
> function that will also be used for memory hotplug.
> 
> The current goto loop progressively decreases its mapping size as it
> covers the tail of a range whose end is unaligned. Change this to a for
> loop which can do the same for both ends of the range.
> 
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/pgtable-radix.c | 88 +++++++++++++++++++++++------------------
>  1 file changed, 50 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 623a0dc..2ce1354 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -107,54 +107,66 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>  	return 0;
>  }
>  
> +static inline void __meminit print_mapping(unsigned long start,
> +					   unsigned long end,
> +					   unsigned long size)
> +{
> +	if (end <= start)
> +		return;

Should we pr_err for start > end?

> +
> +	pr_info("Mapped range 0x%lx - 0x%lx with 0x%lx\n", start, end, size);
> +}
> +
> +static int __meminit create_physical_mapping(unsigned long start,
> +					     unsigned long end)
> +{
> +	unsigned long addr, mapping_size;
> +
> +	start = _ALIGN_UP(start, PAGE_SIZE);
> +	for (addr = start; addr < end; addr += mapping_size) {
> +		unsigned long gap, previous_size;
> +		int rc;
> +
> +		gap = end - addr;
> +		previous_size = mapping_size;
> +
> +		if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
> +		    mmu_psize_defs[MMU_PAGE_1G].shift)
> +			mapping_size = PUD_SIZE;
> +		else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
> +			 mmu_psize_defs[MMU_PAGE_2M].shift)
> +			mapping_size = PMD_SIZE;
> +		else
> +			mapping_size = PAGE_SIZE;
> +
> +		if (mapping_size != previous_size) {
> +			print_mapping(start, addr, previous_size);
> +			start = addr;
> +		}
> +
> +		rc = radix__map_kernel_page((unsigned long)__va(addr), addr,
> +					    PAGE_KERNEL_X, mapping_size);
> +		if (rc)
> +			return rc;

Should we try a lower page size if map_kernel_page fails for this mapping_size?
I like the cleanup very much, BTW

Balbir Singh.

* Re: [PATCH v5 2/4] powerpc/mm: add radix__create_section_mapping()
  2017-01-16 19:07 ` [PATCH v5 2/4] powerpc/mm: add radix__create_section_mapping() Reza Arbab
@ 2017-01-17  6:48   ` Balbir Singh
  0 siblings, 0 replies; 19+ messages in thread
From: Balbir Singh @ 2017-01-17  6:48 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

On Mon, Jan 16, 2017 at 01:07:44PM -0600, Reza Arbab wrote:
> Wire up memory hotplug page mapping for radix. Share the mapping
> function already used by radix_init_pgtable().
> 
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---

Acked-by: Balbir Singh <bsingharora@gmail.com>

* Re: [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-16 19:07 ` [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping() Reza Arbab
@ 2017-01-17  7:22   ` Balbir Singh
  2017-01-17 18:36     ` Reza Arbab
  0 siblings, 1 reply; 19+ messages in thread
From: Balbir Singh @ 2017-01-17  7:22 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

On Mon, Jan 16, 2017 at 01:07:45PM -0600, Reza Arbab wrote:
> Tear down and free the four-level page tables of physical mappings
> during memory hotremove.
> 
> Borrow the basic structure of remove_pagetable() and friends from the
> identically-named x86 functions. Reduce the frequency of tlb flushes and
> page_table_lock spinlocks by only doing them in the outermost function.
> There was some question as to whether the locking is needed at all.
> Leave it for now, but we could consider dropping it.
> 
> Memory must be offline to be removed, thus not in use. So there
> shouldn't be the sort of concurrent page walking activity here that
> might prompt us to use RCU.
> 
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |   1 +
>  arch/powerpc/mm/pgtable-book3s64.c         |   2 +-
>  arch/powerpc/mm/pgtable-radix.c            | 133 +++++++++++++++++++++++++++++
>  3 files changed, 135 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 43c2571..0032b66 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -294,6 +294,7 @@ static inline unsigned long radix__get_tree_size(void)
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  int radix__create_section_mapping(unsigned long start, unsigned long end);
> +int radix__remove_section_mapping(unsigned long start, unsigned long end);
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  #endif /* __ASSEMBLY__ */
>  #endif
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 2b13f6b..b798ff6 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -139,7 +139,7 @@ int create_section_mapping(unsigned long start, unsigned long end)
>  int remove_section_mapping(unsigned long start, unsigned long end)
>  {
>  	if (radix_enabled())
> -		return -ENODEV;
> +		return radix__remove_section_mapping(start, end);
>  
>  	return hash__remove_section_mapping(start, end);
>  }
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 075b4ec..cfba666 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -476,10 +476,143 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  }
>

Shouldn't most of these functions have __meminit?
  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
> +{
> +	pte_t *pte;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++) {
> +		pte = pte_start + i;
> +		if (!pte_none(*pte))
> +			return;

If !pte_none() we fail the hotplug? Or silently
leave the allocated ptes around. I guess this is
the same as x86.

> +	}
> +
> +	pte_free_kernel(&init_mm, pte_start);
> +	pmd_clear(pmd);
> +}
> +
> +static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
> +{
> +	pmd_t *pmd;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++) {
> +		pmd = pmd_start + i;
> +		if (!pmd_none(*pmd))
> +			return;
> +	}
> +
> +	pmd_free(&init_mm, pmd_start);
> +	pud_clear(pud);
> +}
> +
> +static void remove_pte_table(pte_t *pte_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pte_t *pte;
> +
> +	pte = pte_start + pte_index(addr);
> +	for (; addr < end; addr = next, pte++) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		if (next > end)
> +			next = end;
> +
> +		if (!pte_present(*pte))
> +			continue;
> +
> +		pte_clear(&init_mm, addr, pte);
> +	}
> +}
> +
> +static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pte_t *pte_base;
> +	pmd_t *pmd;
> +
> +	pmd = pmd_start + pmd_index(addr);
> +	for (; addr < end; addr = next, pmd++) {
> +		next = pmd_addr_end(addr, end);
> +
> +		if (!pmd_present(*pmd))
> +			continue;
> +
> +		if (pmd_huge(*pmd)) {
> +			pte_clear(&init_mm, addr, (pte_t *)pmd);

pmd_clear()?

> +			continue;
> +		}
> +
> +		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
> +		remove_pte_table(pte_base, addr, next);
> +		free_pte_table(pte_base, pmd);
> +	}
> +}
> +
> +static void remove_pud_table(pud_t *pud_start, unsigned long addr,
> +			     unsigned long end)
> +{
> +	unsigned long next;
> +	pmd_t *pmd_base;
> +	pud_t *pud;
> +
> +	pud = pud_start + pud_index(addr);
> +	for (; addr < end; addr = next, pud++) {
> +		next = pud_addr_end(addr, end);
> +
> +		if (!pud_present(*pud))
> +			continue;
> +
> +		if (pud_huge(*pud)) {
> +			pte_clear(&init_mm, addr, (pte_t *)pud);
pud_clear()?
> +			continue;
> +		}
> +
> +		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
> +		remove_pmd_table(pmd_base, addr, next);
> +		free_pmd_table(pmd_base, pud);
> +	}
> +}
> +
> +static void remove_pagetable(unsigned long start, unsigned long end)
> +{
> +	unsigned long addr, next;
> +	pud_t *pud_base;
> +	pgd_t *pgd;
> +
> +	spin_lock(&init_mm.page_table_lock);
> +

x86 does more granular lock acquisition, only during
clearing of the relevant entries. I suppose we don't have
to worry about it since it's not a fast path and not frequent.

Balbir Singh.

* Re: [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping()
  2017-01-16 19:07 ` [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping() Reza Arbab
@ 2017-01-17  7:25   ` Balbir Singh
  2017-01-17 18:36     ` Reza Arbab
  0 siblings, 1 reply; 19+ messages in thread
From: Balbir Singh @ 2017-01-17  7:25 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

On Mon, Jan 16, 2017 at 01:07:46PM -0600, Reza Arbab wrote:
> Use remove_pagetable() and friends for radix vmemmap removal.
> 
> We do not require the special-case handling of vmemmap done in the x86
> versions of these functions. This is because vmemmap_free() has already
> freed the mapped pages, and calls us with an aligned address range.
> 
> So, add a few failsafe WARNs, but otherwise the code to remove physical
> mappings is already sufficient for vmemmap.

I wonder if we really need them?

Balbir Singh.

* Re: [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-17  6:46   ` Balbir Singh
@ 2017-01-17 18:34     ` Reza Arbab
  2017-01-18  1:14       ` Balbir Singh
  0 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-17 18:34 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

Thanks for your review!

On Tue, Jan 17, 2017 at 12:16:35PM +0530, Balbir Singh wrote:
>On Mon, Jan 16, 2017 at 01:07:43PM -0600, Reza Arbab wrote:
>> --- a/arch/powerpc/mm/pgtable-radix.c
>> +++ b/arch/powerpc/mm/pgtable-radix.c
>> @@ -107,54 +107,66 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>>  	return 0;
>>  }
>>
>> +static inline void __meminit print_mapping(unsigned long start,
>> +					   unsigned long end,
>> +					   unsigned long size)
>> +{
>> +	if (end <= start)
>> +		return;
>
>Should we pr_err for start > end?

I think that would be overkill. The way this little inline is called, 
start > end is not possible. The real point is not to print anything if 
start == end. Using <= just seemed better in context.

>> +static int __meminit create_physical_mapping(unsigned long start,
>> +					     unsigned long end)
>> +{
>> +	unsigned long addr, mapping_size;
>> +
>> +	start = _ALIGN_UP(start, PAGE_SIZE);
>> +	for (addr = start; addr < end; addr += mapping_size) {
>> +		unsigned long gap, previous_size;
>> +		int rc;
>> +
>> +		gap = end - addr;
>> +		previous_size = mapping_size;
>> +
>> +		if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
>> +		    mmu_psize_defs[MMU_PAGE_1G].shift)
>> +			mapping_size = PUD_SIZE;
>> +		else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
>> +			 mmu_psize_defs[MMU_PAGE_2M].shift)
>> +			mapping_size = PMD_SIZE;
>> +		else
>> +			mapping_size = PAGE_SIZE;
>> +
>> +		if (mapping_size != previous_size) {
>> +			print_mapping(start, addr, previous_size);
>> +			start = addr;
>> +		}
>> +
>> +		rc = radix__map_kernel_page((unsigned long)__va(addr), addr,
>> +					    PAGE_KERNEL_X, mapping_size);
>> +		if (rc)
>> +			return rc;
>
>Should we try a lower page size if map_kernel_page fails for this mapping_size?

The only way map_kernel_page can fail is -ENOMEM. If that's the case,
there's no way we're going to be able to map this range at all. Better 
to fail fast here, I would think.

-- 
Reza Arbab

* Re: [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-17  7:22   ` Balbir Singh
@ 2017-01-17 18:36     ` Reza Arbab
  2017-01-18  1:22       ` Balbir Singh
  0 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-17 18:36 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Tue, Jan 17, 2017 at 12:52:51PM +0530, Balbir Singh wrote:
>Shouldn't most of these functions have __meminit?

I don't think so. The mapping functions are __meminit, but the unmapping 
functions are completely within #ifdef CONFIG_MEMORY_HOTPLUG already.

>On Mon, Jan 16, 2017 at 01:07:45PM -0600, Reza Arbab wrote:
>>  #ifdef CONFIG_MEMORY_HOTPLUG
>> +static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
>> +{
>> +	pte_t *pte;
>> +	int i;
>> +
>> +	for (i = 0; i < PTRS_PER_PTE; i++) {
>> +		pte = pte_start + i;
>> +		if (!pte_none(*pte))
>> +			return;
>
>If !pte_none() we fail the hotplug? Or silently
>leave the allocated ptes around. I guess this is
>the same as x86.

The latter--it's not a failure. If you provided remove_pagetable() an 
unaligned address range, there could be a pte left unremoved at either 
end.
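
A minimal sketch of that case (hypothetical addresses, assuming one pte
page covers a 2M pmd region):

	/* 0x0 - 0x10000 stays mapped; only 0x10000 - 0x200000 is removed */
	remove_pagetable(0x10000, 0x200000);

remove_pte_table() clears the ptes for that span, but the pte mapping 0x0
sits in the same pte page and is still set, so the !pte_none() check in
free_pte_table() returns early and the pte page stays allocated.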

>> +static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
>> +			     unsigned long end)
>> +{
>> +	unsigned long next;
>> +	pte_t *pte_base;
>> +	pmd_t *pmd;
>> +
>> +	pmd = pmd_start + pmd_index(addr);
>> +	for (; addr < end; addr = next, pmd++) {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		if (!pmd_present(*pmd))
>> +			continue;
>> +
>> +		if (pmd_huge(*pmd)) {
>> +			pte_clear(&init_mm, addr, (pte_t *)pmd);
>
>pmd_clear()?

I used pte_clear() to mirror what happens in radix__map_kernel_page():

		if (map_page_size == PMD_SIZE) {
			ptep = (pte_t *)pmdp;
			goto set_the_pte;
		}

		[...]

	set_the_pte:
		set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, flags));

Would pmd_clear() be equivalent, since the pointer got set like a pte?

>> +static void remove_pagetable(unsigned long start, unsigned long end)
>> +{
>> +	unsigned long addr, next;
>> +	pud_t *pud_base;
>> +	pgd_t *pgd;
>> +
>> +	spin_lock(&init_mm.page_table_lock);
>> +
>
>x86 does more granular lock acquisition only during
>clearing the relevant entries. I suppose we don't have
>to worry about it since its not fast path and frequent.

Yep. Ben thought the locking in remove_pte_table() was actually too 
granular, and Aneesh questioned what was being protected in the first 
place. So I left one lock/unlock in the outermost function for now.

-- 
Reza Arbab

* Re: [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping()
  2017-01-17  7:25   ` Balbir Singh
@ 2017-01-17 18:36     ` Reza Arbab
  2017-01-18  1:53       ` Balbir Singh
  0 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-17 18:36 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Tue, Jan 17, 2017 at 12:55:13PM +0530, Balbir Singh wrote:
>On Mon, Jan 16, 2017 at 01:07:46PM -0600, Reza Arbab wrote:
>> Use remove_pagetable() and friends for radix vmemmap removal.
>>
>> We do not require the special-case handling of vmemmap done in the x86
>> versions of these functions. This is because vmemmap_free() has already
>> freed the mapped pages, and calls us with an aligned address range.
>>
>> So, add a few failsafe WARNs, but otherwise the code to remove physical
>> mappings is already sufficient for vmemmap.
>
>I wonder if we really need them?

Not sure what the guideline is for a "this shouldn't happen" WARN. It 
could save us some grief, should our vmemmap code ever start calling 
with an unaligned range, like it does on x86.

-- 
Reza Arbab

* Re: [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-17 18:34     ` Reza Arbab
@ 2017-01-18  1:14       ` Balbir Singh
  0 siblings, 0 replies; 19+ messages in thread
From: Balbir Singh @ 2017-01-18  1:14 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Balbir Singh, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Tue, Jan 17, 2017 at 12:34:56PM -0600, Reza Arbab wrote:
> Thanks for your review!
> 
> On Tue, Jan 17, 2017 at 12:16:35PM +0530, Balbir Singh wrote:
> > On Mon, Jan 16, 2017 at 01:07:43PM -0600, Reza Arbab wrote:
> > > --- a/arch/powerpc/mm/pgtable-radix.c
> > > +++ b/arch/powerpc/mm/pgtable-radix.c
> > > @@ -107,54 +107,66 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
> > >  	return 0;
> > >  }
> > > 
> > > +static inline void __meminit print_mapping(unsigned long start,
> > > +					   unsigned long end,
> > > +					   unsigned long size)
> > > +{
> > > +	if (end <= start)
> > > +		return;
> > 
> > Should we pr_err for start > end?
> 
> I think that would be overkill. The way this little inline is called,
> start > end is not possible. The real point is not to print anything if
> start == end. Using <= just seemed better in context.
>

Agreed

<snip>
 
> > 
> > Should we try a lower page size if map_kernel_page fails for this mapping_size?
> 
> The only way map_kernel_page can fail is -ENOMEM. If that's the case,
> there's no way we're going to be able to map this range at all. Better to
> fail fast here, I would think.
>

I think I am OK with this implementation for now. 

Balbir Singh. 

* Re: [PATCH v5 3/4] powerpc/mm: add radix__remove_section_mapping()
  2017-01-17 18:36     ` Reza Arbab
@ 2017-01-18  1:22       ` Balbir Singh
  0 siblings, 0 replies; 19+ messages in thread
From: Balbir Singh @ 2017-01-18  1:22 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Balbir Singh, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Tue, Jan 17, 2017 at 12:36:21PM -0600, Reza Arbab wrote:
> On Tue, Jan 17, 2017 at 12:52:51PM +0530, Balbir Singh wrote:
> > Shouldn't most of these functions have __meminit?
> 
> I don't think so. The mapping functions are __meminit, but the unmapping
> functions are completely within #ifdef CONFIG_MEMORY_HOTPLUG already.
> 
> > On Mon, Jan 16, 2017 at 01:07:45PM -0600, Reza Arbab wrote:
> > >  #ifdef CONFIG_MEMORY_HOTPLUG
> > > +static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
> > > +{
> > > +	pte_t *pte;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < PTRS_PER_PTE; i++) {
> > > +		pte = pte_start + i;
> > > +		if (!pte_none(*pte))
> > > +			return;
> > 
> > If !pte_none() we fail the hotplug? Or silently
> > leave the allocated ptes around. I guess this is
> > the same as x86.
> 
> The latter--it's not a failure. If you provided remove_pagetable() an
> unaligned address range, there could be a pte left unremoved at either end.
>

OK.
 
> > > +static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
> > > +			     unsigned long end)
> > > +{
> > > +	unsigned long next;
> > > +	pte_t *pte_base;
> > > +	pmd_t *pmd;
> > > +
> > > +	pmd = pmd_start + pmd_index(addr);
> > > +	for (; addr < end; addr = next, pmd++) {
> > > +		next = pmd_addr_end(addr, end);
> > > +
> > > +		if (!pmd_present(*pmd))
> > > +			continue;
> > > +
> > > +		if (pmd_huge(*pmd)) {
> > > +			pte_clear(&init_mm, addr, (pte_t *)pmd);
> > 
> > pmd_clear()?
> 
> I used pte_clear() to mirror what happens in radix__map_kernel_page():
> 
> 		if (map_page_size == PMD_SIZE) {
> 			ptep = (pte_t *)pmdp;
> 			goto set_the_pte;
> 		}
> 
> 		[...]
> 
> 	set_the_pte:
> 		set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, flags));
> 
> Would pmd_clear() be equivalent, since the pointer got set like a pte?

But we are still setting a pmdp. pmd_clear() will set the pmd to 0,
pte_clear() will go through the pte_update() mechanism which is expensive
IMHO and we may not need to do it.
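
Roughly, as a simplified sketch of the two calls on a huge pmd (the
pte_update() path is what I mean by expensive):

	pmd_clear(pmd);
		/* *pmdp = __pmd(0): a plain store */

	pte_clear(&init_mm, addr, (pte_t *)pmd);
		/* goes via pte_update(), which on radix is an atomic
		 * read-modify-write of the entry */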

> 
> > > +static void remove_pagetable(unsigned long start, unsigned long end)
> > > +{
> > > +	unsigned long addr, next;
> > > +	pud_t *pud_base;
> > > +	pgd_t *pgd;
> > > +
> > > +	spin_lock(&init_mm.page_table_lock);
> > > +
> > 
> > x86 does more granular lock acquisition, only during
> > clearing of the relevant entries. I suppose we don't have
> > to worry about it since it's not a fast path and not frequent.
> 
> Yep. Ben thought the locking in remove_pte_table() was actually too
> granular, and Aneesh questioned what was being protected in the first place.
> So I left one lock/unlock in the outermost function for now.
>

Fair enough

Balbir Singh. 

* Re: [PATCH v5 4/4] powerpc/mm: unstub radix__vmemmap_remove_mapping()
  2017-01-17 18:36     ` Reza Arbab
@ 2017-01-18  1:53       ` Balbir Singh
  0 siblings, 0 replies; 19+ messages in thread
From: Balbir Singh @ 2017-01-18  1:53 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Balbir Singh, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Aneesh Kumar K.V, Alistair Popple

On Tue, Jan 17, 2017 at 12:36:53PM -0600, Reza Arbab wrote:
> On Tue, Jan 17, 2017 at 12:55:13PM +0530, Balbir Singh wrote:
> > On Mon, Jan 16, 2017 at 01:07:46PM -0600, Reza Arbab wrote:
> > > Use remove_pagetable() and friends for radix vmemmap removal.
> > > 
> > > We do not require the special-case handling of vmemmap done in the x86
> > > versions of these functions. This is because vmemmap_free() has already
> > > freed the mapped pages, and calls us with an aligned address range.
> > > 
> > > So, add a few failsafe WARNs, but otherwise the code to remove physical
> > > mappings is already sufficient for vmemmap.
> > 
> > I wonder if we really need them?
> 
> Not sure what the guideline is for a "this shouldn't happen" WARN. It could
> save us some grief, should our vmemmap code ever start calling with an
> unaligned range, like it does on x86.
>

Fair enough

Acked-by: Balbir Singh <bsingharora@gmail.com> 

* Re: [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-16 19:07 ` [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
  2017-01-17  6:46   ` Balbir Singh
@ 2017-01-30  8:38   ` Michael Ellerman
  2017-01-30 17:28     ` Reza Arbab
  2017-02-01  1:05   ` [v5,1/4] " Michael Ellerman
  2 siblings, 1 reply; 19+ messages in thread
From: Michael Ellerman @ 2017-01-30  8:38 UTC (permalink / raw)
  To: Reza Arbab, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 623a0dc..2ce1354 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -107,54 +107,66 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>  	return 0;
>  }
>  
> +static inline void __meminit print_mapping(unsigned long start,
> +					   unsigned long end,
> +					   unsigned long size)
> +{
> +	if (end <= start)
> +		return;
> +
> +	pr_info("Mapped range 0x%lx - 0x%lx with 0x%lx\n", start, end, size);
> +}
> +
> +static int __meminit create_physical_mapping(unsigned long start,
> +					     unsigned long end)
> +{
> +	unsigned long addr, mapping_size;
> +
> +	start = _ALIGN_UP(start, PAGE_SIZE);
> +	for (addr = start; addr < end; addr += mapping_size) {
> +		unsigned long gap, previous_size;
> +		int rc;
> +
> +		gap = end - addr;
> +		previous_size = mapping_size;
> +
> +		if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
> +		    mmu_psize_defs[MMU_PAGE_1G].shift)
> +			mapping_size = PUD_SIZE;
> +		else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
> +			 mmu_psize_defs[MMU_PAGE_2M].shift)
> +			mapping_size = PMD_SIZE;
> +		else
> +			mapping_size = PAGE_SIZE;
> +
> +		if (mapping_size != previous_size) {
> +			print_mapping(start, addr, previous_size);
> +			start = addr;
> +		}
> +
> +		rc = radix__map_kernel_page((unsigned long)__va(addr), addr,
> +					    PAGE_KERNEL_X, mapping_size);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	print_mapping(start, addr, mapping_size);

Doesn't build.


In file included from ../include/linux/kernel.h:13:0,
                 from ../include/linux/sched.h:17,
                 from ../arch/powerpc/mm/pgtable-radix.c:11:
../arch/powerpc/mm/pgtable-radix.c: In function ‘create_physical_mapping’:
../include/linux/printk.h:299:2: error: ‘mapping_size’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
  ^~~~~~
../arch/powerpc/mm/pgtable-radix.c:123:22: note: ‘mapping_size’ was declared here
  unsigned long addr, mapping_size;
                      ^~~~~~~~~~~~


cheers

* Re: [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-30  8:38   ` Michael Ellerman
@ 2017-01-30 17:28     ` Reza Arbab
  2017-01-30 21:58       ` Michael Ellerman
  0 siblings, 1 reply; 19+ messages in thread
From: Reza Arbab @ 2017-01-30 17:28 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Aneesh Kumar K.V, Balbir Singh, Alistair Popple

On Mon, Jan 30, 2017 at 07:38:18PM +1100, Michael Ellerman wrote:
>Doesn't build.
>
>In file included from ../include/linux/kernel.h:13:0,
>                 from ../include/linux/sched.h:17,
>                 from ../arch/powerpc/mm/pgtable-radix.c:11:
>../arch/powerpc/mm/pgtable-radix.c: In function ‘create_physical_mapping’:
>../include/linux/printk.h:299:2: error: ‘mapping_size’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>  ^~~~~~
>../arch/powerpc/mm/pgtable-radix.c:123:22: note: ‘mapping_size’ was declared here
>  unsigned long addr, mapping_size;

Doh. Could you please do the following for now?

-	unsigned long addr, mapping_size;
+	unsigned long addr, mapping_size = 0;

I'd like to delay spinning v6 with this until I see any input you might 
have on the rest of the set.

And for future reference, how are you ending up with 
-Werror=maybe-uninitialized? On powerpc/next, with pseries_le_defconfig, 
I get -Wno-maybe-uninitialized.

-- 
Reza Arbab

* Re: [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-30 17:28     ` Reza Arbab
@ 2017-01-30 21:58       ` Michael Ellerman
  0 siblings, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2017-01-30 21:58 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Aneesh Kumar K.V, Balbir Singh, Alistair Popple

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> On Mon, Jan 30, 2017 at 07:38:18PM +1100, Michael Ellerman wrote:
>>Doesn't build.
>>
>>In file included from ../include/linux/kernel.h:13:0,
>>                 from ../include/linux/sched.h:17,
>>                 from ../arch/powerpc/mm/pgtable-radix.c:11:
>>../arch/powerpc/mm/pgtable-radix.c: In function ‘create_physical_mapping’:
>>../include/linux/printk.h:299:2: error: ‘mapping_size’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>>  ^~~~~~
>>../arch/powerpc/mm/pgtable-radix.c:123:22: note: ‘mapping_size’ was declared here
>>  unsigned long addr, mapping_size;
>
> Doh. Could you please do the following for now?
>
> -	unsigned long addr, mapping_size;
> +	unsigned long addr, mapping_size = 0;

Thanks.

> I'd like to delay spinning v6 with this until I see any input you might
> have on the rest of the set.

I don't think I have any at the moment. So I'll just fold the above into
v5.

> And for future reference, how are you ending up with
> -Werror=maybe-uninitialized? On powerpc/next, with pseries_le_defconfig,
> I get -Wno-maybe-uninitialized.

By default. Probably you're not getting it because your compiler is too
old. If I'm reading Makefile correctly it's enabled for GCC 4.9 or later:

  KBUILD_CFLAGS += $(call cc-ifversion, -lt, 0409, \
  			$(call cc-disable-warning,maybe-uninitialized,))

I'm using GCC 6.2.0.

cheers

* Re: [v5,1/4] powerpc/mm: refactor radix physical page mapping
  2017-01-16 19:07 ` [PATCH v5 1/4] powerpc/mm: refactor radix physical page mapping Reza Arbab
  2017-01-17  6:46   ` Balbir Singh
  2017-01-30  8:38   ` Michael Ellerman
@ 2017-02-01  1:05   ` Michael Ellerman
  2 siblings, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2017-02-01  1:05 UTC (permalink / raw)
  To: Reza Arbab, Benjamin Herrenschmidt, Paul Mackerras
  Cc: Alistair Popple, linuxppc-dev, Aneesh Kumar K.V

On Mon, 2017-01-16 at 19:07:43 UTC, Reza Arbab wrote:
> Move the page mapping code in radix_init_pgtable() into a separate
> function that will also be used for memory hotplug.
> 
> The current goto loop progressively decreases its mapping size as it
> covers the tail of a range whose end is unaligned. Change this to a for
> loop which can do the same for both ends of the range.
> 
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b5200ec9edf038459619fce9988842

cheers
