linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH 0/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled
@ 2022-08-01  8:04 Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 1/4] arm64: introduce have_zone_dma() helper Mike Rapoport
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Mike Rapoport @ 2022-08-01  8:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

Hi,

There have been several rounds of discussion about how to remap only the
crash kernel area with base pages; the latest one is here:

https://lore.kernel.org/all/1656777473-73887-1-git-send-email-guanghuifeng@linux.alibaba.com

This series is my attempt to allow both large pages in the linear map and
protection for the crash kernel memory.

For server systems it is important to protect crash kernel memory for
post-mortem analysis, and for that protection to work the crash kernel
memory should be mapped with base pages in the linear map. 

On systems with ZONE_DMA/DMA32 enabled, the crash kernel reservation
happens after the linear map is created, and the current code forces the
entire linear map to use base pages, which results in performance
degradation.

These patches enable remapping of the crash kernel area with base pages
while keeping large pages in the rest of the linear map.

The idea is to align the crash kernel reservation to PUD boundaries, remap
the PUD(s) covering it with base pages, and then free the extra memory, as
sketched below.
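
Roughly, the flow is (a condensed sketch of reserve_remap_crashkernel()
from patch 4, with error handling omitted):

	/* reserve a PUD-aligned block large enough for the crash kernel */
	size = ALIGN(crash_size, PUD_SIZE);
	crash_base = memblock_phys_alloc_range(size, PUD_SIZE, 0, crash_max);

	/* switch the PUD(s) covering the reservation to base pages */
	remap_crashkernel(crash_base, crash_size, size);

	/* give back the tail only needed to reach the PUD boundary */
	memblock_phys_free(crash_base + crash_size, size - crash_size);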

For now the remapping does not handle the case when a crash kernel base
address is specified, but support for that can be added if the idea is
generally acceptable.

Mike Rapoport (4):
  arm64: introduce have_zone_dma() helper
  arm64/mmu: drop _hotplug from unmap_hotplug_* function names
  arm64/mmu: move helpers for hotplug page tables freeing close to callers
  arm64/mm: remap crash kernel with base pages even if rodata_full disabled

 arch/arm64/include/asm/memory.h |   8 +++
 arch/arm64/include/asm/mmu.h    |   2 +
 arch/arm64/mm/init.c            |  44 ++++++++++--
 arch/arm64/mm/mmu.c             | 116 ++++++++++++++++++++------------
 4 files changed, 122 insertions(+), 48 deletions(-)

-- 
2.35.3



* [RFC PATCH 1/4] arm64: introduce have_zone_dma() helper
  2022-08-01  8:04 [RFC PATCH 0/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
@ 2022-08-01  8:04 ` Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 2/4] arm64/mmu: drop _hotplug from unmap_hotplug_* function names Mike Rapoport
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2022-08-01  8:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

Add a have_zone_dma() helper rather than open-coding the check whether
CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/include/asm/memory.h | 8 ++++++++
 arch/arm64/mm/init.c            | 4 ++--
 arch/arm64/mm/mmu.c             | 6 ++----
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..fa89d3bded8b 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -351,6 +351,14 @@ static inline void *phys_to_virt(phys_addr_t x)
 })
 
 void dump_mem_limit(void);
+
+static inline bool have_zone_dma(void)
+{
+	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
+		return true;
+
+	return false;
+}
 #endif /* !ASSEMBLY */
 
 /*
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 339ee84e5a61..fa2260040c0f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -389,7 +389,7 @@ void __init arm64_memblock_init(void)
 
 	early_init_fdt_scan_reserved_mem();
 
-	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
+	if (!have_zone_dma())
 		reserve_crashkernel();
 
 	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -438,7 +438,7 @@ void __init bootmem_init(void)
 	 * request_standard_resources() depends on crashkernel's memory being
 	 * reserved, so do it here.
 	 */
-	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
+	if (have_zone_dma())
 		reserve_crashkernel();
 
 	memblock_dump_all();
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 626ec32873c6..d170b7956b01 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -529,8 +529,7 @@ static void __init map_mem(pgd_t *pgdp)
 
 #ifdef CONFIG_KEXEC_CORE
 	if (crash_mem_map) {
-		if (IS_ENABLED(CONFIG_ZONE_DMA) ||
-		    IS_ENABLED(CONFIG_ZONE_DMA32))
+		if (have_zone_dma())
 			flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 		else if (crashk_res.end)
 			memblock_mark_nomap(crashk_res.start,
@@ -571,8 +570,7 @@ static void __init map_mem(pgd_t *pgdp)
 	 * through /sys/kernel/kexec_crash_size interface.
 	 */
 #ifdef CONFIG_KEXEC_CORE
-	if (crash_mem_map &&
-	    !IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
+	if (crash_mem_map && !have_zone_dma()) {
 		if (crashk_res.end) {
 			__map_memblock(pgdp, crashk_res.start,
 				       crashk_res.end + 1,
-- 
2.35.3



* [RFC PATCH 2/4] arm64/mmu: drop _hotplug from unmap_hotplug_* function names
  2022-08-01  8:04 [RFC PATCH 0/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 1/4] arm64: introduce have_zone_dma() helper Mike Rapoport
@ 2022-08-01  8:04 ` Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 3/4] arm64/mmu: move helpers for hotplug page tables freeing close to callers Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
  3 siblings, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2022-08-01  8:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

so that they can be used for remapping the crash kernel.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/mm/mmu.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d170b7956b01..baa2dda2dcce 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -861,7 +861,7 @@ static bool pgtable_range_aligned(unsigned long start, unsigned long end,
 	return true;
 }
 
-static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
+static void unmap_pte_range(pmd_t *pmdp, unsigned long addr,
 				    unsigned long end, bool free_mapped,
 				    struct vmem_altmap *altmap)
 {
@@ -882,7 +882,7 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
 	} while (addr += PAGE_SIZE, addr < end);
 }
 
-static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
+static void unmap_pmd_range(pud_t *pudp, unsigned long addr,
 				    unsigned long end, bool free_mapped,
 				    struct vmem_altmap *altmap)
 {
@@ -911,11 +911,11 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
 			continue;
 		}
 		WARN_ON(!pmd_table(pmd));
-		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped, altmap);
+		unmap_pte_range(pmdp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
-static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
+static void unmap_pud_range(p4d_t *p4dp, unsigned long addr,
 				    unsigned long end, bool free_mapped,
 				    struct vmem_altmap *altmap)
 {
@@ -944,11 +944,11 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
 			continue;
 		}
 		WARN_ON(!pud_table(pud));
-		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped, altmap);
+		unmap_pmd_range(pudp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
-static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
+static void unmap_p4d_range(pgd_t *pgdp, unsigned long addr,
 				    unsigned long end, bool free_mapped,
 				    struct vmem_altmap *altmap)
 {
@@ -963,11 +963,11 @@ static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
 			continue;
 
 		WARN_ON(!p4d_present(p4d));
-		unmap_hotplug_pud_range(p4dp, addr, next, free_mapped, altmap);
+		unmap_pud_range(p4dp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
-static void unmap_hotplug_range(unsigned long addr, unsigned long end,
+static void unmap_range(unsigned long addr, unsigned long end,
 				bool free_mapped, struct vmem_altmap *altmap)
 {
 	unsigned long next;
@@ -989,7 +989,7 @@ static void unmap_hotplug_range(unsigned long addr, unsigned long end,
 			continue;
 
 		WARN_ON(!pgd_present(pgd));
-		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap);
+		unmap_p4d_range(pgdp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
 
@@ -1208,7 +1208,7 @@ void vmemmap_free(unsigned long start, unsigned long end,
 {
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
-	unmap_hotplug_range(start, end, true, altmap);
+	unmap_range(start, end, true, altmap);
 	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
@@ -1472,7 +1472,7 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 	WARN_ON(pgdir != init_mm.pgd);
 	WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
 
-	unmap_hotplug_range(start, end, false, NULL);
+	unmap_range(start, end, false, NULL);
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-- 
2.35.3



* [RFC PATCH 3/4] arm64/mmu: move helpers for hotplug page tables freeing close to callers
  2022-08-01  8:04 [RFC PATCH 0/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 1/4] arm64: introduce have_zone_dma() helper Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 2/4] arm64/mmu: drop _hotplug from unmap_hotplug_* function names Mike Rapoport
@ 2022-08-01  8:04 ` Mike Rapoport
  2022-08-01  8:04 ` [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
  3 siblings, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2022-08-01  8:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

to minimize extra ifdefery when the unmap_*() functions are used to remap
the crash kernel.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/mm/mmu.c | 50 ++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index baa2dda2dcce..2f548fb2244c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -837,30 +837,6 @@ static void free_hotplug_page_range(struct page *page, size_t size,
 	}
 }
 
-static void free_hotplug_pgtable_page(struct page *page)
-{
-	free_hotplug_page_range(page, PAGE_SIZE, NULL);
-}
-
-static bool pgtable_range_aligned(unsigned long start, unsigned long end,
-				  unsigned long floor, unsigned long ceiling,
-				  unsigned long mask)
-{
-	start &= mask;
-	if (start < floor)
-		return false;
-
-	if (ceiling) {
-		ceiling &= mask;
-		if (!ceiling)
-			return false;
-	}
-
-	if (end - 1 > ceiling - 1)
-		return false;
-	return true;
-}
-
 static void unmap_pte_range(pmd_t *pmdp, unsigned long addr,
 				    unsigned long end, bool free_mapped,
 				    struct vmem_altmap *altmap)
@@ -993,6 +969,30 @@ static void unmap_range(unsigned long addr, unsigned long end,
 	} while (addr = next, addr < end);
 }
 
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
+				  unsigned long floor, unsigned long ceiling,
+				  unsigned long mask)
+{
+	start &= mask;
+	if (start < floor)
+		return false;
+
+	if (ceiling) {
+		ceiling &= mask;
+		if (!ceiling)
+			return false;
+	}
+
+	if (end - 1 > ceiling - 1)
+		return false;
+	return true;
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+	free_hotplug_page_range(page, PAGE_SIZE, NULL);
+}
+
 static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
 				 unsigned long end, unsigned long floor,
 				 unsigned long ceiling)
@@ -1146,7 +1146,7 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
 		free_empty_p4d_table(pgdp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
 }
-#endif
+#endif /* CONFIG_MEMORY_HOTPLUG */
 
 #if !ARM64_KERNEL_USES_PMD_MAPS
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-- 
2.35.3



* [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled
  2022-08-01  8:04 [RFC PATCH 0/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
                   ` (2 preceding siblings ...)
  2022-08-01  8:04 ` [RFC PATCH 3/4] arm64/mmu: move helpers for hotplug page tables freeing close to callers Mike Rapoport
@ 2022-08-01  8:04 ` Mike Rapoport
  2022-08-01 10:22   ` Ard Biesheuvel
  3 siblings, 1 reply; 7+ messages in thread
From: Mike Rapoport @ 2022-08-01  8:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

For server systems it is important to protect crash kernel memory for
post-mortem analysis. In order to protect this memory it should be mapped
at PTE level.

When CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled, usage of crash kernel
essentially forces mapping of the entire linear map with base pages even if
rodata_full is not set (commit 2687275a5843 ("arm64: Force
NO_BLOCK_MAPPINGS if crashkernel reservation is required")) and this causes
performance degradation.

With ZONE_DMA/DMA32 enabled, the crash kernel memory is reserved after
the linear map is created, but before multiprocessing and multithreading
are enabled, so it is safe to remap the crash kernel memory with base
pages as long as the page table entries that would be changed do not map
the memory that might be accessed during the remapping.

To ensure there are no memory accesses in the range that will be
remapped, align crash memory reservation to PUD_SIZE boundaries, remap
the entire PUD-aligned area and then free the memory that was allocated
beyond the crash_size requested by the user.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/include/asm/mmu.h |  2 ++
 arch/arm64/mm/init.c         | 40 ++++++++++++++++++++++++++++++++++--
 arch/arm64/mm/mmu.c          | 40 +++++++++++++++++++++++++++++++-----
 3 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 48f8466a4be9..d9829a7def69 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -71,6 +71,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
 extern void mark_linear_text_alias_ro(void);
 extern bool kaslr_requires_kpti(void);
+extern int remap_crashkernel(phys_addr_t start, phys_addr_t size,
+			     phys_addr_t aligned_size);
 
 #define INIT_MM_CONTEXT(name)	\
 	.pgd = init_pg_dir,
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index fa2260040c0f..be74e091bef7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -40,6 +40,7 @@
 #include <asm/memory.h>
 #include <asm/numa.h>
 #include <asm/sections.h>
+#include <asm/set_memory.h>
 #include <asm/setup.h>
 #include <linux/sizes.h>
 #include <asm/tlb.h>
@@ -116,6 +117,38 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
 	return 0;
 }
 
+static unsigned long long __init
+reserve_remap_crashkernel(unsigned long long crash_base,
+			  unsigned long long crash_size,
+			  unsigned long long crash_max)
+{
+	unsigned long long size;
+
+	if (!have_zone_dma())
+		return 0;
+
+	if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
+		return 0;
+
+	if (crash_base)
+		return 0;
+
+	size = ALIGN(crash_size, PUD_SIZE);
+
+	crash_base = memblock_phys_alloc_range(size, PUD_SIZE, 0, crash_max);
+	if (!crash_base)
+		return 0;
+
+	if (remap_crashkernel(crash_base, crash_size, size)) {
+		memblock_phys_free(crash_base, size);
+		return 0;
+	}
+
+	memblock_phys_free(crash_base + crash_size, size - crash_size);
+
+	return crash_base;
+}
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -162,8 +195,11 @@ static void __init reserve_crashkernel(void)
 	if (crash_base)
 		crash_max = crash_base + crash_size;
 
-	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-					       crash_base, crash_max);
+	crash_base = reserve_remap_crashkernel(crash_base, crash_size,
+					       crash_max);
+	if (!crash_base)
+		crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
+						       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2f548fb2244c..183936775fab 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -528,10 +528,8 @@ static void __init map_mem(pgd_t *pgdp)
 	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
 
 #ifdef CONFIG_KEXEC_CORE
-	if (crash_mem_map) {
-		if (have_zone_dma())
-			flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
-		else if (crashk_res.end)
+	if (crash_mem_map && !have_zone_dma()) {
+		if (crashk_res.end)
 			memblock_mark_nomap(crashk_res.start,
 			    resource_size(&crashk_res));
 	}
@@ -825,7 +823,7 @@ int kern_addr_valid(unsigned long addr)
 	return pfn_valid(pte_pfn(pte));
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_KEXEC_CORE)
 static void free_hotplug_page_range(struct page *page, size_t size,
 				    struct vmem_altmap *altmap)
 {
@@ -968,7 +966,9 @@ static void unmap_range(unsigned long addr, unsigned long end,
 		unmap_p4d_range(pgdp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
 }
+#endif /* CONFIG_MEMORY_HOTPLUG || CONFIG_KEXEC_CORE  */
 
+#ifdef CONFIG_MEMORY_HOTPLUG
 static bool pgtable_range_aligned(unsigned long start, unsigned long end,
 				  unsigned long floor, unsigned long ceiling,
 				  unsigned long mask)
@@ -1213,6 +1213,36 @@ void vmemmap_free(unsigned long start, unsigned long end,
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+int __init remap_crashkernel(phys_addr_t start, phys_addr_t size,
+			     phys_addr_t aligned_size)
+{
+#ifdef CONFIG_KEXEC_CORE
+	phys_addr_t end = start + size;
+	phys_addr_t aligned_end = start + aligned_size;
+
+	if (!IS_ALIGNED(start, PUD_SIZE) || !IS_ALIGNED(aligned_end, PUD_SIZE))
+		return -EINVAL;
+
+	/* Clear PUDs containing crash kernel memory */
+	unmap_range(__phys_to_virt(start), __phys_to_virt(aligned_end),
+		    false, NULL);
+
+	/* map crash kernel memory with base pages */
+	__create_pgd_mapping(swapper_pg_dir, start,  __phys_to_virt(start),
+			     size, PAGE_KERNEL, early_pgtable_alloc,
+			     NO_EXEC_MAPPINGS | NO_BLOCK_MAPPINGS |
+			     NO_CONT_MAPPINGS);
+
+	/* map area from end of crash kernel to PUD end with large pages */
+	size = aligned_end - end;
+	if (size)
+		__create_pgd_mapping(swapper_pg_dir, end, __phys_to_virt(end),
+				     size, PAGE_KERNEL, early_pgtable_alloc, 0);
+#endif
+
+	return 0;
+}
+
 static inline pud_t *fixmap_pud(unsigned long addr)
 {
 	pgd_t *pgdp = pgd_offset_k(addr);
-- 
2.35.3



* Re: [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled
  2022-08-01  8:04 ` [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
@ 2022-08-01 10:22   ` Ard Biesheuvel
  2022-08-01 10:33     ` Mike Rapoport
  0 siblings, 1 reply; 7+ messages in thread
From: Ard Biesheuvel @ 2022-08-01 10:22 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Linux ARM, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Will Deacon, Linux Kernel Mailing List,
	Linux Memory Management List

Hello Mike,

On Mon, 1 Aug 2022 at 10:04, Mike Rapoport <rppt@kernel.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> For server systems it is important to protect crash kernel memory for
> post-mortem analysis. In order to protect this memory it should be mapped
> at PTE level.
>
> When CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled, usage of crash kernel
> essentially forces mapping of the entire linear map with base pages even if
> rodata_full is not set (commit 2687275a5843 ("arm64: Force
> NO_BLOCK_MAPPINGS if crashkernel reservation is required")) and this causes
> performance degradation.
>
> With ZONE_DMA/DMA32 enabled, the crash kernel memory is reserved after
> the linear map is created, but before multiprocessing and multithreading
> are enabled, so it is safe to remap the crash kernel memory with base
> pages as long as the page table entries that would be changed do not map
> the memory that might be accessed during the remapping.
>
> To ensure there are no memory accesses in the range that will be
> remapped, align crash memory reservation to PUD_SIZE boundaries, remap
> the entire PUD-aligned area and then free the memory that was allocated
> beyond the crash_size requested by the user.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm64/include/asm/mmu.h |  2 ++
>  arch/arm64/mm/init.c         | 40 ++++++++++++++++++++++++++++++++++--
>  arch/arm64/mm/mmu.c          | 40 +++++++++++++++++++++++++++++++-----
>  3 files changed, 75 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 48f8466a4be9..d9829a7def69 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -71,6 +71,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>  extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
>  extern void mark_linear_text_alias_ro(void);
>  extern bool kaslr_requires_kpti(void);
> +extern int remap_crashkernel(phys_addr_t start, phys_addr_t size,
> +                            phys_addr_t aligned_size);
>
>  #define INIT_MM_CONTEXT(name)  \
>         .pgd = init_pg_dir,
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index fa2260040c0f..be74e091bef7 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -40,6 +40,7 @@
>  #include <asm/memory.h>
>  #include <asm/numa.h>
>  #include <asm/sections.h>
> +#include <asm/set_memory.h>
>  #include <asm/setup.h>
>  #include <linux/sizes.h>
>  #include <asm/tlb.h>
> @@ -116,6 +117,38 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
>         return 0;
>  }
>
> +static unsigned long long __init
> +reserve_remap_crashkernel(unsigned long long crash_base,
> +                         unsigned long long crash_size,
> +                         unsigned long long crash_max)
> +{
> +       unsigned long long size;
> +
> +       if (!have_zone_dma())
> +               return 0;
> +
> +       if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
> +               return 0;
> +
> +       if (crash_base)
> +               return 0;
> +
> +       size = ALIGN(crash_size, PUD_SIZE);
> +
> +       crash_base = memblock_phys_alloc_range(size, PUD_SIZE, 0, crash_max);
> +       if (!crash_base)
> +               return 0;
> +
> +       if (remap_crashkernel(crash_base, crash_size, size)) {
> +               memblock_phys_free(crash_base, size);
> +               return 0;
> +       }
> +
> +       memblock_phys_free(crash_base + crash_size, size - crash_size);
> +
> +       return crash_base;
> +}
> +
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
> @@ -162,8 +195,11 @@ static void __init reserve_crashkernel(void)
>         if (crash_base)
>                 crash_max = crash_base + crash_size;
>
> -       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> -                                              crash_base, crash_max);
> +       crash_base = reserve_remap_crashkernel(crash_base, crash_size,
> +                                              crash_max);
> +       if (!crash_base)
> +               crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> +                                                      crash_base, crash_max);
>         if (!crash_base) {
>                 pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>                         crash_size);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 2f548fb2244c..183936775fab 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -528,10 +528,8 @@ static void __init map_mem(pgd_t *pgdp)
>         memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
>
>  #ifdef CONFIG_KEXEC_CORE
> -       if (crash_mem_map) {
> -               if (have_zone_dma())
> -                       flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> -               else if (crashk_res.end)
> +       if (crash_mem_map && !have_zone_dma()) {
> +               if (crashk_res.end)
>                         memblock_mark_nomap(crashk_res.start,
>                             resource_size(&crashk_res));
>         }
> @@ -825,7 +823,7 @@ int kern_addr_valid(unsigned long addr)
>         return pfn_valid(pte_pfn(pte));
>  }
>
> -#ifdef CONFIG_MEMORY_HOTPLUG
> +#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_KEXEC_CORE)
>  static void free_hotplug_page_range(struct page *page, size_t size,
>                                     struct vmem_altmap *altmap)
>  {
> @@ -968,7 +966,9 @@ static void unmap_range(unsigned long addr, unsigned long end,
>                 unmap_p4d_range(pgdp, addr, next, free_mapped, altmap);
>         } while (addr = next, addr < end);
>  }
> +#endif /* CONFIG_MEMORY_HOTPLUG || CONFIG_KEXEC_CORE  */
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
>  static bool pgtable_range_aligned(unsigned long start, unsigned long end,
>                                   unsigned long floor, unsigned long ceiling,
>                                   unsigned long mask)
> @@ -1213,6 +1213,36 @@ void vmemmap_free(unsigned long start, unsigned long end,
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
> +int __init remap_crashkernel(phys_addr_t start, phys_addr_t size,
> +                            phys_addr_t aligned_size)
> +{
> +#ifdef CONFIG_KEXEC_CORE
> +       phys_addr_t end = start + size;
> +       phys_addr_t aligned_end = start + aligned_size;
> +
> +       if (!IS_ALIGNED(start, PUD_SIZE) || !IS_ALIGNED(aligned_end, PUD_SIZE))
> +               return -EINVAL;
> +
> +       /* Clear PUDs containing crash kernel memory */
> +       unmap_range(__phys_to_virt(start), __phys_to_virt(aligned_end),
> +                   false, NULL);
> +

Why is this safe? This runs after paging_init(), so you are unmapping
a PUD that is live, and could already be in use, no?

> +       /* map crash kernel memory with base pages */
> +       __create_pgd_mapping(swapper_pg_dir, start,  __phys_to_virt(start),
> +                            size, PAGE_KERNEL, early_pgtable_alloc,
> +                            NO_EXEC_MAPPINGS | NO_BLOCK_MAPPINGS |
> +                            NO_CONT_MAPPINGS);
> +
> +       /* map area from end of crash kernel to PUD end with large pages */
> +       size = aligned_end - end;
> +       if (size)
> +               __create_pgd_mapping(swapper_pg_dir, end, __phys_to_virt(end),
> +                                    size, PAGE_KERNEL, early_pgtable_alloc, 0);
> +#endif
> +
> +       return 0;
> +}
> +
>  static inline pud_t *fixmap_pud(unsigned long addr)
>  {
>         pgd_t *pgdp = pgd_offset_k(addr);
> --
> 2.35.3
>


* Re: [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled
  2022-08-01 10:22   ` Ard Biesheuvel
@ 2022-08-01 10:33     ` Mike Rapoport
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2022-08-01 10:33 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux ARM, Catalin Marinas, Guanghui Feng, Mark Rutland,
	Mike Rapoport, Will Deacon, Linux Kernel Mailing List,
	Linux Memory Management List

On Mon, Aug 01, 2022 at 12:22:47PM +0200, Ard Biesheuvel wrote:
> Hello Mike,
> 
> On Mon, 1 Aug 2022 at 10:04, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > For server systems it is important to protect crash kernel memory for
> > post-mortem analysis. In order to protect this memory it should be mapped
> > at PTE level.
> >
> > When CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled, usage of crash kernel
> > essentially forces mapping of the entire linear map with base pages even if
> > rodata_full is not set (commit 2687275a5843 ("arm64: Force
> > NO_BLOCK_MAPPINGS if crashkernel reservation is required")) and this causes
> > performance degradation.
> >
> > With ZONE_DMA/DMA32 enabled, the crash kernel memory is reserved after
> > the linear map is created, but before multiprocessing and multithreading
> > are enabled, so it is safe to remap the crash kernel memory with base
> > pages as long as the page table entries that would be changed do not map
> > the memory that might be accessed during the remapping.
> >
> > To ensure there are no memory accesses in the range that will be
> > remapped, align crash memory reservation to PUD_SIZE boundaries, remap
> > the entire PUD-aligned area and then free the memory that was allocated
> > beyond the crash_size requested by the user.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm64/include/asm/mmu.h |  2 ++
> >  arch/arm64/mm/init.c         | 40 ++++++++++++++++++++++++++++++++++--
> >  arch/arm64/mm/mmu.c          | 40 +++++++++++++++++++++++++++++++-----
> >  3 files changed, 75 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> > index 48f8466a4be9..d9829a7def69 100644
> > --- a/arch/arm64/include/asm/mmu.h
> > +++ b/arch/arm64/include/asm/mmu.h
> > @@ -71,6 +71,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> >  extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
> >  extern void mark_linear_text_alias_ro(void);
> >  extern bool kaslr_requires_kpti(void);
> > +extern int remap_crashkernel(phys_addr_t start, phys_addr_t size,
> > +                            phys_addr_t aligned_size);
> >
> >  #define INIT_MM_CONTEXT(name)  \
> >         .pgd = init_pg_dir,
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index fa2260040c0f..be74e091bef7 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -40,6 +40,7 @@
> >  #include <asm/memory.h>
> >  #include <asm/numa.h>
> >  #include <asm/sections.h>
> > +#include <asm/set_memory.h>
> >  #include <asm/setup.h>
> >  #include <linux/sizes.h>
> >  #include <asm/tlb.h>
> > @@ -116,6 +117,38 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
> >         return 0;
> >  }
> >
> > +static unsigned long long __init
> > +reserve_remap_crashkernel(unsigned long long crash_base,
> > +                         unsigned long long crash_size,
> > +                         unsigned long long crash_max)
> > +{
> > +       unsigned long long size;
> > +
> > +       if (!have_zone_dma())
> > +               return 0;
> > +
> > +       if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
> > +               return 0;
> > +
> > +       if (crash_base)
> > +               return 0;
> > +
> > +       size = ALIGN(crash_size, PUD_SIZE);
> > +
> > +       crash_base = memblock_phys_alloc_range(size, PUD_SIZE, 0, crash_max);
> > +       if (!crash_base)
> > +               return 0;
> > +
> > +       if (remap_crashkernel(crash_base, crash_size, size)) {
> > +               memblock_phys_free(crash_base, size);
> > +               return 0;
> > +       }
> > +
> > +       memblock_phys_free(crash_base + crash_size, size - crash_size);
> > +
> > +       return crash_base;
> > +}
> > +
> >  /*
> >   * reserve_crashkernel() - reserves memory for crash kernel
> >   *
> > @@ -162,8 +195,11 @@ static void __init reserve_crashkernel(void)
> >         if (crash_base)
> >                 crash_max = crash_base + crash_size;
> >
> > -       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> > -                                              crash_base, crash_max);
> > +       crash_base = reserve_remap_crashkernel(crash_base, crash_size,
> > +                                              crash_max);
> > +       if (!crash_base)
> > +               crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> > +                                                      crash_base, crash_max);
> >         if (!crash_base) {
> >                 pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> >                         crash_size);
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 2f548fb2244c..183936775fab 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -528,10 +528,8 @@ static void __init map_mem(pgd_t *pgdp)
> >         memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> >
> >  #ifdef CONFIG_KEXEC_CORE
> > -       if (crash_mem_map) {
> > -               if (have_zone_dma())
> > -                       flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > -               else if (crashk_res.end)
> > +       if (crash_mem_map && !have_zone_dma()) {
> > +               if (crashk_res.end)
> >                         memblock_mark_nomap(crashk_res.start,
> >                             resource_size(&crashk_res));
> >         }
> > @@ -825,7 +823,7 @@ int kern_addr_valid(unsigned long addr)
> >         return pfn_valid(pte_pfn(pte));
> >  }
> >
> > -#ifdef CONFIG_MEMORY_HOTPLUG
> > +#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_KEXEC_CORE)
> >  static void free_hotplug_page_range(struct page *page, size_t size,
> >                                     struct vmem_altmap *altmap)
> >  {
> > @@ -968,7 +966,9 @@ static void unmap_range(unsigned long addr, unsigned long end,
> >                 unmap_p4d_range(pgdp, addr, next, free_mapped, altmap);
> >         } while (addr = next, addr < end);
> >  }
> > +#endif /* CONFIG_MEMORY_HOTPLUG || CONFIG_KEXEC_CORE  */
> >
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> >  static bool pgtable_range_aligned(unsigned long start, unsigned long end,
> >                                   unsigned long floor, unsigned long ceiling,
> >                                   unsigned long mask)
> > @@ -1213,6 +1213,36 @@ void vmemmap_free(unsigned long start, unsigned long end,
> >  }
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >
> > +int __init remap_crashkernel(phys_addr_t start, phys_addr_t size,
> > +                            phys_addr_t aligned_size)
> > +{
> > +#ifdef CONFIG_KEXEC_CORE
> > +       phys_addr_t end = start + size;
> > +       phys_addr_t aligned_end = start + aligned_size;
> > +
> > +       if (!IS_ALIGNED(start, PUD_SIZE) || !IS_ALIGNED(aligned_end, PUD_SIZE))
> > +               return -EINVAL;
> > +
> > +       /* Clear PUDs containing crash kernel memory */
> > +       unmap_range(__phys_to_virt(start), __phys_to_virt(aligned_end),
> > +                   false, NULL);
> > +
> 
> Why is this safe? This runs after paging_init(), so you are unmapping
> a PUD that is live, and could already be in use, no?

The allocation request for the crash kernel is extended to fill entire PUDs
and it is PUD-aligned, so if memblock_phys_alloc_range() in
reserve_remap_crashkernel() succeeds, the memory returned by it is mapped
by one or more PUDs, and these PUDs map only that memory.

Since there is no multitasking yet, there is nothing that can access that
memory.
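
For illustration only, here is a tiny standalone (userspace) sketch of the
alignment arithmetic; it assumes 4K pages, where PUD_SIZE is 1GiB, and the
512M request is just an example:

	#include <stdio.h>

	#define PUD_SIZE	(1ULL << 30)	/* 1GiB with 4K pages */
	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned long long crash_size = 512ULL << 20;	/* example request */
		unsigned long long size = ALIGN(crash_size, PUD_SIZE);

		/*
		 * A PUD_SIZE-aligned base plus a size that is a multiple of
		 * PUD_SIZE means the block spans whole PUD entries, and no
		 * other memory shares those entries.
		 */
		printf("request %lluM -> reserve %lluM (%llu PUD(s))\n",
		       crash_size >> 20, size >> 20, size / PUD_SIZE);
		return 0;
	}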
 
> > +       /* map crash kernel memory with base pages */
> > +       __create_pgd_mapping(swapper_pg_dir, start,  __phys_to_virt(start),
> > +                            size, PAGE_KERNEL, early_pgtable_alloc,
> > +                            NO_EXEC_MAPPINGS | NO_BLOCK_MAPPINGS |
> > +                            NO_CONT_MAPPINGS);
> > +
> > +       /* map area from end of crash kernel to PUD end with large pages */
> > +       size = aligned_end - end;
> > +       if (size)
> > +               __create_pgd_mapping(swapper_pg_dir, end, __phys_to_virt(end),
> > +                                    size, PAGE_KERNEL, early_pgtable_alloc, 0);
> > +#endif
> > +
> > +       return 0;
> > +}
> > +
> >  static inline pud_t *fixmap_pud(unsigned long addr)
> >  {
> >         pgd_t *pgdp = pgd_offset_k(addr);
> > --
> > 2.35.3
> >

-- 
Sincerely yours,
Mike.


end of thread, other threads:[~2022-08-01 10:35 UTC | newest]

Thread overview: 7+ messages
2022-08-01  8:04 [RFC PATCH 0/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
2022-08-01  8:04 ` [RFC PATCH 1/4] arm64: introduce have_zone_dma() helper Mike Rapoport
2022-08-01  8:04 ` [RFC PATCH 2/4] arm64/mmu: drop _hotplug from unmap_hotplug_* function names Mike Rapoport
2022-08-01  8:04 ` [RFC PATCH 3/4] arm64/mmu: move helpers for hotplug page tables freeing close to callers Mike Rapoport
2022-08-01  8:04 ` [RFC PATCH 4/4] arm64/mm: remap crash kernel with base pages even if rodata_full disabled Mike Rapoport
2022-08-01 10:22   ` Ard Biesheuvel
2022-08-01 10:33     ` Mike Rapoport
