* [PATCH v9 0/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
@ 2023-03-24 15:54 ` Alexandre Ghiti
  0 siblings, 0 replies; 28+ messages in thread
From: Alexandre Ghiti @ 2023-03-24 15:54 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, Anup Patel, linux-riscv,
	linux-kernel, devicetree
  Cc: Alexandre Ghiti

This patchset improves TLB utilization by using hugepages for the
linear mapping.

As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
take care of isolating the kernel text and rodata so that they are not
covered by a PUD mapping, which would assign the wrong permissions to
the whole region. This is achieved the same way as on arm64, by using
the memblock nomap API, which isolates those regions and re-merges them
afterwards, thus avoiding any issue with the creation of the system
resources tree.
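
For reference, here is the shape of that approach as implemented in
patch 3 (a condensed sketch of the actual code, not a drop-in snippet):

/* Condensed from create_linear_mapping_page_table() in patch 3. */
memblock_mark_nomap(ktext_start, ktext_size);      /* hide text from the walk below */
memblock_mark_nomap(krodata_start, krodata_size);  /* hide rodata as well */

/* Map RAM, which now excludes the text/rodata regions */
for_each_mem_range(i, &start, &end)
	create_linear_mapping_range(start, end);

/* Map the isolated regions on their own so best_map_size() cannot pick
 * a PUD/P4D/PGD page that spans beyond them with the wrong permissions */
create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
create_linear_mapping_range(krodata_start, krodata_start + krodata_size);

memblock_clear_nomap(ktext_start, ktext_size);     /* re-merge for the resource tree */
memblock_clear_nomap(krodata_start, krodata_size);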

base-commit-tag: v6.3-rc1

v9:
- Remove the new API and the arm64 patches as they created more issues
  than they solved; thanks to Anup for reporting those bugs!
- Add a patch that moves the linear mapping creation outside of setup_vm_final
- Use nomap API like arm64
- Remove RB from Andrew and Anup as the patch changed its logic
- Fix kernel rodata size computation

v8:
- Fix rv32, as reported by Anup
- Do not modify memblock_isolate_range and fix its comment, as suggested by Mike
- Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
- Fix the arm64 double mapping (which, as far as I can tell, did not work
  in v7); the result is not pretty at all, so I will wait for comments from
  the arm64 reviewers, but this patch can easily be dropped if they do not
  want it.

v7:
- Fix Anup's bug report by introducing memblock_isolate_memory, which
  allows us to split the memblock regions and thus avoid mapping the
  PUD that contains the kernel as read-only
- Add a patch to arm64 to use this newly introduced API

v6:
- Quiet an LLVM warning by casting phys_ram_base to an unsigned long

v5:
- Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
  to Conor
- Add RB from Andrew

v4:
- Rebase on top of v6.2-rc3, as noted by Conor
- Add Acked-by Rob

v3:
- Change the comment about the initrd_start VA conversion so that it
  applies to ARM64 and RISCV64 (and others in the future if needed), as
  suggested by Rob

v2:
- Add a comment on why RISCV64 does not need to set initrd_start/end that
  early in the boot process, as asked by Rob

Alexandre Ghiti (3):
  riscv: Get rid of riscv_pfn_base variable
  riscv: Move the linear mapping creation in its own function
  riscv: Use PUD/P4D/PGD pages for the linear mapping

 arch/riscv/include/asm/page.h |  19 ++++++-
 arch/riscv/mm/init.c          | 102 ++++++++++++++++++++++++++--------
 arch/riscv/mm/physaddr.c      |  16 ++++++
 drivers/of/fdt.c              |  11 ++--
 4 files changed, 118 insertions(+), 30 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v9 1/3] riscv: Get rid of riscv_pfn_base variable
  2023-03-24 15:54 ` Alexandre Ghiti
@ 2023-03-24 15:54   ` Alexandre Ghiti
  -1 siblings, 0 replies; 28+ messages in thread
From: Alexandre Ghiti @ 2023-03-24 15:54 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, Anup Patel, linux-riscv,
	linux-kernel, devicetree
  Cc: Alexandre Ghiti

Use phys_ram_base directly instead: riscv_pfn_base is just the PFN of
the address contained in phys_ram_base.

Although no functional change is intended in this patch, setting
phys_ram_base that early changes the behaviour of
kernel_mapping_pa_to_va during early boot: phys_ram_base used to be
zero before this patch, and it is now set to the physical start address
of the kernel. But this does not break the conversion of a kernel
physical address into a virtual address, since kernel_mapping_pa_to_va
should only be used on kernel physical addresses, i.e. addresses
greater than the physical start address of the kernel.
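
As a minimal, self-contained sketch of why that holds (both addresses
below are hypothetical, not taken from this patch): the kernel
translation only depends on va_kernel_pa_offset, so the earlier
phys_ram_base assignment does not change the result for valid inputs.

#include <stdio.h>

/* Hypothetical values, for illustration only. */
#define KERNEL_LINK_ADDR 0xffffffff80000000UL /* kernel virtual base */
#define KERNEL_PHYS_ADDR 0x0000000080200000UL /* kernel load address */

int main(void)
{
	/* Mirrors kernel_map.va_kernel_pa_offset = virt_addr - phys_addr. */
	unsigned long va_kernel_pa_offset = KERNEL_LINK_ADDR - KERNEL_PHYS_ADDR;

	/* Non-XIP arm of kernel_mapping_pa_to_va(): va = pa + offset. */
	unsigned long pa = KERNEL_PHYS_ADDR + 0x1000;

	printf("pa %#lx -> va %#lx\n", pa, pa + va_kernel_pa_offset);
	return 0;
}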

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/include/asm/page.h | 3 +--
 arch/riscv/mm/init.c          | 6 +-----
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 7fed7c431928..8dc686f549b6 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -91,8 +91,7 @@ typedef struct page *pgtable_t;
 #endif
 
 #ifdef CONFIG_MMU
-extern unsigned long riscv_pfn_base;
-#define ARCH_PFN_OFFSET		(riscv_pfn_base)
+#define ARCH_PFN_OFFSET		(PFN_DOWN((unsigned long)phys_ram_base))
 #else
 #define ARCH_PFN_OFFSET		(PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 87f6a5d475a6..cc558d94559a 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -271,9 +271,6 @@ static void __init setup_bootmem(void)
 #ifdef CONFIG_MMU
 struct pt_alloc_ops pt_ops __initdata;
 
-unsigned long riscv_pfn_base __ro_after_init;
-EXPORT_SYMBOL(riscv_pfn_base);
-
 pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
@@ -285,7 +282,6 @@ static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAG
 
 #ifdef CONFIG_XIP_KERNEL
 #define pt_ops			(*(struct pt_alloc_ops *)XIP_FIXUP(&pt_ops))
-#define riscv_pfn_base         (*(unsigned long  *)XIP_FIXUP(&riscv_pfn_base))
 #define trampoline_pg_dir      ((pgd_t *)XIP_FIXUP(trampoline_pg_dir))
 #define fixmap_pte             ((pte_t *)XIP_FIXUP(fixmap_pte))
 #define early_pg_dir           ((pgd_t *)XIP_FIXUP(early_pg_dir))
@@ -985,7 +981,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
 	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
 
-	riscv_pfn_base = PFN_DOWN(kernel_map.phys_addr);
+	phys_ram_base = kernel_map.phys_addr;
 
 	/*
 	 * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v9 2/3] riscv: Move the linear mapping creation in its own function
  2023-03-24 15:54 ` Alexandre Ghiti
@ 2023-03-24 15:54   ` Alexandre Ghiti
  -1 siblings, 0 replies; 28+ messages in thread
From: Alexandre Ghiti @ 2023-03-24 15:54 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, Anup Patel, linux-riscv,
	linux-kernel, devicetree
  Cc: Alexandre Ghiti

No functional change intended: this just splits the linear mapping
creation out of setup_vm_final, in preparation for upcoming additions
to the linear mapping creation.
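
The resulting call structure, condensed from the diff below:

/*
 * setup_vm_final()
 *   create_pgd_mapping(...)             (swapper PGD for the fixmap)
 *   create_linear_mapping_page_table()  (the new function)
 *     for_each_mem_range(i, &start, &end)
 *       create_linear_mapping_range(start, end)
 *         for (pa = start; pa < end; pa += map_size)
 *           create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
 *                              pgprot_from_va(va))
 *   (then the kernel mapping, as before)
 */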

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/mm/init.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index cc558d94559a..3b37d8606920 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1086,16 +1086,25 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	pt_ops_set_fixmap();
 }
 
-static void __init setup_vm_final(void)
+static void __init create_linear_mapping_range(phys_addr_t start,
+					       phys_addr_t end)
 {
+	phys_addr_t pa;
 	uintptr_t va, map_size;
-	phys_addr_t pa, start, end;
-	u64 i;
 
-	/* Setup swapper PGD for fixmap */
-	create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
-			   __pa_symbol(fixmap_pgd_next),
-			   PGDIR_SIZE, PAGE_TABLE);
+	for (pa = start; pa < end; pa += map_size) {
+		va = (uintptr_t)__va(pa);
+		map_size = best_map_size(pa, end - pa);
+
+		create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
+				   pgprot_from_va(va));
+	}
+}
+
+static void __init create_linear_mapping_page_table(void)
+{
+	phys_addr_t start, end;
+	u64 i;
 
 	/* Map all memory banks in the linear mapping */
 	for_each_mem_range(i, &start, &end) {
@@ -1107,14 +1116,19 @@ static void __init setup_vm_final(void)
 		if (end >= __pa(PAGE_OFFSET) + memory_limit)
 			end = __pa(PAGE_OFFSET) + memory_limit;
 
-		for (pa = start; pa < end; pa += map_size) {
-			va = (uintptr_t)__va(pa);
-			map_size = best_map_size(pa, end - pa);
-
-			create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
-					   pgprot_from_va(va));
-		}
+		create_linear_mapping_range(start, end);
 	}
+}
+
+static void __init setup_vm_final(void)
+{
+	/* Setup swapper PGD for fixmap */
+	create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
+			   __pa_symbol(fixmap_pgd_next),
+			   PGDIR_SIZE, PAGE_TABLE);
+
+	/* Map the linear mapping */
+	create_linear_mapping_page_table();
 
 	/* Map the kernel */
 	if (IS_ENABLED(CONFIG_64BIT))
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v9 3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54 ` Alexandre Ghiti
@ 2023-03-24 15:54   ` Alexandre Ghiti
  -1 siblings, 0 replies; 28+ messages in thread
From: Alexandre Ghiti @ 2023-03-24 15:54 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, Anup Patel, linux-riscv,
	linux-kernel, devicetree
  Cc: Alexandre Ghiti, Rob Herring

During the early page table creation, we used to set the mapping for
PAGE_OFFSET to the kernel load address: but the kernel load address is
always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
pages, as this physical address is not aligned on the PUD/P4D/PGD size
(whereas PAGE_OFFSET is).
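
A worked example with hypothetical Sv39 values (none of these constants
come from the patch) shows the alignment problem:

#include <stdio.h>

int main(void)
{
	unsigned long pmd_size  = 1UL << 21;            /* 2 MiB */
	unsigned long pud_size  = 1UL << 30;            /* 1 GiB */
	unsigned long dram_base = 0x80000000UL;         /* hypothetical */
	unsigned long kernel_pa = dram_base + pmd_size; /* 0x80200000 */

	/* Same alignment test as best_map_size(): non-zero means unaligned. */
	printf("kernel_pa & (PUD_SIZE - 1) = %#lx\n", kernel_pa & (pud_size - 1));
	printf("dram_base & (PUD_SIZE - 1) = %#lx\n", dram_base & (pud_size - 1));
	return 0;
}

Mapping PAGE_OFFSET to kernel_pa therefore caps the linear mapping at
PMD pages, whereas mapping it to the (PUD-aligned) start of DRAM allows
PUD and larger pages.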

But actually we don't have to establish this mapping (i.e. set
va_pa_offset) that early in the boot process because:

- first, setup_vm installs a temporary kernel mapping and among other
  things, discovers the system memory,
- then, setup_vm_final creates the final kernel mapping and takes
  advantage of the discovered system memory to create the linear
  mapping.
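
Condensed, the resulting 64-bit boot flow looks like this (a sketch,
not verbatim code):

/*
 * setup_vm():        temporary kernel mapping, va_pa_offset left at 0
 * setup_bootmem():   phys_ram_base = memblock_start_of_DRAM();
 *                    va_pa_offset = PAGE_OFFSET - phys_ram_base;
 * setup_vm_final():  final kernel mapping + linear mapping, now free
 *                    to use PUD/P4D/PGD pages
 */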

During the first phase, we don't know the start of the system memory,
so until the second phase is finished, we can't use the linear mapping
at all, and phys_to_virt/virt_to_phys translations must not be used,
because they would result in a different translation from the 'real'
one once the final mapping is installed.

So here we simply delay the initialization of va_pa_offset until after
the system memory discovery. But to make sure no one uses the linear
mapping before then, we add guards under the DEBUG_VIRTUAL config.

Finally, we can use PUD/P4D/PGD hugepages when possible, which results
in better TLB utilization.
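
To get a rough sense of the gain, count the mappings needed to cover a
hypothetical 4 GiB, PUD-aligned memory bank (illustrative numbers only):

#include <stdio.h>

int main(void)
{
	unsigned long size = 4UL << 30; /* hypothetical 4 GiB bank */

	printf("4 KiB pages: %lu mappings\n", size >> 12); /* 1048576 */
	printf("2 MiB PMDs:  %lu mappings\n", size >> 21); /*    2048 */
	printf("1 GiB PUDs:  %lu mappings\n", size >> 30); /*       4 */
	return 0;
}

Fewer, larger mappings mean far fewer TLB entries are needed to cover
the same memory.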

Note that:
- this does not apply to rv32 as the kernel mapping lies in the linear
  mapping.
- we rely on the firmware to protect itself using PMP.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Rob Herring <robh@kernel.org> # DT bits
---
 arch/riscv/include/asm/page.h | 16 ++++++++++
 arch/riscv/mm/init.c          | 58 +++++++++++++++++++++++++++++++----
 arch/riscv/mm/physaddr.c      | 16 ++++++++++
 drivers/of/fdt.c              | 11 ++++---
 4 files changed, 90 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 8dc686f549b6..ea1a0e237211 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
 #define PTE_FMT "%08lx"
 #endif
 
+#ifdef CONFIG_64BIT
+/*
+ * We override this value as its generic definition uses __pa too early in
+ * the boot process (before kernel_map.va_pa_offset is set).
+ */
+#define MIN_MEMBLOCK_ADDR      0
+#endif
+
 #ifdef CONFIG_MMU
 #define ARCH_PFN_OFFSET		(PFN_DOWN((unsigned long)phys_ram_base))
 #else
@@ -121,7 +129,11 @@ extern phys_addr_t phys_ram_base;
 #define is_linear_mapping(x)	\
 	((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
 
+#ifndef CONFIG_DEBUG_VIRTUAL
 #define linear_mapping_pa_to_va(x)	((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
+#else
+void *linear_mapping_pa_to_va(unsigned long x);
+#endif
 #define kernel_mapping_pa_to_va(y)	({					\
 	unsigned long _y = (unsigned long)(y);					\
 	(IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ?			\
@@ -130,7 +142,11 @@ extern phys_addr_t phys_ram_base;
 	})
 #define __pa_to_va_nodebug(x)		linear_mapping_pa_to_va(x)
 
+#ifndef CONFIG_DEBUG_VIRTUAL
 #define linear_mapping_va_to_pa(x)	((unsigned long)(x) - kernel_map.va_pa_offset)
+#else
+phys_addr_t linear_mapping_va_to_pa(unsigned long x);
+#endif
 #define kernel_mapping_va_to_pa(y) ({						\
 	unsigned long _y = (unsigned long)(y);					\
 	(IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 3b37d8606920..f803671d18b2 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
 	phys_ram_end = memblock_end_of_DRAM();
 	if (!IS_ENABLED(CONFIG_XIP_KERNEL))
 		phys_ram_base = memblock_start_of_DRAM();
+
+	/*
+	 * In 64-bit, any use of __va/__pa before this point is wrong as we
+	 * did not know the start of DRAM before.
+	 */
+	if (IS_ENABLED(CONFIG_64BIT))
+		kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
+
 	/*
 	 * memblock allocator is not aware of the fact that last 4K bytes of
 	 * the addressable memory can not be mapped because of IS_ERR_VALUE
@@ -667,9 +675,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
 
 static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
 {
-	/* Upgrade to PMD_SIZE mappings whenever possible */
-	base &= PMD_SIZE - 1;
-	if (!base && size >= PMD_SIZE)
+	if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
+		return PGDIR_SIZE;
+
+	if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
+		return P4D_SIZE;
+
+	if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
+		return PUD_SIZE;
+
+	if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
 		return PMD_SIZE;
 
 	return PAGE_SIZE;
@@ -978,11 +993,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	set_satp_mode();
 #endif
 
-	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
+	/*
+	 * In 64-bit, we defer the setup of va_pa_offset to setup_bootmem,
+	 * where we have the system memory layout: this allows us to align
+	 * the physical and virtual mappings and then make use of PUD/P4D/PGD
+	 * for the linear mapping. This is only possible because the kernel
+	 * mapping lies outside the linear mapping.
+	 * In 32-bit however, as the kernel resides in the linear mapping,
+	 * setup_vm_final can not change the mapping established here,
+	 * otherwise the same kernel addresses would get mapped to different
+	 * physical addresses (if the start of dram is different from the
+	 * kernel physical address start).
+	 */
+	kernel_map.va_pa_offset = IS_ENABLED(CONFIG_64BIT) ?
+				0UL : PAGE_OFFSET - kernel_map.phys_addr;
 	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
 
-	phys_ram_base = kernel_map.phys_addr;
-
 	/*
 	 * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
 	 * kernel, whereas for 64-bit kernel, the end of the virtual address
@@ -1106,6 +1132,17 @@ static void __init create_linear_mapping_page_table(void)
 	phys_addr_t start, end;
 	u64 i;
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+	phys_addr_t ktext_start = __pa_symbol(_start);
+	phys_addr_t ktext_size = __init_data_begin - _start;
+	phys_addr_t krodata_start = __pa_symbol(__start_rodata);
+	phys_addr_t krodata_size = _data - __start_rodata;
+
+	/* Isolate kernel text and rodata so they don't get mapped with a PUD */
+	memblock_mark_nomap(ktext_start,  ktext_size);
+	memblock_mark_nomap(krodata_start, krodata_size);
+#endif
+
 	/* Map all memory banks in the linear mapping */
 	for_each_mem_range(i, &start, &end) {
 		if (start >= end)
@@ -1118,6 +1155,15 @@ static void __init create_linear_mapping_page_table(void)
 
 		create_linear_mapping_range(start, end);
 	}
+
+#ifdef CONFIG_STRICT_KERNEL_RWX
+	create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
+	create_linear_mapping_range(krodata_start,
+				    krodata_start + krodata_size);
+
+	memblock_clear_nomap(ktext_start,  ktext_size);
+	memblock_clear_nomap(krodata_start, krodata_size);
+#endif
 }
 
 static void __init setup_vm_final(void)
diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
index 9b18bda74154..18706f457da7 100644
--- a/arch/riscv/mm/physaddr.c
+++ b/arch/riscv/mm/physaddr.c
@@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
 	return __va_to_pa_nodebug(x);
 }
 EXPORT_SYMBOL(__phys_addr_symbol);
+
+phys_addr_t linear_mapping_va_to_pa(unsigned long x)
+{
+	BUG_ON(!kernel_map.va_pa_offset);
+
+	return ((unsigned long)(x) - kernel_map.va_pa_offset);
+}
+EXPORT_SYMBOL(linear_mapping_va_to_pa);
+
+void *linear_mapping_pa_to_va(unsigned long x)
+{
+	BUG_ON(!kernel_map.va_pa_offset);
+
+	return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
+}
+EXPORT_SYMBOL(linear_mapping_pa_to_va);
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index d1a68b6d03b3..d14735a81301 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -887,12 +887,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
 static void __early_init_dt_declare_initrd(unsigned long start,
 					   unsigned long end)
 {
-	/* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
-	 * enabled since __va() is called too early. ARM64 does make use
-	 * of phys_initrd_start/phys_initrd_size so we can skip this
-	 * conversion.
+	/*
+	 * __va() is not yet available this early on some platforms. In that
+	 * case, the platform uses phys_initrd_start/phys_initrd_size instead
+	 * and does the VA conversion itself.
 	 */
-	if (!IS_ENABLED(CONFIG_ARM64)) {
+	if (!IS_ENABLED(CONFIG_ARM64) &&
+	    !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
 		initrd_start = (unsigned long)__va(start);
 		initrd_end = (unsigned long)__va(end);
 		initrd_below_start_ok = 1;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54   ` Alexandre Ghiti
@ 2023-03-27  9:39     ` Andrew Jones
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2023-03-27  9:39 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Anup Patel, linux-riscv, linux-kernel, devicetree,
	Rob Herring

On Fri, Mar 24, 2023 at 04:54:21PM +0100, Alexandre Ghiti wrote:
> During the early page table creation, we used to set the mapping for
> PAGE_OFFSET to the kernel load address: but the kernel load address is
> always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
> pages, as this physical address is not aligned on the PUD/P4D/PGD size
> (whereas PAGE_OFFSET is).
> 
> But actually we don't have to establish this mapping (i.e. set
> va_pa_offset) that early in the boot process because:
> 
> - first, setup_vm installs a temporary kernel mapping and among other
>   things, discovers the system memory,
> - then, setup_vm_final creates the final kernel mapping and takes
>   advantage of the discovered system memory to create the linear
>   mapping.
> 
> During the first phase, we don't know the start of the system memory,
> so until the second phase is finished, we can't use the linear mapping
> at all, and phys_to_virt/virt_to_phys translations must not be used,
> because they would result in a different translation from the 'real'
> one once the final mapping is installed.
> 
> So here we simply delay the initialization of va_pa_offset until after
> the system memory discovery. But to make sure no one uses the linear
> mapping before then, we add guards under the DEBUG_VIRTUAL config.
> 
> Finally, we can use PUD/P4D/PGD hugepages when possible, which results
> in better TLB utilization.
> 
> Note that:
> - this does not apply to rv32 as the kernel mapping lies in the linear
>   mapping.
> - we rely on the firmware to protect itself using PMP.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Acked-by: Rob Herring <robh@kernel.org> # DT bits
> ---
>  arch/riscv/include/asm/page.h | 16 ++++++++++
>  arch/riscv/mm/init.c          | 58 +++++++++++++++++++++++++++++++----
>  arch/riscv/mm/physaddr.c      | 16 ++++++++++
>  drivers/of/fdt.c              | 11 ++++---
>  4 files changed, 90 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 8dc686f549b6..ea1a0e237211 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
>  #define PTE_FMT "%08lx"
>  #endif
>  
> +#ifdef CONFIG_64BIT
> +/*
> + * We override this value as its generic definition uses __pa too early in
> + * the boot process (before kernel_map.va_pa_offset is set).
> + */
> +#define MIN_MEMBLOCK_ADDR      0
> +#endif
> +
>  #ifdef CONFIG_MMU
>  #define ARCH_PFN_OFFSET		(PFN_DOWN((unsigned long)phys_ram_base))
>  #else
> @@ -121,7 +129,11 @@ extern phys_addr_t phys_ram_base;
>  #define is_linear_mapping(x)	\
>  	((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
>  
> +#ifndef CONFIG_DEBUG_VIRTUAL
>  #define linear_mapping_pa_to_va(x)	((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
> +#else
> +void *linear_mapping_pa_to_va(unsigned long x);
> +#endif
>  #define kernel_mapping_pa_to_va(y)	({					\
>  	unsigned long _y = (unsigned long)(y);					\
>  	(IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ?			\
> @@ -130,7 +142,11 @@ extern phys_addr_t phys_ram_base;
>  	})
>  #define __pa_to_va_nodebug(x)		linear_mapping_pa_to_va(x)
>  
> +#ifndef CONFIG_DEBUG_VIRTUAL
>  #define linear_mapping_va_to_pa(x)	((unsigned long)(x) - kernel_map.va_pa_offset)
> +#else
> +phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> +#endif
>  #define kernel_mapping_va_to_pa(y) ({						\
>  	unsigned long _y = (unsigned long)(y);					\
>  	(IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 3b37d8606920..f803671d18b2 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
>  	phys_ram_end = memblock_end_of_DRAM();
>  	if (!IS_ENABLED(CONFIG_XIP_KERNEL))
>  		phys_ram_base = memblock_start_of_DRAM();
> +
> +	/*
> +	 * In 64-bit, any use of __va/__pa before this point is wrong as we
> +	 * did not know the start of DRAM before.
> +	 */
> +	if (IS_ENABLED(CONFIG_64BIT))
> +		kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> +
>  	/*
>  	 * memblock allocator is not aware of the fact that last 4K bytes of
>  	 * the addressable memory can not be mapped because of IS_ERR_VALUE
> @@ -667,9 +675,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
>  
>  static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>  {
> -	/* Upgrade to PMD_SIZE mappings whenever possible */
> -	base &= PMD_SIZE - 1;
> -	if (!base && size >= PMD_SIZE)
> +	if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
> +		return PGDIR_SIZE;
> +
> +	if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> +		return P4D_SIZE;
> +
> +	if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
> +		return PUD_SIZE;
> +
> +	if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
>  		return PMD_SIZE;
>  
>  	return PAGE_SIZE;
> @@ -978,11 +993,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>  	set_satp_mode();
>  #endif
>  
> -	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> +	/*
> +	 * In 64-bit, we defer the setup of va_pa_offset to setup_bootmem,
> +	 * where we have the system memory layout: this allows us to align
> +	 * the physical and virtual mappings and then make use of PUD/P4D/PGD
> +	 * for the linear mapping. This is only possible because the kernel
> +	 * mapping lies outside the linear mapping.
> +	 * In 32-bit however, as the kernel resides in the linear mapping,
> +	 * setup_vm_final can not change the mapping established here,
> +	 * otherwise the same kernel addresses would get mapped to different
> +	 * physical addresses (if the start of dram is different from the
> +	 * kernel physical address start).
> +	 */
> +	kernel_map.va_pa_offset = IS_ENABLED(CONFIG_64BIT) ?
> +				0UL : PAGE_OFFSET - kernel_map.phys_addr;
>  	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>  
> -	phys_ram_base = kernel_map.phys_addr;
> -
>  	/*
>  	 * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
>  	 * kernel, whereas for 64-bit kernel, the end of the virtual address
> @@ -1106,6 +1132,17 @@ static void __init create_linear_mapping_page_table(void)
>  	phys_addr_t start, end;
>  	u64 i;
>  
> +#ifdef CONFIG_STRICT_KERNEL_RWX
> +	phys_addr_t ktext_start = __pa_symbol(_start);
> +	phys_addr_t ktext_size = __init_data_begin - _start;
> +	phys_addr_t krodata_start = __pa_symbol(__start_rodata);
> +	phys_addr_t krodata_size = _data - __start_rodata;
> +
> +	/* Isolate kernel text and rodata so they don't get mapped with a PUD */
> +	memblock_mark_nomap(ktext_start,  ktext_size);
> +	memblock_mark_nomap(krodata_start, krodata_size);
> +#endif
> +
>  	/* Map all memory banks in the linear mapping */
>  	for_each_mem_range(i, &start, &end) {
>  		if (start >= end)
> @@ -1118,6 +1155,15 @@ static void __init create_linear_mapping_page_table(void)
>  
>  		create_linear_mapping_range(start, end);
>  	}
> +
> +#ifdef CONFIG_STRICT_KERNEL_RWX
> +	create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
> +	create_linear_mapping_range(krodata_start,
> +				    krodata_start + krodata_size);

Just for my own education, it looks to me like the rodata is left writable
until the end of start_kernel(), when mark_rodata_ro() is called. Is that
correct?

Thanks,
drew

> +
> +	memblock_clear_nomap(ktext_start,  ktext_size);
> +	memblock_clear_nomap(krodata_start, krodata_size);
> +#endif
>  }
>  
>  static void __init setup_vm_final(void)
> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
> index 9b18bda74154..18706f457da7 100644
> --- a/arch/riscv/mm/physaddr.c
> +++ b/arch/riscv/mm/physaddr.c
> @@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
>  	return __va_to_pa_nodebug(x);
>  }
>  EXPORT_SYMBOL(__phys_addr_symbol);
> +
> +phys_addr_t linear_mapping_va_to_pa(unsigned long x)
> +{
> +	BUG_ON(!kernel_map.va_pa_offset);
> +
> +	return ((unsigned long)(x) - kernel_map.va_pa_offset);
> +}
> +EXPORT_SYMBOL(linear_mapping_va_to_pa);
> +
> +void *linear_mapping_pa_to_va(unsigned long x)
> +{
> +	BUG_ON(!kernel_map.va_pa_offset);
> +
> +	return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
> +}
> +EXPORT_SYMBOL(linear_mapping_pa_to_va);
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index d1a68b6d03b3..d14735a81301 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -887,12 +887,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
>  static void __early_init_dt_declare_initrd(unsigned long start,
>  					   unsigned long end)
>  {
> -	/* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> -	 * enabled since __va() is called too early. ARM64 does make use
> -	 * of phys_initrd_start/phys_initrd_size so we can skip this
> -	 * conversion.
> +	/*
> +	 * __va() is not yet available this early on some platforms. In that
> +	 * case, the platform uses phys_initrd_start/phys_initrd_size instead
> +	 * and does the VA conversion itself.
>  	 */
> -	if (!IS_ENABLED(CONFIG_ARM64)) {
> +	if (!IS_ENABLED(CONFIG_ARM64) &&
> +	    !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
>  		initrd_start = (unsigned long)__va(start);
>  		initrd_end = (unsigned long)__va(end);
>  		initrd_below_start_ok = 1;
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 2/3] riscv: Move the linear mapping creation in its own function
  2023-03-24 15:54   ` Alexandre Ghiti
@ 2023-03-27  9:39     ` Andrew Jones
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2023-03-27  9:39 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Anup Patel, linux-riscv, linux-kernel, devicetree

On Fri, Mar 24, 2023 at 04:54:20PM +0100, Alexandre Ghiti wrote:
> No functional change intended: this just splits the linear mapping
> creation out of setup_vm_final, preparing for upcoming additions to
> the linear mapping creation.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/mm/init.c | 42 ++++++++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
>

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-27  9:39     ` Andrew Jones
@ 2023-03-27 11:15       ` Alexandre Ghiti
  -1 siblings, 0 replies; 28+ messages in thread
From: Alexandre Ghiti @ 2023-03-27 11:15 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Anup Patel, linux-riscv, linux-kernel, devicetree,
	Rob Herring

Hi Andrew,

On Mon, Mar 27, 2023 at 11:39 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Fri, Mar 24, 2023 at 04:54:21PM +0100, Alexandre Ghiti wrote:
> > During the early page table creation, we used to set the mapping for
> > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
> > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > PAGE_OFFSET is).
> >
> > But actually we don't have to establish this mapping (i.e. set
> > va_pa_offset) that early in the boot process because:
> >
> > - first, setup_vm installs a temporary kernel mapping and, among other
> >   things, discovers the system memory,
> > - then, setup_vm_final creates the final kernel mapping and takes
> >   advantage of the discovered system memory to create the linear
> >   mapping.
> >
> > During the first phase, we don't know the start of the system memory, and
> > until the second phase is finished, we can't use the linear mapping at
> > all: phys_to_virt/virt_to_phys translations must not be used because they
> > would result in a different translation from the 'real' one once the final
> > mapping is installed.
> >
> > So here we simply delay the initialization of va_pa_offset until after the
> > system memory discovery. But to make sure no one uses the linear mapping
> > before then, we add guards under the DEBUG_VIRTUAL config.
> >
> > Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> > in better TLB utilization.
> >
> > Note that:
> > - this does not apply to rv32 as the kernel mapping lies in the linear
> >   mapping.
> > - we rely on the firmware to protect itself using PMP.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > Acked-by: Rob Herring <robh@kernel.org> # DT bits
> > ---
> >  arch/riscv/include/asm/page.h | 16 ++++++++++
> >  arch/riscv/mm/init.c          | 58 +++++++++++++++++++++++++++++++----
> >  arch/riscv/mm/physaddr.c      | 16 ++++++++++
> >  drivers/of/fdt.c              | 11 ++++---
> >  4 files changed, 90 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 8dc686f549b6..ea1a0e237211 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
> >  #define PTE_FMT "%08lx"
> >  #endif
> >
> > +#ifdef CONFIG_64BIT
> > +/*
> > + * We override this value as its generic definition uses __pa too early in
> > + * the boot process (before kernel_map.va_pa_offset is set).
> > + */
> > +#define MIN_MEMBLOCK_ADDR      0
> > +#endif
> > +
> >  #ifdef CONFIG_MMU
> >  #define ARCH_PFN_OFFSET              (PFN_DOWN((unsigned long)phys_ram_base))
> >  #else
> > @@ -121,7 +129,11 @@ extern phys_addr_t phys_ram_base;
> >  #define is_linear_mapping(x) \
> >       ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
> >
> > +#ifndef CONFIG_DEBUG_VIRTUAL
> >  #define linear_mapping_pa_to_va(x)   ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
> > +#else
> > +void *linear_mapping_pa_to_va(unsigned long x);
> > +#endif
> >  #define kernel_mapping_pa_to_va(y)   ({                                      \
> >       unsigned long _y = (unsigned long)(y);                                  \
> >       (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ?                 \
> > @@ -130,7 +142,11 @@ extern phys_addr_t phys_ram_base;
> >       })
> >  #define __pa_to_va_nodebug(x)                linear_mapping_pa_to_va(x)
> >
> > +#ifndef CONFIG_DEBUG_VIRTUAL
> >  #define linear_mapping_va_to_pa(x)   ((unsigned long)(x) - kernel_map.va_pa_offset)
> > +#else
> > +phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> > +#endif
> >  #define kernel_mapping_va_to_pa(y) ({                                                \
> >       unsigned long _y = (unsigned long)(y);                                  \
> >       (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 3b37d8606920..f803671d18b2 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
> >       phys_ram_end = memblock_end_of_DRAM();
> >       if (!IS_ENABLED(CONFIG_XIP_KERNEL))
> >               phys_ram_base = memblock_start_of_DRAM();
> > +
> > +     /*
> > +      * In 64-bit, any use of __va/__pa before this point is wrong as we
> > +      * did not know the start of DRAM before.
> > +      */
> > +     if (IS_ENABLED(CONFIG_64BIT))
> > +             kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> > +
> >       /*
> >        * memblock allocator is not aware of the fact that last 4K bytes of
> >        * the addressable memory can not be mapped because of IS_ERR_VALUE
> > @@ -667,9 +675,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
> >
> >  static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
> >  {
> > -     /* Upgrade to PMD_SIZE mappings whenever possible */
> > -     base &= PMD_SIZE - 1;
> > -     if (!base && size >= PMD_SIZE)
> > +     if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
> > +             return PGDIR_SIZE;
> > +
> > +     if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> > +             return P4D_SIZE;
> > +
> > +     if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
> > +             return PUD_SIZE;
> > +
> > +     if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
> >               return PMD_SIZE;
> >
> >       return PAGE_SIZE;
> > @@ -978,11 +993,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >       set_satp_mode();
> >  #endif
> >
> > -     kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> > +     /*
> > +      * In 64-bit, we defer the setup of va_pa_offset to setup_bootmem,
> > +      * where we have the system memory layout: this allows us to align
> > +      * the physical and virtual mappings and then make use of PUD/P4D/PGD
> > +      * for the linear mapping. This is only possible because the kernel
> > +      * mapping lies outside the linear mapping.
> > +      * In 32-bit however, as the kernel resides in the linear mapping,
> > +      * setup_vm_final can not change the mapping established here,
> > +      * otherwise the same kernel addresses would get mapped to different
> > +      * physical addresses (if the start of dram is different from the
> > +      * kernel physical address start).
> > +      */
> > +     kernel_map.va_pa_offset = IS_ENABLED(CONFIG_64BIT) ?
> > +                             0UL : PAGE_OFFSET - kernel_map.phys_addr;
> >       kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
> >
> > -     phys_ram_base = kernel_map.phys_addr;
> > -
> >       /*
> >        * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
> >        * kernel, whereas for 64-bit kernel, the end of the virtual address
> > @@ -1106,6 +1132,17 @@ static void __init create_linear_mapping_page_table(void)
> >       phys_addr_t start, end;
> >       u64 i;
> >
> > +#ifdef CONFIG_STRICT_KERNEL_RWX
> > +     phys_addr_t ktext_start = __pa_symbol(_start);
> > +     phys_addr_t ktext_size = __init_data_begin - _start;
> > +     phys_addr_t krodata_start = __pa_symbol(__start_rodata);
> > +     phys_addr_t krodata_size = _data - __start_rodata;
> > +
> > +     /* Isolate kernel text and rodata so they don't get mapped with a PUD */
> > +     memblock_mark_nomap(ktext_start,  ktext_size);
> > +     memblock_mark_nomap(krodata_start, krodata_size);
> > +#endif
> > +
> >       /* Map all memory banks in the linear mapping */
> >       for_each_mem_range(i, &start, &end) {
> >               if (start >= end)
> > @@ -1118,6 +1155,15 @@ static void __init create_linear_mapping_page_table(void)
> >
> >               create_linear_mapping_range(start, end);
> >       }
> > +
> > +#ifdef CONFIG_STRICT_KERNEL_RWX
> > +     create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
> > +     create_linear_mapping_range(krodata_start,
> > +                                 krodata_start + krodata_size);
>
> Just for my own education, it looks to me like the rodata is left writable
> until the end of start_kernel(), when mark_rodata_ro() is called. Is that
> correct?

Yes, right before init is triggered; it happens that late because the
rodata section embeds the "__ro_after_init" variables.
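
A userspace analogue of that lifecycle, assuming only what is said
above (the data stays writable through all of the boot-time
initialization, then is sealed read-only in one step):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            long pagesz = sysconf(_SC_PAGESIZE);
            /* Plays the role of a __ro_after_init variable. */
            char *ro = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            strcpy(ro, "set up during boot"); /* still writable here */
            mprotect(ro, pagesz, PROT_READ);  /* the mark_rodata_ro() moment */
            printf("%s\n", ro);               /* reads fine; a write now faults */
            return 0;
    }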


>
> Thanks,
> drew
>
> > +
> > +     memblock_clear_nomap(ktext_start,  ktext_size);
> > +     memblock_clear_nomap(krodata_start, krodata_size);
> > +#endif
> >  }
> >
> >  static void __init setup_vm_final(void)
> > diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
> > index 9b18bda74154..18706f457da7 100644
> > --- a/arch/riscv/mm/physaddr.c
> > +++ b/arch/riscv/mm/physaddr.c
> > @@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
> >       return __va_to_pa_nodebug(x);
> >  }
> >  EXPORT_SYMBOL(__phys_addr_symbol);
> > +
> > +phys_addr_t linear_mapping_va_to_pa(unsigned long x)
> > +{
> > +     BUG_ON(!kernel_map.va_pa_offset);
> > +
> > +     return ((unsigned long)(x) - kernel_map.va_pa_offset);
> > +}
> > +EXPORT_SYMBOL(linear_mapping_va_to_pa);
> > +
> > +void *linear_mapping_pa_to_va(unsigned long x)
> > +{
> > +     BUG_ON(!kernel_map.va_pa_offset);
> > +
> > +     return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
> > +}
> > +EXPORT_SYMBOL(linear_mapping_pa_to_va);
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index d1a68b6d03b3..d14735a81301 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -887,12 +887,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> >  static void __early_init_dt_declare_initrd(unsigned long start,
> >                                          unsigned long end)
> >  {
> > -     /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > -      * enabled since __va() is called too early. ARM64 does make use
> > -      * of phys_initrd_start/phys_initrd_size so we can skip this
> > -      * conversion.
> > +     /*
> > +      * __va() is not yet available this early on some platforms. In that
> > +      * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > +      * and does the VA conversion itself.
> >        */
> > -     if (!IS_ENABLED(CONFIG_ARM64)) {
> > +     if (!IS_ENABLED(CONFIG_ARM64) &&
> > +         !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> >               initrd_start = (unsigned long)__va(start);
> >               initrd_end = (unsigned long)__va(end);
> >               initrd_below_start_ok = 1;
> > --
> > 2.37.2
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread
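
A toy model of the isolate/map/re-merge sequence that
create_linear_mapping_page_table() implements in the hunk above; every
type and helper below is a stand-in, not the memblock API:

    #include <stdbool.h>
    #include <stdio.h>

    struct range { unsigned long start, end; bool nomap; };

    /* One DRAM bank with the kernel image in the middle of it. */
    static struct range mem[] = {
            { 0x80000000UL, 0x80200000UL, false }, /* below the kernel */
            { 0x80200000UL, 0x80c00000UL, true },  /* kernel text/rodata: nomap */
            { 0x80c00000UL, 0xc0000000UL, false }, /* rest of the bank */
    };

    static void map_range(unsigned long s, unsigned long e, const char *how)
    {
            printf("map %#lx-%#lx using %s\n", s, e, how);
    }

    int main(void)
    {
            /* The generic walk skips nomap ranges, so it is free to
             * pick the largest page size that fits what remains. */
            for (unsigned int i = 0; i < 3; i++)
                    if (!mem[i].nomap)
                            map_range(mem[i].start, mem[i].end,
                                      "the largest fitting size");

            /* The kernel region is mapped on its own so a later
             * permission change never lands inside a huge leaf. */
            map_range(mem[1].start, mem[1].end, "PMD pages at most");

            /* Re-merge so the resource tree sees one contiguous bank. */
            mem[1].nomap = false;
            return 0;
    }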

* Re: [PATCH v9 3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-27 11:15       ` Alexandre Ghiti
@ 2023-03-27 11:37         ` Andrew Jones
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2023-03-27 11:37 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Anup Patel, linux-riscv, linux-kernel, devicetree,
	Rob Herring

On Mon, Mar 27, 2023 at 01:15:43PM +0200, Alexandre Ghiti wrote:
> Hi Andrew,
> 
> On Mon, Mar 27, 2023 at 11:39 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Fri, Mar 24, 2023 at 04:54:21PM +0100, Alexandre Ghiti wrote:
> > > During the early page table creation, we used to set the mapping for
> > > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > > always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
> > > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > > PAGE_OFFSET is).
> > >
> > > But actually we don't have to establish this mapping (i.e. set
> > > va_pa_offset) that early in the boot process because:
> > >
> > > - first, setup_vm installs a temporary kernel mapping and, among other
> > >   things, discovers the system memory,
> > > - then, setup_vm_final creates the final kernel mapping and takes
> > >   advantage of the discovered system memory to create the linear
> > >   mapping.
> > >
> > > During the first phase, we don't know the start of the system memory, and
> > > until the second phase is finished, we can't use the linear mapping at
> > > all: phys_to_virt/virt_to_phys translations must not be used because they
> > > would result in a different translation from the 'real' one once the final
> > > mapping is installed.
> > >
> > > So here we simply delay the initialization of va_pa_offset until after the
> > > system memory discovery. But to make sure no one uses the linear mapping
> > > before then, we add guards under the DEBUG_VIRTUAL config.
> > >
> > > Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> > > in better TLB utilization.
> > >
> > > Note that:
> > > - this does not apply to rv32 as the kernel mapping lies in the linear
> > >   mapping.
> > > - we rely on the firmware to protect itself using PMP.
> > >
> > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > Acked-by: Rob Herring <robh@kernel.org> # DT bits
> > > ---
> > >  arch/riscv/include/asm/page.h | 16 ++++++++++
> > >  arch/riscv/mm/init.c          | 58 +++++++++++++++++++++++++++++++----
> > >  arch/riscv/mm/physaddr.c      | 16 ++++++++++
> > >  drivers/of/fdt.c              | 11 ++++---
> > >  4 files changed, 90 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > > index 8dc686f549b6..ea1a0e237211 100644
> > > --- a/arch/riscv/include/asm/page.h
> > > +++ b/arch/riscv/include/asm/page.h
> > > @@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
> > >  #define PTE_FMT "%08lx"
> > >  #endif
> > >
> > > +#ifdef CONFIG_64BIT
> > > +/*
> > > + * We override this value as its generic definition uses __pa too early in
> > > + * the boot process (before kernel_map.va_pa_offset is set).
> > > + */
> > > +#define MIN_MEMBLOCK_ADDR      0
> > > +#endif
> > > +
> > >  #ifdef CONFIG_MMU
> > >  #define ARCH_PFN_OFFSET              (PFN_DOWN((unsigned long)phys_ram_base))
> > >  #else
> > > @@ -121,7 +129,11 @@ extern phys_addr_t phys_ram_base;
> > >  #define is_linear_mapping(x) \
> > >       ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
> > >
> > > +#ifndef CONFIG_DEBUG_VIRTUAL
> > >  #define linear_mapping_pa_to_va(x)   ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
> > > +#else
> > > +void *linear_mapping_pa_to_va(unsigned long x);
> > > +#endif
> > >  #define kernel_mapping_pa_to_va(y)   ({                                      \
> > >       unsigned long _y = (unsigned long)(y);                                  \
> > >       (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ?                 \
> > > @@ -130,7 +142,11 @@ extern phys_addr_t phys_ram_base;
> > >       })
> > >  #define __pa_to_va_nodebug(x)                linear_mapping_pa_to_va(x)
> > >
> > > +#ifndef CONFIG_DEBUG_VIRTUAL
> > >  #define linear_mapping_va_to_pa(x)   ((unsigned long)(x) - kernel_map.va_pa_offset)
> > > +#else
> > > +phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> > > +#endif
> > >  #define kernel_mapping_va_to_pa(y) ({                                                \
> > >       unsigned long _y = (unsigned long)(y);                                  \
> > >       (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
> > > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > > index 3b37d8606920..f803671d18b2 100644
> > > --- a/arch/riscv/mm/init.c
> > > +++ b/arch/riscv/mm/init.c
> > > @@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
> > >       phys_ram_end = memblock_end_of_DRAM();
> > >       if (!IS_ENABLED(CONFIG_XIP_KERNEL))
> > >               phys_ram_base = memblock_start_of_DRAM();
> > > +
> > > +     /*
> > > +      * In 64-bit, any use of __va/__pa before this point is wrong as we
> > > +      * did not know the start of DRAM before.
> > > +      */
> > > +     if (IS_ENABLED(CONFIG_64BIT))
> > > +             kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> > > +
> > >       /*
> > >        * memblock allocator is not aware of the fact that last 4K bytes of
> > >        * the addressable memory can not be mapped because of IS_ERR_VALUE
> > > @@ -667,9 +675,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
> > >
> > >  static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
> > >  {
> > > -     /* Upgrade to PMD_SIZE mappings whenever possible */
> > > -     base &= PMD_SIZE - 1;
> > > -     if (!base && size >= PMD_SIZE)
> > > +     if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
> > > +             return PGDIR_SIZE;
> > > +
> > > +     if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> > > +             return P4D_SIZE;
> > > +
> > > +     if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
> > > +             return PUD_SIZE;
> > > +
> > > +     if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
> > >               return PMD_SIZE;
> > >
> > >       return PAGE_SIZE;
> > > @@ -978,11 +993,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> > >       set_satp_mode();
> > >  #endif
> > >
> > > -     kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> > > +     /*
> > > +      * In 64-bit, we defer the setup of va_pa_offset to setup_bootmem,
> > > +      * where we have the system memory layout: this allows us to align
> > > +      * the physical and virtual mappings and then make use of PUD/P4D/PGD
> > > +      * for the linear mapping. This is only possible because the kernel
> > > +      * mapping lies outside the linear mapping.
> > > +      * In 32-bit however, as the kernel resides in the linear mapping,
> > > +      * setup_vm_final can not change the mapping established here,
> > > +      * otherwise the same kernel addresses would get mapped to different
> > > +      * physical addresses (if the start of dram is different from the
> > > +      * kernel physical address start).
> > > +      */
> > > +     kernel_map.va_pa_offset = IS_ENABLED(CONFIG_64BIT) ?
> > > +                             0UL : PAGE_OFFSET - kernel_map.phys_addr;
> > >       kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
> > >
> > > -     phys_ram_base = kernel_map.phys_addr;
> > > -
> > >       /*
> > >        * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
> > >        * kernel, whereas for 64-bit kernel, the end of the virtual address
> > > @@ -1106,6 +1132,17 @@ static void __init create_linear_mapping_page_table(void)
> > >       phys_addr_t start, end;
> > >       u64 i;
> > >
> > > +#ifdef CONFIG_STRICT_KERNEL_RWX
> > > +     phys_addr_t ktext_start = __pa_symbol(_start);
> > > +     phys_addr_t ktext_size = __init_data_begin - _start;
> > > +     phys_addr_t krodata_start = __pa_symbol(__start_rodata);
> > > +     phys_addr_t krodata_size = _data - __start_rodata;
> > > +
> > > +     /* Isolate kernel text and rodata so they don't get mapped with a PUD */
> > > +     memblock_mark_nomap(ktext_start,  ktext_size);
> > > +     memblock_mark_nomap(krodata_start, krodata_size);
> > > +#endif
> > > +
> > >       /* Map all memory banks in the linear mapping */
> > >       for_each_mem_range(i, &start, &end) {
> > >               if (start >= end)
> > > @@ -1118,6 +1155,15 @@ static void __init create_linear_mapping_page_table(void)
> > >
> > >               create_linear_mapping_range(start, end);
> > >       }
> > > +
> > > +#ifdef CONFIG_STRICT_KERNEL_RWX
> > > +     create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
> > > +     create_linear_mapping_range(krodata_start,
> > > +                                 krodata_start + krodata_size);
> >
> > Just for my own education, it looks to me like the rodata is left writable
> > until the end of start_kernel(), when mark_rodata_ro() is called. Is that
> > correct?
> 
> Yes, right before init is triggered; it happens that late because the
> rodata section embeds the "__ro_after_init" variables.

Ah, that indeed helps clarify why. Sounds good.

Thanks,
drew

^ permalink raw reply	[flat|nested] 28+ messages in thread
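
For a concrete feel of the best_map_size() rework quoted above, a
standalone sketch with illustrative sv48-style constants (the kernel
derives the real values from the configured page-table layout):

    #include <stdio.h>

    /* Illustrative sizes for a 4-level (sv48-like) layout; with fewer
     * levels the upper macros simply alias the next level down. */
    #define PAGE_SIZE       (1UL << 12)
    #define PMD_SIZE        (1UL << 21)
    #define PUD_SIZE        (1UL << 30)
    #define P4D_SIZE        (1UL << 39)
    #define PGDIR_SIZE      (1UL << 39)

    static unsigned long best_map_size(unsigned long base, unsigned long size)
    {
            if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
                    return PGDIR_SIZE;
            if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
                    return P4D_SIZE;
            if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
                    return PUD_SIZE;
            if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
                    return PMD_SIZE;
            return PAGE_SIZE;
    }

    int main(void)
    {
            /* DRAM at 2 GiB, 4 GiB large: 1 GiB-aligned, so PUD pages. */
            printf("%#lx\n", best_map_size(0x80000000UL, 4UL << 30));
            /* The isolated kernel text is only 2 MiB-aligned: PMD pages. */
            printf("%#lx\n", best_map_size(0x80200000UL, 8UL << 20));
            return 0;
    }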

* Re: [PATCH v9 3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54   ` Alexandre Ghiti
@ 2023-03-27 11:37     ` Andrew Jones
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2023-03-27 11:37 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Anup Patel, linux-riscv, linux-kernel, devicetree,
	Rob Herring

On Fri, Mar 24, 2023 at 04:54:21PM +0100, Alexandre Ghiti wrote:
> During the early page table creation, we used to set the mapping for
> PAGE_OFFSET to the kernel load address: but the kernel load address is
> always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
> pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> PAGE_OFFSET is).
>
> But actually we don't have to establish this mapping (i.e. set
> va_pa_offset) that early in the boot process because:
>
> - first, setup_vm installs a temporary kernel mapping and, among other
>   things, discovers the system memory,
> - then, setup_vm_final creates the final kernel mapping and takes
>   advantage of the discovered system memory to create the linear
>   mapping.
>
> During the first phase, we don't know the start of the system memory, and
> until the second phase is finished, we can't use the linear mapping at
> all: phys_to_virt/virt_to_phys translations must not be used because they
> would result in a different translation from the 'real' one once the final
> mapping is installed.
>
> So here we simply delay the initialization of va_pa_offset until after the
> system memory discovery. But to make sure no one uses the linear mapping
> before then, we add guards under the DEBUG_VIRTUAL config.
>
> Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> in better TLB utilization.
>
> Note that:
> - this does not apply to rv32 as the kernel mapping lies in the linear
>   mapping.
> - we rely on the firmware to protect itself using PMP.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Acked-by: Rob Herring <robh@kernel.org> # DT bits
> ---
>  arch/riscv/include/asm/page.h | 16 ++++++++++
>  arch/riscv/mm/init.c          | 58 +++++++++++++++++++++++++++++++----
>  arch/riscv/mm/physaddr.c      | 16 ++++++++++
>  drivers/of/fdt.c              | 11 ++++---
>  4 files changed, 90 insertions(+), 11 deletions(-)
> 

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 0/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54 ` Alexandre Ghiti
@ 2023-03-27 12:12   ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2023-03-27 12:12 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, linux-riscv, linux-kernel,
	devicetree

On Fri, Mar 24, 2023 at 9:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> This patchset intends to improve tlb utilization by using hugepages for
> the linear mapping.
>
> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> take care of isolating the kernel text and rodata so that they are not
> mapped with a PUD mapping which would then assign wrong permissions to
> the whole region: it is achieved the same way as arm64 by using the
> memblock nomap API which isolates those regions and re-merge them afterwards
> thus avoiding any issue with the system resources tree creation.
>
> base-commit-tag: v6.3-rc1
>
> v9:
> - Remove new API and arm64 patches as it created more issues than it
>   solved, thanks Anup for reporting those bugs!
> - Add a patch that moves the linear mapping creation outside of setup_vm_final
> - Use nomap API like arm64
> - Removed RB from Andrew and Anup as the patch changed its logic
> - Fix kernel rodata size computation
>
> v8:
> - Fix rv32, as reported by Anup
> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
>   being pretty at all, will wait for comments from arm64 reviewers, but
>   this patch can easily be dropped if they do not want it.
>
> v7:
> - Fix Anup bug report by introducing memblock_isolate_memory which
>   allows us to split the memblock mappings and then avoid to map the
>   the PUD which contains the kernel as read only
> - Add a patch to arm64 to use this newly introduced API
>
> v6:
> - quiet LLVM warning by casting phys_ram_base into an unsigned long
>
> v5:
> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
>   Conor
> - Add RB from Andrew
>
> v4:
> - Rebase on top of v6.2-rc3, as noted by Conor
> - Add Acked-by Rob
>
> v3:
> - Change the comment about initrd_start VA conversion so that it fits
>   ARM64 and RISCV64 (and others in the future if needed), as suggested
>   by Rob
>
> v2:
> - Add a comment on why RISCV64 does not need to set initrd_start/end that
>   early in the boot process, as asked by Rob
>
> Alexandre Ghiti (3):
>   riscv: Get rid of riscv_pfn_base variable
>   riscv: Move the linear mapping creation in its own function
>   riscv: Use PUD/P4D/PGD pages for the linear mapping

I have tested this series again on QEMU RV64 and RV32. I also tried
KVM RV64 and RV32; those work fine as well.

>
>  arch/riscv/include/asm/page.h |  19 ++++++-
>  arch/riscv/mm/init.c          | 102 ++++++++++++++++++++++++++--------
>  arch/riscv/mm/physaddr.c      |  16 ++++++
>  drivers/of/fdt.c              |  11 ++--
>  4 files changed, 118 insertions(+), 30 deletions(-)
>
> --
> 2.37.2
>

Regards,
Anup

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54   ` Alexandre Ghiti
@ 2023-03-27 12:13     ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2023-03-27 12:13 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, linux-riscv, linux-kernel,
	devicetree, Rob Herring

On Fri, Mar 24, 2023 at 9:27 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> During the early page table creation, we used to set the mapping for
> PAGE_OFFSET to the kernel load address: but the kernel load address is
> always offseted by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> PAGE_OFFSET is).
>
> But actually we don't have to establish this mapping (ie set va_pa_offset)
> that early in the boot process because:
>
> - first, setup_vm installs a temporary kernel mapping and among other
>   things, discovers the system memory,
> - then, setup_vm_final creates the final kernel mapping and takes
>   advantage of the discovered system memory to create the linear
>   mapping.
>
> During the first phase, we don't know the start of the system memory, so
> until the second phase is finished, we can't use the linear mapping at all:
> phys_to_virt/virt_to_phys translations must not be used, because they would
> result in a different translation from the 'real' one once the final
> mapping is installed.
>
> So here we simply delay the initialization of va_pa_offset until after the
> system memory discovery. And to make sure no one uses the linear mapping
> before then, we add a guard under the DEBUG_VIRTUAL config.
>
> Finally, we can use PUD/P4D/PGD hugepages when possible, which results in
> better TLB utilization.
>
> Note that:
> - this does not apply to rv32 as the kernel mapping lies in the linear
>   mapping.
> - we rely on the firmware to protect itself using PMP.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Acked-by: Rob Herring <robh@kernel.org> # DT bits

Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  arch/riscv/include/asm/page.h | 16 ++++++++++
>  arch/riscv/mm/init.c          | 58 +++++++++++++++++++++++++++++++----
>  arch/riscv/mm/physaddr.c      | 16 ++++++++++
>  drivers/of/fdt.c              | 11 ++++---
>  4 files changed, 90 insertions(+), 11 deletions(-)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 8dc686f549b6..ea1a0e237211 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
>  #define PTE_FMT "%08lx"
>  #endif
>
> +#ifdef CONFIG_64BIT
> +/*
> + * We override this value as its generic definition uses __pa too early in
> + * the boot process (before kernel_map.va_pa_offset is set).
> + */
> +#define MIN_MEMBLOCK_ADDR      0
> +#endif
> +
>  #ifdef CONFIG_MMU
>  #define ARCH_PFN_OFFSET                (PFN_DOWN((unsigned long)phys_ram_base))
>  #else
> @@ -121,7 +129,11 @@ extern phys_addr_t phys_ram_base;
>  #define is_linear_mapping(x)   \
>         ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
>
> +#ifndef CONFIG_DEBUG_VIRTUAL
>  #define linear_mapping_pa_to_va(x)     ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
> +#else
> +void *linear_mapping_pa_to_va(unsigned long x);
> +#endif
>  #define kernel_mapping_pa_to_va(y)     ({                                      \
>         unsigned long _y = (unsigned long)(y);                                  \
>         (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ?                 \
> @@ -130,7 +142,11 @@ extern phys_addr_t phys_ram_base;
>         })
>  #define __pa_to_va_nodebug(x)          linear_mapping_pa_to_va(x)
>
> +#ifndef CONFIG_DEBUG_VIRTUAL
>  #define linear_mapping_va_to_pa(x)     ((unsigned long)(x) - kernel_map.va_pa_offset)
> +#else
> +phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> +#endif
>  #define kernel_mapping_va_to_pa(y) ({                                          \
>         unsigned long _y = (unsigned long)(y);                                  \
>         (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 3b37d8606920..f803671d18b2 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
>         phys_ram_end = memblock_end_of_DRAM();
>         if (!IS_ENABLED(CONFIG_XIP_KERNEL))
>                 phys_ram_base = memblock_start_of_DRAM();
> +
> +       /*
> +        * In 64-bit, any use of __va/__pa before this point is wrong as we
> +        * did not know the start of DRAM before.
> +        */
> +       if (IS_ENABLED(CONFIG_64BIT))
> +               kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> +
>         /*
>          * memblock allocator is not aware of the fact that last 4K bytes of
>          * the addressable memory can not be mapped because of IS_ERR_VALUE
> @@ -667,9 +675,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
>
>  static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>  {
> -       /* Upgrade to PMD_SIZE mappings whenever possible */
> -       base &= PMD_SIZE - 1;
> -       if (!base && size >= PMD_SIZE)
> +       if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
> +               return PGDIR_SIZE;
> +
> +       if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> +               return P4D_SIZE;
> +
> +       if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
> +               return PUD_SIZE;
> +
> +       if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
>                 return PMD_SIZE;
>
>         return PAGE_SIZE;
> @@ -978,11 +993,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>         set_satp_mode();
>  #endif
>
> -       kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> +       /*
> +        * In 64-bit, we defer the setup of va_pa_offset to setup_bootmem,
> +        * where we have the system memory layout: this allows us to align
> +        * the physical and virtual mappings and then make use of PUD/P4D/PGD
> +        * for the linear mapping. This is only possible because the kernel
> +        * mapping lies outside the linear mapping.
> +        * In 32-bit however, as the kernel resides in the linear mapping,
> +        * setup_vm_final can not change the mapping established here,
> +        * otherwise the same kernel addresses would get mapped to different
> +        * physical addresses (if the start of dram is different from the
> +        * kernel physical address start).
> +        */
> +       kernel_map.va_pa_offset = IS_ENABLED(CONFIG_64BIT) ?
> +                               0UL : PAGE_OFFSET - kernel_map.phys_addr;
>         kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>
> -       phys_ram_base = kernel_map.phys_addr;
> -
>         /*
>          * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
>          * kernel, whereas for 64-bit kernel, the end of the virtual address
> @@ -1106,6 +1132,17 @@ static void __init create_linear_mapping_page_table(void)
>         phys_addr_t start, end;
>         u64 i;
>
> +#ifdef CONFIG_STRICT_KERNEL_RWX
> +       phys_addr_t ktext_start = __pa_symbol(_start);
> +       phys_addr_t ktext_size = __init_data_begin - _start;
> +       phys_addr_t krodata_start = __pa_symbol(__start_rodata);
> +       phys_addr_t krodata_size = _data - __start_rodata;
> +
> +       /* Isolate kernel text and rodata so they don't get mapped with a PUD */
> +       memblock_mark_nomap(ktext_start,  ktext_size);
> +       memblock_mark_nomap(krodata_start, krodata_size);
> +#endif
> +
>         /* Map all memory banks in the linear mapping */
>         for_each_mem_range(i, &start, &end) {
>                 if (start >= end)
> @@ -1118,6 +1155,15 @@ static void __init create_linear_mapping_page_table(void)
>
>                 create_linear_mapping_range(start, end);
>         }
> +
> +#ifdef CONFIG_STRICT_KERNEL_RWX
> +       create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
> +       create_linear_mapping_range(krodata_start,
> +                                   krodata_start + krodata_size);
> +
> +       memblock_clear_nomap(ktext_start,  ktext_size);
> +       memblock_clear_nomap(krodata_start, krodata_size);
> +#endif
>  }
>
>  static void __init setup_vm_final(void)
> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
> index 9b18bda74154..18706f457da7 100644
> --- a/arch/riscv/mm/physaddr.c
> +++ b/arch/riscv/mm/physaddr.c
> @@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
>         return __va_to_pa_nodebug(x);
>  }
>  EXPORT_SYMBOL(__phys_addr_symbol);
> +
> +phys_addr_t linear_mapping_va_to_pa(unsigned long x)
> +{
> +       BUG_ON(!kernel_map.va_pa_offset);
> +
> +       return ((unsigned long)(x) - kernel_map.va_pa_offset);
> +}
> +EXPORT_SYMBOL(linear_mapping_va_to_pa);
> +
> +void *linear_mapping_pa_to_va(unsigned long x)
> +{
> +       BUG_ON(!kernel_map.va_pa_offset);
> +
> +       return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
> +}
> +EXPORT_SYMBOL(linear_mapping_pa_to_va);
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index d1a68b6d03b3..d14735a81301 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -887,12 +887,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
>  static void __early_init_dt_declare_initrd(unsigned long start,
>                                            unsigned long end)
>  {
> -       /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> -        * enabled since __va() is called too early. ARM64 does make use
> -        * of phys_initrd_start/phys_initrd_size so we can skip this
> -        * conversion.
> +       /*
> +        * __va() is not yet available this early on some platforms. In that
> +        * case, the platform uses phys_initrd_start/phys_initrd_size instead
> +        * and does the VA conversion itself.
>          */
> -       if (!IS_ENABLED(CONFIG_ARM64)) {
> +       if (!IS_ENABLED(CONFIG_ARM64) &&
> +           !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
>                 initrd_start = (unsigned long)__va(start);
>                 initrd_end = (unsigned long)__va(end);
>                 initrd_below_start_ok = 1;
> --
> 2.37.2
>
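
To make the new best_map_size() selection concrete: it walks down from the
largest page-table granularity and returns the first one whose alignment both
the physical base and the remaining size satisfy. This is also why the
deferred va_pa_offset matters: with DRAM at, say, 0x80000000 and the kernel
loaded at 0x80200000, only the DRAM base is 1 GiB-aligned. Below is a minimal
user-space sketch of that selection logic, not kernel code; the size
constants assume Sv48 (PUD = 1 GiB, PMD = 2 MiB) and the addresses are made
up for the example.

#include <stdio.h>

#define PAGE_SIZE        (4UL << 10)
#define PMD_SIZE         (2UL << 20)
#define PUD_SIZE         (1UL << 30)

/* Same alignment/size checks as in the patch, PUD level and below only */
static unsigned long best_map_size(unsigned long base, unsigned long size)
{
        if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
                return PUD_SIZE;
        if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
                return PMD_SIZE;
        return PAGE_SIZE;
}

int main(void)
{
        /* DRAM base with 4 GiB available: PUD-aligned, 1 GiB mappings */
        printf("%#lx\n", best_map_size(0x80000000UL, 4UL << 30));
        /* kernel text base: only PMD-aligned, 2 MiB mappings */
        printf("%#lx\n", best_map_size(0x80200000UL, 14UL << 20));
        return 0;
}

Compiled as-is it prints 0x40000000 and 0x200000, i.e. the granularities the
linear map and the isolated kernel text would get.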

* Re: [PATCH v9 2/3] riscv: Move the linear mapping creation in its own function
  2023-03-24 15:54   ` Alexandre Ghiti
@ 2023-03-27 12:14     ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2023-03-27 12:14 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, linux-riscv, linux-kernel,
	devicetree

On Fri, Mar 24, 2023 at 9:26 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> No functional change intended: this just splits the linear mapping
> creation out of setup_vm_final, preparing for upcoming additions to it.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  arch/riscv/mm/init.c | 42 ++++++++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index cc558d94559a..3b37d8606920 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1086,16 +1086,25 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>         pt_ops_set_fixmap();
>  }
>
> -static void __init setup_vm_final(void)
> +static void __init create_linear_mapping_range(phys_addr_t start,
> +                                              phys_addr_t end)
>  {
> +       phys_addr_t pa;
>         uintptr_t va, map_size;
> -       phys_addr_t pa, start, end;
> -       u64 i;
>
> -       /* Setup swapper PGD for fixmap */
> -       create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
> -                          __pa_symbol(fixmap_pgd_next),
> -                          PGDIR_SIZE, PAGE_TABLE);
> +       for (pa = start; pa < end; pa += map_size) {
> +               va = (uintptr_t)__va(pa);
> +               map_size = best_map_size(pa, end - pa);
> +
> +               create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
> +                                  pgprot_from_va(va));
> +       }
> +}
> +
> +static void __init create_linear_mapping_page_table(void)
> +{
> +       phys_addr_t start, end;
> +       u64 i;
>
>         /* Map all memory banks in the linear mapping */
>         for_each_mem_range(i, &start, &end) {
> @@ -1107,14 +1116,19 @@ static void __init setup_vm_final(void)
>                 if (end >= __pa(PAGE_OFFSET) + memory_limit)
>                         end = __pa(PAGE_OFFSET) + memory_limit;
>
> -               for (pa = start; pa < end; pa += map_size) {
> -                       va = (uintptr_t)__va(pa);
> -                       map_size = best_map_size(pa, end - pa);
> -
> -                       create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
> -                                          pgprot_from_va(va));
> -               }
> +               create_linear_mapping_range(start, end);
>         }
> +}
> +
> +static void __init setup_vm_final(void)
> +{
> +       /* Setup swapper PGD for fixmap */
> +       create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
> +                          __pa_symbol(fixmap_pgd_next),
> +                          PGDIR_SIZE, PAGE_TABLE);
> +
> +       /* Map the linear mapping */
> +       create_linear_mapping_page_table();
>
>         /* Map the kernel */
>         if (IS_ENABLED(CONFIG_64BIT))
> --
> 2.37.2
>
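
For reference, the helper extracted here is the loop that patch 3/3 later
feeds with larger mapping sizes: it walks the physical range and asks
best_map_size() for the largest granularity at every step. A standalone
sketch of that chunking behaviour (a user-space model with assumed
constants, not the kernel code; at this stage best_map_size() still only
knows PMD_SIZE):

#include <stdio.h>

#define PAGE_SIZE        (4UL << 10)
#define PMD_SIZE         (2UL << 20)

/* Pre-patch-3/3 logic: upgrade to PMD_SIZE whenever possible */
static unsigned long best_map_size(unsigned long base, unsigned long size)
{
        if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
                return PMD_SIZE;
        return PAGE_SIZE;
}

static void create_linear_mapping_range(unsigned long start, unsigned long end)
{
        unsigned long pa, map_size;

        /* Mirrors the extracted helper: largest chunk that still fits */
        for (pa = start; pa < end; pa += map_size) {
                map_size = best_map_size(pa, end - pa);
                printf("map pa %#lx, size %#lx\n", pa, map_size);
        }
}

int main(void)
{
        /* 4 MiB + 8 KiB from a 2 MiB-aligned base: 2 PMD chunks, 2 pages */
        create_linear_mapping_range(0x80000000UL, 0x80402000UL);
        return 0;
}

Running it prints two 2 MiB chunks followed by two 4 KiB pages, which is
exactly how the linear mapping degrades at an unaligned tail.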

* Re: [PATCH v9 0/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54 ` Alexandre Ghiti
@ 2023-04-19 14:22   ` Palmer Dabbelt
  -1 siblings, 0 replies; 28+ messages in thread
From: Palmer Dabbelt @ 2023-04-19 14:22 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Rob Herring,
	Frank Rowand, Andrew Jones, Anup Patel, linux-riscv,
	linux-kernel, devicetree, Alexandre Ghiti


On Fri, 24 Mar 2023 16:54:18 +0100, Alexandre Ghiti wrote:
> This patchset intends to improve TLB utilization by using hugepages for
> the linear mapping.
> 
> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> take care of isolating the kernel text and rodata so that they are not
> mapped with a PUD mapping, which would assign wrong permissions to the
> whole region. This is achieved the same way as on arm64: the memblock
> nomap API isolates those regions and re-merges them afterwards, thus
> avoiding any issue with the system resources tree creation.
> 
> [...]

Applied, thanks!

[1/3] riscv: Get rid of riscv_pfn_base variable
      https://git.kernel.org/palmer/c/a7407a1318a9
[2/3] riscv: Move the linear mapping creation in its own function
      https://git.kernel.org/palmer/c/8589e346bbb6
[3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
      https://git.kernel.org/palmer/c/3335068f8721

Best regards,
-- 
Palmer Dabbelt <palmer@rivosinc.com>
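
The isolate/re-merge sequence described in the quoted cover letter is easy
to model: marking a range nomap makes for_each_mem_range() skip it, so the
surrounding RAM can still be mapped with the largest pages, and the range
is then mapped separately with permission-safe granularity before the flag
is cleared again. A toy user-space model of that pattern follows; it is not
the real memblock API, and the layout values are invented for illustration.

#include <stdbool.h>
#include <stdio.h>

struct region { unsigned long base, size; bool nomap; };

/* Invented layout: DRAM at 2 GiB, kernel text+rodata at 2 GiB + 2 MiB */
static struct region mem[] = {
        { 0x80000000UL, 0x00200000UL, false },  /* RAM below the kernel */
        { 0x80200000UL, 0x01000000UL, false },  /* kernel text + rodata */
        { 0x81200000UL, 0x3ee00000UL, false },  /* rest of RAM          */
};

static void map(unsigned long base, unsigned long size, const char *how)
{
        printf("map %#lx-%#lx (%s)\n", base, base + size, how);
}

int main(void)
{
        int i;

        /* 1) isolate: text/rodata must not be swallowed by a huge mapping */
        mem[1].nomap = true;

        /* 2) the equivalent of for_each_mem_range(): nomap regions skipped */
        for (i = 0; i < 3; i++)
                if (!mem[i].nomap)
                        map(mem[i].base, mem[i].size, "largest pages possible");

        /* 3) map the isolated range with its own, permission-safe mappings,
         * then drop the flag so the resource tree sees one plain region */
        map(mem[1].base, mem[1].size, "PMD pages, split by permission");
        mem[1].nomap = false;

        return 0;
}

Step 3 matches what create_linear_mapping_page_table() does under
CONFIG_STRICT_KERNEL_RWX in patch 3/3.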


* Re: [PATCH v9 0/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
  2023-03-24 15:54 ` Alexandre Ghiti
@ 2023-04-19 14:30   ` patchwork-bot+linux-riscv
  -1 siblings, 0 replies; 28+ messages in thread
From: patchwork-bot+linux-riscv @ 2023-04-19 14:30 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: linux-riscv, paul.walmsley, palmer, aou, robh+dt, frowand.list,
	ajones, anup, linux-kernel, devicetree

Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Fri, 24 Mar 2023 16:54:18 +0100 you wrote:
> This patchset intends to improve TLB utilization by using hugepages for
> the linear mapping.
> 
> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> take care of isolating the kernel text and rodata so that they are not
> mapped with a PUD mapping, which would assign wrong permissions to the
> whole region. This is achieved the same way as on arm64: the memblock
> nomap API isolates those regions and re-merges them afterwards, thus
> avoiding any issue with the system resources tree creation.
> 
> [...]

Here is the summary with links:
  - [v9,1/3] riscv: Get rid of riscv_pfn_base variable
    https://git.kernel.org/riscv/c/a7407a1318a9
  - [v9,2/3] riscv: Move the linear mapping creation in its own function
    https://git.kernel.org/riscv/c/8589e346bbb6
  - [v9,3/3] riscv: Use PUD/P4D/PGD pages for the linear mapping
    https://git.kernel.org/riscv/c/3335068f8721

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


