* [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
@ 2021-12-06 10:46 ` Alexandre Ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

* Please note the notable changes in memory layout and KASAN population *

This patchset allows a single kernel image to support both sv39 and sv48
without being relocatable.

The idea comes from Arnd Bergmann, who suggested doing the same as x86:
map the kernel at the end of the address space, which allows the kernel
to be linked at the same address for both sv39 and sv48 and thus avoids
any relocation at runtime.

This implements sv48 support at runtime: the kernel tries to boot with a
4-level page table and falls back to a 3-level one if the hardware does
not support sv48. Folding the 4th level into a 3-level page table has
almost no runtime cost.

Note that the KASAN region had to be moved to the end of the address
space, since its location must be known at compile time and then be
valid for both sv39 and sv48 (and the upcoming sv57).

Tested on:
  - qemu rv64 sv39: OK
  - qemu rv64 sv48: OK
  - qemu rv64 sv39 + kasan: OK
  - qemu rv64 sv48 + kasan: OK
  - qemu rv32: OK

Changes in v3:
  - Fix SZ_1T, thanks to Atish
  - Fix warning in create_pud_mapping, thanks to Atish
  - Fix k210 nommu build, thanks to Atish
  - Fix wrong rebase as noted by Samuel
  - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
  - * Move KASAN next to the kernel: virtual layouts changed and kasan population *

Changes in v2:
  - Rebase onto for-next
  - Fix KASAN
  - Fix stack canary
  - Get completely rid of MAXPHYSMEM configs
  - Add documentation

Alexandre Ghiti (13):
  riscv: Move KASAN mapping next to the kernel mapping
  riscv: Split early kasan mapping to prepare sv48 introduction
  riscv: Introduce functions to switch pt_ops
  riscv: Allow to dynamically define VA_BITS
  riscv: Get rid of MAXPHYSMEM configs
  asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
  riscv: Implement sv48 support
  riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
  riscv: Explicit comment about user virtual address space size
  riscv: Improve virtual kernel memory layout dump
  Documentation: riscv: Add sv48 description to VM layout
  riscv: Initialize thread pointer before calling C functions
  riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN

 Documentation/riscv/vm-layout.rst             |  48 ++-
 arch/riscv/Kconfig                            |  37 +-
 arch/riscv/configs/nommu_k210_defconfig       |   1 -
 .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
 arch/riscv/configs/nommu_virt_defconfig       |   1 -
 arch/riscv/include/asm/csr.h                  |   3 +-
 arch/riscv/include/asm/fixmap.h               |   1
 arch/riscv/include/asm/kasan.h                |  11 +-
 arch/riscv/include/asm/page.h                 |  20 +-
 arch/riscv/include/asm/pgalloc.h              |  40 ++
 arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
 arch/riscv/include/asm/pgtable.h              |  47 +-
 arch/riscv/include/asm/sparsemem.h            |   6 +-
 arch/riscv/kernel/cpu.c                       |  23 +-
 arch/riscv/kernel/head.S                      |   4 +-
 arch/riscv/mm/context.c                       |   4 +-
 arch/riscv/mm/init.c                          | 408 ++++++++++++++----
 arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
 drivers/firmware/efi/libstub/efi-stub.c       |   2
 drivers/pci/controller/pci-xgene.c            |   2 +-
 include/asm-generic/pgalloc.h                 |  24 +-
 include/linux/sizes.h                         |   1
 22 files changed, 833 insertions(+), 209 deletions(-)

--
2.32.0



* [PATCH v3 01/13] riscv: Move KASAN mapping next to the kernel mapping
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

Now that KASAN_SHADOW_OFFSET is defined at compile time as a config
option, its value must remain constant regardless of the size of the
virtual address space. This is only possible by pushing this region to
the end of the address space, next to the kernel mapping.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 Documentation/riscv/vm-layout.rst | 12 ++++++------
 arch/riscv/Kconfig                |  4 ++--
 arch/riscv/include/asm/kasan.h    |  4 ++--
 arch/riscv/include/asm/page.h     |  6 +++++-
 arch/riscv/include/asm/pgtable.h  |  6 ++++--
 arch/riscv/mm/init.c              | 25 +++++++++++++------------
 6 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
index b7f98930d38d..1bd687b97104 100644
--- a/Documentation/riscv/vm-layout.rst
+++ b/Documentation/riscv/vm-layout.rst
@@ -47,12 +47,12 @@ RISC-V Linux Kernel SV39
                                                               | Kernel-space virtual memory, shared between all processes:
   ____________________________________________________________|___________________________________________________________
                     |            |                  |         |
-   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
-   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
-   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
-   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
-   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
-   ffffffe000000000 | -128    GB | ffffffff7fffffff |  124 GB | direct mapping of all physical memory
+   ffffffc6fee00000 | -228    GB | ffffffc6feffffff |    2 MB | fixmap
+   ffffffc6ff000000 | -228    GB | ffffffc6ffffffff |   16 MB | PCI io
+   ffffffc700000000 | -228    GB | ffffffc7ffffffff |    4 GB | vmemmap
+   ffffffc800000000 | -224    GB | ffffffd7ffffffff |   64 GB | vmalloc/ioremap space
+   ffffffd800000000 | -160    GB | fffffff6ffffffff |  124 GB | direct mapping of all physical memory
+   fffffff700000000 |  -36    GB | fffffffeffffffff |   32 GB | kasan
   __________________|____________|__________________|_________|____________________________________________________________
                                                               |
                                                               |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 6d5b63bd4bd9..6cd98ade5ebc 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -161,12 +161,12 @@ config PAGE_OFFSET
 	default 0xC0000000 if 32BIT && MAXPHYSMEM_1GB
 	default 0x80000000 if 64BIT && !MMU
 	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
-	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
+	default 0xffffffd800000000 if 64BIT && MAXPHYSMEM_128GB
 
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN_GENERIC
-	default 0xdfffffc800000000 if 64BIT
+	default 0xdfffffff00000000 if 64BIT
 	default 0xffffffff if 32BIT
 
 config ARCH_FLATMEM_ENABLE
diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
index b00f503ec124..257a2495145a 100644
--- a/arch/riscv/include/asm/kasan.h
+++ b/arch/riscv/include/asm/kasan.h
@@ -28,8 +28,8 @@
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
 #define KASAN_SHADOW_SIZE	(UL(1) << ((CONFIG_VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
-#define KASAN_SHADOW_START	KERN_VIRT_START
-#define KASAN_SHADOW_END	(KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
+#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
+#define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
 #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
 
 void kasan_init(void);
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 109c97e991a6..e03559f9b35e 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -33,7 +33,11 @@
  */
 #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
 
-#define KERN_VIRT_SIZE (-PAGE_OFFSET)
+/*
+ * Half of the kernel address space (half of the entries of the page global
+ * directory) is for the direct mapping.
+ */
+#define KERN_VIRT_SIZE		((PTRS_PER_PGD / 2 * PGDIR_SIZE) / 2)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 39b550310ec6..d34f3a7a9701 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -39,8 +39,10 @@
 
 /* Modules always live before the kernel */
 #ifdef CONFIG_64BIT
-#define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
-#define MODULES_END	(PFN_ALIGN((unsigned long)&_start))
+/* This is used to define the end of the KASAN shadow region */
+#define MODULES_LOWEST_VADDR	(KERNEL_LINK_ADDR - SZ_2G)
+#define MODULES_VADDR		(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
+#define MODULES_END		(PFN_ALIGN((unsigned long)&_start))
 #endif
 
 /*
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index c0cddf0fc22d..4224e9d0ecf5 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -103,6 +103,9 @@ static void __init print_vm_layout(void)
 	print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
 		  (unsigned long)high_memory);
 #ifdef CONFIG_64BIT
+#ifdef CONFIG_KASAN
+	print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
+#endif
 	print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
 		  (unsigned long)ADDRESS_SPACE_END);
 #endif
@@ -130,18 +133,8 @@ void __init mem_init(void)
 	print_vm_layout();
 }
 
-/*
- * The default maximal physical memory size is -PAGE_OFFSET for 32-bit kernel,
- * whereas for 64-bit kernel, the end of the virtual address space is occupied
- * by the modules/BPF/kernel mappings which reduces the available size of the
- * linear mapping.
- * Limit the memory size via mem.
- */
-#ifdef CONFIG_64BIT
-static phys_addr_t memory_limit = -PAGE_OFFSET - SZ_4G;
-#else
-static phys_addr_t memory_limit = -PAGE_OFFSET;
-#endif
+/* Limit the memory size via mem. */
+static phys_addr_t memory_limit;
 
 static int __init early_mem(char *p)
 {
@@ -613,6 +606,14 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 
 	riscv_pfn_base = PFN_DOWN(kernel_map.phys_addr);
 
+	/*
+	 * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
+	 * kernel, whereas for 64-bit kernel, the end of the virtual address
+	 * space is occupied by the modules/BPF/kernel mappings which reduces
+	 * the available size of the linear mapping.
+	 */
+	memory_limit = KERN_VIRT_SIZE - (IS_ENABLED(CONFIG_64BIT) ? SZ_4G : 0);
+
 	/* Sanity check alignment and size */
 	BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
 	BUG_ON((kernel_map.phys_addr % PMD_SIZE) != 0);
-- 
2.32.0



* [PATCH v3 02/13] riscv: Split early kasan mapping to prepare sv48 introduction
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

Now that the KASAN shadow region is next to the kernel, it is no longer
aligned on PGDIR_SIZE for sv48, so populating it requires walking down
to the lower levels of the page table. So instead of reimplementing the
page table walk for the early population, take advantage of the existing
functions used for the final population.

Note that KASAN swapper initialization must also be split: memblock is
not initialized at this point and, since the last PGD is shared with the
kernel, a PUD would need to be allocated. So postpone the KASAN final
population until after the kernel population is done.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/include/asm/kasan.h |   1 +
 arch/riscv/mm/init.c           |   4 ++
 arch/riscv/mm/kasan_init.c     | 113 ++++++++++++++++++---------------
 3 files changed, 67 insertions(+), 51 deletions(-)

diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
index 257a2495145a..2788e2c46609 100644
--- a/arch/riscv/include/asm/kasan.h
+++ b/arch/riscv/include/asm/kasan.h
@@ -34,6 +34,7 @@
 
 void kasan_init(void);
 asmlinkage void kasan_early_init(void);
+void kasan_swapper_init(void);
 
 #endif
 #endif
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 4224e9d0ecf5..5010eba52738 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -742,6 +742,10 @@ static void __init setup_vm_final(void)
 	create_kernel_page_table(swapper_pg_dir, false);
 #endif
 
+#ifdef CONFIG_KASAN
+	kasan_swapper_init();
+#endif
+
 	/* Clear fixmap PTE and PMD mappings */
 	clear_fixmap(FIX_PTE);
 	clear_fixmap(FIX_PMD);
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 54294f83513d..1434a0225140 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -12,44 +12,6 @@
 #include <asm/pgalloc.h>
 
 extern pgd_t early_pg_dir[PTRS_PER_PGD];
-asmlinkage void __init kasan_early_init(void)
-{
-	uintptr_t i;
-	pgd_t *pgd = early_pg_dir + pgd_index(KASAN_SHADOW_START);
-
-	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
-		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
-
-	for (i = 0; i < PTRS_PER_PTE; ++i)
-		set_pte(kasan_early_shadow_pte + i,
-			mk_pte(virt_to_page(kasan_early_shadow_page),
-			       PAGE_KERNEL));
-
-	for (i = 0; i < PTRS_PER_PMD; ++i)
-		set_pmd(kasan_early_shadow_pmd + i,
-			pfn_pmd(PFN_DOWN
-				(__pa((uintptr_t) kasan_early_shadow_pte)),
-				__pgprot(_PAGE_TABLE)));
-
-	for (i = KASAN_SHADOW_START; i < KASAN_SHADOW_END;
-	     i += PGDIR_SIZE, ++pgd)
-		set_pgd(pgd,
-			pfn_pgd(PFN_DOWN
-				(__pa(((uintptr_t) kasan_early_shadow_pmd))),
-				__pgprot(_PAGE_TABLE)));
-
-	/* init for swapper_pg_dir */
-	pgd = pgd_offset_k(KASAN_SHADOW_START);
-
-	for (i = KASAN_SHADOW_START; i < KASAN_SHADOW_END;
-	     i += PGDIR_SIZE, ++pgd)
-		set_pgd(pgd,
-			pfn_pgd(PFN_DOWN
-				(__pa(((uintptr_t) kasan_early_shadow_pmd))),
-				__pgprot(_PAGE_TABLE)));
-
-	local_flush_tlb_all();
-}
 
 static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
 {
@@ -108,26 +70,35 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
 	set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 }
 
-static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
+static void __init kasan_populate_pgd(pgd_t *pgdp,
+				      unsigned long vaddr, unsigned long end,
+				      bool early)
 {
 	phys_addr_t phys_addr;
-	pgd_t *pgdp = pgd_offset_k(vaddr);
 	unsigned long next;
 
 	do {
 		next = pgd_addr_end(vaddr, end);
 
-		/*
-		 * pgdp can't be none since kasan_early_init initialized all KASAN
-		 * shadow region with kasan_early_shadow_pmd: if this is stillthe case,
-		 * that means we can try to allocate a hugepage as a replacement.
-		 */
-		if (pgd_page_vaddr(*pgdp) == (unsigned long)lm_alias(kasan_early_shadow_pmd) &&
-		    IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
-			phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
-			if (phys_addr) {
-				set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+		if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
+			if (early) {
+				phys_addr = __pa((uintptr_t)kasan_early_shadow_pgd_next);
+				set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_TABLE));
 				continue;
+			} else if (pgd_page_vaddr(*pgdp) ==
+				   (unsigned long)lm_alias(kasan_early_shadow_pgd_next)) {
+				/*
+				 * pgdp can't be none since kasan_early_init
+				 * initialized all KASAN shadow region with
+				 * kasan_early_shadow_pud: if this is still the
+				 * case, that means we can try to allocate a
+				 * hugepage as a replacement.
+				 */
+				phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
+				if (phys_addr) {
+					set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+					continue;
+				}
 			}
 		}
 
@@ -135,12 +106,52 @@ static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
 	} while (pgdp++, vaddr = next, vaddr != end);
 }
 
+asmlinkage void __init kasan_early_init(void)
+{
+	uintptr_t i;
+
+	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
+		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+
+	for (i = 0; i < PTRS_PER_PTE; ++i)
+		set_pte(kasan_early_shadow_pte + i,
+			mk_pte(virt_to_page(kasan_early_shadow_page),
+			       PAGE_KERNEL));
+
+	for (i = 0; i < PTRS_PER_PMD; ++i)
+		set_pmd(kasan_early_shadow_pmd + i,
+			pfn_pmd(PFN_DOWN
+				(__pa((uintptr_t)kasan_early_shadow_pte)),
+				PAGE_TABLE));
+
+	if (pgtable_l4_enabled) {
+		for (i = 0; i < PTRS_PER_PUD; ++i)
+			set_pud(kasan_early_shadow_pud + i,
+				pfn_pud(PFN_DOWN
+					(__pa(((uintptr_t)kasan_early_shadow_pmd))),
+					PAGE_TABLE));
+	}
+
+	kasan_populate_pgd(early_pg_dir + pgd_index(KASAN_SHADOW_START),
+			   KASAN_SHADOW_START, KASAN_SHADOW_END, true);
+
+	local_flush_tlb_all();
+}
+
+void __init kasan_swapper_init(void)
+{
+	kasan_populate_pgd(pgd_offset_k(KASAN_SHADOW_START),
+			   KASAN_SHADOW_START, KASAN_SHADOW_END, true);
+
+	local_flush_tlb_all();
+}
+
 static void __init kasan_populate(void *start, void *end)
 {
 	unsigned long vaddr = (unsigned long)start & PAGE_MASK;
 	unsigned long vend = PAGE_ALIGN((unsigned long)end);
 
-	kasan_populate_pgd(vaddr, vend);
+	kasan_populate_pgd(pgd_offset_k(vaddr), vaddr, vend, false);
 
 	local_flush_tlb_all();
 	memset(start, KASAN_SHADOW_INIT, end - start);
-- 
2.32.0


-	pgd = pgd_offset_k(KASAN_SHADOW_START);
-
-	for (i = KASAN_SHADOW_START; i < KASAN_SHADOW_END;
-	     i += PGDIR_SIZE, ++pgd)
-		set_pgd(pgd,
-			pfn_pgd(PFN_DOWN
-				(__pa(((uintptr_t) kasan_early_shadow_pmd))),
-				__pgprot(_PAGE_TABLE)));
-
-	local_flush_tlb_all();
-}
 
 static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
 {
@@ -108,26 +70,35 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
 	set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 }
 
-static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
+static void __init kasan_populate_pgd(pgd_t *pgdp,
+				      unsigned long vaddr, unsigned long end,
+				      bool early)
 {
 	phys_addr_t phys_addr;
-	pgd_t *pgdp = pgd_offset_k(vaddr);
 	unsigned long next;
 
 	do {
 		next = pgd_addr_end(vaddr, end);
 
-		/*
-		 * pgdp can't be none since kasan_early_init initialized all KASAN
-		 * shadow region with kasan_early_shadow_pmd: if this is stillthe case,
-		 * that means we can try to allocate a hugepage as a replacement.
-		 */
-		if (pgd_page_vaddr(*pgdp) == (unsigned long)lm_alias(kasan_early_shadow_pmd) &&
-		    IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
-			phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
-			if (phys_addr) {
-				set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+		if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
+			if (early) {
+				phys_addr = __pa((uintptr_t)kasan_early_shadow_pgd_next);
+				set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_TABLE));
 				continue;
+			} else if (pgd_page_vaddr(*pgdp) ==
+				   (unsigned long)lm_alias(kasan_early_shadow_pgd_next)) {
+				/*
+				 * pgdp can't be none since kasan_early_init
+				 * initialized all KASAN shadow region with
+				 * kasan_early_shadow_pud: if this is still the
+				 * case, that means we can try to allocate a
+				 * hugepage as a replacement.
+				 */
+				phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
+				if (phys_addr) {
+					set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+					continue;
+				}
 			}
 		}
 
@@ -135,12 +106,52 @@ static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
 	} while (pgdp++, vaddr = next, vaddr != end);
 }
 
+asmlinkage void __init kasan_early_init(void)
+{
+	uintptr_t i;
+
+	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
+		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+
+	for (i = 0; i < PTRS_PER_PTE; ++i)
+		set_pte(kasan_early_shadow_pte + i,
+			mk_pte(virt_to_page(kasan_early_shadow_page),
+			       PAGE_KERNEL));
+
+	for (i = 0; i < PTRS_PER_PMD; ++i)
+		set_pmd(kasan_early_shadow_pmd + i,
+			pfn_pmd(PFN_DOWN
+				(__pa((uintptr_t)kasan_early_shadow_pte)),
+				PAGE_TABLE));
+
+	if (pgtable_l4_enabled) {
+		for (i = 0; i < PTRS_PER_PUD; ++i)
+			set_pud(kasan_early_shadow_pud + i,
+				pfn_pud(PFN_DOWN
+					(__pa(((uintptr_t)kasan_early_shadow_pmd))),
+					PAGE_TABLE));
+	}
+
+	kasan_populate_pgd(early_pg_dir + pgd_index(KASAN_SHADOW_START),
+			   KASAN_SHADOW_START, KASAN_SHADOW_END, true);
+
+	local_flush_tlb_all();
+}
+
+void __init kasan_swapper_init(void)
+{
+	kasan_populate_pgd(pgd_offset_k(KASAN_SHADOW_START),
+			   KASAN_SHADOW_START, KASAN_SHADOW_END, true);
+
+	local_flush_tlb_all();
+}
+
 static void __init kasan_populate(void *start, void *end)
 {
 	unsigned long vaddr = (unsigned long)start & PAGE_MASK;
 	unsigned long vend = PAGE_ALIGN((unsigned long)end);
 
-	kasan_populate_pgd(vaddr, vend);
+	kasan_populate_pgd(pgd_offset_k(vaddr), vaddr, vend, false);
 
 	local_flush_tlb_all();
 	memset(start, KASAN_SHADOW_INIT, end - start);
-- 
2.32.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v3 03/13] riscv: Introduce functions to switch pt_ops
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

This simply gathers the different pt_ops initializations into dedicated
functions, with comments added to explain why the page table operations
must be changed along the boot process.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/mm/init.c | 74 ++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 5010eba52738..1552226fb6bd 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -582,6 +582,52 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
 	dtb_early_pa = dtb_pa;
 }
 
+/*
+ * MMU is not enabled, the page tables are allocated directly using
+ * early_pmd/pud/p4d and the address returned is the physical one.
+ */
+void pt_ops_set_early(void)
+{
+	pt_ops.alloc_pte = alloc_pte_early;
+	pt_ops.get_pte_virt = get_pte_virt_early;
+#ifndef __PAGETABLE_PMD_FOLDED
+	pt_ops.alloc_pmd = alloc_pmd_early;
+	pt_ops.get_pmd_virt = get_pmd_virt_early;
+#endif
+}
+
+/*
+ * MMU is enabled but page table setup is not complete yet.
+ * fixmap page table alloc functions must be used as a means to temporarily
+ * map the allocated physical pages since the linear mapping does not exist yet.
+ *
+ * Note that this is called with MMU disabled, hence kernel_mapping_pa_to_va,
+ * but it will be used as described above.
+ */
+void pt_ops_set_fixmap(void)
+{
+	pt_ops.alloc_pte = kernel_mapping_pa_to_va((uintptr_t)alloc_pte_fixmap);
+	pt_ops.get_pte_virt = kernel_mapping_pa_to_va((uintptr_t)get_pte_virt_fixmap);
+#ifndef __PAGETABLE_PMD_FOLDED
+	pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
+	pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
+#endif
+}
+
+/*
+ * MMU is enabled and page table setup is complete, so from now, we can use
+ * generic page allocation functions to setup page table.
+ */
+void pt_ops_set_late(void)
+{
+	pt_ops.alloc_pte = alloc_pte_late;
+	pt_ops.get_pte_virt = get_pte_virt_late;
+#ifndef __PAGETABLE_PMD_FOLDED
+	pt_ops.alloc_pmd = alloc_pmd_late;
+	pt_ops.get_pmd_virt = get_pmd_virt_late;
+#endif
+}
+
 asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 {
 	pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
@@ -626,12 +672,8 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	BUG_ON((kernel_map.virt_addr + kernel_map.size) > ADDRESS_SPACE_END - SZ_4K);
 #endif
 
-	pt_ops.alloc_pte = alloc_pte_early;
-	pt_ops.get_pte_virt = get_pte_virt_early;
-#ifndef __PAGETABLE_PMD_FOLDED
-	pt_ops.alloc_pmd = alloc_pmd_early;
-	pt_ops.get_pmd_virt = get_pmd_virt_early;
-#endif
+	pt_ops_set_early();
+
 	/* Setup early PGD for fixmap */
 	create_pgd_mapping(early_pg_dir, FIXADDR_START,
 			   (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
@@ -695,6 +737,8 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
 	}
 #endif
+
+	pt_ops_set_fixmap();
 }
 
 static void __init setup_vm_final(void)
@@ -703,16 +747,6 @@ static void __init setup_vm_final(void)
 	phys_addr_t pa, start, end;
 	u64 i;
 
-	/**
-	 * MMU is enabled at this point. But page table setup is not complete yet.
-	 * fixmap page table alloc functions should be used at this point
-	 */
-	pt_ops.alloc_pte = alloc_pte_fixmap;
-	pt_ops.get_pte_virt = get_pte_virt_fixmap;
-#ifndef __PAGETABLE_PMD_FOLDED
-	pt_ops.alloc_pmd = alloc_pmd_fixmap;
-	pt_ops.get_pmd_virt = get_pmd_virt_fixmap;
-#endif
 	/* Setup swapper PGD for fixmap */
 	create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
 			   __pa_symbol(fixmap_pgd_next),
@@ -754,13 +788,7 @@ static void __init setup_vm_final(void)
 	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
 	local_flush_tlb_all();
 
-	/* generic page allocation functions must be used to setup page table */
-	pt_ops.alloc_pte = alloc_pte_late;
-	pt_ops.get_pte_virt = get_pte_virt_late;
-#ifndef __PAGETABLE_PMD_FOLDED
-	pt_ops.alloc_pmd = alloc_pmd_late;
-	pt_ops.get_pmd_virt = get_pmd_virt_late;
-#endif
+	pt_ops_set_late();
 }
 #else
 asmlinkage void __init setup_vm(uintptr_t dtb_pa)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v3 04/13] riscv: Allow to dynamically define VA_BITS
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

With 4-level page table folding done at runtime, the size of the virtual
address space is not known at compile time, so VA_BITS must be set
dynamically in order for sparsemem to reserve the right amount of memory
for struct pages.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/Kconfig                 | 10 ----------
 arch/riscv/include/asm/kasan.h     |  2 +-
 arch/riscv/include/asm/pgtable.h   | 10 ++++++++--
 arch/riscv/include/asm/sparsemem.h |  6 +++++-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 6cd98ade5ebc..c3a167eea011 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -146,16 +146,6 @@ config MMU
 	  Select if you want MMU-based virtualised addressing space
 	  support by paged memory management. If unsure, say 'Y'.
 
-config VA_BITS
-	int
-	default 32 if 32BIT
-	default 39 if 64BIT
-
-config PA_BITS
-	int
-	default 34 if 32BIT
-	default 56 if 64BIT
-
 config PAGE_OFFSET
 	hex
 	default 0xC0000000 if 32BIT && MAXPHYSMEM_1GB
diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
index 2788e2c46609..743e6ff57996 100644
--- a/arch/riscv/include/asm/kasan.h
+++ b/arch/riscv/include/asm/kasan.h
@@ -27,7 +27,7 @@
  */
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
-#define KASAN_SHADOW_SIZE	(UL(1) << ((CONFIG_VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
+#define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
 #define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
 #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
 #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index d34f3a7a9701..e1a52e22ad7e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -50,8 +50,14 @@
  * struct pages to map half the virtual address space. Then
  * position vmemmap directly below the VMALLOC region.
  */
+#ifdef CONFIG_64BIT
+#define VA_BITS		39
+#else
+#define VA_BITS		32
+#endif
+
 #define VMEMMAP_SHIFT \
-	(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
+	(VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
 #define VMEMMAP_SIZE	BIT(VMEMMAP_SHIFT)
 #define VMEMMAP_END	(VMALLOC_START - 1)
 #define VMEMMAP_START	(VMALLOC_START - VMEMMAP_SIZE)
@@ -653,7 +659,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
  * and give the kernel the other (upper) half.
  */
 #ifdef CONFIG_64BIT
-#define KERN_VIRT_START	(-(BIT(CONFIG_VA_BITS)) + TASK_SIZE)
+#define KERN_VIRT_START	(-(BIT(VA_BITS)) + TASK_SIZE)
 #else
 #define KERN_VIRT_START	FIXADDR_START
 #endif
diff --git a/arch/riscv/include/asm/sparsemem.h b/arch/riscv/include/asm/sparsemem.h
index 45a7018a8118..63acaecc3374 100644
--- a/arch/riscv/include/asm/sparsemem.h
+++ b/arch/riscv/include/asm/sparsemem.h
@@ -4,7 +4,11 @@
 #define _ASM_RISCV_SPARSEMEM_H
 
 #ifdef CONFIG_SPARSEMEM
-#define MAX_PHYSMEM_BITS	CONFIG_PA_BITS
+#ifdef CONFIG_64BIT
+#define MAX_PHYSMEM_BITS	56
+#else
+#define MAX_PHYSMEM_BITS	34
+#endif /* CONFIG_64BIT */
 #define SECTION_SIZE_BITS	27
 #endif /* CONFIG_SPARSEMEM */
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v3 05/13] riscv: Get rid of MAXPHYSMEM configs
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

The CONFIG_MAXPHYSMEM_* options were actually never used: even in the
nommu defconfigs that selected MAXPHYSMEM_2GB, it had no effect on
PAGE_OFFSET since it was preempted by the !MMU case right before.

In addition, I suspect that commit 2bfc6cd81bd1 ("riscv: Move kernel
mapping outside of linear mapping"), which moved the kernel to
0xffffffff80000000, broke the MAXPHYSMEM_2GB config, which defined
PAGE_OFFSET at the same address.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/Kconfig                            | 23 ++-----------------
 arch/riscv/configs/nommu_k210_defconfig       |  1 -
 .../riscv/configs/nommu_k210_sdcard_defconfig |  1 -
 arch/riscv/configs/nommu_virt_defconfig       |  1 -
 4 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c3a167eea011..ac6c0cd9bc29 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -148,10 +148,9 @@ config MMU
 
 config PAGE_OFFSET
 	hex
-	default 0xC0000000 if 32BIT && MAXPHYSMEM_1GB
+	default 0xC0000000 if 32BIT
 	default 0x80000000 if 64BIT && !MMU
-	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
-	default 0xffffffd800000000 if 64BIT && MAXPHYSMEM_128GB
+	default 0xffffffd800000000 if 64BIT
 
 config KASAN_SHADOW_OFFSET
 	hex
@@ -260,24 +259,6 @@ config MODULE_SECTIONS
 	bool
 	select HAVE_MOD_ARCH_SPECIFIC
 
-choice
-	prompt "Maximum Physical Memory"
-	default MAXPHYSMEM_1GB if 32BIT
-	default MAXPHYSMEM_2GB if 64BIT && CMODEL_MEDLOW
-	default MAXPHYSMEM_128GB if 64BIT && CMODEL_MEDANY
-
-	config MAXPHYSMEM_1GB
-		depends on 32BIT
-		bool "1GiB"
-	config MAXPHYSMEM_2GB
-		depends on 64BIT && CMODEL_MEDLOW
-		bool "2GiB"
-	config MAXPHYSMEM_128GB
-		depends on 64BIT && CMODEL_MEDANY
-		bool "128GiB"
-endchoice
-
-
 config SMP
 	bool "Symmetric Multi-Processing"
 	help
diff --git a/arch/riscv/configs/nommu_k210_defconfig b/arch/riscv/configs/nommu_k210_defconfig
index b16a2a12c82a..dae9179984cc 100644
--- a/arch/riscv/configs/nommu_k210_defconfig
+++ b/arch/riscv/configs/nommu_k210_defconfig
@@ -30,7 +30,6 @@ CONFIG_SLOB=y
 # CONFIG_MMU is not set
 CONFIG_SOC_CANAAN=y
 CONFIG_SOC_CANAAN_K210_DTB_SOURCE="k210_generic"
-CONFIG_MAXPHYSMEM_2GB=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=2
 CONFIG_CMDLINE="earlycon console=ttySIF0"
diff --git a/arch/riscv/configs/nommu_k210_sdcard_defconfig b/arch/riscv/configs/nommu_k210_sdcard_defconfig
index 61f887f65419..03f91525a059 100644
--- a/arch/riscv/configs/nommu_k210_sdcard_defconfig
+++ b/arch/riscv/configs/nommu_k210_sdcard_defconfig
@@ -22,7 +22,6 @@ CONFIG_SLOB=y
 # CONFIG_MMU is not set
 CONFIG_SOC_CANAAN=y
 CONFIG_SOC_CANAAN_K210_DTB_SOURCE="k210_generic"
-CONFIG_MAXPHYSMEM_2GB=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=2
 CONFIG_CMDLINE="earlycon console=ttySIF0 rootdelay=2 root=/dev/mmcblk0p1 ro"
diff --git a/arch/riscv/configs/nommu_virt_defconfig b/arch/riscv/configs/nommu_virt_defconfig
index e046a0babde4..f224be697785 100644
--- a/arch/riscv/configs/nommu_virt_defconfig
+++ b/arch/riscv/configs/nommu_virt_defconfig
@@ -27,7 +27,6 @@ CONFIG_SLOB=y
 # CONFIG_SLAB_MERGE_DEFAULT is not set
 # CONFIG_MMU is not set
 CONFIG_SOC_VIRT=y
-CONFIG_MAXPHYSMEM_2GB=y
 CONFIG_SMP=y
 CONFIG_CMDLINE="root=/dev/vda rw earlycon=uart8250,mmio,0x10000000,115200n8 console=ttyS0"
 CONFIG_CMDLINE_FORCE=y
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v3 06/13] asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

In the following commits, riscv will use almost the same implementation as
the generic versions of pud_alloc_one and pud_free, but with an additional
check, since those functions are only relevant when using at least a
4-level page table, which is determined at runtime on riscv.

So move the bodies of those functions into new helpers, __pud_alloc_one
and __pud_free, that riscv can reuse without duplicating code.
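
The resulting layering can be sketched as a small standalone model
(userspace C, with calloc() standing in for get_zeroed_page() and a plain
int standing in for the pgtable_l4_enabled flag that a later patch in this
series adds; the names and simplifications are illustrative, not the
actual kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime flag riscv introduces later
 * in the series; not part of this patch. */
static int pgtable_l4_enabled = 1;

/* Generic layer: the allocation/free bodies live in double-underscore
 * helpers so an architecture can reuse them without copying them. */
static void *__pud_alloc_one(void)
{
	return calloc(1, 4096);		/* models get_zeroed_page() */
}

static void __pud_free(void *pud)
{
	free(pud);
}

/* Arch layer (the riscv pattern): define __HAVE_ARCH_PUD_ALLOC_ONE /
 * __HAVE_ARCH_PUD_FREE and wrap the generic helpers behind a runtime
 * check instead of duplicating their bodies. */
static void *pud_alloc_one(void)
{
	if (pgtable_l4_enabled)
		return __pud_alloc_one();
	return NULL;	/* PUD level folded: nothing to allocate */
}

static void pud_free(void *pud)
{
	if (pgtable_l4_enabled)
		__pud_free(pud);
}
```

With pgtable_l4_enabled clear, pud_alloc_one()/pud_free() degenerate to
no-ops, which is exactly what a folded PUD level needs.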

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 include/asm-generic/pgalloc.h | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 02932efad3ab..977bea16cf1b 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -147,6 +147,15 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 
 #if CONFIG_PGTABLE_LEVELS > 3
 
+static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	gfp_t gfp = GFP_PGTABLE_USER;
+
+	if (mm == &init_mm)
+		gfp = GFP_PGTABLE_KERNEL;
+	return (pud_t *)get_zeroed_page(gfp);
+}
+
 #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
 /**
  * pud_alloc_one - allocate a page for PUD-level page table
@@ -159,20 +168,23 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
  */
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	gfp_t gfp = GFP_PGTABLE_USER;
-
-	if (mm == &init_mm)
-		gfp = GFP_PGTABLE_KERNEL;
-	return (pud_t *)get_zeroed_page(gfp);
+	return __pud_alloc_one(mm, addr);
 }
 #endif
 
-static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
 	free_page((unsigned long)pud);
 }
 
+#ifndef __HAVE_ARCH_PUD_FREE
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+	__pud_free(mm, pud);
+}
+#endif
+
 #endif /* CONFIG_PGTABLE_LEVELS > 3 */
 
 #ifndef __HAVE_ARCH_PGD_FREE
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

By adding a new 4th level of page table, allow a 64-bit kernel to address
2^48 bytes of virtual address space: in practice, that offers 128TB of
virtual address space to userspace and allows up to 64TB of physical
memory.

If the underlying hardware does not support sv48, we automatically fall
back to a standard 3-level page table by folding the new PUD level into
the PGDIR level. To detect the HW capability at runtime, we rely on the
SATP specification, which requires that writes with an unsupported mode
be ignored.
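
The detection trick can be modeled in plain C (a hypothetical
csr_write_satp() that drops writes carrying an unsupported mode, as the
privileged spec requires; the real set_satp_mode() below additionally
builds a 1:1 mapping first so the kernel survives the mode switch):

```c
#include <assert.h>
#include <stdint.h>

#define SATP_MODE_39	0x8000000000000000ULL	/* MODE field = 8 */
#define SATP_MODE_48	0x9000000000000000ULL	/* MODE field = 9 */

/* Model of a hart's SATP CSR: per the privileged spec, a write with
 * an unsupported MODE value has no effect at all. */
static uint64_t satp_csr;
static int hw_supports_sv48;	/* hypothetical HW capability knob */

static void csr_write_satp(uint64_t val)
{
	unsigned int mode = (unsigned int)(val >> 60);

	if (mode == 9 && !hw_supports_sv48)
		return;		/* unsupported mode: write is ignored */
	satp_csr = val;
}

/* Mirrors the core of set_satp_mode(): try sv48, read SATP back,
 * and downgrade to sv39 when the write did not stick. */
static uint64_t probe_satp_mode(void)
{
	uint64_t identity_satp = SATP_MODE_48 | 0x1234;	/* mode | PPN */

	satp_csr = 0;
	csr_write_satp(identity_satp);
	if (satp_csr != identity_satp)
		return SATP_MODE_39;	/* fall back to 3-level */
	return SATP_MODE_48;
}
```

On sv48-capable hardware the write sticks and the kernel keeps the
4-level configuration; otherwise the read-back mismatch triggers the
sv39 downgrade.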

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/Kconfig                      |   4 +-
 arch/riscv/include/asm/csr.h            |   3 +-
 arch/riscv/include/asm/fixmap.h         |   1 +
 arch/riscv/include/asm/kasan.h          |   6 +-
 arch/riscv/include/asm/page.h           |  14 ++
 arch/riscv/include/asm/pgalloc.h        |  40 +++++
 arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
 arch/riscv/include/asm/pgtable.h        |  24 ++-
 arch/riscv/kernel/head.S                |   3 +-
 arch/riscv/mm/context.c                 |   4 +-
 arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
 arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
 drivers/firmware/efi/libstub/efi-stub.c |   2 +
 13 files changed, 514 insertions(+), 44 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ac6c0cd9bc29..d28fe0148e13 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -150,7 +150,7 @@ config PAGE_OFFSET
 	hex
 	default 0xC0000000 if 32BIT
 	default 0x80000000 if 64BIT && !MMU
-	default 0xffffffd800000000 if 64BIT
+	default 0xffffaf8000000000 if 64BIT
 
 config KASAN_SHADOW_OFFSET
 	hex
@@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
 
 config PGTABLE_LEVELS
 	int
-	default 3 if 64BIT
+	default 4 if 64BIT
 	default 2
 
 config LOCKDEP_SUPPORT
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 87ac65696871..3fdb971c7896 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -40,14 +40,13 @@
 #ifndef CONFIG_64BIT
 #define SATP_PPN	_AC(0x003FFFFF, UL)
 #define SATP_MODE_32	_AC(0x80000000, UL)
-#define SATP_MODE	SATP_MODE_32
 #define SATP_ASID_BITS	9
 #define SATP_ASID_SHIFT	22
 #define SATP_ASID_MASK	_AC(0x1FF, UL)
 #else
 #define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
 #define SATP_MODE_39	_AC(0x8000000000000000, UL)
-#define SATP_MODE	SATP_MODE_39
+#define SATP_MODE_48	_AC(0x9000000000000000, UL)
 #define SATP_ASID_BITS	16
 #define SATP_ASID_SHIFT	44
 #define SATP_ASID_MASK	_AC(0xFFFF, UL)
diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
index 54cbf07fb4e9..58a718573ad6 100644
--- a/arch/riscv/include/asm/fixmap.h
+++ b/arch/riscv/include/asm/fixmap.h
@@ -24,6 +24,7 @@ enum fixed_addresses {
 	FIX_HOLE,
 	FIX_PTE,
 	FIX_PMD,
+	FIX_PUD,
 	FIX_TEXT_POKE1,
 	FIX_TEXT_POKE0,
 	FIX_EARLYCON_MEM_BASE,
diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
index 743e6ff57996..0b85e363e778 100644
--- a/arch/riscv/include/asm/kasan.h
+++ b/arch/riscv/include/asm/kasan.h
@@ -28,7 +28,11 @@
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
 #define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
-#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
+/*
+ * Depending on the size of the virtual address space, the region may not be
+ * aligned on PGDIR_SIZE, so force its alignment to ease its population.
+ */
+#define KASAN_SHADOW_START	((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
 #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
 #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
 
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index e03559f9b35e..d089fe46f7d8 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,7 +31,20 @@
  * When not using MMU this corresponds to the first free page in
  * physical memory (aligned on a page boundary).
  */
+#ifdef CONFIG_64BIT
+#ifdef CONFIG_MMU
+#define PAGE_OFFSET		kernel_map.page_offset
+#else
+#define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
+#endif
+/*
+ * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
+ * define the PAGE_OFFSET value for SV39.
+ */
+#define PAGE_OFFSET_L3		_AC(0xffffffd800000000, UL)
+#else
 #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
+#endif /* CONFIG_64BIT */
 
 /*
  * Half of the kernel address space (half of the entries of the page global
@@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
 #endif /* CONFIG_MMU */
 
 struct kernel_mapping {
+	unsigned long page_offset;
 	unsigned long virt_addr;
 	uintptr_t phys_addr;
 	uintptr_t size;
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 0af6933a7100..11823004b87a 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -11,6 +11,8 @@
 #include <asm/tlb.h>
 
 #ifdef CONFIG_MMU
+#define __HAVE_ARCH_PUD_ALLOC_ONE
+#define __HAVE_ARCH_PUD_FREE
 #include <asm-generic/pgalloc.h>
 
 static inline void pmd_populate_kernel(struct mm_struct *mm,
@@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 
 	set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 }
+
+static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
+{
+	if (pgtable_l4_enabled) {
+		unsigned long pfn = virt_to_pfn(pud);
+
+		set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+	}
+}
+
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
+				     pud_t *pud)
+{
+	if (pgtable_l4_enabled) {
+		unsigned long pfn = virt_to_pfn(pud);
+
+		set_p4d_safe(p4d,
+			     __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+	}
+}
+
+#define pud_alloc_one pud_alloc_one
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	if (pgtable_l4_enabled)
+		return __pud_alloc_one(mm, addr);
+
+	return NULL;
+}
+
+#define pud_free pud_free
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+	if (pgtable_l4_enabled)
+		__pud_free(mm, pud);
+}
+
+#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
 #endif /* __PAGETABLE_PMD_FOLDED */
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 228261aa9628..bbbdd66e5e2f 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -8,16 +8,36 @@
 
 #include <linux/const.h>
 
-#define PGDIR_SHIFT     30
+extern bool pgtable_l4_enabled;
+
+#define PGDIR_SHIFT_L3  30
+#define PGDIR_SHIFT_L4  39
+#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
+
+#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
 /* Size of region mapped by a page global directory */
 #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
 #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
 
+/* pud is folded into pgd in case of 3-level page table */
+#define PUD_SHIFT      30
+#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
+#define PUD_MASK       (~(PUD_SIZE - 1))
+
 #define PMD_SHIFT       21
 /* Size of region mapped by a page middle directory */
 #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
 #define PMD_MASK        (~(PMD_SIZE - 1))
 
+/* Page Upper Directory entry */
+typedef struct {
+	unsigned long pud;
+} pud_t;
+
+#define pud_val(x)      ((x).pud)
+#define __pud(x)        ((pud_t) { (x) })
+#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
+
 /* Page Middle Directory entry */
 typedef struct {
 	unsigned long pmd;
@@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
 	set_pud(pudp, __pud(0));
 }
 
+static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
+{
+	return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
+}
+
+static inline unsigned long _pud_pfn(pud_t pud)
+{
+	return pud_val(pud) >> _PAGE_PFN_SHIFT;
+}
+
 static inline pmd_t *pud_pgtable(pud_t pud)
 {
 	return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
@@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
 	return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
 }
 
+#define mm_pud_folded  mm_pud_folded
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+	if (pgtable_l4_enabled)
+		return false;
+
+	return true;
+}
+
+#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+
 static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
 {
 	return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
@@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
 #define pmd_ERROR(e) \
 	pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
 
+#define pud_ERROR(e)   \
+	pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
+
+static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		*p4dp = p4d;
+	else
+		set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
+}
+
+static inline int p4d_none(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return (p4d_val(p4d) == 0);
+
+	return 0;
+}
+
+static inline int p4d_present(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return (p4d_val(p4d) & _PAGE_PRESENT);
+
+	return 1;
+}
+
+static inline int p4d_bad(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return !p4d_present(p4d);
+
+	return 0;
+}
+
+static inline void p4d_clear(p4d_t *p4d)
+{
+	if (pgtable_l4_enabled)
+		set_p4d(p4d, __p4d(0));
+}
+
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
+
+	return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
+}
+
+static inline struct page *p4d_page(p4d_t p4d)
+{
+	return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
+}
+
+#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+
+#define pud_offset pud_offset
+static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
+{
+	if (pgtable_l4_enabled)
+		return p4d_pgtable(*p4d) + pud_index(address);
+
+	return (pud_t *)p4d;
+}
+
 #endif /* _ASM_RISCV_PGTABLE_64_H */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index e1a52e22ad7e..e1c74ef4ead2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -51,7 +51,7 @@
  * position vmemmap directly below the VMALLOC region.
  */
 #ifdef CONFIG_64BIT
-#define VA_BITS		39
+#define VA_BITS		(pgtable_l4_enabled ? 48 : 39)
 #else
 #define VA_BITS		32
 #endif
@@ -90,8 +90,7 @@
 
 #ifndef __ASSEMBLY__
 
-/* Page Upper Directory not used in RISC-V */
-#include <asm-generic/pgtable-nopud.h>
+#include <asm-generic/pgtable-nop4d.h>
 #include <asm/page.h>
 #include <asm/tlbflush.h>
 #include <linux/mm_types.h>
@@ -113,6 +112,17 @@
 #define XIP_FIXUP(addr)		(addr)
 #endif /* CONFIG_XIP_KERNEL */
 
+struct pt_alloc_ops {
+	pte_t *(*get_pte_virt)(phys_addr_t pa);
+	phys_addr_t (*alloc_pte)(uintptr_t va);
+#ifndef __PAGETABLE_PMD_FOLDED
+	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
+	phys_addr_t (*alloc_pmd)(uintptr_t va);
+	pud_t *(*get_pud_virt)(phys_addr_t pa);
+	phys_addr_t (*alloc_pud)(uintptr_t va);
+#endif
+};
+
 #ifdef CONFIG_MMU
 /* Number of entries in the page global directory */
 #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
@@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
  * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
  */
 #ifdef CONFIG_64BIT
-#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
 #else
-#define TASK_SIZE FIXADDR_START
+#define TASK_SIZE	FIXADDR_START
+#define TASK_SIZE_MIN	TASK_SIZE
 #endif
 
 #else /* CONFIG_MMU */
@@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
 #define dtb_early_va	_dtb_early_va
 #define dtb_early_pa	_dtb_early_pa
 #endif /* CONFIG_XIP_KERNEL */
+extern u64 satp_mode;
+extern bool pgtable_l4_enabled;
 
 void paging_init(void);
 void misc_mem_init(void);
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 52c5ff9804c5..c3c0ed559770 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -95,7 +95,8 @@ relocate:
 
 	/* Compute satp for kernel page tables, but don't load it yet */
 	srl a2, a0, PAGE_SHIFT
-	li a1, SATP_MODE
+	la a1, satp_mode
+	REG_L a1, 0(a1)
 	or a2, a2, a1
 
 	/*
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index ee3459cb6750..a7246872bd30 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 switch_mm_fast:
 	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
 		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
-		  SATP_MODE);
+		  satp_mode);
 
 	if (need_flush_tlb)
 		local_flush_tlb_all();
@@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 static void set_mm_noasid(struct mm_struct *mm)
 {
 	/* Switch the page table and blindly nuke entire local TLB */
-	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
+	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
 	local_flush_tlb_all();
 }
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 1552226fb6bd..6a19a1b1caf8 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
 #define kernel_map	(*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
 #endif
 
+#ifdef CONFIG_64BIT
+u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
+#else
+u64 satp_mode = SATP_MODE_32;
+#endif
+EXPORT_SYMBOL(satp_mode);
+
+bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
+				true : false;
+EXPORT_SYMBOL(pgtable_l4_enabled);
+
 phys_addr_t phys_ram_base __ro_after_init;
 EXPORT_SYMBOL(phys_ram_base);
 
@@ -53,15 +64,6 @@ extern char _start[];
 void *_dtb_early_va __initdata;
 uintptr_t _dtb_early_pa __initdata;
 
-struct pt_alloc_ops {
-	pte_t *(*get_pte_virt)(phys_addr_t pa);
-	phys_addr_t (*alloc_pte)(uintptr_t va);
-#ifndef __PAGETABLE_PMD_FOLDED
-	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
-	phys_addr_t (*alloc_pmd)(uintptr_t va);
-#endif
-};
-
 static phys_addr_t dma32_phys_limit __initdata;
 
 static void __init zone_sizes_init(void)
@@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
 }
 
 #ifdef CONFIG_MMU
-static struct pt_alloc_ops _pt_ops __initdata;
+struct pt_alloc_ops _pt_ops __initdata;
 
 #ifdef CONFIG_XIP_KERNEL
 #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
@@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
 
 pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
+static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 
 #ifdef CONFIG_XIP_KERNEL
@@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
 #endif /* CONFIG_XIP_KERNEL */
 
+static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
+
+#ifdef CONFIG_XIP_KERNEL
+#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
+#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
+#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
+#endif /* CONFIG_XIP_KERNEL */
+
 static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
 {
 	/* Before MMU is enabled */
@@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
 
 static phys_addr_t __init alloc_pmd_early(uintptr_t va)
 {
-	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
+	BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
 
 	return (uintptr_t)early_pmd;
 }
@@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
 	create_pte_mapping(ptep, va, pa, sz, prot);
 }
 
-#define pgd_next_t		pmd_t
-#define alloc_pgd_next(__va)	pt_ops.alloc_pmd(__va)
-#define get_pgd_next_virt(__pa)	pt_ops.get_pmd_virt(__pa)
+static pud_t *__init get_pud_virt_early(phys_addr_t pa)
+{
+	return (pud_t *)((uintptr_t)pa);
+}
+
+static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
+{
+	clear_fixmap(FIX_PUD);
+	return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
+}
+
+static pud_t *__init get_pud_virt_late(phys_addr_t pa)
+{
+	return (pud_t *)__va(pa);
+}
+
+static phys_addr_t __init alloc_pud_early(uintptr_t va)
+{
+	/* Only one PUD is available for early mapping */
+	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
+
+	return (uintptr_t)early_pud;
+}
+
+static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
+{
+	return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+}
+
+static phys_addr_t alloc_pud_late(uintptr_t va)
+{
+	unsigned long vaddr;
+
+	vaddr = __get_free_page(GFP_KERNEL);
+	BUG_ON(!vaddr);
+	return __pa(vaddr);
+}
+
+static void __init create_pud_mapping(pud_t *pudp,
+				      uintptr_t va, phys_addr_t pa,
+				      phys_addr_t sz, pgprot_t prot)
+{
+	pmd_t *nextp;
+	phys_addr_t next_phys;
+	uintptr_t pud_index = pud_index(va);
+
+	if (sz == PUD_SIZE) {
+		if (pud_val(pudp[pud_index]) == 0)
+			pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
+		return;
+	}
+
+	if (pud_val(pudp[pud_index]) == 0) {
+		next_phys = pt_ops.alloc_pmd(va);
+		pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
+		nextp = pt_ops.get_pmd_virt(next_phys);
+		memset(nextp, 0, PAGE_SIZE);
+	} else {
+		next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
+		nextp = pt_ops.get_pmd_virt(next_phys);
+	}
+
+	create_pmd_mapping(nextp, va, pa, sz, prot);
+}
+
+#define pgd_next_t		pud_t
+#define alloc_pgd_next(__va)	(pgtable_l4_enabled ?			\
+		pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
+#define get_pgd_next_virt(__pa)	(pgtable_l4_enabled ?			\
+		pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
 #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
-	create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
-#define fixmap_pgd_next		fixmap_pmd
+				(pgtable_l4_enabled ?			\
+		create_pud_mapping(__nextp, __va, __pa, __sz, __prot) :	\
+		create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
+#define fixmap_pgd_next		(pgtable_l4_enabled ?			\
+		(uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
+#define trampoline_pgd_next	(pgtable_l4_enabled ?			\
+		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
+#define early_dtb_pgd_next	(pgtable_l4_enabled ?			\
+		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
 #else
 #define pgd_next_t		pte_t
 #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
 #define get_pgd_next_virt(__pa)	pt_ops.get_pte_virt(__pa)
 #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
 	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
-#define fixmap_pgd_next		fixmap_pte
+#define fixmap_pgd_next		((uintptr_t)fixmap_pte)
+#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
+#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
 #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
-#endif
+#endif /* __PAGETABLE_PMD_FOLDED */
 
 void __init create_pgd_mapping(pgd_t *pgdp,
 				      uintptr_t va, phys_addr_t pa,
@@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
 }
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
+#ifdef CONFIG_64BIT
+static void __init disable_pgtable_l4(void)
+{
+	pgtable_l4_enabled = false;
+	kernel_map.page_offset = PAGE_OFFSET_L3;
+	satp_mode = SATP_MODE_39;
+}
+
+/*
+ * There is a simple way to determine if 4-level is supported by the
+ * underlying hardware: establish 1:1 mapping in 4-level page table mode
+ * then read SATP to see if the configuration was taken into account
+ * meaning sv48 is supported.
+ */
+static __init void set_satp_mode(void)
+{
+	u64 identity_satp, hw_satp;
+	uintptr_t set_satp_mode_pmd;
+
+	set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
+	create_pgd_mapping(early_pg_dir,
+			   set_satp_mode_pmd, (uintptr_t)early_pud,
+			   PGDIR_SIZE, PAGE_TABLE);
+	create_pud_mapping(early_pud,
+			   set_satp_mode_pmd, (uintptr_t)early_pmd,
+			   PUD_SIZE, PAGE_TABLE);
+	/* Handle the case where set_satp_mode straddles 2 PMDs */
+	create_pmd_mapping(early_pmd,
+			   set_satp_mode_pmd, set_satp_mode_pmd,
+			   PMD_SIZE, PAGE_KERNEL_EXEC);
+	create_pmd_mapping(early_pmd,
+			   set_satp_mode_pmd + PMD_SIZE,
+			   set_satp_mode_pmd + PMD_SIZE,
+			   PMD_SIZE, PAGE_KERNEL_EXEC);
+
+	identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
+
+	local_flush_tlb_all();
+	csr_write(CSR_SATP, identity_satp);
+	hw_satp = csr_swap(CSR_SATP, 0ULL);
+	local_flush_tlb_all();
+
+	if (hw_satp != identity_satp)
+		disable_pgtable_l4();
+
+	memset(early_pg_dir, 0, PAGE_SIZE);
+	memset(early_pud, 0, PAGE_SIZE);
+	memset(early_pmd, 0, PAGE_SIZE);
+}
+#endif
+
 /*
  * setup_vm() is called from head.S with MMU-off.
  *
@@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
 	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
 
 	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
-			   IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
+			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
 			   PGDIR_SIZE,
 			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
 
+	if (pgtable_l4_enabled) {
+		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
+				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
+	}
+
 	if (IS_ENABLED(CONFIG_64BIT)) {
 		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
 				   pa, PMD_SIZE, PAGE_KERNEL);
@@ -593,6 +738,8 @@ void pt_ops_set_early(void)
 #ifndef __PAGETABLE_PMD_FOLDED
 	pt_ops.alloc_pmd = alloc_pmd_early;
 	pt_ops.get_pmd_virt = get_pmd_virt_early;
+	pt_ops.alloc_pud = alloc_pud_early;
+	pt_ops.get_pud_virt = get_pud_virt_early;
 #endif
 }
 
@@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
 #ifndef __PAGETABLE_PMD_FOLDED
 	pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
 	pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
+	pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
+	pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
 #endif
 }
 
@@ -625,6 +774,8 @@ void pt_ops_set_late(void)
 #ifndef __PAGETABLE_PMD_FOLDED
 	pt_ops.alloc_pmd = alloc_pmd_late;
 	pt_ops.get_pmd_virt = get_pmd_virt_late;
+	pt_ops.alloc_pud = alloc_pud_late;
+	pt_ops.get_pud_virt = get_pud_virt_late;
 #endif
 }
 
@@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
 
 	kernel_map.virt_addr = KERNEL_LINK_ADDR;
+	kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
 
 #ifdef CONFIG_XIP_KERNEL
 	kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
@@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	kernel_map.phys_addr = (uintptr_t)(&_start);
 	kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
 #endif
+
+#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
+	set_satp_mode();
+#endif
+
 	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
 	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
 
@@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 
 	/* Setup early PGD for fixmap */
 	create_pgd_mapping(early_pg_dir, FIXADDR_START,
-			   (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
+			   fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 
 #ifndef __PAGETABLE_PMD_FOLDED
-	/* Setup fixmap PMD */
+	/* Setup fixmap PUD and PMD */
+	if (pgtable_l4_enabled)
+		create_pud_mapping(fixmap_pud, FIXADDR_START,
+				   (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
 	create_pmd_mapping(fixmap_pmd, FIXADDR_START,
 			   (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
 	/* Setup trampoline PGD and PMD */
 	create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
-			   (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
+			   trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
+	if (pgtable_l4_enabled)
+		create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
+				   (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
 #ifdef CONFIG_XIP_KERNEL
 	create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
 			   kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
@@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	 * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
 	 * range can not span multiple pmds.
 	 */
-	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
+	BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
 	/* Clear fixmap PTE and PMD mappings */
 	clear_fixmap(FIX_PTE);
 	clear_fixmap(FIX_PMD);
+	clear_fixmap(FIX_PUD);
 
 	/* Move to swapper page table */
-	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
+	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
 	local_flush_tlb_all();
 
 	pt_ops_set_late();
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 1434a0225140..993f50571a3b 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -11,7 +11,29 @@
 #include <asm/fixmap.h>
 #include <asm/pgalloc.h>
 
+/*
+ * The KASAN shadow region must lie at a fixed address across sv39, sv48 and
+ * sv57, which is right before the kernel.
+ *
+ * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
+ * the page global directory with kasan_early_shadow_pmd.
+ *
+ * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
+ * must be divided as follows:
+ * - the first PGD entry, although incomplete, is populated with
+ *   kasan_early_shadow_pud/p4d
+ * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
+ * - the last PGD entry is shared with the kernel mapping so it is populated
+ *   at the lower pud/p4d levels
+ *
+ * In addition, when shallow populating a kasan region (for example vmalloc),
+ * this region may also not be aligned on PGDIR_SIZE, so we must go down to
+ * the pud level too.
+ */
+
 extern pgd_t early_pg_dir[PTRS_PER_PGD];
+extern struct pt_alloc_ops _pt_ops __initdata;
+#define pt_ops	_pt_ops
 
 static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
 {
@@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
 	set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
 }
 
-static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
+static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
 {
 	phys_addr_t phys_addr;
 	pmd_t *pmdp, *base_pmd;
 	unsigned long next;
 
-	base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
-	if (base_pmd == lm_alias(kasan_early_shadow_pmd))
+	if (pud_none(*pud)) {
 		base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+	} else {
+		base_pmd = (pmd_t *)pud_pgtable(*pud);
+		if (base_pmd == lm_alias(kasan_early_shadow_pmd))
+			base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+	}
 
 	pmdp = base_pmd + pmd_index(vaddr);
 
@@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
 	 * it entirely, memblock could allocate a page at a physical address
 	 * where KASAN is not populated yet and then we'd get a page fault.
 	 */
-	set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+	set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+}
+
+static void __init kasan_populate_pud(pgd_t *pgd,
+				      unsigned long vaddr, unsigned long end,
+				      bool early)
+{
+	phys_addr_t phys_addr;
+	pud_t *pudp, *base_pud;
+	unsigned long next;
+
+	if (early) {
+		/*
+		 * We can't use pgd_page_vaddr here as it would return a linear
+		 * mapping address, but the linear mapping is not set up yet:
+		 * when populating early_pg_dir we need the physical address,
+		 * and when populating swapper_pg_dir we need the kernel
+		 * virtual address, so use the pt_ops facility.
+		 */
+		base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
+	} else {
+		base_pud = (pud_t *)pgd_page_vaddr(*pgd);
+		if (base_pud == lm_alias(kasan_early_shadow_pud))
+			base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
+	}
+
+	pudp = base_pud + pud_index(vaddr);
+
+	do {
+		next = pud_addr_end(vaddr, end);
+
+		if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
+			if (early) {
+				phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
+				set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
+				continue;
+			} else {
+				phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
+				if (phys_addr) {
+					set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
+					continue;
+				}
+			}
+		}
+
+		kasan_populate_pmd(pudp, vaddr, next);
+	} while (pudp++, vaddr = next, vaddr != end);
+
+	/*
+	 * Wait for the whole PGD to be populated before setting the PGD in
+	 * the page table, otherwise, if we did set the PGD before populating
+	 * it entirely, memblock could allocate a page at a physical address
+	 * where KASAN is not populated yet and then we'd get a page fault.
+	 */
+	if (!early)
+		set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
 }
 
+#define kasan_early_shadow_pgd_next			(pgtable_l4_enabled ?	\
+				(uintptr_t)kasan_early_shadow_pud :		\
+				(uintptr_t)kasan_early_shadow_pmd)
+#define kasan_populate_pgd_next(pgdp, vaddr, next, early)			\
+		(pgtable_l4_enabled ?						\
+			kasan_populate_pud(pgdp, vaddr, next, early) :		\
+			kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
+
 static void __init kasan_populate_pgd(pgd_t *pgdp,
 				      unsigned long vaddr, unsigned long end,
 				      bool early)
@@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
 			}
 		}
 
-		kasan_populate_pmd(pgdp, vaddr, next);
+		kasan_populate_pgd_next(pgdp, vaddr, next, early);
 	} while (pgdp++, vaddr = next, vaddr != end);
 }
 
@@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
 	memset(start, KASAN_SHADOW_INIT, end - start);
 }
 
+static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
+					      unsigned long vaddr, unsigned long end,
+					      bool kasan_populate)
+{
+	unsigned long next;
+	pud_t *pudp, *base_pud;
+	pmd_t *base_pmd;
+	bool is_kasan_pmd;
+
+	base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
+	pudp = base_pud + pud_index(vaddr);
+
+	if (kasan_populate)
+		memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
+		       sizeof(pud_t) * PTRS_PER_PUD);
+
+	do {
+		next = pud_addr_end(vaddr, end);
+		is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
+
+		if (is_kasan_pmd) {
+			base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+			set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+		}
+	} while (pudp++, vaddr = next, vaddr != end);
+}
+
 static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
 {
 	unsigned long next;
 	void *p;
 	pgd_t *pgd_k = pgd_offset_k(vaddr);
+	bool is_kasan_pgd_next;
 
 	do {
 		next = pgd_addr_end(vaddr, end);
-		if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
+		is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
+				     (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
+
+		if (is_kasan_pgd_next) {
 			p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 			set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
 		}
+
+		if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
+			continue;
+
+		kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
 	} while (pgd_k++, vaddr = next, vaddr != end);
 }
 
diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
index 26e69788f27a..b3db5d91ed38 100644
--- a/drivers/firmware/efi/libstub/efi-stub.c
+++ b/drivers/firmware/efi/libstub/efi-stub.c
@@ -40,6 +40,8 @@
 
 #ifdef CONFIG_ARM64
 # define EFI_RT_VIRTUAL_LIMIT	DEFAULT_MAP_WINDOW_64
+#elif defined(CONFIG_RISCV)
+# define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE_MIN
 #else
 # define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE
 #endif
-- 
2.32.0



* [PATCH v3 07/13] riscv: Implement sv48 support
@ 2021-12-06 10:46   ` Alexandre Ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

By adding a new 4th level of page table, allow a 64-bit kernel to
address 2^48 bytes of virtual address space: in practice, that offers
128TB of virtual address space to userspace and allows up to 64TB of
physical memory.

If the underlying hardware does not support sv48, we automatically fall
back to a standard 3-level page table by folding the new PUD level into
the PGDIR level. To detect HW capabilities at runtime, we rely on the
SATP behavior of ignoring writes that select an unsupported mode.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/Kconfig                      |   4 +-
 arch/riscv/include/asm/csr.h            |   3 +-
 arch/riscv/include/asm/fixmap.h         |   1 +
 arch/riscv/include/asm/kasan.h          |   6 +-
 arch/riscv/include/asm/page.h           |  14 ++
 arch/riscv/include/asm/pgalloc.h        |  40 +++++
 arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
 arch/riscv/include/asm/pgtable.h        |  24 ++-
 arch/riscv/kernel/head.S                |   3 +-
 arch/riscv/mm/context.c                 |   4 +-
 arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
 arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
 drivers/firmware/efi/libstub/efi-stub.c |   2 +
 13 files changed, 514 insertions(+), 44 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ac6c0cd9bc29..d28fe0148e13 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -150,7 +150,7 @@ config PAGE_OFFSET
 	hex
 	default 0xC0000000 if 32BIT
 	default 0x80000000 if 64BIT && !MMU
-	default 0xffffffd800000000 if 64BIT
+	default 0xffffaf8000000000 if 64BIT
 
 config KASAN_SHADOW_OFFSET
 	hex
@@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
 
 config PGTABLE_LEVELS
 	int
-	default 3 if 64BIT
+	default 4 if 64BIT
 	default 2
 
 config LOCKDEP_SUPPORT
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 87ac65696871..3fdb971c7896 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -40,14 +40,13 @@
 #ifndef CONFIG_64BIT
 #define SATP_PPN	_AC(0x003FFFFF, UL)
 #define SATP_MODE_32	_AC(0x80000000, UL)
-#define SATP_MODE	SATP_MODE_32
 #define SATP_ASID_BITS	9
 #define SATP_ASID_SHIFT	22
 #define SATP_ASID_MASK	_AC(0x1FF, UL)
 #else
 #define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
 #define SATP_MODE_39	_AC(0x8000000000000000, UL)
-#define SATP_MODE	SATP_MODE_39
+#define SATP_MODE_48	_AC(0x9000000000000000, UL)
 #define SATP_ASID_BITS	16
 #define SATP_ASID_SHIFT	44
 #define SATP_ASID_MASK	_AC(0xFFFF, UL)
diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
index 54cbf07fb4e9..58a718573ad6 100644
--- a/arch/riscv/include/asm/fixmap.h
+++ b/arch/riscv/include/asm/fixmap.h
@@ -24,6 +24,7 @@ enum fixed_addresses {
 	FIX_HOLE,
 	FIX_PTE,
 	FIX_PMD,
+	FIX_PUD,
 	FIX_TEXT_POKE1,
 	FIX_TEXT_POKE0,
 	FIX_EARLYCON_MEM_BASE,
diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
index 743e6ff57996..0b85e363e778 100644
--- a/arch/riscv/include/asm/kasan.h
+++ b/arch/riscv/include/asm/kasan.h
@@ -28,7 +28,11 @@
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
 #define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
-#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
+/*
+ * Depending on the size of the virtual address space, the region may not be
+ * aligned on PGDIR_SIZE, so force its alignment to ease its population.
+ */
+#define KASAN_SHADOW_START	((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
 #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
 #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
 
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index e03559f9b35e..d089fe46f7d8 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,7 +31,20 @@
  * When not using MMU this corresponds to the first free page in
  * physical memory (aligned on a page boundary).
  */
+#ifdef CONFIG_64BIT
+#ifdef CONFIG_MMU
+#define PAGE_OFFSET		kernel_map.page_offset
+#else
+#define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
+#endif
+/*
+ * By default, the CONFIG_PAGE_OFFSET value corresponds to the SV48 address
+ * space, so define the PAGE_OFFSET value for SV39 here.
+ */
+#define PAGE_OFFSET_L3		_AC(0xffffffd800000000, UL)
+#else
 #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
+#endif /* CONFIG_64BIT */
 
 /*
  * Half of the kernel address space (half of the entries of the page global
@@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
 #endif /* CONFIG_MMU */
 
 struct kernel_mapping {
+	unsigned long page_offset;
 	unsigned long virt_addr;
 	uintptr_t phys_addr;
 	uintptr_t size;
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 0af6933a7100..11823004b87a 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -11,6 +11,8 @@
 #include <asm/tlb.h>
 
 #ifdef CONFIG_MMU
+#define __HAVE_ARCH_PUD_ALLOC_ONE
+#define __HAVE_ARCH_PUD_FREE
 #include <asm-generic/pgalloc.h>
 
 static inline void pmd_populate_kernel(struct mm_struct *mm,
@@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 
 	set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 }
+
+static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
+{
+	if (pgtable_l4_enabled) {
+		unsigned long pfn = virt_to_pfn(pud);
+
+		set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+	}
+}
+
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
+				     pud_t *pud)
+{
+	if (pgtable_l4_enabled) {
+		unsigned long pfn = virt_to_pfn(pud);
+
+		set_p4d_safe(p4d,
+			     __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+	}
+}
+
+#define pud_alloc_one pud_alloc_one
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	if (pgtable_l4_enabled)
+		return __pud_alloc_one(mm, addr);
+
+	return NULL;
+}
+
+#define pud_free pud_free
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+	if (pgtable_l4_enabled)
+		__pud_free(mm, pud);
+}
+
+#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
 #endif /* __PAGETABLE_PMD_FOLDED */
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 228261aa9628..bbbdd66e5e2f 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -8,16 +8,36 @@
 
 #include <linux/const.h>
 
-#define PGDIR_SHIFT     30
+extern bool pgtable_l4_enabled;
+
+#define PGDIR_SHIFT_L3  30
+#define PGDIR_SHIFT_L4  39
+#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
+
+#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
 /* Size of region mapped by a page global directory */
 #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
 #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
 
+/* pud is folded into pgd in case of 3-level page table */
+#define PUD_SHIFT      30
+#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
+#define PUD_MASK       (~(PUD_SIZE - 1))
+
 #define PMD_SHIFT       21
 /* Size of region mapped by a page middle directory */
 #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
 #define PMD_MASK        (~(PMD_SIZE - 1))
 
+/* Page Upper Directory entry */
+typedef struct {
+	unsigned long pud;
+} pud_t;
+
+#define pud_val(x)      ((x).pud)
+#define __pud(x)        ((pud_t) { (x) })
+#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
+
 /* Page Middle Directory entry */
 typedef struct {
 	unsigned long pmd;
@@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
 	set_pud(pudp, __pud(0));
 }
 
+static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
+{
+	return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
+}
+
+static inline unsigned long _pud_pfn(pud_t pud)
+{
+	return pud_val(pud) >> _PAGE_PFN_SHIFT;
+}
+
 static inline pmd_t *pud_pgtable(pud_t pud)
 {
 	return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
@@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
 	return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
 }
 
+#define mm_pud_folded  mm_pud_folded
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+	if (pgtable_l4_enabled)
+		return false;
+
+	return true;
+}
+
+#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+
 static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
 {
 	return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
@@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
 #define pmd_ERROR(e) \
 	pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
 
+#define pud_ERROR(e)   \
+	pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
+
+static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		*p4dp = p4d;
+	else
+		set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
+}
+
+static inline int p4d_none(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return (p4d_val(p4d) == 0);
+
+	return 0;
+}
+
+static inline int p4d_present(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return (p4d_val(p4d) & _PAGE_PRESENT);
+
+	return 1;
+}
+
+static inline int p4d_bad(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return !p4d_present(p4d);
+
+	return 0;
+}
+
+static inline void p4d_clear(p4d_t *p4d)
+{
+	if (pgtable_l4_enabled)
+		set_p4d(p4d, __p4d(0));
+}
+
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+	if (pgtable_l4_enabled)
+		return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
+
+	return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
+}
+
+static inline struct page *p4d_page(p4d_t p4d)
+{
+	return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
+}
+
+#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+
+#define pud_offset pud_offset
+static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
+{
+	if (pgtable_l4_enabled)
+		return p4d_pgtable(*p4d) + pud_index(address);
+
+	return (pud_t *)p4d;
+}
+
 #endif /* _ASM_RISCV_PGTABLE_64_H */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index e1a52e22ad7e..e1c74ef4ead2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -51,7 +51,7 @@
  * position vmemmap directly below the VMALLOC region.
  */
 #ifdef CONFIG_64BIT
-#define VA_BITS		39
+#define VA_BITS		(pgtable_l4_enabled ? 48 : 39)
 #else
 #define VA_BITS		32
 #endif
@@ -90,8 +90,7 @@
 
 #ifndef __ASSEMBLY__
 
-/* Page Upper Directory not used in RISC-V */
-#include <asm-generic/pgtable-nopud.h>
+#include <asm-generic/pgtable-nop4d.h>
 #include <asm/page.h>
 #include <asm/tlbflush.h>
 #include <linux/mm_types.h>
@@ -113,6 +112,17 @@
 #define XIP_FIXUP(addr)		(addr)
 #endif /* CONFIG_XIP_KERNEL */
 
+struct pt_alloc_ops {
+	pte_t *(*get_pte_virt)(phys_addr_t pa);
+	phys_addr_t (*alloc_pte)(uintptr_t va);
+#ifndef __PAGETABLE_PMD_FOLDED
+	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
+	phys_addr_t (*alloc_pmd)(uintptr_t va);
+	pud_t *(*get_pud_virt)(phys_addr_t pa);
+	phys_addr_t (*alloc_pud)(uintptr_t va);
+#endif
+};
+
 #ifdef CONFIG_MMU
 /* Number of entries in the page global directory */
 #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
@@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
  * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
  */
 #ifdef CONFIG_64BIT
-#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
 #else
-#define TASK_SIZE FIXADDR_START
+#define TASK_SIZE	FIXADDR_START
+#define TASK_SIZE_MIN	TASK_SIZE
 #endif
 
 #else /* CONFIG_MMU */
@@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
 #define dtb_early_va	_dtb_early_va
 #define dtb_early_pa	_dtb_early_pa
 #endif /* CONFIG_XIP_KERNEL */
+extern u64 satp_mode;
+extern bool pgtable_l4_enabled;
 
 void paging_init(void);
 void misc_mem_init(void);
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 52c5ff9804c5..c3c0ed559770 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -95,7 +95,8 @@ relocate:
 
 	/* Compute satp for kernel page tables, but don't load it yet */
 	srl a2, a0, PAGE_SHIFT
-	li a1, SATP_MODE
+	la a1, satp_mode
+	REG_L a1, 0(a1)
 	or a2, a2, a1
 
 	/*
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index ee3459cb6750..a7246872bd30 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 switch_mm_fast:
 	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
 		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
-		  SATP_MODE);
+		  satp_mode);
 
 	if (need_flush_tlb)
 		local_flush_tlb_all();
@@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 static void set_mm_noasid(struct mm_struct *mm)
 {
 	/* Switch the page table and blindly nuke entire local TLB */
-	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
+	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
 	local_flush_tlb_all();
 }
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 1552226fb6bd..6a19a1b1caf8 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
 #define kernel_map	(*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
 #endif
 
+#ifdef CONFIG_64BIT
+u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
+#else
+u64 satp_mode = SATP_MODE_32;
+#endif
+EXPORT_SYMBOL(satp_mode);
+
+bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
+				true : false;
+EXPORT_SYMBOL(pgtable_l4_enabled);
+
 phys_addr_t phys_ram_base __ro_after_init;
 EXPORT_SYMBOL(phys_ram_base);
 
@@ -53,15 +64,6 @@ extern char _start[];
 void *_dtb_early_va __initdata;
 uintptr_t _dtb_early_pa __initdata;
 
-struct pt_alloc_ops {
-	pte_t *(*get_pte_virt)(phys_addr_t pa);
-	phys_addr_t (*alloc_pte)(uintptr_t va);
-#ifndef __PAGETABLE_PMD_FOLDED
-	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
-	phys_addr_t (*alloc_pmd)(uintptr_t va);
-#endif
-};
-
 static phys_addr_t dma32_phys_limit __initdata;
 
 static void __init zone_sizes_init(void)
@@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
 }
 
 #ifdef CONFIG_MMU
-static struct pt_alloc_ops _pt_ops __initdata;
+struct pt_alloc_ops _pt_ops __initdata;
 
 #ifdef CONFIG_XIP_KERNEL
 #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
@@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
 
 pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
+static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 
 #ifdef CONFIG_XIP_KERNEL
@@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
 #endif /* CONFIG_XIP_KERNEL */
 
+static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
+
+#ifdef CONFIG_XIP_KERNEL
+#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
+#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
+#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
+#endif /* CONFIG_XIP_KERNEL */
+
 static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
 {
 	/* Before MMU is enabled */
@@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
 
 static phys_addr_t __init alloc_pmd_early(uintptr_t va)
 {
-	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
+	BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
 
 	return (uintptr_t)early_pmd;
 }
@@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
 	create_pte_mapping(ptep, va, pa, sz, prot);
 }
 
-#define pgd_next_t		pmd_t
-#define alloc_pgd_next(__va)	pt_ops.alloc_pmd(__va)
-#define get_pgd_next_virt(__pa)	pt_ops.get_pmd_virt(__pa)
+static pud_t *__init get_pud_virt_early(phys_addr_t pa)
+{
+	return (pud_t *)((uintptr_t)pa);
+}
+
+static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
+{
+	clear_fixmap(FIX_PUD);
+	return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
+}
+
+static pud_t *__init get_pud_virt_late(phys_addr_t pa)
+{
+	return (pud_t *)__va(pa);
+}
+
+static phys_addr_t __init alloc_pud_early(uintptr_t va)
+{
+	/* Only one PUD is available for early mapping */
+	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
+
+	return (uintptr_t)early_pud;
+}
+
+static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
+{
+	return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+}
+
+static phys_addr_t alloc_pud_late(uintptr_t va)
+{
+	unsigned long vaddr;
+
+	vaddr = __get_free_page(GFP_KERNEL);
+	BUG_ON(!vaddr);
+	return __pa(vaddr);
+}
+
+static void __init create_pud_mapping(pud_t *pudp,
+				      uintptr_t va, phys_addr_t pa,
+				      phys_addr_t sz, pgprot_t prot)
+{
+	pmd_t *nextp;
+	phys_addr_t next_phys;
+	uintptr_t pud_index = pud_index(va);
+
+	if (sz == PUD_SIZE) {
+		if (pud_val(pudp[pud_index]) == 0)
+			pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
+		return;
+	}
+
+	if (pud_val(pudp[pud_index]) == 0) {
+		next_phys = pt_ops.alloc_pmd(va);
+		pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
+		nextp = pt_ops.get_pmd_virt(next_phys);
+		memset(nextp, 0, PAGE_SIZE);
+	} else {
+		next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
+		nextp = pt_ops.get_pmd_virt(next_phys);
+	}
+
+	create_pmd_mapping(nextp, va, pa, sz, prot);
+}
+
+#define pgd_next_t		pud_t
+#define alloc_pgd_next(__va)	(pgtable_l4_enabled ?			\
+		pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
+#define get_pgd_next_virt(__pa)	(pgtable_l4_enabled ?			\
+		pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
 #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
-	create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
-#define fixmap_pgd_next		fixmap_pmd
+				(pgtable_l4_enabled ?			\
+		create_pud_mapping(__nextp, __va, __pa, __sz, __prot) :	\
+		create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
+#define fixmap_pgd_next		(pgtable_l4_enabled ?			\
+		(uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
+#define trampoline_pgd_next	(pgtable_l4_enabled ?			\
+		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
+#define early_dtb_pgd_next	(pgtable_l4_enabled ?			\
+		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
 #else
 #define pgd_next_t		pte_t
 #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
 #define get_pgd_next_virt(__pa)	pt_ops.get_pte_virt(__pa)
 #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
 	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
-#define fixmap_pgd_next		fixmap_pte
+#define fixmap_pgd_next		((uintptr_t)fixmap_pte)
+#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
+#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
 #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
-#endif
+#endif /* __PAGETABLE_PMD_FOLDED */
 
 void __init create_pgd_mapping(pgd_t *pgdp,
 				      uintptr_t va, phys_addr_t pa,
@@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
 }
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
+#ifdef CONFIG_64BIT
+static void __init disable_pgtable_l4(void)
+{
+	pgtable_l4_enabled = false;
+	kernel_map.page_offset = PAGE_OFFSET_L3;
+	satp_mode = SATP_MODE_39;
+}
+
+/*
+ * There is a simple way to determine if 4-level is supported by the
+ * underlying hardware: establish a 1:1 mapping in 4-level page table mode,
+ * then read SATP back: if the write was taken into account, sv48 is
+ * supported.
+ */
+static __init void set_satp_mode(void)
+{
+	u64 identity_satp, hw_satp;
+	uintptr_t set_satp_mode_pmd;
+
+	set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
+	create_pgd_mapping(early_pg_dir,
+			   set_satp_mode_pmd, (uintptr_t)early_pud,
+			   PGDIR_SIZE, PAGE_TABLE);
+	create_pud_mapping(early_pud,
+			   set_satp_mode_pmd, (uintptr_t)early_pmd,
+			   PUD_SIZE, PAGE_TABLE);
+	/* Handle the case where set_satp_mode straddles 2 PMDs */
+	create_pmd_mapping(early_pmd,
+			   set_satp_mode_pmd, set_satp_mode_pmd,
+			   PMD_SIZE, PAGE_KERNEL_EXEC);
+	create_pmd_mapping(early_pmd,
+			   set_satp_mode_pmd + PMD_SIZE,
+			   set_satp_mode_pmd + PMD_SIZE,
+			   PMD_SIZE, PAGE_KERNEL_EXEC);
+
+	identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
+
+	local_flush_tlb_all();
+	csr_write(CSR_SATP, identity_satp);
+	hw_satp = csr_swap(CSR_SATP, 0ULL);
+	local_flush_tlb_all();
+
+	if (hw_satp != identity_satp)
+		disable_pgtable_l4();
+
+	memset(early_pg_dir, 0, PAGE_SIZE);
+	memset(early_pud, 0, PAGE_SIZE);
+	memset(early_pmd, 0, PAGE_SIZE);
+}
+#endif
+
 /*
  * setup_vm() is called from head.S with MMU-off.
  *
@@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
 	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
 
 	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
-			   IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
+			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
 			   PGDIR_SIZE,
 			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
 
+	if (pgtable_l4_enabled) {
+		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
+				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
+	}
+
 	if (IS_ENABLED(CONFIG_64BIT)) {
 		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
 				   pa, PMD_SIZE, PAGE_KERNEL);
@@ -593,6 +738,8 @@ void pt_ops_set_early(void)
 #ifndef __PAGETABLE_PMD_FOLDED
 	pt_ops.alloc_pmd = alloc_pmd_early;
 	pt_ops.get_pmd_virt = get_pmd_virt_early;
+	pt_ops.alloc_pud = alloc_pud_early;
+	pt_ops.get_pud_virt = get_pud_virt_early;
 #endif
 }
 
@@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
 #ifndef __PAGETABLE_PMD_FOLDED
 	pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
 	pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
+	pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
+	pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
 #endif
 }
 
@@ -625,6 +774,8 @@ void pt_ops_set_late(void)
 #ifndef __PAGETABLE_PMD_FOLDED
 	pt_ops.alloc_pmd = alloc_pmd_late;
 	pt_ops.get_pmd_virt = get_pmd_virt_late;
+	pt_ops.alloc_pud = alloc_pud_late;
+	pt_ops.get_pud_virt = get_pud_virt_late;
 #endif
 }
 
@@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
 
 	kernel_map.virt_addr = KERNEL_LINK_ADDR;
+	kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
 
 #ifdef CONFIG_XIP_KERNEL
 	kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
@@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	kernel_map.phys_addr = (uintptr_t)(&_start);
 	kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
 #endif
+
+#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
+	set_satp_mode();
+#endif
+
 	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
 	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
 
@@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 
 	/* Setup early PGD for fixmap */
 	create_pgd_mapping(early_pg_dir, FIXADDR_START,
-			   (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
+			   fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 
 #ifndef __PAGETABLE_PMD_FOLDED
-	/* Setup fixmap PMD */
+	/* Setup fixmap PUD and PMD */
+	if (pgtable_l4_enabled)
+		create_pud_mapping(fixmap_pud, FIXADDR_START,
+				   (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
 	create_pmd_mapping(fixmap_pmd, FIXADDR_START,
 			   (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
 	/* Setup trampoline PGD and PMD */
 	create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
-			   (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
+			   trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
+	if (pgtable_l4_enabled)
+		create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
+				   (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
 #ifdef CONFIG_XIP_KERNEL
 	create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
 			   kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
@@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	 * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
 	 * range can not span multiple pmds.
 	 */
-	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
+	BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
 	/* Clear fixmap PTE and PMD mappings */
 	clear_fixmap(FIX_PTE);
 	clear_fixmap(FIX_PMD);
+	clear_fixmap(FIX_PUD);
 
 	/* Move to swapper page table */
-	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
+	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
 	local_flush_tlb_all();
 
 	pt_ops_set_late();
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 1434a0225140..993f50571a3b 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -11,7 +11,29 @@
 #include <asm/fixmap.h>
 #include <asm/pgalloc.h>
 
+/*
+ * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
+ * which is right before the kernel.
+ *
+ * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
+ * the page global directory with kasan_early_shadow_pmd.
+ *
+ * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
+ * must be divided as follows:
+ * - the first PGD entry, although incomplete, is populated with
+ *   kasan_early_shadow_pud/p4d
+ * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
+ * - the last PGD entry is shared with the kernel mapping so populated at the
+ *   lower levels pud/p4d
+ *
+ * In addition, when shallow populating a kasan region (for example vmalloc),
+ * this region may also not be aligned on PGDIR size, so we must go down to the
+ * pud level too.
+ */
+
 extern pgd_t early_pg_dir[PTRS_PER_PGD];
+extern struct pt_alloc_ops _pt_ops __initdata;
+#define pt_ops	_pt_ops
 
 static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
 {
@@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
 	set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
 }
 
-static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
+static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
 {
 	phys_addr_t phys_addr;
 	pmd_t *pmdp, *base_pmd;
 	unsigned long next;
 
-	base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
-	if (base_pmd == lm_alias(kasan_early_shadow_pmd))
+	if (pud_none(*pud)) {
 		base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+	} else {
+		base_pmd = (pmd_t *)pud_pgtable(*pud);
+		if (base_pmd == lm_alias(kasan_early_shadow_pmd))
+			base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+	}
 
 	pmdp = base_pmd + pmd_index(vaddr);
 
@@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
 	 * it entirely, memblock could allocate a page at a physical address
 	 * where KASAN is not populated yet and then we'd get a page fault.
 	 */
-	set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+	set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+}
+
+static void __init kasan_populate_pud(pgd_t *pgd,
+				      unsigned long vaddr, unsigned long end,
+				      bool early)
+{
+	phys_addr_t phys_addr;
+	pud_t *pudp, *base_pud;
+	unsigned long next;
+
+	if (early) {
+		/*
+		 * We can't use pgd_page_vaddr here as it would return a linear
+		 * mapping address but it is not mapped yet, but when populating
+		 * early_pg_dir, we need the physical address and when populating
+		 * swapper_pg_dir, we need the kernel virtual address so use
+		 * pt_ops facility.
+		 */
+		base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
+	} else {
+		base_pud = (pud_t *)pgd_page_vaddr(*pgd);
+		if (base_pud == lm_alias(kasan_early_shadow_pud))
+			base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
+	}
+
+	pudp = base_pud + pud_index(vaddr);
+
+	do {
+		next = pud_addr_end(vaddr, end);
+
+		if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
+			if (early) {
+				phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
+				set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
+				continue;
+			} else {
+				phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
+				if (phys_addr) {
+					set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
+					continue;
+				}
+			}
+		}
+
+		kasan_populate_pmd(pudp, vaddr, next);
+	} while (pudp++, vaddr = next, vaddr != end);
+
+	/*
+	 * Wait for the whole PGD to be populated before setting the PGD in
+	 * the page table, otherwise, if we did set the PGD before populating
+	 * it entirely, memblock could allocate a page at a physical address
+	 * where KASAN is not populated yet and then we'd get a page fault.
+	 */
+	if (!early)
+		set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
 }
 
+#define kasan_early_shadow_pgd_next			(pgtable_l4_enabled ?	\
+				(uintptr_t)kasan_early_shadow_pud :		\
+				(uintptr_t)kasan_early_shadow_pmd)
+#define kasan_populate_pgd_next(pgdp, vaddr, next, early)			\
+		(pgtable_l4_enabled ?						\
+			kasan_populate_pud(pgdp, vaddr, next, early) :		\
+			kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
+
 static void __init kasan_populate_pgd(pgd_t *pgdp,
 				      unsigned long vaddr, unsigned long end,
 				      bool early)
@@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
 			}
 		}
 
-		kasan_populate_pmd(pgdp, vaddr, next);
+		kasan_populate_pgd_next(pgdp, vaddr, next, early);
 	} while (pgdp++, vaddr = next, vaddr != end);
 }
 
@@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
 	memset(start, KASAN_SHADOW_INIT, end - start);
 }
 
+static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
+					      unsigned long vaddr, unsigned long end,
+					      bool kasan_populate)
+{
+	unsigned long next;
+	pud_t *pudp, *base_pud;
+	pmd_t *base_pmd;
+	bool is_kasan_pmd;
+
+	base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
+	pudp = base_pud + pud_index(vaddr);
+
+	if (kasan_populate)
+		memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
+		       sizeof(pud_t) * PTRS_PER_PUD);
+
+	do {
+		next = pud_addr_end(vaddr, end);
+		is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
+
+		if (is_kasan_pmd) {
+			base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+			set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+		}
+	} while (pudp++, vaddr = next, vaddr != end);
+}
+
 static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
 {
 	unsigned long next;
 	void *p;
 	pgd_t *pgd_k = pgd_offset_k(vaddr);
+	bool is_kasan_pgd_next;
 
 	do {
 		next = pgd_addr_end(vaddr, end);
-		if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
+		is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
+				     (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
+
+		if (is_kasan_pgd_next) {
 			p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 			set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
 		}
+
+		if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
+			continue;
+
+		kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
 	} while (pgd_k++, vaddr = next, vaddr != end);
 }
 
diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
index 26e69788f27a..b3db5d91ed38 100644
--- a/drivers/firmware/efi/libstub/efi-stub.c
+++ b/drivers/firmware/efi/libstub/efi-stub.c
@@ -40,6 +40,8 @@
 
 #ifdef CONFIG_ARM64
 # define EFI_RT_VIRTUAL_LIMIT	DEFAULT_MAP_WINDOW_64
+#elif defined(CONFIG_RISCV)
+# define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE_MIN
 #else
 # define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE
 #endif
-- 
2.32.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 70+ messages in thread
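The kasan_populate_pud() walk added above maps a chunk at PUD granularity whenever it is PUD-aligned and at least PUD_SIZE long, and otherwise descends to the pmd level. The granularity decision can be sketched as a standalone walker (a simplified sketch with hypothetical helper names, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define PUD_SHIFT 30
#define PUD_SIZE  (1UL << PUD_SHIFT)

/* Hypothetical stand-in for pud_addr_end(): the next PUD boundary,
 * clamped to the end of the overall range. */
static uint64_t pud_addr_end(uint64_t vaddr, uint64_t end)
{
	uint64_t boundary = (vaddr + PUD_SIZE) & ~(PUD_SIZE - 1);

	return boundary < end ? boundary : end;
}

/*
 * Walk [vaddr, end) the way kasan_populate_pud() does: use one
 * PUD-sized mapping when the chunk is PUD-aligned and spans a whole
 * PUD, otherwise fall back to populating the pmd level. Returns the
 * number of PUD-level mappings; *small counts pmd-level fallbacks.
 */
static int walk_pud_range(uint64_t vaddr, uint64_t end, int *small)
{
	int large = 0;
	uint64_t next;

	do {
		next = pud_addr_end(vaddr, end);
		if ((vaddr & (PUD_SIZE - 1)) == 0 && next - vaddr >= PUD_SIZE)
			large++;	/* whole PUD: map it directly */
		else
			(*small)++;	/* partial PUD: go down one level */
	} while (vaddr = next, vaddr != end);

	return large;
}
```

For a range starting halfway into a PUD, only the leading partial chunk falls back to the pmd level; the rest is mapped at PUD granularity, which is what keeps the early kasan population cheap.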

* [PATCH v3 08/13] riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti, Palmer Dabbelt

Now that the mmu type is determined at runtime using the SATP
characteristics, use the global variable pgtable_l4_enabled to report
the mmu type of the processor through /proc/cpuinfo instead of relying
on device tree info.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
---
 arch/riscv/kernel/cpu.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 6d59e6906fdd..dea9b1c31889 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -7,6 +7,7 @@
 #include <linux/seq_file.h>
 #include <linux/of.h>
 #include <asm/smp.h>
+#include <asm/pgtable.h>
 
 /*
  * Returns the hart ID of the given device tree node, or -ENODEV if the node
@@ -70,18 +71,19 @@ static void print_isa(struct seq_file *f, const char *isa)
 	seq_puts(f, "\n");
 }
 
-static void print_mmu(struct seq_file *f, const char *mmu_type)
+static void print_mmu(struct seq_file *f)
 {
+	char sv_type[16];
+
 #if defined(CONFIG_32BIT)
-	if (strcmp(mmu_type, "riscv,sv32") != 0)
-		return;
+	strncpy(sv_type, "sv32", 5);
 #elif defined(CONFIG_64BIT)
-	if (strcmp(mmu_type, "riscv,sv39") != 0 &&
-	    strcmp(mmu_type, "riscv,sv48") != 0)
-		return;
+	if (pgtable_l4_enabled)
+		strncpy(sv_type, "sv48", 5);
+	else
+		strncpy(sv_type, "sv39", 5);
 #endif
-
-	seq_printf(f, "mmu\t\t: %s\n", mmu_type+6);
+	seq_printf(f, "mmu\t\t: %s\n", sv_type);
 }
 
 static void *c_start(struct seq_file *m, loff_t *pos)
@@ -106,14 +108,13 @@ static int c_show(struct seq_file *m, void *v)
 {
 	unsigned long cpu_id = (unsigned long)v - 1;
 	struct device_node *node = of_get_cpu_node(cpu_id, NULL);
-	const char *compat, *isa, *mmu;
+	const char *compat, *isa;
 
 	seq_printf(m, "processor\t: %lu\n", cpu_id);
 	seq_printf(m, "hart\t\t: %lu\n", cpuid_to_hartid_map(cpu_id));
 	if (!of_property_read_string(node, "riscv,isa", &isa))
 		print_isa(m, isa);
-	if (!of_property_read_string(node, "mmu-type", &mmu))
-		print_mmu(m, mmu);
+	print_mmu(m);
 	if (!of_property_read_string(node, "compatible", &compat)
 	    && strcmp(compat, "riscv"))
 		seq_printf(m, "uarch\t\t: %s\n", compat);
-- 
2.32.0


* [PATCH v3 09/13] riscv: Explicit comment about user virtual address space size
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti, Palmer Dabbelt

Precisely define the size of the user-accessible virtual address space
for the sv32/39/48 mmu types and explain why the whole virtual address
space is split evenly between kernel and user space.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
---
 arch/riscv/include/asm/pgtable.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index e1c74ef4ead2..fe1701329237 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -677,6 +677,15 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 /*
  * Task size is 0x4000000000 for RV64 or 0x9fc00000 for RV32.
  * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
+ * Task size is:
+ * -     0x9fc00000 (~2.5GB) for RV32.
+ * -   0x4000000000 ( 256GB) for RV64 using SV39 mmu
+ * - 0x800000000000 ( 128TB) for RV64 using SV48 mmu
+ *
+ * Note that PGDIR_SIZE must evenly divide TASK_SIZE since "RISC-V
+ * Instruction Set Manual Volume II: Privileged Architecture" states that
+ * "load and store effective addresses, which are 64bits, must have bits
+ * 63–48 all equal to bit 47, or else a page-fault exception will occur."
  */
 #ifdef CONFIG_64BIT
 #define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
-- 
2.32.0


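The TASK_SIZE values quoted in the new comment can be reproduced from the paging constants: with 512 PGD entries, half the virtual address space goes to user space. A sketch of the arithmetic (macro and helper names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PGD 512UL	/* 9 bits of virtual page number per level */

/* PGDIR_SHIFT is 30 for sv39 (3 levels) and 39 for sv48 (4 levels). */
static uint64_t task_size(unsigned int pgdir_shift)
{
	uint64_t pgdir_size = 1UL << pgdir_shift;

	/* Half of the virtual address space goes to user space. */
	return pgdir_size * PTRS_PER_PGD / 2;
}
```

task_size(30) gives 0x4000000000 (256 GB, sv39) and task_size(39) gives 0x800000000000 (128 TB, sv48), matching the values in the comment, and both are PGDIR_SIZE-aligned by construction.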
* [PATCH v3 10/13] riscv: Improve virtual kernel memory layout dump
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

With the arrival of sv48 and its large address space, it would be
cumbersome to statically choose the unit used to print the different
portions of the virtual memory layout: instead, determine it
dynamically.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/mm/init.c               | 67 +++++++++++++++++++++++-------
 drivers/pci/controller/pci-xgene.c |  2 +-
 include/linux/sizes.h              |  1 +
 3 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 6a19a1b1caf8..28de6ea0a720 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -79,37 +79,74 @@ static void __init zone_sizes_init(void)
 }
 
 #if defined(CONFIG_MMU) && defined(CONFIG_DEBUG_VM)
+
+#define LOG2_SZ_1K  ilog2(SZ_1K)
+#define LOG2_SZ_1M  ilog2(SZ_1M)
+#define LOG2_SZ_1G  ilog2(SZ_1G)
+#define LOG2_SZ_1T  ilog2(SZ_1T)
+
 static inline void print_mlk(char *name, unsigned long b, unsigned long t)
 {
 	pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld kB)\n", name, b, t,
-		  (((t) - (b)) >> 10));
+		  (((t) - (b)) >> LOG2_SZ_1K));
 }
 
 static inline void print_mlm(char *name, unsigned long b, unsigned long t)
 {
 	pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld MB)\n", name, b, t,
-		  (((t) - (b)) >> 20));
+		  (((t) - (b)) >> LOG2_SZ_1M));
+}
+
+static inline void print_mlg(char *name, unsigned long b, unsigned long t)
+{
+	pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld GB)\n", name, b, t,
+		  (((t) - (b)) >> LOG2_SZ_1G));
+}
+
+#ifdef CONFIG_64BIT
+static inline void print_mlt(char *name, unsigned long b, unsigned long t)
+{
+	pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld TB)\n", name, b, t,
+		  (((t) - (b)) >> LOG2_SZ_1T));
+}
+#endif
+
+static inline void print_ml(char *name, unsigned long b, unsigned long t)
+{
+	unsigned long diff = t - b;
+
+#ifdef CONFIG_64BIT
+	if ((diff >> LOG2_SZ_1T) >= 10)
+		print_mlt(name, b, t);
+	else
+#endif
+	if ((diff >> LOG2_SZ_1G) >= 10)
+		print_mlg(name, b, t);
+	else if ((diff >> LOG2_SZ_1M) >= 10)
+		print_mlm(name, b, t);
+	else
+		print_mlk(name, b, t);
 }
 
 static void __init print_vm_layout(void)
 {
 	pr_notice("Virtual kernel memory layout:\n");
-	print_mlk("fixmap", (unsigned long)FIXADDR_START,
-		  (unsigned long)FIXADDR_TOP);
-	print_mlm("pci io", (unsigned long)PCI_IO_START,
-		  (unsigned long)PCI_IO_END);
-	print_mlm("vmemmap", (unsigned long)VMEMMAP_START,
-		  (unsigned long)VMEMMAP_END);
-	print_mlm("vmalloc", (unsigned long)VMALLOC_START,
-		  (unsigned long)VMALLOC_END);
-	print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
-		  (unsigned long)high_memory);
+	print_ml("fixmap", (unsigned long)FIXADDR_START,
+		 (unsigned long)FIXADDR_TOP);
+	print_ml("pci io", (unsigned long)PCI_IO_START,
+		 (unsigned long)PCI_IO_END);
+	print_ml("vmemmap", (unsigned long)VMEMMAP_START,
+		 (unsigned long)VMEMMAP_END);
+	print_ml("vmalloc", (unsigned long)VMALLOC_START,
+		 (unsigned long)VMALLOC_END);
+	print_ml("lowmem", (unsigned long)PAGE_OFFSET,
+		 (unsigned long)high_memory);
 #ifdef CONFIG_64BIT
 #ifdef CONFIG_KASAN
-	print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
+	print_ml("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
 #endif
-	print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
-		  (unsigned long)ADDRESS_SPACE_END);
+	print_ml("kernel", (unsigned long)KERNEL_LINK_ADDR,
+		 (unsigned long)ADDRESS_SPACE_END);
 #endif
 }
 #else
diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index e64536047b65..187dcf8a9694 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -21,6 +21,7 @@
 #include <linux/pci-ecam.h>
 #include <linux/platform_device.h>
 #include <linux/slab.h>
+#include <linux/sizes.h>
 
 #include "../pci.h"
 
@@ -50,7 +51,6 @@
 #define OB_LO_IO			0x00000002
 #define XGENE_PCIE_VENDORID		0x10E8
 #define XGENE_PCIE_DEVICEID		0xE004
-#define SZ_1T				(SZ_1G*1024ULL)
 #define PIPE_PHY_RATE_RD(src)		((0xc000 & (u32)(src)) >> 0xe)
 
 #define XGENE_V1_PCI_EXP_CAP		0x40
diff --git a/include/linux/sizes.h b/include/linux/sizes.h
index 1ac79bcee2bb..0bc6cf394b08 100644
--- a/include/linux/sizes.h
+++ b/include/linux/sizes.h
@@ -47,6 +47,7 @@
 #define SZ_8G				_AC(0x200000000, ULL)
 #define SZ_16G				_AC(0x400000000, ULL)
 #define SZ_32G				_AC(0x800000000, ULL)
+#define SZ_1T				_AC(0x10000000000, ULL)
 #define SZ_64T				_AC(0x400000000000, ULL)
 
 #endif /* __LINUX_SIZES_H__ */
-- 
2.32.0


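print_ml() picks the largest unit in which the printed size is at least 10. That selection rule can be sketched on its own (a hypothetical helper, simplified from the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pick the print unit the way print_ml() does: walk down from TB and
 * use the first unit in which the size is at least 10. */
static const char *pick_unit(uint64_t size)
{
	if ((size >> 40) >= 10)
		return "TB";
	if ((size >> 30) >= 10)
		return "GB";
	if ((size >> 20) >= 10)
		return "MB";
	return "kB";
}
```

With this rule, the 2 MB fixmap still prints in kB and the 2 TB sv48 vmemmap prints in GB, while regions of 10 TB and up use the new TB unit.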
* [PATCH v3 11/13] Documentation: riscv: Add sv48 description to VM layout
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

sv48 was just introduced, so add its virtual memory layout to the
documentation.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 Documentation/riscv/vm-layout.rst | 36 +++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
index 1bd687b97104..5b36e45fef60 100644
--- a/Documentation/riscv/vm-layout.rst
+++ b/Documentation/riscv/vm-layout.rst
@@ -61,3 +61,39 @@ RISC-V Linux Kernel SV39
    ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | modules, BPF
    ffffffff80000000 |   -2    GB | ffffffffffffffff |    2 GB | kernel
   __________________|____________|__________________|_________|____________________________________________________________
+
+
+RISC-V Linux Kernel SV48
+------------------------
+
+::
+
+ ========================================================================================================================
+      Start addr    |   Offset   |     End addr     |  Size   | VM area description
+ ========================================================================================================================
+                    |            |                  |         |
+   0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
+  __________________|____________|__________________|_________|___________________________________________________________
+                    |            |                  |         |
+   0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+                    |            |                  |         | virtual memory addresses up to the -128 TB
+                    |            |                  |         | starting offset of kernel mappings.
+  __________________|____________|__________________|_________|___________________________________________________________
+                                                              |
+                                                              | Kernel-space virtual memory, shared between all processes:
+  ____________________________________________________________|___________________________________________________________
+                    |            |                  |         |
+   ffff8d7ffee00000 |  -114.5 TB | ffff8d7ffeffffff |    2 MB | fixmap
+   ffff8d7fff000000 |  -114.5 TB | ffff8d7fffffffff |   16 MB | PCI io
+   ffff8d8000000000 |  -114.5 TB | ffff8f7fffffffff |    2 TB | vmemmap
+   ffff8f8000000000 |  -112.5 TB | ffffaf7fffffffff |   32 TB | vmalloc/ioremap space
+   ffffaf8000000000 |  -80.5  TB | ffffef7fffffffff |   64 TB | direct mapping of all physical memory
+   ffffef8000000000 |  -16.5  TB | fffffffeffffffff | 16.5 TB | kasan
+  __________________|____________|__________________|_________|____________________________________________________________
+                                                              |
+                                                              | Identical layout to the 39-bit one from here on:
+  ____________________________________________________________|____________________________________________________________
+                    |            |                  |         |
+   ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | modules, BPF
+   ffffffff80000000 |   -2    GB | ffffffffffffffff |    2 GB | kernel
+  __________________|____________|__________________|_________|____________________________________________________________
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 70+ messages in thread


* [PATCH v3 12/13] riscv: Initialize thread pointer before calling C functions
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

The stack canary feature reads the canary value from the current task
structure, so the thread pointer register "tp" must be set before
calling any C function from head.S. By chance, setup_vm and the
functions it calls do not currently appear to be covered by the canary
check, but some of the functions introduced in the following commits
will be.

Fixes: f2c9699f65557a31 ("riscv: Add STACKPROTECTOR supported")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/kernel/head.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index c3c0ed559770..86f7ee3d210d 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -302,6 +302,7 @@ clear_bss_done:
 	REG_S a0, (a2)
 
 	/* Initialize page tables and relocate to virtual addresses */
+	la tp, init_task
 	la sp, init_thread_union + THREAD_SIZE
 	XIP_FIXUP_OFFSET sp
 #ifdef CONFIG_BUILTIN_DTB
-- 
2.32.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v3 13/13] riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 10:46   ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2021-12-06 10:46 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch
  Cc: Alexandre Ghiti

This is made possible by reading the mmu-type property of the cpu nodes
in the device tree.

By default, the kernel boots with a 4-level page table if the hardware
supports it, but the user may prefer a 3-level page table since it
consumes less memory and is faster, as a TLB miss then requires fewer
memory accesses.

This functionality requires that kasan be disabled, since the fdt
functions are kasan instrumented and calling them with the MMU off
cannot work.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
---
 arch/riscv/mm/init.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 28de6ea0a720..299b5a44f902 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -633,10 +633,38 @@ static void __init disable_pgtable_l4(void)
  * then read SATP to see if the configuration was taken into account
  * meaning sv48 is supported.
  */
-static __init void set_satp_mode(void)
+static __init void set_satp_mode(uintptr_t dtb_pa)
 {
 	u64 identity_satp, hw_satp;
 	uintptr_t set_satp_mode_pmd;
+#ifndef CONFIG_KASAN
+	/*
+	 * The fdt functions below are kasan instrumented: since there is no
+	 * mapping for the kasan shadow memory at this point, they can't be
+	 * used when kasan is enabled, otherwise they trap.
+	 */
+	int cpus_node;
+
+	/* Check if the user asked for sv39 explicitly in the device tree */
+	cpus_node = fdt_path_offset((void *)dtb_pa, "/cpus");
+	if (cpus_node >= 0) {
+		int node;
+
+		fdt_for_each_subnode(node, (void *)dtb_pa, cpus_node) {
+			const char *mmu_type = fdt_getprop((void *)dtb_pa, node,
+					"mmu-type", NULL);
+			if (!mmu_type)
+				continue;
+
+			if (!strcmp(mmu_type, "riscv,sv39")) {
+				disable_pgtable_l4();
+				return;
+			}
+
+			break;
+		}
+	}
+#endif
 
 	set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
 	create_pgd_mapping(early_pg_dir,
@@ -838,7 +866,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 #endif
 
 #if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
-	set_satp_mode();
+	set_satp_mode(dtb_pa);
 #endif
 
 	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 70+ messages in thread


* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2021-12-06 11:05     ` Alexandre ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre ghiti @ 2021-12-06 11:05 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

On 12/6/21 11:46, Alexandre Ghiti wrote:
> By adding a new 4th level of page table, allow the 64-bit kernel to
> address 2^48 bytes of virtual memory: in practice, that offers 128TB of
> virtual address space to userspace and allows up to 64TB of physical
> memory.
>
> If the underlying hardware does not support sv48, we automatically fall
> back to a standard 3-level page table by folding the new PUD level into
> the PGDIR level. To detect HW capabilities at runtime, we rely on SATP
> ignoring writes with an unsupported mode.
>
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>   arch/riscv/Kconfig                      |   4 +-
>   arch/riscv/include/asm/csr.h            |   3 +-
>   arch/riscv/include/asm/fixmap.h         |   1 +
>   arch/riscv/include/asm/kasan.h          |   6 +-
>   arch/riscv/include/asm/page.h           |  14 ++
>   arch/riscv/include/asm/pgalloc.h        |  40 +++++
>   arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
>   arch/riscv/include/asm/pgtable.h        |  24 ++-
>   arch/riscv/kernel/head.S                |   3 +-
>   arch/riscv/mm/context.c                 |   4 +-
>   arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
>   arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
>   drivers/firmware/efi/libstub/efi-stub.c |   2 +
>   13 files changed, 514 insertions(+), 44 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ac6c0cd9bc29..d28fe0148e13 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -150,7 +150,7 @@ config PAGE_OFFSET
>   	hex
>   	default 0xC0000000 if 32BIT
>   	default 0x80000000 if 64BIT && !MMU
> -	default 0xffffffd800000000 if 64BIT
> +	default 0xffffaf8000000000 if 64BIT
>   
>   config KASAN_SHADOW_OFFSET
>   	hex
> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
>   
>   config PGTABLE_LEVELS
>   	int
> -	default 3 if 64BIT
> +	default 4 if 64BIT
>   	default 2
>   
>   config LOCKDEP_SUPPORT
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 87ac65696871..3fdb971c7896 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -40,14 +40,13 @@
>   #ifndef CONFIG_64BIT
>   #define SATP_PPN	_AC(0x003FFFFF, UL)
>   #define SATP_MODE_32	_AC(0x80000000, UL)
> -#define SATP_MODE	SATP_MODE_32
>   #define SATP_ASID_BITS	9
>   #define SATP_ASID_SHIFT	22
>   #define SATP_ASID_MASK	_AC(0x1FF, UL)
>   #else
>   #define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
>   #define SATP_MODE_39	_AC(0x8000000000000000, UL)
> -#define SATP_MODE	SATP_MODE_39
> +#define SATP_MODE_48	_AC(0x9000000000000000, UL)
>   #define SATP_ASID_BITS	16
>   #define SATP_ASID_SHIFT	44
>   #define SATP_ASID_MASK	_AC(0xFFFF, UL)
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 54cbf07fb4e9..58a718573ad6 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -24,6 +24,7 @@ enum fixed_addresses {
>   	FIX_HOLE,
>   	FIX_PTE,
>   	FIX_PMD,
> +	FIX_PUD,
>   	FIX_TEXT_POKE1,
>   	FIX_TEXT_POKE0,
>   	FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 743e6ff57996..0b85e363e778 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,7 +28,11 @@
>   #define KASAN_SHADOW_SCALE_SHIFT	3
>   
>   #define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +/*
> + * Depending on the size of the virtual address space, the region may not be
> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> + */
> +#define KASAN_SHADOW_START	((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
>   #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
>   #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>   
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index e03559f9b35e..d089fe46f7d8 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,7 +31,20 @@
>    * When not using MMU this corresponds to the first free page in
>    * physical memory (aligned on a page boundary).
>    */
> +#ifdef CONFIG_64BIT
> +#ifdef CONFIG_MMU
> +#define PAGE_OFFSET		kernel_map.page_offset
> +#else
> +#define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif
> +/*
> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> + * define the PAGE_OFFSET value for SV39.
> + */
> +#define PAGE_OFFSET_L3		_AC(0xffffffd800000000, UL)
> +#else
>   #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif /* CONFIG_64BIT */
>   
>   /*
>    * Half of the kernel address space (half of the entries of the page global
> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
>   #endif /* CONFIG_MMU */
>   
>   struct kernel_mapping {
> +	unsigned long page_offset;
>   	unsigned long virt_addr;
>   	uintptr_t phys_addr;
>   	uintptr_t size;
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index 0af6933a7100..11823004b87a 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -11,6 +11,8 @@
>   #include <asm/tlb.h>
>   
>   #ifdef CONFIG_MMU
> +#define __HAVE_ARCH_PUD_ALLOC_ONE
> +#define __HAVE_ARCH_PUD_FREE
>   #include <asm-generic/pgalloc.h>
>   
>   static inline void pmd_populate_kernel(struct mm_struct *mm,
> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>   
>   	set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
>   }
> +
> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> +{
> +	if (pgtable_l4_enabled) {
> +		unsigned long pfn = virt_to_pfn(pud);
> +
> +		set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +}
> +
> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> +				     pud_t *pud)
> +{
> +	if (pgtable_l4_enabled) {
> +		unsigned long pfn = virt_to_pfn(pud);
> +
> +		set_p4d_safe(p4d,
> +			     __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +}
> +
> +#define pud_alloc_one pud_alloc_one
> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> +{
> +	if (pgtable_l4_enabled)
> +		return __pud_alloc_one(mm, addr);
> +
> +	return NULL;
> +}
> +
> +#define pud_free pud_free
> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> +{
> +	if (pgtable_l4_enabled)
> +		__pud_free(mm, pud);
> +}
> +
> +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
>   #endif /* __PAGETABLE_PMD_FOLDED */
>   
>   static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index 228261aa9628..bbbdd66e5e2f 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -8,16 +8,36 @@
>   
>   #include <linux/const.h>
>   
> -#define PGDIR_SHIFT     30
> +extern bool pgtable_l4_enabled;
> +
> +#define PGDIR_SHIFT_L3  30
> +#define PGDIR_SHIFT_L4  39
> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> +
> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
>   /* Size of region mapped by a page global directory */
>   #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
>   #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
>   
> +/* pud is folded into pgd in case of 3-level page table */
> +#define PUD_SHIFT      30
> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> +#define PUD_MASK       (~(PUD_SIZE - 1))
> +
>   #define PMD_SHIFT       21
>   /* Size of region mapped by a page middle directory */
>   #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
>   #define PMD_MASK        (~(PMD_SIZE - 1))
>   
> +/* Page Upper Directory entry */
> +typedef struct {
> +	unsigned long pud;
> +} pud_t;
> +
> +#define pud_val(x)      ((x).pud)
> +#define __pud(x)        ((pud_t) { (x) })
> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> +
>   /* Page Middle Directory entry */
>   typedef struct {
>   	unsigned long pmd;
> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
>   	set_pud(pudp, __pud(0));
>   }
>   
> +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> +{
> +	return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> +}
> +
> +static inline unsigned long _pud_pfn(pud_t pud)
> +{
> +	return pud_val(pud) >> _PAGE_PFN_SHIFT;
> +}
> +
>   static inline pmd_t *pud_pgtable(pud_t pud)
>   {
>   	return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
>   	return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
>   }
>   
> +#define mm_pud_folded  mm_pud_folded
> +static inline bool mm_pud_folded(struct mm_struct *mm)
> +{
> +	if (pgtable_l4_enabled)
> +		return false;
> +
> +	return true;
> +}
> +
> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> +
>   static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
>   {
>   	return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
>   #define pmd_ERROR(e) \
>   	pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
>   
> +#define pud_ERROR(e)   \
> +	pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> +
> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		*p4dp = p4d;
> +	else
> +		set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> +}
> +
> +static inline int p4d_none(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (p4d_val(p4d) == 0);
> +
> +	return 0;
> +}
> +
> +static inline int p4d_present(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (p4d_val(p4d) & _PAGE_PRESENT);
> +
> +	return 1;
> +}
> +
> +static inline int p4d_bad(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return !p4d_present(p4d);
> +
> +	return 0;
> +}
> +
> +static inline void p4d_clear(p4d_t *p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		set_p4d(p4d, __p4d(0));
> +}
> +
> +static inline pud_t *p4d_pgtable(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +
> +	return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> +}
> +
> +static inline struct page *p4d_page(p4d_t p4d)
> +{
> +	return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +}
> +
> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> +
> +#define pud_offset pud_offset
> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> +{
> +	if (pgtable_l4_enabled)
> +		return p4d_pgtable(*p4d) + pud_index(address);
> +
> +	return (pud_t *)p4d;
> +}
> +
>   #endif /* _ASM_RISCV_PGTABLE_64_H */
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index e1a52e22ad7e..e1c74ef4ead2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -51,7 +51,7 @@
>    * position vmemmap directly below the VMALLOC region.
>    */
>   #ifdef CONFIG_64BIT
> -#define VA_BITS		39
> +#define VA_BITS		(pgtable_l4_enabled ? 48 : 39)
>   #else
>   #define VA_BITS		32
>   #endif
> @@ -90,8 +90,7 @@
>   
>   #ifndef __ASSEMBLY__
>   
> -/* Page Upper Directory not used in RISC-V */
> -#include <asm-generic/pgtable-nopud.h>
> +#include <asm-generic/pgtable-nop4d.h>
>   #include <asm/page.h>
>   #include <asm/tlbflush.h>
>   #include <linux/mm_types.h>
> @@ -113,6 +112,17 @@
>   #define XIP_FIXUP(addr)		(addr)
>   #endif /* CONFIG_XIP_KERNEL */
>   
> +struct pt_alloc_ops {
> +	pte_t *(*get_pte_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pte)(uintptr_t va);
> +#ifndef __PAGETABLE_PMD_FOLDED
> +	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pmd)(uintptr_t va);
> +	pud_t *(*get_pud_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pud)(uintptr_t va);
> +#endif
> +};
> +
>   #ifdef CONFIG_MMU
>   /* Number of entries in the page global directory */
>   #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>    * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
>    */
>   #ifdef CONFIG_64BIT
> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>   #else
> -#define TASK_SIZE FIXADDR_START
> +#define TASK_SIZE	FIXADDR_START
> +#define TASK_SIZE_MIN	TASK_SIZE
>   #endif
>   
>   #else /* CONFIG_MMU */
> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
>   #define dtb_early_va	_dtb_early_va
>   #define dtb_early_pa	_dtb_early_pa
>   #endif /* CONFIG_XIP_KERNEL */
> +extern u64 satp_mode;
> +extern bool pgtable_l4_enabled;
>   
>   void paging_init(void);
>   void misc_mem_init(void);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 52c5ff9804c5..c3c0ed559770 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -95,7 +95,8 @@ relocate:
>   
>   	/* Compute satp for kernel page tables, but don't load it yet */
>   	srl a2, a0, PAGE_SHIFT
> -	li a1, SATP_MODE
> +	la a1, satp_mode
> +	REG_L a1, 0(a1)
>   	or a2, a2, a1
>   
>   	/*
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index ee3459cb6750..a7246872bd30 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>   switch_mm_fast:
>   	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
>   		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> -		  SATP_MODE);
> +		  satp_mode);
>   
>   	if (need_flush_tlb)
>   		local_flush_tlb_all();
> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>   static void set_mm_noasid(struct mm_struct *mm)
>   {
>   	/* Switch the page table and blindly nuke entire local TLB */
> -	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> +	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
>   	local_flush_tlb_all();
>   }
>   
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 1552226fb6bd..6a19a1b1caf8 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
>   #define kernel_map	(*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
>   #endif
>   
> +#ifdef CONFIG_64BIT
> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> +#else
> +u64 satp_mode = SATP_MODE_32;
> +#endif
> +EXPORT_SYMBOL(satp_mode);
> +
> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> +				true : false;
> +EXPORT_SYMBOL(pgtable_l4_enabled);
> +
>   phys_addr_t phys_ram_base __ro_after_init;
>   EXPORT_SYMBOL(phys_ram_base);
>   
> @@ -53,15 +64,6 @@ extern char _start[];
>   void *_dtb_early_va __initdata;
>   uintptr_t _dtb_early_pa __initdata;
>   
> -struct pt_alloc_ops {
> -	pte_t *(*get_pte_virt)(phys_addr_t pa);
> -	phys_addr_t (*alloc_pte)(uintptr_t va);
> -#ifndef __PAGETABLE_PMD_FOLDED
> -	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> -	phys_addr_t (*alloc_pmd)(uintptr_t va);
> -#endif
> -};
> -
>   static phys_addr_t dma32_phys_limit __initdata;
>   
>   static void __init zone_sizes_init(void)
> @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
>   }
>   
>   #ifdef CONFIG_MMU
> -static struct pt_alloc_ops _pt_ops __initdata;
> +struct pt_alloc_ops _pt_ops __initdata;
>   
>   #ifdef CONFIG_XIP_KERNEL
>   #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
> @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>   static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>   
>   pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
>   static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>   
>   #ifdef CONFIG_XIP_KERNEL
> @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>   #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
>   #endif /* CONFIG_XIP_KERNEL */
>   
> +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> +
> +#ifdef CONFIG_XIP_KERNEL
> +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
> +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
> +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
> +#endif /* CONFIG_XIP_KERNEL */
> +
>   static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
>   {
>   	/* Before MMU is enabled */
> @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
>   
>   static phys_addr_t __init alloc_pmd_early(uintptr_t va)
>   {
> -	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +	BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
>   
>   	return (uintptr_t)early_pmd;
>   }
> @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
>   	create_pte_mapping(ptep, va, pa, sz, prot);
>   }
>   
> -#define pgd_next_t		pmd_t
> -#define alloc_pgd_next(__va)	pt_ops.alloc_pmd(__va)
> -#define get_pgd_next_virt(__pa)	pt_ops.get_pmd_virt(__pa)
> +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
> +{
> +	return (pud_t *)((uintptr_t)pa);
> +}
> +
> +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> +{
> +	clear_fixmap(FIX_PUD);
> +	return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> +}
> +
> +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> +{
> +	return (pud_t *)__va(pa);
> +}
> +
> +static phys_addr_t __init alloc_pud_early(uintptr_t va)
> +{
> +	/* Only one PUD is available for early mapping */
> +	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +
> +	return (uintptr_t)early_pud;
> +}
> +
> +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> +{
> +	return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> +}
> +
> +static phys_addr_t alloc_pud_late(uintptr_t va)
> +{
> +	unsigned long vaddr;
> +
> +	vaddr = __get_free_page(GFP_KERNEL);
> +	BUG_ON(!vaddr);
> +	return __pa(vaddr);
> +}
> +
> +static void __init create_pud_mapping(pud_t *pudp,
> +				      uintptr_t va, phys_addr_t pa,
> +				      phys_addr_t sz, pgprot_t prot)
> +{
> +	pmd_t *nextp;
> +	phys_addr_t next_phys;
> +	uintptr_t pud_index = pud_index(va);
> +
> +	if (sz == PUD_SIZE) {
> +		if (pud_val(pudp[pud_index]) == 0)
> +			pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
> +		return;
> +	}
> +
> +	if (pud_val(pudp[pud_index]) == 0) {
> +		next_phys = pt_ops.alloc_pmd(va);
> +		pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
> +		nextp = pt_ops.get_pmd_virt(next_phys);
> +		memset(nextp, 0, PAGE_SIZE);
> +	} else {
> +		next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
> +		nextp = pt_ops.get_pmd_virt(next_phys);
> +	}
> +
> +	create_pmd_mapping(nextp, va, pa, sz, prot);
> +}
> +
> +#define pgd_next_t		pud_t
> +#define alloc_pgd_next(__va)	(pgtable_l4_enabled ?			\
> +		pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
> +#define get_pgd_next_virt(__pa)	(pgtable_l4_enabled ?			\
> +		pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
> -	create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next		fixmap_pmd
> +				(pgtable_l4_enabled ?			\
> +		create_pud_mapping(__nextp, __va, __pa, __sz, __prot) :	\
> +		create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
> +#define fixmap_pgd_next		(pgtable_l4_enabled ?			\
> +		(uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
> +#define trampoline_pgd_next	(pgtable_l4_enabled ?			\
> +		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
> +#define early_dtb_pgd_next	(pgtable_l4_enabled ?			\
> +		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
>   #else
>   #define pgd_next_t		pte_t
>   #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
>   #define get_pgd_next_virt(__pa)	pt_ops.get_pte_virt(__pa)
>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
>   	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next		fixmap_pte
> +#define fixmap_pgd_next		((uintptr_t)fixmap_pte)
> +#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
> +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
>   #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
> -#endif
> +#endif /* __PAGETABLE_PMD_FOLDED */
>   
>   void __init create_pgd_mapping(pgd_t *pgdp,
>   				      uintptr_t va, phys_addr_t pa,
> @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
>   }
>   #endif /* CONFIG_STRICT_KERNEL_RWX */
>   
> +#ifdef CONFIG_64BIT
> +static void __init disable_pgtable_l4(void)
> +{
> +	pgtable_l4_enabled = false;
> +	kernel_map.page_offset = PAGE_OFFSET_L3;
> +	satp_mode = SATP_MODE_39;
> +}
> +
> +/*
> + * There is a simple way to determine if 4-level is supported by the
> + * underlying hardware: establish a 1:1 mapping in 4-level page table mode,
> + * then read SATP to see if the configuration was taken into account,
> + * meaning sv48 is supported.
> + */
> +static __init void set_satp_mode(void)
> +{
> +	u64 identity_satp, hw_satp;
> +	uintptr_t set_satp_mode_pmd;
> +
> +	set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> +	create_pgd_mapping(early_pg_dir,
> +			   set_satp_mode_pmd, (uintptr_t)early_pud,
> +			   PGDIR_SIZE, PAGE_TABLE);
> +	create_pud_mapping(early_pud,
> +			   set_satp_mode_pmd, (uintptr_t)early_pmd,
> +			   PUD_SIZE, PAGE_TABLE);
> +	/* Handle the case where set_satp_mode straddles 2 PMDs */
> +	create_pmd_mapping(early_pmd,
> +			   set_satp_mode_pmd, set_satp_mode_pmd,
> +			   PMD_SIZE, PAGE_KERNEL_EXEC);
> +	create_pmd_mapping(early_pmd,
> +			   set_satp_mode_pmd + PMD_SIZE,
> +			   set_satp_mode_pmd + PMD_SIZE,
> +			   PMD_SIZE, PAGE_KERNEL_EXEC);
> +
> +	identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> +
> +	local_flush_tlb_all();
> +	csr_write(CSR_SATP, identity_satp);
> +	hw_satp = csr_swap(CSR_SATP, 0ULL);
> +	local_flush_tlb_all();
> +
> +	if (hw_satp != identity_satp)
> +		disable_pgtable_l4();
> +
> +	memset(early_pg_dir, 0, PAGE_SIZE);
> +	memset(early_pud, 0, PAGE_SIZE);
> +	memset(early_pmd, 0, PAGE_SIZE);
> +}
> +#endif
> +
>   /*
>    * setup_vm() is called from head.S with MMU-off.
>    *
> @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
>   	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
>   
>   	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> -			   IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
> +			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
>   			   PGDIR_SIZE,
>   			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
>   
> +	if (pgtable_l4_enabled) {
> +		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> +				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> +	}
> +
>   	if (IS_ENABLED(CONFIG_64BIT)) {
>   		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
>   				   pa, PMD_SIZE, PAGE_KERNEL);
> @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
>   #ifndef __PAGETABLE_PMD_FOLDED
>   	pt_ops.alloc_pmd = alloc_pmd_early;
>   	pt_ops.get_pmd_virt = get_pmd_virt_early;
> +	pt_ops.alloc_pud = alloc_pud_early;
> +	pt_ops.get_pud_virt = get_pud_virt_early;
>   #endif
>   }
>   
> @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
>   #ifndef __PAGETABLE_PMD_FOLDED
>   	pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
>   	pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
> +	pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
> +	pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
>   #endif
>   }
>   
> @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
>   #ifndef __PAGETABLE_PMD_FOLDED
>   	pt_ops.alloc_pmd = alloc_pmd_late;
>   	pt_ops.get_pmd_virt = get_pmd_virt_late;
> +	pt_ops.alloc_pud = alloc_pud_late;
> +	pt_ops.get_pud_virt = get_pud_virt_late;
>   #endif
>   }
>   
> @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   	pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
>   
>   	kernel_map.virt_addr = KERNEL_LINK_ADDR;
> +	kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
>   
>   #ifdef CONFIG_XIP_KERNEL
>   	kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   	kernel_map.phys_addr = (uintptr_t)(&_start);
>   	kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
>   #endif
> +
> +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
> +	set_satp_mode();
> +#endif
> +
>   	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
>   	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>   
> @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   
>   	/* Setup early PGD for fixmap */
>   	create_pgd_mapping(early_pg_dir, FIXADDR_START,
> -			   (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +			   fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
>   
>   #ifndef __PAGETABLE_PMD_FOLDED
> -	/* Setup fixmap PMD */
> +	/* Setup fixmap PUD and PMD */
> +	if (pgtable_l4_enabled)
> +		create_pud_mapping(fixmap_pud, FIXADDR_START,
> +				   (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
>   	create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>   			   (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>   	/* Setup trampoline PGD and PMD */
>   	create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
> -			   (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> +			   trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +	if (pgtable_l4_enabled)
> +		create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
> +				   (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
>   #ifdef CONFIG_XIP_KERNEL
>   	create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
>   			   kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
> @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   	 * Boottime fixmap can only handle PMD_SIZE mapping. Thus, the
>   	 * boot-ioremap range cannot span multiple pmds.
>   	 */
> -	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> +	BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
>   		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
>   
>   #ifndef __PAGETABLE_PMD_FOLDED
> @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
>   	/* Clear fixmap PTE and PMD mappings */
>   	clear_fixmap(FIX_PTE);
>   	clear_fixmap(FIX_PMD);
> +	clear_fixmap(FIX_PUD);
>   
>   	/* Move to swapper page table */
> -	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
> +	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
>   	local_flush_tlb_all();
>   
>   	pt_ops_set_late();
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index 1434a0225140..993f50571a3b 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -11,7 +11,29 @@
>   #include <asm/fixmap.h>
>   #include <asm/pgalloc.h>
>   
> +/*
> + * The KASAN shadow region must lie at a fixed address across sv39, sv48 and
> + * sv57, which is right before the kernel.
> + *
> + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
> + * the page global directory with kasan_early_shadow_pmd.
> + *
> + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
> + * must be divided as follows:
> + * - the first PGD entry, although incomplete, is populated with
> + *   kasan_early_shadow_pud/p4d
> + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
> + * - the last PGD entry is shared with the kernel mapping, so it is populated
> + *   at the lower pud/p4d levels
> + *
> + * In addition, when shallow populating a kasan region (for example vmalloc),
> + * this region may also not be aligned on PGDIR_SIZE, so we must go down to
> + * the pud level too.
> + */
> +
>   extern pgd_t early_pg_dir[PTRS_PER_PGD];
> +extern struct pt_alloc_ops _pt_ops __initdata;
> +#define pt_ops	_pt_ops
>   
>   static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
>   {
> @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
>   	set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>   }
>   
> -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
>   {
>   	phys_addr_t phys_addr;
>   	pmd_t *pmdp, *base_pmd;
>   	unsigned long next;
>   
> -	base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
> -	if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +	if (pud_none(*pud)) {
>   		base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +	} else {
> +		base_pmd = (pmd_t *)pud_pgtable(*pud);
> +		if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +			base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +	}
>   
>   	pmdp = base_pmd + pmd_index(vaddr);
>   
> @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
>   	 * it entirely, memblock could allocate a page at a physical address
>   	 * where KASAN is not populated yet and then we'd get a page fault.
>   	 */
> -	set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +	set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +}
> +
> +static void __init kasan_populate_pud(pgd_t *pgd,
> +				      unsigned long vaddr, unsigned long end,
> +				      bool early)
> +{
> +	phys_addr_t phys_addr;
> +	pud_t *pudp, *base_pud;
> +	unsigned long next;
> +
> +	if (early) {
> +		/*
> +		 * We can't use pgd_page_vaddr here as it would return a linear
> +		 * mapping address that is not mapped yet. When populating
> +		 * early_pg_dir we need the physical address, and when populating
> +		 * swapper_pg_dir we need the kernel virtual address, so use
> +		 * the pt_ops facility.
> +		 */
> +		base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
> +	} else {
> +		base_pud = (pud_t *)pgd_page_vaddr(*pgd);
> +		if (base_pud == lm_alias(kasan_early_shadow_pud))
> +			base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
> +	}
> +
> +	pudp = base_pud + pud_index(vaddr);
> +
> +	do {
> +		next = pud_addr_end(vaddr, end);
> +
> +		if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
> +			if (early) {
> +				phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
> +				set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
> +				continue;
> +			} else {
> +				phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
> +				if (phys_addr) {
> +					set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
> +					continue;
> +				}
> +			}
> +		}
> +
> +		kasan_populate_pmd(pudp, vaddr, next);
> +	} while (pudp++, vaddr = next, vaddr != end);
> +
> +	/*
> +	 * Wait for the whole PGD to be populated before setting the PGD in
> +	 * the page table, otherwise, if we did set the PGD before populating
> +	 * it entirely, memblock could allocate a page at a physical address
> +	 * where KASAN is not populated yet and then we'd get a page fault.
> +	 */
> +	if (!early)
> +		set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
>   }
>   
> +#define kasan_early_shadow_pgd_next			(pgtable_l4_enabled ?	\
> +				(uintptr_t)kasan_early_shadow_pud :		\
> +				(uintptr_t)kasan_early_shadow_pmd)
> +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)			\
> +		(pgtable_l4_enabled ?						\
> +			kasan_populate_pud(pgdp, vaddr, next, early) :		\
> +			kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
> +
>   static void __init kasan_populate_pgd(pgd_t *pgdp,
>   				      unsigned long vaddr, unsigned long end,
>   				      bool early)
> @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
>   			}
>   		}
>   
> -		kasan_populate_pmd(pgdp, vaddr, next);
> +		kasan_populate_pgd_next(pgdp, vaddr, next, early);
>   	} while (pgdp++, vaddr = next, vaddr != end);
>   }
>   
> @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
>   	memset(start, KASAN_SHADOW_INIT, end - start);
>   }
>   
> +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
> +					      unsigned long vaddr, unsigned long end,
> +					      bool kasan_populate)
> +{
> +	unsigned long next;
> +	pud_t *pudp, *base_pud;
> +	pmd_t *base_pmd;
> +	bool is_kasan_pmd;
> +
> +	base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
> +	pudp = base_pud + pud_index(vaddr);
> +
> +	if (kasan_populate)
> +		memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
> +		       sizeof(pud_t) * PTRS_PER_PUD);
> +
> +	do {
> +		next = pud_addr_end(vaddr, end);
> +		is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
> +
> +		if (is_kasan_pmd) {
> +			base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +			set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +		}
> +	} while (pudp++, vaddr = next, vaddr != end);
> +}
> +
>   static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
>   {
>   	unsigned long next;
>   	void *p;
>   	pgd_t *pgd_k = pgd_offset_k(vaddr);
> +	bool is_kasan_pgd_next;
>   
>   	do {
>   		next = pgd_addr_end(vaddr, end);
> -		if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
> +		is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
> +				     (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
> +
> +		if (is_kasan_pgd_next) {
>   			p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>   			set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
>   		}
> +
> +		if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
> +			continue;
> +
> +		kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
>   	} while (pgd_k++, vaddr = next, vaddr != end);
>   }


@Qinglin: I can deal with the sv57 kasan population if need be, as it is a 
bit tricky and I think it would save you quite some time :)


>   
> diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> index 26e69788f27a..b3db5d91ed38 100644
> --- a/drivers/firmware/efi/libstub/efi-stub.c
> +++ b/drivers/firmware/efi/libstub/efi-stub.c
> @@ -40,6 +40,8 @@
>   
>   #ifdef CONFIG_ARM64
>   # define EFI_RT_VIRTUAL_LIMIT	DEFAULT_MAP_WINDOW_64
> +#elif defined(CONFIG_RISCV)
> +# define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE_MIN
>   #else
>   # define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE
>   #endif


* Re: [PATCH v3 07/13] riscv: Implement sv48 support
@ 2021-12-06 11:05     ` Alexandre ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre ghiti @ 2021-12-06 11:05 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

On 12/6/21 11:46, Alexandre Ghiti wrote:
> By adding a new 4th level of page table, allow a 64-bit kernel to address
> 2^48 bytes of virtual address space: in practice, that offers 128TB of
> virtual address space to userspace and allows up to 64TB of physical
> memory.
>
> If the underlying hardware does not support sv48, we automatically fall
> back to a standard 3-level page table by folding the new PUD level into
> the PGDIR level. To detect hardware capabilities at runtime, we rely on
> the SATP CSR's behavior of ignoring writes with an unsupported mode.
>
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>   arch/riscv/Kconfig                      |   4 +-
>   arch/riscv/include/asm/csr.h            |   3 +-
>   arch/riscv/include/asm/fixmap.h         |   1 +
>   arch/riscv/include/asm/kasan.h          |   6 +-
>   arch/riscv/include/asm/page.h           |  14 ++
>   arch/riscv/include/asm/pgalloc.h        |  40 +++++
>   arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
>   arch/riscv/include/asm/pgtable.h        |  24 ++-
>   arch/riscv/kernel/head.S                |   3 +-
>   arch/riscv/mm/context.c                 |   4 +-
>   arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
>   arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
>   drivers/firmware/efi/libstub/efi-stub.c |   2 +
>   13 files changed, 514 insertions(+), 44 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ac6c0cd9bc29..d28fe0148e13 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -150,7 +150,7 @@ config PAGE_OFFSET
>   	hex
>   	default 0xC0000000 if 32BIT
>   	default 0x80000000 if 64BIT && !MMU
> -	default 0xffffffd800000000 if 64BIT
> +	default 0xffffaf8000000000 if 64BIT
>   
>   config KASAN_SHADOW_OFFSET
>   	hex
> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
>   
>   config PGTABLE_LEVELS
>   	int
> -	default 3 if 64BIT
> +	default 4 if 64BIT
>   	default 2
>   
>   config LOCKDEP_SUPPORT
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 87ac65696871..3fdb971c7896 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -40,14 +40,13 @@
>   #ifndef CONFIG_64BIT
>   #define SATP_PPN	_AC(0x003FFFFF, UL)
>   #define SATP_MODE_32	_AC(0x80000000, UL)
> -#define SATP_MODE	SATP_MODE_32
>   #define SATP_ASID_BITS	9
>   #define SATP_ASID_SHIFT	22
>   #define SATP_ASID_MASK	_AC(0x1FF, UL)
>   #else
>   #define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
>   #define SATP_MODE_39	_AC(0x8000000000000000, UL)
> -#define SATP_MODE	SATP_MODE_39
> +#define SATP_MODE_48	_AC(0x9000000000000000, UL)
>   #define SATP_ASID_BITS	16
>   #define SATP_ASID_SHIFT	44
>   #define SATP_ASID_MASK	_AC(0xFFFF, UL)
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 54cbf07fb4e9..58a718573ad6 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -24,6 +24,7 @@ enum fixed_addresses {
>   	FIX_HOLE,
>   	FIX_PTE,
>   	FIX_PMD,
> +	FIX_PUD,
>   	FIX_TEXT_POKE1,
>   	FIX_TEXT_POKE0,
>   	FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 743e6ff57996..0b85e363e778 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,7 +28,11 @@
>   #define KASAN_SHADOW_SCALE_SHIFT	3
>   
>   #define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +/*
> + * Depending on the size of the virtual address space, the region may not be
> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> + */
> +#define KASAN_SHADOW_START	((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
>   #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
>   #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>   
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index e03559f9b35e..d089fe46f7d8 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,7 +31,20 @@
>    * When not using MMU this corresponds to the first free page in
>    * physical memory (aligned on a page boundary).
>    */
> +#ifdef CONFIG_64BIT
> +#ifdef CONFIG_MMU
> +#define PAGE_OFFSET		kernel_map.page_offset
> +#else
> +#define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif
> +/*
> + * By default, the CONFIG_PAGE_OFFSET value corresponds to the SV48 address
> + * space, so define the PAGE_OFFSET value for SV39 here.
> + */
> +#define PAGE_OFFSET_L3		_AC(0xffffffd800000000, UL)
> +#else
>   #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif /* CONFIG_64BIT */
>   
>   /*
>    * Half of the kernel address space (half of the entries of the page global
> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
>   #endif /* CONFIG_MMU */
>   
>   struct kernel_mapping {
> +	unsigned long page_offset;
>   	unsigned long virt_addr;
>   	uintptr_t phys_addr;
>   	uintptr_t size;
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index 0af6933a7100..11823004b87a 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -11,6 +11,8 @@
>   #include <asm/tlb.h>
>   
>   #ifdef CONFIG_MMU
> +#define __HAVE_ARCH_PUD_ALLOC_ONE
> +#define __HAVE_ARCH_PUD_FREE
>   #include <asm-generic/pgalloc.h>
>   
>   static inline void pmd_populate_kernel(struct mm_struct *mm,
> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>   
>   	set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
>   }
> +
> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> +{
> +	if (pgtable_l4_enabled) {
> +		unsigned long pfn = virt_to_pfn(pud);
> +
> +		set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +}
> +
> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> +				     pud_t *pud)
> +{
> +	if (pgtable_l4_enabled) {
> +		unsigned long pfn = virt_to_pfn(pud);
> +
> +		set_p4d_safe(p4d,
> +			     __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +}
> +
> +#define pud_alloc_one pud_alloc_one
> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> +{
> +	if (pgtable_l4_enabled)
> +		return __pud_alloc_one(mm, addr);
> +
> +	return NULL;
> +}
> +
> +#define pud_free pud_free
> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> +{
> +	if (pgtable_l4_enabled)
> +		__pud_free(mm, pud);
> +}
> +
> +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
>   #endif /* __PAGETABLE_PMD_FOLDED */
>   
>   static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index 228261aa9628..bbbdd66e5e2f 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -8,16 +8,36 @@
>   
>   #include <linux/const.h>
>   
> -#define PGDIR_SHIFT     30
> +extern bool pgtable_l4_enabled;
> +
> +#define PGDIR_SHIFT_L3  30
> +#define PGDIR_SHIFT_L4  39
> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> +
> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
>   /* Size of region mapped by a page global directory */
>   #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
>   #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
>   
> +/* pud is folded into pgd in case of 3-level page table */
> +#define PUD_SHIFT      30
> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> +#define PUD_MASK       (~(PUD_SIZE - 1))
> +
>   #define PMD_SHIFT       21
>   /* Size of region mapped by a page middle directory */
>   #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
>   #define PMD_MASK        (~(PMD_SIZE - 1))
>   
> +/* Page Upper Directory entry */
> +typedef struct {
> +	unsigned long pud;
> +} pud_t;
> +
> +#define pud_val(x)      ((x).pud)
> +#define __pud(x)        ((pud_t) { (x) })
> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> +
>   /* Page Middle Directory entry */
>   typedef struct {
>   	unsigned long pmd;
> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
>   	set_pud(pudp, __pud(0));
>   }
>   
> +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> +{
> +	return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> +}
> +
> +static inline unsigned long _pud_pfn(pud_t pud)
> +{
> +	return pud_val(pud) >> _PAGE_PFN_SHIFT;
> +}
> +
>   static inline pmd_t *pud_pgtable(pud_t pud)
>   {
>   	return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
>   	return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
>   }
>   
> +#define mm_pud_folded  mm_pud_folded
> +static inline bool mm_pud_folded(struct mm_struct *mm)
> +{
> +	if (pgtable_l4_enabled)
> +		return false;
> +
> +	return true;
> +}
> +
> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> +
>   static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
>   {
>   	return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
>   #define pmd_ERROR(e) \
>   	pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
>   
> +#define pud_ERROR(e)   \
> +	pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> +
> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		*p4dp = p4d;
> +	else
> +		set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> +}
> +
> +static inline int p4d_none(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (p4d_val(p4d) == 0);
> +
> +	return 0;
> +}
> +
> +static inline int p4d_present(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (p4d_val(p4d) & _PAGE_PRESENT);
> +
> +	return 1;
> +}
> +
> +static inline int p4d_bad(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return !p4d_present(p4d);
> +
> +	return 0;
> +}
> +
> +static inline void p4d_clear(p4d_t *p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		set_p4d(p4d, __p4d(0));
> +}
> +
> +static inline pud_t *p4d_pgtable(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +
> +	return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> +}
> +
> +static inline struct page *p4d_page(p4d_t p4d)
> +{
> +	return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +}
> +
> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> +
> +#define pud_offset pud_offset
> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> +{
> +	if (pgtable_l4_enabled)
> +		return p4d_pgtable(*p4d) + pud_index(address);
> +
> +	return (pud_t *)p4d;
> +}
> +
>   #endif /* _ASM_RISCV_PGTABLE_64_H */
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index e1a52e22ad7e..e1c74ef4ead2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -51,7 +51,7 @@
>    * position vmemmap directly below the VMALLOC region.
>    */
>   #ifdef CONFIG_64BIT
> -#define VA_BITS		39
> +#define VA_BITS		(pgtable_l4_enabled ? 48 : 39)
>   #else
>   #define VA_BITS		32
>   #endif
> @@ -90,8 +90,7 @@
>   
>   #ifndef __ASSEMBLY__
>   
> -/* Page Upper Directory not used in RISC-V */
> -#include <asm-generic/pgtable-nopud.h>
> +#include <asm-generic/pgtable-nop4d.h>
>   #include <asm/page.h>
>   #include <asm/tlbflush.h>
>   #include <linux/mm_types.h>
> @@ -113,6 +112,17 @@
>   #define XIP_FIXUP(addr)		(addr)
>   #endif /* CONFIG_XIP_KERNEL */
>   
> +struct pt_alloc_ops {
> +	pte_t *(*get_pte_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pte)(uintptr_t va);
> +#ifndef __PAGETABLE_PMD_FOLDED
> +	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pmd)(uintptr_t va);
> +	pud_t *(*get_pud_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pud)(uintptr_t va);
> +#endif
> +};
> +
>   #ifdef CONFIG_MMU
>   /* Number of entries in the page global directory */
>   #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>    * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
>    */
>   #ifdef CONFIG_64BIT
> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>   #else
> -#define TASK_SIZE FIXADDR_START
> +#define TASK_SIZE	FIXADDR_START
> +#define TASK_SIZE_MIN	TASK_SIZE
>   #endif
>   
>   #else /* CONFIG_MMU */
> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
>   #define dtb_early_va	_dtb_early_va
>   #define dtb_early_pa	_dtb_early_pa
>   #endif /* CONFIG_XIP_KERNEL */
> +extern u64 satp_mode;
> +extern bool pgtable_l4_enabled;
>   
>   void paging_init(void);
>   void misc_mem_init(void);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 52c5ff9804c5..c3c0ed559770 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -95,7 +95,8 @@ relocate:
>   
>   	/* Compute satp for kernel page tables, but don't load it yet */
>   	srl a2, a0, PAGE_SHIFT
> -	li a1, SATP_MODE
> +	la a1, satp_mode
> +	REG_L a1, 0(a1)
>   	or a2, a2, a1
>   
>   	/*
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index ee3459cb6750..a7246872bd30 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>   switch_mm_fast:
>   	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
>   		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> -		  SATP_MODE);
> +		  satp_mode);
>   
>   	if (need_flush_tlb)
>   		local_flush_tlb_all();
> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>   static void set_mm_noasid(struct mm_struct *mm)
>   {
>   	/* Switch the page table and blindly nuke entire local TLB */
> -	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> +	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
>   	local_flush_tlb_all();
>   }
>   
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 1552226fb6bd..6a19a1b1caf8 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
>   #define kernel_map	(*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
>   #endif
>   
> +#ifdef CONFIG_64BIT
> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> +#else
> +u64 satp_mode = SATP_MODE_32;
> +#endif
> +EXPORT_SYMBOL(satp_mode);
> +
> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> +				true : false;
> +EXPORT_SYMBOL(pgtable_l4_enabled);
> +
>   phys_addr_t phys_ram_base __ro_after_init;
>   EXPORT_SYMBOL(phys_ram_base);
>   
> @@ -53,15 +64,6 @@ extern char _start[];
>   void *_dtb_early_va __initdata;
>   uintptr_t _dtb_early_pa __initdata;
>   
> -struct pt_alloc_ops {
> -	pte_t *(*get_pte_virt)(phys_addr_t pa);
> -	phys_addr_t (*alloc_pte)(uintptr_t va);
> -#ifndef __PAGETABLE_PMD_FOLDED
> -	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> -	phys_addr_t (*alloc_pmd)(uintptr_t va);
> -#endif
> -};
> -
>   static phys_addr_t dma32_phys_limit __initdata;
>   
>   static void __init zone_sizes_init(void)
> @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
>   }
>   
>   #ifdef CONFIG_MMU
> -static struct pt_alloc_ops _pt_ops __initdata;
> +struct pt_alloc_ops _pt_ops __initdata;
>   
>   #ifdef CONFIG_XIP_KERNEL
>   #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
> @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>   static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>   
>   pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
>   static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>   
>   #ifdef CONFIG_XIP_KERNEL
> @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>   #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
>   #endif /* CONFIG_XIP_KERNEL */
>   
> +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> +
> +#ifdef CONFIG_XIP_KERNEL
> +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
> +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
> +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
> +#endif /* CONFIG_XIP_KERNEL */
> +
>   static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
>   {
>   	/* Before MMU is enabled */
> @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
>   
>   static phys_addr_t __init alloc_pmd_early(uintptr_t va)
>   {
> -	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +	BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
>   
>   	return (uintptr_t)early_pmd;
>   }
> @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
>   	create_pte_mapping(ptep, va, pa, sz, prot);
>   }
>   
> -#define pgd_next_t		pmd_t
> -#define alloc_pgd_next(__va)	pt_ops.alloc_pmd(__va)
> -#define get_pgd_next_virt(__pa)	pt_ops.get_pmd_virt(__pa)
> +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
> +{
> +	return (pud_t *)((uintptr_t)pa);
> +}
> +
> +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> +{
> +	clear_fixmap(FIX_PUD);
> +	return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> +}
> +
> +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> +{
> +	return (pud_t *)__va(pa);
> +}
> +
> +static phys_addr_t __init alloc_pud_early(uintptr_t va)
> +{
> +	/* Only one PUD is available for early mapping */
> +	BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +
> +	return (uintptr_t)early_pud;
> +}
> +
> +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> +{
> +	return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> +}
> +
> +static phys_addr_t alloc_pud_late(uintptr_t va)
> +{
> +	unsigned long vaddr;
> +
> +	vaddr = __get_free_page(GFP_KERNEL);
> +	BUG_ON(!vaddr);
> +	return __pa(vaddr);
> +}
> +
> +static void __init create_pud_mapping(pud_t *pudp,
> +				      uintptr_t va, phys_addr_t pa,
> +				      phys_addr_t sz, pgprot_t prot)
> +{
> +	pmd_t *nextp;
> +	phys_addr_t next_phys;
> +	uintptr_t pud_index = pud_index(va);
> +
> +	if (sz == PUD_SIZE) {
> +		if (pud_val(pudp[pud_index]) == 0)
> +			pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
> +		return;
> +	}
> +
> +	if (pud_val(pudp[pud_index]) == 0) {
> +		next_phys = pt_ops.alloc_pmd(va);
> +		pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
> +		nextp = pt_ops.get_pmd_virt(next_phys);
> +		memset(nextp, 0, PAGE_SIZE);
> +	} else {
> +		next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
> +		nextp = pt_ops.get_pmd_virt(next_phys);
> +	}
> +
> +	create_pmd_mapping(nextp, va, pa, sz, prot);
> +}
> +
> +#define pgd_next_t		pud_t
> +#define alloc_pgd_next(__va)	(pgtable_l4_enabled ?			\
> +		pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
> +#define get_pgd_next_virt(__pa)	(pgtable_l4_enabled ?			\
> +		pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
> -	create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next		fixmap_pmd
> +				(pgtable_l4_enabled ?			\
> +		create_pud_mapping(__nextp, __va, __pa, __sz, __prot) :	\
> +		create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
> +#define fixmap_pgd_next		(pgtable_l4_enabled ?			\
> +		(uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
> +#define trampoline_pgd_next	(pgtable_l4_enabled ?			\
> +		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
> +#define early_dtb_pgd_next	(pgtable_l4_enabled ?			\
> +		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
>   #else
>   #define pgd_next_t		pte_t
>   #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
>   #define get_pgd_next_virt(__pa)	pt_ops.get_pte_virt(__pa)
>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
>   	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next		fixmap_pte
> +#define fixmap_pgd_next		((uintptr_t)fixmap_pte)
> +#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
> +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
>   #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
> -#endif
> +#endif /* __PAGETABLE_PMD_FOLDED */
>   
>   void __init create_pgd_mapping(pgd_t *pgdp,
>   				      uintptr_t va, phys_addr_t pa,
> @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
>   }
>   #endif /* CONFIG_STRICT_KERNEL_RWX */
>   
> +#ifdef CONFIG_64BIT
> +static void __init disable_pgtable_l4(void)
> +{
> +	pgtable_l4_enabled = false;
> +	kernel_map.page_offset = PAGE_OFFSET_L3;
> +	satp_mode = SATP_MODE_39;
> +}
> +
> +/*
> + * There is a simple way to determine if 4-level is supported by the
> + * underlying hardware: establish 1:1 mapping in 4-level page table mode
> + * then read SATP to see if the configuration was taken into account
> + * meaning sv48 is supported.
> + */
> +static __init void set_satp_mode(void)
> +{
> +	u64 identity_satp, hw_satp;
> +	uintptr_t set_satp_mode_pmd;
> +
> +	set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> +	create_pgd_mapping(early_pg_dir,
> +			   set_satp_mode_pmd, (uintptr_t)early_pud,
> +			   PGDIR_SIZE, PAGE_TABLE);
> +	create_pud_mapping(early_pud,
> +			   set_satp_mode_pmd, (uintptr_t)early_pmd,
> +			   PUD_SIZE, PAGE_TABLE);
> +	/* Handle the case where set_satp_mode straddles 2 PMDs */
> +	create_pmd_mapping(early_pmd,
> +			   set_satp_mode_pmd, set_satp_mode_pmd,
> +			   PMD_SIZE, PAGE_KERNEL_EXEC);
> +	create_pmd_mapping(early_pmd,
> +			   set_satp_mode_pmd + PMD_SIZE,
> +			   set_satp_mode_pmd + PMD_SIZE,
> +			   PMD_SIZE, PAGE_KERNEL_EXEC);
> +
> +	identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> +
> +	local_flush_tlb_all();
> +	csr_write(CSR_SATP, identity_satp);
> +	hw_satp = csr_swap(CSR_SATP, 0ULL);
> +	local_flush_tlb_all();
> +
> +	if (hw_satp != identity_satp)
> +		disable_pgtable_l4();
> +
> +	memset(early_pg_dir, 0, PAGE_SIZE);
> +	memset(early_pud, 0, PAGE_SIZE);
> +	memset(early_pmd, 0, PAGE_SIZE);
> +}
> +#endif
> +
>   /*
>    * setup_vm() is called from head.S with MMU-off.
>    *
> @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
>   	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
>   
>   	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> -			   IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
> +			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
>   			   PGDIR_SIZE,
>   			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
>   
> +	if (pgtable_l4_enabled) {
> +		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> +				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> +	}
> +
>   	if (IS_ENABLED(CONFIG_64BIT)) {
>   		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
>   				   pa, PMD_SIZE, PAGE_KERNEL);
> @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
>   #ifndef __PAGETABLE_PMD_FOLDED
>   	pt_ops.alloc_pmd = alloc_pmd_early;
>   	pt_ops.get_pmd_virt = get_pmd_virt_early;
> +	pt_ops.alloc_pud = alloc_pud_early;
> +	pt_ops.get_pud_virt = get_pud_virt_early;
>   #endif
>   }
>   
> @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
>   #ifndef __PAGETABLE_PMD_FOLDED
>   	pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
>   	pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
> +	pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
> +	pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
>   #endif
>   }
>   
> @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
>   #ifndef __PAGETABLE_PMD_FOLDED
>   	pt_ops.alloc_pmd = alloc_pmd_late;
>   	pt_ops.get_pmd_virt = get_pmd_virt_late;
> +	pt_ops.alloc_pud = alloc_pud_late;
> +	pt_ops.get_pud_virt = get_pud_virt_late;
>   #endif
>   }
>   
> @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   	pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
>   
>   	kernel_map.virt_addr = KERNEL_LINK_ADDR;
> +	kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
>   
>   #ifdef CONFIG_XIP_KERNEL
>   	kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   	kernel_map.phys_addr = (uintptr_t)(&_start);
>   	kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
>   #endif
> +
> +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
> +	set_satp_mode();
> +#endif
> +
>   	kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
>   	kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>   
> @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   
>   	/* Setup early PGD for fixmap */
>   	create_pgd_mapping(early_pg_dir, FIXADDR_START,
> -			   (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +			   fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
>   
>   #ifndef __PAGETABLE_PMD_FOLDED
> -	/* Setup fixmap PMD */
> +	/* Setup fixmap PUD and PMD */
> +	if (pgtable_l4_enabled)
> +		create_pud_mapping(fixmap_pud, FIXADDR_START,
> +				   (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
>   	create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>   			   (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>   	/* Setup trampoline PGD and PMD */
>   	create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
> -			   (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> +			   trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +	if (pgtable_l4_enabled)
> +		create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
> +				   (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
>   #ifdef CONFIG_XIP_KERNEL
>   	create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
>   			   kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
> @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>   	 * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
>   	 * range can not span multiple pmds.
>   	 */
> -	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> +	BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
>   		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
>   
>   #ifndef __PAGETABLE_PMD_FOLDED
> @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
>   	/* Clear fixmap PTE and PMD mappings */
>   	clear_fixmap(FIX_PTE);
>   	clear_fixmap(FIX_PMD);
> +	clear_fixmap(FIX_PUD);
>   
>   	/* Move to swapper page table */
> -	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
> +	csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
>   	local_flush_tlb_all();
>   
>   	pt_ops_set_late();
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index 1434a0225140..993f50571a3b 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -11,7 +11,29 @@
>   #include <asm/fixmap.h>
>   #include <asm/pgalloc.h>
>   
> +/*
> + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
> + * which is right before the kernel.
> + *
> + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
> + * the page global directory with kasan_early_shadow_pmd.
> + *
> + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
> + * must be divided as follows:
> + * - the first PGD entry, although incomplete, is populated with
> + *   kasan_early_shadow_pud/p4d
> + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
> + * - the last PGD entry is shared with the kernel mapping so populated at the
> + *   lower levels pud/p4d
> + *
> + * In addition, when shallow populating a kasan region (for example vmalloc),
> + * this region may also not be aligned on PGDIR size, so we must go down to the
> + * pud level too.
> + */
> +
>   extern pgd_t early_pg_dir[PTRS_PER_PGD];
> +extern struct pt_alloc_ops _pt_ops __initdata;
> +#define pt_ops	_pt_ops
>   
>   static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
>   {
> @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
>   	set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>   }
>   
> -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
>   {
>   	phys_addr_t phys_addr;
>   	pmd_t *pmdp, *base_pmd;
>   	unsigned long next;
>   
> -	base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
> -	if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +	if (pud_none(*pud)) {
>   		base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +	} else {
> +		base_pmd = (pmd_t *)pud_pgtable(*pud);
> +		if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +			base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +	}
>   
>   	pmdp = base_pmd + pmd_index(vaddr);
>   
> @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
>   	 * it entirely, memblock could allocate a page at a physical address
>   	 * where KASAN is not populated yet and then we'd get a page fault.
>   	 */
> -	set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +	set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +}
> +
> +static void __init kasan_populate_pud(pgd_t *pgd,
> +				      unsigned long vaddr, unsigned long end,
> +				      bool early)
> +{
> +	phys_addr_t phys_addr;
> +	pud_t *pudp, *base_pud;
> +	unsigned long next;
> +
> +	if (early) {
> +		/*
> +		 * We can't use pgd_page_vaddr here as it would return a linear
> +		 * mapping address but it is not mapped yet, but when populating
> +		 * early_pg_dir, we need the physical address and when populating
> +		 * swapper_pg_dir, we need the kernel virtual address so use
> +		 * pt_ops facility.
> +		 */
> +		base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
> +	} else {
> +		base_pud = (pud_t *)pgd_page_vaddr(*pgd);
> +		if (base_pud == lm_alias(kasan_early_shadow_pud))
> +			base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
> +	}
> +
> +	pudp = base_pud + pud_index(vaddr);
> +
> +	do {
> +		next = pud_addr_end(vaddr, end);
> +
> +		if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
> +			if (early) {
> +				phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
> +				set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
> +				continue;
> +			} else {
> +				phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
> +				if (phys_addr) {
> +					set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
> +					continue;
> +				}
> +			}
> +		}
> +
> +		kasan_populate_pmd(pudp, vaddr, next);
> +	} while (pudp++, vaddr = next, vaddr != end);
> +
> +	/*
> +	 * Wait for the whole PGD to be populated before setting the PGD in
> +	 * the page table, otherwise, if we did set the PGD before populating
> +	 * it entirely, memblock could allocate a page at a physical address
> +	 * where KASAN is not populated yet and then we'd get a page fault.
> +	 */
> +	if (!early)
> +		set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
>   }
>   
> +#define kasan_early_shadow_pgd_next			(pgtable_l4_enabled ?	\
> +				(uintptr_t)kasan_early_shadow_pud :		\
> +				(uintptr_t)kasan_early_shadow_pmd)
> +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)			\
> +		(pgtable_l4_enabled ?						\
> +			kasan_populate_pud(pgdp, vaddr, next, early) :		\
> +			kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
> +
>   static void __init kasan_populate_pgd(pgd_t *pgdp,
>   				      unsigned long vaddr, unsigned long end,
>   				      bool early)
> @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
>   			}
>   		}
>   
> -		kasan_populate_pmd(pgdp, vaddr, next);
> +		kasan_populate_pgd_next(pgdp, vaddr, next, early);
>   	} while (pgdp++, vaddr = next, vaddr != end);
>   }
>   
> @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
>   	memset(start, KASAN_SHADOW_INIT, end - start);
>   }
>   
> +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
> +					      unsigned long vaddr, unsigned long end,
> +					      bool kasan_populate)
> +{
> +	unsigned long next;
> +	pud_t *pudp, *base_pud;
> +	pmd_t *base_pmd;
> +	bool is_kasan_pmd;
> +
> +	base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
> +	pudp = base_pud + pud_index(vaddr);
> +
> +	if (kasan_populate)
> +		memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
> +		       sizeof(pud_t) * PTRS_PER_PUD);
> +
> +	do {
> +		next = pud_addr_end(vaddr, end);
> +		is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
> +
> +		if (is_kasan_pmd) {
> +			base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +			set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +		}
> +	} while (pudp++, vaddr = next, vaddr != end);
> +}
> +
>   static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
>   {
>   	unsigned long next;
>   	void *p;
>   	pgd_t *pgd_k = pgd_offset_k(vaddr);
> +	bool is_kasan_pgd_next;
>   
>   	do {
>   		next = pgd_addr_end(vaddr, end);
> -		if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
> +		is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
> +				     (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
> +
> +		if (is_kasan_pgd_next) {
>   			p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>   			set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
>   		}
> +
> +		if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
> +			continue;
> +
> +		kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
>   	} while (pgd_k++, vaddr = next, vaddr != end);
>   }


@Qinglin: I can deal with the sv57 kasan population if need be, as it is a
bit tricky and I think it would save you quite some time :)


>   
> diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> index 26e69788f27a..b3db5d91ed38 100644
> --- a/drivers/firmware/efi/libstub/efi-stub.c
> +++ b/drivers/firmware/efi/libstub/efi-stub.c
> @@ -40,6 +40,8 @@
>   
>   #ifdef CONFIG_ARM64
>   # define EFI_RT_VIRTUAL_LIMIT	DEFAULT_MAP_WINDOW_64
> +#elif defined(CONFIG_RISCV)
> +# define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE_MIN
>   #else
>   # define EFI_RT_VIRTUAL_LIMIT	TASK_SIZE
>   #endif

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2021-12-06 11:08   ` Alexandre ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre ghiti @ 2021-12-06 11:08 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

And I messed up Atish's address. I was pretty sure I could recall it
without checking; I guess I was wrong :)

Sorry for the noise,

Alex

On 12/6/21 11:46, Alexandre Ghiti wrote:
> * Please note notable changes in memory layouts and kasan population *
>
> This patchset makes it possible to use a single kernel image for both sv39
> and sv48 without making the kernel relocatable.
>
> The idea comes from Arnd Bergmann, who suggested doing the same as x86,
> that is, mapping the kernel at the end of the address space. This allows
> the kernel to be linked at the same address for both sv39 and sv48, so it
> does not need to be relocated at runtime.
>
> This implements sv48 support at runtime: the kernel tries to boot with a
> 4-level page table and falls back to a 3-level one if the hardware does
> not support it. Folding the 4th level into a 3-level page table has
> almost no cost at runtime.
>
> Note that the kasan region had to be moved to the end of the address space,
> since its location must be known at compile time and must then be valid for
> both sv39 and sv48 (and the upcoming sv57).
>
> Tested on:
>    - qemu rv64 sv39: OK
>    - qemu rv64 sv48: OK
>    - qemu rv64 sv39 + kasan: OK
>    - qemu rv64 sv48 + kasan: OK
>    - qemu rv32: OK
>
> Changes in v3:
>    - Fix SZ_1T, thanks to Atish
>    - Fix warning create_pud_mapping, thanks to Atish
>    - Fix k210 nommu build, thanks to Atish
>    - Fix wrong rebase as noted by Samuel
>    - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
>    - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
>
> Changes in v2:
>    - Rebase onto for-next
>    - Fix KASAN
>    - Fix stack canary
>    - Get completely rid of MAXPHYSMEM configs
>    - Add documentation
>
> Alexandre Ghiti (13):
>    riscv: Move KASAN mapping next to the kernel mapping
>    riscv: Split early kasan mapping to prepare sv48 introduction
>    riscv: Introduce functions to switch pt_ops
>    riscv: Allow to dynamically define VA_BITS
>    riscv: Get rid of MAXPHYSMEM configs
>    asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
>    riscv: Implement sv48 support
>    riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
>    riscv: Explicit comment about user virtual address space size
>    riscv: Improve virtual kernel memory layout dump
>    Documentation: riscv: Add sv48 description to VM layout
>    riscv: Initialize thread pointer before calling C functions
>    riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>
>   Documentation/riscv/vm-layout.rst             |  48 ++-
>   arch/riscv/Kconfig                            |  37 +-
>   arch/riscv/configs/nommu_k210_defconfig       |   1 -
>   .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
>   arch/riscv/configs/nommu_virt_defconfig       |   1 -
>   arch/riscv/include/asm/csr.h                  |   3 +-
>   arch/riscv/include/asm/fixmap.h               |   1
>   arch/riscv/include/asm/kasan.h                |  11 +-
>   arch/riscv/include/asm/page.h                 |  20 +-
>   arch/riscv/include/asm/pgalloc.h              |  40 ++
>   arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
>   arch/riscv/include/asm/pgtable.h              |  47 +-
>   arch/riscv/include/asm/sparsemem.h            |   6 +-
>   arch/riscv/kernel/cpu.c                       |  23 +-
>   arch/riscv/kernel/head.S                      |   4 +-
>   arch/riscv/mm/context.c                       |   4 +-
>   arch/riscv/mm/init.c                          | 408 ++++++++++++++----
>   arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
>   drivers/firmware/efi/libstub/efi-stub.c       |   2
>   drivers/pci/controller/pci-xgene.c            |   2 +-
>   include/asm-generic/pgalloc.h                 |  24 +-
>   include/linux/sizes.h                         |   1
>   22 files changed, 833 insertions(+), 209 deletions(-)
>
> --
> 2.32.0
>
>



* Re: [PATCH v3 01/13] riscv: Move KASAN mapping next to the kernel mapping
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2021-12-06 16:18     ` Jisheng Zhang
  -1 siblings, 0 replies; 70+ messages in thread
From: Jisheng Zhang @ 2021-12-06 16:18 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

On Mon,  6 Dec 2021 11:46:45 +0100
Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:

> Now that KASAN_SHADOW_OFFSET is defined at compile time as a config,
> this value must remain constant whatever the size of the virtual address
> space, which is only possible by pushing this region at the end of the
> address space next to the kernel mapping.
> 
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>  Documentation/riscv/vm-layout.rst | 12 ++++++------
>  arch/riscv/Kconfig                |  4 ++--
>  arch/riscv/include/asm/kasan.h    |  4 ++--
>  arch/riscv/include/asm/page.h     |  6 +++++-
>  arch/riscv/include/asm/pgtable.h  |  6 ++++--
>  arch/riscv/mm/init.c              | 25 +++++++++++++------------
>  6 files changed, 32 insertions(+), 25 deletions(-)
> 
> diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
> index b7f98930d38d..1bd687b97104 100644
> --- a/Documentation/riscv/vm-layout.rst
> +++ b/Documentation/riscv/vm-layout.rst
> @@ -47,12 +47,12 @@ RISC-V Linux Kernel SV39
>                                                                | Kernel-space virtual memory, shared between all processes:
>    ____________________________________________________________|___________________________________________________________
>                      |            |                  |         |
> -   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
> -   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
> -   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
> -   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
> -   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
> -   ffffffe000000000 | -128    GB | ffffffff7fffffff |  124 GB | direct mapping of all physical memory
> +   ffffffc6fee00000 | -228    GB | ffffffc6feffffff |    2 MB | fixmap
> +   ffffffc6ff000000 | -228    GB | ffffffc6ffffffff |   16 MB | PCI io
> +   ffffffc700000000 | -228    GB | ffffffc7ffffffff |    4 GB | vmemmap
> +   ffffffc800000000 | -224    GB | ffffffd7ffffffff |   64 GB | vmalloc/ioremap space
> +   ffffffd800000000 | -160    GB | fffffff6ffffffff |  124 GB | direct mapping of all physical memory
> +   fffffff700000000 |  -36    GB | fffffffeffffffff |   32 GB | kasan
>    __________________|____________|__________________|_________|____________________________________________________________
>                                                                |
>                                                                |
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 6d5b63bd4bd9..6cd98ade5ebc 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -161,12 +161,12 @@ config PAGE_OFFSET
>  	default 0xC0000000 if 32BIT && MAXPHYSMEM_1GB
>  	default 0x80000000 if 64BIT && !MMU
>  	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> -	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> +	default 0xffffffd800000000 if 64BIT && MAXPHYSMEM_128GB
>  
>  config KASAN_SHADOW_OFFSET
>  	hex
>  	depends on KASAN_GENERIC
> -	default 0xdfffffc800000000 if 64BIT
> +	default 0xdfffffff00000000 if 64BIT
>  	default 0xffffffff if 32BIT
>  
>  config ARCH_FLATMEM_ENABLE
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index b00f503ec124..257a2495145a 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,8 +28,8 @@
>  #define KASAN_SHADOW_SCALE_SHIFT	3
>  
>  #define KASAN_SHADOW_SIZE	(UL(1) << ((CONFIG_VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START	KERN_VIRT_START
> -#define KASAN_SHADOW_END	(KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
> +#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +#define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
>  #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>  
>  void kasan_init(void);
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 109c97e991a6..e03559f9b35e 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -33,7 +33,11 @@
>   */
>  #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
>  
> -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> +/*
> + * Half of the kernel address space (half of the entries of the page global
> + * directory) is for the direct mapping.
> + */
> +#define KERN_VIRT_SIZE		((PTRS_PER_PGD / 2 * PGDIR_SIZE) / 2)
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 39b550310ec6..d34f3a7a9701 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -39,8 +39,10 @@
>  
>  /* Modules always live before the kernel */
>  #ifdef CONFIG_64BIT
> -#define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
> -#define MODULES_END	(PFN_ALIGN((unsigned long)&_start))
> +/* This is used to define the end of the KASAN shadow region */
> +#define MODULES_LOWEST_VADDR	(KERNEL_LINK_ADDR - SZ_2G)
> +#define MODULES_VADDR		(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
> +#define MODULES_END		(PFN_ALIGN((unsigned long)&_start))
>  #endif
>  
>  /*
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index c0cddf0fc22d..4224e9d0ecf5 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -103,6 +103,9 @@ static void __init print_vm_layout(void)
>  	print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
>  		  (unsigned long)high_memory);
>  #ifdef CONFIG_64BIT
> +#ifdef CONFIG_KASAN
> +	print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
> +#endif

I think we'd better avoid #ifdef usage as much as possible.
For this KASAN case, we can make both KASAN_SHADOW_START and KASAN_SHADOW_END
always visible, as x86 does; then the above code can become:
if (IS_ENABLED(CONFIG_KASAN))
	print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);

Thanks

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 10/13] riscv: Improve virtual kernel memory layout dump
  2021-12-06 10:46   ` Alexandre Ghiti
  (?)
@ 2021-12-09  4:18   ` 潘庆霖
  2021-12-09  9:09     ` Alexandre ghiti
  -1 siblings, 1 reply; 70+ messages in thread
From: 潘庆霖 @ 2021-12-09  4:18 UTC (permalink / raw)
  To: Alexandre Ghiti, linux-riscv

Hi Alex,


On 2021/12/6 18:46, Alexandre Ghiti wrote:
 > With the arrival of sv48 and its large address space, it would be
 > cumbersome to statically define the unit size to use to print the different
 > portions of the virtual memory layout: instead, determine it dynamically.
 >
 > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
 > ---
 >  arch/riscv/mm/init.c               | 67 +++++++++++++++++++++++-------
 >  drivers/pci/controller/pci-xgene.c |  2 +-
 >  include/linux/sizes.h              |  1 +
 >  3 files changed, 54 insertions(+), 16 deletions(-)
 >
 > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
 > index 6a19a1b1caf8..28de6ea0a720 100644
 > --- a/arch/riscv/mm/init.c
 > +++ b/arch/riscv/mm/init.c
 > @@ -79,37 +79,74 @@ static void __init zone_sizes_init(void)
 >  }
 >
 >  #if defined(CONFIG_MMU) && defined(CONFIG_DEBUG_VM)
 > +
 > +#define LOG2_SZ_1K  ilog2(SZ_1K)
 > +#define LOG2_SZ_1M  ilog2(SZ_1M)
 > +#define LOG2_SZ_1G  ilog2(SZ_1G)
 > +#define LOG2_SZ_1T  ilog2(SZ_1T)
 > +
 >  static inline void print_mlk(char *name, unsigned long b, unsigned long t)
 >  {
 >      pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld kB)\n", name, b, t,
 > -          (((t) - (b)) >> 10));
 > +          (((t) - (b)) >> LOG2_SZ_1K));
 >  }
 >
 >  static inline void print_mlm(char *name, unsigned long b, unsigned long t)
 >  {
 >      pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld MB)\n", name, b, t,
 > -          (((t) - (b)) >> 20));
 > +          (((t) - (b)) >> LOG2_SZ_1M));
 > +}
 > +
 > +static inline void print_mlg(char *name, unsigned long b, unsigned long t)
 > +{
 > +    pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld GB)\n", name, b, t,
 > +          (((t) - (b)) >> LOG2_SZ_1G));
 > +}
 > +
 > +#ifdef CONFIG_64BIT
 > +static inline void print_mlt(char *name, unsigned long b, unsigned long t)
 > +{
 > +    pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld TB)\n", name, b, t,
 > +          (((t) - (b)) >> LOG2_SZ_1T));
 > +}
 > +#endif
 > +
 > +static inline void print_ml(char *name, unsigned long b, unsigned long t)
 > +{
 > +    unsigned long diff = t - b;
 > +
 > +#ifdef CONFIG_64BIT
 > +    if ((diff >> LOG2_SZ_1T) >= 10)
 > +        print_mlt(name, b, t);
 > +    else
 > +#endif
 > +    if ((diff >> LOG2_SZ_1G) >= 10)
 > +        print_mlg(name, b, t);
 > +    else if ((diff >> LOG2_SZ_1M) >= 10)
 > +        print_mlm(name, b, t);
 > +    else
 > +        print_mlk(name, b, t);
 >  }
 >
 >  static void __init print_vm_layout(void)
 >  {
 >      pr_notice("Virtual kernel memory layout:\n");
 > -    print_mlk("fixmap", (unsigned long)FIXADDR_START,
 > -          (unsigned long)FIXADDR_TOP);
 > -    print_mlm("pci io", (unsigned long)PCI_IO_START,
 > -          (unsigned long)PCI_IO_END);
 > -    print_mlm("vmemmap", (unsigned long)VMEMMAP_START,
 > -          (unsigned long)VMEMMAP_END);
 > -    print_mlm("vmalloc", (unsigned long)VMALLOC_START,
 > -          (unsigned long)VMALLOC_END);
 > -    print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
 > -          (unsigned long)high_memory);
 > +    print_ml("fixmap", (unsigned long)FIXADDR_START,
 > +         (unsigned long)FIXADDR_TOP);
 > +    print_ml("pci io", (unsigned long)PCI_IO_START,
 > +         (unsigned long)PCI_IO_END);
 > +    print_ml("vmemmap", (unsigned long)VMEMMAP_START,
 > +         (unsigned long)VMEMMAP_END);
 > +    print_ml("vmalloc", (unsigned long)VMALLOC_START,
 > +         (unsigned long)VMALLOC_END);
 > +    print_ml("lowmem", (unsigned long)PAGE_OFFSET,
 > +         (unsigned long)high_memory);
 >  #ifdef CONFIG_64BIT
 >  #ifdef CONFIG_KASAN
 > -    print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
 > +    print_ml("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
 >  #endif
 > -    print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
 > -          (unsigned long)ADDRESS_SPACE_END);
 > +    print_ml("kernel", (unsigned long)KERNEL_LINK_ADDR,
 > +         (unsigned long)ADDRESS_SPACE_END);
 >  #endif
 >  }
 >  #else
 > diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
 > index e64536047b65..187dcf8a9694 100644
 > --- a/drivers/pci/controller/pci-xgene.c
 > +++ b/drivers/pci/controller/pci-xgene.c
 > @@ -21,6 +21,7 @@
 >  #include <linux/pci-ecam.h>
 >  #include <linux/platform_device.h>
 >  #include <linux/slab.h>
 > +#include <linux/sizes.h>
 >
 >  #include "../pci.h"
 >
 > @@ -50,7 +51,6 @@
 >  #define OB_LO_IO            0x00000002
 >  #define XGENE_PCIE_VENDORID        0x10E8
 >  #define XGENE_PCIE_DEVICEID        0xE004
 > -#define SZ_1T                (SZ_1G*1024ULL)

I am trying to apply your patchset on upstream's master or for-next 
branch. The git repo is

git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git

and I get a failure. The commit I am applying onto is
fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf

I found that the code here on that commit is:

#define OB_LO_IO            0x00000002
#define XGENE_PCIE_DEVICEID        0xE004
#define SZ_1T                    (SZ_1G*1024ULL)
#define PIPE_PHY_RATE_RD(src)        ((0xc000 & (u32)(src)) >> 0xe)

I think that may be the reason why the apply failed. May I get your
help to determine the reason?

Thanks,
Qinglin


 >
 >  #define PIPE_PHY_RATE_RD(src)        ((0xc000 & (u32)(src)) >> 0xe)
 >
 >  #define XGENE_V1_PCI_EXP_CAP        0x40
 > diff --git a/include/linux/sizes.h b/include/linux/sizes.h
 > index 1ac79bcee2bb..0bc6cf394b08 100644
 > --- a/include/linux/sizes.h
 > +++ b/include/linux/sizes.h
 > @@ -47,6 +47,7 @@
 >  #define SZ_8G                _AC(0x200000000, ULL)
 >  #define SZ_16G                _AC(0x400000000, ULL)
 >  #define SZ_32G                _AC(0x800000000, ULL)
 > +#define SZ_1T                _AC(0x10000000000, ULL)
 >  #define SZ_64T                _AC(0x400000000000, ULL)
 >
 >  #endif /* __LINUX_SIZES_H__ */



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-06 11:05     ` Alexandre ghiti
@ 2021-12-09  4:32       ` 潘庆霖
  -1 siblings, 0 replies; 70+ messages in thread
From: 潘庆霖 @ 2021-12-09  4:32 UTC (permalink / raw)
  To: Alexandre ghiti, Alexandre Ghiti, Jonathan Corbet, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Zong Li, Anup Patel, Atish Patra,
	Christoph Hellwig, Andrey Ryabinin, Alexander Potapenko,
	Andrey Konovalov, Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann,
	Kees Cook, Guo Ren, Heinrich Schuchardt, Mayuresh Chitale,
	linux-doc, linux-riscv, linux-kernel, kasan-dev, linux-efi,
	linux-arch

Hi Alex,

On 2021/12/6 19:05, Alexandre ghiti wrote:
 > On 12/6/21 11:46, Alexandre Ghiti wrote:
 >> By adding a new 4th level of page table, give the possibility to 64bit
 >> kernel to address 2^48 bytes of virtual address: in practice, that offers
 >> 128TB of virtual address space to userspace and allows up to 64TB of
 >> physical memory.
 >>
 >> If the underlying hardware does not support sv48, we will automatically
 >> fallback to a standard 3-level page table by folding the new PUD level into
 >> PGDIR level. In order to detect HW capabilities at runtime, we
 >> use SATP feature that ignores writes with an unsupported mode.
 >>
 >> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
 >> ---
 >>   arch/riscv/Kconfig                      |   4 +-
 >>   arch/riscv/include/asm/csr.h            |   3 +-
 >>   arch/riscv/include/asm/fixmap.h         |   1 +
 >>   arch/riscv/include/asm/kasan.h          |   6 +-
 >>   arch/riscv/include/asm/page.h           |  14 ++
 >>   arch/riscv/include/asm/pgalloc.h        |  40 +++++
 >>   arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
 >>   arch/riscv/include/asm/pgtable.h        |  24 ++-
 >>   arch/riscv/kernel/head.S                |   3 +-
 >>   arch/riscv/mm/context.c                 |   4 +-
 >>   arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
 >>   arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
 >>   drivers/firmware/efi/libstub/efi-stub.c |   2 +
 >>   13 files changed, 514 insertions(+), 44 deletions(-)
 >>
 >> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
 >> index ac6c0cd9bc29..d28fe0148e13 100644
 >> --- a/arch/riscv/Kconfig
 >> +++ b/arch/riscv/Kconfig
 >> @@ -150,7 +150,7 @@ config PAGE_OFFSET
 >>       hex
 >>       default 0xC0000000 if 32BIT
 >>       default 0x80000000 if 64BIT && !MMU
 >> -    default 0xffffffd800000000 if 64BIT
 >> +    default 0xffffaf8000000000 if 64BIT
 >>     config KASAN_SHADOW_OFFSET
 >>       hex
 >> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
 >>     config PGTABLE_LEVELS
 >>       int
 >> -    default 3 if 64BIT
 >> +    default 4 if 64BIT
 >>       default 2
 >>     config LOCKDEP_SUPPORT
 >> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
 >> index 87ac65696871..3fdb971c7896 100644
 >> --- a/arch/riscv/include/asm/csr.h
 >> +++ b/arch/riscv/include/asm/csr.h
 >> @@ -40,14 +40,13 @@
 >>   #ifndef CONFIG_64BIT
 >>   #define SATP_PPN    _AC(0x003FFFFF, UL)
 >>   #define SATP_MODE_32    _AC(0x80000000, UL)
 >> -#define SATP_MODE    SATP_MODE_32
 >>   #define SATP_ASID_BITS    9
 >>   #define SATP_ASID_SHIFT    22
 >>   #define SATP_ASID_MASK    _AC(0x1FF, UL)
 >>   #else
 >>   #define SATP_PPN    _AC(0x00000FFFFFFFFFFF, UL)
 >>   #define SATP_MODE_39    _AC(0x8000000000000000, UL)
 >> -#define SATP_MODE    SATP_MODE_39
 >> +#define SATP_MODE_48    _AC(0x9000000000000000, UL)
 >>   #define SATP_ASID_BITS    16
 >>   #define SATP_ASID_SHIFT    44
 >>   #define SATP_ASID_MASK    _AC(0xFFFF, UL)
 >> diff --git a/arch/riscv/include/asm/fixmap.h 
b/arch/riscv/include/asm/fixmap.h
 >> index 54cbf07fb4e9..58a718573ad6 100644
 >> --- a/arch/riscv/include/asm/fixmap.h
 >> +++ b/arch/riscv/include/asm/fixmap.h
 >> @@ -24,6 +24,7 @@ enum fixed_addresses {
 >>       FIX_HOLE,
 >>       FIX_PTE,
 >>       FIX_PMD,
 >> +    FIX_PUD,
 >>       FIX_TEXT_POKE1,
 >>       FIX_TEXT_POKE0,
 >>       FIX_EARLYCON_MEM_BASE,
 >> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
 >> index 743e6ff57996..0b85e363e778 100644
 >> --- a/arch/riscv/include/asm/kasan.h
 >> +++ b/arch/riscv/include/asm/kasan.h
 >> @@ -28,7 +28,11 @@
 >>   #define KASAN_SHADOW_SCALE_SHIFT    3
 >>     #define KASAN_SHADOW_SIZE    (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
 >> -#define KASAN_SHADOW_START    (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
 >> +/*
 >> + * Depending on the size of the virtual address space, the region may not be
 >> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
 >> + */
 >> +#define KASAN_SHADOW_START    ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
 >>   #define KASAN_SHADOW_END    MODULES_LOWEST_VADDR
 >>   #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
 >> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
 >> index e03559f9b35e..d089fe46f7d8 100644
 >> --- a/arch/riscv/include/asm/page.h
 >> +++ b/arch/riscv/include/asm/page.h
 >> @@ -31,7 +31,20 @@
 >>    * When not using MMU this corresponds to the first free page in
 >>    * physical memory (aligned on a page boundary).
 >>    */
 >> +#ifdef CONFIG_64BIT
 >> +#ifdef CONFIG_MMU
 >> +#define PAGE_OFFSET        kernel_map.page_offset
 >> +#else
 >> +#define PAGE_OFFSET        _AC(CONFIG_PAGE_OFFSET, UL)
 >> +#endif
 >> +/*
 >> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
 >> + * define the PAGE_OFFSET value for SV39.
 >> + */
 >> +#define PAGE_OFFSET_L3        _AC(0xffffffd800000000, UL)
 >> +#else
 >>   #define PAGE_OFFSET        _AC(CONFIG_PAGE_OFFSET, UL)
 >> +#endif /* CONFIG_64BIT */
 >>     /*
 >>    * Half of the kernel address space (half of the entries of the page global
 >> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
 >>   #endif /* CONFIG_MMU */
 >>     struct kernel_mapping {
 >> +    unsigned long page_offset;
 >>       unsigned long virt_addr;
 >>       uintptr_t phys_addr;
 >>       uintptr_t size;
 >> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
 >> index 0af6933a7100..11823004b87a 100644
 >> --- a/arch/riscv/include/asm/pgalloc.h
 >> +++ b/arch/riscv/include/asm/pgalloc.h
 >> @@ -11,6 +11,8 @@
 >>   #include <asm/tlb.h>
 >>     #ifdef CONFIG_MMU
 >> +#define __HAVE_ARCH_PUD_ALLOC_ONE
 >> +#define __HAVE_ARCH_PUD_FREE
 >>   #include <asm-generic/pgalloc.h>
 >>     static inline void pmd_populate_kernel(struct mm_struct *mm,
 >> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 >>         set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 >>   }
 >> +
 >> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 >> +{
 >> +    if (pgtable_l4_enabled) {
 >> +        unsigned long pfn = virt_to_pfn(pud);
 >> +
 >> +        set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 >> +    }
 >> +}
 >> +
 >> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
 >> +                     pud_t *pud)
 >> +{
 >> +    if (pgtable_l4_enabled) {
 >> +        unsigned long pfn = virt_to_pfn(pud);
 >> +
 >> +        set_p4d_safe(p4d,
 >> +                 __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 >> +    }
 >> +}
 >> +
 >> +#define pud_alloc_one pud_alloc_one
 >> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return __pud_alloc_one(mm, addr);
 >> +
 >> +    return NULL;
 >> +}
 >> +
 >> +#define pud_free pud_free
 >> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        __pud_free(mm, pud);
 >> +}
 >> +
 >> +#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
 >>   #endif /* __PAGETABLE_PMD_FOLDED */
 >>     static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 >> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
 >> index 228261aa9628..bbbdd66e5e2f 100644
 >> --- a/arch/riscv/include/asm/pgtable-64.h
 >> +++ b/arch/riscv/include/asm/pgtable-64.h
 >> @@ -8,16 +8,36 @@
 >>     #include <linux/const.h>
 >>   -#define PGDIR_SHIFT     30
 >> +extern bool pgtable_l4_enabled;
 >> +
 >> +#define PGDIR_SHIFT_L3  30
 >> +#define PGDIR_SHIFT_L4  39
 >> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
 >> +
 >> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
 >>   /* Size of region mapped by a page global directory */
 >>   #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
 >>   #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
 >>   +/* pud is folded into pgd in case of 3-level page table */
 >> +#define PUD_SHIFT      30
 >> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
 >> +#define PUD_MASK       (~(PUD_SIZE - 1))
 >> +
 >>   #define PMD_SHIFT       21
 >>   /* Size of region mapped by a page middle directory */
 >>   #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
 >>   #define PMD_MASK        (~(PMD_SIZE - 1))
 >>   +/* Page Upper Directory entry */
 >> +typedef struct {
 >> +    unsigned long pud;
 >> +} pud_t;
 >> +
 >> +#define pud_val(x)      ((x).pud)
 >> +#define __pud(x)        ((pud_t) { (x) })
 >> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
 >> +
 >>   /* Page Middle Directory entry */
 >>   typedef struct {
 >>       unsigned long pmd;
 >> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
 >>       set_pud(pudp, __pud(0));
 >>   }
 >>   +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
 >> +{
 >> +    return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
 >> +}
 >> +
 >> +static inline unsigned long _pud_pfn(pud_t pud)
 >> +{
 >> +    return pud_val(pud) >> _PAGE_PFN_SHIFT;
 >> +}
 >> +
 >>   static inline pmd_t *pud_pgtable(pud_t pud)
 >>   {
 >>       return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
 >> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
 >>       return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
 >>   }
 >>   +#define mm_pud_folded  mm_pud_folded
 >> +static inline bool mm_pud_folded(struct mm_struct *mm)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return false;
 >> +
 >> +    return true;
 >> +}
 >> +
 >> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
 >> +
 >>   static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
 >>   {
 >>       return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
 >> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
 >>   #define pmd_ERROR(e) \
 >>       pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
 >>   +#define pud_ERROR(e)   \
 >> +    pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
 >> +
 >> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        *p4dp = p4d;
 >> +    else
 >> +        set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
 >> +}
 >> +
 >> +static inline int p4d_none(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return (p4d_val(p4d) == 0);
 >> +
 >> +    return 0;
 >> +}
 >> +
 >> +static inline int p4d_present(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return (p4d_val(p4d) & _PAGE_PRESENT);
 >> +
 >> +    return 1;
 >> +}
 >> +
 >> +static inline int p4d_bad(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return !p4d_present(p4d);
 >> +
 >> +    return 0;
 >> +}
 >> +
 >> +static inline void p4d_clear(p4d_t *p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        set_p4d(p4d, __p4d(0));
 >> +}
 >> +
 >> +static inline pud_t *p4d_pgtable(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
 >> +
 >> +    return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
 >> +}
 >> +
 >> +static inline struct page *p4d_page(p4d_t p4d)
 >> +{
 >> +    return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
 >> +}
 >> +
 >> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
 >> +
 >> +#define pud_offset pud_offset
 >> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return p4d_pgtable(*p4d) + pud_index(address);
 >> +
 >> +    return (pud_t *)p4d;
 >> +}
 >> +
 >>   #endif /* _ASM_RISCV_PGTABLE_64_H */
 >> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
 >> index e1a52e22ad7e..e1c74ef4ead2 100644
 >> --- a/arch/riscv/include/asm/pgtable.h
 >> +++ b/arch/riscv/include/asm/pgtable.h
 >> @@ -51,7 +51,7 @@
 >>    * position vmemmap directly below the VMALLOC region.
 >>    */
 >>   #ifdef CONFIG_64BIT
 >> -#define VA_BITS        39
 >> +#define VA_BITS        (pgtable_l4_enabled ? 48 : 39)
 >>   #else
 >>   #define VA_BITS        32
 >>   #endif
 >> @@ -90,8 +90,7 @@
 >>     #ifndef __ASSEMBLY__
 >>   -/* Page Upper Directory not used in RISC-V */
 >> -#include <asm-generic/pgtable-nopud.h>
 >> +#include <asm-generic/pgtable-nop4d.h>
 >>   #include <asm/page.h>
 >>   #include <asm/tlbflush.h>
 >>   #include <linux/mm_types.h>
 >> @@ -113,6 +112,17 @@
 >>   #define XIP_FIXUP(addr)        (addr)
 >>   #endif /* CONFIG_XIP_KERNEL */
 >>   +struct pt_alloc_ops {
 >> +    pte_t *(*get_pte_virt)(phys_addr_t pa);
 >> +    phys_addr_t (*alloc_pte)(uintptr_t va);
 >> +#ifndef __PAGETABLE_PMD_FOLDED
 >> +    pmd_t *(*get_pmd_virt)(phys_addr_t pa);
 >> +    phys_addr_t (*alloc_pmd)(uintptr_t va);
 >> +    pud_t *(*get_pud_virt)(phys_addr_t pa);
 >> +    phys_addr_t (*alloc_pud)(uintptr_t va);
 >> +#endif
 >> +};
 >> +
 >>   #ifdef CONFIG_MMU
 >>   /* Number of entries in the page global directory */
 >>   #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
 >> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 >>    * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
 >>    */
 >>   #ifdef CONFIG_64BIT
 >> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
 >> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
 >> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
 >>   #else
 >> -#define TASK_SIZE FIXADDR_START
 >> +#define TASK_SIZE    FIXADDR_START
 >> +#define TASK_SIZE_MIN    TASK_SIZE
 >>   #endif
 >>     #else /* CONFIG_MMU */
 >> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
 >>   #define dtb_early_va    _dtb_early_va
 >>   #define dtb_early_pa    _dtb_early_pa
 >>   #endif /* CONFIG_XIP_KERNEL */
 >> +extern u64 satp_mode;
 >> +extern bool pgtable_l4_enabled;
 >>     void paging_init(void);
 >>   void misc_mem_init(void);
 >> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
 >> index 52c5ff9804c5..c3c0ed559770 100644
 >> --- a/arch/riscv/kernel/head.S
 >> +++ b/arch/riscv/kernel/head.S
 >> @@ -95,7 +95,8 @@ relocate:
 >>         /* Compute satp for kernel page tables, but don't load it yet */
 >>       srl a2, a0, PAGE_SHIFT
 >> -    li a1, SATP_MODE
 >> +    la a1, satp_mode
 >> +    REG_L a1, 0(a1)
 >>       or a2, a2, a1
 >>         /*
 >> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
 >> index ee3459cb6750..a7246872bd30 100644
 >> --- a/arch/riscv/mm/context.c
 >> +++ b/arch/riscv/mm/context.c
 >> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 >>   switch_mm_fast:
 >>       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
 >>             ((cntx & asid_mask) << SATP_ASID_SHIFT) |
 >> -          SATP_MODE);
 >> +          satp_mode);
 >>         if (need_flush_tlb)
 >>           local_flush_tlb_all();
 >> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 >>   static void set_mm_noasid(struct mm_struct *mm)
 >>   {
 >>       /* Switch the page table and blindly nuke entire local TLB */
 >> -    csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
 >> +    csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
 >>       local_flush_tlb_all();
 >>   }
 >>   diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
 >> index 1552226fb6bd..6a19a1b1caf8 100644
 >> --- a/arch/riscv/mm/init.c
 >> +++ b/arch/riscv/mm/init.c
 >> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
 >>   #define kernel_map    (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
 >>   #endif
 >>   +#ifdef CONFIG_64BIT
 >> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
 >> +#else
 >> +u64 satp_mode = SATP_MODE_32;
 >> +#endif
 >> +EXPORT_SYMBOL(satp_mode);
 >> +
 >> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
 >> +                true : false;
 >> +EXPORT_SYMBOL(pgtable_l4_enabled);
 >> +
 >>   phys_addr_t phys_ram_base __ro_after_init;
 >>   EXPORT_SYMBOL(phys_ram_base);
 >>   @@ -53,15 +64,6 @@ extern char _start[];
 >>   void *_dtb_early_va __initdata;
 >>   uintptr_t _dtb_early_pa __initdata;
 >>   -struct pt_alloc_ops {
 >> -    pte_t *(*get_pte_virt)(phys_addr_t pa);
 >> -    phys_addr_t (*alloc_pte)(uintptr_t va);
 >> -#ifndef __PAGETABLE_PMD_FOLDED
 >> -    pmd_t *(*get_pmd_virt)(phys_addr_t pa);
 >> -    phys_addr_t (*alloc_pmd)(uintptr_t va);
 >> -#endif
 >> -};
 >> -
 >>   static phys_addr_t dma32_phys_limit __initdata;
 >>     static void __init zone_sizes_init(void)
 >> @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
 >>   }
 >>     #ifdef CONFIG_MMU
 >> -static struct pt_alloc_ops _pt_ops __initdata;
 >> +struct pt_alloc_ops _pt_ops __initdata;
 >>     #ifdef CONFIG_XIP_KERNEL
 >>   #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
 >> @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 >>   static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
 >>     pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 >> +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 >>   static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 >>     #ifdef CONFIG_XIP_KERNEL
 >> @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 >>   #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
 >>   #endif /* CONFIG_XIP_KERNEL */
 >>   +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
 >> +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
 >> +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 >> +
 >> +#ifdef CONFIG_XIP_KERNEL
 >> +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
 >> +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
 >> +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
 >> +#endif /* CONFIG_XIP_KERNEL */
 >> +
 >>   static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
 >>   {
 >>       /* Before MMU is enabled */
 >> @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
 >>     static phys_addr_t __init alloc_pmd_early(uintptr_t va)
 >>   {
 >> -    BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
 >> +    BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
 >>         return (uintptr_t)early_pmd;
 >>   }
 >> @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
 >>       create_pte_mapping(ptep, va, pa, sz, prot);
 >>   }
 >>   -#define pgd_next_t        pmd_t
 >> -#define alloc_pgd_next(__va)    pt_ops.alloc_pmd(__va)
 >> -#define get_pgd_next_virt(__pa) pt_ops.get_pmd_virt(__pa)
 >> +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
 >> +{
 >> +    return (pud_t *)((uintptr_t)pa);
 >> +}
 >> +
 >> +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
 >> +{
 >> +    clear_fixmap(FIX_PUD);
 >> +    return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
 >> +}
 >> +
 >> +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
 >> +{
 >> +    return (pud_t *)__va(pa);
 >> +}
 >> +
 >> +static phys_addr_t __init alloc_pud_early(uintptr_t va)
 >> +{
 >> +    /* Only one PUD is available for early mapping */
 >> +    BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
 >> +
 >> +    return (uintptr_t)early_pud;
 >> +}
 >> +
 >> +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
 >> +{
 >> +    return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
 >> +}
 >> +
 >> +static phys_addr_t alloc_pud_late(uintptr_t va)
 >> +{
 >> +    unsigned long vaddr;
 >> +
 >> +    vaddr = __get_free_page(GFP_KERNEL);
 >> +    BUG_ON(!vaddr);
 >> +    return __pa(vaddr);
 >> +}
 >> +
 >> +static void __init create_pud_mapping(pud_t *pudp,
 >> +                      uintptr_t va, phys_addr_t pa,
 >> +                      phys_addr_t sz, pgprot_t prot)
 >> +{
 >> +    pmd_t *nextp;
 >> +    phys_addr_t next_phys;
 >> +    uintptr_t pud_index = pud_index(va);
 >> +
 >> +    if (sz == PUD_SIZE) {
 >> +        if (pud_val(pudp[pud_index]) == 0)
 >> +            pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
 >> +        return;
 >> +    }
 >> +
 >> +    if (pud_val(pudp[pud_index]) == 0) {
 >> +        next_phys = pt_ops.alloc_pmd(va);
 >> +        pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
 >> +        nextp = pt_ops.get_pmd_virt(next_phys);
 >> +        memset(nextp, 0, PAGE_SIZE);
 >> +    } else {
 >> +        next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
 >> +        nextp = pt_ops.get_pmd_virt(next_phys);
 >> +    }
 >> +
 >> +    create_pmd_mapping(nextp, va, pa, sz, prot);
 >> +}
 >> +
 >> +#define pgd_next_t        pud_t
 >> +#define alloc_pgd_next(__va)    (pgtable_l4_enabled ?            \
 >> +        pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
 >> +#define get_pgd_next_virt(__pa)    (pgtable_l4_enabled ?            \
 >> +        pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
 >>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)    \
 >> -    create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
 >> -#define fixmap_pgd_next        fixmap_pmd
 >> +                (pgtable_l4_enabled ?            \
 >> +        create_pud_mapping(__nextp, __va, __pa, __sz, __prot) :    \
 >> +        create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
 >> +#define fixmap_pgd_next        (pgtable_l4_enabled ?            \
 >> +        (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
 >> +#define trampoline_pgd_next    (pgtable_l4_enabled ?            \
 >> +        (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
 >> +#define early_dtb_pgd_next    (pgtable_l4_enabled ?            \
 >> +        (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
 >>   #else
 >>   #define pgd_next_t        pte_t
 >>   #define alloc_pgd_next(__va)    pt_ops.alloc_pte(__va)
 >>   #define get_pgd_next_virt(__pa) pt_ops.get_pte_virt(__pa)
 >>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)    \
 >>       create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
 >> -#define fixmap_pgd_next        fixmap_pte
 >> +#define fixmap_pgd_next        ((uintptr_t)fixmap_pte)
 >> +#define early_dtb_pgd_next    ((uintptr_t)early_dtb_pmd)
 >> +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
 >>   #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
 >> -#endif
 >> +#endif /* __PAGETABLE_PMD_FOLDED */
 >>     void __init create_pgd_mapping(pgd_t *pgdp,
 >>                         uintptr_t va, phys_addr_t pa,
 >> @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
 >>   }
 >>   #endif /* CONFIG_STRICT_KERNEL_RWX */
 >>   +#ifdef CONFIG_64BIT
 >> +static void __init disable_pgtable_l4(void)
 >> +{
 >> +    pgtable_l4_enabled = false;
 >> +    kernel_map.page_offset = PAGE_OFFSET_L3;
 >> +    satp_mode = SATP_MODE_39;
 >> +}
 >> +
 >> +/*
 >> + * There is a simple way to determine if 4-level is supported by the
 >> + * underlying hardware: establish 1:1 mapping in 4-level page table mode
 >> + * then read SATP to see if the configuration was taken into account
 >> + * meaning sv48 is supported.
 >> + */
 >> +static __init void set_satp_mode(void)
 >> +{
 >> +    u64 identity_satp, hw_satp;
 >> +    uintptr_t set_satp_mode_pmd;
 >> +
 >> +    set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
 >> +    create_pgd_mapping(early_pg_dir,
 >> +               set_satp_mode_pmd, (uintptr_t)early_pud,
 >> +               PGDIR_SIZE, PAGE_TABLE);
 >> +    create_pud_mapping(early_pud,
 >> +               set_satp_mode_pmd, (uintptr_t)early_pmd,
 >> +               PUD_SIZE, PAGE_TABLE);
 >> +    /* Handle the case where set_satp_mode straddles 2 PMDs */
 >> +    create_pmd_mapping(early_pmd,
 >> +               set_satp_mode_pmd, set_satp_mode_pmd,
 >> +               PMD_SIZE, PAGE_KERNEL_EXEC);
 >> +    create_pmd_mapping(early_pmd,
 >> +               set_satp_mode_pmd + PMD_SIZE,
 >> +               set_satp_mode_pmd + PMD_SIZE,
 >> +               PMD_SIZE, PAGE_KERNEL_EXEC);
 >> +
 >> +    identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
 >> +
 >> +    local_flush_tlb_all();
 >> +    csr_write(CSR_SATP, identity_satp);
 >> +    hw_satp = csr_swap(CSR_SATP, 0ULL);
 >> +    local_flush_tlb_all();
 >> +
 >> +    if (hw_satp != identity_satp)
 >> +        disable_pgtable_l4();
 >> +
 >> +    memset(early_pg_dir, 0, PAGE_SIZE);
 >> +    memset(early_pud, 0, PAGE_SIZE);
 >> +    memset(early_pmd, 0, PAGE_SIZE);
 >> +}
 >> +#endif
 >> +
 >>   /*
 >>    * setup_vm() is called from head.S with MMU-off.
 >>    *
 >> @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
 >>       uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
 >>         create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
 >> -               IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
 >> +               IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
 >>                  PGDIR_SIZE,
 >>                  IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
 >>   +    if (pgtable_l4_enabled) {
 >> +        create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
 >> +                   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
 >> +    }
 >> +
 >>       if (IS_ENABLED(CONFIG_64BIT)) {
 >>           create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
 >>                      pa, PMD_SIZE, PAGE_KERNEL);
 >> @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
 >>   #ifndef __PAGETABLE_PMD_FOLDED
 >>       pt_ops.alloc_pmd = alloc_pmd_early;
 >>       pt_ops.get_pmd_virt = get_pmd_virt_early;
 >> +    pt_ops.alloc_pud = alloc_pud_early;
 >> +    pt_ops.get_pud_virt = get_pud_virt_early;
 >>   #endif
 >>   }
 >>   @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
 >>   #ifndef __PAGETABLE_PMD_FOLDED
 >>       pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
 >>       pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
 >> +    pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
 >> +    pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
 >>   #endif
 >>   }
 >>   @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
 >>   #ifndef __PAGETABLE_PMD_FOLDED
 >>       pt_ops.alloc_pmd = alloc_pmd_late;
 >>       pt_ops.get_pmd_virt = get_pmd_virt_late;
 >> +    pt_ops.alloc_pud = alloc_pud_late;
 >> +    pt_ops.get_pud_virt = get_pud_virt_late;
 >>   #endif
 >>   }
 >>   @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>       pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
 >>         kernel_map.virt_addr = KERNEL_LINK_ADDR;
 >> +    kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
 >>     #ifdef CONFIG_XIP_KERNEL
 >>       kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
 >> @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>       kernel_map.phys_addr = (uintptr_t)(&_start);
 >>       kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
 >>   #endif
 >> +
 >> +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
 >> +    set_satp_mode();
 >> +#endif
 >> +
 >>       kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
 >>       kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
 >>   @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>         /* Setup early PGD for fixmap */
 >>       create_pgd_mapping(early_pg_dir, FIXADDR_START,
 >> -               (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 >> +               fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 >>     #ifndef __PAGETABLE_PMD_FOLDED
 >> -    /* Setup fixmap PMD */
 >> +    /* Setup fixmap PUD and PMD */
 >> +    if (pgtable_l4_enabled)
 >> +        create_pud_mapping(fixmap_pud, FIXADDR_START,
 >> +                   (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
 >>       create_pmd_mapping(fixmap_pmd, FIXADDR_START,
 >>                  (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
 >>       /* Setup trampoline PGD and PMD */
 >>       create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
 >> -               (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
 >> +               trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 >> +    if (pgtable_l4_enabled)
 >> +        create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
 >> +                   (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
 >>   #ifdef CONFIG_XIP_KERNEL
 >>       create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
 >>                  kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
 >> @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>        * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
 >>        * range can not span multiple pmds.
 >>        */
 >> -    BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 >> +    BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 >>                != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
 >>     #ifndef __PAGETABLE_PMD_FOLDED
 >> @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
 >>       /* Clear fixmap PTE and PMD mappings */
 >>       clear_fixmap(FIX_PTE);
 >>       clear_fixmap(FIX_PMD);
 >> +    clear_fixmap(FIX_PUD);
 >>         /* Move to swapper page table */
 >> -    csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
 >> +    csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
 >>       local_flush_tlb_all();
 >>         pt_ops_set_late();
 >> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
 >> index 1434a0225140..993f50571a3b 100644
 >> --- a/arch/riscv/mm/kasan_init.c
 >> +++ b/arch/riscv/mm/kasan_init.c
 >> @@ -11,7 +11,29 @@
 >>   #include <asm/fixmap.h>
 >>   #include <asm/pgalloc.h>
 >>   +/*
 >> + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
 >> + * which is right before the kernel.
 >> + *
 >> + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
 >> + * the page global directory with kasan_early_shadow_pmd.
 >> + *
 >> + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
 >> + * must be divided as follows:
 >> + * - the first PGD entry, although incomplete, is populated with
 >> + *   kasan_early_shadow_pud/p4d
 >> + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
 >> + * - the last PGD entry is shared with the kernel mapping so populated at the
 >> + *   lower levels pud/p4d
 >> + *
 >> + * In addition, when shallow populating a kasan region (for example vmalloc),
 >> + * this region may also not be aligned on PGDIR size, so we must go down to the
 >> + * pud level too.
 >> + */
 >> +
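[Editor's aside, not part of the patch: the first/middle/last PGD-entry split described in the quoted comment is pure address arithmetic, and can be checked with a small user-space sketch. `partial_pgd_slots` is a hypothetical helper introduced here for illustration; only `PGDIR_SHIFT`/`PGDIR_SIZE` mirror the sv48 values from the patch.]

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic behind the comment above: for a shadow region
 * [start, end), PGD slots fully inside the region can be mapped with
 * kasan_early_shadow_pud, while a partially covered first or last slot
 * must instead be populated at the pud level. */
#define PGDIR_SHIFT 39	/* sv48: each PGD entry covers 1 << 39 bytes */
#define PGDIR_SIZE  (1ULL << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))

/* Number of PGD slots only partially covered by [start, end). */
static int partial_pgd_slots(uint64_t start, uint64_t end)
{
	int n = 0;

	if (start & ~PGDIR_MASK)
		n++;	/* first slot is incomplete */
	if ((end & ~PGDIR_MASK) && (end & PGDIR_MASK) != (start & PGDIR_MASK))
		n++;	/* last slot is incomplete and distinct from the first */
	return n;
}
```

A PGDIR-aligned region (the sv39 case) needs no pud-level work, while an unaligned sv48/sv57 region needs it at one or both ends, which is exactly the three-way split the comment enumerates.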
 >>   extern pgd_t early_pg_dir[PTRS_PER_PGD];
 >> +extern struct pt_alloc_ops _pt_ops __initdata;
 >> +#define pt_ops    _pt_ops
 >>     static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
 >>   {
 >> @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
 >>       set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
 >>   }
 >>   -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
 >> +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
 >>   {
 >>       phys_addr_t phys_addr;
 >>       pmd_t *pmdp, *base_pmd;
 >>       unsigned long next;
 >>   -    base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
 >> -    if (base_pmd == lm_alias(kasan_early_shadow_pmd))
 >> +    if (pud_none(*pud)) {
 >>           base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
 >> +    } else {
 >> +        base_pmd = (pmd_t *)pud_pgtable(*pud);
 >> +        if (base_pmd == lm_alias(kasan_early_shadow_pmd))
 >> +            base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
 >> +    }
 >>         pmdp = base_pmd + pmd_index(vaddr);
 >>   @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
 >>        * it entirely, memblock could allocate a page at a physical address
 >>        * where KASAN is not populated yet and then we'd get a page fault.
 >>        */
 >> -    set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 >> +    set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 >> +}
 >> +
 >> +static void __init kasan_populate_pud(pgd_t *pgd,
 >> +                      unsigned long vaddr, unsigned long end,
 >> +                      bool early)
 >> +{
 >> +    phys_addr_t phys_addr;
 >> +    pud_t *pudp, *base_pud;
 >> +    unsigned long next;
 >> +
 >> +    if (early) {
 >> +        /*
 >> +         * We can't use pgd_page_vaddr here as it would return a linear
 >> +         * mapping address but it is not mapped yet, but when populating
 >> +         * early_pg_dir, we need the physical address and when populating
 >> +         * swapper_pg_dir, we need the kernel virtual address so use
 >> +         * pt_ops facility.
 >> +         */
 >> +        base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
 >> +    } else {
 >> +        base_pud = (pud_t *)pgd_page_vaddr(*pgd);
 >> +        if (base_pud == lm_alias(kasan_early_shadow_pud))
 >> +            base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
 >> +    }
 >> +
 >> +    pudp = base_pud + pud_index(vaddr);
 >> +
 >> +    do {
 >> +        next = pud_addr_end(vaddr, end);
 >> +
 >> +        if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
 >> +            if (early) {
 >> +                phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
 >> +                set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
 >> +                continue;
 >> +            } else {
 >> +                phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
 >> +                if (phys_addr) {
 >> +                    set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
 >> +                    continue;
 >> +                }
 >> +            }
 >> +        }
 >> +
 >> +        kasan_populate_pmd(pudp, vaddr, next);
 >> +    } while (pudp++, vaddr = next, vaddr != end);
 >> +
 >> +    /*
 >> +     * Wait for the whole PGD to be populated before setting the PGD in
 >> +     * the page table, otherwise, if we did set the PGD before populating
 >> +     * it entirely, memblock could allocate a page at a physical address
 >> +     * where KASAN is not populated yet and then we'd get a page fault.
 >> +     */
 >> +    if (!early)
 >> +        set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
 >>   }
 >>   +#define kasan_early_shadow_pgd_next (pgtable_l4_enabled ?    \
 >> +                (uintptr_t)kasan_early_shadow_pud : \
 >> +                (uintptr_t)kasan_early_shadow_pmd)
 >> +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)            \
 >> +        (pgtable_l4_enabled ?                        \
 >> +            kasan_populate_pud(pgdp, vaddr, next, early) :        \
 >> +            kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
 >> +
 >>   static void __init kasan_populate_pgd(pgd_t *pgdp,
 >>                         unsigned long vaddr, unsigned long end,
 >>                         bool early)
 >> @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
 >>               }
 >>           }
 >>   -        kasan_populate_pmd(pgdp, vaddr, next);
 >> +        kasan_populate_pgd_next(pgdp, vaddr, next, early);
 >>       } while (pgdp++, vaddr = next, vaddr != end);
 >>   }
 >>   @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
 >>       memset(start, KASAN_SHADOW_INIT, end - start);
 >>   }
 >>   +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
 >> +                          unsigned long vaddr, unsigned long end,
 >> +                          bool kasan_populate)
 >> +{
 >> +    unsigned long next;
 >> +    pud_t *pudp, *base_pud;
 >> +    pmd_t *base_pmd;
 >> +    bool is_kasan_pmd;
 >> +
 >> +    base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
 >> +    pudp = base_pud + pud_index(vaddr);
 >> +
 >> +    if (kasan_populate)
 >> +        memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
 >> +               sizeof(pud_t) * PTRS_PER_PUD);
 >> +
 >> +    do {
 >> +        next = pud_addr_end(vaddr, end);
 >> +        is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
 >> +
 >> +        if (is_kasan_pmd) {
 >> +            base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 >> +            set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 >> +        }
 >> +    } while (pudp++, vaddr = next, vaddr != end);
 >> +}
 >> +
 >>   static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
 >>   {
 >>       unsigned long next;
 >>       void *p;
 >>       pgd_t *pgd_k = pgd_offset_k(vaddr);
 >> +    bool is_kasan_pgd_next;
 >>         do {
 >>           next = pgd_addr_end(vaddr, end);
 >> -        if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
 >> +        is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
 >> +                     (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
 >> +
 >> +        if (is_kasan_pgd_next) {
 >>               p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 >>               set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
 >>           }
 >> +
 >> +        if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
 >> +            continue;
 >> +
 >> +        kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
 >>       } while (pgd_k++, vaddr = next, vaddr != end);
 >>   }
 >
 >
 > @Qinglin: I can deal with sv57 kasan population if needs be as it is a bit tricky and I think it would save you quite some time :)

Thanks so much for your suggestion! I would like to give it a try first, as I am now preparing the new Sv57 patchset :) I will ask for your help if I run into any trouble. Thanks again!

Yours,
Qinglin

 >
 >
 >>   diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
 >> index 26e69788f27a..b3db5d91ed38 100644
 >> --- a/drivers/firmware/efi/libstub/efi-stub.c
 >> +++ b/drivers/firmware/efi/libstub/efi-stub.c
 >> @@ -40,6 +40,8 @@
 >>     #ifdef CONFIG_ARM64
 >>   # define EFI_RT_VIRTUAL_LIMIT    DEFAULT_MAP_WINDOW_64
 >> +#elif defined(CONFIG_RISCV)
 >> +# define EFI_RT_VIRTUAL_LIMIT    TASK_SIZE_MIN
 >>   #else
 >>   # define EFI_RT_VIRTUAL_LIMIT    TASK_SIZE
 >>   #endif



* Re: [PATCH v3 07/13] riscv: Implement sv48 support
@ 2021-12-09  4:32       ` 潘庆霖
  0 siblings, 0 replies; 70+ messages in thread
From: 潘庆霖 @ 2021-12-09  4:32 UTC (permalink / raw)
  To: Alexandre ghiti, Alexandre Ghiti, Jonathan Corbet, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Zong Li, Anup Patel, Atish Patra,
	Christoph Hellwig, Andrey Ryabinin, Alexander Potapenko,
	Andrey Konovalov, Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann,
	Kees Cook, Guo Ren, Heinrich Schuchardt, Mayuresh Chitale,
	linux-doc, linux-riscv, linux-kernel, kasan-dev, linux-efi,
	linux-arch

Hi Alex,

On 2021/12/6 19:05, Alexandre ghiti wrote:
 > On 12/6/21 11:46, Alexandre Ghiti wrote:
 >> By adding a new 4th level of page table, give the possibility to 64bit
 >> kernel to address 2^48 bytes of virtual address: in practice, that offers
 >> 128TB of virtual address space to userspace and allows up to 64TB of
 >> physical memory.
 >>
 >> If the underlying hardware does not support sv48, we will automatically
 >> fallback to a standard 3-level page table by folding the new PUD level into
 >> PGDIR level. In order to detect HW capabilities at runtime, we
 >> use SATP feature that ignores writes with an unsupported mode.
 >>
 >> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
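[Editor's aside, not part of the patch: the detection trick the changelog describes — satp ignoring writes with an unsupported mode — can be sketched in user space. `struct mock_csr` and `probe_satp_mode` are hypothetical stand-ins modeling a CPU's CSR; only the `SATP_MODE_*` constants come from the patch.]

```c
#include <assert.h>
#include <stdint.h>

#define SATP_MODE_39 0x8000000000000000ULL
#define SATP_MODE_48 0x9000000000000000ULL
#define SATP_MODE_MASK 0xF000000000000000ULL

/* Mock of a hart's satp CSR: a write with an unsupported MODE is
 * simply ignored, which is what set_satp_mode() relies on. */
struct mock_csr {
	uint64_t satp;
	int has_sv48;
};

static void mock_csr_write(struct mock_csr *csr, uint64_t val)
{
	uint64_t mode = val & SATP_MODE_MASK;

	if (mode == SATP_MODE_48 && !csr->has_sv48)
		return;		/* unsupported mode: write dropped */
	csr->satp = val;
}

/* Mirrors the probe: try sv48, read back, fall back to sv39 on mismatch. */
static uint64_t probe_satp_mode(struct mock_csr *csr)
{
	uint64_t want = SATP_MODE_48;

	mock_csr_write(csr, want);
	if (csr->satp != want)
		return SATP_MODE_39;	/* fold the PUD level at runtime */
	return SATP_MODE_48;
}
```

In the kernel this happens under an identity mapping with interrupts and the real page tables carefully staged, which is why `set_satp_mode()` builds a temporary 1:1 mapping first; the sketch only captures the write-then-read-back decision.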
 >> ---
 >>   arch/riscv/Kconfig                      |   4 +-
 >>   arch/riscv/include/asm/csr.h            |   3 +-
 >>   arch/riscv/include/asm/fixmap.h         |   1 +
 >>   arch/riscv/include/asm/kasan.h          |   6 +-
 >>   arch/riscv/include/asm/page.h           |  14 ++
 >>   arch/riscv/include/asm/pgalloc.h        |  40 +++++
 >>   arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
 >>   arch/riscv/include/asm/pgtable.h        |  24 ++-
 >>   arch/riscv/kernel/head.S                |   3 +-
 >>   arch/riscv/mm/context.c                 |   4 +-
 >>   arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
 >>   arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
 >>   drivers/firmware/efi/libstub/efi-stub.c |   2 +
 >>   13 files changed, 514 insertions(+), 44 deletions(-)
 >>
 >> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
 >> index ac6c0cd9bc29..d28fe0148e13 100644
 >> --- a/arch/riscv/Kconfig
 >> +++ b/arch/riscv/Kconfig
 >> @@ -150,7 +150,7 @@ config PAGE_OFFSET
 >>       hex
 >>       default 0xC0000000 if 32BIT
 >>       default 0x80000000 if 64BIT && !MMU
 >> -    default 0xffffffd800000000 if 64BIT
 >> +    default 0xffffaf8000000000 if 64BIT
 >>     config KASAN_SHADOW_OFFSET
 >>       hex
 >> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
 >>     config PGTABLE_LEVELS
 >>       int
 >> -    default 3 if 64BIT
 >> +    default 4 if 64BIT
 >>       default 2
 >>     config LOCKDEP_SUPPORT
 >> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
 >> index 87ac65696871..3fdb971c7896 100644
 >> --- a/arch/riscv/include/asm/csr.h
 >> +++ b/arch/riscv/include/asm/csr.h
 >> @@ -40,14 +40,13 @@
 >>   #ifndef CONFIG_64BIT
 >>   #define SATP_PPN    _AC(0x003FFFFF, UL)
 >>   #define SATP_MODE_32    _AC(0x80000000, UL)
 >> -#define SATP_MODE    SATP_MODE_32
 >>   #define SATP_ASID_BITS    9
 >>   #define SATP_ASID_SHIFT    22
 >>   #define SATP_ASID_MASK    _AC(0x1FF, UL)
 >>   #else
 >>   #define SATP_PPN    _AC(0x00000FFFFFFFFFFF, UL)
 >>   #define SATP_MODE_39    _AC(0x8000000000000000, UL)
 >> -#define SATP_MODE    SATP_MODE_39
 >> +#define SATP_MODE_48    _AC(0x9000000000000000, UL)
 >>   #define SATP_ASID_BITS    16
 >>   #define SATP_ASID_SHIFT    44
 >>   #define SATP_ASID_MASK    _AC(0xFFFF, UL)
 >> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
 >> index 54cbf07fb4e9..58a718573ad6 100644
 >> --- a/arch/riscv/include/asm/fixmap.h
 >> +++ b/arch/riscv/include/asm/fixmap.h
 >> @@ -24,6 +24,7 @@ enum fixed_addresses {
 >>       FIX_HOLE,
 >>       FIX_PTE,
 >>       FIX_PMD,
 >> +    FIX_PUD,
 >>       FIX_TEXT_POKE1,
 >>       FIX_TEXT_POKE0,
 >>       FIX_EARLYCON_MEM_BASE,
 >> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
 >> index 743e6ff57996..0b85e363e778 100644
 >> --- a/arch/riscv/include/asm/kasan.h
 >> +++ b/arch/riscv/include/asm/kasan.h
 >> @@ -28,7 +28,11 @@
 >>   #define KASAN_SHADOW_SCALE_SHIFT    3
 >>     #define KASAN_SHADOW_SIZE    (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
 >> -#define KASAN_SHADOW_START    (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
 >> +/*
 >> + * Depending on the size of the virtual address space, the region may not be
 >> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
 >> + */
 >> +#define KASAN_SHADOW_START    ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
 >>   #define KASAN_SHADOW_END    MODULES_LOWEST_VADDR
 >>   #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
 >>   diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
 >> index e03559f9b35e..d089fe46f7d8 100644
 >> --- a/arch/riscv/include/asm/page.h
 >> +++ b/arch/riscv/include/asm/page.h
 >> @@ -31,7 +31,20 @@
 >>    * When not using MMU this corresponds to the first free page in
 >>    * physical memory (aligned on a page boundary).
 >>    */
 >> +#ifdef CONFIG_64BIT
 >> +#ifdef CONFIG_MMU
 >> +#define PAGE_OFFSET        kernel_map.page_offset
 >> +#else
 >> +#define PAGE_OFFSET        _AC(CONFIG_PAGE_OFFSET, UL)
 >> +#endif
 >> +/*
 >> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
 >> + * define the PAGE_OFFSET value for SV39.
 >> + */
 >> +#define PAGE_OFFSET_L3        _AC(0xffffffd800000000, UL)
 >> +#else
 >>   #define PAGE_OFFSET        _AC(CONFIG_PAGE_OFFSET, UL)
 >> +#endif /* CONFIG_64BIT */
 >>     /*
 >>    * Half of the kernel address space (half of the entries of the page global
 >> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
 >>   #endif /* CONFIG_MMU */
 >>     struct kernel_mapping {
 >> +    unsigned long page_offset;
 >>       unsigned long virt_addr;
 >>       uintptr_t phys_addr;
 >>       uintptr_t size;
 >> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
 >> index 0af6933a7100..11823004b87a 100644
 >> --- a/arch/riscv/include/asm/pgalloc.h
 >> +++ b/arch/riscv/include/asm/pgalloc.h
 >> @@ -11,6 +11,8 @@
 >>   #include <asm/tlb.h>
 >>     #ifdef CONFIG_MMU
 >> +#define __HAVE_ARCH_PUD_ALLOC_ONE
 >> +#define __HAVE_ARCH_PUD_FREE
 >>   #include <asm-generic/pgalloc.h>
 >>     static inline void pmd_populate_kernel(struct mm_struct *mm,
 >> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 >>         set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 >>   }
 >> +
 >> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 >> +{
 >> +    if (pgtable_l4_enabled) {
 >> +        unsigned long pfn = virt_to_pfn(pud);
 >> +
 >> +        set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 >> +    }
 >> +}
 >> +
 >> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
 >> +                     pud_t *pud)
 >> +{
 >> +    if (pgtable_l4_enabled) {
 >> +        unsigned long pfn = virt_to_pfn(pud);
 >> +
 >> +        set_p4d_safe(p4d,
 >> +                 __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 >> +    }
 >> +}
 >> +
 >> +#define pud_alloc_one pud_alloc_one
 >> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return __pud_alloc_one(mm, addr);
 >> +
 >> +    return NULL;
 >> +}
 >> +
 >> +#define pud_free pud_free
 >> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        __pud_free(mm, pud);
 >> +}
 >> +
 >> +#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
 >>   #endif /* __PAGETABLE_PMD_FOLDED */
 >>     static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 >> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
 >> index 228261aa9628..bbbdd66e5e2f 100644
 >> --- a/arch/riscv/include/asm/pgtable-64.h
 >> +++ b/arch/riscv/include/asm/pgtable-64.h
 >> @@ -8,16 +8,36 @@
 >>     #include <linux/const.h>
 >>   -#define PGDIR_SHIFT     30
 >> +extern bool pgtable_l4_enabled;
 >> +
 >> +#define PGDIR_SHIFT_L3  30
 >> +#define PGDIR_SHIFT_L4  39
 >> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
 >> +
 >> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
 >>   /* Size of region mapped by a page global directory */
 >>   #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
 >>   #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
 >>   +/* pud is folded into pgd in case of 3-level page table */
 >> +#define PUD_SHIFT      30
 >> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
 >> +#define PUD_MASK       (~(PUD_SIZE - 1))
 >> +
 >>   #define PMD_SHIFT       21
 >>   /* Size of region mapped by a page middle directory */
 >>   #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
 >>   #define PMD_MASK        (~(PMD_SIZE - 1))
 >>   +/* Page Upper Directory entry */
 >> +typedef struct {
 >> +    unsigned long pud;
 >> +} pud_t;
 >> +
 >> +#define pud_val(x)      ((x).pud)
 >> +#define __pud(x)        ((pud_t) { (x) })
 >> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
 >> +
 >>   /* Page Middle Directory entry */
 >>   typedef struct {
 >>       unsigned long pmd;
 >> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
 >>       set_pud(pudp, __pud(0));
 >>   }
 >>   +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
 >> +{
 >> +    return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
 >> +}
 >> +
 >> +static inline unsigned long _pud_pfn(pud_t pud)
 >> +{
 >> +    return pud_val(pud) >> _PAGE_PFN_SHIFT;
 >> +}
 >> +
 >>   static inline pmd_t *pud_pgtable(pud_t pud)
 >>   {
 >>       return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
 >> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
 >>       return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
 >>   }
 >>   +#define mm_pud_folded  mm_pud_folded
 >> +static inline bool mm_pud_folded(struct mm_struct *mm)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return false;
 >> +
 >> +    return true;
 >> +}
 >> +
 >> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
 >> +
 >>   static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
 >>   {
 >>       return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
 >> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
 >>   #define pmd_ERROR(e) \
 >>       pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
 >>   +#define pud_ERROR(e)   \
 >> +    pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
 >> +
 >> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        *p4dp = p4d;
 >> +    else
 >> +        set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
 >> +}
 >> +
 >> +static inline int p4d_none(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return (p4d_val(p4d) == 0);
 >> +
 >> +    return 0;
 >> +}
 >> +
 >> +static inline int p4d_present(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return (p4d_val(p4d) & _PAGE_PRESENT);
 >> +
 >> +    return 1;
 >> +}
 >> +
 >> +static inline int p4d_bad(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return !p4d_present(p4d);
 >> +
 >> +    return 0;
 >> +}
 >> +
 >> +static inline void p4d_clear(p4d_t *p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        set_p4d(p4d, __p4d(0));
 >> +}
 >> +
 >> +static inline pud_t *p4d_pgtable(p4d_t p4d)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
 >> +
 >> +    return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
 >> +}
 >> +
 >> +static inline struct page *p4d_page(p4d_t p4d)
 >> +{
 >> +    return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
 >> +}
 >> +
 >> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
 >> +
 >> +#define pud_offset pud_offset
 >> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
 >> +{
 >> +    if (pgtable_l4_enabled)
 >> +        return p4d_pgtable(*p4d) + pud_index(address);
 >> +
 >> +    return (pud_t *)p4d;
 >> +}
 >> +
 >>   #endif /* _ASM_RISCV_PGTABLE_64_H */
 >> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
 >> index e1a52e22ad7e..e1c74ef4ead2 100644
 >> --- a/arch/riscv/include/asm/pgtable.h
 >> +++ b/arch/riscv/include/asm/pgtable.h
 >> @@ -51,7 +51,7 @@
 >>    * position vmemmap directly below the VMALLOC region.
 >>    */
 >>   #ifdef CONFIG_64BIT
 >> -#define VA_BITS        39
 >> +#define VA_BITS        (pgtable_l4_enabled ? 48 : 39)
 >>   #else
 >>   #define VA_BITS        32
 >>   #endif
 >> @@ -90,8 +90,7 @@
 >>     #ifndef __ASSEMBLY__
 >>   -/* Page Upper Directory not used in RISC-V */
 >> -#include <asm-generic/pgtable-nopud.h>
 >> +#include <asm-generic/pgtable-nop4d.h>
 >>   #include <asm/page.h>
 >>   #include <asm/tlbflush.h>
 >>   #include <linux/mm_types.h>
 >> @@ -113,6 +112,17 @@
 >>   #define XIP_FIXUP(addr)        (addr)
 >>   #endif /* CONFIG_XIP_KERNEL */
 >>   +struct pt_alloc_ops {
 >> +    pte_t *(*get_pte_virt)(phys_addr_t pa);
 >> +    phys_addr_t (*alloc_pte)(uintptr_t va);
 >> +#ifndef __PAGETABLE_PMD_FOLDED
 >> +    pmd_t *(*get_pmd_virt)(phys_addr_t pa);
 >> +    phys_addr_t (*alloc_pmd)(uintptr_t va);
 >> +    pud_t *(*get_pud_virt)(phys_addr_t pa);
 >> +    phys_addr_t (*alloc_pud)(uintptr_t va);
 >> +#endif
 >> +};
 >> +
 >>   #ifdef CONFIG_MMU
 >>   /* Number of entries in the page global directory */
 >>   #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
 >> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 >>    * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
 >>    */
 >>   #ifdef CONFIG_64BIT
 >> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
 >> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
 >> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
 >>   #else
 >> -#define TASK_SIZE FIXADDR_START
 >> +#define TASK_SIZE    FIXADDR_START
 >> +#define TASK_SIZE_MIN    TASK_SIZE
 >>   #endif
 >>     #else /* CONFIG_MMU */
 >> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
 >>   #define dtb_early_va    _dtb_early_va
 >>   #define dtb_early_pa    _dtb_early_pa
 >>   #endif /* CONFIG_XIP_KERNEL */
 >> +extern u64 satp_mode;
 >> +extern bool pgtable_l4_enabled;
 >>     void paging_init(void);
 >>   void misc_mem_init(void);
 >> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
 >> index 52c5ff9804c5..c3c0ed559770 100644
 >> --- a/arch/riscv/kernel/head.S
 >> +++ b/arch/riscv/kernel/head.S
 >> @@ -95,7 +95,8 @@ relocate:
 >>         /* Compute satp for kernel page tables, but don't load it yet */
 >>       srl a2, a0, PAGE_SHIFT
 >> -    li a1, SATP_MODE
 >> +    la a1, satp_mode
 >> +    REG_L a1, 0(a1)
 >>       or a2, a2, a1
 >>         /*
 >> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
 >> index ee3459cb6750..a7246872bd30 100644
 >> --- a/arch/riscv/mm/context.c
 >> +++ b/arch/riscv/mm/context.c
 >> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 >>   switch_mm_fast:
 >>       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
 >>             ((cntx & asid_mask) << SATP_ASID_SHIFT) |
 >> -          SATP_MODE);
 >> +          satp_mode);
 >>         if (need_flush_tlb)
 >>           local_flush_tlb_all();
 >> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 >>   static void set_mm_noasid(struct mm_struct *mm)
 >>   {
 >>       /* Switch the page table and blindly nuke entire local TLB */
 >> -    csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
 >> +    csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
 >>       local_flush_tlb_all();
 >>   }
 >>   diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
 >> index 1552226fb6bd..6a19a1b1caf8 100644
 >> --- a/arch/riscv/mm/init.c
 >> +++ b/arch/riscv/mm/init.c
 >> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
 >>   #define kernel_map    (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
 >>   #endif
 >>   +#ifdef CONFIG_64BIT
 >> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
 >> +#else
 >> +u64 satp_mode = SATP_MODE_32;
 >> +#endif
 >> +EXPORT_SYMBOL(satp_mode);
 >> +
 >> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
 >> +                true : false;
 >> +EXPORT_SYMBOL(pgtable_l4_enabled);
 >> +
 >>   phys_addr_t phys_ram_base __ro_after_init;
 >>   EXPORT_SYMBOL(phys_ram_base);
 >>   @@ -53,15 +64,6 @@ extern char _start[];
 >>   void *_dtb_early_va __initdata;
 >>   uintptr_t _dtb_early_pa __initdata;
 >>   -struct pt_alloc_ops {
 >> -    pte_t *(*get_pte_virt)(phys_addr_t pa);
 >> -    phys_addr_t (*alloc_pte)(uintptr_t va);
 >> -#ifndef __PAGETABLE_PMD_FOLDED
 >> -    pmd_t *(*get_pmd_virt)(phys_addr_t pa);
 >> -    phys_addr_t (*alloc_pmd)(uintptr_t va);
 >> -#endif
 >> -};
 >> -
 >>   static phys_addr_t dma32_phys_limit __initdata;
 >>     static void __init zone_sizes_init(void)
 >> @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
 >>   }
 >>     #ifdef CONFIG_MMU
 >> -static struct pt_alloc_ops _pt_ops __initdata;
 >> +struct pt_alloc_ops _pt_ops __initdata;
 >>     #ifdef CONFIG_XIP_KERNEL
 >>   #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
 >> @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 >>   static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
 >>     pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 >> +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 >>   static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 >>     #ifdef CONFIG_XIP_KERNEL
 >> @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 >>   #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
 >>   #endif /* CONFIG_XIP_KERNEL */
 >>   +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
 >> +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
 >> +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 >> +
 >> +#ifdef CONFIG_XIP_KERNEL
 >> +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
 >> +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
 >> +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
 >> +#endif /* CONFIG_XIP_KERNEL */
 >> +
 >>   static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
 >>   {
 >>       /* Before MMU is enabled */
 >> @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
 >>     static phys_addr_t __init alloc_pmd_early(uintptr_t va)
 >>   {
 >> -    BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
 >> +    BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
 >>         return (uintptr_t)early_pmd;
 >>   }
 >> @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
 >>       create_pte_mapping(ptep, va, pa, sz, prot);
 >>   }
 >>   -#define pgd_next_t        pmd_t
 >> -#define alloc_pgd_next(__va)    pt_ops.alloc_pmd(__va)
 >> -#define get_pgd_next_virt(__pa) pt_ops.get_pmd_virt(__pa)
 >> +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
 >> +{
 >> +    return (pud_t *)((uintptr_t)pa);
 >> +}
 >> +
 >> +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
 >> +{
 >> +    clear_fixmap(FIX_PUD);
 >> +    return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
 >> +}
 >> +
 >> +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
 >> +{
 >> +    return (pud_t *)__va(pa);
 >> +}
 >> +
 >> +static phys_addr_t __init alloc_pud_early(uintptr_t va)
 >> +{
 >> +    /* Only one PUD is available for early mapping */
 >> +    BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
 >> +
 >> +    return (uintptr_t)early_pud;
 >> +}
 >> +
 >> +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
 >> +{
 >> +    return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
 >> +}
 >> +
 >> +static phys_addr_t alloc_pud_late(uintptr_t va)
 >> +{
 >> +    unsigned long vaddr;
 >> +
 >> +    vaddr = __get_free_page(GFP_KERNEL);
 >> +    BUG_ON(!vaddr);
 >> +    return __pa(vaddr);
 >> +}
 >> +
 >> +static void __init create_pud_mapping(pud_t *pudp,
 >> +                      uintptr_t va, phys_addr_t pa,
 >> +                      phys_addr_t sz, pgprot_t prot)
 >> +{
 >> +    pmd_t *nextp;
 >> +    phys_addr_t next_phys;
 >> +    uintptr_t pud_index = pud_index(va);
 >> +
 >> +    if (sz == PUD_SIZE) {
 >> +        if (pud_val(pudp[pud_index]) == 0)
 >> +            pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
 >> +        return;
 >> +    }
 >> +
 >> +    if (pud_val(pudp[pud_index]) == 0) {
 >> +        next_phys = pt_ops.alloc_pmd(va);
 >> +        pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
 >> +        nextp = pt_ops.get_pmd_virt(next_phys);
 >> +        memset(nextp, 0, PAGE_SIZE);
 >> +    } else {
 >> +        next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
 >> +        nextp = pt_ops.get_pmd_virt(next_phys);
 >> +    }
 >> +
 >> +    create_pmd_mapping(nextp, va, pa, sz, prot);
 >> +}
 >> +
 >> +#define pgd_next_t        pud_t
 >> +#define alloc_pgd_next(__va)    (pgtable_l4_enabled ?            \
 >> +        pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
 >> +#define get_pgd_next_virt(__pa)    (pgtable_l4_enabled ?            \
 >> +        pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
 >>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)    \
 >> -    create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
 >> -#define fixmap_pgd_next        fixmap_pmd
 >> +                (pgtable_l4_enabled ?            \
 >> +        create_pud_mapping(__nextp, __va, __pa, __sz, __prot) :    \
 >> +        create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
 >> +#define fixmap_pgd_next        (pgtable_l4_enabled ?            \
 >> +        (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
 >> +#define trampoline_pgd_next    (pgtable_l4_enabled ?            \
 >> +        (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
 >> +#define early_dtb_pgd_next    (pgtable_l4_enabled ?            \
 >> +        (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
 >>   #else
 >>   #define pgd_next_t        pte_t
 >>   #define alloc_pgd_next(__va)    pt_ops.alloc_pte(__va)
 >>   #define get_pgd_next_virt(__pa) pt_ops.get_pte_virt(__pa)
 >>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)    \
 >>       create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
 >> -#define fixmap_pgd_next        fixmap_pte
 >> +#define fixmap_pgd_next        ((uintptr_t)fixmap_pte)
 >> +#define early_dtb_pgd_next    ((uintptr_t)early_dtb_pmd)
 >> +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
 >>   #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
 >> -#endif
 >> +#endif /* __PAGETABLE_PMD_FOLDED */
 >>     void __init create_pgd_mapping(pgd_t *pgdp,
 >>                         uintptr_t va, phys_addr_t pa,
 >> @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
 >>   }
 >>   #endif /* CONFIG_STRICT_KERNEL_RWX */
 >>   +#ifdef CONFIG_64BIT
 >> +static void __init disable_pgtable_l4(void)
 >> +{
 >> +    pgtable_l4_enabled = false;
 >> +    kernel_map.page_offset = PAGE_OFFSET_L3;
 >> +    satp_mode = SATP_MODE_39;
 >> +}
 >> +
 >> +/*
 >> + * There is a simple way to determine if 4-level is supported by the
 >> + * underlying hardware: establish 1:1 mapping in 4-level page table mode
 >> + * then read SATP to see if the configuration was taken into account
 >> + * meaning sv48 is supported.
 >> + */
 >> +static __init void set_satp_mode(void)
 >> +{
 >> +    u64 identity_satp, hw_satp;
 >> +    uintptr_t set_satp_mode_pmd;
 >> +
 >> +    set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
 >> +    create_pgd_mapping(early_pg_dir,
 >> +               set_satp_mode_pmd, (uintptr_t)early_pud,
 >> +               PGDIR_SIZE, PAGE_TABLE);
 >> +    create_pud_mapping(early_pud,
 >> +               set_satp_mode_pmd, (uintptr_t)early_pmd,
 >> +               PUD_SIZE, PAGE_TABLE);
 >> +    /* Handle the case where set_satp_mode straddles 2 PMDs */
 >> +    create_pmd_mapping(early_pmd,
 >> +               set_satp_mode_pmd, set_satp_mode_pmd,
 >> +               PMD_SIZE, PAGE_KERNEL_EXEC);
 >> +    create_pmd_mapping(early_pmd,
 >> +               set_satp_mode_pmd + PMD_SIZE,
 >> +               set_satp_mode_pmd + PMD_SIZE,
 >> +               PMD_SIZE, PAGE_KERNEL_EXEC);
 >> +
 >> +    identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
 >> +
 >> +    local_flush_tlb_all();
 >> +    csr_write(CSR_SATP, identity_satp);
 >> +    hw_satp = csr_swap(CSR_SATP, 0ULL);
 >> +    local_flush_tlb_all();
 >> +
 >> +    if (hw_satp != identity_satp)
 >> +        disable_pgtable_l4();
 >> +
 >> +    memset(early_pg_dir, 0, PAGE_SIZE);
 >> +    memset(early_pud, 0, PAGE_SIZE);
 >> +    memset(early_pmd, 0, PAGE_SIZE);
 >> +}
 >> +#endif
 >> +
 >>   /*
 >>    * setup_vm() is called from head.S with MMU-off.
 >>    *
 >> @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
 >>       uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
 >>         create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
 >> -               IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
 >> +               IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
 >>                  PGDIR_SIZE,
 >>                  IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
 >>   +    if (pgtable_l4_enabled) {
 >> +        create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
 >> +                   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
 >> +    }
 >> +
 >>       if (IS_ENABLED(CONFIG_64BIT)) {
 >>           create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
 >>                      pa, PMD_SIZE, PAGE_KERNEL);
 >> @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
 >>   #ifndef __PAGETABLE_PMD_FOLDED
 >>       pt_ops.alloc_pmd = alloc_pmd_early;
 >>       pt_ops.get_pmd_virt = get_pmd_virt_early;
 >> +    pt_ops.alloc_pud = alloc_pud_early;
 >> +    pt_ops.get_pud_virt = get_pud_virt_early;
 >>   #endif
 >>   }
 >>   @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
 >>   #ifndef __PAGETABLE_PMD_FOLDED
 >>       pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
 >>       pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
 >> +    pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
 >> +    pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
 >>   #endif
 >>   }
 >>   @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
 >>   #ifndef __PAGETABLE_PMD_FOLDED
 >>       pt_ops.alloc_pmd = alloc_pmd_late;
 >>       pt_ops.get_pmd_virt = get_pmd_virt_late;
 >> +    pt_ops.alloc_pud = alloc_pud_late;
 >> +    pt_ops.get_pud_virt = get_pud_virt_late;
 >>   #endif
 >>   }
 >>   @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>       pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
 >>         kernel_map.virt_addr = KERNEL_LINK_ADDR;
 >> +    kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
 >>     #ifdef CONFIG_XIP_KERNEL
 >>       kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
 >> @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>       kernel_map.phys_addr = (uintptr_t)(&_start);
 >>       kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
 >>   #endif
 >> +
 >> +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
 >> +    set_satp_mode();
 >> +#endif
 >> +
 >>       kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
 >>       kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
 >>   @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>         /* Setup early PGD for fixmap */
 >>       create_pgd_mapping(early_pg_dir, FIXADDR_START,
 >> -               (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 >> +               fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 >>     #ifndef __PAGETABLE_PMD_FOLDED
 >> -    /* Setup fixmap PMD */
 >> +    /* Setup fixmap PUD and PMD */
 >> +    if (pgtable_l4_enabled)
 >> +        create_pud_mapping(fixmap_pud, FIXADDR_START,
 >> +                   (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
 >>       create_pmd_mapping(fixmap_pmd, FIXADDR_START,
 >>                  (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
 >>       /* Setup trampoline PGD and PMD */
 >>       create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
 >> -               (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
 >> +               trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
 >> +    if (pgtable_l4_enabled)
 >> +        create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
 >> +                   (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
 >>   #ifdef CONFIG_XIP_KERNEL
 >>       create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
 >>                  kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
 >> @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 >>        * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
 >>        * range can not span multiple pmds.
 >>        */
 >> -    BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 >> +    BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 >>                != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
 >>     #ifndef __PAGETABLE_PMD_FOLDED
 >> @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
 >>       /* Clear fixmap PTE and PMD mappings */
 >>       clear_fixmap(FIX_PTE);
 >>       clear_fixmap(FIX_PMD);
 >> +    clear_fixmap(FIX_PUD);
 >>         /* Move to swapper page table */
 >> -    csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
 >> +    csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
 >>       local_flush_tlb_all();
 >>         pt_ops_set_late();
 >> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
 >> index 1434a0225140..993f50571a3b 100644
 >> --- a/arch/riscv/mm/kasan_init.c
 >> +++ b/arch/riscv/mm/kasan_init.c
 >> @@ -11,7 +11,29 @@
 >>   #include <asm/fixmap.h>
 >>   #include <asm/pgalloc.h>
 >>   +/*
 >> + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
 >> + * which is right before the kernel.
 >> + *
 >> + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
 >> + * the page global directory with kasan_early_shadow_pmd.
 >> + *
 >> + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
 >> + * must be divided as follows:
 >> + * - the first PGD entry, although incomplete, is populated with
 >> + *   kasan_early_shadow_pud/p4d
 >> + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
 >> + * - the last PGD entry is shared with the kernel mapping so populated at the
 >> + *   lower levels pud/p4d
 >> + *
 >> + * In addition, when shallow populating a kasan region (for example vmalloc),
 >> + * this region may also not be aligned on PGDIR size, so we must go down to the
 >> + * pud level too.
 >> + */
 >> +
 >>   extern pgd_t early_pg_dir[PTRS_PER_PGD];
 >> +extern struct pt_alloc_ops _pt_ops __initdata;
 >> +#define pt_ops    _pt_ops
 >>     static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
 >>   {
 >> @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
 >>       set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
 >>   }
 >>   -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
 >> +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
 >>   {
 >>       phys_addr_t phys_addr;
 >>       pmd_t *pmdp, *base_pmd;
 >>       unsigned long next;
 >>   -    base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
 >> -    if (base_pmd == lm_alias(kasan_early_shadow_pmd))
 >> +    if (pud_none(*pud)) {
 >>           base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
 >> +    } else {
 >> +        base_pmd = (pmd_t *)pud_pgtable(*pud);
 >> +        if (base_pmd == lm_alias(kasan_early_shadow_pmd))
 >> +            base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
 >> +    }
 >>         pmdp = base_pmd + pmd_index(vaddr);
 >>   @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
 >>        * it entirely, memblock could allocate a page at a physical address
 >>        * where KASAN is not populated yet and then we'd get a page fault.
 >>        */
 >> -    set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 >> +    set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 >> +}
 >> +
 >> +static void __init kasan_populate_pud(pgd_t *pgd,
 >> +                      unsigned long vaddr, unsigned long end,
 >> +                      bool early)
 >> +{
 >> +    phys_addr_t phys_addr;
 >> +    pud_t *pudp, *base_pud;
 >> +    unsigned long next;
 >> +
 >> +    if (early) {
 >> +        /*
 >> +         * We can't use pgd_page_vaddr here as it would return a linear
 >> +         * mapping address but it is not mapped yet, but when populating
 >> +         * early_pg_dir, we need the physical address and when populating
 >> +         * swapper_pg_dir, we need the kernel virtual address so use
 >> +         * pt_ops facility.
 >> +         */
 >> +        base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
 >> +    } else {
 >> +        base_pud = (pud_t *)pgd_page_vaddr(*pgd);
 >> +        if (base_pud == lm_alias(kasan_early_shadow_pud))
 >> +            base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
 >> +    }
 >> +
 >> +    pudp = base_pud + pud_index(vaddr);
 >> +
 >> +    do {
 >> +        next = pud_addr_end(vaddr, end);
 >> +
 >> +        if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
 >> +            if (early) {
 >> +                phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
 >> +                set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
 >> +                continue;
 >> +            } else {
 >> +                phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
 >> +                if (phys_addr) {
 >> +                    set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
 >> +                    continue;
 >> +                }
 >> +            }
 >> +                    continue;
 >> +                }
 >> +            }
 >> +        }
 >> +
 >> +        kasan_populate_pmd(pudp, vaddr, next);
 >> +    } while (pudp++, vaddr = next, vaddr != end);
 >> +
 >> +    /*
 >> +     * Wait for the whole PGD to be populated before setting the PGD in
 >> +     * the page table, otherwise, if we did set the PGD before populating
 >> +     * it entirely, memblock could allocate a page at a physical address
 >> +     * where KASAN is not populated yet and then we'd get a page fault.
 >> +     */
 >> +    if (!early)
 >> +        set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
 >>   }
 >>   +#define kasan_early_shadow_pgd_next (pgtable_l4_enabled ?    \
 >> +                (uintptr_t)kasan_early_shadow_pud : \
 >> +                (uintptr_t)kasan_early_shadow_pmd)
 >> +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)            \
 >> +        (pgtable_l4_enabled ?                        \
 >> +            kasan_populate_pud(pgdp, vaddr, next, early) :        \
 >> +            kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
 >> +
 >>   static void __init kasan_populate_pgd(pgd_t *pgdp,
 >>                         unsigned long vaddr, unsigned long end,
 >>                         bool early)
 >> @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
 >>               }
 >>           }
 >>   -        kasan_populate_pmd(pgdp, vaddr, next);
 >> +        kasan_populate_pgd_next(pgdp, vaddr, next, early);
 >>       } while (pgdp++, vaddr = next, vaddr != end);
 >>   }
 >>   @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
 >>       memset(start, KASAN_SHADOW_INIT, end - start);
 >>   }
 >>   +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
 >> +                          unsigned long vaddr, unsigned long end,
 >> +                          bool kasan_populate)
 >> +{
 >> +    unsigned long next;
 >> +    pud_t *pudp, *base_pud;
 >> +    pmd_t *base_pmd;
 >> +    bool is_kasan_pmd;
 >> +
 >> +    base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
 >> +    pudp = base_pud + pud_index(vaddr);
 >> +
 >> +    if (kasan_populate)
 >> +        memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
 >> +               sizeof(pud_t) * PTRS_PER_PUD);
 >> +
 >> +    do {
 >> +        next = pud_addr_end(vaddr, end);
 >> +        is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
 >> +
 >> +        if (is_kasan_pmd) {
 >> +            base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 >> +            set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
 >> +        }
 >> +    } while (pudp++, vaddr = next, vaddr != end);
 >> +}
 >> +
 >>   static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
 >>   {
 >>       unsigned long next;
 >>       void *p;
 >>       pgd_t *pgd_k = pgd_offset_k(vaddr);
 >> +    bool is_kasan_pgd_next;
 >>         do {
 >>           next = pgd_addr_end(vaddr, end);
 >> -        if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
 >> +        is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
 >> +                     (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
 >> +
 >> +        if (is_kasan_pgd_next) {
 >>               p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 >>               set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
 >>           }
 >> +
 >> +        if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
 >> +            continue;
 >> +
 >> +        kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
 >>       } while (pgd_k++, vaddr = next, vaddr != end);
 >>   }
 >
 >
 > @Qinglin: I can deal with sv57 kasan population if needs be as it is a bit tricky and I think it would save you quite some time :)

Thanks so much for your suggestion! I want to give it a try first, as I am 
now preparing the new Sv57 patchset :) I will ask for your help if I run 
into any trouble. Thanks again!

Yours,
Qinglin

 >
 >
 >>   diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
 >> index 26e69788f27a..b3db5d91ed38 100644
 >> --- a/drivers/firmware/efi/libstub/efi-stub.c
 >> +++ b/drivers/firmware/efi/libstub/efi-stub.c
 >> @@ -40,6 +40,8 @@
 >>     #ifdef CONFIG_ARM64
 >>   # define EFI_RT_VIRTUAL_LIMIT    DEFAULT_MAP_WINDOW_64
 >> +#elif defined(CONFIG_RISCV)
 >> +# define EFI_RT_VIRTUAL_LIMIT    TASK_SIZE_MIN
 >>   #else
 >>   # define EFI_RT_VIRTUAL_LIMIT    TASK_SIZE
 >>   #endif


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 10/13] riscv: Improve virtual kernel memory layout dump
  2021-12-09  4:18   ` 潘庆霖
@ 2021-12-09  9:09     ` Alexandre ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre ghiti @ 2021-12-09  9:09 UTC (permalink / raw)
  To: 潘庆霖, Alexandre Ghiti, linux-riscv

On 12/9/21 05:18, 潘庆霖 wrote:
> Hi Alex,
>
>
> On 2021/12/6 18:46, Alexandre Ghiti wrote:
> > With the arrival of sv48 and its large address space, it would be
> > cumbersome to statically define the unit size to use to print the different
> > portions of the virtual memory layout: instead, determine it dynamically.
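(Editorial aside: the unit-selection rule described above — print in the largest unit for which the range spans at least 10 units — can be modeled on the host. This is an illustrative sketch, not the kernel code; `pick_unit` is a made-up name and the shift counts mirror LOG2_SZ_1K/1M/1G/1T.)

```c
#include <assert.h>
#include <string.h>

/*
 * Host-side sketch of the print_ml() selection rule: pick the largest
 * unit such that the range spans at least 10 of that unit, falling
 * back to kB for small ranges.
 */
static const char *pick_unit(unsigned long long bytes)
{
	if ((bytes >> 40) >= 10)	/* at least 10 TB */
		return "TB";
	if ((bytes >> 30) >= 10)	/* at least 10 GB */
		return "GB";
	if ((bytes >> 20) >= 10)	/* at least 10 MB */
		return "MB";
	return "kB";
}
```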
> >
> > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> > ---
> >  arch/riscv/mm/init.c               | 67 +++++++++++++++++++++++-------
> >  drivers/pci/controller/pci-xgene.c |  2 +-
> >  include/linux/sizes.h              |  1 +
> >  3 files changed, 54 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 6a19a1b1caf8..28de6ea0a720 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -79,37 +79,74 @@ static void __init zone_sizes_init(void)
> >  }
> >
> >  #if defined(CONFIG_MMU) && defined(CONFIG_DEBUG_VM)
> > +
> > +#define LOG2_SZ_1K  ilog2(SZ_1K)
> > +#define LOG2_SZ_1M  ilog2(SZ_1M)
> > +#define LOG2_SZ_1G  ilog2(SZ_1G)
> > +#define LOG2_SZ_1T  ilog2(SZ_1T)
> > +
> >  static inline void print_mlk(char *name, unsigned long b, unsigned long t)
> >  {
> >      pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld kB)\n", name, b, t,
> > -          (((t) - (b)) >> 10));
> > +          (((t) - (b)) >> LOG2_SZ_1K));
> >  }
> >
> >  static inline void print_mlm(char *name, unsigned long b, unsigned long t)
> >  {
> >      pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld MB)\n", name, b, t,
> > -          (((t) - (b)) >> 20));
> > +          (((t) - (b)) >> LOG2_SZ_1M));
> > +}
> > +
> > +static inline void print_mlg(char *name, unsigned long b, unsigned long t)
> > +{
> > +    pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld GB)\n", name, b, t,
> > +          (((t) - (b)) >> LOG2_SZ_1G));
> > +}
> > +
> > +#ifdef CONFIG_64BIT
> > +static inline void print_mlt(char *name, unsigned long b, unsigned long t)
> > +{
> > +    pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld TB)\n", name, b, t,
> > +          (((t) - (b)) >> LOG2_SZ_1T));
> > +}
> > +#endif
> > +
> > +static inline void print_ml(char *name, unsigned long b, unsigned long t)
> > +{
> > +    unsigned long diff = t - b;
> > +
> > +#ifdef CONFIG_64BIT
> > +    if ((diff >> LOG2_SZ_1T) >= 10)
> > +        print_mlt(name, b, t);
> > +    else
> > +#endif
> > +    if ((diff >> LOG2_SZ_1G) >= 10)
> > +        print_mlg(name, b, t);
> > +    else if ((diff >> LOG2_SZ_1M) >= 10)
> > +        print_mlm(name, b, t);
> > +    else
> > +        print_mlk(name, b, t);
> >  }
> >
> >  static void __init print_vm_layout(void)
> >  {
> >      pr_notice("Virtual kernel memory layout:\n");
> > -    print_mlk("fixmap", (unsigned long)FIXADDR_START,
> > -          (unsigned long)FIXADDR_TOP);
> > -    print_mlm("pci io", (unsigned long)PCI_IO_START,
> > -          (unsigned long)PCI_IO_END);
> > -    print_mlm("vmemmap", (unsigned long)VMEMMAP_START,
> > -          (unsigned long)VMEMMAP_END);
> > -    print_mlm("vmalloc", (unsigned long)VMALLOC_START,
> > -          (unsigned long)VMALLOC_END);
> > -    print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
> > -          (unsigned long)high_memory);
> > +    print_ml("fixmap", (unsigned long)FIXADDR_START,
> > +         (unsigned long)FIXADDR_TOP);
> > +    print_ml("pci io", (unsigned long)PCI_IO_START,
> > +         (unsigned long)PCI_IO_END);
> > +    print_ml("vmemmap", (unsigned long)VMEMMAP_START,
> > +         (unsigned long)VMEMMAP_END);
> > +    print_ml("vmalloc", (unsigned long)VMALLOC_START,
> > +         (unsigned long)VMALLOC_END);
> > +    print_ml("lowmem", (unsigned long)PAGE_OFFSET,
> > +         (unsigned long)high_memory);
> >  #ifdef CONFIG_64BIT
> >  #ifdef CONFIG_KASAN
> > -    print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
> > +    print_ml("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
> >  #endif
> > -    print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
> > -          (unsigned long)ADDRESS_SPACE_END);
> > +    print_ml("kernel", (unsigned long)KERNEL_LINK_ADDR,
> > +         (unsigned long)ADDRESS_SPACE_END);
> >  #endif
> >  }
> >  #else
> > diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
> > index e64536047b65..187dcf8a9694 100644
> > --- a/drivers/pci/controller/pci-xgene.c
> > +++ b/drivers/pci/controller/pci-xgene.c
> > @@ -21,6 +21,7 @@
> >  #include <linux/pci-ecam.h>
> >  #include <linux/platform_device.h>
> >  #include <linux/slab.h>
> > +#include <linux/sizes.h>
> >
> >  #include "../pci.h"
> >
> > @@ -50,7 +51,6 @@
> >  #define OB_LO_IO            0x00000002
> >  #define XGENE_PCIE_VENDORID        0x10E8
> >  #define XGENE_PCIE_DEVICEID        0xE004
> > -#define SZ_1T                (SZ_1G*1024ULL)
>
> I am trying to apply your patchset on upstream's master or for-next 
> branch. The git repo is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
>
> and I get a failure. The commit I applied them on is
> fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
>
> I found the code here on that commit is:
>
> #define OB_LO_IO            0x00000002
> #define XGENE_PCIE_DEVICEID        0xE004
> #define SZ_1T                    (SZ_1G*1024ULL)
> #define PIPE_PHY_RATE_RD(src)        ((0xc000 & (u32)(src)) >> 0xe)
>
> I think this may be why the patches fail to apply. Could you help me
> determine the reason?


I will rebase my patchset on top of v5.16-rc4 shortly and will fix that; 
this file changed in the meantime.

Thanks,

Alex


>
> Thanks,
> Qinglin
>
>
> >
> >  #define PIPE_PHY_RATE_RD(src)        ((0xc000 & (u32)(src)) >> 0xe)
> >
> >  #define XGENE_V1_PCI_EXP_CAP        0x40
> > diff --git a/include/linux/sizes.h b/include/linux/sizes.h
> > index 1ac79bcee2bb..0bc6cf394b08 100644
> > --- a/include/linux/sizes.h
> > +++ b/include/linux/sizes.h
> > @@ -47,6 +47,7 @@
> >  #define SZ_8G                _AC(0x200000000, ULL)
> >  #define SZ_16G                _AC(0x400000000, ULL)
> >  #define SZ_32G                _AC(0x800000000, ULL)
> > +#define SZ_1T                _AC(0x10000000000, ULL)
> >  #define SZ_64T                _AC(0x400000000000, ULL)
> >
> >  #endif /* __LINUX_SIZES_H__ */
>
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 12/13] riscv: Initialize thread pointer before calling C functions
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2021-12-20  9:11     ` Guo Ren
  -1 siblings, 0 replies; 70+ messages in thread
From: Guo Ren @ 2021-12-20  9:11 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

On Tue, Dec 7, 2021 at 11:55 AM Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> Because the stack canary feature reads the stack canary value from the
> current task structure, the thread pointer register "tp" must be set
> before calling any C function from head.S: by chance, setup_vm
Shall we disable -fstack-protector for setup_vm() with __attribute__?
Actually, we already initialize tp later.

> and all the functions that it calls do not seem to be part of the
> functions where the canary check is done, but in the following commits,
> some functions will.
>
> Fixes: f2c9699f65557a31 ("riscv: Add STACKPROTECTOR supported")
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>  arch/riscv/kernel/head.S | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index c3c0ed559770..86f7ee3d210d 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -302,6 +302,7 @@ clear_bss_done:
>         REG_S a0, (a2)
>
>         /* Initialize page tables and relocate to virtual addresses */
> +       la tp, init_task
>         la sp, init_thread_union + THREAD_SIZE
>         XIP_FIXUP_OFFSET sp
>  #ifdef CONFIG_BUILTIN_DTB
> --
> 2.32.0
>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 12/13] riscv: Initialize thread pointer before calling C functions
  2021-12-20  9:11     ` Guo Ren
@ 2021-12-20  9:17       ` Ard Biesheuvel
  -1 siblings, 0 replies; 70+ messages in thread
From: Ard Biesheuvel @ 2021-12-20  9:17 UTC (permalink / raw)
  To: Guo Ren
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

On Mon, 20 Dec 2021 at 10:11, Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Dec 7, 2021 at 11:55 AM Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
> >
> > Because of the stack canary feature that reads from the current task
> > structure the stack canary value, the thread pointer register "tp" must
> > be set before calling any C function from head.S: by chance, setup_vm
> Shall we disable -fstack-protector for setup_vm() with __attribute__?

Don't use __attribute__((optimize())) for that: it is known to be
broken, and documented as debug purposes only in the GCC info pages:

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
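
For completeness: if per-function disabling is really wanted, newer compilers (GCC >= 11, recent Clang) expose a dedicated attribute for it. The sketch below is illustrative — the `__no_stack_protector` macro name and the feature-test guard are assumptions for this example, not a quote of an existing kernel interface:

```c
#include <assert.h>

/*
 * Illustrative sketch: disable the stack protector for one function via
 * the dedicated attribute (GCC >= 11 / recent Clang), instead of the
 * broken __attribute__((optimize("no-stack-protector"))).
 */
#ifndef __has_attribute
#define __has_attribute(x) 0	/* fallback for very old compilers */
#endif

#if __has_attribute(no_stack_protector)
#define __no_stack_protector __attribute__((no_stack_protector))
#else
#define __no_stack_protector	/* attribute unavailable: no-op */
#endif

/*
 * A function meant to run before tp (and thus the canary location) is
 * valid must not contain compiler-inserted canary checks.
 */
static __no_stack_protector int early_doubler(int x)
{
	return 2 * x;
}
```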




> Actually, we've already init tp later.
>
> > and all the functions that it calls does not seem to be part of the
> > functions where the canary check is done, but in the following commits,
> > some functions will.
> >
> > Fixes: f2c9699f65557a31 ("riscv: Add STACKPROTECTOR supported")
> > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> > ---
> >  arch/riscv/kernel/head.S | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index c3c0ed559770..86f7ee3d210d 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -302,6 +302,7 @@ clear_bss_done:
> >         REG_S a0, (a2)
> >
> >         /* Initialize page tables and relocate to virtual addresses */
> > +       la tp, init_task
> >         la sp, init_thread_union + THREAD_SIZE
> >         XIP_FIXUP_OFFSET sp
> >  #ifdef CONFIG_BUILTIN_DTB
> > --
> > 2.32.0
> >
>
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 12/13] riscv: Initialize thread pointer before calling C functions
  2021-12-20  9:17       ` Ard Biesheuvel
@ 2021-12-20 13:40         ` Guo Ren
  -1 siblings, 0 replies; 70+ messages in thread
From: Guo Ren @ 2021-12-20 13:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

On Mon, Dec 20, 2021 at 5:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 20 Dec 2021 at 10:11, Guo Ren <guoren@kernel.org> wrote:
> >
> > On Tue, Dec 7, 2021 at 11:55 AM Alexandre Ghiti
> > <alexandre.ghiti@canonical.com> wrote:
> > >
> > > Because of the stack canary feature that reads from the current task
> > > structure the stack canary value, the thread pointer register "tp" must
> > > be set before calling any C function from head.S: by chance, setup_vm
> > Shall we disable -fstack-protector for setup_vm() with __attribute__?
>
> Don't use __attribute__((optimize())) for that: it is known to be
> broken, and documented as debug purposes only in the GCC info pages:
>
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
Oh, thanks for the link.

>
>
>
>
> > Actually, we've already init tp later.
> >
> > > and all the functions that it calls does not seem to be part of the
> > > functions where the canary check is done, but in the following commits,
> > > some functions will.
> > >
> > > Fixes: f2c9699f65557a31 ("riscv: Add STACKPROTECTOR supported")
> > > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> > > ---
> > >  arch/riscv/kernel/head.S | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > > index c3c0ed559770..86f7ee3d210d 100644
> > > --- a/arch/riscv/kernel/head.S
> > > +++ b/arch/riscv/kernel/head.S
> > > @@ -302,6 +302,7 @@ clear_bss_done:
> > >         REG_S a0, (a2)
> > >
> > >         /* Initialize page tables and relocate to virtual addresses */
> > > +       la tp, init_task
> > >         la sp, init_thread_union + THREAD_SIZE
> > >         XIP_FIXUP_OFFSET sp
> > >  #ifdef CONFIG_BUILTIN_DTB
> > > --
> > > 2.32.0
> > >
> >
> >
> > --
> > Best Regards
> >  Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2021-12-26  8:59     ` Jisheng Zhang
  -1 siblings, 0 replies; 70+ messages in thread
From: Jisheng Zhang @ 2021-12-26  8:59 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

On Mon,  6 Dec 2021 11:46:51 +0100
Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:

> By adding a new 4th level of page table, allow the 64-bit kernel to
> address 2^48 bytes of virtual address space: in practice, that offers
> 128TB of virtual address space to userspace and allows up to 64TB of
> physical memory.
> 
> If the underlying hardware does not support sv48, we will automatically
> fall back to a standard 3-level page table by folding the new PUD level
> into the PGDIR level. In order to detect HW capabilities at runtime, we
> rely on the SATP CSR ignoring writes with an unsupported mode.
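(Editorial aside: the probing trick can be modeled on the host. Everything prefixed `model_` below is made up for illustration; the one behavior assumed, taken from the privileged spec, is that a write to satp selecting an unsupported MODE may be silently ignored, so writing the sv48 mode and reading the register back reveals whether sv48 is usable.)

```c
#include <assert.h>
#include <stdbool.h>

/* MODE encodings from the RISC-V privileged spec: 8 = Sv39, 9 = Sv48. */
#define MODEL_SATP_MODE_39	(0x8ULL << 60)
#define MODEL_SATP_MODE_48	(0x9ULL << 60)

static unsigned long long model_satp;	/* stand-in for the satp CSR */
static bool model_hw_has_sv48;

/* csrw satp: an implementation may silently ignore an unsupported MODE */
static void model_csrw_satp(unsigned long long val)
{
	if ((val >> 60) == 0x9 && !model_hw_has_sv48)
		return;
	model_satp = val;
}

/* Write the sv48 mode, read back, restore: true iff sv48 is supported. */
static bool model_probe_sv48(void)
{
	unsigned long long saved = model_satp;
	bool ok;

	model_csrw_satp(MODEL_SATP_MODE_48);
	ok = (model_satp >> 60) == 0x9;
	model_csrw_satp(saved);	/* restore the previous translation mode */
	return ok;
}
```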
> 
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>  arch/riscv/Kconfig                      |   4 +-
>  arch/riscv/include/asm/csr.h            |   3 +-
>  arch/riscv/include/asm/fixmap.h         |   1 +
>  arch/riscv/include/asm/kasan.h          |   6 +-
>  arch/riscv/include/asm/page.h           |  14 ++
>  arch/riscv/include/asm/pgalloc.h        |  40 +++++
>  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
>  arch/riscv/include/asm/pgtable.h        |  24 ++-
>  arch/riscv/kernel/head.S                |   3 +-
>  arch/riscv/mm/context.c                 |   4 +-
>  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
>  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
>  drivers/firmware/efi/libstub/efi-stub.c |   2 +
>  13 files changed, 514 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ac6c0cd9bc29..d28fe0148e13 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -150,7 +150,7 @@ config PAGE_OFFSET
>  	hex
>  	default 0xC0000000 if 32BIT
>  	default 0x80000000 if 64BIT && !MMU
> -	default 0xffffffd800000000 if 64BIT
> +	default 0xffffaf8000000000 if 64BIT
>  
>  config KASAN_SHADOW_OFFSET
>  	hex
> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
>  
>  config PGTABLE_LEVELS
>  	int
> -	default 3 if 64BIT
> +	default 4 if 64BIT
>  	default 2
>  
>  config LOCKDEP_SUPPORT
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 87ac65696871..3fdb971c7896 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -40,14 +40,13 @@
>  #ifndef CONFIG_64BIT
>  #define SATP_PPN	_AC(0x003FFFFF, UL)
>  #define SATP_MODE_32	_AC(0x80000000, UL)
> -#define SATP_MODE	SATP_MODE_32
>  #define SATP_ASID_BITS	9
>  #define SATP_ASID_SHIFT	22
>  #define SATP_ASID_MASK	_AC(0x1FF, UL)
>  #else
>  #define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
>  #define SATP_MODE_39	_AC(0x8000000000000000, UL)
> -#define SATP_MODE	SATP_MODE_39
> +#define SATP_MODE_48	_AC(0x9000000000000000, UL)
>  #define SATP_ASID_BITS	16
>  #define SATP_ASID_SHIFT	44
>  #define SATP_ASID_MASK	_AC(0xFFFF, UL)
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 54cbf07fb4e9..58a718573ad6 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -24,6 +24,7 @@ enum fixed_addresses {
>  	FIX_HOLE,
>  	FIX_PTE,
>  	FIX_PMD,
> +	FIX_PUD,
>  	FIX_TEXT_POKE1,
>  	FIX_TEXT_POKE0,
>  	FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 743e6ff57996..0b85e363e778 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,7 +28,11 @@
>  #define KASAN_SHADOW_SCALE_SHIFT	3
>  
>  #define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +/*
> + * Depending on the size of the virtual address space, the region may not be
> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> + */
> +#define KASAN_SHADOW_START	((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
>  #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
>  #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>  
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index e03559f9b35e..d089fe46f7d8 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,7 +31,20 @@
>   * When not using MMU this corresponds to the first free page in
>   * physical memory (aligned on a page boundary).
>   */
> +#ifdef CONFIG_64BIT
> +#ifdef CONFIG_MMU
> +#define PAGE_OFFSET		kernel_map.page_offset
> +#else
> +#define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif
> +/*
> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> + * define the PAGE_OFFSET value for SV39.
> + */
> +#define PAGE_OFFSET_L3		_AC(0xffffffd800000000, UL)
> +#else
>  #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif /* CONFIG_64BIT */
>  
>  /*
>   * Half of the kernel address space (half of the entries of the page global
> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
>  #endif /* CONFIG_MMU */
>  
>  struct kernel_mapping {
> +	unsigned long page_offset;
>  	unsigned long virt_addr;
>  	uintptr_t phys_addr;
>  	uintptr_t size;

* Re: [PATCH v3 07/13] riscv: Implement sv48 support
@ 2021-12-26  8:59     ` Jisheng Zhang
  0 siblings, 0 replies; 70+ messages in thread
From: Jisheng Zhang @ 2021-12-26  8:59 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

On Mon,  6 Dec 2021 11:46:51 +0100
Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:

> By adding a new 4th page table level, allow the 64-bit kernel to address
> 2^48 bytes of virtual address space: in practice, that offers 128TB of
> virtual address space to userspace and allows up to 64TB of physical
> memory.
> 
> If the underlying hardware does not support sv48, we will automatically
> fall back to a standard 3-level page table by folding the new PUD level
> into the PGDIR level. To detect HW capabilities at runtime, we rely on
> the SATP CSR ignoring writes with an unsupported mode.
> 
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>  arch/riscv/Kconfig                      |   4 +-
>  arch/riscv/include/asm/csr.h            |   3 +-
>  arch/riscv/include/asm/fixmap.h         |   1 +
>  arch/riscv/include/asm/kasan.h          |   6 +-
>  arch/riscv/include/asm/page.h           |  14 ++
>  arch/riscv/include/asm/pgalloc.h        |  40 +++++
>  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
>  arch/riscv/include/asm/pgtable.h        |  24 ++-
>  arch/riscv/kernel/head.S                |   3 +-
>  arch/riscv/mm/context.c                 |   4 +-
>  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
>  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
>  drivers/firmware/efi/libstub/efi-stub.c |   2 +
>  13 files changed, 514 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ac6c0cd9bc29..d28fe0148e13 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -150,7 +150,7 @@ config PAGE_OFFSET
>  	hex
>  	default 0xC0000000 if 32BIT
>  	default 0x80000000 if 64BIT && !MMU
> -	default 0xffffffd800000000 if 64BIT
> +	default 0xffffaf8000000000 if 64BIT
>  
>  config KASAN_SHADOW_OFFSET
>  	hex
> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
>  
>  config PGTABLE_LEVELS
>  	int
> -	default 3 if 64BIT
> +	default 4 if 64BIT
>  	default 2
>  
>  config LOCKDEP_SUPPORT
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 87ac65696871..3fdb971c7896 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -40,14 +40,13 @@
>  #ifndef CONFIG_64BIT
>  #define SATP_PPN	_AC(0x003FFFFF, UL)
>  #define SATP_MODE_32	_AC(0x80000000, UL)
> -#define SATP_MODE	SATP_MODE_32
>  #define SATP_ASID_BITS	9
>  #define SATP_ASID_SHIFT	22
>  #define SATP_ASID_MASK	_AC(0x1FF, UL)
>  #else
>  #define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
>  #define SATP_MODE_39	_AC(0x8000000000000000, UL)
> -#define SATP_MODE	SATP_MODE_39
> +#define SATP_MODE_48	_AC(0x9000000000000000, UL)
>  #define SATP_ASID_BITS	16
>  #define SATP_ASID_SHIFT	44
>  #define SATP_ASID_MASK	_AC(0xFFFF, UL)
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 54cbf07fb4e9..58a718573ad6 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -24,6 +24,7 @@ enum fixed_addresses {
>  	FIX_HOLE,
>  	FIX_PTE,
>  	FIX_PMD,
> +	FIX_PUD,
>  	FIX_TEXT_POKE1,
>  	FIX_TEXT_POKE0,
>  	FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 743e6ff57996..0b85e363e778 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,7 +28,11 @@
>  #define KASAN_SHADOW_SCALE_SHIFT	3
>  
>  #define KASAN_SHADOW_SIZE	(UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START	(KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +/*
> + * Depending on the size of the virtual address space, the region may not be
> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> + */
> +#define KASAN_SHADOW_START	((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
>  #define KASAN_SHADOW_END	MODULES_LOWEST_VADDR
>  #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>  
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index e03559f9b35e..d089fe46f7d8 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,7 +31,20 @@
>   * When not using MMU this corresponds to the first free page in
>   * physical memory (aligned on a page boundary).
>   */
> +#ifdef CONFIG_64BIT
> +#ifdef CONFIG_MMU
> +#define PAGE_OFFSET		kernel_map.page_offset
> +#else
> +#define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif
> +/*
> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> + * define the PAGE_OFFSET value for SV39.
> + */
> +#define PAGE_OFFSET_L3		_AC(0xffffffd800000000, UL)
> +#else
>  #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
> +#endif /* CONFIG_64BIT */
>  
>  /*
>   * Half of the kernel address space (half of the entries of the page global
> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
>  #endif /* CONFIG_MMU */
>  
>  struct kernel_mapping {
> +	unsigned long page_offset;
>  	unsigned long virt_addr;
>  	uintptr_t phys_addr;
>  	uintptr_t size;
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index 0af6933a7100..11823004b87a 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -11,6 +11,8 @@
>  #include <asm/tlb.h>
>  
>  #ifdef CONFIG_MMU
> +#define __HAVE_ARCH_PUD_ALLOC_ONE
> +#define __HAVE_ARCH_PUD_FREE
>  #include <asm-generic/pgalloc.h>
>  
>  static inline void pmd_populate_kernel(struct mm_struct *mm,
> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>  
>  	set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
>  }
> +
> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> +{
> +	if (pgtable_l4_enabled) {
> +		unsigned long pfn = virt_to_pfn(pud);
> +
> +		set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +}
> +
> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> +				     pud_t *pud)
> +{
> +	if (pgtable_l4_enabled) {
> +		unsigned long pfn = virt_to_pfn(pud);
> +
> +		set_p4d_safe(p4d,
> +			     __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +}
> +
> +#define pud_alloc_one pud_alloc_one
> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> +{
> +	if (pgtable_l4_enabled)
> +		return __pud_alloc_one(mm, addr);
> +
> +	return NULL;
> +}
> +
> +#define pud_free pud_free
> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> +{
> +	if (pgtable_l4_enabled)
> +		__pud_free(mm, pud);
> +}
> +
> +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
>  #endif /* __PAGETABLE_PMD_FOLDED */
>  
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index 228261aa9628..bbbdd66e5e2f 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -8,16 +8,36 @@
>  
>  #include <linux/const.h>
>  
> -#define PGDIR_SHIFT     30
> +extern bool pgtable_l4_enabled;
> +
> +#define PGDIR_SHIFT_L3  30
> +#define PGDIR_SHIFT_L4  39
> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> +
> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
>  /* Size of region mapped by a page global directory */
>  #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
>  #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
>  
> +/* pud is folded into pgd in case of 3-level page table */
> +#define PUD_SHIFT      30
> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> +#define PUD_MASK       (~(PUD_SIZE - 1))
> +
>  #define PMD_SHIFT       21
>  /* Size of region mapped by a page middle directory */
>  #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
>  #define PMD_MASK        (~(PMD_SIZE - 1))
>  
> +/* Page Upper Directory entry */
> +typedef struct {
> +	unsigned long pud;
> +} pud_t;
> +
> +#define pud_val(x)      ((x).pud)
> +#define __pud(x)        ((pud_t) { (x) })
> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> +
>  /* Page Middle Directory entry */
>  typedef struct {
>  	unsigned long pmd;
> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
>  	set_pud(pudp, __pud(0));
>  }
>  
> +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> +{
> +	return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> +}
> +
> +static inline unsigned long _pud_pfn(pud_t pud)
> +{
> +	return pud_val(pud) >> _PAGE_PFN_SHIFT;
> +}
> +
>  static inline pmd_t *pud_pgtable(pud_t pud)
>  {
>  	return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
>  	return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
>  }
>  
> +#define mm_pud_folded  mm_pud_folded
> +static inline bool mm_pud_folded(struct mm_struct *mm)
> +{
> +	if (pgtable_l4_enabled)
> +		return false;
> +
> +	return true;
> +}
> +
> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> +
>  static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
>  {
>  	return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
>  #define pmd_ERROR(e) \
>  	pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
>  
> +#define pud_ERROR(e)   \
> +	pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> +
> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		*p4dp = p4d;
> +	else
> +		set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> +}
> +
> +static inline int p4d_none(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (p4d_val(p4d) == 0);
> +
> +	return 0;
> +}
> +
> +static inline int p4d_present(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (p4d_val(p4d) & _PAGE_PRESENT);
> +
> +	return 1;
> +}
> +
> +static inline int p4d_bad(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return !p4d_present(p4d);
> +
> +	return 0;
> +}
> +
> +static inline void p4d_clear(p4d_t *p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		set_p4d(p4d, __p4d(0));
> +}
> +
> +static inline pud_t *p4d_pgtable(p4d_t p4d)
> +{
> +	if (pgtable_l4_enabled)
> +		return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +
> +	return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> +}
> +
> +static inline struct page *p4d_page(p4d_t p4d)
> +{
> +	return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +}
> +
> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> +
> +#define pud_offset pud_offset
> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> +{
> +	if (pgtable_l4_enabled)
> +		return p4d_pgtable(*p4d) + pud_index(address);
> +
> +	return (pud_t *)p4d;
> +}
> +
>  #endif /* _ASM_RISCV_PGTABLE_64_H */
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index e1a52e22ad7e..e1c74ef4ead2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -51,7 +51,7 @@
>   * position vmemmap directly below the VMALLOC region.
>   */
>  #ifdef CONFIG_64BIT
> -#define VA_BITS		39
> +#define VA_BITS		(pgtable_l4_enabled ? 48 : 39)
>  #else
>  #define VA_BITS		32
>  #endif
> @@ -90,8 +90,7 @@
>  
>  #ifndef __ASSEMBLY__
>  
> -/* Page Upper Directory not used in RISC-V */
> -#include <asm-generic/pgtable-nopud.h>
> +#include <asm-generic/pgtable-nop4d.h>
>  #include <asm/page.h>
>  #include <asm/tlbflush.h>
>  #include <linux/mm_types.h>
> @@ -113,6 +112,17 @@
>  #define XIP_FIXUP(addr)		(addr)
>  #endif /* CONFIG_XIP_KERNEL */
>  
> +struct pt_alloc_ops {
> +	pte_t *(*get_pte_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pte)(uintptr_t va);
> +#ifndef __PAGETABLE_PMD_FOLDED
> +	pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pmd)(uintptr_t va);
> +	pud_t *(*get_pud_virt)(phys_addr_t pa);
> +	phys_addr_t (*alloc_pud)(uintptr_t va);
> +#endif
> +};
> +
>  #ifdef CONFIG_MMU
>  /* Number of entries in the page global directory */
>  #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
>   */
>  #ifdef CONFIG_64BIT
> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>  #else
> -#define TASK_SIZE FIXADDR_START
> +#define TASK_SIZE	FIXADDR_START
> +#define TASK_SIZE_MIN	TASK_SIZE
>  #endif
>  
>  #else /* CONFIG_MMU */
> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
>  #define dtb_early_va	_dtb_early_va
>  #define dtb_early_pa	_dtb_early_pa
>  #endif /* CONFIG_XIP_KERNEL */
> +extern u64 satp_mode;
> +extern bool pgtable_l4_enabled;
>  
>  void paging_init(void);
>  void misc_mem_init(void);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 52c5ff9804c5..c3c0ed559770 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -95,7 +95,8 @@ relocate:
>  
>  	/* Compute satp for kernel page tables, but don't load it yet */
>  	srl a2, a0, PAGE_SHIFT
> -	li a1, SATP_MODE
> +	la a1, satp_mode
> +	REG_L a1, 0(a1)
>  	or a2, a2, a1
>  
>  	/*
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index ee3459cb6750..a7246872bd30 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>  switch_mm_fast:
>  	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
>  		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> -		  SATP_MODE);
> +		  satp_mode);
>  
>  	if (need_flush_tlb)
>  		local_flush_tlb_all();
> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>  static void set_mm_noasid(struct mm_struct *mm)
>  {
>  	/* Switch the page table and blindly nuke entire local TLB */
> -	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> +	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
>  	local_flush_tlb_all();
>  }
>  
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 1552226fb6bd..6a19a1b1caf8 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
>  #define kernel_map	(*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
>  #endif
>  
> +#ifdef CONFIG_64BIT
> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> +#else
> +u64 satp_mode = SATP_MODE_32;
> +#endif
> +EXPORT_SYMBOL(satp_mode);
> +
> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> +				true : false;

Hi Alex,

I'm not sure whether we can use a static key for pgtable_l4_enabled or
not. Obviously, for a specific HW platform, pgtable_l4_enabled won't change
after boot, and it sits in hot code paths, so IMHO a static key may be
suitable for it.

Thanks


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2021-12-29  3:42     ` Guo Ren
  -1 siblings, 0 replies; 70+ messages in thread
From: Guo Ren @ 2021-12-29  3:42 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

On Tue, Dec 7, 2021 at 11:54 AM Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> By adding a new 4th page table level, allow the 64-bit kernel to address
> 2^48 bytes of virtual address space: in practice, that offers 128TB of
> virtual address space to userspace and allows up to 64TB of physical
> memory.
>
> If the underlying hardware does not support sv48, we will automatically
> fall back to a standard 3-level page table by folding the new PUD level
> into the PGDIR level. To detect HW capabilities at runtime, we rely on
> the SATP CSR ignoring writes with an unsupported mode.
>
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>  arch/riscv/Kconfig                      |   4 +-
>  arch/riscv/include/asm/csr.h            |   3 +-
>  arch/riscv/include/asm/fixmap.h         |   1 +
>  arch/riscv/include/asm/kasan.h          |   6 +-
>  arch/riscv/include/asm/page.h           |  14 ++
>  arch/riscv/include/asm/pgalloc.h        |  40 +++++
>  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
>  arch/riscv/include/asm/pgtable.h        |  24 ++-
>  arch/riscv/kernel/head.S                |   3 +-
>  arch/riscv/mm/context.c                 |   4 +-
>  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
>  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
>  drivers/firmware/efi/libstub/efi-stub.c |   2 +
>  13 files changed, 514 insertions(+), 44 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ac6c0cd9bc29..d28fe0148e13 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -150,7 +150,7 @@ config PAGE_OFFSET
>         hex
>         default 0xC0000000 if 32BIT
>         default 0x80000000 if 64BIT && !MMU
> -       default 0xffffffd800000000 if 64BIT
> +       default 0xffffaf8000000000 if 64BIT
>
>  config KASAN_SHADOW_OFFSET
>         hex
> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
>
>  config PGTABLE_LEVELS
>         int
> -       default 3 if 64BIT
> +       default 4 if 64BIT
>         default 2
>
>  config LOCKDEP_SUPPORT
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 87ac65696871..3fdb971c7896 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -40,14 +40,13 @@
>  #ifndef CONFIG_64BIT
>  #define SATP_PPN       _AC(0x003FFFFF, UL)
>  #define SATP_MODE_32   _AC(0x80000000, UL)
> -#define SATP_MODE      SATP_MODE_32
>  #define SATP_ASID_BITS 9
>  #define SATP_ASID_SHIFT        22
>  #define SATP_ASID_MASK _AC(0x1FF, UL)
>  #else
>  #define SATP_PPN       _AC(0x00000FFFFFFFFFFF, UL)
>  #define SATP_MODE_39   _AC(0x8000000000000000, UL)
> -#define SATP_MODE      SATP_MODE_39
> +#define SATP_MODE_48   _AC(0x9000000000000000, UL)
>  #define SATP_ASID_BITS 16
>  #define SATP_ASID_SHIFT        44
>  #define SATP_ASID_MASK _AC(0xFFFF, UL)
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 54cbf07fb4e9..58a718573ad6 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -24,6 +24,7 @@ enum fixed_addresses {
>         FIX_HOLE,
>         FIX_PTE,
>         FIX_PMD,
> +       FIX_PUD,
>         FIX_TEXT_POKE1,
>         FIX_TEXT_POKE0,
>         FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 743e6ff57996..0b85e363e778 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,7 +28,11 @@
>  #define KASAN_SHADOW_SCALE_SHIFT       3
>
>  #define KASAN_SHADOW_SIZE      (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START     (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +/*
> + * Depending on the size of the virtual address space, the region may not be
> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> + */
> +#define KASAN_SHADOW_START     ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
>  #define KASAN_SHADOW_END       MODULES_LOWEST_VADDR
>  #define KASAN_SHADOW_OFFSET    _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index e03559f9b35e..d089fe46f7d8 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,7 +31,20 @@
>   * When not using MMU this corresponds to the first free page in
>   * physical memory (aligned on a page boundary).
>   */
> +#ifdef CONFIG_64BIT
> +#ifdef CONFIG_MMU
> +#define PAGE_OFFSET            kernel_map.page_offset
> +#else
> +#define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> +#endif
> +/*
> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> + * define the PAGE_OFFSET value for SV39.
> + */
> +#define PAGE_OFFSET_L3         _AC(0xffffffd800000000, UL)
> +#else
>  #define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> +#endif /* CONFIG_64BIT */
>
>  /*
>   * Half of the kernel address space (half of the entries of the page global
> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
>  #endif /* CONFIG_MMU */
>
>  struct kernel_mapping {
> +       unsigned long page_offset;
>         unsigned long virt_addr;
>         uintptr_t phys_addr;
>         uintptr_t size;
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index 0af6933a7100..11823004b87a 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -11,6 +11,8 @@
>  #include <asm/tlb.h>
>
>  #ifdef CONFIG_MMU
> +#define __HAVE_ARCH_PUD_ALLOC_ONE
> +#define __HAVE_ARCH_PUD_FREE
>  #include <asm-generic/pgalloc.h>
>
>  static inline void pmd_populate_kernel(struct mm_struct *mm,
> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>
>         set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
>  }
> +
> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> +{
> +       if (pgtable_l4_enabled) {
> +               unsigned long pfn = virt_to_pfn(pud);
> +
> +               set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +       }
> +}
> +
> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> +                                    pud_t *pud)
> +{
> +       if (pgtable_l4_enabled) {
> +               unsigned long pfn = virt_to_pfn(pud);
> +
> +               set_p4d_safe(p4d,
> +                            __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +       }
> +}
> +
> +#define pud_alloc_one pud_alloc_one
> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> +{
> +       if (pgtable_l4_enabled)
> +               return __pud_alloc_one(mm, addr);
> +
> +       return NULL;
> +}
> +
> +#define pud_free pud_free
> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> +{
> +       if (pgtable_l4_enabled)
> +               __pud_free(mm, pud);
> +}
> +
> +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
>  #endif /* __PAGETABLE_PMD_FOLDED */
>
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index 228261aa9628..bbbdd66e5e2f 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -8,16 +8,36 @@
>
>  #include <linux/const.h>
>
> -#define PGDIR_SHIFT     30
> +extern bool pgtable_l4_enabled;
> +
> +#define PGDIR_SHIFT_L3  30
> +#define PGDIR_SHIFT_L4  39
> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> +
> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
>  /* Size of region mapped by a page global directory */
>  #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
>  #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
>
> +/* pud is folded into pgd in case of 3-level page table */
> +#define PUD_SHIFT      30
> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> +#define PUD_MASK       (~(PUD_SIZE - 1))
> +
>  #define PMD_SHIFT       21
>  /* Size of region mapped by a page middle directory */
>  #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
>  #define PMD_MASK        (~(PMD_SIZE - 1))
>
> +/* Page Upper Directory entry */
> +typedef struct {
> +       unsigned long pud;
> +} pud_t;
> +
> +#define pud_val(x)      ((x).pud)
> +#define __pud(x)        ((pud_t) { (x) })
> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> +
>  /* Page Middle Directory entry */
>  typedef struct {
>         unsigned long pmd;
> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
>         set_pud(pudp, __pud(0));
>  }
>
> +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> +{
> +       return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> +}
> +
> +static inline unsigned long _pud_pfn(pud_t pud)
> +{
> +       return pud_val(pud) >> _PAGE_PFN_SHIFT;
> +}
> +
>  static inline pmd_t *pud_pgtable(pud_t pud)
>  {
>         return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
>         return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
>  }
>
> +#define mm_pud_folded  mm_pud_folded
> +static inline bool mm_pud_folded(struct mm_struct *mm)
> +{
> +       if (pgtable_l4_enabled)
> +               return false;
> +
> +       return true;
> +}
> +
> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> +
>  static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
>  {
>         return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
>  #define pmd_ERROR(e) \
>         pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
>
> +#define pud_ERROR(e)   \
> +       pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> +
> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               *p4dp = p4d;
> +       else
> +               set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> +}
> +
> +static inline int p4d_none(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return (p4d_val(p4d) == 0);
> +
> +       return 0;
> +}
> +
> +static inline int p4d_present(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return (p4d_val(p4d) & _PAGE_PRESENT);
> +
> +       return 1;
> +}
> +
> +static inline int p4d_bad(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return !p4d_present(p4d);
> +
> +       return 0;
> +}
> +
> +static inline void p4d_clear(p4d_t *p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               set_p4d(p4d, __p4d(0));
> +}
> +
> +static inline pud_t *p4d_pgtable(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +
> +       return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> +}
> +
> +static inline struct page *p4d_page(p4d_t p4d)
> +{
> +       return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +}
> +
> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> +
> +#define pud_offset pud_offset
> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> +{
> +       if (pgtable_l4_enabled)
> +               return p4d_pgtable(*p4d) + pud_index(address);
> +
> +       return (pud_t *)p4d;
> +}
> +
>  #endif /* _ASM_RISCV_PGTABLE_64_H */
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index e1a52e22ad7e..e1c74ef4ead2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -51,7 +51,7 @@
>   * position vmemmap directly below the VMALLOC region.
>   */
>  #ifdef CONFIG_64BIT
> -#define VA_BITS                39
> +#define VA_BITS                (pgtable_l4_enabled ? 48 : 39)
>  #else
>  #define VA_BITS                32
>  #endif
> @@ -90,8 +90,7 @@
>
>  #ifndef __ASSEMBLY__
>
> -/* Page Upper Directory not used in RISC-V */
> -#include <asm-generic/pgtable-nopud.h>
> +#include <asm-generic/pgtable-nop4d.h>
>  #include <asm/page.h>
>  #include <asm/tlbflush.h>
>  #include <linux/mm_types.h>
> @@ -113,6 +112,17 @@
>  #define XIP_FIXUP(addr)                (addr)
>  #endif /* CONFIG_XIP_KERNEL */
>
> +struct pt_alloc_ops {
> +       pte_t *(*get_pte_virt)(phys_addr_t pa);
> +       phys_addr_t (*alloc_pte)(uintptr_t va);
> +#ifndef __PAGETABLE_PMD_FOLDED
> +       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> +       phys_addr_t (*alloc_pmd)(uintptr_t va);
> +       pud_t *(*get_pud_virt)(phys_addr_t pa);
> +       phys_addr_t (*alloc_pud)(uintptr_t va);
> +#endif
> +};
> +
>  #ifdef CONFIG_MMU
>  /* Number of entries in the page global directory */
>  #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
>   */
>  #ifdef CONFIG_64BIT
> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>  #else
> -#define TASK_SIZE FIXADDR_START
> +#define TASK_SIZE      FIXADDR_START
> +#define TASK_SIZE_MIN  TASK_SIZE
This is used by efi-stub.c; the rv64 compat patch also needs it, and
there we reuse the DEFAULT_MAP_WINDOW_64 macro.

TASK_SIZE_MIN is also okay with me, but I think it should be a separate
patch together with the efi-stub modification.
https://lore.kernel.org/linux-riscv/20211228143958.3409187-9-guoren@kernel.org/

I've merged your patchset with the compat tree and we are testing them
together thoroughly and carefully.
https://github.com/c-sky/csky-linux/tree/riscv_compat_v2_sv48_v3

Now, both the rv32_rootfs and rv64_rootfs boots have passed. I will
give you my Tested-by once testing is fully complete. Your patch set is
very helpful, thanks.

PS: Could you give users a way to choose sv48 or sv39 via the devicetree?
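For example, the existing `mmu-type` CPU property from the RISC-V
devicetree binding could express the intent; whether the kernel should
honor it to cap the paging mode at sv39 is exactly the open question
(node contents below are an illustrative sketch, not from this patchset):

```dts
cpu@0 {
        device_type = "cpu";
        compatible = "riscv";
        riscv,isa = "rv64imafdc";
        /* existing binding value; would cap the kernel at 3-level
         * paging even on sv48-capable hardware, if honored */
        mmu-type = "riscv,sv39";
};
```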


>  #endif
>
>  #else /* CONFIG_MMU */
> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
>  #define dtb_early_va   _dtb_early_va
>  #define dtb_early_pa   _dtb_early_pa
>  #endif /* CONFIG_XIP_KERNEL */
> +extern u64 satp_mode;
> +extern bool pgtable_l4_enabled;
>
>  void paging_init(void);
>  void misc_mem_init(void);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 52c5ff9804c5..c3c0ed559770 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -95,7 +95,8 @@ relocate:
>
>         /* Compute satp for kernel page tables, but don't load it yet */
>         srl a2, a0, PAGE_SHIFT
> -       li a1, SATP_MODE
> +       la a1, satp_mode
> +       REG_L a1, 0(a1)
>         or a2, a2, a1
>
>         /*
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index ee3459cb6750..a7246872bd30 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>  switch_mm_fast:
>         csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
>                   ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> -                 SATP_MODE);
> +                 satp_mode);
>
>         if (need_flush_tlb)
>                 local_flush_tlb_all();
> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>  static void set_mm_noasid(struct mm_struct *mm)
>  {
>         /* Switch the page table and blindly nuke entire local TLB */
> -       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> +       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
>         local_flush_tlb_all();
>  }
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 1552226fb6bd..6a19a1b1caf8 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
>  #define kernel_map     (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
>  #endif
>
> +#ifdef CONFIG_64BIT
> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> +#else
> +u64 satp_mode = SATP_MODE_32;
> +#endif
> +EXPORT_SYMBOL(satp_mode);
> +
> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> +                               true : false;
> +EXPORT_SYMBOL(pgtable_l4_enabled);
> +
>  phys_addr_t phys_ram_base __ro_after_init;
>  EXPORT_SYMBOL(phys_ram_base);
>
> @@ -53,15 +64,6 @@ extern char _start[];
>  void *_dtb_early_va __initdata;
>  uintptr_t _dtb_early_pa __initdata;
>
> -struct pt_alloc_ops {
> -       pte_t *(*get_pte_virt)(phys_addr_t pa);
> -       phys_addr_t (*alloc_pte)(uintptr_t va);
> -#ifndef __PAGETABLE_PMD_FOLDED
> -       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> -       phys_addr_t (*alloc_pmd)(uintptr_t va);
> -#endif
> -};
> -
>  static phys_addr_t dma32_phys_limit __initdata;
>
>  static void __init zone_sizes_init(void)
> @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
>  }
>
>  #ifdef CONFIG_MMU
> -static struct pt_alloc_ops _pt_ops __initdata;
> +struct pt_alloc_ops _pt_ops __initdata;
>
>  #ifdef CONFIG_XIP_KERNEL
>  #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
> @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>  static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>
>  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
>  static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>
>  #ifdef CONFIG_XIP_KERNEL
> @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>  #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
>  #endif /* CONFIG_XIP_KERNEL */
>
> +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> +
> +#ifdef CONFIG_XIP_KERNEL
> +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
> +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
> +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
> +#endif /* CONFIG_XIP_KERNEL */
> +
>  static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
>  {
>         /* Before MMU is enabled */
> @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
>
>  static phys_addr_t __init alloc_pmd_early(uintptr_t va)
>  {
> -       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +       BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
>
>         return (uintptr_t)early_pmd;
>  }
> @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
>         create_pte_mapping(ptep, va, pa, sz, prot);
>  }
>
> -#define pgd_next_t             pmd_t
> -#define alloc_pgd_next(__va)   pt_ops.alloc_pmd(__va)
> -#define get_pgd_next_virt(__pa)        pt_ops.get_pmd_virt(__pa)
> +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
> +{
> +       return (pud_t *)((uintptr_t)pa);
> +}
> +
> +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> +{
> +       clear_fixmap(FIX_PUD);
> +       return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> +}
> +
> +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> +{
> +       return (pud_t *)__va(pa);
> +}
> +
> +static phys_addr_t __init alloc_pud_early(uintptr_t va)
> +{
> +       /* Only one PUD is available for early mapping */
> +       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +
> +       return (uintptr_t)early_pud;
> +}
> +
> +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> +{
> +       return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> +}
> +
> +static phys_addr_t alloc_pud_late(uintptr_t va)
> +{
> +       unsigned long vaddr;
> +
> +       vaddr = __get_free_page(GFP_KERNEL);
> +       BUG_ON(!vaddr);
> +       return __pa(vaddr);
> +}
> +
> +static void __init create_pud_mapping(pud_t *pudp,
> +                                     uintptr_t va, phys_addr_t pa,
> +                                     phys_addr_t sz, pgprot_t prot)
> +{
> +       pmd_t *nextp;
> +       phys_addr_t next_phys;
> +       uintptr_t pud_index = pud_index(va);
> +
> +       if (sz == PUD_SIZE) {
> +               if (pud_val(pudp[pud_index]) == 0)
> +                       pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
> +               return;
> +       }
> +
> +       if (pud_val(pudp[pud_index]) == 0) {
> +               next_phys = pt_ops.alloc_pmd(va);
> +               pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
> +               nextp = pt_ops.get_pmd_virt(next_phys);
> +               memset(nextp, 0, PAGE_SIZE);
> +       } else {
> +               next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
> +               nextp = pt_ops.get_pmd_virt(next_phys);
> +       }
> +
> +       create_pmd_mapping(nextp, va, pa, sz, prot);
> +}
> +
> +#define pgd_next_t             pud_t
> +#define alloc_pgd_next(__va)   (pgtable_l4_enabled ?                   \
> +               pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
> +#define get_pgd_next_virt(__pa)        (pgtable_l4_enabled ?                   \
> +               pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
>  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
> -       create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next                fixmap_pmd
> +                               (pgtable_l4_enabled ?                   \
> +               create_pud_mapping(__nextp, __va, __pa, __sz, __prot) : \
> +               create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
> +#define fixmap_pgd_next                (pgtable_l4_enabled ?                   \
> +               (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
> +#define trampoline_pgd_next    (pgtable_l4_enabled ?                   \
> +               (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
> +#define early_dtb_pgd_next     (pgtable_l4_enabled ?                   \
> +               (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
>  #else
>  #define pgd_next_t             pte_t
>  #define alloc_pgd_next(__va)   pt_ops.alloc_pte(__va)
>  #define get_pgd_next_virt(__pa)        pt_ops.get_pte_virt(__pa)
>  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
>         create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next                fixmap_pte
> +#define fixmap_pgd_next                ((uintptr_t)fixmap_pte)
> +#define early_dtb_pgd_next     ((uintptr_t)early_dtb_pmd)
> +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
>  #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
> -#endif
> +#endif /* __PAGETABLE_PMD_FOLDED */
>
>  void __init create_pgd_mapping(pgd_t *pgdp,
>                                       uintptr_t va, phys_addr_t pa,
> @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
>  }
>  #endif /* CONFIG_STRICT_KERNEL_RWX */
>
> +#ifdef CONFIG_64BIT
> +static void __init disable_pgtable_l4(void)
> +{
> +       pgtable_l4_enabled = false;
> +       kernel_map.page_offset = PAGE_OFFSET_L3;
> +       satp_mode = SATP_MODE_39;
> +}
> +
> +/*
> + * There is a simple way to determine if 4-level is supported by the
> + * underlying hardware: establish 1:1 mapping in 4-level page table mode
> + * then read SATP to see if the configuration was taken into account
> + * meaning sv48 is supported.
> + */
> +static __init void set_satp_mode(void)
> +{
> +       u64 identity_satp, hw_satp;
> +       uintptr_t set_satp_mode_pmd;
> +
> +       set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> +       create_pgd_mapping(early_pg_dir,
> +                          set_satp_mode_pmd, (uintptr_t)early_pud,
> +                          PGDIR_SIZE, PAGE_TABLE);
> +       create_pud_mapping(early_pud,
> +                          set_satp_mode_pmd, (uintptr_t)early_pmd,
> +                          PUD_SIZE, PAGE_TABLE);
> +       /* Handle the case where set_satp_mode straddles 2 PMDs */
> +       create_pmd_mapping(early_pmd,
> +                          set_satp_mode_pmd, set_satp_mode_pmd,
> +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> +       create_pmd_mapping(early_pmd,
> +                          set_satp_mode_pmd + PMD_SIZE,
> +                          set_satp_mode_pmd + PMD_SIZE,
> +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> +
> +       identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> +
> +       local_flush_tlb_all();
> +       csr_write(CSR_SATP, identity_satp);
> +       hw_satp = csr_swap(CSR_SATP, 0ULL);
> +       local_flush_tlb_all();
> +
> +       if (hw_satp != identity_satp)
> +               disable_pgtable_l4();
> +
> +       memset(early_pg_dir, 0, PAGE_SIZE);
> +       memset(early_pud, 0, PAGE_SIZE);
> +       memset(early_pmd, 0, PAGE_SIZE);
> +}
> +#endif
> +
>  /*
>   * setup_vm() is called from head.S with MMU-off.
>   *
> @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
>         uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
>
>         create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> -                          IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
> +                          IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
>                            PGDIR_SIZE,
>                            IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
>
> +       if (pgtable_l4_enabled) {
> +               create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> +                                  (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> +       }
> +
>         if (IS_ENABLED(CONFIG_64BIT)) {
>                 create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
>                                    pa, PMD_SIZE, PAGE_KERNEL);
> @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
>  #ifndef __PAGETABLE_PMD_FOLDED
>         pt_ops.alloc_pmd = alloc_pmd_early;
>         pt_ops.get_pmd_virt = get_pmd_virt_early;
> +       pt_ops.alloc_pud = alloc_pud_early;
> +       pt_ops.get_pud_virt = get_pud_virt_early;
>  #endif
>  }
>
> @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
>  #ifndef __PAGETABLE_PMD_FOLDED
>         pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
>         pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
> +       pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
> +       pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
>  #endif
>  }
>
> @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
>  #ifndef __PAGETABLE_PMD_FOLDED
>         pt_ops.alloc_pmd = alloc_pmd_late;
>         pt_ops.get_pmd_virt = get_pmd_virt_late;
> +       pt_ops.alloc_pud = alloc_pud_late;
> +       pt_ops.get_pud_virt = get_pud_virt_late;
>  #endif
>  }
>
> @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>         pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
>
>         kernel_map.virt_addr = KERNEL_LINK_ADDR;
> +       kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
>
>  #ifdef CONFIG_XIP_KERNEL
>         kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>         kernel_map.phys_addr = (uintptr_t)(&_start);
>         kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
>  #endif
> +
> +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
> +       set_satp_mode();
> +#endif
> +
>         kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
>         kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>
> @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>
>         /* Setup early PGD for fixmap */
>         create_pgd_mapping(early_pg_dir, FIXADDR_START,
> -                          (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +                          fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
>
>  #ifndef __PAGETABLE_PMD_FOLDED
> -       /* Setup fixmap PMD */
> +       /* Setup fixmap PUD and PMD */
> +       if (pgtable_l4_enabled)
> +               create_pud_mapping(fixmap_pud, FIXADDR_START,
> +                                  (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
>         create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>                            (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>         /* Setup trampoline PGD and PMD */
>         create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
> -                          (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> +                          trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +       if (pgtable_l4_enabled)
> +               create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
> +                                  (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
>  #ifdef CONFIG_XIP_KERNEL
>         create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
>                            kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
> @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>          * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
>          * range can not span multiple pmds.
>          */
> -       BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> +       BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
>                      != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
>
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
>         /* Clear fixmap PTE and PMD mappings */
>         clear_fixmap(FIX_PTE);
>         clear_fixmap(FIX_PMD);
> +       clear_fixmap(FIX_PUD);
>
>         /* Move to swapper page table */
> -       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
> +       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
>         local_flush_tlb_all();
>
>         pt_ops_set_late();
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index 1434a0225140..993f50571a3b 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -11,7 +11,29 @@
>  #include <asm/fixmap.h>
>  #include <asm/pgalloc.h>
>
> +/*
> + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
> + * which is right before the kernel.
> + *
> + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
> + * the page global directory with kasan_early_shadow_pmd.
> + *
> + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
> + * must be divided as follows:
> + * - the first PGD entry, although incomplete, is populated with
> + *   kasan_early_shadow_pud/p4d
> + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
> + * - the last PGD entry is shared with the kernel mapping so populated at the
> + *   lower levels pud/p4d
> + *
> + * In addition, when shallow populating a kasan region (for example vmalloc),
> + * this region may also not be aligned on PGDIR size, so we must go down to the
> + * pud level too.
> + */
> +
>  extern pgd_t early_pg_dir[PTRS_PER_PGD];
> +extern struct pt_alloc_ops _pt_ops __initdata;
> +#define pt_ops _pt_ops
>
>  static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
>  {
> @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
>         set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>  }
>
> -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
>  {
>         phys_addr_t phys_addr;
>         pmd_t *pmdp, *base_pmd;
>         unsigned long next;
>
> -       base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
> -       if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +       if (pud_none(*pud)) {
>                 base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +       } else {
> +               base_pmd = (pmd_t *)pud_pgtable(*pud);
> +               if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +                       base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +       }
>
>         pmdp = base_pmd + pmd_index(vaddr);
>
> @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
>          * it entirely, memblock could allocate a page at a physical address
>          * where KASAN is not populated yet and then we'd get a page fault.
>          */
> -       set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +       set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +}
> +
> +static void __init kasan_populate_pud(pgd_t *pgd,
> +                                     unsigned long vaddr, unsigned long end,
> +                                     bool early)
> +{
> +       phys_addr_t phys_addr;
> +       pud_t *pudp, *base_pud;
> +       unsigned long next;
> +
> +       if (early) {
> +               /*
> +                * We can't use pgd_page_vaddr here as it would return a linear
> +                * mapping address but it is not mapped yet, but when populating
> +                * early_pg_dir, we need the physical address and when populating
> +                * swapper_pg_dir, we need the kernel virtual address so use
> +                * pt_ops facility.
> +                */
> +               base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
> +       } else {
> +               base_pud = (pud_t *)pgd_page_vaddr(*pgd);
> +               if (base_pud == lm_alias(kasan_early_shadow_pud))
> +                       base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
> +       }
> +
> +       pudp = base_pud + pud_index(vaddr);
> +
> +       do {
> +               next = pud_addr_end(vaddr, end);
> +
> +               if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
> +                       if (early) {
> +                               phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
> +                               set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
> +                               continue;
> +                       } else {
> +                               phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
> +                               if (phys_addr) {
> +                                       set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
> +                                       continue;
> +                               }
> +                       }
> +               }
> +
> +               kasan_populate_pmd(pudp, vaddr, next);
> +       } while (pudp++, vaddr = next, vaddr != end);
> +
> +       /*
> +        * Wait for the whole PGD to be populated before setting the PGD in
> +        * the page table, otherwise, if we did set the PGD before populating
> +        * it entirely, memblock could allocate a page at a physical address
> +        * where KASAN is not populated yet and then we'd get a page fault.
> +        */
> +       if (!early)
> +               set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
>  }
>
> +#define kasan_early_shadow_pgd_next                    (pgtable_l4_enabled ?   \
> +                               (uintptr_t)kasan_early_shadow_pud :             \
> +                               (uintptr_t)kasan_early_shadow_pmd)
> +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)                      \
> +               (pgtable_l4_enabled ?                                           \
> +                       kasan_populate_pud(pgdp, vaddr, next, early) :          \
> +                       kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
> +
>  static void __init kasan_populate_pgd(pgd_t *pgdp,
>                                       unsigned long vaddr, unsigned long end,
>                                       bool early)
> @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
>                         }
>                 }
>
> -               kasan_populate_pmd(pgdp, vaddr, next);
> +               kasan_populate_pgd_next(pgdp, vaddr, next, early);
>         } while (pgdp++, vaddr = next, vaddr != end);
>  }
>
> @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
>         memset(start, KASAN_SHADOW_INIT, end - start);
>  }
>
> +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
> +                                             unsigned long vaddr, unsigned long end,
> +                                             bool kasan_populate)
> +{
> +       unsigned long next;
> +       pud_t *pudp, *base_pud;
> +       pmd_t *base_pmd;
> +       bool is_kasan_pmd;
> +
> +       base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
> +       pudp = base_pud + pud_index(vaddr);
> +
> +       if (kasan_populate)
> +               memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
> +                      sizeof(pud_t) * PTRS_PER_PUD);
> +
> +       do {
> +               next = pud_addr_end(vaddr, end);
> +               is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
> +
> +               if (is_kasan_pmd) {
> +                       base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +                       set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +               }
> +       } while (pudp++, vaddr = next, vaddr != end);
> +}
> +
>  static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
>  {
>         unsigned long next;
>         void *p;
>         pgd_t *pgd_k = pgd_offset_k(vaddr);
> +       bool is_kasan_pgd_next;
>
>         do {
>                 next = pgd_addr_end(vaddr, end);
> -               if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
> +               is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
> +                                    (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
> +
> +               if (is_kasan_pgd_next) {
>                         p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>                         set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
>                 }
> +
> +               if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
> +                       continue;
> +
> +               kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
>         } while (pgd_k++, vaddr = next, vaddr != end);
>  }
>
> diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> index 26e69788f27a..b3db5d91ed38 100644
> --- a/drivers/firmware/efi/libstub/efi-stub.c
> +++ b/drivers/firmware/efi/libstub/efi-stub.c
> @@ -40,6 +40,8 @@
>
>  #ifdef CONFIG_ARM64
>  # define EFI_RT_VIRTUAL_LIMIT  DEFAULT_MAP_WINDOW_64
> +#elif defined(CONFIG_RISCV)
> +# define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE_MIN
>  #else
>  # define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE
>  #endif
> --
> 2.32.0
>


--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 07/13] riscv: Implement sv48 support
@ 2021-12-29  3:42     ` Guo Ren
  0 siblings, 0 replies; 70+ messages in thread
From: Guo Ren @ 2021-12-29  3:42 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

On Tue, Dec 7, 2021 at 11:54 AM Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> By adding a new 4th page table level, allow the 64-bit kernel to address
> 2^48 bytes of virtual address space: in practice, that offers 128TB of
> virtual address space to userspace and allows up to 64TB of physical
> memory.
>
> If the underlying hardware does not support sv48, we automatically fall
> back to a standard 3-level page table by folding the new PUD level into
> the PGDIR level. To detect HW capabilities at runtime, we rely on the
> fact that SATP ignores writes with an unsupported mode.
>
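The detection trick described above can be modeled in a few lines of
userspace C. This is only a sketch of the idea, not kernel code:
`csr_write_satp`, `hw_satp` and `hw_supports_sv48` are hypothetical
stand-ins for the CSR and the hardware capability, and the mode encodings
are the SATP_MODE_39/SATP_MODE_48 constants from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SATP_MODE_39 0x8000000000000000ULL  /* MODE field (bits 63:60) = 8 */
#define SATP_MODE_48 0x9000000000000000ULL  /* MODE field (bits 63:60) = 9 */

static uint64_t hw_satp;        /* models the SATP CSR */
static int hw_supports_sv48;    /* models the hardware capability */

static void csr_write_satp(uint64_t val)
{
	/* SATP is WARL: a write with an unsupported MODE is ignored
	 * and the register keeps its previous value. */
	if ((val >> 60) == 0x9 && !hw_supports_sv48)
		return;
	hw_satp = val;
}

static uint64_t probe_satp_mode(void)
{
	uint64_t want = 0x1234 | SATP_MODE_48;  /* some PPN | sv48 mode */

	csr_write_satp(want);
	if (hw_satp != want)        /* write was ignored: no sv48 */
		return SATP_MODE_39;
	csr_write_satp(0);          /* restore, as set_satp_mode() does */
	return SATP_MODE_48;
}
```

The real set_satp_mode() in the patch does the same read-back comparison,
but must first build a temporary 1:1 mapping of itself so the kernel keeps
executing once the MMU is enabled.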
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>  arch/riscv/Kconfig                      |   4 +-
>  arch/riscv/include/asm/csr.h            |   3 +-
>  arch/riscv/include/asm/fixmap.h         |   1 +
>  arch/riscv/include/asm/kasan.h          |   6 +-
>  arch/riscv/include/asm/page.h           |  14 ++
>  arch/riscv/include/asm/pgalloc.h        |  40 +++++
>  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
>  arch/riscv/include/asm/pgtable.h        |  24 ++-
>  arch/riscv/kernel/head.S                |   3 +-
>  arch/riscv/mm/context.c                 |   4 +-
>  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
>  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
>  drivers/firmware/efi/libstub/efi-stub.c |   2 +
>  13 files changed, 514 insertions(+), 44 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ac6c0cd9bc29..d28fe0148e13 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -150,7 +150,7 @@ config PAGE_OFFSET
>         hex
>         default 0xC0000000 if 32BIT
>         default 0x80000000 if 64BIT && !MMU
> -       default 0xffffffd800000000 if 64BIT
> +       default 0xffffaf8000000000 if 64BIT
>
>  config KASAN_SHADOW_OFFSET
>         hex
> @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
>
>  config PGTABLE_LEVELS
>         int
> -       default 3 if 64BIT
> +       default 4 if 64BIT
>         default 2
>
>  config LOCKDEP_SUPPORT
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 87ac65696871..3fdb971c7896 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -40,14 +40,13 @@
>  #ifndef CONFIG_64BIT
>  #define SATP_PPN       _AC(0x003FFFFF, UL)
>  #define SATP_MODE_32   _AC(0x80000000, UL)
> -#define SATP_MODE      SATP_MODE_32
>  #define SATP_ASID_BITS 9
>  #define SATP_ASID_SHIFT        22
>  #define SATP_ASID_MASK _AC(0x1FF, UL)
>  #else
>  #define SATP_PPN       _AC(0x00000FFFFFFFFFFF, UL)
>  #define SATP_MODE_39   _AC(0x8000000000000000, UL)
> -#define SATP_MODE      SATP_MODE_39
> +#define SATP_MODE_48   _AC(0x9000000000000000, UL)
>  #define SATP_ASID_BITS 16
>  #define SATP_ASID_SHIFT        44
>  #define SATP_ASID_MASK _AC(0xFFFF, UL)
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 54cbf07fb4e9..58a718573ad6 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -24,6 +24,7 @@ enum fixed_addresses {
>         FIX_HOLE,
>         FIX_PTE,
>         FIX_PMD,
> +       FIX_PUD,
>         FIX_TEXT_POKE1,
>         FIX_TEXT_POKE0,
>         FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 743e6ff57996..0b85e363e778 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -28,7 +28,11 @@
>  #define KASAN_SHADOW_SCALE_SHIFT       3
>
>  #define KASAN_SHADOW_SIZE      (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> -#define KASAN_SHADOW_START     (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> +/*
> + * Depending on the size of the virtual address space, the region may not be
> + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> + */
> +#define KASAN_SHADOW_START     ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
>  #define KASAN_SHADOW_END       MODULES_LOWEST_VADDR
>  #define KASAN_SHADOW_OFFSET    _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index e03559f9b35e..d089fe46f7d8 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,7 +31,20 @@
>   * When not using MMU this corresponds to the first free page in
>   * physical memory (aligned on a page boundary).
>   */
> +#ifdef CONFIG_64BIT
> +#ifdef CONFIG_MMU
> +#define PAGE_OFFSET            kernel_map.page_offset
> +#else
> +#define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> +#endif
> +/*
> + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> + * define the PAGE_OFFSET value for SV39.
> + */
> +#define PAGE_OFFSET_L3         _AC(0xffffffd800000000, UL)
> +#else
>  #define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> +#endif /* CONFIG_64BIT */
>
>  /*
>   * Half of the kernel address space (half of the entries of the page global
> @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
>  #endif /* CONFIG_MMU */
>
>  struct kernel_mapping {
> +       unsigned long page_offset;
>         unsigned long virt_addr;
>         uintptr_t phys_addr;
>         uintptr_t size;
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index 0af6933a7100..11823004b87a 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -11,6 +11,8 @@
>  #include <asm/tlb.h>
>
>  #ifdef CONFIG_MMU
> +#define __HAVE_ARCH_PUD_ALLOC_ONE
> +#define __HAVE_ARCH_PUD_FREE
>  #include <asm-generic/pgalloc.h>
>
>  static inline void pmd_populate_kernel(struct mm_struct *mm,
> @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>
>         set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
>  }
> +
> +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> +{
> +       if (pgtable_l4_enabled) {
> +               unsigned long pfn = virt_to_pfn(pud);
> +
> +               set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +       }
> +}
> +
> +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> +                                    pud_t *pud)
> +{
> +       if (pgtable_l4_enabled) {
> +               unsigned long pfn = virt_to_pfn(pud);
> +
> +               set_p4d_safe(p4d,
> +                            __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +       }
> +}
> +
> +#define pud_alloc_one pud_alloc_one
> +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> +{
> +       if (pgtable_l4_enabled)
> +               return __pud_alloc_one(mm, addr);
> +
> +       return NULL;
> +}
> +
> +#define pud_free pud_free
> +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> +{
> +       if (pgtable_l4_enabled)
> +               __pud_free(mm, pud);
> +}
> +
> +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
>  #endif /* __PAGETABLE_PMD_FOLDED */
>
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index 228261aa9628..bbbdd66e5e2f 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -8,16 +8,36 @@
>
>  #include <linux/const.h>
>
> -#define PGDIR_SHIFT     30
> +extern bool pgtable_l4_enabled;
> +
> +#define PGDIR_SHIFT_L3  30
> +#define PGDIR_SHIFT_L4  39
> +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> +
> +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
>  /* Size of region mapped by a page global directory */
>  #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
>  #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
>
> +/* pud is folded into pgd in case of 3-level page table */
> +#define PUD_SHIFT      30
> +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> +#define PUD_MASK       (~(PUD_SIZE - 1))
> +
>  #define PMD_SHIFT       21
>  /* Size of region mapped by a page middle directory */
>  #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
>  #define PMD_MASK        (~(PMD_SIZE - 1))
>
> +/* Page Upper Directory entry */
> +typedef struct {
> +       unsigned long pud;
> +} pud_t;
> +
> +#define pud_val(x)      ((x).pud)
> +#define __pud(x)        ((pud_t) { (x) })
> +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> +
>  /* Page Middle Directory entry */
>  typedef struct {
>         unsigned long pmd;
> @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
>         set_pud(pudp, __pud(0));
>  }
>
> +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> +{
> +       return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> +}
> +
> +static inline unsigned long _pud_pfn(pud_t pud)
> +{
> +       return pud_val(pud) >> _PAGE_PFN_SHIFT;
> +}
> +
>  static inline pmd_t *pud_pgtable(pud_t pud)
>  {
>         return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
>         return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
>  }
>
> +#define mm_pud_folded  mm_pud_folded
> +static inline bool mm_pud_folded(struct mm_struct *mm)
> +{
> +       if (pgtable_l4_enabled)
> +               return false;
> +
> +       return true;
> +}
> +
> +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> +
>  static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
>  {
>         return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
>  #define pmd_ERROR(e) \
>         pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
>
> +#define pud_ERROR(e)   \
> +       pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> +
> +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               *p4dp = p4d;
> +       else
> +               set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> +}
> +
> +static inline int p4d_none(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return (p4d_val(p4d) == 0);
> +
> +       return 0;
> +}
> +
> +static inline int p4d_present(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return (p4d_val(p4d) & _PAGE_PRESENT);
> +
> +       return 1;
> +}
> +
> +static inline int p4d_bad(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return !p4d_present(p4d);
> +
> +       return 0;
> +}
> +
> +static inline void p4d_clear(p4d_t *p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               set_p4d(p4d, __p4d(0));
> +}
> +
> +static inline pud_t *p4d_pgtable(p4d_t p4d)
> +{
> +       if (pgtable_l4_enabled)
> +               return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +
> +       return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> +}
> +
> +static inline struct page *p4d_page(p4d_t p4d)
> +{
> +       return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> +}
> +
> +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> +
> +#define pud_offset pud_offset
> +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> +{
> +       if (pgtable_l4_enabled)
> +               return p4d_pgtable(*p4d) + pud_index(address);
> +
> +       return (pud_t *)p4d;
> +}
> +
>  #endif /* _ASM_RISCV_PGTABLE_64_H */
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index e1a52e22ad7e..e1c74ef4ead2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -51,7 +51,7 @@
>   * position vmemmap directly below the VMALLOC region.
>   */
>  #ifdef CONFIG_64BIT
> -#define VA_BITS                39
> +#define VA_BITS                (pgtable_l4_enabled ? 48 : 39)
>  #else
>  #define VA_BITS                32
>  #endif
> @@ -90,8 +90,7 @@
>
>  #ifndef __ASSEMBLY__
>
> -/* Page Upper Directory not used in RISC-V */
> -#include <asm-generic/pgtable-nopud.h>
> +#include <asm-generic/pgtable-nop4d.h>
>  #include <asm/page.h>
>  #include <asm/tlbflush.h>
>  #include <linux/mm_types.h>
> @@ -113,6 +112,17 @@
>  #define XIP_FIXUP(addr)                (addr)
>  #endif /* CONFIG_XIP_KERNEL */
>
> +struct pt_alloc_ops {
> +       pte_t *(*get_pte_virt)(phys_addr_t pa);
> +       phys_addr_t (*alloc_pte)(uintptr_t va);
> +#ifndef __PAGETABLE_PMD_FOLDED
> +       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> +       phys_addr_t (*alloc_pmd)(uintptr_t va);
> +       pud_t *(*get_pud_virt)(phys_addr_t pa);
> +       phys_addr_t (*alloc_pud)(uintptr_t va);
> +#endif
> +};
> +
>  #ifdef CONFIG_MMU
>  /* Number of entries in the page global directory */
>  #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
>   */
>  #ifdef CONFIG_64BIT
> -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>  #else
> -#define TASK_SIZE FIXADDR_START
> +#define TASK_SIZE      FIXADDR_START
> +#define TASK_SIZE_MIN  TASK_SIZE
This is used by efi-stub.c; the rv64 compat patch also needs it, where we
reuse the DEFAULT_MAP_WINDOW_64 macro.

TASK_SIZE_MIN is also okay for me, but I think it should be a separate
patch together with the efi-stub modification.
https://lore.kernel.org/linux-riscv/20211228143958.3409187-9-guoren@kernel.org/
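To see why the stub wants the sv39-based minimum rather than the runtime
TASK_SIZE, the two candidate limits can be computed from the constants in
this patch. A standalone sketch, not kernel code; PTRS_PER_PGD = 512 is
assumed (RV64, 4K pages):

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT_L3 30   /* sv39 */
#define PGDIR_SHIFT_L4 39   /* sv48 */
#define PTRS_PER_PGD   512ULL

/* TASK_SIZE = PGDIR_SIZE * PTRS_PER_PGD / 2, but PGDIR_SIZE depends on
 * the page-table mode chosen at boot, after the EFI stub has run. */
static uint64_t task_size(int levels)
{
	uint64_t pgdir_size = 1ULL << (levels == 4 ? PGDIR_SHIFT_L4
						   : PGDIR_SHIFT_L3);
	return pgdir_size * PTRS_PER_PGD / 2;
}
```

task_size(3) gives 2^38 (256GB) and task_size(4) gives 2^47 (128TB); since
the stub cannot know yet whether the sv48 probe will succeed, only the
sv39 value is a safe compile-time bound for EFI_RT_VIRTUAL_LIMIT.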

I've merged your patchset with the compat tree and we are testing them
together thoroughly.
https://github.com/c-sky/csky-linux/tree/riscv_compat_v2_sv48_v3

Now, rv32_rootfs & rv64_rootfs booting have both passed. I will give you
a Tested-by once testing is complete. Your patch set is very helpful,
thanks.

PS: Could you give users a way to choose sv48 or sv39 in the dts?


>  #endif
>
>  #else /* CONFIG_MMU */
> @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
>  #define dtb_early_va   _dtb_early_va
>  #define dtb_early_pa   _dtb_early_pa
>  #endif /* CONFIG_XIP_KERNEL */
> +extern u64 satp_mode;
> +extern bool pgtable_l4_enabled;
>
>  void paging_init(void);
>  void misc_mem_init(void);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 52c5ff9804c5..c3c0ed559770 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -95,7 +95,8 @@ relocate:
>
>         /* Compute satp for kernel page tables, but don't load it yet */
>         srl a2, a0, PAGE_SHIFT
> -       li a1, SATP_MODE
> +       la a1, satp_mode
> +       REG_L a1, 0(a1)
>         or a2, a2, a1
>
>         /*
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index ee3459cb6750..a7246872bd30 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>  switch_mm_fast:
>         csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
>                   ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> -                 SATP_MODE);
> +                 satp_mode);
>
>         if (need_flush_tlb)
>                 local_flush_tlb_all();
> @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>  static void set_mm_noasid(struct mm_struct *mm)
>  {
>         /* Switch the page table and blindly nuke entire local TLB */
> -       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> +       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
>         local_flush_tlb_all();
>  }
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 1552226fb6bd..6a19a1b1caf8 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
>  #define kernel_map     (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
>  #endif
>
> +#ifdef CONFIG_64BIT
> +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> +#else
> +u64 satp_mode = SATP_MODE_32;
> +#endif
> +EXPORT_SYMBOL(satp_mode);
> +
> +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> +                               true : false;
> +EXPORT_SYMBOL(pgtable_l4_enabled);
> +
>  phys_addr_t phys_ram_base __ro_after_init;
>  EXPORT_SYMBOL(phys_ram_base);
>
> @@ -53,15 +64,6 @@ extern char _start[];
>  void *_dtb_early_va __initdata;
>  uintptr_t _dtb_early_pa __initdata;
>
> -struct pt_alloc_ops {
> -       pte_t *(*get_pte_virt)(phys_addr_t pa);
> -       phys_addr_t (*alloc_pte)(uintptr_t va);
> -#ifndef __PAGETABLE_PMD_FOLDED
> -       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> -       phys_addr_t (*alloc_pmd)(uintptr_t va);
> -#endif
> -};
> -
>  static phys_addr_t dma32_phys_limit __initdata;
>
>  static void __init zone_sizes_init(void)
> @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
>  }
>
>  #ifdef CONFIG_MMU
> -static struct pt_alloc_ops _pt_ops __initdata;
> +struct pt_alloc_ops _pt_ops __initdata;
>
>  #ifdef CONFIG_XIP_KERNEL
>  #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
> @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>  static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>
>  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
>  static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>
>  #ifdef CONFIG_XIP_KERNEL
> @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>  #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
>  #endif /* CONFIG_XIP_KERNEL */
>
> +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> +
> +#ifdef CONFIG_XIP_KERNEL
> +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
> +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
> +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
> +#endif /* CONFIG_XIP_KERNEL */
> +
>  static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
>  {
>         /* Before MMU is enabled */
> @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
>
>  static phys_addr_t __init alloc_pmd_early(uintptr_t va)
>  {
> -       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +       BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
>
>         return (uintptr_t)early_pmd;
>  }
> @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
>         create_pte_mapping(ptep, va, pa, sz, prot);
>  }
>
> -#define pgd_next_t             pmd_t
> -#define alloc_pgd_next(__va)   pt_ops.alloc_pmd(__va)
> -#define get_pgd_next_virt(__pa)        pt_ops.get_pmd_virt(__pa)
> +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
> +{
> +       return (pud_t *)((uintptr_t)pa);
> +}
> +
> +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> +{
> +       clear_fixmap(FIX_PUD);
> +       return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> +}
> +
> +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> +{
> +       return (pud_t *)__va(pa);
> +}
> +
> +static phys_addr_t __init alloc_pud_early(uintptr_t va)
> +{
> +       /* Only one PUD is available for early mapping */
> +       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> +
> +       return (uintptr_t)early_pud;
> +}
> +
> +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> +{
> +       return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> +}
> +
> +static phys_addr_t alloc_pud_late(uintptr_t va)
> +{
> +       unsigned long vaddr;
> +
> +       vaddr = __get_free_page(GFP_KERNEL);
> +       BUG_ON(!vaddr);
> +       return __pa(vaddr);
> +}
> +
> +static void __init create_pud_mapping(pud_t *pudp,
> +                                     uintptr_t va, phys_addr_t pa,
> +                                     phys_addr_t sz, pgprot_t prot)
> +{
> +       pmd_t *nextp;
> +       phys_addr_t next_phys;
> +       uintptr_t pud_index = pud_index(va);
> +
> +       if (sz == PUD_SIZE) {
> +               if (pud_val(pudp[pud_index]) == 0)
> +                       pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
> +               return;
> +       }
> +
> +       if (pud_val(pudp[pud_index]) == 0) {
> +               next_phys = pt_ops.alloc_pmd(va);
> +               pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
> +               nextp = pt_ops.get_pmd_virt(next_phys);
> +               memset(nextp, 0, PAGE_SIZE);
> +       } else {
> +               next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
> +               nextp = pt_ops.get_pmd_virt(next_phys);
> +       }
> +
> +       create_pmd_mapping(nextp, va, pa, sz, prot);
> +}
> +
> +#define pgd_next_t             pud_t
> +#define alloc_pgd_next(__va)   (pgtable_l4_enabled ?                   \
> +               pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
> +#define get_pgd_next_virt(__pa)        (pgtable_l4_enabled ?                   \
> +               pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
>  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
> -       create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next                fixmap_pmd
> +                               (pgtable_l4_enabled ?                   \
> +               create_pud_mapping(__nextp, __va, __pa, __sz, __prot) : \
> +               create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
> +#define fixmap_pgd_next                (pgtable_l4_enabled ?                   \
> +               (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
> +#define trampoline_pgd_next    (pgtable_l4_enabled ?                   \
> +               (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
> +#define early_dtb_pgd_next     (pgtable_l4_enabled ?                   \
> +               (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
>  #else
>  #define pgd_next_t             pte_t
>  #define alloc_pgd_next(__va)   pt_ops.alloc_pte(__va)
>  #define get_pgd_next_virt(__pa)        pt_ops.get_pte_virt(__pa)
>  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
>         create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
> -#define fixmap_pgd_next                fixmap_pte
> +#define fixmap_pgd_next                ((uintptr_t)fixmap_pte)
> +#define early_dtb_pgd_next     ((uintptr_t)early_dtb_pmd)
> +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
>  #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
> -#endif
> +#endif /* __PAGETABLE_PMD_FOLDED */
>
>  void __init create_pgd_mapping(pgd_t *pgdp,
>                                       uintptr_t va, phys_addr_t pa,
> @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
>  }
>  #endif /* CONFIG_STRICT_KERNEL_RWX */
>
> +#ifdef CONFIG_64BIT
> +static void __init disable_pgtable_l4(void)
> +{
> +       pgtable_l4_enabled = false;
> +       kernel_map.page_offset = PAGE_OFFSET_L3;
> +       satp_mode = SATP_MODE_39;
> +}
> +
> +/*
> + * There is a simple way to determine if 4-level is supported by the
> + * underlying hardware: establish 1:1 mapping in 4-level page table mode
> + * then read SATP to see if the configuration was taken into account
> + * meaning sv48 is supported.
> + */
> +static __init void set_satp_mode(void)
> +{
> +       u64 identity_satp, hw_satp;
> +       uintptr_t set_satp_mode_pmd;
> +
> +       set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> +       create_pgd_mapping(early_pg_dir,
> +                          set_satp_mode_pmd, (uintptr_t)early_pud,
> +                          PGDIR_SIZE, PAGE_TABLE);
> +       create_pud_mapping(early_pud,
> +                          set_satp_mode_pmd, (uintptr_t)early_pmd,
> +                          PUD_SIZE, PAGE_TABLE);
> +       /* Handle the case where set_satp_mode straddles 2 PMDs */
> +       create_pmd_mapping(early_pmd,
> +                          set_satp_mode_pmd, set_satp_mode_pmd,
> +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> +       create_pmd_mapping(early_pmd,
> +                          set_satp_mode_pmd + PMD_SIZE,
> +                          set_satp_mode_pmd + PMD_SIZE,
> +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> +
> +       identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> +
> +       local_flush_tlb_all();
> +       csr_write(CSR_SATP, identity_satp);
> +       hw_satp = csr_swap(CSR_SATP, 0ULL);
> +       local_flush_tlb_all();
> +
> +       if (hw_satp != identity_satp)
> +               disable_pgtable_l4();
> +
> +       memset(early_pg_dir, 0, PAGE_SIZE);
> +       memset(early_pud, 0, PAGE_SIZE);
> +       memset(early_pmd, 0, PAGE_SIZE);
> +}
> +#endif
> +
>  /*
>   * setup_vm() is called from head.S with MMU-off.
>   *
> @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
>         uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
>
>         create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> -                          IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
> +                          IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
>                            PGDIR_SIZE,
>                            IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
>
> +       if (pgtable_l4_enabled) {
> +               create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> +                                  (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> +       }
> +
>         if (IS_ENABLED(CONFIG_64BIT)) {
>                 create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
>                                    pa, PMD_SIZE, PAGE_KERNEL);
> @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
>  #ifndef __PAGETABLE_PMD_FOLDED
>         pt_ops.alloc_pmd = alloc_pmd_early;
>         pt_ops.get_pmd_virt = get_pmd_virt_early;
> +       pt_ops.alloc_pud = alloc_pud_early;
> +       pt_ops.get_pud_virt = get_pud_virt_early;
>  #endif
>  }
>
> @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
>  #ifndef __PAGETABLE_PMD_FOLDED
>         pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
>         pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
> +       pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
> +       pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
>  #endif
>  }
>
> @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
>  #ifndef __PAGETABLE_PMD_FOLDED
>         pt_ops.alloc_pmd = alloc_pmd_late;
>         pt_ops.get_pmd_virt = get_pmd_virt_late;
> +       pt_ops.alloc_pud = alloc_pud_late;
> +       pt_ops.get_pud_virt = get_pud_virt_late;
>  #endif
>  }
>
> @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>         pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
>
>         kernel_map.virt_addr = KERNEL_LINK_ADDR;
> +       kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
>
>  #ifdef CONFIG_XIP_KERNEL
>         kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>         kernel_map.phys_addr = (uintptr_t)(&_start);
>         kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
>  #endif
> +
> +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
> +       set_satp_mode();
> +#endif
> +
>         kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
>         kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>
> @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>
>         /* Setup early PGD for fixmap */
>         create_pgd_mapping(early_pg_dir, FIXADDR_START,
> -                          (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +                          fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
>
>  #ifndef __PAGETABLE_PMD_FOLDED
> -       /* Setup fixmap PMD */
> +       /* Setup fixmap PUD and PMD */
> +       if (pgtable_l4_enabled)
> +               create_pud_mapping(fixmap_pud, FIXADDR_START,
> +                                  (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
>         create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>                            (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>         /* Setup trampoline PGD and PMD */
>         create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
> -                          (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> +                          trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> +       if (pgtable_l4_enabled)
> +               create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
> +                                  (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
>  #ifdef CONFIG_XIP_KERNEL
>         create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
>                            kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
> @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>          * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
>          * range can not span multiple pmds.
>          */
> -       BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> +       BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
>                      != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
>
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
>         /* Clear fixmap PTE and PMD mappings */
>         clear_fixmap(FIX_PTE);
>         clear_fixmap(FIX_PMD);
> +       clear_fixmap(FIX_PUD);
>
>         /* Move to swapper page table */
> -       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
> +       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
>         local_flush_tlb_all();
>
>         pt_ops_set_late();
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index 1434a0225140..993f50571a3b 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -11,7 +11,29 @@
>  #include <asm/fixmap.h>
>  #include <asm/pgalloc.h>
>
> +/*
> + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
> + * which is right before the kernel.
> + *
> + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
> + * the page global directory with kasan_early_shadow_pmd.
> + *
> + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
> + * must be divided as follows:
> + * - the first PGD entry, although incomplete, is populated with
> + *   kasan_early_shadow_pud/p4d
> + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
> + * - the last PGD entry is shared with the kernel mapping so populated at the
> + *   lower levels pud/p4d
> + *
> + * In addition, when shallow populating a kasan region (for example vmalloc),
> + * this region may also not be aligned on PGDIR size, so we must go down to the
> + * pud level too.
> + */
> +
>  extern pgd_t early_pg_dir[PTRS_PER_PGD];
> +extern struct pt_alloc_ops _pt_ops __initdata;
> +#define pt_ops _pt_ops
>
>  static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
>  {
> @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
>         set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>  }
>
> -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
>  {
>         phys_addr_t phys_addr;
>         pmd_t *pmdp, *base_pmd;
>         unsigned long next;
>
> -       base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
> -       if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +       if (pud_none(*pud)) {
>                 base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +       } else {
> +               base_pmd = (pmd_t *)pud_pgtable(*pud);
> +               if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> +                       base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> +       }
>
>         pmdp = base_pmd + pmd_index(vaddr);
>
> @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
>          * it entirely, memblock could allocate a page at a physical address
>          * where KASAN is not populated yet and then we'd get a page fault.
>          */
> -       set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +       set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +}
> +
> +static void __init kasan_populate_pud(pgd_t *pgd,
> +                                     unsigned long vaddr, unsigned long end,
> +                                     bool early)
> +{
> +       phys_addr_t phys_addr;
> +       pud_t *pudp, *base_pud;
> +       unsigned long next;
> +
> +       if (early) {
> +               /*
> +                * We can't use pgd_page_vaddr here as it would return a linear
> +                * mapping address that is not mapped yet. When populating
> +                * early_pg_dir we need the physical address, and when populating
> +                * swapper_pg_dir we need the kernel virtual address, so use the
> +                * pt_ops facility.
> +                */
> +               base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
> +       } else {
> +               base_pud = (pud_t *)pgd_page_vaddr(*pgd);
> +               if (base_pud == lm_alias(kasan_early_shadow_pud))
> +                       base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
> +       }
> +
> +       pudp = base_pud + pud_index(vaddr);
> +
> +       do {
> +               next = pud_addr_end(vaddr, end);
> +
> +               if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
> +                       if (early) {
> +                               phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
> +                               set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
> +                               continue;
> +                       } else {
> +                               phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
> +                               if (phys_addr) {
> +                                       set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
> +                                       continue;
> +                               }
> +                       }
> +               }
> +
> +               kasan_populate_pmd(pudp, vaddr, next);
> +       } while (pudp++, vaddr = next, vaddr != end);
> +
> +       /*
> +        * Wait for the whole PGD to be populated before setting the PGD in
> +        * the page table, otherwise, if we did set the PGD before populating
> +        * it entirely, memblock could allocate a page at a physical address
> +        * where KASAN is not populated yet and then we'd get a page fault.
> +        */
> +       if (!early)
> +               set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
>  }
>
> +#define kasan_early_shadow_pgd_next                    (pgtable_l4_enabled ?   \
> +                               (uintptr_t)kasan_early_shadow_pud :             \
> +                               (uintptr_t)kasan_early_shadow_pmd)
> +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)                      \
> +               (pgtable_l4_enabled ?                                           \
> +                       kasan_populate_pud(pgdp, vaddr, next, early) :          \
> +                       kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
> +
>  static void __init kasan_populate_pgd(pgd_t *pgdp,
>                                       unsigned long vaddr, unsigned long end,
>                                       bool early)
> @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
>                         }
>                 }
>
> -               kasan_populate_pmd(pgdp, vaddr, next);
> +               kasan_populate_pgd_next(pgdp, vaddr, next, early);
>         } while (pgdp++, vaddr = next, vaddr != end);
>  }
>
> @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
>         memset(start, KASAN_SHADOW_INIT, end - start);
>  }
>
> +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
> +                                             unsigned long vaddr, unsigned long end,
> +                                             bool kasan_populate)
> +{
> +       unsigned long next;
> +       pud_t *pudp, *base_pud;
> +       pmd_t *base_pmd;
> +       bool is_kasan_pmd;
> +
> +       base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
> +       pudp = base_pud + pud_index(vaddr);
> +
> +       if (kasan_populate)
> +               memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
> +                      sizeof(pud_t) * PTRS_PER_PUD);
> +
> +       do {
> +               next = pud_addr_end(vaddr, end);
> +               is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
> +
> +               if (is_kasan_pmd) {
> +                       base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +                       set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> +               }
> +       } while (pudp++, vaddr = next, vaddr != end);
> +}
> +
>  static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
>  {
>         unsigned long next;
>         void *p;
>         pgd_t *pgd_k = pgd_offset_k(vaddr);
> +       bool is_kasan_pgd_next;
>
>         do {
>                 next = pgd_addr_end(vaddr, end);
> -               if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
> +               is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
> +                                    (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
> +
> +               if (is_kasan_pgd_next) {
>                         p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>                         set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
>                 }
> +
> +               if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
> +                       continue;
> +
> +               kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
>         } while (pgd_k++, vaddr = next, vaddr != end);
>  }
>
> diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> index 26e69788f27a..b3db5d91ed38 100644
> --- a/drivers/firmware/efi/libstub/efi-stub.c
> +++ b/drivers/firmware/efi/libstub/efi-stub.c
> @@ -40,6 +40,8 @@
>
>  #ifdef CONFIG_ARM64
>  # define EFI_RT_VIRTUAL_LIMIT  DEFAULT_MAP_WINDOW_64
> +#elif defined(CONFIG_RISCV)
> +# define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE_MIN
>  #else
>  # define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE
>  #endif
> --
> 2.32.0
>
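The shadow-region placement handled by the kasan_init.c hunks above can be condensed into a small model. The sketch below is illustrative only (the addresses and helper names are made up, not taken from the patch); it reproduces the two expressions from the patch, `KASAN_SHADOW_SIZE = 1 << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT)` and `KASAN_SHADOW_START = (KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK`, and shows why the result is always PGDIR-aligned regardless of the paging mode:

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3

/* Shadow covers half the kernel address space, scaled down by 8. */
uint64_t kasan_shadow_size(unsigned va_bits)
{
	return 1ULL << ((va_bits - 1) - KASAN_SHADOW_SCALE_SHIFT);
}

/* Start of the shadow, forced down to a PGDIR_SIZE boundary so that the
 * first (possibly partial) PGD entry can still be populated wholesale. */
uint64_t kasan_shadow_start(uint64_t shadow_end, unsigned va_bits,
			    unsigned pgdir_shift)
{
	uint64_t pgdir_mask = ~((1ULL << pgdir_shift) - 1);

	return (shadow_end - kasan_shadow_size(va_bits)) & pgdir_mask;
}
```

With sv39 (VA_BITS=39, PGDIR_SHIFT=30) the unaligned start already happens to land on a PGDIR boundary in the real layout; with sv48 (VA_BITS=48, PGDIR_SHIFT=39) the masking is what forces the alignment the comment block describes.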


--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-29  3:42     ` Guo Ren
@ 2022-01-04 12:42       ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-01-04 12:42 UTC (permalink / raw)
  To: Guo Ren
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

Hi Guo,

On Wed, Dec 29, 2021 at 4:42 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Dec 7, 2021 at 11:54 AM Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
> >
> > By adding a new 4th page table level, give the 64-bit kernel the ability
> > to address 2^48 bytes of virtual address space: in practice, that offers
> > 128TB of virtual address space to userspace and allows addressing up to
> > 64TB of physical memory.
> >
> > If the underlying hardware does not support sv48, we automatically fall
> > back to a standard 3-level page table by folding the new PUD level into
> > the PGDIR level. To detect the HW capabilities at runtime, we rely on
> > SATP ignoring writes with an unsupported mode.
> >
> > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> > ---
> >  arch/riscv/Kconfig                      |   4 +-
> >  arch/riscv/include/asm/csr.h            |   3 +-
> >  arch/riscv/include/asm/fixmap.h         |   1 +
> >  arch/riscv/include/asm/kasan.h          |   6 +-
> >  arch/riscv/include/asm/page.h           |  14 ++
> >  arch/riscv/include/asm/pgalloc.h        |  40 +++++
> >  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
> >  arch/riscv/include/asm/pgtable.h        |  24 ++-
> >  arch/riscv/kernel/head.S                |   3 +-
> >  arch/riscv/mm/context.c                 |   4 +-
> >  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
> >  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
> >  drivers/firmware/efi/libstub/efi-stub.c |   2 +
> >  13 files changed, 514 insertions(+), 44 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index ac6c0cd9bc29..d28fe0148e13 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -150,7 +150,7 @@ config PAGE_OFFSET
> >         hex
> >         default 0xC0000000 if 32BIT
> >         default 0x80000000 if 64BIT && !MMU
> > -       default 0xffffffd800000000 if 64BIT
> > +       default 0xffffaf8000000000 if 64BIT
> >
> >  config KASAN_SHADOW_OFFSET
> >         hex
> > @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
> >
> >  config PGTABLE_LEVELS
> >         int
> > -       default 3 if 64BIT
> > +       default 4 if 64BIT
> >         default 2
> >
> >  config LOCKDEP_SUPPORT
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index 87ac65696871..3fdb971c7896 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -40,14 +40,13 @@
> >  #ifndef CONFIG_64BIT
> >  #define SATP_PPN       _AC(0x003FFFFF, UL)
> >  #define SATP_MODE_32   _AC(0x80000000, UL)
> > -#define SATP_MODE      SATP_MODE_32
> >  #define SATP_ASID_BITS 9
> >  #define SATP_ASID_SHIFT        22
> >  #define SATP_ASID_MASK _AC(0x1FF, UL)
> >  #else
> >  #define SATP_PPN       _AC(0x00000FFFFFFFFFFF, UL)
> >  #define SATP_MODE_39   _AC(0x8000000000000000, UL)
> > -#define SATP_MODE      SATP_MODE_39
> > +#define SATP_MODE_48   _AC(0x9000000000000000, UL)
> >  #define SATP_ASID_BITS 16
> >  #define SATP_ASID_SHIFT        44
> >  #define SATP_ASID_MASK _AC(0xFFFF, UL)
> > diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> > index 54cbf07fb4e9..58a718573ad6 100644
> > --- a/arch/riscv/include/asm/fixmap.h
> > +++ b/arch/riscv/include/asm/fixmap.h
> > @@ -24,6 +24,7 @@ enum fixed_addresses {
> >         FIX_HOLE,
> >         FIX_PTE,
> >         FIX_PMD,
> > +       FIX_PUD,
> >         FIX_TEXT_POKE1,
> >         FIX_TEXT_POKE0,
> >         FIX_EARLYCON_MEM_BASE,
> > diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> > index 743e6ff57996..0b85e363e778 100644
> > --- a/arch/riscv/include/asm/kasan.h
> > +++ b/arch/riscv/include/asm/kasan.h
> > @@ -28,7 +28,11 @@
> >  #define KASAN_SHADOW_SCALE_SHIFT       3
> >
> >  #define KASAN_SHADOW_SIZE      (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> > -#define KASAN_SHADOW_START     (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> > +/*
> > + * Depending on the size of the virtual address space, the region may not be
> > + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> > + */
> > +#define KASAN_SHADOW_START     ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
> >  #define KASAN_SHADOW_END       MODULES_LOWEST_VADDR
> >  #define KASAN_SHADOW_OFFSET    _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
> >
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index e03559f9b35e..d089fe46f7d8 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -31,7 +31,20 @@
> >   * When not using MMU this corresponds to the first free page in
> >   * physical memory (aligned on a page boundary).
> >   */
> > +#ifdef CONFIG_64BIT
> > +#ifdef CONFIG_MMU
> > +#define PAGE_OFFSET            kernel_map.page_offset
> > +#else
> > +#define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> > +#endif
> > +/*
> > + * By default, the CONFIG_PAGE_OFFSET value corresponds to the SV48 address
> > + * space, so define the PAGE_OFFSET value for SV39 here.
> > + */
> > +#define PAGE_OFFSET_L3         _AC(0xffffffd800000000, UL)
> > +#else
> >  #define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> > +#endif /* CONFIG_64BIT */
> >
> >  /*
> >   * Half of the kernel address space (half of the entries of the page global
> > @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
> >  #endif /* CONFIG_MMU */
> >
> >  struct kernel_mapping {
> > +       unsigned long page_offset;
> >         unsigned long virt_addr;
> >         uintptr_t phys_addr;
> >         uintptr_t size;
> > diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> > index 0af6933a7100..11823004b87a 100644
> > --- a/arch/riscv/include/asm/pgalloc.h
> > +++ b/arch/riscv/include/asm/pgalloc.h
> > @@ -11,6 +11,8 @@
> >  #include <asm/tlb.h>
> >
> >  #ifdef CONFIG_MMU
> > +#define __HAVE_ARCH_PUD_ALLOC_ONE
> > +#define __HAVE_ARCH_PUD_FREE
> >  #include <asm-generic/pgalloc.h>
> >
> >  static inline void pmd_populate_kernel(struct mm_struct *mm,
> > @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> >
> >         set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> >  }
> > +
> > +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> > +{
> > +       if (pgtable_l4_enabled) {
> > +               unsigned long pfn = virt_to_pfn(pud);
> > +
> > +               set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> > +       }
> > +}
> > +
> > +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> > +                                    pud_t *pud)
> > +{
> > +       if (pgtable_l4_enabled) {
> > +               unsigned long pfn = virt_to_pfn(pud);
> > +
> > +               set_p4d_safe(p4d,
> > +                            __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> > +       }
> > +}
> > +
> > +#define pud_alloc_one pud_alloc_one
> > +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return __pud_alloc_one(mm, addr);
> > +
> > +       return NULL;
> > +}
> > +
> > +#define pud_free pud_free
> > +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               __pud_free(mm, pud);
> > +}
> > +
> > +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
> >  #endif /* __PAGETABLE_PMD_FOLDED */
> >
> >  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> > index 228261aa9628..bbbdd66e5e2f 100644
> > --- a/arch/riscv/include/asm/pgtable-64.h
> > +++ b/arch/riscv/include/asm/pgtable-64.h
> > @@ -8,16 +8,36 @@
> >
> >  #include <linux/const.h>
> >
> > -#define PGDIR_SHIFT     30
> > +extern bool pgtable_l4_enabled;
> > +
> > +#define PGDIR_SHIFT_L3  30
> > +#define PGDIR_SHIFT_L4  39
> > +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> > +
> > +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
> >  /* Size of region mapped by a page global directory */
> >  #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
> >  #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
> >
> > +/* pud is folded into pgd in case of 3-level page table */
> > +#define PUD_SHIFT      30
> > +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> > +#define PUD_MASK       (~(PUD_SIZE - 1))
> > +
> >  #define PMD_SHIFT       21
> >  /* Size of region mapped by a page middle directory */
> >  #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
> >  #define PMD_MASK        (~(PMD_SIZE - 1))
> >
> > +/* Page Upper Directory entry */
> > +typedef struct {
> > +       unsigned long pud;
> > +} pud_t;
> > +
> > +#define pud_val(x)      ((x).pud)
> > +#define __pud(x)        ((pud_t) { (x) })
> > +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> > +
> >  /* Page Middle Directory entry */
> >  typedef struct {
> >         unsigned long pmd;
> > @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
> >         set_pud(pudp, __pud(0));
> >  }
> >
> > +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> > +{
> > +       return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> > +}
> > +
> > +static inline unsigned long _pud_pfn(pud_t pud)
> > +{
> > +       return pud_val(pud) >> _PAGE_PFN_SHIFT;
> > +}
> > +
> >  static inline pmd_t *pud_pgtable(pud_t pud)
> >  {
> >         return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> > @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
> >         return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
> >  }
> >
> > +#define mm_pud_folded  mm_pud_folded
> > +static inline bool mm_pud_folded(struct mm_struct *mm)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> > +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> > +
> >  static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
> >  {
> >         return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> > @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
> >  #define pmd_ERROR(e) \
> >         pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
> >
> > +#define pud_ERROR(e)   \
> > +       pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> > +
> > +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               *p4dp = p4d;
> > +       else
> > +               set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> > +}
> > +
> > +static inline int p4d_none(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return (p4d_val(p4d) == 0);
> > +
> > +       return 0;
> > +}
> > +
> > +static inline int p4d_present(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return (p4d_val(p4d) & _PAGE_PRESENT);
> > +
> > +       return 1;
> > +}
> > +
> > +static inline int p4d_bad(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return !p4d_present(p4d);
> > +
> > +       return 0;
> > +}
> > +
> > +static inline void p4d_clear(p4d_t *p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               set_p4d(p4d, __p4d(0));
> > +}
> > +
> > +static inline pud_t *p4d_pgtable(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> > +
> > +       return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> > +}
> > +
> > +static inline struct page *p4d_page(p4d_t p4d)
> > +{
> > +       return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> > +}
> > +
> > +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> > +
> > +#define pud_offset pud_offset
> > +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return p4d_pgtable(*p4d) + pud_index(address);
> > +
> > +       return (pud_t *)p4d;
> > +}
> > +
> >  #endif /* _ASM_RISCV_PGTABLE_64_H */
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index e1a52e22ad7e..e1c74ef4ead2 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -51,7 +51,7 @@
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> >  #ifdef CONFIG_64BIT
> > -#define VA_BITS                39
> > +#define VA_BITS                (pgtable_l4_enabled ? 48 : 39)
> >  #else
> >  #define VA_BITS                32
> >  #endif
> > @@ -90,8 +90,7 @@
> >
> >  #ifndef __ASSEMBLY__
> >
> > -/* Page Upper Directory not used in RISC-V */
> > -#include <asm-generic/pgtable-nopud.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >  #include <asm/page.h>
> >  #include <asm/tlbflush.h>
> >  #include <linux/mm_types.h>
> > @@ -113,6 +112,17 @@
> >  #define XIP_FIXUP(addr)                (addr)
> >  #endif /* CONFIG_XIP_KERNEL */
> >
> > +struct pt_alloc_ops {
> > +       pte_t *(*get_pte_virt)(phys_addr_t pa);
> > +       phys_addr_t (*alloc_pte)(uintptr_t va);
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> > +       phys_addr_t (*alloc_pmd)(uintptr_t va);
> > +       pud_t *(*get_pud_virt)(phys_addr_t pa);
> > +       phys_addr_t (*alloc_pud)(uintptr_t va);
> > +#endif
> > +};
> > +
> >  #ifdef CONFIG_MMU
> >  /* Number of entries in the page global directory */
> >  #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> > @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> >   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
> >   */
> >  #ifdef CONFIG_64BIT
> > -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
> >  #else
> > -#define TASK_SIZE FIXADDR_START
> > +#define TASK_SIZE      FIXADDR_START
> > +#define TASK_SIZE_MIN  TASK_SIZE
> This is used by efi-stub.c; the rv64 compat patch also needs it, where we
> reuse the DEFAULT_MAP_WINDOW_64 macro.
>
> TASK_SIZE_MIN is also okay with me. I think it should be a separate
> patch together with the efi-stub modification.

IMO, TASK_SIZE_MIN is more explicit than DEFAULT_MAP_WINDOW_64. I'll
split this change out into a separate patch in the next series.
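For reference, the TASK_SIZE / TASK_SIZE_MIN values being discussed follow directly from the runtime PGDIR_SHIFT selection in the patch (`TASK_SIZE = PGDIR_SIZE * PTRS_PER_PGD / 2`). A minimal userspace sketch, with the constants hardcoded for illustration rather than pulled from the real headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PTRS_PER_PGD = PAGE_SIZE / sizeof(pgd_t) = 4096 / 8 = 512 */
#define PTRS_PER_PGD 512ULL

/* PGDIR_SHIFT is 39 with four levels (sv48), 30 when the PUD is folded
 * (sv39), mirroring the pgtable_l4_enabled dispatch in the patch. */
uint64_t pgdir_size(bool pgtable_l4_enabled)
{
	unsigned shift = pgtable_l4_enabled ? 39 : 30;

	return 1ULL << shift;
}

/* Userspace gets the lower half of the address space. */
uint64_t task_size(bool pgtable_l4_enabled)
{
	return pgdir_size(pgtable_l4_enabled) * PTRS_PER_PGD / 2;
}
```

This gives 2^47 bytes (the 128TB mentioned in the commit message) for sv48 and 2^38 bytes for sv39; TASK_SIZE_MIN is simply the sv39 value, which is why it is the safe limit for the EFI stub before the paging mode is known.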

> https://lore.kernel.org/linux-riscv/20211228143958.3409187-9-guoren@kernel.org/
>
> I've merged your patchset with the compat tree and we are testing them
> together thoroughly and carefully.
> https://github.com/c-sky/csky-linux/tree/riscv_compat_v2_sv48_v3
>
> So far, booting of rv32_rootfs & 64_rootfs has passed. I will give you
> my Tested-by later, once everything has been fully tested. Your patch
> set is very helpful, thanks.

Thanks a lot, that will help move forward ;)

>
> ps: Could you give customers a way to choose sv48 or sv39 in the dts?
>

This is already implemented in patch 13.

Thanks!

Alex
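The sv48 probe described in the quoted commit message (write SATP with the sv48 mode, read it back, and fall back to sv39 if the write was ignored) can be modeled in a few lines. The `struct hart` below is a hypothetical stand-in for the WARL behavior of the CSR, not kernel code; only the SATP_MODE_* encodings come from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SATP_MODE_39   0x8000000000000000ULL
#define SATP_MODE_48   0x9000000000000000ULL
#define SATP_MODE_MASK 0xf000000000000000ULL

/* Toy model of a hart: hardware without sv48 ignores the whole write
 * when the mode field is unsupported. */
struct hart {
	uint64_t satp;
	bool has_sv48;
};

void satp_write(struct hart *h, uint64_t val)
{
	if ((val & SATP_MODE_MASK) == SATP_MODE_48 && !h->has_sv48)
		return;	/* unsupported mode: write ignored */
	h->satp = val;
}

/* Mirrors the shape of set_satp_mode(): try sv48, read back, and take
 * the disable_pgtable_l4() fallback if the mode did not stick. */
uint64_t probe_satp_mode(struct hart *h, uint64_t root_ppn)
{
	satp_write(h, root_ppn | SATP_MODE_48);
	if ((h->satp & SATP_MODE_MASK) == SATP_MODE_48)
		return SATP_MODE_48;
	return SATP_MODE_39;
}
```

The real function additionally sets up a temporary 1:1 mapping before the write, since instruction fetch must keep working once translation is enabled; that part is omitted here.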

>
> >  #endif
> >
> >  #else /* CONFIG_MMU */
> > @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
> >  #define dtb_early_va   _dtb_early_va
> >  #define dtb_early_pa   _dtb_early_pa
> >  #endif /* CONFIG_XIP_KERNEL */
> > +extern u64 satp_mode;
> > +extern bool pgtable_l4_enabled;
> >
> >  void paging_init(void);
> >  void misc_mem_init(void);
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index 52c5ff9804c5..c3c0ed559770 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -95,7 +95,8 @@ relocate:
> >
> >         /* Compute satp for kernel page tables, but don't load it yet */
> >         srl a2, a0, PAGE_SHIFT
> > -       li a1, SATP_MODE
> > +       la a1, satp_mode
> > +       REG_L a1, 0(a1)
> >         or a2, a2, a1
> >
> >         /*
> > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> > index ee3459cb6750..a7246872bd30 100644
> > --- a/arch/riscv/mm/context.c
> > +++ b/arch/riscv/mm/context.c
> > @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
> >  switch_mm_fast:
> >         csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
> >                   ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> > -                 SATP_MODE);
> > +                 satp_mode);
> >
> >         if (need_flush_tlb)
> >                 local_flush_tlb_all();
> > @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
> >  static void set_mm_noasid(struct mm_struct *mm)
> >  {
> >         /* Switch the page table and blindly nuke entire local TLB */
> > -       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> > +       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
> >         local_flush_tlb_all();
> >  }
> >
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 1552226fb6bd..6a19a1b1caf8 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
> >  #define kernel_map     (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
> >  #endif
> >
> > +#ifdef CONFIG_64BIT
> > +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> > +#else
> > +u64 satp_mode = SATP_MODE_32;
> > +#endif
> > +EXPORT_SYMBOL(satp_mode);
> > +
> > +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> > +                               true : false;
> > +EXPORT_SYMBOL(pgtable_l4_enabled);
> > +
> >  phys_addr_t phys_ram_base __ro_after_init;
> >  EXPORT_SYMBOL(phys_ram_base);
> >
> > @@ -53,15 +64,6 @@ extern char _start[];
> >  void *_dtb_early_va __initdata;
> >  uintptr_t _dtb_early_pa __initdata;
> >
> > -struct pt_alloc_ops {
> > -       pte_t *(*get_pte_virt)(phys_addr_t pa);
> > -       phys_addr_t (*alloc_pte)(uintptr_t va);
> > -#ifndef __PAGETABLE_PMD_FOLDED
> > -       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> > -       phys_addr_t (*alloc_pmd)(uintptr_t va);
> > -#endif
> > -};
> > -
> >  static phys_addr_t dma32_phys_limit __initdata;
> >
> >  static void __init zone_sizes_init(void)
> > @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
> >  }
> >
> >  #ifdef CONFIG_MMU
> > -static struct pt_alloc_ops _pt_ops __initdata;
> > +struct pt_alloc_ops _pt_ops __initdata;
> >
> >  #ifdef CONFIG_XIP_KERNEL
> >  #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
> > @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> >  static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
> >
> >  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> > +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> >  static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
> >
> >  #ifdef CONFIG_XIP_KERNEL
> > @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
> >  #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
> >  #endif /* CONFIG_XIP_KERNEL */
> >
> > +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
> > +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
> > +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> > +
> > +#ifdef CONFIG_XIP_KERNEL
> > +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
> > +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
> > +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
> > +#endif /* CONFIG_XIP_KERNEL */
> > +
> >  static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
> >  {
> >         /* Before MMU is enabled */
> > @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
> >
> >  static phys_addr_t __init alloc_pmd_early(uintptr_t va)
> >  {
> > -       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> > +       BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
> >
> >         return (uintptr_t)early_pmd;
> >  }
> > @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
> >         create_pte_mapping(ptep, va, pa, sz, prot);
> >  }
> >
> > -#define pgd_next_t             pmd_t
> > -#define alloc_pgd_next(__va)   pt_ops.alloc_pmd(__va)
> > -#define get_pgd_next_virt(__pa)        pt_ops.get_pmd_virt(__pa)
> > +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
> > +{
> > +       return (pud_t *)((uintptr_t)pa);
> > +}
> > +
> > +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> > +{
> > +       clear_fixmap(FIX_PUD);
> > +       return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> > +}
> > +
> > +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> > +{
> > +       return (pud_t *)__va(pa);
> > +}
> > +
> > +static phys_addr_t __init alloc_pud_early(uintptr_t va)
> > +{
> > +       /* Only one PUD is available for early mapping */
> > +       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> > +
> > +       return (uintptr_t)early_pud;
> > +}
> > +
> > +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> > +{
> > +       return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> > +}
> > +
> > +static phys_addr_t alloc_pud_late(uintptr_t va)
> > +{
> > +       unsigned long vaddr;
> > +
> > +       vaddr = __get_free_page(GFP_KERNEL);
> > +       BUG_ON(!vaddr);
> > +       return __pa(vaddr);
> > +}
> > +
> > +static void __init create_pud_mapping(pud_t *pudp,
> > +                                     uintptr_t va, phys_addr_t pa,
> > +                                     phys_addr_t sz, pgprot_t prot)
> > +{
> > +       pmd_t *nextp;
> > +       phys_addr_t next_phys;
> > +       uintptr_t pud_index = pud_index(va);
> > +
> > +       if (sz == PUD_SIZE) {
> > +               if (pud_val(pudp[pud_index]) == 0)
> > +                       pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
> > +               return;
> > +       }
> > +
> > +       if (pud_val(pudp[pud_index]) == 0) {
> > +               next_phys = pt_ops.alloc_pmd(va);
> > +               pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
> > +               nextp = pt_ops.get_pmd_virt(next_phys);
> > +               memset(nextp, 0, PAGE_SIZE);
> > +       } else {
> > +               next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
> > +               nextp = pt_ops.get_pmd_virt(next_phys);
> > +       }
> > +
> > +       create_pmd_mapping(nextp, va, pa, sz, prot);
> > +}
> > +
> > +#define pgd_next_t             pud_t
> > +#define alloc_pgd_next(__va)   (pgtable_l4_enabled ?                   \
> > +               pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
> > +#define get_pgd_next_virt(__pa)        (pgtable_l4_enabled ?                   \
> > +               pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
> >  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
> > -       create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
> > -#define fixmap_pgd_next                fixmap_pmd
> > +                               (pgtable_l4_enabled ?                   \
> > +               create_pud_mapping(__nextp, __va, __pa, __sz, __prot) : \
> > +               create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
> > +#define fixmap_pgd_next                (pgtable_l4_enabled ?                   \
> > +               (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
> > +#define trampoline_pgd_next    (pgtable_l4_enabled ?                   \
> > +               (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
> > +#define early_dtb_pgd_next     (pgtable_l4_enabled ?                   \
> > +               (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
> >  #else
> >  #define pgd_next_t             pte_t
> >  #define alloc_pgd_next(__va)   pt_ops.alloc_pte(__va)
> >  #define get_pgd_next_virt(__pa)        pt_ops.get_pte_virt(__pa)
> >  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
> >         create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
> > -#define fixmap_pgd_next                fixmap_pte
> > +#define fixmap_pgd_next                ((uintptr_t)fixmap_pte)
> > +#define early_dtb_pgd_next     ((uintptr_t)early_dtb_pmd)
> > +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
> >  #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
> > -#endif
> > +#endif /* __PAGETABLE_PMD_FOLDED */
> >
> >  void __init create_pgd_mapping(pgd_t *pgdp,
> >                                       uintptr_t va, phys_addr_t pa,
> > @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
> >  }
> >  #endif /* CONFIG_STRICT_KERNEL_RWX */
> >
> > +#ifdef CONFIG_64BIT
> > +static void __init disable_pgtable_l4(void)
> > +{
> > +       pgtable_l4_enabled = false;
> > +       kernel_map.page_offset = PAGE_OFFSET_L3;
> > +       satp_mode = SATP_MODE_39;
> > +}
> > +
> > +/*
> > + * There is a simple way to determine if 4-level is supported by the
> > + * underlying hardware: establish 1:1 mapping in 4-level page table mode
> > + * then read SATP to see if the configuration was taken into account
> > + * meaning sv48 is supported.
> > + */
> > +static __init void set_satp_mode(void)
> > +{
> > +       u64 identity_satp, hw_satp;
> > +       uintptr_t set_satp_mode_pmd;
> > +
> > +       set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> > +       create_pgd_mapping(early_pg_dir,
> > +                          set_satp_mode_pmd, (uintptr_t)early_pud,
> > +                          PGDIR_SIZE, PAGE_TABLE);
> > +       create_pud_mapping(early_pud,
> > +                          set_satp_mode_pmd, (uintptr_t)early_pmd,
> > +                          PUD_SIZE, PAGE_TABLE);
> > +       /* Handle the case where set_satp_mode straddles 2 PMDs */
> > +       create_pmd_mapping(early_pmd,
> > +                          set_satp_mode_pmd, set_satp_mode_pmd,
> > +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> > +       create_pmd_mapping(early_pmd,
> > +                          set_satp_mode_pmd + PMD_SIZE,
> > +                          set_satp_mode_pmd + PMD_SIZE,
> > +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> > +
> > +       identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> > +
> > +       local_flush_tlb_all();
> > +       csr_write(CSR_SATP, identity_satp);
> > +       hw_satp = csr_swap(CSR_SATP, 0ULL);
> > +       local_flush_tlb_all();
> > +
> > +       if (hw_satp != identity_satp)
> > +               disable_pgtable_l4();
> > +
> > +       memset(early_pg_dir, 0, PAGE_SIZE);
> > +       memset(early_pud, 0, PAGE_SIZE);
> > +       memset(early_pmd, 0, PAGE_SIZE);
> > +}
> > +#endif
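(For context: the probe in set_satp_mode() works because the RISC-V privileged spec requires that a write to satp with an unsupported MODE leaves the register unchanged, so reading it back tells you whether sv48 took effect. A rough sketch of the MODE encoding this relies on — the constants mirror SATP_MODE_39/SATP_MODE_48 from this patch, the helper name is ours:)

```python
# SATP layout on RV64 (RISC-V privileged spec): MODE in bits 63:60,
# ASID in bits 59:44, PPN of the root page table in bits 43:0.
SATP_MODE_SHIFT = 60

def satp_value(mode, root_table_pa, page_shift=12):
    # mode field | physical page number of the root page table
    return (mode << SATP_MODE_SHIFT) | (root_table_pa >> page_shift)

SATP_MODE_39 = satp_value(8, 0)   # sv39: MODE = 8
SATP_MODE_48 = satp_value(9, 0)   # sv48: MODE = 9

assert SATP_MODE_39 == 0x8000000000000000
assert SATP_MODE_48 == 0x9000000000000000

# set_satp_mode() writes (root PPN | SATP_MODE_48) and reads satp back:
# if only sv39 is implemented, the write is ignored and the kernel calls
# disable_pgtable_l4() to fold the PUD level away.
```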
> > +
> >  /*
> >   * setup_vm() is called from head.S with MMU-off.
> >   *
> > @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
> >         uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
> >
> >         create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> > -                          IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
> > +                          IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
> >                            PGDIR_SIZE,
> >                            IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
> >
> > +       if (pgtable_l4_enabled) {
> > +               create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> > +                                  (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> > +       }
> > +
> >         if (IS_ENABLED(CONFIG_64BIT)) {
> >                 create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
> >                                    pa, PMD_SIZE, PAGE_KERNEL);
> > @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
> >  #ifndef __PAGETABLE_PMD_FOLDED
> >         pt_ops.alloc_pmd = alloc_pmd_early;
> >         pt_ops.get_pmd_virt = get_pmd_virt_early;
> > +       pt_ops.alloc_pud = alloc_pud_early;
> > +       pt_ops.get_pud_virt = get_pud_virt_early;
> >  #endif
> >  }
> >
> > @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
> >  #ifndef __PAGETABLE_PMD_FOLDED
> >         pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
> >         pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
> > +       pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
> > +       pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
> >  #endif
> >  }
> >
> > @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
> >  #ifndef __PAGETABLE_PMD_FOLDED
> >         pt_ops.alloc_pmd = alloc_pmd_late;
> >         pt_ops.get_pmd_virt = get_pmd_virt_late;
> > +       pt_ops.alloc_pud = alloc_pud_late;
> > +       pt_ops.get_pud_virt = get_pud_virt_late;
> >  #endif
> >  }
> >
> > @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >         pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
> >
> >         kernel_map.virt_addr = KERNEL_LINK_ADDR;
> > +       kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
> >
> >  #ifdef CONFIG_XIP_KERNEL
> >         kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> > @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >         kernel_map.phys_addr = (uintptr_t)(&_start);
> >         kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
> >  #endif
> > +
> > +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
> > +       set_satp_mode();
> > +#endif
> > +
> >         kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> >         kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
> >
> > @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >
> >         /* Setup early PGD for fixmap */
> >         create_pgd_mapping(early_pg_dir, FIXADDR_START,
> > -                          (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> > +                          fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> >
> >  #ifndef __PAGETABLE_PMD_FOLDED
> > -       /* Setup fixmap PMD */
> > +       /* Setup fixmap PUD and PMD */
> > +       if (pgtable_l4_enabled)
> > +               create_pud_mapping(fixmap_pud, FIXADDR_START,
> > +                                  (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
> >         create_pmd_mapping(fixmap_pmd, FIXADDR_START,
> >                            (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
> >         /* Setup trampoline PGD and PMD */
> >         create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
> > -                          (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> > +                          trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> > +       if (pgtable_l4_enabled)
> > +               create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
> > +                                  (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
> >  #ifdef CONFIG_XIP_KERNEL
> >         create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
> >                            kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
> > @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >          * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
> >          * range can not span multiple pmds.
> >          */
> > -       BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> > +       BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> >                      != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
> >
> >  #ifndef __PAGETABLE_PMD_FOLDED
> > @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
> >         /* Clear fixmap PTE and PMD mappings */
> >         clear_fixmap(FIX_PTE);
> >         clear_fixmap(FIX_PMD);
> > +       clear_fixmap(FIX_PUD);
> >
> >         /* Move to swapper page table */
> > -       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
> > +       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
> >         local_flush_tlb_all();
> >
> >         pt_ops_set_late();
> > diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> > index 1434a0225140..993f50571a3b 100644
> > --- a/arch/riscv/mm/kasan_init.c
> > +++ b/arch/riscv/mm/kasan_init.c
> > @@ -11,7 +11,29 @@
> >  #include <asm/fixmap.h>
> >  #include <asm/pgalloc.h>
> >
> > +/*
> > + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
> > + * which is right before the kernel.
> > + *
> > + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
> > + * the page global directory with kasan_early_shadow_pmd.
> > + *
> > + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
> > + * must be divided as follows:
> > + * - the first PGD entry, although incomplete, is populated with
> > + *   kasan_early_shadow_pud/p4d
> > + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
> > + * - the last PGD entry is shared with the kernel mapping so populated at the
> > + *   lower levels pud/p4d
> > + *
> > + * In addition, when shallow populating a kasan region (for example vmalloc),
> > + * this region may also not be aligned on PGDIR size, so we must go down to the
> > + * pud level too.
> > + */
> > +
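(The shadow sizes involved here can be sanity-checked quickly: KASAN maps 8 bytes of address space per shadow byte, and the shadow must cover the kernel half of the virtual address space. A sketch of the KASAN_SHADOW_SIZE formula from this patch — the helper name is ours:)

```python
KASAN_SHADOW_SCALE_SHIFT = 3   # one shadow byte covers 2^3 = 8 bytes

def kasan_shadow_size(va_bits):
    # Half the address space (2^(va_bits-1) bytes) belongs to the kernel
    # side; the shadow covers it at 1/8 scale.
    return 1 << ((va_bits - 1) - KASAN_SHADOW_SCALE_SHIFT)

assert kasan_shadow_size(39) == 32 << 30   # sv39: 32 GiB of shadow
assert kasan_shadow_size(48) == 16 << 40   # sv48: 16 TiB of shadow
```

Because the sv48/sv57 start address (KASAN_SHADOW_END - KASAN_SHADOW_SIZE) is not naturally PGDIR_SIZE-aligned, the patch masks KASAN_SHADOW_START with PGDIR_MASK, which is what forces the split PGD population described in the comment above.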
> >  extern pgd_t early_pg_dir[PTRS_PER_PGD];
> > +extern struct pt_alloc_ops _pt_ops __initdata;
> > +#define pt_ops _pt_ops
> >
> >  static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
> >  {
> > @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
> >         set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
> >  }
> >
> > -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> > +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
> >  {
> >         phys_addr_t phys_addr;
> >         pmd_t *pmdp, *base_pmd;
> >         unsigned long next;
> >
> > -       base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
> > -       if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> > +       if (pud_none(*pud)) {
> >                 base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> > +       } else {
> > +               base_pmd = (pmd_t *)pud_pgtable(*pud);
> > +               if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> > +                       base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> > +       }
> >
> >         pmdp = base_pmd + pmd_index(vaddr);
> >
> > @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
> >          * it entirely, memblock could allocate a page at a physical address
> >          * where KASAN is not populated yet and then we'd get a page fault.
> >          */
> > -       set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> > +       set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> > +}
> > +
> > +static void __init kasan_populate_pud(pgd_t *pgd,
> > +                                     unsigned long vaddr, unsigned long end,
> > +                                     bool early)
> > +{
> > +       phys_addr_t phys_addr;
> > +       pud_t *pudp, *base_pud;
> > +       unsigned long next;
> > +
> > +       if (early) {
> > +               /*
> > +                * We can't use pgd_page_vaddr here as it would return a linear
> > +                * mapping address but it is not mapped yet, but when populating
> > +                * early_pg_dir, we need the physical address and when populating
> > +                * swapper_pg_dir, we need the kernel virtual address so use
> > +                * pt_ops facility.
> > +                */
> > +               base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
> > +       } else {
> > +               base_pud = (pud_t *)pgd_page_vaddr(*pgd);
> > +               if (base_pud == lm_alias(kasan_early_shadow_pud))
> > +                       base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
> > +       }
> > +
> > +       pudp = base_pud + pud_index(vaddr);
> > +
> > +       do {
> > +               next = pud_addr_end(vaddr, end);
> > +
> > +               if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
> > +                       if (early) {
> > +                               phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
> > +                               set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
> > +                               continue;
> > +                       } else {
> > +                               phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
> > +                               if (phys_addr) {
> > +                                       set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
> > +                                       continue;
> > +                               }
> > +                       }
> > +               }
> > +
> > +               kasan_populate_pmd(pudp, vaddr, next);
> > +       } while (pudp++, vaddr = next, vaddr != end);
> > +
> > +       /*
> > +        * Wait for the whole PGD to be populated before setting the PGD in
> > +        * the page table, otherwise, if we did set the PGD before populating
> > +        * it entirely, memblock could allocate a page at a physical address
> > +        * where KASAN is not populated yet and then we'd get a page fault.
> > +        */
> > +       if (!early)
> > +               set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
> >  }
> >
> > +#define kasan_early_shadow_pgd_next                    (pgtable_l4_enabled ?   \
> > +                               (uintptr_t)kasan_early_shadow_pud :             \
> > +                               (uintptr_t)kasan_early_shadow_pmd)
> > +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)                      \
> > +               (pgtable_l4_enabled ?                                           \
> > +                       kasan_populate_pud(pgdp, vaddr, next, early) :          \
> > +                       kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
> > +
> >  static void __init kasan_populate_pgd(pgd_t *pgdp,
> >                                       unsigned long vaddr, unsigned long end,
> >                                       bool early)
> > @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
> >                         }
> >                 }
> >
> > -               kasan_populate_pmd(pgdp, vaddr, next);
> > +               kasan_populate_pgd_next(pgdp, vaddr, next, early);
> >         } while (pgdp++, vaddr = next, vaddr != end);
> >  }
> >
> > @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
> >         memset(start, KASAN_SHADOW_INIT, end - start);
> >  }
> >
> > +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
> > +                                             unsigned long vaddr, unsigned long end,
> > +                                             bool kasan_populate)
> > +{
> > +       unsigned long next;
> > +       pud_t *pudp, *base_pud;
> > +       pmd_t *base_pmd;
> > +       bool is_kasan_pmd;
> > +
> > +       base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
> > +       pudp = base_pud + pud_index(vaddr);
> > +
> > +       if (kasan_populate)
> > +               memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
> > +                      sizeof(pud_t) * PTRS_PER_PUD);
> > +
> > +       do {
> > +               next = pud_addr_end(vaddr, end);
> > +               is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
> > +
> > +               if (is_kasan_pmd) {
> > +                       base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> > +                       set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> > +               }
> > +       } while (pudp++, vaddr = next, vaddr != end);
> > +}
> > +
> >  static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
> >  {
> >         unsigned long next;
> >         void *p;
> >         pgd_t *pgd_k = pgd_offset_k(vaddr);
> > +       bool is_kasan_pgd_next;
> >
> >         do {
> >                 next = pgd_addr_end(vaddr, end);
> > -               if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
> > +               is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
> > +                                    (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
> > +
> > +               if (is_kasan_pgd_next) {
> >                         p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> >                         set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
> >                 }
> > +
> > +               if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
> > +                       continue;
> > +
> > +               kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
> >         } while (pgd_k++, vaddr = next, vaddr != end);
> >  }
> >
> > diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> > index 26e69788f27a..b3db5d91ed38 100644
> > --- a/drivers/firmware/efi/libstub/efi-stub.c
> > +++ b/drivers/firmware/efi/libstub/efi-stub.c
> > @@ -40,6 +40,8 @@
> >
> >  #ifdef CONFIG_ARM64
> >  # define EFI_RT_VIRTUAL_LIMIT  DEFAULT_MAP_WINDOW_64
> > +#elif defined(CONFIG_RISCV)
> > +# define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE_MIN
> >  #else
> >  # define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE
> >  #endif
> > --
> > 2.32.0
> >
>
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/


* Re: [PATCH v3 07/13] riscv: Implement sv48 support
@ 2022-01-04 12:42       ` Alexandre Ghiti
  0 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-01-04 12:42 UTC (permalink / raw)
  To: Guo Ren
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020,
	Linux Doc Mailing List, linux-riscv, Linux Kernel Mailing List,
	kasan-dev, linux-efi, linux-arch

Hi Guo,

On Wed, Dec 29, 2021 at 4:42 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Dec 7, 2021 at 11:54 AM Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
> >
> > By adding a new 4th level of page table, give 64-bit kernels the ability
> > to address 2^48 bytes of virtual address space: in practice, that offers
> > 128TB of virtual address space to userspace and allows up to 64TB of
> > physical memory.
> >
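(The 128TB figure follows directly from the paging geometry: with 4 KiB pages and 8-byte entries, every table level holds 512 entries, and the lower half of the page global directory is handed to userspace. A quick check of that arithmetic — a sketch mirroring the TASK_SIZE definition in this patch, the function name is ours:)

```python
PAGE_SIZE = 4096
PTRS_PER_PGD = PAGE_SIZE // 8     # 512 eight-byte PGD entries

def task_size(pgdir_shift):
    # TASK_SIZE = PGDIR_SIZE * PTRS_PER_PGD / 2: each PGD entry maps
    # 2^pgdir_shift bytes, and userspace gets the lower half of them.
    return (1 << pgdir_shift) * PTRS_PER_PGD // 2

assert task_size(39) == 128 << 40   # sv48: 128 TiB for userspace
assert task_size(30) == 256 << 30   # sv39: 256 GiB for userspace
```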
> > If the underlying hardware does not support sv48, we automatically fall
> > back to a standard 3-level page table by folding the new PUD level into
> > the PGDIR level. In order to detect HW capabilities at runtime, we rely
> > on the SATP behaviour of ignoring writes with an unsupported mode.
> >
> > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> > ---
> >  arch/riscv/Kconfig                      |   4 +-
> >  arch/riscv/include/asm/csr.h            |   3 +-
> >  arch/riscv/include/asm/fixmap.h         |   1 +
> >  arch/riscv/include/asm/kasan.h          |   6 +-
> >  arch/riscv/include/asm/page.h           |  14 ++
> >  arch/riscv/include/asm/pgalloc.h        |  40 +++++
> >  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
> >  arch/riscv/include/asm/pgtable.h        |  24 ++-
> >  arch/riscv/kernel/head.S                |   3 +-
> >  arch/riscv/mm/context.c                 |   4 +-
> >  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
> >  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
> >  drivers/firmware/efi/libstub/efi-stub.c |   2 +
> >  13 files changed, 514 insertions(+), 44 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index ac6c0cd9bc29..d28fe0148e13 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -150,7 +150,7 @@ config PAGE_OFFSET
> >         hex
> >         default 0xC0000000 if 32BIT
> >         default 0x80000000 if 64BIT && !MMU
> > -       default 0xffffffd800000000 if 64BIT
> > +       default 0xffffaf8000000000 if 64BIT
> >
> >  config KASAN_SHADOW_OFFSET
> >         hex
> > @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
> >
> >  config PGTABLE_LEVELS
> >         int
> > -       default 3 if 64BIT
> > +       default 4 if 64BIT
> >         default 2
> >
> >  config LOCKDEP_SUPPORT
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index 87ac65696871..3fdb971c7896 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -40,14 +40,13 @@
> >  #ifndef CONFIG_64BIT
> >  #define SATP_PPN       _AC(0x003FFFFF, UL)
> >  #define SATP_MODE_32   _AC(0x80000000, UL)
> > -#define SATP_MODE      SATP_MODE_32
> >  #define SATP_ASID_BITS 9
> >  #define SATP_ASID_SHIFT        22
> >  #define SATP_ASID_MASK _AC(0x1FF, UL)
> >  #else
> >  #define SATP_PPN       _AC(0x00000FFFFFFFFFFF, UL)
> >  #define SATP_MODE_39   _AC(0x8000000000000000, UL)
> > -#define SATP_MODE      SATP_MODE_39
> > +#define SATP_MODE_48   _AC(0x9000000000000000, UL)
> >  #define SATP_ASID_BITS 16
> >  #define SATP_ASID_SHIFT        44
> >  #define SATP_ASID_MASK _AC(0xFFFF, UL)
> > diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> > index 54cbf07fb4e9..58a718573ad6 100644
> > --- a/arch/riscv/include/asm/fixmap.h
> > +++ b/arch/riscv/include/asm/fixmap.h
> > @@ -24,6 +24,7 @@ enum fixed_addresses {
> >         FIX_HOLE,
> >         FIX_PTE,
> >         FIX_PMD,
> > +       FIX_PUD,
> >         FIX_TEXT_POKE1,
> >         FIX_TEXT_POKE0,
> >         FIX_EARLYCON_MEM_BASE,
> > diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> > index 743e6ff57996..0b85e363e778 100644
> > --- a/arch/riscv/include/asm/kasan.h
> > +++ b/arch/riscv/include/asm/kasan.h
> > @@ -28,7 +28,11 @@
> >  #define KASAN_SHADOW_SCALE_SHIFT       3
> >
> >  #define KASAN_SHADOW_SIZE      (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> > -#define KASAN_SHADOW_START     (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> > +/*
> > + * Depending on the size of the virtual address space, the region may not be
> > + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> > + */
> > +#define KASAN_SHADOW_START     ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
> >  #define KASAN_SHADOW_END       MODULES_LOWEST_VADDR
> >  #define KASAN_SHADOW_OFFSET    _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
> >
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index e03559f9b35e..d089fe46f7d8 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -31,7 +31,20 @@
> >   * When not using MMU this corresponds to the first free page in
> >   * physical memory (aligned on a page boundary).
> >   */
> > +#ifdef CONFIG_64BIT
> > +#ifdef CONFIG_MMU
> > +#define PAGE_OFFSET            kernel_map.page_offset
> > +#else
> > +#define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> > +#endif
> > +/*
> > + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> > + * define the PAGE_OFFSET value for SV39.
> > + */
> > +#define PAGE_OFFSET_L3         _AC(0xffffffd800000000, UL)
> > +#else
> >  #define PAGE_OFFSET            _AC(CONFIG_PAGE_OFFSET, UL)
> > +#endif /* CONFIG_64BIT */
> >
> >  /*
> >   * Half of the kernel address space (half of the entries of the page global
> > @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
> >  #endif /* CONFIG_MMU */
> >
> >  struct kernel_mapping {
> > +       unsigned long page_offset;
> >         unsigned long virt_addr;
> >         uintptr_t phys_addr;
> >         uintptr_t size;
> > diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> > index 0af6933a7100..11823004b87a 100644
> > --- a/arch/riscv/include/asm/pgalloc.h
> > +++ b/arch/riscv/include/asm/pgalloc.h
> > @@ -11,6 +11,8 @@
> >  #include <asm/tlb.h>
> >
> >  #ifdef CONFIG_MMU
> > +#define __HAVE_ARCH_PUD_ALLOC_ONE
> > +#define __HAVE_ARCH_PUD_FREE
> >  #include <asm-generic/pgalloc.h>
> >
> >  static inline void pmd_populate_kernel(struct mm_struct *mm,
> > @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> >
> >         set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> >  }
> > +
> > +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> > +{
> > +       if (pgtable_l4_enabled) {
> > +               unsigned long pfn = virt_to_pfn(pud);
> > +
> > +               set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> > +       }
> > +}
> > +
> > +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> > +                                    pud_t *pud)
> > +{
> > +       if (pgtable_l4_enabled) {
> > +               unsigned long pfn = virt_to_pfn(pud);
> > +
> > +               set_p4d_safe(p4d,
> > +                            __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> > +       }
> > +}
> > +
> > +#define pud_alloc_one pud_alloc_one
> > +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return __pud_alloc_one(mm, addr);
> > +
> > +       return NULL;
> > +}
> > +
> > +#define pud_free pud_free
> > +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               __pud_free(mm, pud);
> > +}
> > +
> > +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
> >  #endif /* __PAGETABLE_PMD_FOLDED */
> >
> >  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> > index 228261aa9628..bbbdd66e5e2f 100644
> > --- a/arch/riscv/include/asm/pgtable-64.h
> > +++ b/arch/riscv/include/asm/pgtable-64.h
> > @@ -8,16 +8,36 @@
> >
> >  #include <linux/const.h>
> >
> > -#define PGDIR_SHIFT     30
> > +extern bool pgtable_l4_enabled;
> > +
> > +#define PGDIR_SHIFT_L3  30
> > +#define PGDIR_SHIFT_L4  39
> > +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> > +
> > +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
> >  /* Size of region mapped by a page global directory */
> >  #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
> >  #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
> >
> > +/* pud is folded into pgd in case of 3-level page table */
> > +#define PUD_SHIFT      30
> > +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> > +#define PUD_MASK       (~(PUD_SIZE - 1))
> > +
> >  #define PMD_SHIFT       21
> >  /* Size of region mapped by a page middle directory */
> >  #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
> >  #define PMD_MASK        (~(PMD_SIZE - 1))
> >
> > +/* Page Upper Directory entry */
> > +typedef struct {
> > +       unsigned long pud;
> > +} pud_t;
> > +
> > +#define pud_val(x)      ((x).pud)
> > +#define __pud(x)        ((pud_t) { (x) })
> > +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> > +
> >  /* Page Middle Directory entry */
> >  typedef struct {
> >         unsigned long pmd;
> > @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
> >         set_pud(pudp, __pud(0));
> >  }
> >
> > +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> > +{
> > +       return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> > +}
> > +
> > +static inline unsigned long _pud_pfn(pud_t pud)
> > +{
> > +       return pud_val(pud) >> _PAGE_PFN_SHIFT;
> > +}
> > +
> >  static inline pmd_t *pud_pgtable(pud_t pud)
> >  {
> >         return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> > @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
> >         return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
> >  }
> >
> > +#define mm_pud_folded  mm_pud_folded
> > +static inline bool mm_pud_folded(struct mm_struct *mm)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> > +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> > +
> >  static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
> >  {
> >         return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> > @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
> >  #define pmd_ERROR(e) \
> >         pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
> >
> > +#define pud_ERROR(e)   \
> > +       pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> > +
> > +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               *p4dp = p4d;
> > +       else
> > +               set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> > +}
> > +
> > +static inline int p4d_none(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return (p4d_val(p4d) == 0);
> > +
> > +       return 0;
> > +}
> > +
> > +static inline int p4d_present(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return (p4d_val(p4d) & _PAGE_PRESENT);
> > +
> > +       return 1;
> > +}
> > +
> > +static inline int p4d_bad(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return !p4d_present(p4d);
> > +
> > +       return 0;
> > +}
> > +
> > +static inline void p4d_clear(p4d_t *p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               set_p4d(p4d, __p4d(0));
> > +}
> > +
> > +static inline pud_t *p4d_pgtable(p4d_t p4d)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> > +
> > +       return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> > +}
> > +
> > +static inline struct page *p4d_page(p4d_t p4d)
> > +{
> > +       return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> > +}
> > +
> > +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> > +
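(The pud_index/pmd_index macros above extract the per-level table index by shifting the virtual address down to that level's granule and masking to 9 bits, since a 4 KiB table holds 512 eight-byte entries. A minimal sketch with a made-up address:)

```python
PUD_SHIFT, PMD_SHIFT = 30, 21
PTRS_PER = 512   # entries per 4 KiB table at every level

def pud_index(addr):
    return (addr >> PUD_SHIFT) & (PTRS_PER - 1)

def pmd_index(addr):
    return (addr >> PMD_SHIFT) & (PTRS_PER - 1)

# Build an address with known indices: PUD slot 5, PMD slot 7.
addr = (5 << PUD_SHIFT) | (7 << PMD_SHIFT) | 0x123
assert pud_index(addr) == 5
assert pmd_index(addr) == 7
```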
> > +#define pud_offset pud_offset
> > +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> > +{
> > +       if (pgtable_l4_enabled)
> > +               return p4d_pgtable(*p4d) + pud_index(address);
> > +
> > +       return (pud_t *)p4d;
> > +}
> > +
> >  #endif /* _ASM_RISCV_PGTABLE_64_H */
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index e1a52e22ad7e..e1c74ef4ead2 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -51,7 +51,7 @@
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> >  #ifdef CONFIG_64BIT
> > -#define VA_BITS                39
> > +#define VA_BITS                (pgtable_l4_enabled ? 48 : 39)
> >  #else
> >  #define VA_BITS                32
> >  #endif
> > @@ -90,8 +90,7 @@
> >
> >  #ifndef __ASSEMBLY__
> >
> > -/* Page Upper Directory not used in RISC-V */
> > -#include <asm-generic/pgtable-nopud.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >  #include <asm/page.h>
> >  #include <asm/tlbflush.h>
> >  #include <linux/mm_types.h>
> > @@ -113,6 +112,17 @@
> >  #define XIP_FIXUP(addr)                (addr)
> >  #endif /* CONFIG_XIP_KERNEL */
> >
> > +struct pt_alloc_ops {
> > +       pte_t *(*get_pte_virt)(phys_addr_t pa);
> > +       phys_addr_t (*alloc_pte)(uintptr_t va);
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> > +       phys_addr_t (*alloc_pmd)(uintptr_t va);
> > +       pud_t *(*get_pud_virt)(phys_addr_t pa);
> > +       phys_addr_t (*alloc_pud)(uintptr_t va);
> > +#endif
> > +};
> > +
> >  #ifdef CONFIG_MMU
> >  /* Number of entries in the page global directory */
> >  #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> > @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> >   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
> >   */
> >  #ifdef CONFIG_64BIT
> > -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
> >  #else
> > -#define TASK_SIZE FIXADDR_START
> > +#define TASK_SIZE      FIXADDR_START
> > +#define TASK_SIZE_MIN  TASK_SIZE
> This is used by efi-stub.c; the rv64 compat patch also needs it, where we
> reuse the DEFAULT_MAP_WINDOW_64 macro.
>
> TASK_SIZE_MIN is also okay for me, but I think it should be a separate
> patch together with the efi-stub modification.

IMO, TASK_SIZE_MIN is more explicit than DEFAULT_MAP_WINDOW_64. I'll
split this change out into a separate patch in the next series.

> https://lore.kernel.org/linux-riscv/20211228143958.3409187-9-guoren@kernel.org/
>
> I've merged your patchset with the compat tree and we are testing them
> together thoroughly and carefully:
> https://github.com/c-sky/csky-linux/tree/riscv_compat_v2_sv48_v3
>
> Now the rv32_rootfs & 64_rootfs boot tests have passed, but I will
> give you my Tested-by later, once everything is fully tested. Your
> patch set is very helpful, thanks.

Thanks a lot, that will help move forward ;)

>
> ps: Could you give users a way to choose sv48 or sv39 in the dts?
>

This is already implemented in patch 13.
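
For reference, the natural place for that choice is the standard `mmu-type`
property of the RISC-V cpu device tree binding; a minimal sketch of a cpu
node follows (whether patch 13 keys off exactly this property is an
assumption here, not something this thread confirms):

```
/* Hypothetical cpu node: cap the kernel at sv39 even on sv48-capable
 * hardware. "mmu-type" and its values ("riscv,sv39", "riscv,sv48", ...)
 * come from the riscv cpus devicetree binding; how patch 13 consumes it
 * is assumed for illustration. */
cpu0: cpu@0 {
	device_type = "cpu";
	compatible = "riscv";
	riscv,isa = "rv64imafdc";
	mmu-type = "riscv,sv39";	/* instead of "riscv,sv48" */
};
```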

Thanks!

Alex

>
> >  #endif
> >
> >  #else /* CONFIG_MMU */
> > @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
> >  #define dtb_early_va   _dtb_early_va
> >  #define dtb_early_pa   _dtb_early_pa
> >  #endif /* CONFIG_XIP_KERNEL */
> > +extern u64 satp_mode;
> > +extern bool pgtable_l4_enabled;
> >
> >  void paging_init(void);
> >  void misc_mem_init(void);
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index 52c5ff9804c5..c3c0ed559770 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -95,7 +95,8 @@ relocate:
> >
> >         /* Compute satp for kernel page tables, but don't load it yet */
> >         srl a2, a0, PAGE_SHIFT
> > -       li a1, SATP_MODE
> > +       la a1, satp_mode
> > +       REG_L a1, 0(a1)
> >         or a2, a2, a1
> >
> >         /*
> > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> > index ee3459cb6750..a7246872bd30 100644
> > --- a/arch/riscv/mm/context.c
> > +++ b/arch/riscv/mm/context.c
> > @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
> >  switch_mm_fast:
> >         csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
> >                   ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> > -                 SATP_MODE);
> > +                 satp_mode);
> >
> >         if (need_flush_tlb)
> >                 local_flush_tlb_all();
> > @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
> >  static void set_mm_noasid(struct mm_struct *mm)
> >  {
> >         /* Switch the page table and blindly nuke entire local TLB */
> > -       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> > +       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
> >         local_flush_tlb_all();
> >  }
> >
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 1552226fb6bd..6a19a1b1caf8 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
> >  #define kernel_map     (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
> >  #endif
> >
> > +#ifdef CONFIG_64BIT
> > +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> > +#else
> > +u64 satp_mode = SATP_MODE_32;
> > +#endif
> > +EXPORT_SYMBOL(satp_mode);
> > +
> > +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> > +                               true : false;
> > +EXPORT_SYMBOL(pgtable_l4_enabled);
> > +
> >  phys_addr_t phys_ram_base __ro_after_init;
> >  EXPORT_SYMBOL(phys_ram_base);
> >
> > @@ -53,15 +64,6 @@ extern char _start[];
> >  void *_dtb_early_va __initdata;
> >  uintptr_t _dtb_early_pa __initdata;
> >
> > -struct pt_alloc_ops {
> > -       pte_t *(*get_pte_virt)(phys_addr_t pa);
> > -       phys_addr_t (*alloc_pte)(uintptr_t va);
> > -#ifndef __PAGETABLE_PMD_FOLDED
> > -       pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> > -       phys_addr_t (*alloc_pmd)(uintptr_t va);
> > -#endif
> > -};
> > -
> >  static phys_addr_t dma32_phys_limit __initdata;
> >
> >  static void __init zone_sizes_init(void)
> > @@ -222,7 +224,7 @@ static void __init setup_bootmem(void)
> >  }
> >
> >  #ifdef CONFIG_MMU
> > -static struct pt_alloc_ops _pt_ops __initdata;
> > +struct pt_alloc_ops _pt_ops __initdata;
> >
> >  #ifdef CONFIG_XIP_KERNEL
> >  #define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
> > @@ -238,6 +240,7 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> >  static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
> >
> >  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> > +static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> >  static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
> >
> >  #ifdef CONFIG_XIP_KERNEL
> > @@ -326,6 +329,16 @@ static pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
> >  #define early_pmd      ((pmd_t *)XIP_FIXUP(early_pmd))
> >  #endif /* CONFIG_XIP_KERNEL */
> >
> > +static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
> > +static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
> > +static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> > +
> > +#ifdef CONFIG_XIP_KERNEL
> > +#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
> > +#define fixmap_pud     ((pud_t *)XIP_FIXUP(fixmap_pud))
> > +#define early_pud      ((pud_t *)XIP_FIXUP(early_pud))
> > +#endif /* CONFIG_XIP_KERNEL */
> > +
> >  static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
> >  {
> >         /* Before MMU is enabled */
> > @@ -345,7 +358,7 @@ static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
> >
> >  static phys_addr_t __init alloc_pmd_early(uintptr_t va)
> >  {
> > -       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> > +       BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
> >
> >         return (uintptr_t)early_pmd;
> >  }
> > @@ -391,21 +404,97 @@ static void __init create_pmd_mapping(pmd_t *pmdp,
> >         create_pte_mapping(ptep, va, pa, sz, prot);
> >  }
> >
> > -#define pgd_next_t             pmd_t
> > -#define alloc_pgd_next(__va)   pt_ops.alloc_pmd(__va)
> > -#define get_pgd_next_virt(__pa)        pt_ops.get_pmd_virt(__pa)
> > +static pud_t *__init get_pud_virt_early(phys_addr_t pa)
> > +{
> > +       return (pud_t *)((uintptr_t)pa);
> > +}
> > +
> > +static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> > +{
> > +       clear_fixmap(FIX_PUD);
> > +       return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> > +}
> > +
> > +static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> > +{
> > +       return (pud_t *)__va(pa);
> > +}
> > +
> > +static phys_addr_t __init alloc_pud_early(uintptr_t va)
> > +{
> > +       /* Only one PUD is available for early mapping */
> > +       BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
> > +
> > +       return (uintptr_t)early_pud;
> > +}
> > +
> > +static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> > +{
> > +       return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> > +}
> > +
> > +static phys_addr_t alloc_pud_late(uintptr_t va)
> > +{
> > +       unsigned long vaddr;
> > +
> > +       vaddr = __get_free_page(GFP_KERNEL);
> > +       BUG_ON(!vaddr);
> > +       return __pa(vaddr);
> > +}
> > +
> > +static void __init create_pud_mapping(pud_t *pudp,
> > +                                     uintptr_t va, phys_addr_t pa,
> > +                                     phys_addr_t sz, pgprot_t prot)
> > +{
> > +       pmd_t *nextp;
> > +       phys_addr_t next_phys;
> > +       uintptr_t pud_index = pud_index(va);
> > +
> > +       if (sz == PUD_SIZE) {
> > +               if (pud_val(pudp[pud_index]) == 0)
> > +                       pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
> > +               return;
> > +       }
> > +
> > +       if (pud_val(pudp[pud_index]) == 0) {
> > +               next_phys = pt_ops.alloc_pmd(va);
> > +               pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
> > +               nextp = pt_ops.get_pmd_virt(next_phys);
> > +               memset(nextp, 0, PAGE_SIZE);
> > +       } else {
> > +               next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
> > +               nextp = pt_ops.get_pmd_virt(next_phys);
> > +       }
> > +
> > +       create_pmd_mapping(nextp, va, pa, sz, prot);
> > +}
> > +
> > +#define pgd_next_t             pud_t
> > +#define alloc_pgd_next(__va)   (pgtable_l4_enabled ?                   \
> > +               pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
> > +#define get_pgd_next_virt(__pa)        (pgtable_l4_enabled ?                   \
> > +               pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
> >  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
> > -       create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
> > -#define fixmap_pgd_next                fixmap_pmd
> > +                               (pgtable_l4_enabled ?                   \
> > +               create_pud_mapping(__nextp, __va, __pa, __sz, __prot) : \
> > +               create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
> > +#define fixmap_pgd_next                (pgtable_l4_enabled ?                   \
> > +               (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
> > +#define trampoline_pgd_next    (pgtable_l4_enabled ?                   \
> > +               (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
> > +#define early_dtb_pgd_next     (pgtable_l4_enabled ?                   \
> > +               (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
> >  #else
> >  #define pgd_next_t             pte_t
> >  #define alloc_pgd_next(__va)   pt_ops.alloc_pte(__va)
> >  #define get_pgd_next_virt(__pa)        pt_ops.get_pte_virt(__pa)
> >  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)     \
> >         create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
> > -#define fixmap_pgd_next                fixmap_pte
> > +#define fixmap_pgd_next                ((uintptr_t)fixmap_pte)
> > +#define early_dtb_pgd_next     ((uintptr_t)early_dtb_pmd)
> > +#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
> >  #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
> > -#endif
> > +#endif /* __PAGETABLE_PMD_FOLDED */
> >
> >  void __init create_pgd_mapping(pgd_t *pgdp,
> >                                       uintptr_t va, phys_addr_t pa,
> > @@ -493,6 +582,57 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
> >  }
> >  #endif /* CONFIG_STRICT_KERNEL_RWX */
> >
> > +#ifdef CONFIG_64BIT
> > +static void __init disable_pgtable_l4(void)
> > +{
> > +       pgtable_l4_enabled = false;
> > +       kernel_map.page_offset = PAGE_OFFSET_L3;
> > +       satp_mode = SATP_MODE_39;
> > +}
> > +
> > +/*
> > + * There is a simple way to determine if 4-level is supported by the
> > + * underlying hardware: establish 1:1 mapping in 4-level page table mode
> > + * then read SATP to see if the configuration was taken into account
> > + * meaning sv48 is supported.
> > + */
> > +static __init void set_satp_mode(void)
> > +{
> > +       u64 identity_satp, hw_satp;
> > +       uintptr_t set_satp_mode_pmd;
> > +
> > +       set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> > +       create_pgd_mapping(early_pg_dir,
> > +                          set_satp_mode_pmd, (uintptr_t)early_pud,
> > +                          PGDIR_SIZE, PAGE_TABLE);
> > +       create_pud_mapping(early_pud,
> > +                          set_satp_mode_pmd, (uintptr_t)early_pmd,
> > +                          PUD_SIZE, PAGE_TABLE);
> > +       /* Handle the case where set_satp_mode straddles 2 PMDs */
> > +       create_pmd_mapping(early_pmd,
> > +                          set_satp_mode_pmd, set_satp_mode_pmd,
> > +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> > +       create_pmd_mapping(early_pmd,
> > +                          set_satp_mode_pmd + PMD_SIZE,
> > +                          set_satp_mode_pmd + PMD_SIZE,
> > +                          PMD_SIZE, PAGE_KERNEL_EXEC);
> > +
> > +       identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> > +
> > +       local_flush_tlb_all();
> > +       csr_write(CSR_SATP, identity_satp);
> > +       hw_satp = csr_swap(CSR_SATP, 0ULL);
> > +       local_flush_tlb_all();
> > +
> > +       if (hw_satp != identity_satp)
> > +               disable_pgtable_l4();
> > +
> > +       memset(early_pg_dir, 0, PAGE_SIZE);
> > +       memset(early_pud, 0, PAGE_SIZE);
> > +       memset(early_pmd, 0, PAGE_SIZE);
> > +}
> > +#endif
> > +
> >  /*
> >   * setup_vm() is called from head.S with MMU-off.
> >   *
> > @@ -557,10 +697,15 @@ static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
> >         uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
> >
> >         create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> > -                          IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
> > +                          IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
> >                            PGDIR_SIZE,
> >                            IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
> >
> > +       if (pgtable_l4_enabled) {
> > +               create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> > +                                  (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> > +       }
> > +
> >         if (IS_ENABLED(CONFIG_64BIT)) {
> >                 create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
> >                                    pa, PMD_SIZE, PAGE_KERNEL);
> > @@ -593,6 +738,8 @@ void pt_ops_set_early(void)
> >  #ifndef __PAGETABLE_PMD_FOLDED
> >         pt_ops.alloc_pmd = alloc_pmd_early;
> >         pt_ops.get_pmd_virt = get_pmd_virt_early;
> > +       pt_ops.alloc_pud = alloc_pud_early;
> > +       pt_ops.get_pud_virt = get_pud_virt_early;
> >  #endif
> >  }
> >
> > @@ -611,6 +758,8 @@ void pt_ops_set_fixmap(void)
> >  #ifndef __PAGETABLE_PMD_FOLDED
> >         pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
> >         pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
> > +       pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
> > +       pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
> >  #endif
> >  }
> >
> > @@ -625,6 +774,8 @@ void pt_ops_set_late(void)
> >  #ifndef __PAGETABLE_PMD_FOLDED
> >         pt_ops.alloc_pmd = alloc_pmd_late;
> >         pt_ops.get_pmd_virt = get_pmd_virt_late;
> > +       pt_ops.alloc_pud = alloc_pud_late;
> > +       pt_ops.get_pud_virt = get_pud_virt_late;
> >  #endif
> >  }
> >
> > @@ -633,6 +784,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >         pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
> >
> >         kernel_map.virt_addr = KERNEL_LINK_ADDR;
> > +       kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
> >
> >  #ifdef CONFIG_XIP_KERNEL
> >         kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> > @@ -647,6 +799,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >         kernel_map.phys_addr = (uintptr_t)(&_start);
> >         kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
> >  #endif
> > +
> > +#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
> > +       set_satp_mode();
> > +#endif
> > +
> >         kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> >         kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
> >
> > @@ -676,15 +833,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >
> >         /* Setup early PGD for fixmap */
> >         create_pgd_mapping(early_pg_dir, FIXADDR_START,
> > -                          (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> > +                          fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> >
> >  #ifndef __PAGETABLE_PMD_FOLDED
> > -       /* Setup fixmap PMD */
> > +       /* Setup fixmap PUD and PMD */
> > +       if (pgtable_l4_enabled)
> > +               create_pud_mapping(fixmap_pud, FIXADDR_START,
> > +                                  (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
> >         create_pmd_mapping(fixmap_pmd, FIXADDR_START,
> >                            (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
> >         /* Setup trampoline PGD and PMD */
> >         create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
> > -                          (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> > +                          trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
> > +       if (pgtable_l4_enabled)
> > +               create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
> > +                                  (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
> >  #ifdef CONFIG_XIP_KERNEL
> >         create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
> >                            kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
> > @@ -712,7 +875,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> >          * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
> >          * range can not span multiple pmds.
> >          */
> > -       BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> > +       BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> >                      != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
> >
> >  #ifndef __PAGETABLE_PMD_FOLDED
> > @@ -783,9 +946,10 @@ static void __init setup_vm_final(void)
> >         /* Clear fixmap PTE and PMD mappings */
> >         clear_fixmap(FIX_PTE);
> >         clear_fixmap(FIX_PMD);
> > +       clear_fixmap(FIX_PUD);
> >
> >         /* Move to swapper page table */
> > -       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
> > +       csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
> >         local_flush_tlb_all();
> >
> >         pt_ops_set_late();
> > diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> > index 1434a0225140..993f50571a3b 100644
> > --- a/arch/riscv/mm/kasan_init.c
> > +++ b/arch/riscv/mm/kasan_init.c
> > @@ -11,7 +11,29 @@
> >  #include <asm/fixmap.h>
> >  #include <asm/pgalloc.h>
> >
> > +/*
> > + * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
> > + * which is right before the kernel.
> > + *
> > + * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
> > + * the page global directory with kasan_early_shadow_pmd.
> > + *
> > + * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
> > + * must be divided as follows:
> > + * - the first PGD entry, although incomplete, is populated with
> > + *   kasan_early_shadow_pud/p4d
> > + * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
> > + * - the last PGD entry is shared with the kernel mapping so populated at the
> > + *   lower levels pud/p4d
> > + *
> > + * In addition, when shallow populating a kasan region (for example vmalloc),
> > + * this region may also not be aligned on PGDIR size, so we must go down to the
> > + * pud level too.
> > + */
> > +
> >  extern pgd_t early_pg_dir[PTRS_PER_PGD];
> > +extern struct pt_alloc_ops _pt_ops __initdata;
> > +#define pt_ops _pt_ops
> >
> >  static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
> >  {
> > @@ -35,15 +57,19 @@ static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned
> >         set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
> >  }
> >
> > -static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> > +static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
> >  {
> >         phys_addr_t phys_addr;
> >         pmd_t *pmdp, *base_pmd;
> >         unsigned long next;
> >
> > -       base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
> > -       if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> > +       if (pud_none(*pud)) {
> >                 base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> > +       } else {
> > +               base_pmd = (pmd_t *)pud_pgtable(*pud);
> > +               if (base_pmd == lm_alias(kasan_early_shadow_pmd))
> > +                       base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
> > +       }
> >
> >         pmdp = base_pmd + pmd_index(vaddr);
> >
> > @@ -67,9 +93,72 @@ static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned
> >          * it entirely, memblock could allocate a page at a physical address
> >          * where KASAN is not populated yet and then we'd get a page fault.
> >          */
> > -       set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> > +       set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> > +}
> > +
> > +static void __init kasan_populate_pud(pgd_t *pgd,
> > +                                     unsigned long vaddr, unsigned long end,
> > +                                     bool early)
> > +{
> > +       phys_addr_t phys_addr;
> > +       pud_t *pudp, *base_pud;
> > +       unsigned long next;
> > +
> > +       if (early) {
> > +               /*
> > +                * We can't use pgd_page_vaddr here as it would return a linear
> > +                * mapping address but it is not mapped yet, but when populating
> > +                * early_pg_dir, we need the physical address and when populating
> > +                * swapper_pg_dir, we need the kernel virtual address so use
> > +                * pt_ops facility.
> > +                */
> > +               base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
> > +       } else {
> > +               base_pud = (pud_t *)pgd_page_vaddr(*pgd);
> > +               if (base_pud == lm_alias(kasan_early_shadow_pud))
> > +                       base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
> > +       }
> > +
> > +       pudp = base_pud + pud_index(vaddr);
> > +
> > +       do {
> > +               next = pud_addr_end(vaddr, end);
> > +
> > +               if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
> > +                       if (early) {
> > +                               phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
> > +                               set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
> > +                               continue;
> > +                       } else {
> > +                               phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
> > +                               if (phys_addr) {
> > +                                       set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
> > +                                       continue;
> > +                               }
> > +                       }
> > +               }
> > +
> > +               kasan_populate_pmd(pudp, vaddr, next);
> > +       } while (pudp++, vaddr = next, vaddr != end);
> > +
> > +       /*
> > +        * Wait for the whole PGD to be populated before setting the PGD in
> > +        * the page table, otherwise, if we did set the PGD before populating
> > +        * it entirely, memblock could allocate a page at a physical address
> > +        * where KASAN is not populated yet and then we'd get a page fault.
> > +        */
> > +       if (!early)
> > +               set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
> >  }
> >
> > +#define kasan_early_shadow_pgd_next                    (pgtable_l4_enabled ?   \
> > +                               (uintptr_t)kasan_early_shadow_pud :             \
> > +                               (uintptr_t)kasan_early_shadow_pmd)
> > +#define kasan_populate_pgd_next(pgdp, vaddr, next, early)                      \
> > +               (pgtable_l4_enabled ?                                           \
> > +                       kasan_populate_pud(pgdp, vaddr, next, early) :          \
> > +                       kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
> > +
> >  static void __init kasan_populate_pgd(pgd_t *pgdp,
> >                                       unsigned long vaddr, unsigned long end,
> >                                       bool early)
> > @@ -102,7 +191,7 @@ static void __init kasan_populate_pgd(pgd_t *pgdp,
> >                         }
> >                 }
> >
> > -               kasan_populate_pmd(pgdp, vaddr, next);
> > +               kasan_populate_pgd_next(pgdp, vaddr, next, early);
> >         } while (pgdp++, vaddr = next, vaddr != end);
> >  }
> >
> > @@ -157,18 +246,54 @@ static void __init kasan_populate(void *start, void *end)
> >         memset(start, KASAN_SHADOW_INIT, end - start);
> >  }
> >
> > +static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
> > +                                             unsigned long vaddr, unsigned long end,
> > +                                             bool kasan_populate)
> > +{
> > +       unsigned long next;
> > +       pud_t *pudp, *base_pud;
> > +       pmd_t *base_pmd;
> > +       bool is_kasan_pmd;
> > +
> > +       base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
> > +       pudp = base_pud + pud_index(vaddr);
> > +
> > +       if (kasan_populate)
> > +               memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
> > +                      sizeof(pud_t) * PTRS_PER_PUD);
> > +
> > +       do {
> > +               next = pud_addr_end(vaddr, end);
> > +               is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
> > +
> > +               if (is_kasan_pmd) {
> > +                       base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> > +                       set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
> > +               }
> > +       } while (pudp++, vaddr = next, vaddr != end);
> > +}
> > +
> >  static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
> >  {
> >         unsigned long next;
> >         void *p;
> >         pgd_t *pgd_k = pgd_offset_k(vaddr);
> > +       bool is_kasan_pgd_next;
> >
> >         do {
> >                 next = pgd_addr_end(vaddr, end);
> > -               if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
> > +               is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
> > +                                    (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
> > +
> > +               if (is_kasan_pgd_next) {
> >                         p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> >                         set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
> >                 }
> > +
> > +               if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
> > +                       continue;
> > +
> > +               kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
> >         } while (pgd_k++, vaddr = next, vaddr != end);
> >  }
> >
> > diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> > index 26e69788f27a..b3db5d91ed38 100644
> > --- a/drivers/firmware/efi/libstub/efi-stub.c
> > +++ b/drivers/firmware/efi/libstub/efi-stub.c
> > @@ -40,6 +40,8 @@
> >
> >  #ifdef CONFIG_ARM64
> >  # define EFI_RT_VIRTUAL_LIMIT  DEFAULT_MAP_WINDOW_64
> > +#elif defined(CONFIG_RISCV)
> > +# define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE_MIN
> >  #else
> >  # define EFI_RT_VIRTUAL_LIMIT  TASK_SIZE
> >  #endif
> > --
> > 2.32.0
> >
>
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-26  8:59     ` Jisheng Zhang
@ 2022-01-04 12:44       ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-01-04 12:44 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

Hi Jisheng,

On Sun, Dec 26, 2021 at 10:06 AM Jisheng Zhang
<jszhang3@mail.ustc.edu.cn> wrote:
>
> On Mon,  6 Dec 2021 11:46:51 +0100
> Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:
>
> > By adding a new 4th page table level, allow the 64-bit kernel to address
> > 2^48 bytes of virtual address space: in practice, that offers 128TB of
> > virtual address space to userspace and allows up to 64TB of physical
> > memory.
> >
> > If the underlying hardware does not support sv48, we automatically fall
> > back to a standard 3-level page table by folding the new PUD level into
> > the PGDIR level. To detect HW capabilities at runtime, we rely on the
> > fact that SATP ignores writes with an unsupported mode.
> >
> > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> > ---
> >  arch/riscv/Kconfig                      |   4 +-
> >  arch/riscv/include/asm/csr.h            |   3 +-
> >  arch/riscv/include/asm/fixmap.h         |   1 +
> >  arch/riscv/include/asm/kasan.h          |   6 +-
> >  arch/riscv/include/asm/page.h           |  14 ++
> >  arch/riscv/include/asm/pgalloc.h        |  40 +++++
> >  arch/riscv/include/asm/pgtable-64.h     | 108 +++++++++++-
> >  arch/riscv/include/asm/pgtable.h        |  24 ++-
> >  arch/riscv/kernel/head.S                |   3 +-
> >  arch/riscv/mm/context.c                 |   4 +-
> >  arch/riscv/mm/init.c                    | 212 +++++++++++++++++++++---
> >  arch/riscv/mm/kasan_init.c              | 137 ++++++++++++++-
> >  drivers/firmware/efi/libstub/efi-stub.c |   2 +
> >  13 files changed, 514 insertions(+), 44 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index ac6c0cd9bc29..d28fe0148e13 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -150,7 +150,7 @@ config PAGE_OFFSET
> >       hex
> >       default 0xC0000000 if 32BIT
> >       default 0x80000000 if 64BIT && !MMU
> > -     default 0xffffffd800000000 if 64BIT
> > +     default 0xffffaf8000000000 if 64BIT
> >
> >  config KASAN_SHADOW_OFFSET
> >       hex
> > @@ -201,7 +201,7 @@ config FIX_EARLYCON_MEM
> >
> >  config PGTABLE_LEVELS
> >       int
> > -     default 3 if 64BIT
> > +     default 4 if 64BIT
> >       default 2
> >
> >  config LOCKDEP_SUPPORT
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index 87ac65696871..3fdb971c7896 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -40,14 +40,13 @@
> >  #ifndef CONFIG_64BIT
> >  #define SATP_PPN     _AC(0x003FFFFF, UL)
> >  #define SATP_MODE_32 _AC(0x80000000, UL)
> > -#define SATP_MODE    SATP_MODE_32
> >  #define SATP_ASID_BITS       9
> >  #define SATP_ASID_SHIFT      22
> >  #define SATP_ASID_MASK       _AC(0x1FF, UL)
> >  #else
> >  #define SATP_PPN     _AC(0x00000FFFFFFFFFFF, UL)
> >  #define SATP_MODE_39 _AC(0x8000000000000000, UL)
> > -#define SATP_MODE    SATP_MODE_39
> > +#define SATP_MODE_48 _AC(0x9000000000000000, UL)
> >  #define SATP_ASID_BITS       16
> >  #define SATP_ASID_SHIFT      44
> >  #define SATP_ASID_MASK       _AC(0xFFFF, UL)
> > diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> > index 54cbf07fb4e9..58a718573ad6 100644
> > --- a/arch/riscv/include/asm/fixmap.h
> > +++ b/arch/riscv/include/asm/fixmap.h
> > @@ -24,6 +24,7 @@ enum fixed_addresses {
> >       FIX_HOLE,
> >       FIX_PTE,
> >       FIX_PMD,
> > +     FIX_PUD,
> >       FIX_TEXT_POKE1,
> >       FIX_TEXT_POKE0,
> >       FIX_EARLYCON_MEM_BASE,
> > diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> > index 743e6ff57996..0b85e363e778 100644
> > --- a/arch/riscv/include/asm/kasan.h
> > +++ b/arch/riscv/include/asm/kasan.h
> > @@ -28,7 +28,11 @@
> >  #define KASAN_SHADOW_SCALE_SHIFT     3
> >
> >  #define KASAN_SHADOW_SIZE    (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
> > -#define KASAN_SHADOW_START   (KASAN_SHADOW_END - KASAN_SHADOW_SIZE)
> > +/*
> > + * Depending on the size of the virtual address space, the region may not be
> > + * aligned on PGDIR_SIZE, so force its alignment to ease its population.
> > + */
> > +#define KASAN_SHADOW_START   ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
> >  #define KASAN_SHADOW_END     MODULES_LOWEST_VADDR
> >  #define KASAN_SHADOW_OFFSET  _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
> >
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index e03559f9b35e..d089fe46f7d8 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -31,7 +31,20 @@
> >   * When not using MMU this corresponds to the first free page in
> >   * physical memory (aligned on a page boundary).
> >   */
> > +#ifdef CONFIG_64BIT
> > +#ifdef CONFIG_MMU
> > +#define PAGE_OFFSET          kernel_map.page_offset
> > +#else
> > +#define PAGE_OFFSET          _AC(CONFIG_PAGE_OFFSET, UL)
> > +#endif
> > +/*
> > + * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
> > + * define the PAGE_OFFSET value for SV39.
> > + */
> > +#define PAGE_OFFSET_L3               _AC(0xffffffd800000000, UL)
> > +#else
> >  #define PAGE_OFFSET          _AC(CONFIG_PAGE_OFFSET, UL)
> > +#endif /* CONFIG_64BIT */
> >
> >  /*
> >   * Half of the kernel address space (half of the entries of the page global
> > @@ -90,6 +103,7 @@ extern unsigned long riscv_pfn_base;
> >  #endif /* CONFIG_MMU */
> >
> >  struct kernel_mapping {
> > +     unsigned long page_offset;
> >       unsigned long virt_addr;
> >       uintptr_t phys_addr;
> >       uintptr_t size;
> > diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> > index 0af6933a7100..11823004b87a 100644
> > --- a/arch/riscv/include/asm/pgalloc.h
> > +++ b/arch/riscv/include/asm/pgalloc.h
> > @@ -11,6 +11,8 @@
> >  #include <asm/tlb.h>
> >
> >  #ifdef CONFIG_MMU
> > +#define __HAVE_ARCH_PUD_ALLOC_ONE
> > +#define __HAVE_ARCH_PUD_FREE
> >  #include <asm-generic/pgalloc.h>
> >
> >  static inline void pmd_populate_kernel(struct mm_struct *mm,
> > @@ -36,6 +38,44 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> >
> >       set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> >  }
> > +
> > +static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
> > +{
> > +     if (pgtable_l4_enabled) {
> > +             unsigned long pfn = virt_to_pfn(pud);
> > +
> > +             set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> > +     }
> > +}
> > +
> > +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
> > +                                  pud_t *pud)
> > +{
> > +     if (pgtable_l4_enabled) {
> > +             unsigned long pfn = virt_to_pfn(pud);
> > +
> > +             set_p4d_safe(p4d,
> > +                          __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> > +     }
> > +}
> > +
> > +#define pud_alloc_one pud_alloc_one
> > +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return __pud_alloc_one(mm, addr);
> > +
> > +     return NULL;
> > +}
> > +
> > +#define pud_free pud_free
> > +static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             __pud_free(mm, pud);
> > +}
> > +
> > +#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
> >  #endif /* __PAGETABLE_PMD_FOLDED */
> >
> >  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> > index 228261aa9628..bbbdd66e5e2f 100644
> > --- a/arch/riscv/include/asm/pgtable-64.h
> > +++ b/arch/riscv/include/asm/pgtable-64.h
> > @@ -8,16 +8,36 @@
> >
> >  #include <linux/const.h>
> >
> > -#define PGDIR_SHIFT     30
> > +extern bool pgtable_l4_enabled;
> > +
> > +#define PGDIR_SHIFT_L3  30
> > +#define PGDIR_SHIFT_L4  39
> > +#define PGDIR_SIZE_L3   (_AC(1, UL) << PGDIR_SHIFT_L3)
> > +
> > +#define PGDIR_SHIFT     (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
> >  /* Size of region mapped by a page global directory */
> >  #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
> >  #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
> >
> > +/* pud is folded into pgd in case of 3-level page table */
> > +#define PUD_SHIFT      30
> > +#define PUD_SIZE       (_AC(1, UL) << PUD_SHIFT)
> > +#define PUD_MASK       (~(PUD_SIZE - 1))
> > +
> >  #define PMD_SHIFT       21
> >  /* Size of region mapped by a page middle directory */
> >  #define PMD_SIZE        (_AC(1, UL) << PMD_SHIFT)
> >  #define PMD_MASK        (~(PMD_SIZE - 1))
> >
> > +/* Page Upper Directory entry */
> > +typedef struct {
> > +     unsigned long pud;
> > +} pud_t;
> > +
> > +#define pud_val(x)      ((x).pud)
> > +#define __pud(x)        ((pud_t) { (x) })
> > +#define PTRS_PER_PUD    (PAGE_SIZE / sizeof(pud_t))
> > +
> >  /* Page Middle Directory entry */
> >  typedef struct {
> >       unsigned long pmd;
> > @@ -59,6 +79,16 @@ static inline void pud_clear(pud_t *pudp)
> >       set_pud(pudp, __pud(0));
> >  }
> >
> > +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
> > +{
> > +     return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> > +}
> > +
> > +static inline unsigned long _pud_pfn(pud_t pud)
> > +{
> > +     return pud_val(pud) >> _PAGE_PFN_SHIFT;
> > +}
> > +
> >  static inline pmd_t *pud_pgtable(pud_t pud)
> >  {
> >       return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
> > @@ -69,6 +99,17 @@ static inline struct page *pud_page(pud_t pud)
> >       return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
> >  }
> >
> > +#define mm_pud_folded  mm_pud_folded
> > +static inline bool mm_pud_folded(struct mm_struct *mm)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return false;
> > +
> > +     return true;
> > +}
> > +
> > +#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
> > +
> >  static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
> >  {
> >       return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
> > @@ -84,4 +125,69 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
> >  #define pmd_ERROR(e) \
> >       pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
> >
> > +#define pud_ERROR(e)   \
> > +     pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
> > +
> > +static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             *p4dp = p4d;
> > +     else
> > +             set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
> > +}
> > +
> > +static inline int p4d_none(p4d_t p4d)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return (p4d_val(p4d) == 0);
> > +
> > +     return 0;
> > +}
> > +
> > +static inline int p4d_present(p4d_t p4d)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return (p4d_val(p4d) & _PAGE_PRESENT);
> > +
> > +     return 1;
> > +}
> > +
> > +static inline int p4d_bad(p4d_t p4d)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return !p4d_present(p4d);
> > +
> > +     return 0;
> > +}
> > +
> > +static inline void p4d_clear(p4d_t *p4d)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             set_p4d(p4d, __p4d(0));
> > +}
> > +
> > +static inline pud_t *p4d_pgtable(p4d_t p4d)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> > +
> > +     return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
> > +}
> > +
> > +static inline struct page *p4d_page(p4d_t p4d)
> > +{
> > +     return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
> > +}
> > +
> > +#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
> > +
> > +#define pud_offset pud_offset
> > +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> > +{
> > +     if (pgtable_l4_enabled)
> > +             return p4d_pgtable(*p4d) + pud_index(address);
> > +
> > +     return (pud_t *)p4d;
> > +}
> > +
> >  #endif /* _ASM_RISCV_PGTABLE_64_H */
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index e1a52e22ad7e..e1c74ef4ead2 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -51,7 +51,7 @@
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> >  #ifdef CONFIG_64BIT
> > -#define VA_BITS              39
> > +#define VA_BITS              (pgtable_l4_enabled ? 48 : 39)
> >  #else
> >  #define VA_BITS              32
> >  #endif
> > @@ -90,8 +90,7 @@
> >
> >  #ifndef __ASSEMBLY__
> >
> > -/* Page Upper Directory not used in RISC-V */
> > -#include <asm-generic/pgtable-nopud.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >  #include <asm/page.h>
> >  #include <asm/tlbflush.h>
> >  #include <linux/mm_types.h>
> > @@ -113,6 +112,17 @@
> >  #define XIP_FIXUP(addr)              (addr)
> >  #endif /* CONFIG_XIP_KERNEL */
> >
> > +struct pt_alloc_ops {
> > +     pte_t *(*get_pte_virt)(phys_addr_t pa);
> > +     phys_addr_t (*alloc_pte)(uintptr_t va);
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +     pmd_t *(*get_pmd_virt)(phys_addr_t pa);
> > +     phys_addr_t (*alloc_pmd)(uintptr_t va);
> > +     pud_t *(*get_pud_virt)(phys_addr_t pa);
> > +     phys_addr_t (*alloc_pud)(uintptr_t va);
> > +#endif
> > +};
> > +
> >  #ifdef CONFIG_MMU
> >  /* Number of entries in the page global directory */
> >  #define PTRS_PER_PGD    (PAGE_SIZE / sizeof(pgd_t))
> > @@ -669,9 +679,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> >   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
> >   */
> >  #ifdef CONFIG_64BIT
> > -#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > +#define TASK_SIZE      (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > +#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
> >  #else
> > -#define TASK_SIZE FIXADDR_START
> > +#define TASK_SIZE    FIXADDR_START
> > +#define TASK_SIZE_MIN        TASK_SIZE
> >  #endif
> >
> >  #else /* CONFIG_MMU */
> > @@ -697,6 +709,8 @@ extern uintptr_t _dtb_early_pa;
> >  #define dtb_early_va _dtb_early_va
> >  #define dtb_early_pa _dtb_early_pa
> >  #endif /* CONFIG_XIP_KERNEL */
> > +extern u64 satp_mode;
> > +extern bool pgtable_l4_enabled;
> >
> >  void paging_init(void);
> >  void misc_mem_init(void);
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index 52c5ff9804c5..c3c0ed559770 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -95,7 +95,8 @@ relocate:
> >
> >       /* Compute satp for kernel page tables, but don't load it yet */
> >       srl a2, a0, PAGE_SHIFT
> > -     li a1, SATP_MODE
> > +     la a1, satp_mode
> > +     REG_L a1, 0(a1)
> >       or a2, a2, a1
> >
> >       /*
> > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> > index ee3459cb6750..a7246872bd30 100644
> > --- a/arch/riscv/mm/context.c
> > +++ b/arch/riscv/mm/context.c
> > @@ -192,7 +192,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
> >  switch_mm_fast:
> >       csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
> >                 ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> > -               SATP_MODE);
> > +               satp_mode);
> >
> >       if (need_flush_tlb)
> >               local_flush_tlb_all();
> > @@ -201,7 +201,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
> >  static void set_mm_noasid(struct mm_struct *mm)
> >  {
> >       /* Switch the page table and blindly nuke entire local TLB */
> > -     csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
> > +     csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
> >       local_flush_tlb_all();
> >  }
> >
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 1552226fb6bd..6a19a1b1caf8 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -37,6 +37,17 @@ EXPORT_SYMBOL(kernel_map);
> >  #define kernel_map   (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
> >  #endif
> >
> > +#ifdef CONFIG_64BIT
> > +u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
> > +#else
> > +u64 satp_mode = SATP_MODE_32;
> > +#endif
> > +EXPORT_SYMBOL(satp_mode);
> > +
> > +bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL) ?
> > +                             true : false;
>
> Hi Alex,
>
> I'm not sure whether we can use a static key for pgtable_l4_enabled or
> not. Obviously, for a specific HW platform, pgtable_l4_enabled won't change
> after boot, and it seems to sit in a hot code path, so IMHO a static key
> may be suitable for it.

Thanks for the suggestion, I'll explore that after this series is
merged, if you don't mind.

Thanks,

Alex
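
[Editorial aside: for readers curious what the suggestion would look like, below is an untested, non-runnable sketch against the kernel's jump-label API. None of this code is from the series; the key name is invented for illustration.]

```c
/* Kernel-only sketch: replace the plain bool with a static key so the
 * 3-level/4-level choice compiles to a patched branch instead of a
 * memory load on hot paths such as pud_offset(). */
#include <linux/jump_label.h>

DEFINE_STATIC_KEY_TRUE(pgtable_l4_key); /* hypothetical name */

static inline bool pgtable_l4(void)
{
	return static_branch_likely(&pgtable_l4_key);
}

/* Early boot code would call static_branch_disable(&pgtable_l4_key)
 * when the SATP probe falls back to sv39. */
```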

>
> Thanks
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v3 12/13] riscv: Initialize thread pointer before calling C functions
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2022-01-10  8:03     ` Alexandre ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre ghiti @ 2022-01-10  8:03 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

Hi Palmer,

I ran into this issue again today; do you think you could take this
patch into for-next? I assume it is now too late to take the sv48
patchset: if not, I can respin it today or tomorrow.

Thanks,

Alex

On 12/6/21 11:46, Alexandre Ghiti wrote:
> Because the stack canary feature reads the canary value from the current
> task structure, the thread pointer register "tp" must be set before
> calling any C function from head.S: by chance, setup_vm and all the
> functions that it calls do not seem to be among the functions where the
> canary check is done, but in the following commits, some functions will
> be.
>
> Fixes: f2c9699f65557a31 ("riscv: Add STACKPROTECTOR supported")
> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
> ---
>   arch/riscv/kernel/head.S | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index c3c0ed559770..86f7ee3d210d 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -302,6 +302,7 @@ clear_bss_done:
>   	REG_S a0, (a2)
>   
>   	/* Initialize page tables and relocate to virtual addresses */
> +	la tp, init_task
>   	la sp, init_thread_union + THREAD_SIZE
>   	XIP_FIXUP_OFFSET sp
>   #ifdef CONFIG_BUILTIN_DTB


* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2021-12-06 10:46 ` Alexandre Ghiti
@ 2022-01-20  4:18   ` Palmer Dabbelt
  -1 siblings, 0 replies; 70+ messages in thread
From: Palmer Dabbelt @ 2022-01-20  4:18 UTC (permalink / raw)
  To: alexandre.ghiti
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch, alexandre.ghiti

On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
> * Please note notable changes in memory layouts and kasan population *
>
> This patchset allows having a single kernel for sv39 and sv48 without
> being relocatable.
>
> The idea comes from Arnd Bergmann, who suggested doing the same as x86,
> that is, mapping the kernel to the end of the address space, which allows
> the kernel to be linked at the same address for both sv39 and sv48 and
> thus does not require relocation at runtime.
>
> This implements sv48 support at runtime. The kernel will try to
> boot with a 4-level page table and will fall back to 3 levels if the HW
> does not support it. Folding the 4th level into a 3-level page table has
> almost no cost at runtime.
>
> Note that the kasan region had to be moved to the end of the address space
> since its location must be known at compile time and thus be valid for
> both sv39 and sv48 (and the upcoming sv57).
>
> Tested on:
>   - qemu rv64 sv39: OK
>   - qemu rv64 sv48: OK
>   - qemu rv64 sv39 + kasan: OK
>   - qemu rv64 sv48 + kasan: OK
>   - qemu rv32: OK
>
> Changes in v3:
>   - Fix SZ_1T, thanks to Atish
>   - Fix warning create_pud_mapping, thanks to Atish
>   - Fix k210 nommu build, thanks to Atish
>   - Fix wrong rebase as noted by Samuel
>   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
>   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
>
> Changes in v2:
>   - Rebase onto for-next
>   - Fix KASAN
>   - Fix stack canary
>   - Get completely rid of MAXPHYSMEM configs
>   - Add documentation
>
> Alexandre Ghiti (13):
>   riscv: Move KASAN mapping next to the kernel mapping
>   riscv: Split early kasan mapping to prepare sv48 introduction
>   riscv: Introduce functions to switch pt_ops
>   riscv: Allow to dynamically define VA_BITS
>   riscv: Get rid of MAXPHYSMEM configs
>   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
>   riscv: Implement sv48 support
>   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
>   riscv: Explicit comment about user virtual address space size
>   riscv: Improve virtual kernel memory layout dump
>   Documentation: riscv: Add sv48 description to VM layout
>   riscv: Initialize thread pointer before calling C functions
>   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>
>  Documentation/riscv/vm-layout.rst             |  48 ++-
>  arch/riscv/Kconfig                            |  37 +-
>  arch/riscv/configs/nommu_k210_defconfig       |   1 -
>  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
>  arch/riscv/configs/nommu_virt_defconfig       |   1 -
>  arch/riscv/include/asm/csr.h                  |   3 +-
>  arch/riscv/include/asm/fixmap.h               |   1
>  arch/riscv/include/asm/kasan.h                |  11 +-
>  arch/riscv/include/asm/page.h                 |  20 +-
>  arch/riscv/include/asm/pgalloc.h              |  40 ++
>  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
>  arch/riscv/include/asm/pgtable.h              |  47 +-
>  arch/riscv/include/asm/sparsemem.h            |   6 +-
>  arch/riscv/kernel/cpu.c                       |  23 +-
>  arch/riscv/kernel/head.S                      |   4 +-
>  arch/riscv/mm/context.c                       |   4 +-
>  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
>  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
>  drivers/firmware/efi/libstub/efi-stub.c       |   2
>  drivers/pci/controller/pci-xgene.c            |   2 +-
>  include/asm-generic/pgalloc.h                 |  24 +-
>  include/linux/sizes.h                         |   1
>  22 files changed, 833 insertions(+), 209 deletions(-)

Sorry this took a while.  This is on for-next, with a bit of juggling: a 
handful of trivial fixes for configs that were failing to build/boot and 
some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so 
it'd be easier to backport.  This is bigger than something I'd normally like to
take late in the cycle, but given there's a lot of cleanups, likely some fixes,
and it looks like folks have been testing this I'm just going to go with it.

Let me know if there's any issues with the merge, it was a bit hairy.  
Probably best to just send along a fixup patch at this point.

Thanks!
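
As background for the sv39/sv48 runtime fallback described in the quoted
cover letter: the Sv modes differ only in how many page-table levels the
MMU walks, and the address-space arithmetic is simple (figures from the
RISC-V privileged spec; this snippet is purely illustrative):

```python
# RISC-V Sv modes use 4 KiB pages (12 offset bits) and resolve 9 VA bits
# per page-table level, so each extra level adds 9 bits of virtual address.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9

def va_bits(levels: int) -> int:
    """Virtual address bits for a translation walked with `levels` levels."""
    return PAGE_SHIFT + BITS_PER_LEVEL * levels

for name, levels in (("sv39", 3), ("sv48", 4), ("sv57", 5)):
    bits = va_bits(levels)
    print(f"{name}: {levels} levels -> {bits} VA bits, "
          f"{1 << (bits - 30)} GiB of virtual address space")
```

This is why folding the 4th level away at runtime is cheap: a 3-level walk
is just the 4-level walk with the top 9 index bits fixed to zero.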


* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2022-01-20  4:18   ` Palmer Dabbelt
@ 2022-01-20  7:30     ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-01-20  7:30 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
> > * Please note notable changes in memory layouts and kasan population *
> >
> > This patchset allows having a single kernel for sv39 and sv48 without
> > being relocatable.
> >
> > The idea comes from Arnd Bergmann, who suggested doing the same as x86,
> > that is, mapping the kernel to the end of the address space, which allows
> > the kernel to be linked at the same address for both sv39 and sv48 and
> > thus does not require relocation at runtime.
> >
> > This implements sv48 support at runtime. The kernel will try to
> > boot with a 4-level page table and will fall back to 3 levels if the HW
> > does not support it. Folding the 4th level into a 3-level page table has
> > almost no cost at runtime.
> >
> > Note that the kasan region had to be moved to the end of the address space
> > since its location must be known at compile time and thus be valid for
> > both sv39 and sv48 (and the upcoming sv57).
> >
> > Tested on:
> >   - qemu rv64 sv39: OK
> >   - qemu rv64 sv48: OK
> >   - qemu rv64 sv39 + kasan: OK
> >   - qemu rv64 sv48 + kasan: OK
> >   - qemu rv32: OK
> >
> > Changes in v3:
> >   - Fix SZ_1T, thanks to Atish
> >   - Fix warning create_pud_mapping, thanks to Atish
> >   - Fix k210 nommu build, thanks to Atish
> >   - Fix wrong rebase as noted by Samuel
> >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
> >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
> >
> > Changes in v2:
> >   - Rebase onto for-next
> >   - Fix KASAN
> >   - Fix stack canary
> >   - Get completely rid of MAXPHYSMEM configs
> >   - Add documentation
> >
> > Alexandre Ghiti (13):
> >   riscv: Move KASAN mapping next to the kernel mapping
> >   riscv: Split early kasan mapping to prepare sv48 introduction
> >   riscv: Introduce functions to switch pt_ops
> >   riscv: Allow to dynamically define VA_BITS
> >   riscv: Get rid of MAXPHYSMEM configs
> >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
> >   riscv: Implement sv48 support
> >   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
> >   riscv: Explicit comment about user virtual address space size
> >   riscv: Improve virtual kernel memory layout dump
> >   Documentation: riscv: Add sv48 description to VM layout
> >   riscv: Initialize thread pointer before calling C functions
> >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
> >
> >  Documentation/riscv/vm-layout.rst             |  48 ++-
> >  arch/riscv/Kconfig                            |  37 +-
> >  arch/riscv/configs/nommu_k210_defconfig       |   1 -
> >  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
> >  arch/riscv/configs/nommu_virt_defconfig       |   1 -
> >  arch/riscv/include/asm/csr.h                  |   3 +-
> >  arch/riscv/include/asm/fixmap.h               |   1
> >  arch/riscv/include/asm/kasan.h                |  11 +-
> >  arch/riscv/include/asm/page.h                 |  20 +-
> >  arch/riscv/include/asm/pgalloc.h              |  40 ++
> >  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
> >  arch/riscv/include/asm/pgtable.h              |  47 +-
> >  arch/riscv/include/asm/sparsemem.h            |   6 +-
> >  arch/riscv/kernel/cpu.c                       |  23 +-
> >  arch/riscv/kernel/head.S                      |   4 +-
> >  arch/riscv/mm/context.c                       |   4 +-
> >  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
> >  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
> >  drivers/firmware/efi/libstub/efi-stub.c       |   2
> >  drivers/pci/controller/pci-xgene.c            |   2 +-
> >  include/asm-generic/pgalloc.h                 |  24 +-
> >  include/linux/sizes.h                         |   1
> >  22 files changed, 833 insertions(+), 209 deletions(-)
>
> Sorry this took a while.  This is on for-next, with a bit of juggling: a
> handful of trivial fixes for configs that were failing to build/boot and
> some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so
> it'd be easier to backport.  This is bigger than something I'd normally like to
> take late in the cycle, but given there's a lot of cleanups, likely some fixes,
> and it looks like folks have been testing this I'm just going to go with it.
>

Yes yes yes! That's fantastic news :)

> Let me know if there's any issues with the merge, it was a bit hairy.
> Probably best to just send along a fixup patch at this point.

I'm going to take a look at that now, and I'll fix anything that comes
up quickly :)

Thanks!

Alex

>
> Thanks!


* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2022-01-20  7:30     ` Alexandre Ghiti
@ 2022-01-20 10:05       ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-01-20 10:05 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

On Thu, Jan 20, 2022 at 8:30 AM Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >
> > On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
> > > * Please note notable changes in memory layouts and kasan population *
> > >
> > > This patchset allows having a single kernel for sv39 and sv48 without
> > > being relocatable.
> > >
> > > The idea comes from Arnd Bergmann, who suggested doing the same as x86,
> > > that is, mapping the kernel to the end of the address space, which allows
> > > the kernel to be linked at the same address for both sv39 and sv48 and
> > > thus does not require relocation at runtime.
> > >
> > > This implements sv48 support at runtime. The kernel will try to
> > > boot with a 4-level page table and will fall back to 3 levels if the HW
> > > does not support it. Folding the 4th level into a 3-level page table has
> > > almost no cost at runtime.
> > >
> > > Note that the kasan region had to be moved to the end of the address space
> > > since its location must be known at compile time and thus be valid for
> > > both sv39 and sv48 (and the upcoming sv57).
> > >
> > > Tested on:
> > >   - qemu rv64 sv39: OK
> > >   - qemu rv64 sv48: OK
> > >   - qemu rv64 sv39 + kasan: OK
> > >   - qemu rv64 sv48 + kasan: OK
> > >   - qemu rv32: OK
> > >
> > > Changes in v3:
> > >   - Fix SZ_1T, thanks to Atish
> > >   - Fix warning create_pud_mapping, thanks to Atish
> > >   - Fix k210 nommu build, thanks to Atish
> > >   - Fix wrong rebase as noted by Samuel
> > >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
> > >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
> > >
> > > Changes in v2:
> > >   - Rebase onto for-next
> > >   - Fix KASAN
> > >   - Fix stack canary
> > >   - Get completely rid of MAXPHYSMEM configs
> > >   - Add documentation
> > >
> > > Alexandre Ghiti (13):
> > >   riscv: Move KASAN mapping next to the kernel mapping
> > >   riscv: Split early kasan mapping to prepare sv48 introduction
> > >   riscv: Introduce functions to switch pt_ops
> > >   riscv: Allow to dynamically define VA_BITS
> > >   riscv: Get rid of MAXPHYSMEM configs
> > >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
> > >   riscv: Implement sv48 support
> > >   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
> > >   riscv: Explicit comment about user virtual address space size
> > >   riscv: Improve virtual kernel memory layout dump
> > >   Documentation: riscv: Add sv48 description to VM layout
> > >   riscv: Initialize thread pointer before calling C functions
> > >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
> > >
> > >  Documentation/riscv/vm-layout.rst             |  48 ++-
> > >  arch/riscv/Kconfig                            |  37 +-
> > >  arch/riscv/configs/nommu_k210_defconfig       |   1 -
> > >  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
> > >  arch/riscv/configs/nommu_virt_defconfig       |   1 -
> > >  arch/riscv/include/asm/csr.h                  |   3 +-
> > >  arch/riscv/include/asm/fixmap.h               |   1
> > >  arch/riscv/include/asm/kasan.h                |  11 +-
> > >  arch/riscv/include/asm/page.h                 |  20 +-
> > >  arch/riscv/include/asm/pgalloc.h              |  40 ++
> > >  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
> > >  arch/riscv/include/asm/pgtable.h              |  47 +-
> > >  arch/riscv/include/asm/sparsemem.h            |   6 +-
> > >  arch/riscv/kernel/cpu.c                       |  23 +-
> > >  arch/riscv/kernel/head.S                      |   4 +-
> > >  arch/riscv/mm/context.c                       |   4 +-
> > >  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
> > >  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
> > >  drivers/firmware/efi/libstub/efi-stub.c       |   2
> > >  drivers/pci/controller/pci-xgene.c            |   2 +-
> > >  include/asm-generic/pgalloc.h                 |  24 +-
> > >  include/linux/sizes.h                         |   1
> > >  22 files changed, 833 insertions(+), 209 deletions(-)
> >
> > Sorry this took a while.  This is on for-next, with a bit of juggling: a
> > handful of trivial fixes for configs that were failing to build/boot and
> > some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so
> > it'd be easier to backport.  This is bigger than something I'd normally like to
> > take late in the cycle, but given there's a lot of cleanups, likely some fixes,
> > and it looks like folks have been testing this I'm just going to go with it.
> >
>
> Yes yes yes! That's fantastic news :)
>
> > Let me know if there's any issues with the merge, it was a bit hairy.
> > Probably best to just send along a fixup patch at this point.
>
> I'm going to take a look at that now, and I'll fix anything that comes
> up quickly :)

I see in for-next that you did not take the following patches:

  riscv: Improve virtual kernel memory layout dump
  Documentation: riscv: Add sv48 description to VM layout
  riscv: Initialize thread pointer before calling C functions
  riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN

I'm not sure this was your intention. If it was, I believe at least the
first 2 patches are needed in this series; the 3rd one is a useful fix,
and we can discuss the 4th if it's an issue for you.

I tested for-next on both sv39 and sv48 successfully. I took a glance
at the code and noticed you fixed the PTRS_PER_PGD error, thanks for
that. Otherwise nothing obvious has popped up.

Thanks again,

Alex

>
> Thanks!
>
> Alex
>
> >
> > Thanks!



* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2022-01-20 10:05       ` Alexandre Ghiti
@ 2022-02-18 10:45         ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-02-18 10:45 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

Hi Palmer,

On Thu, Jan 20, 2022 at 11:05 AM Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> On Thu, Jan 20, 2022 at 8:30 AM Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > >
> > > On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
> > > > * Please note notable changes in memory layouts and kasan population *
> > > >
> > > > This patchset allows to have a single kernel for sv39 and sv48 without
> > > > being relocatable.
> > > >
> > > > The idea comes from Arnd Bergmann who suggested to do the same as x86,
> > > > that is mapping the kernel to the end of the address space, which allows
> > > > the kernel to be linked at the same address for both sv39 and sv48 and
> > > > then does not require to be relocated at runtime.
> > > >
> > > > This implements sv48 support at runtime. The kernel will try to
> > > > boot with 4-level page table and will fallback to 3-level if the HW does not
> > > > support it. Folding the 4th level into a 3-level page table has almost no
> > > > cost at runtime.
> > > >
> > > > Note that kasan region had to be moved to the end of the address space
> > > > since its location must be known at compile-time and then be valid for
> > > > both sv39 and sv48 (and sv57 that is coming).
> > > >
> > > > Tested on:
> > > >   - qemu rv64 sv39: OK
> > > >   - qemu rv64 sv48: OK
> > > >   - qemu rv64 sv39 + kasan: OK
> > > >   - qemu rv64 sv48 + kasan: OK
> > > >   - qemu rv32: OK
> > > >
> > > > Changes in v3:
> > > >   - Fix SZ_1T, thanks to Atish
> > > >   - Fix warning create_pud_mapping, thanks to Atish
> > > >   - Fix k210 nommu build, thanks to Atish
> > > >   - Fix wrong rebase as noted by Samuel
> > > >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
> > > >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
> > > >
> > > > Changes in v2:
> > > >   - Rebase onto for-next
> > > >   - Fix KASAN
> > > >   - Fix stack canary
> > > >   - Get completely rid of MAXPHYSMEM configs
> > > >   - Add documentation
> > > >
> > > > Alexandre Ghiti (13):
> > > >   riscv: Move KASAN mapping next to the kernel mapping
> > > >   riscv: Split early kasan mapping to prepare sv48 introduction
> > > >   riscv: Introduce functions to switch pt_ops
> > > >   riscv: Allow to dynamically define VA_BITS
> > > >   riscv: Get rid of MAXPHYSMEM configs
> > > >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
> > > >   riscv: Implement sv48 support
> > > >   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
> > > >   riscv: Explicit comment about user virtual address space size
> > > >   riscv: Improve virtual kernel memory layout dump
> > > >   Documentation: riscv: Add sv48 description to VM layout
> > > >   riscv: Initialize thread pointer before calling C functions
> > > >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
> > > >
> > > >  Documentation/riscv/vm-layout.rst             |  48 ++-
> > > >  arch/riscv/Kconfig                            |  37 +-
> > > >  arch/riscv/configs/nommu_k210_defconfig       |   1 -
> > > >  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
> > > >  arch/riscv/configs/nommu_virt_defconfig       |   1 -
> > > >  arch/riscv/include/asm/csr.h                  |   3 +-
> > > >  arch/riscv/include/asm/fixmap.h               |   1
> > > >  arch/riscv/include/asm/kasan.h                |  11 +-
> > > >  arch/riscv/include/asm/page.h                 |  20 +-
> > > >  arch/riscv/include/asm/pgalloc.h              |  40 ++
> > > >  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
> > > >  arch/riscv/include/asm/pgtable.h              |  47 +-
> > > >  arch/riscv/include/asm/sparsemem.h            |   6 +-
> > > >  arch/riscv/kernel/cpu.c                       |  23 +-
> > > >  arch/riscv/kernel/head.S                      |   4 +-
> > > >  arch/riscv/mm/context.c                       |   4 +-
> > > >  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
> > > >  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
> > > >  drivers/firmware/efi/libstub/efi-stub.c       |   2
> > > >  drivers/pci/controller/pci-xgene.c            |   2 +-
> > > >  include/asm-generic/pgalloc.h                 |  24 +-
> > > >  include/linux/sizes.h                         |   1
> > > >  22 files changed, 833 insertions(+), 209 deletions(-)
> > >
> > > Sorry this took a while.  This is on for-next, with a bit of juggling: a
> > > handful of trivial fixes for configs that were failing to build/boot and
> > > some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so
> > > it'd be easier to backport.  This is bigger than something I'd normally like to
> > > take late in the cycle, but given there's a lot of cleanups, likely some fixes,
> > > and it looks like folks have been testing this I'm just going to go with it.
> > >
> >
> > Yes yes yes! That's fantastic news :)
> >
> > > Let me know if there's any issues with the merge, it was a bit hairy.
> > > Probably best to just send along a fixup patch at this point.
> >
> > I'm going to take a look at that now, and I'll fix anything that comes
> > up quickly :)
>
> I see in for-next that you did not take the following patches:
>
>   riscv: Improve virtual kernel memory layout dump
>   Documentation: riscv: Add sv48 description to VM layout
>   riscv: Initialize thread pointer before calling C functions
>   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>
> I'm not sure this was your intention. If it was, I believe that at
> least the first 2 patches are needed in this series, the 3rd one is a
> useful fix and we can discuss the 4th if that's an issue for you.

Can you confirm that this was intentional and maybe explain the
motivation behind it? Because I see value in those patches.

Thanks,

Alex

>
> I tested for-next successfully on both sv39 and sv48. I took a glance
> at the code and noticed you fixed the PTRS_PER_PGD error, thanks for
> that. Otherwise nothing obvious has popped up.
>
> Thanks again,
>
> Alex
>
> >
> > Thanks!
> >
> > Alex
> >
> > >
> > > Thanks!

^ permalink raw reply	[flat|nested] 70+ messages in thread


* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2022-02-18 10:45         ` Alexandre Ghiti
@ 2022-04-01 12:56           ` Alexandre Ghiti
  -1 siblings, 0 replies; 70+ messages in thread
From: Alexandre Ghiti @ 2022-04-01 12:56 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

On Fri, Feb 18, 2022 at 11:45 AM Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> Hi Palmer,
>
> On Thu, Jan 20, 2022 at 11:05 AM Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 8:30 AM Alexandre Ghiti
> > <alexandre.ghiti@canonical.com> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > > >
> > > > On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
> > > > > * Please note notable changes in memory layouts and kasan population *
> > > > >
> > > > > This patchset allows to have a single kernel for sv39 and sv48 without
> > > > > being relocatable.
> > > > >
> > > > > The idea comes from Arnd Bergmann who suggested to do the same as x86,
> > > > > that is mapping the kernel to the end of the address space, which allows
> > > > > the kernel to be linked at the same address for both sv39 and sv48 and
> > > > > then does not require to be relocated at runtime.
> > > > >
> > > > > This implements sv48 support at runtime. The kernel will try to
> > > > > boot with 4-level page table and will fallback to 3-level if the HW does not
> > > > > support it. Folding the 4th level into a 3-level page table has almost no
> > > > > cost at runtime.
> > > > >
> > > > > Note that kasan region had to be moved to the end of the address space
> > > > > since its location must be known at compile-time and then be valid for
> > > > > both sv39 and sv48 (and sv57 that is coming).
> > > > >
> > > > > Tested on:
> > > > >   - qemu rv64 sv39: OK
> > > > >   - qemu rv64 sv48: OK
> > > > >   - qemu rv64 sv39 + kasan: OK
> > > > >   - qemu rv64 sv48 + kasan: OK
> > > > >   - qemu rv32: OK
> > > > >
> > > > > Changes in v3:
> > > > >   - Fix SZ_1T, thanks to Atish
> > > > >   - Fix warning create_pud_mapping, thanks to Atish
> > > > >   - Fix k210 nommu build, thanks to Atish
> > > > >   - Fix wrong rebase as noted by Samuel
> > > > >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
> > > > >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
> > > > >
> > > > > Changes in v2:
> > > > >   - Rebase onto for-next
> > > > >   - Fix KASAN
> > > > >   - Fix stack canary
> > > > >   - Get completely rid of MAXPHYSMEM configs
> > > > >   - Add documentation
> > > > >
> > > > > Alexandre Ghiti (13):
> > > > >   riscv: Move KASAN mapping next to the kernel mapping
> > > > >   riscv: Split early kasan mapping to prepare sv48 introduction
> > > > >   riscv: Introduce functions to switch pt_ops
> > > > >   riscv: Allow to dynamically define VA_BITS
> > > > >   riscv: Get rid of MAXPHYSMEM configs
> > > > >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
> > > > >   riscv: Implement sv48 support
> > > > >   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
> > > > >   riscv: Explicit comment about user virtual address space size
> > > > >   riscv: Improve virtual kernel memory layout dump
> > > > >   Documentation: riscv: Add sv48 description to VM layout
> > > > >   riscv: Initialize thread pointer before calling C functions
> > > > >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
> > > > >
> > > > >  Documentation/riscv/vm-layout.rst             |  48 ++-
> > > > >  arch/riscv/Kconfig                            |  37 +-
> > > > >  arch/riscv/configs/nommu_k210_defconfig       |   1 -
> > > > >  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
> > > > >  arch/riscv/configs/nommu_virt_defconfig       |   1 -
> > > > >  arch/riscv/include/asm/csr.h                  |   3 +-
> > > > >  arch/riscv/include/asm/fixmap.h               |   1
> > > > >  arch/riscv/include/asm/kasan.h                |  11 +-
> > > > >  arch/riscv/include/asm/page.h                 |  20 +-
> > > > >  arch/riscv/include/asm/pgalloc.h              |  40 ++
> > > > >  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
> > > > >  arch/riscv/include/asm/pgtable.h              |  47 +-
> > > > >  arch/riscv/include/asm/sparsemem.h            |   6 +-
> > > > >  arch/riscv/kernel/cpu.c                       |  23 +-
> > > > >  arch/riscv/kernel/head.S                      |   4 +-
> > > > >  arch/riscv/mm/context.c                       |   4 +-
> > > > >  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
> > > > >  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
> > > > >  drivers/firmware/efi/libstub/efi-stub.c       |   2
> > > > >  drivers/pci/controller/pci-xgene.c            |   2 +-
> > > > >  include/asm-generic/pgalloc.h                 |  24 +-
> > > > >  include/linux/sizes.h                         |   1
> > > > >  22 files changed, 833 insertions(+), 209 deletions(-)
> > > >
> > > > Sorry this took a while.  This is on for-next, with a bit of juggling: a
> > > > handful of trivial fixes for configs that were failing to build/boot and
> > > > some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so
> > > > it'd be easier to backport.  This is bigger than something I'd normally like to
> > > > take late in the cycle, but given there's a lot of cleanups, likely some fixes,
> > > > and it looks like folks have been testing this I'm just going to go with it.
> > > >
> > >
> > > Yes yes yes! That's fantastic news :)
> > >
> > > > Let me know if there's any issues with the merge, it was a bit hairy.
> > > > Probably best to just send along a fixup patch at this point.
> > >
> > > I'm going to take a look at that now, and I'll fix anything that comes
> > > up quickly :)
> >
> > I see in for-next that you did not take the following patches:
> >
> >   riscv: Improve virtual kernel memory layout dump
> >   Documentation: riscv: Add sv48 description to VM layout
> >   riscv: Initialize thread pointer before calling C functions
> >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
> >
> > I'm not sure this was your intention. If it was, I believe that at
> > least the first 2 patches are needed in this series, the 3rd one is a
> > useful fix and we can discuss the 4th if that's an issue for you.
>
> Can you confirm that this was intentional and maybe explain the
> motivation behind it? Because I see value in those patches.

Palmer,

I read that you were still taking patches for 5.18, so I confirm again
that the patches above are needed IMO.

Maybe even the relocatable series?

Thanks,

Alex

>
> Thanks,
>
> Alex
>
> >
> > I tested for-next successfully on both sv39 and sv48. I took a glance
> > at the code and noticed you fixed the PTRS_PER_PGD error, thanks for
> > that. Otherwise nothing obvious has popped up.
> >
> > Thanks again,
> >
> > Alex
> >
> > >
> > > Thanks!
> > >
> > > Alex
> > >
> > > >
> > > > Thanks!

^ permalink raw reply	[flat|nested] 70+ messages in thread

> >
> > >
> > > Thanks!
> > >
> > > Alex
> > >
> > > >
> > > > Thanks!

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
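The runtime sv48-to-sv39 fallback described in the cover letter above hinges
on a single flag, pgtable_l4_enabled, which folds the 4th page-table level
away when the hardware rejects sv48.  A minimal, self-contained sketch of
that idea follows; only pgtable_l4_enabled is a name taken from the series,
the constants and helper are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Sketch of the "fold level 4" idea: one runtime flag selects between
 * the sv48 (48 VA bits, 4 levels) and sv39 (39 VA bits, 3 levels)
 * layouts, so a single kernel binary serves both.  Only
 * pgtable_l4_enabled is named in the series; the rest is illustrative. */
static bool pgtable_l4_enabled = true;

#define VA_BITS     (pgtable_l4_enabled ? 48 : 39)
#define PGDIR_SHIFT (pgtable_l4_enabled ? 39 : 30)

/* Index of a virtual address in the top-level table (512 entries). */
static unsigned int pgd_index(uint64_t va)
{
	return (va >> PGDIR_SHIFT) & 0x1ff;
}
```

Clearing the flag at boot (as disable_pgtable_l4() does in the patch) makes
every top-level lookup use the 3-level shift, with no relocation needed.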


* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2022-04-01 12:56           ` Alexandre Ghiti
@ 2022-04-23  1:50             ` Palmer Dabbelt
  -1 siblings, 0 replies; 70+ messages in thread
From: Palmer Dabbelt @ 2022-04-23  1:50 UTC (permalink / raw)
  To: alexandre.ghiti
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

On Fri, 01 Apr 2022 05:56:30 PDT (-0700), alexandre.ghiti@canonical.com wrote:
> On Fri, Feb 18, 2022 at 11:45 AM Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
>>
>> Hi Palmer,
>>
>> On Thu, Jan 20, 2022 at 11:05 AM Alexandre Ghiti
>> <alexandre.ghiti@canonical.com> wrote:
>> >
>> > On Thu, Jan 20, 2022 at 8:30 AM Alexandre Ghiti
>> > <alexandre.ghiti@canonical.com> wrote:
>> > >
>> > > On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> > > >
>> > > > On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
>> > > > > * Please note notable changes in memory layouts and kasan population *
>> > > > >
>> > > > > This patchset allows to have a single kernel for sv39 and sv48 without
>> > > > > being relocatable.
>> > > > >
>> > > > > The idea comes from Arnd Bergmann who suggested to do the same as x86,
>> > > > > that is mapping the kernel to the end of the address space, which allows
>> > > > > the kernel to be linked at the same address for both sv39 and sv48 and
>> > > > > then does not require to be relocated at runtime.
>> > > > >
>> > > > > This implements sv48 support at runtime. The kernel will try to
>> > > > > boot with 4-level page table and will fallback to 3-level if the HW does not
>> > > > > support it. Folding the 4th level into a 3-level page table has almost no
>> > > > > cost at runtime.
>> > > > >
>> > > > > Note that kasan region had to be moved to the end of the address space
>> > > > > since its location must be known at compile-time and then be valid for
>> > > > > both sv39 and sv48 (and sv57 that is coming).
>> > > > >
>> > > > > Tested on:
>> > > > >   - qemu rv64 sv39: OK
>> > > > >   - qemu rv64 sv48: OK
>> > > > >   - qemu rv64 sv39 + kasan: OK
>> > > > >   - qemu rv64 sv48 + kasan: OK
>> > > > >   - qemu rv32: OK
>> > > > >
>> > > > > Changes in v3:
>> > > > >   - Fix SZ_1T, thanks to Atish
>> > > > >   - Fix warning create_pud_mapping, thanks to Atish
>> > > > >   - Fix k210 nommu build, thanks to Atish
>> > > > >   - Fix wrong rebase as noted by Samuel
>> > > > >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
>> > > > >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
>> > > > >
>> > > > > Changes in v2:
>> > > > >   - Rebase onto for-next
>> > > > >   - Fix KASAN
>> > > > >   - Fix stack canary
>> > > > >   - Get completely rid of MAXPHYSMEM configs
>> > > > >   - Add documentation
>> > > > >
>> > > > > Alexandre Ghiti (13):
>> > > > >   riscv: Move KASAN mapping next to the kernel mapping
>> > > > >   riscv: Split early kasan mapping to prepare sv48 introduction
>> > > > >   riscv: Introduce functions to switch pt_ops
>> > > > >   riscv: Allow to dynamically define VA_BITS
>> > > > >   riscv: Get rid of MAXPHYSMEM configs
>> > > > >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
>> > > > >   riscv: Implement sv48 support
>> > > > >   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
>> > > > >   riscv: Explicit comment about user virtual address space size
>> > > > >   riscv: Improve virtual kernel memory layout dump
>> > > > >   Documentation: riscv: Add sv48 description to VM layout
>> > > > >   riscv: Initialize thread pointer before calling C functions
>> > > > >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>> > > > >
>> > > > >  Documentation/riscv/vm-layout.rst             |  48 ++-
>> > > > >  arch/riscv/Kconfig                            |  37 +-
>> > > > >  arch/riscv/configs/nommu_k210_defconfig       |   1 -
>> > > > >  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
>> > > > >  arch/riscv/configs/nommu_virt_defconfig       |   1 -
>> > > > >  arch/riscv/include/asm/csr.h                  |   3 +-
>> > > > >  arch/riscv/include/asm/fixmap.h               |   1
>> > > > >  arch/riscv/include/asm/kasan.h                |  11 +-
>> > > > >  arch/riscv/include/asm/page.h                 |  20 +-
>> > > > >  arch/riscv/include/asm/pgalloc.h              |  40 ++
>> > > > >  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
>> > > > >  arch/riscv/include/asm/pgtable.h              |  47 +-
>> > > > >  arch/riscv/include/asm/sparsemem.h            |   6 +-
>> > > > >  arch/riscv/kernel/cpu.c                       |  23 +-
>> > > > >  arch/riscv/kernel/head.S                      |   4 +-
>> > > > >  arch/riscv/mm/context.c                       |   4 +-
>> > > > >  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
>> > > > >  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
>> > > > >  drivers/firmware/efi/libstub/efi-stub.c       |   2
>> > > > >  drivers/pci/controller/pci-xgene.c            |   2 +-
>> > > > >  include/asm-generic/pgalloc.h                 |  24 +-
>> > > > >  include/linux/sizes.h                         |   1
>> > > > >  22 files changed, 833 insertions(+), 209 deletions(-)
>> > > >
>> > > > Sorry this took a while.  This is on for-next, with a bit of juggling: a
>> > > > handful of trivial fixes for configs that were failing to build/boot and
>> > > > some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so
>> > > > it'd be easier to backport.  This is bigger than something I'd normally like to
>> > > > take late in the cycle, but given there's a lot of cleanups, likely some fixes,
>> > > > and it looks like folks have been testing this I'm just going to go with it.
>> > > >
>> > >
>> > > Yes yes yes! That's fantastic news :)
>> > >
>> > > > Let me know if there's any issues with the merge, it was a bit hairy.
>> > > > Probably best to just send along a fixup patch at this point.
>> > >
>> > > I'm going to take a look at that now, and I'll fix anything that comes
>> > > up quickly :)
>> >
>> > I see in for-next that you did not take the following patches:
>> >
>> >   riscv: Improve virtual kernel memory layout dump
>> >   Documentation: riscv: Add sv48 description to VM layout
>> >   riscv: Initialize thread pointer before calling C functions
>> >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>> >
>> > I'm not sure this was your intention. If it was, I believe that at
>> > least the first 2 patches are needed in this series, the 3rd one is a
>> > useful fix and we can discuss the 4th if that's an issue for you.
>>
>> Can you confirm that this was intentional and maybe explain the
>> motivation behind it? Because I see value in those patches.
>
> Palmer,
>
> I read that you were still taking patches for 5.18, so I confirm again
> that the patches above are needed IMO.

It was too late for this when it was sent (I saw it then, but just got 
around to actually doing the work to sort it out).

It took me a while to figure out exactly what was going on here, but I 
think I remember now: that downgrade patch (and the follow-on I just 
sent) is broken for medlow, because mm/init.c must be built medany 
(which we're using for its mostly-PIC qualities).  I remember being in 
the middle of rebasing/debugging this a while ago; I must have forgotten 
I was in the middle of that and accidentally merged the branch as-is.  
Certainly wasn't trying to silently take half the patch set and leave 
the rest in limbo, that's the wrong way to do things.

I'm not sure what the right answer is here, but I just sent a patch to 
drop support for medlow.  We'll have to talk about that; for now I 
cleaned up some other minor issues, rearranged the docs and the fix to 
come first, and put this at palmer/riscv-sv48.  I think it's reasonable 
to take the doc and the fix into fixes, then the dump improvement on 
for-next.  We'll have to see what folks think about medany-only 
kernels; the other option would be to build the FDT code as medany, 
which seems a bit awkward.

> Maybe even the relocatable series?

Do you mind giving me a pointer?  I'm not sure why I'm so drop-prone 
with your patches, I promise I'm not doing it on purpose.

>
> Thanks,
>
> Alex
>
>>
>> Thanks,
>>
>> Alex
>>
>> >
>> > I tested for-next on both sv39 and sv48 successfully, I took a glance
>> > at the code and noticed you fixed the PTRS_PER_PGD error, thanks for
>> > that. Otherwise nothing obvious has popped.
>> >
>> > Thanks again,
>> >
>> > Alex
>> >
>> > >
>> > > Thanks!
>> > >
>> > > Alex
>> > >
>> > > >
>> > > > Thanks!



* Re: [PATCH v3 07/13] riscv: Implement sv48 support
  2021-12-06 10:46   ` Alexandre Ghiti
@ 2022-04-26  5:57     ` Nick Kossifidis
  -1 siblings, 0 replies; 70+ messages in thread
From: Nick Kossifidis @ 2022-04-26  5:57 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Zong Li, Anup Patel, Atish Patra, Christoph Hellwig,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Ard Biesheuvel, Arnd Bergmann, Kees Cook, Guo Ren,
	Heinrich Schuchardt, Mayuresh Chitale, panqinglin2020, linux-doc,
	linux-riscv, linux-kernel, kasan-dev, linux-efi, linux-arch

Hello Alex,

On 12/6/21 12:46, Alexandre Ghiti wrote:
> 
> +#ifdef CONFIG_64BIT
> +static void __init disable_pgtable_l4(void)
> +{
> +	pgtable_l4_enabled = false;
> +	kernel_map.page_offset = PAGE_OFFSET_L3;
> +	satp_mode = SATP_MODE_39;
> +}
> +
> +/*
> + * There is a simple way to determine if 4-level is supported by the
> + * underlying hardware: establish 1:1 mapping in 4-level page table mode
> + * then read SATP to see if the configuration was taken into account
> + * meaning sv48 is supported.
> + */
> +static __init void set_satp_mode(void)
> +{
> +	u64 identity_satp, hw_satp;
> +	uintptr_t set_satp_mode_pmd;
> +
> +	set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
> +	create_pgd_mapping(early_pg_dir,
> +			   set_satp_mode_pmd, (uintptr_t)early_pud,
> +			   PGDIR_SIZE, PAGE_TABLE);
> +	create_pud_mapping(early_pud,
> +			   set_satp_mode_pmd, (uintptr_t)early_pmd,
> +			   PUD_SIZE, PAGE_TABLE);
> +	/* Handle the case where set_satp_mode straddles 2 PMDs */
> +	create_pmd_mapping(early_pmd,
> +			   set_satp_mode_pmd, set_satp_mode_pmd,
> +			   PMD_SIZE, PAGE_KERNEL_EXEC);
> +	create_pmd_mapping(early_pmd,
> +			   set_satp_mode_pmd + PMD_SIZE,
> +			   set_satp_mode_pmd + PMD_SIZE,
> +			   PMD_SIZE, PAGE_KERNEL_EXEC);
> +
> +	identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
> +
> +	local_flush_tlb_all();
> +	csr_write(CSR_SATP, identity_satp);
> +	hw_satp = csr_swap(CSR_SATP, 0ULL);
> +	local_flush_tlb_all();
> +
> +	if (hw_satp != identity_satp)
> +		disable_pgtable_l4();
> +
> +	memset(early_pg_dir, 0, PAGE_SIZE);
> +	memset(early_pud, 0, PAGE_SIZE);
> +	memset(early_pmd, 0, PAGE_SIZE);
> +}
> +#endif
> +

When doing the 1:1 mapping you don't take into account the limitation 
that all bits above 47 need to have the same value as bit 47.  If the 
kernel exists at a high physical address with bit 47 set, the 
corresponding virtual address will be invalid, resulting in an 
instruction fetch fault as the privileged spec mandates.  We verified 
this bug on our prototype.  I suggest we re-write this in assembly and 
do a proper satp switch like we do in head.S, so that we don't need the 
1:1 mapping and we also have a way to recover in case this fails.

Regards,
Nick
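
The constraint Nick describes (every bit above 47 must equal bit 47, i.e.
sv48 virtual addresses are sign-extended from bit 47) can be captured in a
small check.  The helper below is purely illustrative, not part of the
patch; it shows which physical addresses are safe to reuse as 1:1 sv48
virtual addresses:

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Illustrative helper (not from the patch): a physical address can only
 * serve as a 1:1 sv48 virtual address if it is canonical, i.e. bits
 * 63..48 all equal bit 47.  Sign-extending from bit 47 and comparing
 * against the original value implements exactly that check. */
static bool sv48_addr_is_canonical(uint64_t pa)
{
	uint64_t sign_extended = (uint64_t)((int64_t)(pa << 16) >> 16);

	return sign_extended == pa;
}
```

A kernel loaded at a physical address with bit 47 set but bits 63..48 clear
fails this check, which is the instruction-fetch-fault scenario described
above.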



* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
  2022-04-23  1:50             ` Palmer Dabbelt
@ 2022-06-02  3:43               ` Palmer Dabbelt
  -1 siblings, 0 replies; 70+ messages in thread
From: Palmer Dabbelt @ 2022-06-02  3:43 UTC (permalink / raw)
  To: alexandre.ghiti
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

On Fri, 22 Apr 2022 18:50:47 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 01 Apr 2022 05:56:30 PDT (-0700), alexandre.ghiti@canonical.com wrote:
>> On Fri, Feb 18, 2022 at 11:45 AM Alexandre Ghiti
>> <alexandre.ghiti@canonical.com> wrote:
>>>
>>> Hi Palmer,
>>>
>>> On Thu, Jan 20, 2022 at 11:05 AM Alexandre Ghiti
>>> <alexandre.ghiti@canonical.com> wrote:
>>> >
>>> > On Thu, Jan 20, 2022 at 8:30 AM Alexandre Ghiti
>>> > <alexandre.ghiti@canonical.com> wrote:
>>> > >
>>> > > On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>> > > >
>>> > > > On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
>>> > > > > * Please note notable changes in memory layouts and kasan population *
>>> > > > >
>>> > > > > This patchset allows to have a single kernel for sv39 and sv48 without
>>> > > > > being relocatable.
>>> > > > >
>>> > > > > The idea comes from Arnd Bergmann who suggested to do the same as x86,
>>> > > > > that is mapping the kernel to the end of the address space, which allows
>>> > > > > the kernel to be linked at the same address for both sv39 and sv48 and
>>> > > > > then does not require to be relocated at runtime.
>>> > > > >
>>> > > > > This implements sv48 support at runtime. The kernel will try to
>>> > > > > boot with 4-level page table and will fallback to 3-level if the HW does not
>>> > > > > support it. Folding the 4th level into a 3-level page table has almost no
>>> > > > > cost at runtime.
>>> > > > >
>>> > > > > Note that kasan region had to be moved to the end of the address space
>>> > > > > since its location must be known at compile-time and then be valid for
>>> > > > > both sv39 and sv48 (and sv57 that is coming).
>>> > > > >
>>> > > > > Tested on:
>>> > > > >   - qemu rv64 sv39: OK
>>> > > > >   - qemu rv64 sv48: OK
>>> > > > >   - qemu rv64 sv39 + kasan: OK
>>> > > > >   - qemu rv64 sv48 + kasan: OK
>>> > > > >   - qemu rv32: OK
>>> > > > >
>>> > > > > Changes in v3:
>>> > > > >   - Fix SZ_1T, thanks to Atish
>>> > > > >   - Fix warning create_pud_mapping, thanks to Atish
>>> > > > >   - Fix k210 nommu build, thanks to Atish
>>> > > > >   - Fix wrong rebase as noted by Samuel
>>> > > > >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
>>> > > > >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
>>> > > > >
>>> > > > > Changes in v2:
>>> > > > >   - Rebase onto for-next
>>> > > > >   - Fix KASAN
>>> > > > >   - Fix stack canary
>>> > > > >   - Get completely rid of MAXPHYSMEM configs
>>> > > > >   - Add documentation
>>> > > > >
>>> > > > > Alexandre Ghiti (13):
>>> > > > >   riscv: Move KASAN mapping next to the kernel mapping
>>> > > > >   riscv: Split early kasan mapping to prepare sv48 introduction
>>> > > > >   riscv: Introduce functions to switch pt_ops
>>> > > > >   riscv: Allow to dynamically define VA_BITS
>>> > > > >   riscv: Get rid of MAXPHYSMEM configs
>>> > > > >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
* Re: [PATCH v3 00/13] Introduce sv48 support without relocatable kernel
@ 2022-06-02  3:43               ` Palmer Dabbelt
  0 siblings, 0 replies; 70+ messages in thread
From: Palmer Dabbelt @ 2022-06-02  3:43 UTC (permalink / raw)
  To: alexandre.ghiti
  Cc: corbet, Paul Walmsley, aou, zong.li, anup, Atish.Patra,
	Christoph Hellwig, ryabinin.a.a, glider, andreyknvl, dvyukov,
	ardb, Arnd Bergmann, keescook, guoren, heinrich.schuchardt,
	mchitale, panqinglin2020, linux-doc, linux-riscv, linux-kernel,
	kasan-dev, linux-efi, linux-arch

On Fri, 22 Apr 2022 18:50:47 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 01 Apr 2022 05:56:30 PDT (-0700), alexandre.ghiti@canonical.com wrote:
>> On Fri, Feb 18, 2022 at 11:45 AM Alexandre Ghiti
>> <alexandre.ghiti@canonical.com> wrote:
>>>
>>> Hi Palmer,
>>>
>>> On Thu, Jan 20, 2022 at 11:05 AM Alexandre Ghiti
>>> <alexandre.ghiti@canonical.com> wrote:
>>> >
>>> > On Thu, Jan 20, 2022 at 8:30 AM Alexandre Ghiti
>>> > <alexandre.ghiti@canonical.com> wrote:
>>> > >
>>> > > On Thu, Jan 20, 2022 at 5:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>> > > >
>>> > > > On Mon, 06 Dec 2021 02:46:44 PST (-0800), alexandre.ghiti@canonical.com wrote:
>>> > > > > * Please note notable changes in memory layouts and kasan population *
>>> > > > >
>>> > > > > This patchset makes it possible to have a single kernel for sv39 and sv48
>>> > > > > without it being relocatable.
>>> > > > >
>>> > > > > The idea comes from Arnd Bergmann, who suggested doing the same as x86,
>>> > > > > that is, mapping the kernel at the end of the address space. This allows
>>> > > > > the kernel to be linked at the same address for both sv39 and sv48 and
>>> > > > > therefore removes the need for relocation at runtime.
>>> > > > >
>>> > > > > This implements sv48 support at runtime. The kernel will try to boot
>>> > > > > with a 4-level page table and will fall back to 3 levels if the hardware
>>> > > > > does not support it. Folding the 4th level into a 3-level page table has
>>> > > > > almost no cost at runtime.
>>> > > > >
>>> > > > > Note that the KASAN region had to be moved to the end of the address space
>>> > > > > since its location must be known at compile time and must then be valid for
>>> > > > > both sv39 and sv48 (and the upcoming sv57).
>>> > > > >
>>> > > > > Tested on:
>>> > > > >   - qemu rv64 sv39: OK
>>> > > > >   - qemu rv64 sv48: OK
>>> > > > >   - qemu rv64 sv39 + kasan: OK
>>> > > > >   - qemu rv64 sv48 + kasan: OK
>>> > > > >   - qemu rv32: OK
>>> > > > >
>>> > > > > Changes in v3:
>>> > > > >   - Fix SZ_1T, thanks to Atish
>>> > > > >   - Fix warning create_pud_mapping, thanks to Atish
>>> > > > >   - Fix k210 nommu build, thanks to Atish
>>> > > > >   - Fix wrong rebase as noted by Samuel
>>> > > > >   - * Downgrade to sv39 is only possible if !KASAN (see commit changelog) *
>>> > > > >   - * Move KASAN next to the kernel: virtual layouts changed and kasan population *
>>> > > > >
>>> > > > > Changes in v2:
>>> > > > >   - Rebase onto for-next
>>> > > > >   - Fix KASAN
>>> > > > >   - Fix stack canary
>>> > > > >   - Get completely rid of MAXPHYSMEM configs
>>> > > > >   - Add documentation
>>> > > > >
>>> > > > > Alexandre Ghiti (13):
>>> > > > >   riscv: Move KASAN mapping next to the kernel mapping
>>> > > > >   riscv: Split early kasan mapping to prepare sv48 introduction
>>> > > > >   riscv: Introduce functions to switch pt_ops
>>> > > > >   riscv: Allow to dynamically define VA_BITS
>>> > > > >   riscv: Get rid of MAXPHYSMEM configs
>>> > > > >   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
>>> > > > >   riscv: Implement sv48 support
>>> > > > >   riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
>>> > > > >   riscv: Explicit comment about user virtual address space size
>>> > > > >   riscv: Improve virtual kernel memory layout dump
>>> > > > >   Documentation: riscv: Add sv48 description to VM layout
>>> > > > >   riscv: Initialize thread pointer before calling C functions
>>> > > > >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>>> > > > >
>>> > > > >  Documentation/riscv/vm-layout.rst             |  48 ++-
>>> > > > >  arch/riscv/Kconfig                            |  37 +-
>>> > > > >  arch/riscv/configs/nommu_k210_defconfig       |   1 -
>>> > > > >  .../riscv/configs/nommu_k210_sdcard_defconfig |   1 -
>>> > > > >  arch/riscv/configs/nommu_virt_defconfig       |   1 -
>>> > > > >  arch/riscv/include/asm/csr.h                  |   3 +-
>>> > > > >  arch/riscv/include/asm/fixmap.h               |   1
>>> > > > >  arch/riscv/include/asm/kasan.h                |  11 +-
>>> > > > >  arch/riscv/include/asm/page.h                 |  20 +-
>>> > > > >  arch/riscv/include/asm/pgalloc.h              |  40 ++
>>> > > > >  arch/riscv/include/asm/pgtable-64.h           | 108 ++++-
>>> > > > >  arch/riscv/include/asm/pgtable.h              |  47 +-
>>> > > > >  arch/riscv/include/asm/sparsemem.h            |   6 +-
>>> > > > >  arch/riscv/kernel/cpu.c                       |  23 +-
>>> > > > >  arch/riscv/kernel/head.S                      |   4 +-
>>> > > > >  arch/riscv/mm/context.c                       |   4 +-
>>> > > > >  arch/riscv/mm/init.c                          | 408 ++++++++++++++----
>>> > > > >  arch/riscv/mm/kasan_init.c                    | 250 ++++++++---
>>> > > > >  drivers/firmware/efi/libstub/efi-stub.c       |   2
>>> > > > >  drivers/pci/controller/pci-xgene.c            |   2 +-
>>> > > > >  include/asm-generic/pgalloc.h                 |  24 +-
>>> > > > >  include/linux/sizes.h                         |   1
>>> > > > >  22 files changed, 833 insertions(+), 209 deletions(-)
>>> > > >
>>> > > > Sorry this took a while.  This is on for-next, with a bit of juggling: a
>>> > > > handful of trivial fixes for configs that were failing to build/boot and
>>> > > > some merge issues.  I also pulled out that MAXPHYSMEM fix to the top, so
>>> > > > it'd be easier to backport.  This is bigger than something I'd normally like to
>>> > > > take late in the cycle, but given there's a lot of cleanups, likely some fixes,
>>> > > > and it looks like folks have been testing this I'm just going to go with it.
>>> > > >
>>> > >
>>> > > Yes yes yes! That's fantastic news :)
>>> > >
>>> > > > Let me know if there are any issues with the merge; it was a bit hairy.
>>> > > > Probably best to just send along a fixup patch at this point.
>>> > >
>>> > > I'm going to take a look at that now, and I'll fix anything that comes
>>> > > up quickly :)
>>> >
>>> > I see in for-next that you did not take the following patches:
>>> >
>>> >   riscv: Improve virtual kernel memory layout dump
>>> >   Documentation: riscv: Add sv48 description to VM layout
>>> >   riscv: Initialize thread pointer before calling C functions
>>> >   riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN
>>> >
>>> > I'm not sure this was your intention. If it was, I believe that at
>>> > least the first 2 patches are needed in this series, the 3rd one is a
>>> > useful fix and we can discuss the 4th if that's an issue for you.
>>>
>>> Can you confirm that this was intentional and maybe explain the
>>> motivation behind it? Because I see value in those patches.
>>
>> Palmer,
>>
>> I read that you were still taking patches for 5.18, so I confirm again
>> that the patches above are needed IMO.
>
> It was too late for this when it was sent (I saw it then, but just got
> around to actually doing the work to sort it out).
>
> It took me a while to figure out exactly what was going on here, but I
> think I remember now: that downgrade patch (and the follow-on I just
> sent) is broken for medlow, because mm/init.c must be built medany
> (which we're using for the mostly-PIC qualities).  I remember being in
> the middle of rebasing/debugging this a while ago, I must have forgotten
> I was in the middle of that and accidentally merged the branch as-is.
> Certainly wasn't trying to silently take half the patch set and leave
> the rest in limbo, that's the wrong way to do things.
>
> I'm not sure what the right answer is here, but I just sent a patch to
> drop support for medlow.  We'll have to talk about that; for now I
> cleaned up some other minor issues, rearranged the docs and the fix to come
> first, and put this at palmer/riscv-sv48.  I think it's reasonable to
> take the doc and the fix into fixes, then the dump improvement
> on for-next.  We'll have to see what folks think about medany-only
> kernels; the other option would be to build the FDT code as medany, which seems a
> bit awkward.

All but the last one are on for-next, there's some discussion on that 
last one that pointed out some better ways to do it.

>
>> Maybe even the relocatable series?
>
> Do you mind giving me a pointer?  I'm not sure why I'm so drop-prone
> with your patches, I promise I'm not doing it on purpose.
>
>>
>> Thanks,
>>
>> Alex
>>
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>> >
>>> > I tested for-next on both sv39 and sv48 successfully, I took a glance
>>> > at the code and noticed you fixed the PTRS_PER_PGD error, thanks for
>>> > that. Otherwise nothing obvious has popped up.
>>> >
>>> > Thanks again,
>>> >
>>> > Alex
>>> >
>>> > >
>>> > > Thanks!
>>> > >
>>> > > Alex
>>> > >
>>> > > >
>>> > > > Thanks!

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2022-06-02  3:44 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-06 10:46 [PATCH v3 00/13] Introduce sv48 support without relocatable kernel Alexandre Ghiti
2021-12-06 10:46 ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 01/13] riscv: Move KASAN mapping next to the kernel mapping Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 16:18   ` Jisheng Zhang
2021-12-06 16:18     ` Jisheng Zhang
2021-12-06 10:46 ` [PATCH v3 02/13] riscv: Split early kasan mapping to prepare sv48 introduction Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 03/13] riscv: Introduce functions to switch pt_ops Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 04/13] riscv: Allow to dynamically define VA_BITS Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 05/13] riscv: Get rid of MAXPHYSMEM configs Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 06/13] asm-generic: Prepare for riscv use of pud_alloc_one and pud_free Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 07/13] riscv: Implement sv48 support Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 11:05   ` Alexandre ghiti
2021-12-06 11:05     ` Alexandre ghiti
2021-12-09  4:32     ` 潘庆霖
2021-12-09  4:32       ` 潘庆霖
2021-12-26  8:59   ` Jisheng Zhang
2021-12-26  8:59     ` Jisheng Zhang
2022-01-04 12:44     ` Alexandre Ghiti
2022-01-04 12:44       ` Alexandre Ghiti
2021-12-29  3:42   ` Guo Ren
2021-12-29  3:42     ` Guo Ren
2022-01-04 12:42     ` Alexandre Ghiti
2022-01-04 12:42       ` Alexandre Ghiti
2022-04-26  5:57   ` Nick Kossifidis
2022-04-26  5:57     ` Nick Kossifidis
2021-12-06 10:46 ` [PATCH v3 08/13] riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 09/13] riscv: Explicit comment about user virtual address space size Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 10/13] riscv: Improve virtual kernel memory layout dump Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-09  4:18   ` 潘庆霖
2021-12-09  9:09     ` Alexandre ghiti
2021-12-06 10:46 ` [PATCH v3 11/13] Documentation: riscv: Add sv48 description to VM layout Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 10:46 ` [PATCH v3 12/13] riscv: Initialize thread pointer before calling C functions Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-20  9:11   ` Guo Ren
2021-12-20  9:11     ` Guo Ren
2021-12-20  9:17     ` Ard Biesheuvel
2021-12-20  9:17       ` Ard Biesheuvel
2021-12-20 13:40       ` Guo Ren
2021-12-20 13:40         ` Guo Ren
2022-01-10  8:03   ` Alexandre ghiti
2022-01-10  8:03     ` Alexandre ghiti
2021-12-06 10:46 ` [PATCH v3 13/13] riscv: Allow user to downgrade to sv39 when hw supports sv48 if !KASAN Alexandre Ghiti
2021-12-06 10:46   ` Alexandre Ghiti
2021-12-06 11:08 ` [PATCH v3 00/13] Introduce sv48 support without relocatable kernel Alexandre ghiti
2021-12-06 11:08   ` Alexandre ghiti
2022-01-20  4:18 ` Palmer Dabbelt
2022-01-20  4:18   ` Palmer Dabbelt
2022-01-20  7:30   ` Alexandre Ghiti
2022-01-20  7:30     ` Alexandre Ghiti
2022-01-20 10:05     ` Alexandre Ghiti
2022-01-20 10:05       ` Alexandre Ghiti
2022-02-18 10:45       ` Alexandre Ghiti
2022-02-18 10:45         ` Alexandre Ghiti
2022-04-01 12:56         ` Alexandre Ghiti
2022-04-01 12:56           ` Alexandre Ghiti
2022-04-23  1:50           ` Palmer Dabbelt
2022-04-23  1:50             ` Palmer Dabbelt
2022-06-02  3:43             ` Palmer Dabbelt
2022-06-02  3:43               ` Palmer Dabbelt
