* [PATCH v5 00/13] xen/arm: Split MMU code as the preparation of MPU work
@ 2023-08-14  4:25 Henry Wang
  2023-08-14  4:25 ` [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm() Henry Wang
                   ` (12 more replies)
  0 siblings, 13 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Wei Chen, Penny Zheng, Volodymyr Babchuk, Jan Beulich,
	Paul Durrant, Roger Pau Monné

Based on the discussion at the Xen Summit [1], this series is sent out after
addressing the comments on v4 [2], as the preparation work for adding MPU
support. The full series providing single-core MPU support able to boot a
Linux guest can be found in [3]; it passed the GitLab CI checks in [4].

This is mostly code movement, with some Kconfig and build system (mainly
Makefile) adjustments. No functional change is expected.

This series is based on:
a9a3b432a8 x86: adjust comparison for earlier signedness change

[1] https://lore.kernel.org/xen-devel/AS8PR08MB799122F8B0CB841DED64F4819226A@AS8PR08MB7991.eurprd08.prod.outlook.com/
[2] https://lore.kernel.org/xen-devel/20230801034419.2047541-1-Henry.Wang@arm.com/
[3] https://gitlab.com/xen-project/people/henryw/xen/-/commits/mpu_v5/
[4] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/966450933

Henry Wang (4):
  xen/arm: Introduce CONFIG_MMU Kconfig option
  xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  xen/arm64: Fold setup_fixmap() to create_page_tables()
  xen/arm: Extract MMU-specific code

Penny Zheng (6):
  xen/arm: Fold pmap and fixmap into MMU system
  xen/arm: mm: Use generic variable/function names for extendability
  xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c
  xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  xen/arm: mmu: relocate copy_from_paddr() to setup.c
  xen/arm: mmu: enable SMMU subsystem only in MMU

Wei Chen (3):
  xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm()
  xen/arm64: prepare for moving MMU related code from head.S
  xen/arm: Move MMU related definitions from config.h to mmu/layout.h

 xen/arch/arm/Kconfig                    |    5 +-
 xen/arch/arm/Makefile                   |    1 +
 xen/arch/arm/arm32/head.S               |    4 +-
 xen/arch/arm/arm64/Makefile             |    2 +-
 xen/arch/arm/arm64/head.S               |  496 +------
 xen/arch/arm/arm64/mmu/Makefile         |    2 +
 xen/arch/arm/arm64/mmu/head.S           |  460 ++++++
 xen/arch/arm/arm64/{ => mmu}/mm.c       |   11 +-
 xen/arch/arm/arm64/smpboot.c            |    6 +-
 xen/arch/arm/include/asm/arm64/macros.h |   36 +
 xen/arch/arm/include/asm/arm64/mm.h     |    7 +-
 xen/arch/arm/include/asm/config.h       |  132 +-
 xen/arch/arm/include/asm/fixmap.h       |    7 +-
 xen/arch/arm/include/asm/mm.h           |   30 +-
 xen/arch/arm/include/asm/mmu/layout.h   |  146 ++
 xen/arch/arm/include/asm/mmu/mm.h       |   55 +
 xen/arch/arm/include/asm/mmu/p2m.h      |   18 +
 xen/arch/arm/include/asm/p2m.h          |   33 +-
 xen/arch/arm/include/asm/page.h         |   15 -
 xen/arch/arm/include/asm/setup.h        |    8 +-
 xen/arch/arm/kernel.c                   |   27 -
 xen/arch/arm/mm.c                       | 1119 --------------
 xen/arch/arm/mmu/Makefile               |    3 +
 xen/arch/arm/mmu/mm.c                   | 1153 +++++++++++++++
 xen/arch/arm/mmu/p2m.c                  | 1610 ++++++++++++++++++++
 xen/arch/arm/mmu/setup.c                |  366 +++++
 xen/arch/arm/p2m.c                      | 1772 ++---------------------
 xen/arch/arm/setup.c                    |  326 +----
 xen/arch/arm/smpboot.c                  |    4 +-
 xen/arch/arm/xen.lds.S                  |    1 +
 xen/drivers/passthrough/Kconfig         |    3 +-
 31 files changed, 4042 insertions(+), 3816 deletions(-)
 create mode 100644 xen/arch/arm/arm64/mmu/Makefile
 create mode 100644 xen/arch/arm/arm64/mmu/head.S
 rename xen/arch/arm/arm64/{ => mmu}/mm.c (95%)
 create mode 100644 xen/arch/arm/include/asm/mmu/layout.h
 create mode 100644 xen/arch/arm/include/asm/mmu/mm.h
 create mode 100644 xen/arch/arm/include/asm/mmu/p2m.h
 create mode 100644 xen/arch/arm/mmu/Makefile
 create mode 100644 xen/arch/arm/mmu/mm.c
 create mode 100644 xen/arch/arm/mmu/p2m.c
 create mode 100644 xen/arch/arm/mmu/setup.c

-- 
2.25.1




* [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm()
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the preparation of MPU work Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21  8:33   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option Henry Wang
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Henry Wang

From: Wei Chen <wei.chen@arm.com>

At the moment, on MMU systems, enable_mmu() returns to an address in
the 1:1 mapping, and each boot path is then responsible for switching
to the runtime virtual mapping. Afterwards, remove_identity_mapping()
is called on the boot CPU to remove all 1:1 mappings.

Since remove_identity_mapping() is not necessary on non-MMU systems,
and to avoid creating empty stub functions for them while keeping a
single code flow in arm64/head.S, fold the mapping switch and
remove_identity_mapping() into the MMU-enabling path on MMU systems.

As remove_identity_mapping() should only be called on the boot CPU,
introduce enable_boot_cpu_mm() for the boot CPU and
enable_secondary_cpu_mm() for secondary CPUs in this patch.
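
For illustration, a minimal sketch of the resulting calling convention
(mirroring the hunks in this patch): the caller loads the runtime virtual
return address into LR and branches to the helper, which enables the MMU
(and, on the boot CPU, removes the 1:1 map) before returning to LR:

        /* Boot CPU */
        ldr   lr, =primary_switched     /* runtime VA to return to */
        b     enable_boot_cpu_mm
primary_switched:
        /* running on the runtime virtual mapping from here on */

        /* Secondary CPUs */
        ldr   lr, =secondary_switched
        b     enable_secondary_cpu_mm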

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- Add missing "()" in title.
- Use more generic comment in enable_{boot,secondary}_cpu_mm() to
  mention function will return to the vaddr requested by the caller.
- Move 'mov lr, x5' closer to 'b remove_identity_mapping'.
- Drop the 'b fail' for unreachable code in enable_boot_cpu_mm().
v4:
- Clarify remove_identity_mapping() is called on boot CPU and keep
  the function/proc format consistent in commit msg.
- Drop inaccurate (due to the refactor) in-code comment.
- Rename enable_{boot,runtime}_mmu to enable_{boot,secondary}_cpu_mm.
- Reword the in-code comment on top of enable_{boot,secondary}_cpu_mm.
- Call "fail" for unreachable code.
v3:
- new patch
---
 xen/arch/arm/arm64/head.S | 83 ++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 8bca9afa27..2bc2a03565 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -325,21 +325,11 @@ real_start_efi:
 
         bl    check_cpu_mode
         bl    cpu_init
-        bl    create_page_tables
-        load_paddr x0, boot_pgtable
-        bl    enable_mmu
 
-        /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
-        ldr   x0, =primary_switched
-        br    x0
+        ldr   lr, =primary_switched
+        b     enable_boot_cpu_mm
+
 primary_switched:
-        /*
-         * The 1:1 map may clash with other parts of the Xen virtual memory
-         * layout. As it is not used anymore, remove it completely to
-         * avoid having to worry about replacing existing mapping
-         * afterwards.
-         */
-        bl    remove_identity_mapping
         bl    setup_fixmap
 #ifdef CONFIG_EARLY_PRINTK
         /* Use a virtual address to access the UART. */
@@ -384,13 +374,10 @@ GLOBAL(init_secondary)
 #endif
         bl    check_cpu_mode
         bl    cpu_init
-        load_paddr x0, init_ttbr
-        ldr   x0, [x0]
-        bl    enable_mmu
 
-        /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
-        ldr   x0, =secondary_switched
-        br    x0
+        ldr   lr, =secondary_switched
+        b     enable_secondary_cpu_mm
+
 secondary_switched:
 #ifdef CONFIG_EARLY_PRINTK
         /* Use a virtual address to access the UART. */
@@ -748,6 +735,64 @@ enable_mmu:
         ret
 ENDPROC(enable_mmu)
 
+/*
+ * Enable mm (turn on the data cache and the MMU) for secondary CPUs.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+enable_secondary_cpu_mm:
+        mov   x5, lr
+
+        load_paddr x0, init_ttbr
+        ldr   x0, [x0]
+
+        bl    enable_mmu
+        mov   lr, x5
+
+        /* Return to the virtual address requested by the caller. */
+        ret
+ENDPROC(enable_secondary_cpu_mm)
+
+/*
+ * Enable mm (turn on the data cache and the MMU) for the boot CPU.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+enable_boot_cpu_mm:
+        mov   x5, lr
+
+        bl    create_page_tables
+        load_paddr x0, boot_pgtable
+
+        bl    enable_mmu
+
+        /*
+         * The MMU is turned on and we are in the 1:1 mapping. Switch
+         * to the runtime mapping.
+         */
+        ldr   x0, =1f
+        br    x0
+1:
+        mov   lr, x5
+        /*
+         * The 1:1 map may clash with other parts of the Xen virtual memory
+         * layout. As it is not used anymore, remove it completely to avoid
+         * having to worry about replacing existing mapping afterwards.
+         * Function will return to the virtual address requested by the caller.
+         */
+        b     remove_identity_mapping
+ENDPROC(enable_boot_cpu_mm)
+
 /*
  * Remove the 1:1 map from the page-tables. It is not easy to keep track
  * where the 1:1 map was mapped, so we will look for the top-level entry
-- 
2.25.1




* [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the preparation of MPU work Henry Wang
  2023-08-14  4:25 ` [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm() Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21  8:43   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S Henry Wang
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Wei Chen, Penny Zheng, Volodymyr Babchuk, Julien Grall

There are two types of memory system architecture available for
Arm-based systems, namely the Virtual Memory System Architecture (VMSA)
and the Protected Memory System Architecture (PMSA). According to
ARM DDI 0487G.a, a VMSA provides a Memory Management Unit (MMU) that
controls address translation, access permissions, and memory attribute
determination and checking for memory accesses made by the PE. And
according to ARM DDI 0600A.c, the PMSA supports a unified memory
protection scheme where a Memory Protection Unit (MPU) manages
instruction and data access. Currently, Xen only supports the VMSA.

Introduce a Kconfig option CONFIG_MMU, which currently defaults to y
and is unselectable because only the VMSA is supported.
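
As an illustration (simplified from later patches in this series), build
rules and headers can then be keyed off this option, e.g.:

    # xen/arch/arm/arm64/Makefile
    obj-$(CONFIG_MMU) += mmu/

    /* xen/arch/arm/include/asm/config.h */
    #ifdef CONFIG_MMU
    #include <asm/mmu/layout.h>
    #endif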

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- Only introduce the unselectable CONFIG_MMU, add the 'choice' in
  future commits.
v4:
- Completely rework "[v3,06/52] xen/arm: introduce CONFIG_HAS_MMU"
---
 xen/arch/arm/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 57bd1d01d7..eb0413336b 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -59,6 +59,9 @@ config PADDR_BITS
 	default 40 if ARM_PA_BITS_40
 	default 48 if ARM_64
 
+config MMU
+	def_bool y
+
 source "arch/Kconfig"
 
 config ACPI
-- 
2.25.1




* [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the preparation of MPU work Henry Wang
  2023-08-14  4:25 ` [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm() Henry Wang
  2023-08-14  4:25 ` [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21  8:44   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S Henry Wang
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Henry Wang, Ayan Kumar Halder,
	Julien Grall

From: Wei Chen <wei.chen@arm.com>

We want to reuse head.S for MPU systems, but some of its code is
implemented for MMU systems only. That code will be moved to a
separate MMU-specific file. Before doing so, this patch makes a few
preparatory fixes so the later movement is easier to review:
1. Fix the indentation and incorrect style of code comments.
2. Fix the indentation of the .text.header section.
3. Rename puts() to asm_puts() for global export (see the sketch after
   this list).
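
As a sketch of what the rename enables, the early-printk PRINT macro
(moved into macros.h by a later patch of this series) can branch to the
exported symbol from outside head.S:

    #define PRINT(_s)          \
            mov   x3, lr ;     \
            adr_l x0, 98f ;    \
            bl    asm_puts ;   \
            mov   lr, x3 ;     \
            RODATA_STR(98, _s)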

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
---
v5:
- Use "must" in the code comment for asm_puts() to note that
  this function must be called from assembly.
- Add the Reviewed-by tags from Ayan and Julien.
v4:
- Rebase to pick the adr -> adr_l change in PRINT(_s).
- Correct in-code comment for asm_puts() and add a note to
  mention that asm_puts() should be only called from assembly.
- Drop redundant puts (now asm_puts) under CONFIG_EARLY_PRINTK.
v3:
-  fix commit message
-  Rename puts() to asm_puts() for global export
v2:
-  New patch.
---
 xen/arch/arm/arm64/head.S | 46 ++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 2bc2a03565..f25a41d36c 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -94,7 +94,7 @@
 #define PRINT(_s)          \
         mov   x3, lr ;     \
         adr_l x0, 98f ;    \
-        bl    puts    ;    \
+        bl    asm_puts ;   \
         mov   lr, x3 ;     \
         RODATA_STR(98, _s)
 
@@ -148,21 +148,21 @@
         isb
 .endm
 
-        .section .text.header, "ax", %progbits
-        /*.aarch64*/
+.section .text.header, "ax", %progbits
+/*.aarch64*/
 
-        /*
-         * Kernel startup entry point.
-         * ---------------------------
-         *
-         * The requirements are:
-         *   MMU = off, D-cache = off, I-cache = on or off,
-         *   x0 = physical address to the FDT blob.
-         *
-         * This must be the very first address in the loaded image.
-         * It should be linked at XEN_VIRT_START, and loaded at any
-         * 4K-aligned address.
-         */
+/*
+ * Kernel startup entry point.
+ * ---------------------------
+ *
+ * The requirements are:
+ *   MMU = off, D-cache = off, I-cache = on or off,
+ *   x0 = physical address to the FDT blob.
+ *
+ * This must be the very first address in the loaded image.
+ * It should be linked at XEN_VIRT_START, and loaded at any
+ * 4K-aligned address.
+ */
 
 GLOBAL(start)
         /*
@@ -547,7 +547,7 @@ ENDPROC(cpu_init)
  * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
  * level table (i.e page granularity) is supported.
  *
- * ptbl:     table symbol where the entry will be created
+ * ptbl:    table symbol where the entry will be created
  * virt:    virtual address
  * phys:    physical address (should be page aligned)
  * tmp1:    scratch register
@@ -965,19 +965,22 @@ init_uart:
         ret
 ENDPROC(init_uart)
 
-/* Print early debug messages.
+/*
+ * Print early debug messages.
+ * Note: This function must be called from assembly.
  * x0: Nul-terminated string to print.
  * x23: Early UART base address
- * Clobbers x0-x1 */
-puts:
+ * Clobbers x0-x1
+ */
+ENTRY(asm_puts)
         early_uart_ready x23, 1
         ldrb  w1, [x0], #1           /* Load next char */
         cbz   w1, 1f                 /* Exit on nul */
         early_uart_transmit x23, w1
-        b     puts
+        b     asm_puts
 1:
         ret
-ENDPROC(puts)
+ENDPROC(asm_puts)
 
 /*
  * Print a 64-bit number in hex.
@@ -1007,7 +1010,6 @@ hex:    .ascii "0123456789abcdef"
 
 ENTRY(early_puts)
 init_uart:
-puts:
 putn:   ret
 
 #endif /* !CONFIG_EARLY_PRINTK */
-- 
2.25.1




* [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the preparation of MPU work Henry Wang
                   ` (2 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21  9:18   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 05/13] xen/arm: Move MMU related definitions from config.h to mmu/layout.h Henry Wang
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Penny Zheng, Volodymyr Babchuk, Wei Chen

The MMU-specific code in head.S will not be used on MPU systems.
Instead of introducing more #ifdefs, which would add complexity to the
code, move the MMU-related code to mmu/head.S and keep the common code
in head.S. Two notes on the move:
- As "fail" in the original head.S is very simple and its name is prone
  to clashes, duplicate it in mmu/head.S instead of exporting it.
- Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
  setup_fixmap so the symbols remain visible outside mmu/head.S after
  the code movement.

Also move the assembly macros shared by head.S and mmu/head.S to
macros.h.

Note that only the first 4KB of the Xen image will be identity mapped
(PA == VA). At the moment, Xen guarantees this by placing everything
that needs to be used in the identity mapping in the .text.header
section of head.S, and by checking at link time, via _idmap_start and
_idmap_end, that it fits in 4KB. Since this patch introduces a new
head.S, we could add .text.header to the new file to keep all identity
map code in the first 4KB, but the order of the two files within that
4KB would then depend on the build toolchain. Hence, introduce a new
section named .text.idmap in the region between _idmap_start and
_idmap_end, and force the .text.idmap contents to be linked after
.text.header in the Xen linker script. This ensures the code of head.S
is always at the top of the Xen binary.
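
For reference, a sketch of the resulting ordering in xen.lds.S (the
existing link-time size check between _idmap_start and _idmap_end is
unchanged):

       _idmap_start = .;
       *(.text.header)         /* head.S: entry code, always first */
       *(.text.idmap)          /* mmu/head.S: identity-mapped MMU code */
       _idmap_end = .;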

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v5:
- Rebase on top of commit
  "xen/arm64: head: Introduce a helper to flush local TLBs".
v4:
- Rework "[v3,08/52] xen/arm64: move MMU related code from
  head.S to mmu/head.S"
- Don't move the "yet to shared" macro such as print_reg.
- Fold "[v3,04/52] xen/arm: add .text.idmap in ld script for Xen
  identity map sections" to this patch. Rework commit msg.
---
 xen/arch/arm/arm64/Makefile             |   1 +
 xen/arch/arm/arm64/head.S               | 492 +-----------------------
 xen/arch/arm/arm64/mmu/Makefile         |   1 +
 xen/arch/arm/arm64/mmu/head.S           | 488 +++++++++++++++++++++++
 xen/arch/arm/include/asm/arm64/macros.h |  36 ++
 xen/arch/arm/xen.lds.S                  |   1 +
 6 files changed, 528 insertions(+), 491 deletions(-)
 create mode 100644 xen/arch/arm/arm64/mmu/Makefile
 create mode 100644 xen/arch/arm/arm64/mmu/head.S

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 54ad55c75c..f89d5fb4fb 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -1,4 +1,5 @@
 obj-y += lib/
+obj-$(CONFIG_MMU) += mmu/
 
 obj-y += cache.o
 obj-y += cpufeature.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index f25a41d36c..3c8a12eda7 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -28,17 +28,6 @@
 #include <asm/arm64/efibind.h>
 #endif
 
-#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
-#define PT_MEM_L3 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
-#define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
-
-/* Convenience defines to get slot used by Xen mapping. */
-#define XEN_ZEROETH_SLOT    zeroeth_table_offset(XEN_VIRT_START)
-#define XEN_FIRST_SLOT      first_table_offset(XEN_VIRT_START)
-#define XEN_SECOND_SLOT     second_table_offset(XEN_VIRT_START)
-
 #define __HEAD_FLAG_PAGE_SIZE   ((PAGE_SHIFT - 10) / 2)
 
 #define __HEAD_FLAG_PHYS_BASE   1
@@ -85,19 +74,7 @@
  *  x30 - lr
  */
 
-#ifdef CONFIG_EARLY_PRINTK
-/*
- * Macro to print a string to the UART, if there is one.
- *
- * Clobbers x0 - x3
- */
-#define PRINT(_s)          \
-        mov   x3, lr ;     \
-        adr_l x0, 98f ;    \
-        bl    asm_puts ;   \
-        mov   lr, x3 ;     \
-        RODATA_STR(98, _s)
-
+ #ifdef CONFIG_EARLY_PRINTK
 /*
  * Macro to print the value of register \xb
  *
@@ -111,43 +88,11 @@
 .endm
 
 #else /* CONFIG_EARLY_PRINTK */
-#define PRINT(s)
-
 .macro print_reg xb
 .endm
 
 #endif /* !CONFIG_EARLY_PRINTK */
 
-/*
- * Pseudo-op for PC relative adr <reg>, <symbol> where <symbol> is
- * within the range +/- 4GB of the PC.
- *
- * @dst: destination register (64 bit wide)
- * @sym: name of the symbol
- */
-.macro  adr_l, dst, sym
-        adrp \dst, \sym
-        add  \dst, \dst, :lo12:\sym
-.endm
-
-/* Load the physical address of a symbol into xb */
-.macro load_paddr xb, sym
-        ldr \xb, =\sym
-        add \xb, \xb, x20
-.endm
-
-/*
- * Flush local TLBs
- *
- * See asm/arm64/flushtlb.h for the explanation of the sequence.
- */
-.macro flush_xen_tlb_local
-        dsb   nshst
-        tlbi  alle2
-        dsb   nsh
-        isb
-.endm
-
 .section .text.header, "ax", %progbits
 /*.aarch64*/
 
@@ -484,402 +429,6 @@ cpu_init:
         ret
 ENDPROC(cpu_init)
 
-/*
- * Macro to find the slot number at a given page-table level
- *
- * slot:     slot computed
- * virt:     virtual address
- * lvl:      page-table level
- */
-.macro get_table_slot, slot, virt, lvl
-        ubfx  \slot, \virt, #XEN_PT_LEVEL_SHIFT(\lvl), #XEN_PT_LPAE_SHIFT
-.endm
-
-/*
- * Macro to create a page table entry in \ptbl to \tbl
- * ptbl:    table symbol where the entry will be created
- * tbl:     physical address of the table to point to
- * virt:    virtual address
- * lvl:     page-table level
- * tmp1:    scratch register
- * tmp2:    scratch register
- *
- * Preserves \virt
- * Clobbers \tbl, \tmp1, \tmp2
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro create_table_entry_from_paddr, ptbl, tbl, virt, lvl, tmp1, tmp2
-        get_table_slot \tmp1, \virt, \lvl   /* \tmp1 := slot in \tbl */
-
-        mov   \tmp2, #PT_PT                 /* \tmp2 := right for linear PT */
-        orr   \tmp2, \tmp2, \tbl            /*          + \tbl */
-
-        adr_l \tbl, \ptbl                   /* \tbl := address(\ptbl) */
-
-        str   \tmp2, [\tbl, \tmp1, lsl #3]
-.endm
-
-/*
- * Macro to create a page table entry in \ptbl to \tbl
- *
- * ptbl:    table symbol where the entry will be created
- * tbl:     table symbol to point to
- * virt:    virtual address
- * lvl:     page-table level
- * tmp1:    scratch register
- * tmp2:    scratch register
- * tmp3:    scratch register
- *
- * Preserves \virt
- * Clobbers \tmp1, \tmp2, \tmp3
- *
- * Also use x20 for the phys offset.
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro create_table_entry, ptbl, tbl, virt, lvl, tmp1, tmp2, tmp3
-        load_paddr \tmp1, \tbl
-        create_table_entry_from_paddr \ptbl, \tmp1, \virt, \lvl, \tmp2, \tmp3
-.endm
-
-/*
- * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
- * level table (i.e page granularity) is supported.
- *
- * ptbl:    table symbol where the entry will be created
- * virt:    virtual address
- * phys:    physical address (should be page aligned)
- * tmp1:    scratch register
- * tmp2:    scratch register
- * tmp3:    scratch register
- * type:    mapping type. If not specified it will be normal memory (PT_MEM_L3)
- *
- * Preserves \virt, \phys
- * Clobbers \tmp1, \tmp2, \tmp3
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro create_mapping_entry, ptbl, virt, phys, tmp1, tmp2, tmp3, type=PT_MEM_L3
-        and   \tmp3, \phys, #THIRD_MASK     /* \tmp3 := PAGE_ALIGNED(phys) */
-
-        get_table_slot \tmp1, \virt, 3      /* \tmp1 := slot in \tlb */
-
-        mov   \tmp2, #\type                 /* \tmp2 := right for section PT */
-        orr   \tmp2, \tmp2, \tmp3           /*          + PAGE_ALIGNED(phys) */
-
-        adr_l \tmp3, \ptbl
-
-        str   \tmp2, [\tmp3, \tmp1, lsl #3]
-.endm
-
-/*
- * Rebuild the boot pagetable's first-level entries. The structure
- * is described in mm.c.
- *
- * After the CPU enables paging it will add the fixmap mapping
- * to these page tables, however this may clash with the 1:1
- * mapping. So each CPU must rebuild the page tables here with
- * the 1:1 in place.
- *
- * Inputs:
- *   x19: paddr(start)
- *   x20: phys offset
- *
- * Clobbers x0 - x4
- */
-create_page_tables:
-        /* Prepare the page-tables for mapping Xen */
-        ldr   x0, =XEN_VIRT_START
-        create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
-        create_table_entry boot_first, boot_second, x0, 1, x1, x2, x3
-
-        /*
-         * We need to use a stash register because
-         * create_table_entry_paddr() will clobber the register storing
-         * the physical address of the table to point to.
-         */
-        load_paddr x4, boot_third
-        ldr   x1, =XEN_VIRT_START
-.rept XEN_NR_ENTRIES(2)
-        mov   x0, x4                            /* x0 := paddr(l3 table) */
-        create_table_entry_from_paddr boot_second, x0, x1, 2, x2, x3
-        add   x1, x1, #XEN_PT_LEVEL_SIZE(2)     /* x1 := Next vaddr */
-        add   x4, x4, #PAGE_SIZE                /* x4 := Next table */
-.endr
-
-        /*
-         * Find the size of Xen in pages and multiply by the size of a
-         * PTE. This will then be compared in the mapping loop below.
-         *
-         * Note the multiplication is just to avoid using an extra
-         * register/instruction per iteration.
-         */
-        ldr   x0, =_start            /* x0 := vaddr(_start) */
-        ldr   x1, =_end              /* x1 := vaddr(_end) */
-        sub   x0, x1, x0             /* x0 := effective size of Xen */
-        lsr   x0, x0, #PAGE_SHIFT    /* x0 := Number of pages for Xen */
-        lsl   x0, x0, #3             /* x0 := Number of pages * PTE size */
-
-        /* Map Xen */
-        adr_l x4, boot_third
-
-        lsr   x2, x19, #THIRD_SHIFT  /* Base address for 4K mapping */
-        lsl   x2, x2, #THIRD_SHIFT
-        mov   x3, #PT_MEM_L3         /* x2 := Section map */
-        orr   x2, x2, x3
-
-        /* ... map of vaddr(start) in boot_third */
-        mov   x1, xzr
-1:      str   x2, [x4, x1]           /* Map vaddr(start) */
-        add   x2, x2, #PAGE_SIZE     /* Next page */
-        add   x1, x1, #8             /* Next slot */
-        cmp   x1, x0                 /* Loop until we map all of Xen */
-        b.lt  1b
-
-        /*
-         * If Xen is loaded at exactly XEN_VIRT_START then we don't
-         * need an additional 1:1 mapping, the virtual mapping will
-         * suffice.
-         */
-        ldr   x0, =XEN_VIRT_START
-        cmp   x19, x0
-        bne   1f
-        ret
-1:
-        /*
-         * Setup the 1:1 mapping so we can turn the MMU on. Note that
-         * only the first page of Xen will be part of the 1:1 mapping.
-         */
-
-        /*
-         * Find the zeroeth slot used. If the slot is not
-         * XEN_ZEROETH_SLOT, then the 1:1 mapping will use its own set of
-         * page-tables from the first level.
-         */
-        get_table_slot x0, x19, 0       /* x0 := zeroeth slot */
-        cmp   x0, #XEN_ZEROETH_SLOT
-        beq   1f
-        create_table_entry boot_pgtable, boot_first_id, x19, 0, x0, x1, x2
-        b     link_from_first_id
-
-1:
-        /*
-         * Find the first slot used. If the slot is not XEN_FIRST_SLOT,
-         * then the 1:1 mapping will use its own set of page-tables from
-         * the second level.
-         */
-        get_table_slot x0, x19, 1      /* x0 := first slot */
-        cmp   x0, #XEN_FIRST_SLOT
-        beq   1f
-        create_table_entry boot_first, boot_second_id, x19, 1, x0, x1, x2
-        b     link_from_second_id
-
-1:
-        /*
-         * Find the second slot used. If the slot is XEN_SECOND_SLOT, then the
-         * 1:1 mapping will use its own set of page-tables from the
-         * third level. For slot XEN_SECOND_SLOT, Xen is not yet able to handle
-         * it.
-         */
-        get_table_slot x0, x19, 2     /* x0 := second slot */
-        cmp   x0, #XEN_SECOND_SLOT
-        beq   virtphys_clash
-        create_table_entry boot_second, boot_third_id, x19, 2, x0, x1, x2
-        b     link_from_third_id
-
-link_from_first_id:
-        create_table_entry boot_first_id, boot_second_id, x19, 1, x0, x1, x2
-link_from_second_id:
-        create_table_entry boot_second_id, boot_third_id, x19, 2, x0, x1, x2
-link_from_third_id:
-        create_mapping_entry boot_third_id, x19, x19, x0, x1, x2
-        ret
-
-virtphys_clash:
-        /* Identity map clashes with boot_third, which we cannot handle yet */
-        PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
-        b     fail
-ENDPROC(create_page_tables)
-
-/*
- * Turn on the Data Cache and the MMU. The function will return on the 1:1
- * mapping. In other word, the caller is responsible to switch to the runtime
- * mapping.
- *
- * Inputs:
- *   x0 : Physical address of the page tables.
- *
- * Clobbers x0 - x4
- */
-enable_mmu:
-        mov   x4, x0
-        PRINT("- Turning on paging -\r\n")
-
-        /*
-         * The state of the TLBs is unknown before turning on the MMU.
-         * Flush them to avoid stale one.
-         */
-        flush_xen_tlb_local
-
-        /* Write Xen's PT's paddr into TTBR0_EL2 */
-        msr   TTBR0_EL2, x4
-        isb
-
-        mrs   x0, SCTLR_EL2
-        orr   x0, x0, #SCTLR_Axx_ELx_M  /* Enable MMU */
-        orr   x0, x0, #SCTLR_Axx_ELx_C  /* Enable D-cache */
-        dsb   sy                     /* Flush PTE writes and finish reads */
-        msr   SCTLR_EL2, x0          /* now paging is enabled */
-        isb                          /* Now, flush the icache */
-        ret
-ENDPROC(enable_mmu)
-
-/*
- * Enable mm (turn on the data cache and the MMU) for secondary CPUs.
- * The function will return to the virtual address provided in LR (e.g. the
- * runtime mapping).
- *
- * Inputs:
- *   lr : Virtual address to return to.
- *
- * Clobbers x0 - x5
- */
-enable_secondary_cpu_mm:
-        mov   x5, lr
-
-        load_paddr x0, init_ttbr
-        ldr   x0, [x0]
-
-        bl    enable_mmu
-        mov   lr, x5
-
-        /* Return to the virtual address requested by the caller. */
-        ret
-ENDPROC(enable_secondary_cpu_mm)
-
-/*
- * Enable mm (turn on the data cache and the MMU) for the boot CPU.
- * The function will return to the virtual address provided in LR (e.g. the
- * runtime mapping).
- *
- * Inputs:
- *   lr : Virtual address to return to.
- *
- * Clobbers x0 - x5
- */
-enable_boot_cpu_mm:
-        mov   x5, lr
-
-        bl    create_page_tables
-        load_paddr x0, boot_pgtable
-
-        bl    enable_mmu
-
-        /*
-         * The MMU is turned on and we are in the 1:1 mapping. Switch
-         * to the runtime mapping.
-         */
-        ldr   x0, =1f
-        br    x0
-1:
-        mov   lr, x5
-        /*
-         * The 1:1 map may clash with other parts of the Xen virtual memory
-         * layout. As it is not used anymore, remove it completely to avoid
-         * having to worry about replacing existing mapping afterwards.
-         * Function will return to the virtual address requested by the caller.
-         */
-        b     remove_identity_mapping
-ENDPROC(enable_boot_cpu_mm)
-
-/*
- * Remove the 1:1 map from the page-tables. It is not easy to keep track
- * where the 1:1 map was mapped, so we will look for the top-level entry
- * exclusive to the 1:1 map and remove it.
- *
- * Inputs:
- *   x19: paddr(start)
- *
- * Clobbers x0 - x1
- */
-remove_identity_mapping:
-        /*
-         * Find the zeroeth slot used. Remove the entry from zeroeth
-         * table if the slot is not XEN_ZEROETH_SLOT.
-         */
-        get_table_slot x1, x19, 0       /* x1 := zeroeth slot */
-        cmp   x1, #XEN_ZEROETH_SLOT
-        beq   1f
-        /* It is not in slot XEN_ZEROETH_SLOT, remove the entry. */
-        ldr   x0, =boot_pgtable         /* x0 := root table */
-        str   xzr, [x0, x1, lsl #3]
-        b     identity_mapping_removed
-
-1:
-        /*
-         * Find the first slot used. Remove the entry for the first
-         * table if the slot is not XEN_FIRST_SLOT.
-         */
-        get_table_slot x1, x19, 1       /* x1 := first slot */
-        cmp   x1, #XEN_FIRST_SLOT
-        beq   1f
-        /* It is not in slot XEN_FIRST_SLOT, remove the entry. */
-        ldr   x0, =boot_first           /* x0 := first table */
-        str   xzr, [x0, x1, lsl #3]
-        b     identity_mapping_removed
-
-1:
-        /*
-         * Find the second slot used. Remove the entry for the first
-         * table if the slot is not XEN_SECOND_SLOT.
-         */
-        get_table_slot x1, x19, 2       /* x1 := second slot */
-        cmp   x1, #XEN_SECOND_SLOT
-        beq   identity_mapping_removed
-        /* It is not in slot 1, remove the entry */
-        ldr   x0, =boot_second          /* x0 := second table */
-        str   xzr, [x0, x1, lsl #3]
-
-identity_mapping_removed:
-        flush_xen_tlb_local
-
-        ret
-ENDPROC(remove_identity_mapping)
-
-/*
- * Map the UART in the fixmap (when earlyprintk is used) and hook the
- * fixmap table in the page tables.
- *
- * The fixmap cannot be mapped in create_page_tables because it may
- * clash with the 1:1 mapping.
- *
- * Inputs:
- *   x20: Physical offset
- *   x23: Early UART base physical address
- *
- * Clobbers x0 - x3
- */
-setup_fixmap:
-#ifdef CONFIG_EARLY_PRINTK
-        /* Add UART to the fixmap table */
-        ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
-        create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
-#endif
-        /* Map fixmap into boot_second */
-        ldr   x0, =FIXMAP_ADDR(0)
-        create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
-        /* Ensure any page table updates made above have occurred. */
-        dsb   nshst
-        /*
-         * The fixmap area will be used soon after. So ensure no hardware
-         * translation happens before the dsb completes.
-         */
-        isb
-
-        ret
-ENDPROC(setup_fixmap)
-
 /*
  * Setup the initial stack and jump to the C world
  *
@@ -908,45 +457,6 @@ fail:   PRINT("- Boot failed -\r\n")
         b     1b
 ENDPROC(fail)
 
-/*
- * Switch TTBR
- *
- * x0    ttbr
- */
-ENTRY(switch_ttbr_id)
-        /* 1) Ensure any previous read/write have completed */
-        dsb    ish
-        isb
-
-        /* 2) Turn off MMU */
-        mrs    x1, SCTLR_EL2
-        bic    x1, x1, #SCTLR_Axx_ELx_M
-        msr    SCTLR_EL2, x1
-        isb
-
-        /* 3) Flush the TLBs */
-        flush_xen_tlb_local
-
-        /* 4) Update the TTBR */
-        msr   TTBR0_EL2, x0
-        isb
-
-        /*
-         * 5) Flush I-cache
-         * This should not be necessary but it is kept for safety.
-         */
-        ic     iallu
-        isb
-
-        /* 6) Turn on the MMU */
-        mrs   x1, SCTLR_EL2
-        orr   x1, x1, #SCTLR_Axx_ELx_M  /* Enable MMU */
-        msr   SCTLR_EL2, x1
-        isb
-
-        ret
-ENDPROC(switch_ttbr_id)
-
 #ifdef CONFIG_EARLY_PRINTK
 /*
  * Initialize the UART. Should only be called on the boot CPU.
diff --git a/xen/arch/arm/arm64/mmu/Makefile b/xen/arch/arm/arm64/mmu/Makefile
new file mode 100644
index 0000000000..3340058c08
--- /dev/null
+++ b/xen/arch/arm/arm64/mmu/Makefile
@@ -0,0 +1 @@
+obj-y += head.o
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
new file mode 100644
index 0000000000..97d872c3cb
--- /dev/null
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -0,0 +1,488 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * xen/arch/arm/arm64/mmu/head.S
+ *
+ * Arm64 MMU specific start-of-day code.
+ */
+
+#include <asm/page.h>
+#include <asm/early_printk.h>
+
+#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
+#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
+#define PT_MEM_L3 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
+#define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
+#define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
+
+/* Convenience defines to get slot used by Xen mapping. */
+#define XEN_ZEROETH_SLOT    zeroeth_table_offset(XEN_VIRT_START)
+#define XEN_FIRST_SLOT      first_table_offset(XEN_VIRT_START)
+#define XEN_SECOND_SLOT     second_table_offset(XEN_VIRT_START)
+
+/*
+ * Flush local TLBs
+ *
+ * See asm/arm64/flushtlb.h for the explanation of the sequence.
+ */
+.macro flush_xen_tlb_local
+        dsb   nshst
+        tlbi  alle2
+        dsb   nsh
+        isb
+.endm
+
+/*
+ * Macro to find the slot number at a given page-table level
+ *
+ * slot:     slot computed
+ * virt:     virtual address
+ * lvl:      page-table level
+ */
+.macro get_table_slot, slot, virt, lvl
+        ubfx  \slot, \virt, #XEN_PT_LEVEL_SHIFT(\lvl), #XEN_PT_LPAE_SHIFT
+.endm
+
+/*
+ * Macro to create a page table entry in \ptbl to \tbl
+ * ptbl:    table symbol where the entry will be created
+ * tbl:     physical address of the table to point to
+ * virt:    virtual address
+ * lvl:     page-table level
+ * tmp1:    scratch register
+ * tmp2:    scratch register
+ *
+ * Preserves \virt
+ * Clobbers \tbl, \tmp1, \tmp2
+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro create_table_entry_from_paddr, ptbl, tbl, virt, lvl, tmp1, tmp2
+        get_table_slot \tmp1, \virt, \lvl   /* \tmp1 := slot in \tbl */
+
+        mov   \tmp2, #PT_PT                 /* \tmp2 := right for linear PT */
+        orr   \tmp2, \tmp2, \tbl            /*          + \tbl */
+
+        adr_l \tbl, \ptbl                   /* \tbl := address(\ptbl) */
+
+        str   \tmp2, [\tbl, \tmp1, lsl #3]
+.endm
+
+/*
+ * Macro to create a page table entry in \ptbl to \tbl
+ *
+ * ptbl:    table symbol where the entry will be created
+ * tbl:     table symbol to point to
+ * virt:    virtual address
+ * lvl:     page-table level
+ * tmp1:    scratch register
+ * tmp2:    scratch register
+ * tmp3:    scratch register
+ *
+ * Preserves \virt
+ * Clobbers \tmp1, \tmp2, \tmp3
+ *
+ * Also use x20 for the phys offset.
+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro create_table_entry, ptbl, tbl, virt, lvl, tmp1, tmp2, tmp3
+        load_paddr \tmp1, \tbl
+        create_table_entry_from_paddr \ptbl, \tmp1, \virt, \lvl, \tmp2, \tmp3
+.endm
+
+/*
+ * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
+ * level table (i.e page granularity) is supported.
+ *
+ * ptbl:    table symbol where the entry will be created
+ * virt:    virtual address
+ * phys:    physical address (should be page aligned)
+ * tmp1:    scratch register
+ * tmp2:    scratch register
+ * tmp3:    scratch register
+ * type:    mapping type. If not specified it will be normal memory (PT_MEM_L3)
+ *
+ * Preserves \virt, \phys
+ * Clobbers \tmp1, \tmp2, \tmp3
+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro create_mapping_entry, ptbl, virt, phys, tmp1, tmp2, tmp3, type=PT_MEM_L3
+        and   \tmp3, \phys, #THIRD_MASK     /* \tmp3 := PAGE_ALIGNED(phys) */
+
+        get_table_slot \tmp1, \virt, 3      /* \tmp1 := slot in \tlb */
+
+        mov   \tmp2, #\type                 /* \tmp2 := right for section PT */
+        orr   \tmp2, \tmp2, \tmp3           /*          + PAGE_ALIGNED(phys) */
+
+        adr_l \tmp3, \ptbl
+
+        str   \tmp2, [\tmp3, \tmp1, lsl #3]
+.endm
+
+.section .text.idmap, "ax", %progbits
+
+/*
+ * Rebuild the boot pagetable's first-level entries. The structure
+ * is described in mm.c.
+ *
+ * After the CPU enables paging it will add the fixmap mapping
+ * to these page tables, however this may clash with the 1:1
+ * mapping. So each CPU must rebuild the page tables here with
+ * the 1:1 in place.
+ *
+ * Inputs:
+ *   x19: paddr(start)
+ *   x20: phys offset
+ *
+ * Clobbers x0 - x4
+ */
+create_page_tables:
+        /* Prepare the page-tables for mapping Xen */
+        ldr   x0, =XEN_VIRT_START
+        create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
+        create_table_entry boot_first, boot_second, x0, 1, x1, x2, x3
+
+        /*
+         * We need to use a stash register because
+         * create_table_entry_paddr() will clobber the register storing
+         * the physical address of the table to point to.
+         */
+        load_paddr x4, boot_third
+        ldr   x1, =XEN_VIRT_START
+.rept XEN_NR_ENTRIES(2)
+        mov   x0, x4                            /* x0 := paddr(l3 table) */
+        create_table_entry_from_paddr boot_second, x0, x1, 2, x2, x3
+        add   x1, x1, #XEN_PT_LEVEL_SIZE(2)     /* x1 := Next vaddr */
+        add   x4, x4, #PAGE_SIZE                /* x4 := Next table */
+.endr
+
+        /*
+         * Find the size of Xen in pages and multiply by the size of a
+         * PTE. This will then be compared in the mapping loop below.
+         *
+         * Note the multiplication is just to avoid using an extra
+         * register/instruction per iteration.
+         */
+        ldr   x0, =_start            /* x0 := vaddr(_start) */
+        ldr   x1, =_end              /* x1 := vaddr(_end) */
+        sub   x0, x1, x0             /* x0 := effective size of Xen */
+        lsr   x0, x0, #PAGE_SHIFT    /* x0 := Number of pages for Xen */
+        lsl   x0, x0, #3             /* x0 := Number of pages * PTE size */
+
+        /* Map Xen */
+        adr_l x4, boot_third
+
+        lsr   x2, x19, #THIRD_SHIFT  /* Base address for 4K mapping */
+        lsl   x2, x2, #THIRD_SHIFT
+        mov   x3, #PT_MEM_L3         /* x2 := Section map */
+        orr   x2, x2, x3
+
+        /* ... map of vaddr(start) in boot_third */
+        mov   x1, xzr
+1:      str   x2, [x4, x1]           /* Map vaddr(start) */
+        add   x2, x2, #PAGE_SIZE     /* Next page */
+        add   x1, x1, #8             /* Next slot */
+        cmp   x1, x0                 /* Loop until we map all of Xen */
+        b.lt  1b
+
+        /*
+         * If Xen is loaded at exactly XEN_VIRT_START then we don't
+         * need an additional 1:1 mapping, the virtual mapping will
+         * suffice.
+         */
+        ldr   x0, =XEN_VIRT_START
+        cmp   x19, x0
+        bne   1f
+        ret
+1:
+        /*
+         * Setup the 1:1 mapping so we can turn the MMU on. Note that
+         * only the first page of Xen will be part of the 1:1 mapping.
+         */
+
+        /*
+         * Find the zeroeth slot used. If the slot is not
+         * XEN_ZEROETH_SLOT, then the 1:1 mapping will use its own set of
+         * page-tables from the first level.
+         */
+        get_table_slot x0, x19, 0       /* x0 := zeroeth slot */
+        cmp   x0, #XEN_ZEROETH_SLOT
+        beq   1f
+        create_table_entry boot_pgtable, boot_first_id, x19, 0, x0, x1, x2
+        b     link_from_first_id
+
+1:
+        /*
+         * Find the first slot used. If the slot is not XEN_FIRST_SLOT,
+         * then the 1:1 mapping will use its own set of page-tables from
+         * the second level.
+         */
+        get_table_slot x0, x19, 1      /* x0 := first slot */
+        cmp   x0, #XEN_FIRST_SLOT
+        beq   1f
+        create_table_entry boot_first, boot_second_id, x19, 1, x0, x1, x2
+        b     link_from_second_id
+
+1:
+        /*
+         * Find the second slot used. If the slot is XEN_SECOND_SLOT, then the
+         * 1:1 mapping will use its own set of page-tables from the
+         * third level. For slot XEN_SECOND_SLOT, Xen is not yet able to handle
+         * it.
+         */
+        get_table_slot x0, x19, 2     /* x0 := second slot */
+        cmp   x0, #XEN_SECOND_SLOT
+        beq   virtphys_clash
+        create_table_entry boot_second, boot_third_id, x19, 2, x0, x1, x2
+        b     link_from_third_id
+
+link_from_first_id:
+        create_table_entry boot_first_id, boot_second_id, x19, 1, x0, x1, x2
+link_from_second_id:
+        create_table_entry boot_second_id, boot_third_id, x19, 2, x0, x1, x2
+link_from_third_id:
+        create_mapping_entry boot_third_id, x19, x19, x0, x1, x2
+        ret
+
+virtphys_clash:
+        /* Identity map clashes with boot_third, which we cannot handle yet */
+        PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
+        b     fail
+ENDPROC(create_page_tables)
+
+/*
+ * Turn on the Data Cache and the MMU. The function will return on the 1:1
+ * mapping. In other word, the caller is responsible to switch to the runtime
+ * mapping.
+ *
+ * Inputs:
+ *   x0 : Physical address of the page tables.
+ *
+ * Clobbers x0 - x4
+ */
+enable_mmu:
+        mov   x4, x0
+        PRINT("- Turning on paging -\r\n")
+
+        /*
+         * The state of the TLBs is unknown before turning on the MMU.
+         * Flush them to avoid stale one.
+         */
+        flush_xen_tlb_local
+
+        /* Write Xen's PT's paddr into TTBR0_EL2 */
+        msr   TTBR0_EL2, x4
+        isb
+
+        mrs   x0, SCTLR_EL2
+        orr   x0, x0, #SCTLR_Axx_ELx_M  /* Enable MMU */
+        orr   x0, x0, #SCTLR_Axx_ELx_C  /* Enable D-cache */
+        dsb   sy                     /* Flush PTE writes and finish reads */
+        msr   SCTLR_EL2, x0          /* now paging is enabled */
+        isb                          /* Now, flush the icache */
+        ret
+ENDPROC(enable_mmu)
+
+/*
+ * Enable mm (turn on the data cache and the MMU) for secondary CPUs.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+ENTRY(enable_secondary_cpu_mm)
+        mov   x5, lr
+
+        load_paddr x0, init_ttbr
+        ldr   x0, [x0]
+
+        bl    enable_mmu
+        mov   lr, x5
+
+        /* return to secondary_switched */
+        ret
+ENDPROC(enable_secondary_cpu_mm)
+
+/*
+ * Enable mm (turn on the data cache and the MMU) for the boot CPU.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+ENTRY(enable_boot_cpu_mm)
+        mov   x5, lr
+
+        bl    create_page_tables
+        load_paddr x0, boot_pgtable
+
+        bl    enable_mmu
+        mov   lr, x5
+
+        /*
+         * The MMU is turned on and we are in the 1:1 mapping. Switch
+         * to the runtime mapping.
+         */
+        ldr   x0, =1f
+        br    x0
+1:
+        /*
+         * The 1:1 map may clash with other parts of the Xen virtual memory
+         * layout. As it is not used anymore, remove it completely to
+         * avoid having to worry about replacing existing mapping
+         * afterwards. Function will return to primary_switched.
+         */
+        b     remove_identity_mapping
+
+        /*
+         * Below is supposed to be unreachable code, as "ret" in
+         * remove_identity_mapping will use the return address in LR in advance.
+         */
+        b     fail
+ENDPROC(enable_boot_cpu_mm)
+
+/*
+ * Remove the 1:1 map from the page-tables. It is not easy to keep track
+ * where the 1:1 map was mapped, so we will look for the top-level entry
+ * exclusive to the 1:1 map and remove it.
+ *
+ * Inputs:
+ *   x19: paddr(start)
+ *
+ * Clobbers x0 - x1
+ */
+remove_identity_mapping:
+        /*
+         * Find the zeroeth slot used. Remove the entry from zeroeth
+         * table if the slot is not XEN_ZEROETH_SLOT.
+         */
+        get_table_slot x1, x19, 0       /* x1 := zeroeth slot */
+        cmp   x1, #XEN_ZEROETH_SLOT
+        beq   1f
+        /* It is not in slot XEN_ZEROETH_SLOT, remove the entry. */
+        ldr   x0, =boot_pgtable         /* x0 := root table */
+        str   xzr, [x0, x1, lsl #3]
+        b     identity_mapping_removed
+
+1:
+        /*
+         * Find the first slot used. Remove the entry for the first
+         * table if the slot is not XEN_FIRST_SLOT.
+         */
+        get_table_slot x1, x19, 1       /* x1 := first slot */
+        cmp   x1, #XEN_FIRST_SLOT
+        beq   1f
+        /* It is not in slot XEN_FIRST_SLOT, remove the entry. */
+        ldr   x0, =boot_first           /* x0 := first table */
+        str   xzr, [x0, x1, lsl #3]
+        b     identity_mapping_removed
+
+1:
+        /*
+         * Find the second slot used. Remove the entry for the first
+         * table if the slot is not XEN_SECOND_SLOT.
+         */
+        get_table_slot x1, x19, 2       /* x1 := second slot */
+        cmp   x1, #XEN_SECOND_SLOT
+        beq   identity_mapping_removed
+        /* It is not in slot 1, remove the entry */
+        ldr   x0, =boot_second          /* x0 := second table */
+        str   xzr, [x0, x1, lsl #3]
+
+identity_mapping_removed:
+        flush_xen_tlb_local
+
+        ret
+ENDPROC(remove_identity_mapping)
+
+/*
+ * Map the UART in the fixmap (when earlyprintk is used) and hook the
+ * fixmap table in the page tables.
+ *
+ * The fixmap cannot be mapped in create_page_tables because it may
+ * clash with the 1:1 mapping.
+ *
+ * Inputs:
+ *   x20: Physical offset
+ *   x23: Early UART base physical address
+ *
+ * Clobbers x0 - x3
+ */
+ENTRY(setup_fixmap)
+#ifdef CONFIG_EARLY_PRINTK
+        /* Add UART to the fixmap table */
+        ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
+        create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
+#endif
+        /* Map fixmap into boot_second */
+        ldr   x0, =FIXMAP_ADDR(0)
+        create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
+        /* Ensure any page table updates made above have occurred. */
+        dsb   nshst
+        /*
+         * The fixmap area will be used soon after. So ensure no hardware
+         * translation happens before the dsb completes.
+         */
+        isb
+
+        ret
+ENDPROC(setup_fixmap)
+
+/* Fail-stop */
+fail:   PRINT("- Boot failed -\r\n")
+1:      wfe
+        b     1b
+ENDPROC(fail)
+
+/*
+ * Switch TTBR
+ *
+ * x0    ttbr
+ */
+ENTRY(switch_ttbr_id)
+        /* 1) Ensure any previous read/write have completed */
+        dsb    ish
+        isb
+
+        /* 2) Turn off MMU */
+        mrs    x1, SCTLR_EL2
+        bic    x1, x1, #SCTLR_Axx_ELx_M
+        msr    SCTLR_EL2, x1
+        isb
+
+        /* 3) Flush the TLBs */
+        flush_xen_tlb_local
+
+        /* 4) Update the TTBR */
+        msr   TTBR0_EL2, x0
+        isb
+
+        /*
+         * 5) Flush I-cache
+         * This should not be necessary but it is kept for safety.
+         */
+        ic     iallu
+        isb
+
+        /* 6) Turn on the MMU */
+        mrs   x1, SCTLR_EL2
+        orr   x1, x1, #SCTLR_Axx_ELx_M  /* Enable MMU */
+        msr   SCTLR_EL2, x1
+        isb
+
+        ret
+ENDPROC(switch_ttbr_id)
+
+/*
+ * Local variables:
+ * mode: ASM
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/arm64/macros.h b/xen/arch/arm/include/asm/arm64/macros.h
index 140e223b4c..99c401fcaf 100644
--- a/xen/arch/arm/include/asm/arm64/macros.h
+++ b/xen/arch/arm/include/asm/arm64/macros.h
@@ -32,6 +32,42 @@
         hint    #22
     .endm
 
+#ifdef CONFIG_EARLY_PRINTK
+/*
+ * Macro to print a string to the UART, if there is one.
+ *
+ * Clobbers x0 - x3
+ */
+#define PRINT(_s)          \
+        mov   x3, lr ;     \
+        adr_l x0, 98f ;    \
+        bl    asm_puts ;   \
+        mov   lr, x3 ;     \
+        RODATA_STR(98, _s)
+
+#else /* CONFIG_EARLY_PRINTK */
+#define PRINT(s)
+
+#endif /* !CONFIG_EARLY_PRINTK */
+
+/*
+ * Pseudo-op for PC relative adr <reg>, <symbol> where <symbol> is
+ * within the range +/- 4GB of the PC.
+ *
+ * @dst: destination register (64 bit wide)
+ * @sym: name of the symbol
+ */
+.macro  adr_l, dst, sym
+        adrp \dst, \sym
+        add  \dst, \dst, :lo12:\sym
+.endm
+
+/* Load the physical address of a symbol into xb */
+.macro load_paddr xb, sym
+        ldr \xb, =\sym
+        add \xb, \xb, x20
+.endm
+
 /*
  * Register aliases.
  */
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index a3c90ca823..59b80d122f 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -34,6 +34,7 @@ SECTIONS
        _stext = .;             /* Text section */
        _idmap_start = .;
        *(.text.header)
+       *(.text.idmap)
        _idmap_end = .;
 
        *(.text.cold)
-- 
2.25.1




* [PATCH v5 05/13] xen/arm: Move MMU related definitions from config.h to mmu/layout.h
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the preparation of MPU work Henry Wang
                   ` (3 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-14  4:25 ` [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables() Henry Wang
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Henry Wang, Julien Grall

From: Wei Chen <wei.chen@arm.com>

Xen defines some global configuration macros for Arm in config.h.
However, some of the address layout definitions there are defined for
MMU systems only and cannot be used by MPU systems. Adding #ifdefs to
differentiate the MPU layout from the MMU layout would result in messy,
hard-to-read and hard-to-maintain code.

So move all memory layout definitions to a new file, mmu/layout.h, to
avoid spreading "#ifdef" everywhere.
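
A simplified sketch of the resulting structure of config.h (the body of
the #else branch below is only a placeholder here; the non-MMU layout is
left for future MPU work):

    #ifdef CONFIG_MMU
    #include <asm/mmu/layout.h>
    #else
    /* Non-MMU (MPU) layout: not supported yet */
    #endif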

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
---
v5:
- Rework commit message.
- Add Reviewed-by tag from Julien
v4:
- Rebase on top of latest staging to pick the recent UBSAN change
  to the layout.
- Use #ifdef CONFIG_HAS_MMU instead of #ifndef CONFIG_HAS_MPU, add
  a #else case.
- Rework commit message.
v3:
- name the new header layout.h
v2:
- Remove duplicated FIXMAP definitions from config_mmu.h
---
 xen/arch/arm/include/asm/config.h     | 132 +----------------------
 xen/arch/arm/include/asm/mmu/layout.h | 146 ++++++++++++++++++++++++++
 2 files changed, 149 insertions(+), 129 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/mmu/layout.h

diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
index 83cbf6b0cb..e1dcec4dd7 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -71,136 +71,10 @@
 #include <xen/const.h>
 #include <xen/page-size.h>
 
-/*
- * ARM32 layout:
- *   0  -   2M   Unmapped
- *   2M -  10M   Xen text, data, bss
- *  10M -  12M   Fixmap: special-purpose 4K mapping slots
- *  12M -  16M   Early boot mapping of FDT
- *  16M -  18M   Livepatch vmap (if compiled in)
- *
- *  32M - 128M   Frametable: 32 bytes per page for 12GB of RAM
- * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
- *                    space
- *
- *   1G -   2G   Xenheap: always-mapped memory
- *   2G -   4G   Domheap: on-demand-mapped
- *
- * ARM64 layout:
- * 0x0000000000000000 - 0x000001ffffffffff (2TB, L0 slots [0..3])
- *
- *  Reserved to identity map Xen
- *
- * 0x0000020000000000 - 0x0000027fffffffff (512GB, L0 slot [4])
- *  (Relative offsets)
- *   0  -   2M   Unmapped
- *   2M -  10M   Xen text, data, bss
- *  10M -  12M   Fixmap: special-purpose 4K mapping slots
- *  12M -  16M   Early boot mapping of FDT
- *  16M -  18M   Livepatch vmap (if compiled in)
- *
- *   1G -   2G   VMAP: ioremap and early_ioremap
- *
- *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
- *
- * 0x0000028000000000 - 0x00007fffffffffff (125TB, L0 slots [5..255])
- *  Unused
- *
- * 0x0000800000000000 - 0x000084ffffffffff (5TB, L0 slots [256..265])
- *  1:1 mapping of RAM
- *
- * 0x0000850000000000 - 0x0000ffffffffffff (123TB, L0 slots [266..511])
- *  Unused
- */
-
-#ifdef CONFIG_ARM_32
-#define XEN_VIRT_START          _AT(vaddr_t, MB(2))
+#ifdef CONFIG_MMU
+#include <asm/mmu/layout.h>
 #else
-
-#define SLOT0_ENTRY_BITS  39
-#define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
-#define SLOT0_ENTRY_SIZE  SLOT0(1)
-
-#define XEN_VIRT_START          (SLOT0(4) + _AT(vaddr_t, MB(2)))
-#endif
-
-/*
- * Reserve enough space so both UBSAN and GCOV can be enabled together
- * plus some slack for future growth.
- */
-#define XEN_VIRT_SIZE           _AT(vaddr_t, MB(8))
-#define XEN_NR_ENTRIES(lvl)     (XEN_VIRT_SIZE / XEN_PT_LEVEL_SIZE(lvl))
-
-#define FIXMAP_VIRT_START       (XEN_VIRT_START + XEN_VIRT_SIZE)
-#define FIXMAP_VIRT_SIZE        _AT(vaddr_t, MB(2))
-
-#define FIXMAP_ADDR(n)          (FIXMAP_VIRT_START + (n) * PAGE_SIZE)
-
-#define BOOT_FDT_VIRT_START     (FIXMAP_VIRT_START + FIXMAP_VIRT_SIZE)
-#define BOOT_FDT_VIRT_SIZE      _AT(vaddr_t, MB(4))
-
-#ifdef CONFIG_LIVEPATCH
-#define LIVEPATCH_VMAP_START    (BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE)
-#define LIVEPATCH_VMAP_SIZE    _AT(vaddr_t, MB(2))
-#endif
-
-#define HYPERVISOR_VIRT_START  XEN_VIRT_START
-
-#ifdef CONFIG_ARM_32
-
-#define CONFIG_SEPARATE_XENHEAP 1
-
-#define FRAMETABLE_VIRT_START  _AT(vaddr_t, MB(32))
-#define FRAMETABLE_SIZE        MB(128-32)
-#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
-
-#define VMAP_VIRT_START        _AT(vaddr_t, MB(256))
-#define VMAP_VIRT_SIZE         _AT(vaddr_t, GB(1) - MB(256))
-
-#define XENHEAP_VIRT_START     _AT(vaddr_t, GB(1))
-#define XENHEAP_VIRT_SIZE      _AT(vaddr_t, GB(1))
-
-#define DOMHEAP_VIRT_START     _AT(vaddr_t, GB(2))
-#define DOMHEAP_VIRT_SIZE      _AT(vaddr_t, GB(2))
-
-#define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
-
-/* Number of domheap pagetable pages required at the second level (2MB mappings) */
-#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
-
-/*
- * The temporary area is overlapping with the domheap area. This may
- * be used to create an alias of the first slot containing Xen mappings
- * when turning on/off the MMU.
- */
-#define TEMPORARY_AREA_FIRST_SLOT    (first_table_offset(DOMHEAP_VIRT_START))
-
-/* Calculate the address in the temporary area */
-#define TEMPORARY_AREA_ADDR(addr)                           \
-     (((addr) & ~XEN_PT_LEVEL_MASK(1)) |                    \
-      (TEMPORARY_AREA_FIRST_SLOT << XEN_PT_LEVEL_SHIFT(1)))
-
-#define TEMPORARY_XEN_VIRT_START    TEMPORARY_AREA_ADDR(XEN_VIRT_START)
-
-#else /* ARM_64 */
-
-#define IDENTITY_MAPPING_AREA_NR_L0  4
-
-#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
-#define VMAP_VIRT_SIZE   GB(1)
-
-#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
-#define FRAMETABLE_SIZE        GB(32)
-#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
-
-#define DIRECTMAP_VIRT_START   SLOT0(256)
-#define DIRECTMAP_SIZE         (SLOT0_ENTRY_SIZE * (266 - 256))
-#define DIRECTMAP_VIRT_END     (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
-
-#define XENHEAP_VIRT_START     directmap_virt_start
-
-#define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
-
+# error "Unknown memory management layout"
 #endif
 
 #define NR_hypercalls 64
diff --git a/xen/arch/arm/include/asm/mmu/layout.h b/xen/arch/arm/include/asm/mmu/layout.h
new file mode 100644
index 0000000000..da6be276ac
--- /dev/null
+++ b/xen/arch/arm/include/asm/mmu/layout.h
@@ -0,0 +1,146 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ARM_MMU_LAYOUT_H__
+#define __ARM_MMU_LAYOUT_H__
+
+/*
+ * ARM32 layout:
+ *   0  -   2M   Unmapped
+ *   2M -  10M   Xen text, data, bss
+ *  10M -  12M   Fixmap: special-purpose 4K mapping slots
+ *  12M -  16M   Early boot mapping of FDT
+ *  16M -  18M   Livepatch vmap (if compiled in)
+ *
+ *  32M - 128M   Frametable: 32 bytes per page for 12GB of RAM
+ * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
+ *                    space
+ *
+ *   1G -   2G   Xenheap: always-mapped memory
+ *   2G -   4G   Domheap: on-demand-mapped
+ *
+ * ARM64 layout:
+ * 0x0000000000000000 - 0x000001ffffffffff (2TB, L0 slots [0..3])
+ *
+ *  Reserved to identity map Xen
+ *
+ * 0x0000020000000000 - 0x0000027fffffffff (512GB, L0 slot [4])
+ *  (Relative offsets)
+ *   0  -   2M   Unmapped
+ *   2M -  10M   Xen text, data, bss
+ *  10M -  12M   Fixmap: special-purpose 4K mapping slots
+ *  12M -  16M   Early boot mapping of FDT
+ *  16M -  18M   Livepatch vmap (if compiled in)
+ *
+ *   1G -   2G   VMAP: ioremap and early_ioremap
+ *
+ *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
+ *
+ * 0x0000028000000000 - 0x00007fffffffffff (125TB, L0 slots [5..255])
+ *  Unused
+ *
+ * 0x0000800000000000 - 0x000084ffffffffff (5TB, L0 slots [256..265])
+ *  1:1 mapping of RAM
+ *
+ * 0x0000850000000000 - 0x0000ffffffffffff (123TB, L0 slots [266..511])
+ *  Unused
+ */
+
+#ifdef CONFIG_ARM_32
+#define XEN_VIRT_START          _AT(vaddr_t, MB(2))
+#else
+
+#define SLOT0_ENTRY_BITS  39
+#define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
+#define SLOT0_ENTRY_SIZE  SLOT0(1)
+
+#define XEN_VIRT_START          (SLOT0(4) + _AT(vaddr_t, MB(2)))
+#endif
+
+/*
+ * Reserve enough space so both UBSAN and GCOV can be enabled together
+ * plus some slack for future growth.
+ */
+#define XEN_VIRT_SIZE           _AT(vaddr_t, MB(8))
+#define XEN_NR_ENTRIES(lvl)     (XEN_VIRT_SIZE / XEN_PT_LEVEL_SIZE(lvl))
+
+#define FIXMAP_VIRT_START       (XEN_VIRT_START + XEN_VIRT_SIZE)
+#define FIXMAP_VIRT_SIZE        _AT(vaddr_t, MB(2))
+
+#define FIXMAP_ADDR(n)          (FIXMAP_VIRT_START + (n) * PAGE_SIZE)
+
+#define BOOT_FDT_VIRT_START     (FIXMAP_VIRT_START + FIXMAP_VIRT_SIZE)
+#define BOOT_FDT_VIRT_SIZE      _AT(vaddr_t, MB(4))
+
+#ifdef CONFIG_LIVEPATCH
+#define LIVEPATCH_VMAP_START    (BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE)
+#define LIVEPATCH_VMAP_SIZE    _AT(vaddr_t, MB(2))
+#endif
+
+#define HYPERVISOR_VIRT_START  XEN_VIRT_START
+
+#ifdef CONFIG_ARM_32
+
+#define CONFIG_SEPARATE_XENHEAP 1
+
+#define FRAMETABLE_VIRT_START  _AT(vaddr_t, MB(32))
+#define FRAMETABLE_SIZE        MB(128-32)
+#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
+
+#define VMAP_VIRT_START        _AT(vaddr_t, MB(256))
+#define VMAP_VIRT_SIZE         _AT(vaddr_t, GB(1) - MB(256))
+
+#define XENHEAP_VIRT_START     _AT(vaddr_t, GB(1))
+#define XENHEAP_VIRT_SIZE      _AT(vaddr_t, GB(1))
+
+#define DOMHEAP_VIRT_START     _AT(vaddr_t, GB(2))
+#define DOMHEAP_VIRT_SIZE      _AT(vaddr_t, GB(2))
+
+#define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
+
+/* Number of domheap pagetable pages required at the second level (2MB mappings) */
+#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
+
+/*
+ * The temporary area is overlapping with the domheap area. This may
+ * be used to create an alias of the first slot containing Xen mappings
+ * when turning on/off the MMU.
+ */
+#define TEMPORARY_AREA_FIRST_SLOT    (first_table_offset(DOMHEAP_VIRT_START))
+
+/* Calculate the address in the temporary area */
+#define TEMPORARY_AREA_ADDR(addr)                           \
+     (((addr) & ~XEN_PT_LEVEL_MASK(1)) |                    \
+      (TEMPORARY_AREA_FIRST_SLOT << XEN_PT_LEVEL_SHIFT(1)))
+
+#define TEMPORARY_XEN_VIRT_START    TEMPORARY_AREA_ADDR(XEN_VIRT_START)
+
+#else /* ARM_64 */
+
+#define IDENTITY_MAPPING_AREA_NR_L0  4
+
+#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
+#define VMAP_VIRT_SIZE   GB(1)
+
+#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
+#define FRAMETABLE_SIZE        GB(32)
+#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
+
+#define DIRECTMAP_VIRT_START   SLOT0(256)
+#define DIRECTMAP_SIZE         (SLOT0_ENTRY_SIZE * (266 - 256))
+#define DIRECTMAP_VIRT_END     (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
+
+#define XENHEAP_VIRT_START     directmap_virt_start
+
+#define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
+
+#endif
+
+#endif /* __ARM_MMU_LAYOUT_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables()
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (4 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 05/13] xen/arm: Move MMU related definitions from config.h to mmu/layout.h Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21  9:22   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 07/13] xen/arm: Extract MMU-specific code Henry Wang
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Wei Chen, Penny Zheng, Volodymyr Babchuk

The original assembly setup_fixmap() actually does two separate
tasks: one is to enable the early UART when earlyprintk is on, and
the other is to set up the fixmap (even when earlyprintk is off).

Per the discussion in [1], since commit
9d267c049d92 ("xen/arm64: Rework the memory layout"), there is no
chance that the fixmap and the mapping of the early UART will clash
with the 1:1 mapping. Therefore the mapping of both the fixmap and
the early UART can be moved to the end of create_page_tables().

No functional change intended.

[1] https://lore.kernel.org/xen-devel/78862bb8-fd7f-5a51-a7ae-3c5b5998ed80@xen.org/
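
For reference, the claim about the 1:1 mapping can be checked with a
small standalone sketch of the L0 slot arithmetic. The constants mirror
the arm64 values in mmu/layout.h; zeroeth_table_offset() is re-derived
here for a 4KB granule purely for illustration and is not part of this
patch:

    #include <stdint.h>
    #include <stdio.h>

    #define SLOT0_ENTRY_BITS   39
    #define SLOT0(slot)        ((uint64_t)(slot) << SLOT0_ENTRY_BITS)
    #define MB(x)              ((uint64_t)(x) << 20)

    #define XEN_VIRT_START     (SLOT0(4) + MB(2))
    #define XEN_VIRT_SIZE      MB(8)
    /* The fixmap (which holds the early UART) follows the Xen mapping. */
    #define FIXMAP_VIRT_START  (XEN_VIRT_START + XEN_VIRT_SIZE)

    #define IDENTITY_MAPPING_AREA_NR_L0  4
    #define zeroeth_table_offset(va)     (((va) >> 39) & 0x1ff)

    int main(void)
    {
        /* The identity map owns L0 slots [0..3]; the fixmap sits in
         * slot 4, so mapping it (and the UART) from create_page_tables()
         * cannot clash with the 1:1 mapping. */
        printf("fixmap L0 slot = %llu (idmap reserves slots 0..%d)\n",
               (unsigned long long)zeroeth_table_offset(FIXMAP_VIRT_START),
               IDENTITY_MAPPING_AREA_NR_L0 - 1);
        return 0;
    }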

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- Refine the title and commit message.
- Drop the "not applied" in-code comment about the 1:1 mapping clash on
  top of create_page_tables().
- Drop the unnecessary dsb and isb from the original setup_fixmap().
v4:
- Rework "[v3,12/52] xen/mmu: extract early uart mapping from setup_fixmap"
---
 xen/arch/arm/arm64/head.S     |  1 -
 xen/arch/arm/arm64/mmu/head.S | 48 ++++++++---------------------------
 2 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 3c8a12eda7..4ad85dcf58 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -275,7 +275,6 @@ real_start_efi:
         b     enable_boot_cpu_mm
 
 primary_switched:
-        bl    setup_fixmap
 #ifdef CONFIG_EARLY_PRINTK
         /* Use a virtual address to access the UART. */
         ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index 97d872c3cb..ba2ddd7e67 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -126,11 +126,6 @@
  * Rebuild the boot pagetable's first-level entries. The structure
  * is described in mm.c.
  *
- * After the CPU enables paging it will add the fixmap mapping
- * to these page tables, however this may clash with the 1:1
- * mapping. So each CPU must rebuild the page tables here with
- * the 1:1 in place.
- *
  * Inputs:
  *   x19: paddr(start)
  *   x20: phys offset
@@ -243,6 +238,16 @@ link_from_second_id:
         create_table_entry boot_second_id, boot_third_id, x19, 2, x0, x1, x2
 link_from_third_id:
         create_mapping_entry boot_third_id, x19, x19, x0, x1, x2
+
+#ifdef CONFIG_EARLY_PRINTK
+        /* Add UART to the fixmap table */
+        ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
+        /* x23: Early UART base physical address */
+        create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
+#endif
+        /* Map fixmap into boot_second */
+        ldr   x0, =FIXMAP_ADDR(0)
+        create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
         ret
 
 virtphys_clash:
@@ -402,39 +407,6 @@ identity_mapping_removed:
         ret
 ENDPROC(remove_identity_mapping)
 
-/*
- * Map the UART in the fixmap (when earlyprintk is used) and hook the
- * fixmap table in the page tables.
- *
- * The fixmap cannot be mapped in create_page_tables because it may
- * clash with the 1:1 mapping.
- *
- * Inputs:
- *   x20: Physical offset
- *   x23: Early UART base physical address
- *
- * Clobbers x0 - x3
- */
-ENTRY(setup_fixmap)
-#ifdef CONFIG_EARLY_PRINTK
-        /* Add UART to the fixmap table */
-        ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
-        create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
-#endif
-        /* Map fixmap into boot_second */
-        ldr   x0, =FIXMAP_ADDR(0)
-        create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
-        /* Ensure any page table updates made above have occurred. */
-        dsb   nshst
-        /*
-         * The fixmap area will be used soon after. So ensure no hardware
-         * translation happens before the dsb completes.
-         */
-        isb
-
-        ret
-ENDPROC(setup_fixmap)
-
 /* Fail-stop */
 fail:   PRINT("- Boot failed -\r\n")
 1:      wfe
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 07/13] xen/arm: Extract MMU-specific code
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (5 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables() Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21 17:57   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system Henry Wang
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Wei Chen, Penny Zheng, Volodymyr Babchuk

Currently, most of the MMU-specific code is in mm.{c,h}. To make the
mm code extendable, this commit extracts the MMU-specific code. First:
- Create an arch/arm/include/asm/mmu/ subdir.
- Create an arch/arm/mmu/ subdir.

Then move the MMU-specific code to the above mmu subdirs, which
includes the below changes:
- Move arch/arm/arm64/mm.c to arch/arm/arm64/mmu/mm.c.
- Move the MMU-related declarations in arch/arm/include/asm/mm.h to
  arch/arm/include/asm/mmu/mm.h.
- Move the MMU-related declarations dump_pt_walk() in asm/page.h
  and pte_of_xenaddr() in asm/setup.h to the new asm/mmu/mm.h.
- Move the MMU-related code in arch/arm/mm.c to arch/arm/mmu/mm.c.

Also modify the build system (Makefiles in this case) to pick up the
above-mentioned code changes.

This patch is a pure code movement, no functional change intended.
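
After the move, generic code keeps including asm/mm.h, which only pulls
in the MMU-specific declarations when they apply; a minimal sketch
mirroring the corresponding hunk in the diff below:

    /* xen/arch/arm/include/asm/mm.h */
    #ifdef CONFIG_MMU
    #include <asm/mmu/mm.h>
    #endif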

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
With the code movement in this patch, the descriptions on top of
xen/arch/arm/mm.c and xen/arch/arm/mmu/mm.c might need some changes;
suggestions?
v5:
- Rebase on top of xen/arm: Introduce CONFIG_MMU Kconfig option and
  xen/arm: mm: add missing extern variable declaration
v4:
- Rework "[v3,13/52] xen/mmu: extract mmu-specific codes from
  mm.c/mm.h" with the lastest staging branch, only do the code movement
  in this patch to ease the review.
---
 xen/arch/arm/Makefile             |    1 +
 xen/arch/arm/arm64/Makefile       |    1 -
 xen/arch/arm/arm64/mmu/Makefile   |    1 +
 xen/arch/arm/arm64/{ => mmu}/mm.c |    0
 xen/arch/arm/include/asm/mm.h     |   20 +-
 xen/arch/arm/include/asm/mmu/mm.h |   55 ++
 xen/arch/arm/include/asm/page.h   |   15 -
 xen/arch/arm/include/asm/setup.h  |    3 -
 xen/arch/arm/mm.c                 | 1119 ----------------------------
 xen/arch/arm/mmu/Makefile         |    1 +
 xen/arch/arm/mmu/mm.c             | 1146 +++++++++++++++++++++++++++++
 11 files changed, 1208 insertions(+), 1154 deletions(-)
 rename xen/arch/arm/arm64/{ => mmu}/mm.c (100%)
 create mode 100644 xen/arch/arm/include/asm/mmu/mm.h
 create mode 100644 xen/arch/arm/mmu/Makefile
 create mode 100644 xen/arch/arm/mmu/mm.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 7bf07e9920..548917097c 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_ARM_32) += arm32/
 obj-$(CONFIG_ARM_64) += arm64/
 obj-$(CONFIG_ACPI) += acpi/
+obj-$(CONFIG_MMU) += mmu/
 obj-$(CONFIG_HAS_PCI) += pci/
 ifneq ($(CONFIG_NO_PLAT),y)
 obj-y += platforms/
diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index f89d5fb4fb..72161ff22e 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -11,7 +11,6 @@ obj-y += entry.o
 obj-y += head.o
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
-obj-y += mm.o
 obj-y += smc.o
 obj-y += smpboot.o
 obj-$(CONFIG_ARM64_SVE) += sve.o sve-asm.o
diff --git a/xen/arch/arm/arm64/mmu/Makefile b/xen/arch/arm/arm64/mmu/Makefile
index 3340058c08..a8a750a3d0 100644
--- a/xen/arch/arm/arm64/mmu/Makefile
+++ b/xen/arch/arm/arm64/mmu/Makefile
@@ -1 +1,2 @@
 obj-y += head.o
+obj-y += mm.o
diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mmu/mm.c
similarity index 100%
rename from xen/arch/arm/arm64/mm.c
rename to xen/arch/arm/arm64/mmu/mm.c
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index aaacba3f04..dc1458b047 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -14,6 +14,10 @@
 # error "unknown ARM variant"
 #endif
 
+#ifdef CONFIG_MMU
+#include <asm/mmu/mm.h>
+#endif
+
 /* Align Xen to a 2 MiB boundary. */
 #define XEN_PADDR_ALIGN (1 << 21)
 
@@ -168,13 +172,6 @@ struct page_info
 /* Non-boot CPUs use this to find the correct pagetables. */
 extern uint64_t init_ttbr;
 
-extern mfn_t directmap_mfn_start, directmap_mfn_end;
-extern vaddr_t directmap_virt_end;
-#ifdef CONFIG_ARM_64
-extern vaddr_t directmap_virt_start;
-extern unsigned long directmap_base_pdx;
-#endif
-
 #ifdef CONFIG_ARM_32
 #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
 #define is_xen_heap_mfn(mfn) ({                                 \
@@ -197,7 +194,6 @@ extern unsigned long directmap_base_pdx;
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
-#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 /* PDX of the first page in the frame table. */
 extern unsigned long frametable_base_pdx;
 
@@ -207,8 +203,6 @@ extern unsigned long frametable_base_pdx;
 extern void setup_pagetables(unsigned long boot_phys_offset);
 /* Map FDT in boot pagetable */
 extern void *early_fdt_map(paddr_t fdt_paddr);
-/* Switch to a new root page-tables */
-extern void switch_ttbr(uint64_t ttbr);
 /* Remove early mappings */
 extern void remove_early_mappings(void);
 /* Allocate and initialise pagetables for a secondary CPU. Sets init_ttbr to the
@@ -216,12 +210,6 @@ extern void remove_early_mappings(void);
 extern int init_secondary_pagetables(int cpu);
 /* Switch secondary CPUS to its own pagetables and finalise MMU setup */
 extern void mmu_init_secondary_cpu(void);
-/*
- * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
- * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
- * For Arm64, map the region in the directmap area.
- */
-extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long nr_mfns);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
diff --git a/xen/arch/arm/include/asm/mmu/mm.h b/xen/arch/arm/include/asm/mmu/mm.h
new file mode 100644
index 0000000000..3d9755b7b4
--- /dev/null
+++ b/xen/arch/arm/include/asm/mmu/mm.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARM_MMU_MM_H__
+#define __ARM_MMU_MM_H__
+
+extern mfn_t directmap_mfn_start, directmap_mfn_end;
+extern vaddr_t directmap_virt_end;
+#ifdef CONFIG_ARM_64
+extern vaddr_t directmap_virt_start;
+extern unsigned long directmap_base_pdx;
+#endif
+
+#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
+
+/*
+ * Print a walk of a page table or p2m
+ *
+ * ttbr is the base address register (TTBR0_EL2 or VTTBR_EL2)
+ * addr is the PA or IPA to translate
+ * root_level is the starting level of the page table
+ *   (e.g. TCR_EL2.SL0 or VTCR_EL2.SL0 )
+ * nr_root_tables is the number of concatenated tables at the root.
+ *   this can only be != 1 for P2M walks starting at the first or
+ *   subsequent level.
+ */
+void dump_pt_walk(paddr_t ttbr, paddr_t addr,
+                  unsigned int root_level,
+                  unsigned int nr_root_tables);
+
+/* Find where Xen will be residing at runtime and return a PT entry */
+lpae_t pte_of_xenaddr(vaddr_t);
+
+/* Switch to a new root page-tables */
+extern void switch_ttbr(uint64_t ttbr);
+/*
+ * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
+ * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
+ * For Arm64, map the region in the directmap area.
+ */
+extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long nr_mfns);
+extern int xen_pt_update(unsigned long virt,
+                         mfn_t mfn,
+                         /* const on purpose as it is used for TLB flush */
+                         const unsigned long nr_mfns,
+                         unsigned int flags);
+
+#endif /* __ARM_MMU_MM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h
index 657c4b33db..ac65f0277a 100644
--- a/xen/arch/arm/include/asm/page.h
+++ b/xen/arch/arm/include/asm/page.h
@@ -257,21 +257,6 @@ static inline void write_pte(lpae_t *p, lpae_t pte)
 /* Flush the dcache for an entire page. */
 void flush_page_to_ram(unsigned long mfn, bool sync_icache);
 
-/*
- * Print a walk of a page table or p2m
- *
- * ttbr is the base address register (TTBR0_EL2 or VTTBR_EL2)
- * addr is the PA or IPA to translate
- * root_level is the starting level of the page table
- *   (e.g. TCR_EL2.SL0 or VTCR_EL2.SL0 )
- * nr_root_tables is the number of concatenated tables at the root.
- *   this can only be != 1 for P2M walks starting at the first or
- *   subsequent level.
- */
-void dump_pt_walk(paddr_t ttbr, paddr_t addr,
-                  unsigned int root_level,
-                  unsigned int nr_root_tables);
-
 /* Print a walk of the hypervisor's page tables for a virtual addr. */
 extern void dump_hyp_walk(vaddr_t addr);
 /* Print a walk of the p2m for a domain for a physical address. */
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 19dc637d55..f0f64d228c 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -176,9 +176,6 @@ extern lpae_t boot_first_id[XEN_PT_LPAE_ENTRIES];
 extern lpae_t boot_second_id[XEN_PT_LPAE_ENTRIES];
 extern lpae_t boot_third_id[XEN_PT_LPAE_ENTRIES];
 
-/* Find where Xen will be residing at runtime and return a PT entry */
-lpae_t pte_of_xenaddr(vaddr_t);
-
 extern const char __ro_after_init_start[], __ro_after_init_end[];
 
 struct init_info
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index d1e1bc72bd..487c64db0f 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -15,16 +15,12 @@
 #include <xen/init.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
-#include <xen/pfn.h>
-#include <xen/pmap.h>
-#include <xen/sched.h>
 #include <xen/sizes.h>
 #include <xen/types.h>
 #include <xen/vmap.h>
 
 #include <xsm/xsm.h>
 
-#include <asm/fixmap.h>
 #include <asm/setup.h>
 
 #include <public/memory.h>
@@ -32,347 +28,12 @@
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
-#undef mfn_to_virt
-#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
-
-#ifdef NDEBUG
-static inline void
-__attribute__ ((__format__ (__printf__, 1, 2)))
-mm_printk(const char *fmt, ...) {}
-#else
-#define mm_printk(fmt, args...)             \
-    do                                      \
-    {                                       \
-        dprintk(XENLOG_ERR, fmt, ## args);  \
-        WARN();                             \
-    } while (0)
-#endif
-
-/* Static start-of-day pagetables that we use before the allocators
- * are up. These are used by all CPUs during bringup before switching
- * to the CPUs own pagetables.
- *
- * These pagetables have a very simple structure. They include:
- *  - XEN_VIRT_SIZE worth of L3 mappings of xen at XEN_VIRT_START, boot_first
- *    and boot_second are used to populate the tables down to boot_third
- *    which contains the actual mapping.
- *  - a 1:1 mapping of xen at its current physical address. This uses a
- *    section mapping at whichever of boot_{pgtable,first,second}
- *    covers that physical address.
- *
- * For the boot CPU these mappings point to the address where Xen was
- * loaded by the bootloader. For secondary CPUs they point to the
- * relocated copy of Xen for the benefit of secondary CPUs.
- *
- * In addition to the above for the boot CPU the device-tree is
- * initially mapped in the boot misc slot. This mapping is not present
- * for secondary CPUs.
- *
- * Finally, if EARLY_PRINTK is enabled then xen_fixmap will be mapped
- * by the CPU once it has moved off the 1:1 mapping.
- */
-DEFINE_BOOT_PAGE_TABLE(boot_pgtable);
-#ifdef CONFIG_ARM_64
-DEFINE_BOOT_PAGE_TABLE(boot_first);
-DEFINE_BOOT_PAGE_TABLE(boot_first_id);
-#endif
-DEFINE_BOOT_PAGE_TABLE(boot_second_id);
-DEFINE_BOOT_PAGE_TABLE(boot_third_id);
-DEFINE_BOOT_PAGE_TABLE(boot_second);
-DEFINE_BOOT_PAGE_TABLES(boot_third, XEN_NR_ENTRIES(2));
-
-/* Main runtime page tables */
-
-/*
- * For arm32 xen_pgtable are per-PCPU and are allocated before
- * bringing up each CPU. For arm64 xen_pgtable is common to all PCPUs.
- *
- * xen_second, xen_fixmap and xen_xenmap are always shared between all
- * PCPUs.
- */
-
-#ifdef CONFIG_ARM_64
-#define HYP_PT_ROOT_LEVEL 0
-DEFINE_PAGE_TABLE(xen_pgtable);
-static DEFINE_PAGE_TABLE(xen_first);
-#define THIS_CPU_PGTABLE xen_pgtable
-#else
-#define HYP_PT_ROOT_LEVEL 1
-/* Per-CPU pagetable pages */
-/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
-DEFINE_PER_CPU(lpae_t *, xen_pgtable);
-#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
-/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
-static DEFINE_PAGE_TABLE(cpu0_pgtable);
-#endif
-
-/* Common pagetable leaves */
-/* Second level page table used to cover Xen virtual address space */
-static DEFINE_PAGE_TABLE(xen_second);
-/* Third level page table used for fixmap */
-DEFINE_BOOT_PAGE_TABLE(xen_fixmap);
-/*
- * Third level page table used to map Xen itself with the XN bit set
- * as appropriate.
- */
-static DEFINE_PAGE_TABLES(xen_xenmap, XEN_NR_ENTRIES(2));
-
-/* Non-boot CPUs use this to find the correct pagetables. */
-uint64_t init_ttbr;
-
-static paddr_t phys_offset;
-
-/* Limits of the Xen heap */
-mfn_t directmap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;
-mfn_t directmap_mfn_end __read_mostly;
-vaddr_t directmap_virt_end __read_mostly;
-#ifdef CONFIG_ARM_64
-vaddr_t directmap_virt_start __read_mostly;
-unsigned long directmap_base_pdx __read_mostly;
-#endif
 
 unsigned long frametable_base_pdx __read_mostly;
 unsigned long frametable_virt_end __read_mostly;
 
 extern char __init_begin[], __init_end[];
 
-/* Checking VA memory layout alignment. */
-static void __init __maybe_unused build_assertions(void)
-{
-    /* 2MB aligned regions */
-    BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
-    BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
-    /* 1GB aligned regions */
-#ifdef CONFIG_ARM_32
-    BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
-#else
-    BUILD_BUG_ON(DIRECTMAP_VIRT_START & ~FIRST_MASK);
-#endif
-    /* Page table structure constraints */
-#ifdef CONFIG_ARM_64
-    /*
-     * The first few slots of the L0 table is reserved for the identity
-     * mapping. Check that none of the other regions are overlapping
-     * with it.
-     */
-#define CHECK_OVERLAP_WITH_IDMAP(virt) \
-    BUILD_BUG_ON(zeroeth_table_offset(virt) < IDENTITY_MAPPING_AREA_NR_L0)
-
-    CHECK_OVERLAP_WITH_IDMAP(XEN_VIRT_START);
-    CHECK_OVERLAP_WITH_IDMAP(VMAP_VIRT_START);
-    CHECK_OVERLAP_WITH_IDMAP(FRAMETABLE_VIRT_START);
-    CHECK_OVERLAP_WITH_IDMAP(DIRECTMAP_VIRT_START);
-#undef CHECK_OVERLAP_WITH_IDMAP
-#endif
-    BUILD_BUG_ON(first_table_offset(XEN_VIRT_START));
-#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
-    BUILD_BUG_ON(DOMHEAP_VIRT_START & ~FIRST_MASK);
-#endif
-    /*
-     * The boot code expects the regions XEN_VIRT_START, FIXMAP_ADDR(0),
-     * BOOT_FDT_VIRT_START to use the same 0th (arm64 only) and 1st
-     * slot in the page tables.
-     */
-#define CHECK_SAME_SLOT(level, virt1, virt2) \
-    BUILD_BUG_ON(level##_table_offset(virt1) != level##_table_offset(virt2))
-
-#define CHECK_DIFFERENT_SLOT(level, virt1, virt2) \
-    BUILD_BUG_ON(level##_table_offset(virt1) == level##_table_offset(virt2))
-
-#ifdef CONFIG_ARM_64
-    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, FIXMAP_ADDR(0));
-    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, BOOT_FDT_VIRT_START);
-#endif
-    CHECK_SAME_SLOT(first, XEN_VIRT_START, FIXMAP_ADDR(0));
-    CHECK_SAME_SLOT(first, XEN_VIRT_START, BOOT_FDT_VIRT_START);
-
-    /*
-     * For arm32, the temporary mapping will re-use the domheap
-     * first slot and the second slots will match.
-     */
-#ifdef CONFIG_ARM_32
-    CHECK_SAME_SLOT(first, TEMPORARY_XEN_VIRT_START, DOMHEAP_VIRT_START);
-    CHECK_DIFFERENT_SLOT(first, XEN_VIRT_START, TEMPORARY_XEN_VIRT_START);
-    CHECK_SAME_SLOT(second, XEN_VIRT_START, TEMPORARY_XEN_VIRT_START);
-#endif
-
-#undef CHECK_SAME_SLOT
-#undef CHECK_DIFFERENT_SLOT
-}
-
-static lpae_t *xen_map_table(mfn_t mfn)
-{
-    /*
-     * During early boot, map_domain_page() may be unusable. Use the
-     * PMAP to map temporarily a page-table.
-     */
-    if ( system_state == SYS_STATE_early_boot )
-        return pmap_map(mfn);
-
-    return map_domain_page(mfn);
-}
-
-static void xen_unmap_table(const lpae_t *table)
-{
-    /*
-     * During early boot, xen_map_table() will not use map_domain_page()
-     * but the PMAP.
-     */
-    if ( system_state == SYS_STATE_early_boot )
-        pmap_unmap(table);
-    else
-        unmap_domain_page(table);
-}
-
-void dump_pt_walk(paddr_t ttbr, paddr_t addr,
-                  unsigned int root_level,
-                  unsigned int nr_root_tables)
-{
-    static const char *level_strs[4] = { "0TH", "1ST", "2ND", "3RD" };
-    const mfn_t root_mfn = maddr_to_mfn(ttbr);
-    DECLARE_OFFSETS(offsets, addr);
-    lpae_t pte, *mapping;
-    unsigned int level, root_table;
-
-#ifdef CONFIG_ARM_32
-    BUG_ON(root_level < 1);
-#endif
-    BUG_ON(root_level > 3);
-
-    if ( nr_root_tables > 1 )
-    {
-        /*
-         * Concatenated root-level tables. The table number will be
-         * the offset at the previous level. It is not possible to
-         * concatenate a level-0 root.
-         */
-        BUG_ON(root_level == 0);
-        root_table = offsets[root_level - 1];
-        printk("Using concatenated root table %u\n", root_table);
-        if ( root_table >= nr_root_tables )
-        {
-            printk("Invalid root table offset\n");
-            return;
-        }
-    }
-    else
-        root_table = 0;
-
-    mapping = xen_map_table(mfn_add(root_mfn, root_table));
-
-    for ( level = root_level; ; level++ )
-    {
-        if ( offsets[level] > XEN_PT_LPAE_ENTRIES )
-            break;
-
-        pte = mapping[offsets[level]];
-
-        printk("%s[0x%03x] = 0x%"PRIx64"\n",
-               level_strs[level], offsets[level], pte.bits);
-
-        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
-            break;
-
-        /* For next iteration */
-        xen_unmap_table(mapping);
-        mapping = xen_map_table(lpae_get_mfn(pte));
-    }
-
-    xen_unmap_table(mapping);
-}
-
-void dump_hyp_walk(vaddr_t addr)
-{
-    uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
-
-    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
-           "on CPU%d via TTBR 0x%016"PRIx64"\n",
-           addr, smp_processor_id(), ttbr);
-
-    dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
-}
-
-lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr)
-{
-    lpae_t e = (lpae_t) {
-        .pt = {
-            .valid = 1,           /* Mappings are present */
-            .table = 0,           /* Set to 1 for links and 4k maps */
-            .ai = attr,
-            .ns = 1,              /* Hyp mode is in the non-secure world */
-            .up = 1,              /* See below */
-            .ro = 0,              /* Assume read-write */
-            .af = 1,              /* No need for access tracking */
-            .ng = 1,              /* Makes TLB flushes easier */
-            .contig = 0,          /* Assume non-contiguous */
-            .xn = 1,              /* No need to execute outside .text */
-            .avail = 0,           /* Reference count for domheap mapping */
-        }};
-    /*
-     * For EL2 stage-1 page table, up (aka AP[1]) is RES1 as the translation
-     * regime applies to only one exception level (see D4.4.4 and G4.6.1
-     * in ARM DDI 0487B.a). If this changes, remember to update the
-     * hard-coded values in head.S too.
-     */
-
-    switch ( attr )
-    {
-    case MT_NORMAL_NC:
-        /*
-         * ARM ARM: Overlaying the shareability attribute (DDI
-         * 0406C.b B3-1376 to 1377)
-         *
-         * A memory region with a resultant memory type attribute of Normal,
-         * and a resultant cacheability attribute of Inner Non-cacheable,
-         * Outer Non-cacheable, must have a resultant shareability attribute
-         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
-         *
-         * On ARMv8 sharability is ignored and explicitly treated as Outer
-         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
-         */
-        e.pt.sh = LPAE_SH_OUTER;
-        break;
-    case MT_DEVICE_nGnRnE:
-    case MT_DEVICE_nGnRE:
-        /*
-         * Shareability is ignored for non-Normal memory, Outer is as
-         * good as anything.
-         *
-         * On ARMv8 sharability is ignored and explicitly treated as Outer
-         * Shareable for any device memory type.
-         */
-        e.pt.sh = LPAE_SH_OUTER;
-        break;
-    default:
-        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
-        break;
-    }
-
-    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
-
-    lpae_set_mfn(e, mfn);
-
-    return e;
-}
-
-/* Map a 4k page in a fixmap entry */
-void set_fixmap(unsigned int map, mfn_t mfn, unsigned int flags)
-{
-    int res;
-
-    res = map_pages_to_xen(FIXMAP_ADDR(map), mfn, 1, flags);
-    BUG_ON(res != 0);
-}
-
-/* Remove a mapping from a fixmap entry */
-void clear_fixmap(unsigned int map)
-{
-    int res;
-
-    res = destroy_xen_mappings(FIXMAP_ADDR(map), FIXMAP_ADDR(map) + PAGE_SIZE);
-    BUG_ON(res != 0);
-}
-
 void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
     void *v = map_domain_page(_mfn(mfn));
@@ -392,13 +53,6 @@ void flush_page_to_ram(unsigned long mfn, bool sync_icache)
         invalidate_icache();
 }
 
-lpae_t pte_of_xenaddr(vaddr_t va)
-{
-    paddr_t ma = va + phys_offset;
-
-    return mfn_to_xen_entry(maddr_to_mfn(ma), MT_NORMAL);
-}
-
 void * __init early_fdt_map(paddr_t fdt_paddr)
 {
     /* We are using 2MB superpage for mapping the FDT */
@@ -452,779 +106,11 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
     return fdt_virt;
 }
 
-void __init remove_early_mappings(void)
-{
-    int rc;
-
-    /* destroy the _PAGE_BLOCK mapping */
-    rc = modify_xen_mappings(BOOT_FDT_VIRT_START,
-                             BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE,
-                             _PAGE_BLOCK);
-    BUG_ON(rc);
-}
-
-/*
- * After boot, Xen page-tables should not contain mapping that are both
- * Writable and eXecutables.
- *
- * This should be called on each CPU to enforce the policy.
- */
-static void xen_pt_enforce_wnx(void)
-{
-    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
-    /*
-     * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
-     * before flushing the TLBs.
-     */
-    isb();
-    flush_xen_tlb_local();
-}
-
-/* Clear a translation table and clean & invalidate the cache */
-static void clear_table(void *table)
-{
-    clear_page(table);
-    clean_and_invalidate_dcache_va_range(table, PAGE_SIZE);
-}
-
-/* Boot-time pagetable setup.
- * Changes here may need matching changes in head.S */
-void __init setup_pagetables(unsigned long boot_phys_offset)
-{
-    uint64_t ttbr;
-    lpae_t pte, *p;
-    int i;
-
-    phys_offset = boot_phys_offset;
-
-    arch_setup_page_tables();
-
-#ifdef CONFIG_ARM_64
-    pte = pte_of_xenaddr((uintptr_t)xen_first);
-    pte.pt.table = 1;
-    pte.pt.xn = 0;
-    xen_pgtable[zeroeth_table_offset(XEN_VIRT_START)] = pte;
-
-    p = (void *) xen_first;
-#else
-    p = (void *) cpu0_pgtable;
-#endif
-
-    /* Map xen second level page-table */
-    p[0] = pte_of_xenaddr((uintptr_t)(xen_second));
-    p[0].pt.table = 1;
-    p[0].pt.xn = 0;
-
-    /* Break up the Xen mapping into pages and protect them separately. */
-    for ( i = 0; i < XEN_NR_ENTRIES(3); i++ )
-    {
-        vaddr_t va = XEN_VIRT_START + (i << PAGE_SHIFT);
-
-        if ( !is_kernel(va) )
-            break;
-        pte = pte_of_xenaddr(va);
-        pte.pt.table = 1; /* third level mappings always have this bit set */
-        if ( is_kernel_text(va) || is_kernel_inittext(va) )
-        {
-            pte.pt.xn = 0;
-            pte.pt.ro = 1;
-        }
-        if ( is_kernel_rodata(va) )
-            pte.pt.ro = 1;
-        xen_xenmap[i] = pte;
-    }
-
-    /* Initialise xen second level entries ... */
-    /* ... Xen's text etc */
-    for ( i = 0; i < XEN_NR_ENTRIES(2); i++ )
-    {
-        vaddr_t va = XEN_VIRT_START + (i << XEN_PT_LEVEL_SHIFT(2));
-
-        pte = pte_of_xenaddr((vaddr_t)(xen_xenmap + i * XEN_PT_LPAE_ENTRIES));
-        pte.pt.table = 1;
-        xen_second[second_table_offset(va)] = pte;
-    }
-
-    /* ... Fixmap */
-    pte = pte_of_xenaddr((vaddr_t)xen_fixmap);
-    pte.pt.table = 1;
-    xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
-
-#ifdef CONFIG_ARM_64
-    ttbr = (uintptr_t) xen_pgtable + phys_offset;
-#else
-    ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
-#endif
-
-    switch_ttbr(ttbr);
-
-    xen_pt_enforce_wnx();
-
-#ifdef CONFIG_ARM_32
-    per_cpu(xen_pgtable, 0) = cpu0_pgtable;
-#endif
-}
-
-static void clear_boot_pagetables(void)
-{
-    /*
-     * Clear the copy of the boot pagetables. Each secondary CPU
-     * rebuilds these itself (see head.S).
-     */
-    clear_table(boot_pgtable);
-#ifdef CONFIG_ARM_64
-    clear_table(boot_first);
-    clear_table(boot_first_id);
-#endif
-    clear_table(boot_second);
-    clear_table(boot_third);
-}
-
-#ifdef CONFIG_ARM_64
-int init_secondary_pagetables(int cpu)
-{
-    clear_boot_pagetables();
-
-    /* Set init_ttbr for this CPU coming up. All CPus share a single setof
-     * pagetables, but rewrite it each time for consistency with 32 bit. */
-    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
-    clean_dcache(init_ttbr);
-    return 0;
-}
-#else
-int init_secondary_pagetables(int cpu)
-{
-    lpae_t *first;
-
-    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
-
-    if ( !first )
-    {
-        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
-        return -ENOMEM;
-    }
-
-    /* Initialise root pagetable from root of boot tables */
-    memcpy(first, cpu0_pgtable, PAGE_SIZE);
-    per_cpu(xen_pgtable, cpu) = first;
-
-    if ( !init_domheap_mappings(cpu) )
-    {
-        printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
-        per_cpu(xen_pgtable, cpu) = NULL;
-        free_xenheap_page(first);
-        return -ENOMEM;
-    }
-
-    clear_boot_pagetables();
-
-    /* Set init_ttbr for this CPU coming up */
-    init_ttbr = __pa(first);
-    clean_dcache(init_ttbr);
-
-    return 0;
-}
-#endif
-
-/* MMU setup for secondary CPUS (which already have paging enabled) */
-void mmu_init_secondary_cpu(void)
-{
-    xen_pt_enforce_wnx();
-}
-
-#ifdef CONFIG_ARM_32
-/*
- * Set up the direct-mapped xenheap:
- * up to 1GB of contiguous, always-mapped memory.
- */
-void __init setup_directmap_mappings(unsigned long base_mfn,
-                                     unsigned long nr_mfns)
-{
-    int rc;
-
-    rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the directmap mappings.\n");
-
-    /* Record where the directmap is, for translation routines. */
-    directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
-}
-#else /* CONFIG_ARM_64 */
-/* Map the region in the directmap area. */
-void __init setup_directmap_mappings(unsigned long base_mfn,
-                                     unsigned long nr_mfns)
-{
-    int rc;
-
-    /* First call sets the directmap physical and virtual offset. */
-    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
-    {
-        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
-
-        directmap_mfn_start = _mfn(base_mfn);
-        directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
-        /*
-         * The base address may not be aligned to the first level
-         * size (e.g. 1GB when using 4KB pages). This would prevent
-         * superpage mappings for all the regions because the virtual
-         * address and machine address should both be suitably aligned.
-         *
-         * Prevent that by offsetting the start of the directmap virtual
-         * address.
-         */
-        directmap_virt_start = DIRECTMAP_VIRT_START +
-            (base_mfn - mfn_gb) * PAGE_SIZE;
-    }
-
-    if ( base_mfn < mfn_x(directmap_mfn_start) )
-        panic("cannot add directmap mapping at %lx below heap start %lx\n",
-              base_mfn, mfn_x(directmap_mfn_start));
-
-    rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
-                          _mfn(base_mfn), nr_mfns,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the directmap mappings.\n");
-}
-#endif
-
-/* Map a frame table to cover physical addresses ps through pe */
-void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
-{
-    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
-                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
-    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
-    mfn_t base_mfn;
-    const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32);
-    int rc;
-
-    /*
-     * The size of paddr_t should be sufficient for the complete range of
-     * physical address.
-     */
-    BUILD_BUG_ON((sizeof(paddr_t) * BITS_PER_BYTE) < PADDR_BITS);
-    BUILD_BUG_ON(sizeof(struct page_info) != PAGE_INFO_SIZE);
-
-    if ( frametable_size > FRAMETABLE_SIZE )
-        panic("The frametable cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
-              ps, pe);
-
-    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
-    /* Round up to 2M or 32M boundary, as appropriate. */
-    frametable_size = ROUNDUP(frametable_size, mapping_size);
-    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
-
-    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
-                          frametable_size >> PAGE_SHIFT,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the frametable mappings.\n");
-
-    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
-    memset(&frame_table[nr_pdxs], -1,
-           frametable_size - (nr_pdxs * sizeof(struct page_info)));
-
-    frametable_virt_end = FRAMETABLE_VIRT_START + (nr_pdxs * sizeof(struct page_info));
-}
-
-void *__init arch_vmap_virt_end(void)
-{
-    return (void *)(VMAP_VIRT_START + VMAP_VIRT_SIZE);
-}
-
-/*
- * This function should only be used to remap device address ranges
- * TODO: add a check to verify this assumption
- */
-void *ioremap_attr(paddr_t start, size_t len, unsigned int attributes)
-{
-    mfn_t mfn = _mfn(PFN_DOWN(start));
-    unsigned int offs = start & (PAGE_SIZE - 1);
-    unsigned int nr = PFN_UP(offs + len);
-    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
-
-    if ( ptr == NULL )
-        return NULL;
-
-    return ptr + offs;
-}
-
 void *ioremap(paddr_t pa, size_t len)
 {
     return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
 }
 
-static int create_xen_table(lpae_t *entry)
-{
-    mfn_t mfn;
-    void *p;
-    lpae_t pte;
-
-    if ( system_state != SYS_STATE_early_boot )
-    {
-        struct page_info *pg = alloc_domheap_page(NULL, 0);
-
-        if ( pg == NULL )
-            return -ENOMEM;
-
-        mfn = page_to_mfn(pg);
-    }
-    else
-        mfn = alloc_boot_pages(1, 1);
-
-    p = xen_map_table(mfn);
-    clear_page(p);
-    xen_unmap_table(p);
-
-    pte = mfn_to_xen_entry(mfn, MT_NORMAL);
-    pte.pt.table = 1;
-    write_pte(entry, pte);
-    /*
-     * No ISB here. It is deferred to xen_pt_update() as the new table
-     * will not be used for hardware translation table access as part of
-     * the mapping update.
-     */
-
-    return 0;
-}
-
-#define XEN_TABLE_MAP_FAILED 0
-#define XEN_TABLE_SUPER_PAGE 1
-#define XEN_TABLE_NORMAL_PAGE 2
-
-/*
- * Take the currently mapped table, find the corresponding entry,
- * and map the next table, if available.
- *
- * The read_only parameters indicates whether intermediate tables should
- * be allocated when not present.
- *
- * Return values:
- *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
- *  was empty, or allocating a new page failed.
- *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
- *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
- */
-static int xen_pt_next_level(bool read_only, unsigned int level,
-                             lpae_t **table, unsigned int offset)
-{
-    lpae_t *entry;
-    int ret;
-    mfn_t mfn;
-
-    entry = *table + offset;
-
-    if ( !lpae_is_valid(*entry) )
-    {
-        if ( read_only )
-            return XEN_TABLE_MAP_FAILED;
-
-        ret = create_xen_table(entry);
-        if ( ret )
-            return XEN_TABLE_MAP_FAILED;
-    }
-
-    /* The function xen_pt_next_level is never called at the 3rd level */
-    if ( lpae_is_mapping(*entry, level) )
-        return XEN_TABLE_SUPER_PAGE;
-
-    mfn = lpae_get_mfn(*entry);
-
-    xen_unmap_table(*table);
-    *table = xen_map_table(mfn);
-
-    return XEN_TABLE_NORMAL_PAGE;
-}
-
-/* Sanity check of the entry */
-static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
-                               unsigned int flags)
-{
-    /* Sanity check when modifying an entry. */
-    if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
-    {
-        /* We don't allow modifying an invalid entry. */
-        if ( !lpae_is_valid(entry) )
-        {
-            mm_printk("Modifying invalid entry is not allowed.\n");
-            return false;
-        }
-
-        /* We don't allow modifying a table entry */
-        if ( !lpae_is_mapping(entry, level) )
-        {
-            mm_printk("Modifying a table entry is not allowed.\n");
-            return false;
-        }
-
-        /* We don't allow changing memory attributes. */
-        if ( entry.pt.ai != PAGE_AI_MASK(flags) )
-        {
-            mm_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
-                      entry.pt.ai, PAGE_AI_MASK(flags));
-            return false;
-        }
-
-        /* We don't allow modifying entry with contiguous bit set. */
-        if ( entry.pt.contig )
-        {
-            mm_printk("Modifying entry with contiguous bit set is not allowed.\n");
-            return false;
-        }
-    }
-    /* Sanity check when inserting a mapping */
-    else if ( flags & _PAGE_PRESENT )
-    {
-        /* We should be here with a valid MFN. */
-        ASSERT(!mfn_eq(mfn, INVALID_MFN));
-
-        /*
-         * We don't allow replacing any valid entry.
-         *
-         * Note that the function xen_pt_update() relies on this
-         * assumption and will skip the TLB flush. The function will need
-         * to be updated if the check is relaxed.
-         */
-        if ( lpae_is_valid(entry) )
-        {
-            if ( lpae_is_mapping(entry, level) )
-                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
-                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
-            else
-                mm_printk("Trying to replace a table with a mapping.\n");
-            return false;
-        }
-    }
-    /* Sanity check when removing a mapping. */
-    else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
-    {
-        /* We should be here with an invalid MFN. */
-        ASSERT(mfn_eq(mfn, INVALID_MFN));
-
-        /* We don't allow removing a table */
-        if ( lpae_is_table(entry, level) )
-        {
-            mm_printk("Removing a table is not allowed.\n");
-            return false;
-        }
-
-        /* We don't allow removing a mapping with contiguous bit set. */
-        if ( entry.pt.contig )
-        {
-            mm_printk("Removing entry with contiguous bit set is not allowed.\n");
-            return false;
-        }
-    }
-    /* Sanity check when populating the page-table. No check so far. */
-    else
-    {
-        ASSERT(flags & _PAGE_POPULATE);
-        /* We should be here with an invalid MFN */
-        ASSERT(mfn_eq(mfn, INVALID_MFN));
-    }
-
-    return true;
-}
-
-/* Update an entry at the level @target. */
-static int xen_pt_update_entry(mfn_t root, unsigned long virt,
-                               mfn_t mfn, unsigned int target,
-                               unsigned int flags)
-{
-    int rc;
-    unsigned int level;
-    lpae_t *table;
-    /*
-     * The intermediate page tables are read-only when the MFN is not valid
-     * and we are not populating page table.
-     * This means we either modify permissions or remove an entry.
-     */
-    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
-    lpae_t pte, *entry;
-
-    /* convenience aliases */
-    DECLARE_OFFSETS(offsets, (paddr_t)virt);
-
-    /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
-    ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
-
-    table = xen_map_table(root);
-    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
-    {
-        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
-        if ( rc == XEN_TABLE_MAP_FAILED )
-        {
-            /*
-             * We are here because xen_pt_next_level has failed to map
-             * the intermediate page table (e.g the table does not exist
-             * and the pt is read-only). It is a valid case when
-             * removing a mapping as it may not exist in the page table.
-             * In this case, just ignore it.
-             */
-            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
-            {
-                mm_printk("%s: Unable to map level %u\n", __func__, level);
-                rc = -ENOENT;
-                goto out;
-            }
-            else
-            {
-                rc = 0;
-                goto out;
-            }
-        }
-        else if ( rc != XEN_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    if ( level != target )
-    {
-        mm_printk("%s: Shattering superpage is not supported\n", __func__);
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
-
-    entry = table + offsets[level];
-
-    rc = -EINVAL;
-    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
-        goto out;
-
-    /* If we are only populating page-table, then we are done. */
-    rc = 0;
-    if ( flags & _PAGE_POPULATE )
-        goto out;
-
-    /* We are removing the page */
-    if ( !(flags & _PAGE_PRESENT) )
-        memset(&pte, 0x00, sizeof(pte));
-    else
-    {
-        /* We are inserting a mapping => Create new pte. */
-        if ( !mfn_eq(mfn, INVALID_MFN) )
-        {
-            pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
-
-            /*
-             * First and second level pages set pte.pt.table = 0, but
-             * third level entries set pte.pt.table = 1.
-             */
-            pte.pt.table = (level == 3);
-        }
-        else /* We are updating the permission => Copy the current pte. */
-            pte = *entry;
-
-        /* Set permission */
-        pte.pt.ro = PAGE_RO_MASK(flags);
-        pte.pt.xn = PAGE_XN_MASK(flags);
-        /* Set contiguous bit */
-        pte.pt.contig = !!(flags & _PAGE_CONTIG);
-    }
-
-    write_pte(entry, pte);
-    /*
-     * No ISB or TLB flush here. They are deferred to xen_pt_update()
-     * as the entry will not be used as part of the mapping update.
-     */
-
-    rc = 0;
-
-out:
-    xen_unmap_table(table);
-
-    return rc;
-}
-
-/* Return the level where mapping should be done */
-static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
-                                unsigned int flags)
-{
-    unsigned int level;
-    unsigned long mask;
-
-    /*
-      * Don't take into account the MFN when removing mapping (i.e
-      * MFN_INVALID) to calculate the correct target order.
-      *
-      * Per the Arm Arm, `vfn` and `mfn` must be both superpage aligned.
-      * They are or-ed together and then checked against the size of
-      * each level.
-      *
-      * `left` is not included and checked separately to allow
-      * superpage mapping even if it is not properly aligned (the
-      * user may have asked to map 2MB + 4k).
-      */
-     mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
-     mask |= vfn;
-
-     /*
-      * Always use level 3 mapping unless the caller request block
-      * mapping.
-      */
-     if ( likely(!(flags & _PAGE_BLOCK)) )
-         level = 3;
-     else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) &&
-               (nr >= BIT(FIRST_ORDER, UL)) )
-         level = 1;
-     else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) &&
-               (nr >= BIT(SECOND_ORDER, UL)) )
-         level = 2;
-     else
-         level = 3;
-
-     return level;
-}
-
-#define XEN_PT_4K_NR_CONTIG 16
-
-/*
- * Check whether the contiguous bit can be set. Return the number of
- * contiguous entry allowed. If not allowed, return 1.
- */
-static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn,
-                                        unsigned int level, unsigned long left,
-                                        unsigned int flags)
-{
-    unsigned long nr_contig;
-
-    /*
-     * Allow the contiguous bit to set when the caller requests block
-     * mapping.
-     */
-    if ( !(flags & _PAGE_BLOCK) )
-        return 1;
-
-    /*
-     * We don't allow to remove mapping with the contiguous bit set.
-     * So shortcut the logic and directly return 1.
-     */
-    if ( mfn_eq(mfn, INVALID_MFN) )
-        return 1;
-
-    /*
-     * The number of contiguous entries varies depending on the page
-     * granularity used. The logic below assumes 4KB.
-     */
-    BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
-
-    /*
-     * In order to enable the contiguous bit, we should have enough entries
-     * to map left and both the virtual and physical address should be
-     * aligned to the size of 16 translation tables entries.
-     */
-    nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG;
-
-    if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) )
-        return 1;
-
-    return XEN_PT_4K_NR_CONTIG;
-}
-
-static DEFINE_SPINLOCK(xen_pt_lock);
-
-static int xen_pt_update(unsigned long virt,
-                         mfn_t mfn,
-                         /* const on purpose as it is used for TLB flush */
-                         const unsigned long nr_mfns,
-                         unsigned int flags)
-{
-    int rc = 0;
-    unsigned long vfn = virt >> PAGE_SHIFT;
-    unsigned long left = nr_mfns;
-
-    /*
-     * For arm32, page-tables are different on each CPUs. Yet, they share
-     * some common mappings. It is assumed that only common mappings
-     * will be modified with this function.
-     *
-     * XXX: Add a check.
-     */
-    const mfn_t root = maddr_to_mfn(READ_SYSREG64(TTBR0_EL2));
-
-    /*
-     * The hardware was configured to forbid mapping both writeable and
-     * executable.
-     * When modifying/creating mapping (i.e _PAGE_PRESENT is set),
-     * prevent any update if this happen.
-     */
-    if ( (flags & _PAGE_PRESENT) && !PAGE_RO_MASK(flags) &&
-         !PAGE_XN_MASK(flags) )
-    {
-        mm_printk("Mappings should not be both Writeable and Executable.\n");
-        return -EINVAL;
-    }
-
-    if ( flags & _PAGE_CONTIG )
-    {
-        mm_printk("_PAGE_CONTIG is an internal only flag.\n");
-        return -EINVAL;
-    }
-
-    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
-    {
-        mm_printk("The virtual address is not aligned to the page-size.\n");
-        return -EINVAL;
-    }
-
-    spin_lock(&xen_pt_lock);
-
-    while ( left )
-    {
-        unsigned int order, level, nr_contig, new_flags;
-
-        level = xen_pt_mapping_level(vfn, mfn, left, flags);
-        order = XEN_PT_LEVEL_ORDER(level);
-
-        ASSERT(left >= BIT(order, UL));
-
-        /*
-         * Check if we can set the contiguous mapping and update the
-         * flags accordingly.
-         */
-        nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
-        new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
-
-        for ( ; nr_contig > 0; nr_contig-- )
-        {
-            rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level,
-                                     new_flags);
-            if ( rc )
-                break;
-
-            vfn += 1U << order;
-            if ( !mfn_eq(mfn, INVALID_MFN) )
-                mfn = mfn_add(mfn, 1U << order);
-
-            left -= (1U << order);
-        }
-
-        if ( rc )
-            break;
-    }
-
-    /*
-     * The TLBs flush can be safely skipped when a mapping is inserted
-     * as we don't allow mapping replacement (see xen_pt_check_entry()).
-     * Although we still need an ISB to ensure any DSB in
-     * write_pte() will complete because the mapping may be used soon
-     * after.
-     *
-     * For all the other cases, the TLBs will be flushed unconditionally
-     * even if the mapping has failed. This is because we may have
-     * partially modified the PT. This will prevent any unexpected
-     * behavior afterwards.
-     */
-    if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) )
-        flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
-    else
-        isb();
-
-    spin_unlock(&xen_pt_lock);
-
-    return rc;
-}
-
 int map_pages_to_xen(unsigned long virt,
                      mfn_t mfn,
                      unsigned long nr_mfns,
@@ -1233,11 +119,6 @@ int map_pages_to_xen(unsigned long virt,
     return xen_pt_update(virt, mfn, nr_mfns, flags);
 }
 
-int __init populate_pt_range(unsigned long virt, unsigned long nr_mfns)
-{
-    return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
-}
-
 int destroy_xen_mappings(unsigned long s, unsigned long e)
 {
     ASSERT(IS_ALIGNED(s, PAGE_SIZE));
diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
new file mode 100644
index 0000000000..b18cec4836
--- /dev/null
+++ b/xen/arch/arm/mmu/Makefile
@@ -0,0 +1 @@
+obj-y += mm.o
diff --git a/xen/arch/arm/mmu/mm.c b/xen/arch/arm/mmu/mm.c
new file mode 100644
index 0000000000..b70982e9d6
--- /dev/null
+++ b/xen/arch/arm/mmu/mm.c
@@ -0,0 +1,1146 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * xen/arch/arm/mmu/mm.c
+ *
+ * MMU code for an ARMv7-A with virt extensions.
+ *
+ */
+
+#include <xen/domain_page.h>
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/pmap.h>
+#include <xen/sched.h>
+#include <xen/sizes.h>
+#include <xen/types.h>
+#include <xen/vmap.h>
+
+#include <asm/fixmap.h>
+#include <asm/setup.h>
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
+
+#ifdef NDEBUG
+static inline void
+__attribute__ ((__format__ (__printf__, 1, 2)))
+mm_printk(const char *fmt, ...) {}
+#else
+#define mm_printk(fmt, args...)             \
+    do                                      \
+    {                                       \
+        dprintk(XENLOG_ERR, fmt, ## args);  \
+        WARN();                             \
+    } while (0)
+#endif
+
+/* Static start-of-day pagetables that we use before the allocators
+ * are up. These are used by all CPUs during bringup before switching
+ * to the CPUs own pagetables.
+ *
+ * These pagetables have a very simple structure. They include:
+ *  - XEN_VIRT_SIZE worth of L3 mappings of xen at XEN_VIRT_START, boot_first
+ *    and boot_second are used to populate the tables down to boot_third
+ *    which contains the actual mapping.
+ *  - a 1:1 mapping of xen at its current physical address. This uses a
+ *    section mapping at whichever of boot_{pgtable,first,second}
+ *    covers that physical address.
+ *
+ * For the boot CPU these mappings point to the address where Xen was
+ * loaded by the bootloader. For secondary CPUs they point to the
+ * relocated copy of Xen for the benefit of secondary CPUs.
+ *
+ * In addition to the above for the boot CPU the device-tree is
+ * initially mapped in the boot misc slot. This mapping is not present
+ * for secondary CPUs.
+ *
+ * Finally, if EARLY_PRINTK is enabled then xen_fixmap will be mapped
+ * by the CPU once it has moved off the 1:1 mapping.
+ */
+DEFINE_BOOT_PAGE_TABLE(boot_pgtable);
+#ifdef CONFIG_ARM_64
+DEFINE_BOOT_PAGE_TABLE(boot_first);
+DEFINE_BOOT_PAGE_TABLE(boot_first_id);
+#endif
+DEFINE_BOOT_PAGE_TABLE(boot_second_id);
+DEFINE_BOOT_PAGE_TABLE(boot_third_id);
+DEFINE_BOOT_PAGE_TABLE(boot_second);
+DEFINE_BOOT_PAGE_TABLES(boot_third, XEN_NR_ENTRIES(2));
+
+/* Main runtime page tables */
+
+/*
+ * For arm32 xen_pgtable are per-PCPU and are allocated before
+ * bringing up each CPU. For arm64 xen_pgtable is common to all PCPUs.
+ *
+ * xen_second, xen_fixmap and xen_xenmap are always shared between all
+ * PCPUs.
+ */
+
+#ifdef CONFIG_ARM_64
+#define HYP_PT_ROOT_LEVEL 0
+DEFINE_PAGE_TABLE(xen_pgtable);
+static DEFINE_PAGE_TABLE(xen_first);
+#define THIS_CPU_PGTABLE xen_pgtable
+#else
+#define HYP_PT_ROOT_LEVEL 1
+/* Per-CPU pagetable pages */
+/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+DEFINE_PER_CPU(lpae_t *, xen_pgtable);
+#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
+/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
+static DEFINE_PAGE_TABLE(cpu0_pgtable);
+#endif
+
+/* Common pagetable leaves */
+/* Second level page table used to cover Xen virtual address space */
+static DEFINE_PAGE_TABLE(xen_second);
+/* Third level page table used for fixmap */
+DEFINE_BOOT_PAGE_TABLE(xen_fixmap);
+/*
+ * Third level page table used to map Xen itself with the XN bit set
+ * as appropriate.
+ */
+static DEFINE_PAGE_TABLES(xen_xenmap, XEN_NR_ENTRIES(2));
+
+/* Non-boot CPUs use this to find the correct pagetables. */
+uint64_t init_ttbr;
+
+static paddr_t phys_offset;
+
+/* Limits of the Xen heap */
+mfn_t directmap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;
+mfn_t directmap_mfn_end __read_mostly;
+vaddr_t directmap_virt_end __read_mostly;
+#ifdef CONFIG_ARM_64
+vaddr_t directmap_virt_start __read_mostly;
+unsigned long directmap_base_pdx __read_mostly;
+#endif
+
+/* Checking VA memory layout alignment. */
+static void __init __maybe_unused build_assertions(void)
+{
+    /* 2MB aligned regions */
+    BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
+    BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
+    /* 1GB aligned regions */
+#ifdef CONFIG_ARM_32
+    BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
+#else
+    BUILD_BUG_ON(DIRECTMAP_VIRT_START & ~FIRST_MASK);
+#endif
+    /* Page table structure constraints */
+#ifdef CONFIG_ARM_64
+    /*
+     * The first few slots of the L0 table is reserved for the identity
+     * mapping. Check that none of the other regions are overlapping
+     * with it.
+     */
+#define CHECK_OVERLAP_WITH_IDMAP(virt) \
+    BUILD_BUG_ON(zeroeth_table_offset(virt) < IDENTITY_MAPPING_AREA_NR_L0)
+
+    CHECK_OVERLAP_WITH_IDMAP(XEN_VIRT_START);
+    CHECK_OVERLAP_WITH_IDMAP(VMAP_VIRT_START);
+    CHECK_OVERLAP_WITH_IDMAP(FRAMETABLE_VIRT_START);
+    CHECK_OVERLAP_WITH_IDMAP(DIRECTMAP_VIRT_START);
+#undef CHECK_OVERLAP_WITH_IDMAP
+#endif
+    BUILD_BUG_ON(first_table_offset(XEN_VIRT_START));
+#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
+    BUILD_BUG_ON(DOMHEAP_VIRT_START & ~FIRST_MASK);
+#endif
+    /*
+     * The boot code expects the regions XEN_VIRT_START, FIXMAP_ADDR(0),
+     * BOOT_FDT_VIRT_START to use the same 0th (arm64 only) and 1st
+     * slot in the page tables.
+     */
+#define CHECK_SAME_SLOT(level, virt1, virt2) \
+    BUILD_BUG_ON(level##_table_offset(virt1) != level##_table_offset(virt2))
+
+#define CHECK_DIFFERENT_SLOT(level, virt1, virt2) \
+    BUILD_BUG_ON(level##_table_offset(virt1) == level##_table_offset(virt2))
+
+#ifdef CONFIG_ARM_64
+    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, FIXMAP_ADDR(0));
+    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, BOOT_FDT_VIRT_START);
+#endif
+    CHECK_SAME_SLOT(first, XEN_VIRT_START, FIXMAP_ADDR(0));
+    CHECK_SAME_SLOT(first, XEN_VIRT_START, BOOT_FDT_VIRT_START);
+
+    /*
+     * For arm32, the temporary mapping will re-use the domheap
+     * first slot and the second slots will match.
+     */
+#ifdef CONFIG_ARM_32
+    CHECK_SAME_SLOT(first, TEMPORARY_XEN_VIRT_START, DOMHEAP_VIRT_START);
+    CHECK_DIFFERENT_SLOT(first, XEN_VIRT_START, TEMPORARY_XEN_VIRT_START);
+    CHECK_SAME_SLOT(second, XEN_VIRT_START, TEMPORARY_XEN_VIRT_START);
+#endif
+
+#undef CHECK_SAME_SLOT
+#undef CHECK_DIFFERENT_SLOT
+}
+
+static lpae_t *xen_map_table(mfn_t mfn)
+{
+    /*
+     * During early boot, map_domain_page() may be unusable. Use the
+     * PMAP to map temporarily a page-table.
+     */
+    if ( system_state == SYS_STATE_early_boot )
+        return pmap_map(mfn);
+
+    return map_domain_page(mfn);
+}
+
+static void xen_unmap_table(const lpae_t *table)
+{
+    /*
+     * During early boot, xen_map_table() will not use map_domain_page()
+     * but the PMAP.
+     */
+    if ( system_state == SYS_STATE_early_boot )
+        pmap_unmap(table);
+    else
+        unmap_domain_page(table);
+}
+
+void dump_pt_walk(paddr_t ttbr, paddr_t addr,
+                  unsigned int root_level,
+                  unsigned int nr_root_tables)
+{
+    static const char *level_strs[4] = { "0TH", "1ST", "2ND", "3RD" };
+    const mfn_t root_mfn = maddr_to_mfn(ttbr);
+    DECLARE_OFFSETS(offsets, addr);
+    lpae_t pte, *mapping;
+    unsigned int level, root_table;
+
+#ifdef CONFIG_ARM_32
+    BUG_ON(root_level < 1);
+#endif
+    BUG_ON(root_level > 3);
+
+    if ( nr_root_tables > 1 )
+    {
+        /*
+         * Concatenated root-level tables. The table number will be
+         * the offset at the previous level. It is not possible to
+         * concatenate a level-0 root.
+         */
+        BUG_ON(root_level == 0);
+        root_table = offsets[root_level - 1];
+        printk("Using concatenated root table %u\n", root_table);
+        if ( root_table >= nr_root_tables )
+        {
+            printk("Invalid root table offset\n");
+            return;
+        }
+    }
+    else
+        root_table = 0;
+
+    mapping = xen_map_table(mfn_add(root_mfn, root_table));
+
+    for ( level = root_level; ; level++ )
+    {
+        if ( offsets[level] > XEN_PT_LPAE_ENTRIES )
+            break;
+
+        pte = mapping[offsets[level]];
+
+        printk("%s[0x%03x] = 0x%"PRIx64"\n",
+               level_strs[level], offsets[level], pte.bits);
+
+        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
+            break;
+
+        /* For next iteration */
+        xen_unmap_table(mapping);
+        mapping = xen_map_table(lpae_get_mfn(pte));
+    }
+
+    xen_unmap_table(mapping);
+}
+
+void dump_hyp_walk(vaddr_t addr)
+{
+    uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
+
+    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
+           "on CPU%d via TTBR 0x%016"PRIx64"\n",
+           addr, smp_processor_id(), ttbr);
+
+    dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
+}
+
+lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr)
+{
+    lpae_t e = (lpae_t) {
+        .pt = {
+            .valid = 1,           /* Mappings are present */
+            .table = 0,           /* Set to 1 for links and 4k maps */
+            .ai = attr,
+            .ns = 1,              /* Hyp mode is in the non-secure world */
+            .up = 1,              /* See below */
+            .ro = 0,              /* Assume read-write */
+            .af = 1,              /* No need for access tracking */
+            .ng = 1,              /* Makes TLB flushes easier */
+            .contig = 0,          /* Assume non-contiguous */
+            .xn = 1,              /* No need to execute outside .text */
+            .avail = 0,           /* Reference count for domheap mapping */
+        }};
+    /*
+     * For EL2 stage-1 page table, up (aka AP[1]) is RES1 as the translation
+     * regime applies to only one exception level (see D4.4.4 and G4.6.1
+     * in ARM DDI 0487B.a). If this changes, remember to update the
+     * hard-coded values in head.S too.
+     */
+
+    switch ( attr )
+    {
+    case MT_NORMAL_NC:
+        /*
+         * ARM ARM: Overlaying the shareability attribute (DDI
+         * 0406C.b B3-1376 to 1377)
+         *
+         * A memory region with a resultant memory type attribute of Normal,
+         * and a resultant cacheability attribute of Inner Non-cacheable,
+         * Outer Non-cacheable, must have a resultant shareability attribute
+         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
+         *
+         * On ARMv8 sharability is ignored and explicitly treated as Outer
+         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
+         */
+        e.pt.sh = LPAE_SH_OUTER;
+        break;
+    case MT_DEVICE_nGnRnE:
+    case MT_DEVICE_nGnRE:
+        /*
+         * Shareability is ignored for non-Normal memory, Outer is as
+         * good as anything.
+         *
+         * On ARMv8 sharability is ignored and explicitly treated as Outer
+         * Shareable for any device memory type.
+         */
+        e.pt.sh = LPAE_SH_OUTER;
+        break;
+    default:
+        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
+        break;
+    }
+
+    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
+
+    lpae_set_mfn(e, mfn);
+
+    return e;
+}
+
+/* Map a 4k page in a fixmap entry */
+void set_fixmap(unsigned int map, mfn_t mfn, unsigned int flags)
+{
+    int res;
+
+    res = map_pages_to_xen(FIXMAP_ADDR(map), mfn, 1, flags);
+    BUG_ON(res != 0);
+}
+
+/* Remove a mapping from a fixmap entry */
+void clear_fixmap(unsigned int map)
+{
+    int res;
+
+    res = destroy_xen_mappings(FIXMAP_ADDR(map), FIXMAP_ADDR(map) + PAGE_SIZE);
+    BUG_ON(res != 0);
+}
+
+lpae_t pte_of_xenaddr(vaddr_t va)
+{
+    paddr_t ma = va + phys_offset;
+
+    return mfn_to_xen_entry(maddr_to_mfn(ma), MT_NORMAL);
+}
+
+void __init remove_early_mappings(void)
+{
+    int rc;
+
+    /* destroy the _PAGE_BLOCK mapping */
+    rc = modify_xen_mappings(BOOT_FDT_VIRT_START,
+                             BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE,
+                             _PAGE_BLOCK);
+    BUG_ON(rc);
+}
+
+/*
+ * After boot, Xen page-tables should not contain mapping that are both
+ * Writable and eXecutables.
+ *
+ * This should be called on each CPU to enforce the policy.
+ */
+static void xen_pt_enforce_wnx(void)
+{
+    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
+    /*
+     * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
+     * before flushing the TLBs.
+     */
+    isb();
+    flush_xen_tlb_local();
+}
+
+/* Clear a translation table and clean & invalidate the cache */
+static void clear_table(void *table)
+{
+    clear_page(table);
+    clean_and_invalidate_dcache_va_range(table, PAGE_SIZE);
+}
+
+/* Boot-time pagetable setup.
+ * Changes here may need matching changes in head.S */
+void __init setup_pagetables(unsigned long boot_phys_offset)
+{
+    uint64_t ttbr;
+    lpae_t pte, *p;
+    int i;
+
+    phys_offset = boot_phys_offset;
+
+    arch_setup_page_tables();
+
+#ifdef CONFIG_ARM_64
+    pte = pte_of_xenaddr((uintptr_t)xen_first);
+    pte.pt.table = 1;
+    pte.pt.xn = 0;
+    xen_pgtable[zeroeth_table_offset(XEN_VIRT_START)] = pte;
+
+    p = (void *) xen_first;
+#else
+    p = (void *) cpu0_pgtable;
+#endif
+
+    /* Map xen second level page-table */
+    p[0] = pte_of_xenaddr((uintptr_t)(xen_second));
+    p[0].pt.table = 1;
+    p[0].pt.xn = 0;
+
+    /* Break up the Xen mapping into pages and protect them separately. */
+    for ( i = 0; i < XEN_NR_ENTRIES(3); i++ )
+    {
+        vaddr_t va = XEN_VIRT_START + (i << PAGE_SHIFT);
+
+        if ( !is_kernel(va) )
+            break;
+        pte = pte_of_xenaddr(va);
+        pte.pt.table = 1; /* third level mappings always have this bit set */
+        if ( is_kernel_text(va) || is_kernel_inittext(va) )
+        {
+            pte.pt.xn = 0;
+            pte.pt.ro = 1;
+        }
+        if ( is_kernel_rodata(va) )
+            pte.pt.ro = 1;
+        xen_xenmap[i] = pte;
+    }
+
+    /* Initialise xen second level entries ... */
+    /* ... Xen's text etc */
+    for ( i = 0; i < XEN_NR_ENTRIES(2); i++ )
+    {
+        vaddr_t va = XEN_VIRT_START + (i << XEN_PT_LEVEL_SHIFT(2));
+
+        pte = pte_of_xenaddr((vaddr_t)(xen_xenmap + i * XEN_PT_LPAE_ENTRIES));
+        pte.pt.table = 1;
+        xen_second[second_table_offset(va)] = pte;
+    }
+
+    /* ... Fixmap */
+    pte = pte_of_xenaddr((vaddr_t)xen_fixmap);
+    pte.pt.table = 1;
+    xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
+
+#ifdef CONFIG_ARM_64
+    ttbr = (uintptr_t) xen_pgtable + phys_offset;
+#else
+    ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
+#endif
+
+    switch_ttbr(ttbr);
+
+    xen_pt_enforce_wnx();
+
+#ifdef CONFIG_ARM_32
+    per_cpu(xen_pgtable, 0) = cpu0_pgtable;
+#endif
+}
+
+static void clear_boot_pagetables(void)
+{
+    /*
+     * Clear the copy of the boot pagetables. Each secondary CPU
+     * rebuilds these itself (see head.S).
+     */
+    clear_table(boot_pgtable);
+#ifdef CONFIG_ARM_64
+    clear_table(boot_first);
+    clear_table(boot_first_id);
+#endif
+    clear_table(boot_second);
+    clear_table(boot_third);
+}
+
+#ifdef CONFIG_ARM_64
+int init_secondary_pagetables(int cpu)
+{
+    clear_boot_pagetables();
+
+    /* Set init_ttbr for this CPU coming up. All CPUs share a single set of
+     * pagetables, but rewrite it each time for consistency with 32 bit. */
+    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
+    clean_dcache(init_ttbr);
+    return 0;
+}
+#else
+int init_secondary_pagetables(int cpu)
+{
+    lpae_t *first;
+
+    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
+
+    if ( !first )
+    {
+        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
+        return -ENOMEM;
+    }
+
+    /* Initialise root pagetable from root of boot tables */
+    memcpy(first, cpu0_pgtable, PAGE_SIZE);
+    per_cpu(xen_pgtable, cpu) = first;
+
+    if ( !init_domheap_mappings(cpu) )
+    {
+        printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
+        per_cpu(xen_pgtable, cpu) = NULL;
+        free_xenheap_page(first);
+        return -ENOMEM;
+    }
+
+    clear_boot_pagetables();
+
+    /* Set init_ttbr for this CPU coming up */
+    init_ttbr = __pa(first);
+    clean_dcache(init_ttbr);
+
+    return 0;
+}
+#endif
+
+/* MMU setup for secondary CPUS (which already have paging enabled) */
+void mmu_init_secondary_cpu(void)
+{
+    xen_pt_enforce_wnx();
+}
+
+#ifdef CONFIG_ARM_32
+/*
+ * Set up the direct-mapped xenheap:
+ * up to 1GB of contiguous, always-mapped memory.
+ */
+void __init setup_directmap_mappings(unsigned long base_mfn,
+                                     unsigned long nr_mfns)
+{
+    int rc;
+
+    rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
+                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to setup the directmap mappings.\n");
+
+    /* Record where the directmap is, for translation routines. */
+    directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
+}
+#else /* CONFIG_ARM_64 */
+/* Map the region in the directmap area. */
+void __init setup_directmap_mappings(unsigned long base_mfn,
+                                     unsigned long nr_mfns)
+{
+    int rc;
+
+    /* First call sets the directmap physical and virtual offset. */
+    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
+    {
+        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
+
+        directmap_mfn_start = _mfn(base_mfn);
+        directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
+        /*
+         * The base address may not be aligned to the first level
+         * size (e.g. 1GB when using 4KB pages). This would prevent
+         * superpage mappings for all the regions because the virtual
+         * address and machine address should both be suitably aligned.
+         *
+         * Prevent that by offsetting the start of the directmap virtual
+         * address.
+         */
+        directmap_virt_start = DIRECTMAP_VIRT_START +
+            (base_mfn - mfn_gb) * PAGE_SIZE;
+    }
+
+    if ( base_mfn < mfn_x(directmap_mfn_start) )
+        panic("cannot add directmap mapping at %lx below heap start %lx\n",
+              base_mfn, mfn_x(directmap_mfn_start));
+
+    rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
+                          _mfn(base_mfn), nr_mfns,
+                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to setup the directmap mappings.\n");
+}
+#endif
+
+/* Map a frame table to cover physical addresses ps through pe */
+void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
+{
+    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
+                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
+    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
+    mfn_t base_mfn;
+    const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32);
+    int rc;
+
+    /*
+     * The size of paddr_t should be sufficient for the complete range of
+     * physical address.
+     */
+    BUILD_BUG_ON((sizeof(paddr_t) * BITS_PER_BYTE) < PADDR_BITS);
+    BUILD_BUG_ON(sizeof(struct page_info) != PAGE_INFO_SIZE);
+
+    if ( frametable_size > FRAMETABLE_SIZE )
+        panic("The frametable cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
+              ps, pe);
+
+    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
+    /* Round up to 2M or 32M boundary, as appropriate. */
+    frametable_size = ROUNDUP(frametable_size, mapping_size);
+    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
+
+    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
+                          frametable_size >> PAGE_SHIFT,
+                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to setup the frametable mappings.\n");
+
+    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
+    memset(&frame_table[nr_pdxs], -1,
+           frametable_size - (nr_pdxs * sizeof(struct page_info)));
+
+    frametable_virt_end = FRAMETABLE_VIRT_START + (nr_pdxs * sizeof(struct page_info));
+}
+
+void *__init arch_vmap_virt_end(void)
+{
+    return (void *)(VMAP_VIRT_START + VMAP_VIRT_SIZE);
+}
+
+/*
+ * This function should only be used to remap device address ranges
+ * TODO: add a check to verify this assumption
+ */
+void *ioremap_attr(paddr_t start, size_t len, unsigned int attributes)
+{
+    mfn_t mfn = _mfn(PFN_DOWN(start));
+    unsigned int offs = start & (PAGE_SIZE - 1);
+    unsigned int nr = PFN_UP(offs + len);
+    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
+
+    if ( ptr == NULL )
+        return NULL;
+
+    return ptr + offs;
+}
+
+static int create_xen_table(lpae_t *entry)
+{
+    mfn_t mfn;
+    void *p;
+    lpae_t pte;
+
+    if ( system_state != SYS_STATE_early_boot )
+    {
+        struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+        if ( pg == NULL )
+            return -ENOMEM;
+
+        mfn = page_to_mfn(pg);
+    }
+    else
+        mfn = alloc_boot_pages(1, 1);
+
+    p = xen_map_table(mfn);
+    clear_page(p);
+    xen_unmap_table(p);
+
+    pte = mfn_to_xen_entry(mfn, MT_NORMAL);
+    pte.pt.table = 1;
+    write_pte(entry, pte);
+    /*
+     * No ISB here. It is deferred to xen_pt_update() as the new table
+     * will not be used for hardware translation table access as part of
+     * the mapping update.
+     */
+
+    return 0;
+}
+
+#define XEN_TABLE_MAP_FAILED 0
+#define XEN_TABLE_SUPER_PAGE 1
+#define XEN_TABLE_NORMAL_PAGE 2
+
+/*
+ * Take the currently mapped table, find the corresponding entry,
+ * and map the next table, if available.
+ *
+ * The read_only parameter indicates whether intermediate tables should
+ * be allocated when not present.
+ *
+ * Return values:
+ *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
+ *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
+ */
+static int xen_pt_next_level(bool read_only, unsigned int level,
+                             lpae_t **table, unsigned int offset)
+{
+    lpae_t *entry;
+    int ret;
+    mfn_t mfn;
+
+    entry = *table + offset;
+
+    if ( !lpae_is_valid(*entry) )
+    {
+        if ( read_only )
+            return XEN_TABLE_MAP_FAILED;
+
+        ret = create_xen_table(entry);
+        if ( ret )
+            return XEN_TABLE_MAP_FAILED;
+    }
+
+    /* The function xen_pt_next_level is never called at the 3rd level */
+    if ( lpae_is_mapping(*entry, level) )
+        return XEN_TABLE_SUPER_PAGE;
+
+    mfn = lpae_get_mfn(*entry);
+
+    xen_unmap_table(*table);
+    *table = xen_map_table(mfn);
+
+    return XEN_TABLE_NORMAL_PAGE;
+}
+
+/* Sanity check of the entry */
+static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
+                               unsigned int flags)
+{
+    /* Sanity check when modifying an entry. */
+    if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
+    {
+        /* We don't allow modifying an invalid entry. */
+        if ( !lpae_is_valid(entry) )
+        {
+            mm_printk("Modifying invalid entry is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow modifying a table entry */
+        if ( !lpae_is_mapping(entry, level) )
+        {
+            mm_printk("Modifying a table entry is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow changing memory attributes. */
+        if ( entry.pt.ai != PAGE_AI_MASK(flags) )
+        {
+            mm_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
+                      entry.pt.ai, PAGE_AI_MASK(flags));
+            return false;
+        }
+
+        /* We don't allow modifying entry with contiguous bit set. */
+        if ( entry.pt.contig )
+        {
+            mm_printk("Modifying entry with contiguous bit set is not allowed.\n");
+            return false;
+        }
+    }
+    /* Sanity check when inserting a mapping */
+    else if ( flags & _PAGE_PRESENT )
+    {
+        /* We should be here with a valid MFN. */
+        ASSERT(!mfn_eq(mfn, INVALID_MFN));
+
+        /*
+         * We don't allow replacing any valid entry.
+         *
+         * Note that the function xen_pt_update() relies on this
+         * assumption and will skip the TLB flush. The function will need
+         * to be updated if the check is relaxed.
+         */
+        if ( lpae_is_valid(entry) )
+        {
+            if ( lpae_is_mapping(entry, level) )
+                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
+                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
+            else
+                mm_printk("Trying to replace a table with a mapping.\n");
+            return false;
+        }
+    }
+    /* Sanity check when removing a mapping. */
+    else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
+    {
+        /* We should be here with an invalid MFN. */
+        ASSERT(mfn_eq(mfn, INVALID_MFN));
+
+        /* We don't allow removing a table */
+        if ( lpae_is_table(entry, level) )
+        {
+            mm_printk("Removing a table is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow removing a mapping with contiguous bit set. */
+        if ( entry.pt.contig )
+        {
+            mm_printk("Removing entry with contiguous bit set is not allowed.\n");
+            return false;
+        }
+    }
+    /* Sanity check when populating the page-table. No check so far. */
+    else
+    {
+        ASSERT(flags & _PAGE_POPULATE);
+        /* We should be here with an invalid MFN */
+        ASSERT(mfn_eq(mfn, INVALID_MFN));
+    }
+
+    return true;
+}
+
+/* Update an entry at the level @target. */
+static int xen_pt_update_entry(mfn_t root, unsigned long virt,
+                               mfn_t mfn, unsigned int target,
+                               unsigned int flags)
+{
+    int rc;
+    unsigned int level;
+    lpae_t *table;
+    /*
+     * The intermediate page tables are read-only when the MFN is not valid
+     * and we are not populating page table.
+     * This means we either modify permissions or remove an entry.
+     */
+    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
+    lpae_t pte, *entry;
+
+    /* convenience aliases */
+    DECLARE_OFFSETS(offsets, (paddr_t)virt);
+
+    /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
+    ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
+
+    table = xen_map_table(root);
+    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
+    {
+        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
+        if ( rc == XEN_TABLE_MAP_FAILED )
+        {
+            /*
+             * We are here because xen_pt_next_level has failed to map
+             * the intermediate page table (e.g the table does not exist
+             * and the pt is read-only). It is a valid case when
+             * removing a mapping as it may not exist in the page table.
+             * In this case, just ignore it.
+             */
+            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
+            {
+                mm_printk("%s: Unable to map level %u\n", __func__, level);
+                rc = -ENOENT;
+                goto out;
+            }
+            else
+            {
+                rc = 0;
+                goto out;
+            }
+        }
+        else if ( rc != XEN_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    if ( level != target )
+    {
+        mm_printk("%s: Shattering superpage is not supported\n", __func__);
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    entry = table + offsets[level];
+
+    rc = -EINVAL;
+    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
+        goto out;
+
+    /* If we are only populating page-table, then we are done. */
+    rc = 0;
+    if ( flags & _PAGE_POPULATE )
+        goto out;
+
+    /* We are removing the page */
+    if ( !(flags & _PAGE_PRESENT) )
+        memset(&pte, 0x00, sizeof(pte));
+    else
+    {
+        /* We are inserting a mapping => Create new pte. */
+        if ( !mfn_eq(mfn, INVALID_MFN) )
+        {
+            pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
+
+            /*
+             * First and second level pages set pte.pt.table = 0, but
+             * third level entries set pte.pt.table = 1.
+             */
+            pte.pt.table = (level == 3);
+        }
+        else /* We are updating the permission => Copy the current pte. */
+            pte = *entry;
+
+        /* Set permission */
+        pte.pt.ro = PAGE_RO_MASK(flags);
+        pte.pt.xn = PAGE_XN_MASK(flags);
+        /* Set contiguous bit */
+        pte.pt.contig = !!(flags & _PAGE_CONTIG);
+    }
+
+    write_pte(entry, pte);
+    /*
+     * No ISB or TLB flush here. They are deferred to xen_pt_update()
+     * as the entry will not be used as part of the mapping update.
+     */
+
+    rc = 0;
+
+out:
+    xen_unmap_table(table);
+
+    return rc;
+}
+
+/* Return the level where mapping should be done */
+static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
+                                unsigned int flags)
+{
+    unsigned int level;
+    unsigned long mask;
+
+    /*
+      * Don't take into account the MFN when removing a mapping (i.e.
+      * INVALID_MFN) to calculate the correct target order.
+      *
+      * Per the Arm Arm, `vfn` and `mfn` must be both superpage aligned.
+      * They are or-ed together and then checked against the size of
+      * each level.
+      *
+      * `left` is not included and checked separately to allow
+      * superpage mapping even if it is not properly aligned (the
+      * user may have asked to map 2MB + 4k).
+      */
+     mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
+     mask |= vfn;
+
+     /*
+      * Always use level 3 mapping unless the caller requests a block
+      * mapping.
+      */
+     if ( likely(!(flags & _PAGE_BLOCK)) )
+         level = 3;
+     else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) &&
+               (nr >= BIT(FIRST_ORDER, UL)) )
+         level = 1;
+     else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) &&
+               (nr >= BIT(SECOND_ORDER, UL)) )
+         level = 2;
+     else
+         level = 3;
+
+     return level;
+}
+
+#define XEN_PT_4K_NR_CONTIG 16
+
+/*
+ * Check whether the contiguous bit can be set. Return the number of
+ * contiguous entry allowed. If not allowed, return 1.
+ */
+static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn,
+                                        unsigned int level, unsigned long left,
+                                        unsigned int flags)
+{
+    unsigned long nr_contig;
+
+    /*
+     * Allow the contiguous bit to be set when the caller requests a block
+     * mapping.
+     */
+    if ( !(flags & _PAGE_BLOCK) )
+        return 1;
+
+    /*
+     * We don't allow removing a mapping with the contiguous bit set.
+     * So shortcut the logic and directly return 1.
+     */
+    if ( mfn_eq(mfn, INVALID_MFN) )
+        return 1;
+
+    /*
+     * The number of contiguous entries varies depending on the page
+     * granularity used. The logic below assumes 4KB.
+     */
+    BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
+
+    /*
+     * In order to enable the contiguous bit, we should have enough entries
+     * to map `left` and both the virtual and physical addresses should be
+     * aligned to the size of 16 translation table entries.
+     */
+    nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG;
+
+    if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) )
+        return 1;
+
+    return XEN_PT_4K_NR_CONTIG;
+}
+
+static DEFINE_SPINLOCK(xen_pt_lock);
+
+int xen_pt_update(unsigned long virt,
+                  mfn_t mfn,
+                  /* const on purpose as it is used for TLB flush */
+                  const unsigned long nr_mfns,
+                  unsigned int flags)
+{
+    int rc = 0;
+    unsigned long vfn = virt >> PAGE_SHIFT;
+    unsigned long left = nr_mfns;
+
+    /*
+     * For arm32, page-tables are different on each CPU. Yet, they share
+     * some common mappings. It is assumed that only common mappings
+     * will be modified with this function.
+     *
+     * XXX: Add a check.
+     */
+    const mfn_t root = maddr_to_mfn(READ_SYSREG64(TTBR0_EL2));
+
+    /*
+     * The hardware was configured to forbid mapping both writeable and
+     * executable.
+     * When modifying/creating a mapping (i.e. _PAGE_PRESENT is set),
+     * prevent any update if this happens.
+     */
+    if ( (flags & _PAGE_PRESENT) && !PAGE_RO_MASK(flags) &&
+         !PAGE_XN_MASK(flags) )
+    {
+        mm_printk("Mappings should not be both Writeable and Executable.\n");
+        return -EINVAL;
+    }
+
+    if ( flags & _PAGE_CONTIG )
+    {
+        mm_printk("_PAGE_CONTIG is an internal only flag.\n");
+        return -EINVAL;
+    }
+
+    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
+    {
+        mm_printk("The virtual address is not aligned to the page-size.\n");
+        return -EINVAL;
+    }
+
+    spin_lock(&xen_pt_lock);
+
+    while ( left )
+    {
+        unsigned int order, level, nr_contig, new_flags;
+
+        level = xen_pt_mapping_level(vfn, mfn, left, flags);
+        order = XEN_PT_LEVEL_ORDER(level);
+
+        ASSERT(left >= BIT(order, UL));
+
+        /*
+         * Check if we can set the contiguous mapping and update the
+         * flags accordingly.
+         */
+        nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
+        new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
+
+        for ( ; nr_contig > 0; nr_contig-- )
+        {
+            rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level,
+                                     new_flags);
+            if ( rc )
+                break;
+
+            vfn += 1U << order;
+            if ( !mfn_eq(mfn, INVALID_MFN) )
+                mfn = mfn_add(mfn, 1U << order);
+
+            left -= (1U << order);
+        }
+
+        if ( rc )
+            break;
+    }
+
+    /*
+     * The TLBs flush can be safely skipped when a mapping is inserted
+     * as we don't allow mapping replacement (see xen_pt_check_entry()).
+     * Although we still need an ISB to ensure any DSB in
+     * write_pte() will complete because the mapping may be used soon
+     * after.
+     *
+     * For all the other cases, the TLBs will be flushed unconditionally
+     * even if the mapping has failed. This is because we may have
+     * partially modified the PT. This will prevent any unexpected
+     * behavior afterwards.
+     */
+    if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) )
+        flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
+    else
+        isb();
+
+    spin_unlock(&xen_pt_lock);
+
+    return rc;
+}
+
+int __init populate_pt_range(unsigned long virt, unsigned long nr_mfns)
+{
+    return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (6 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 07/13] xen/arm: Extract MMU-specific code Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21 18:14   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability Henry Wang
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk, Henry Wang

From: Penny Zheng <penny.zheng@arm.com>

fixmap and pmap are MMU-specific features, so fold them into the MMU
system. Do the folding for pmap by moving the HAS_PMAP Kconfig selection
under CONFIG_MMU. Do the folding for fixmap by moving the implementation
of virt_to_fix() to mmu/mm.c, so that unnecessary stubs can be avoided.
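
As an aside, illustrative only and not part of this patch: with the
implementation out of line, a !MMU configuration that includes fixmap.h
only ever sees the prototype of virt_to_fix(), so it would not need a
dummy fallback for the MMU-only FIXADDR_* layout symbols along the
lines of the hypothetical snippet below:

    /* Hypothetical stub a !MMU build would otherwise have needed to
     * keep an inline virt_to_fix() compiling. */
    #define FIXADDR_START 0
    #define FIXADDR_TOP   0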

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- Rebase on top of xen/arm: Introduce CONFIG_MMU Kconfig option
v4:
- Rework "[v3,11/52] xen/arm: mmu: fold FIXMAP into MMU system",
  change the order of this patch and avoid introducing stubs.
---
 xen/arch/arm/Kconfig              | 2 +-
 xen/arch/arm/include/asm/fixmap.h | 7 +------
 xen/arch/arm/mmu/mm.c             | 7 +++++++
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index eb0413336b..8a7b79b4b5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -15,7 +15,6 @@ config ARM
 	select HAS_DEVICE_TREE
 	select HAS_PASSTHROUGH
 	select HAS_PDX
-	select HAS_PMAP
 	select HAS_UBSAN
 	select IOMMU_FORCE_PT_SHARE
 
@@ -61,6 +60,7 @@ config PADDR_BITS
 
 config MMU
 	def_bool y
+	select HAS_PMAP
 
 source "arch/Kconfig"
 
diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
index 734eb9b1d4..5d5de6995a 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -36,12 +36,7 @@ extern void clear_fixmap(unsigned int map);
 
 #define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
 
-static inline unsigned int virt_to_fix(vaddr_t vaddr)
-{
-    BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
-
-    return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
-}
+extern unsigned int virt_to_fix(vaddr_t vaddr);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/xen/arch/arm/mmu/mm.c b/xen/arch/arm/mmu/mm.c
index b70982e9d6..1d6267e6c5 100644
--- a/xen/arch/arm/mmu/mm.c
+++ b/xen/arch/arm/mmu/mm.c
@@ -1136,6 +1136,13 @@ int __init populate_pt_range(unsigned long virt, unsigned long nr_mfns)
     return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
 }
 
+unsigned int virt_to_fix(vaddr_t vaddr)
+{
+    BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
+
+    return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (7 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21 18:32   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c Henry Wang
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk, Henry Wang

From: Penny Zheng <penny.zheng@arm.com>

As preparation for MPU support, which will use some variables/functions
for both MMU and MPU system, We rename the affected variable/function
to more generic names:
- init_ttbr -> init_mm,
- mmu_init_secondary_cpu() -> mm_init_secondary_cpu()
- init_secondary_pagetables() -> init_secondary_mm()
- Add a wrapper update_mm_mapping() for the MMU system's
  update_identity_mapping()

Modify the related in-code comments to reflect the above changes, and
take the opportunity to fix the incorrect coding style of the in-code
comments.
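
Illustration only (hypothetical, not introduced by this series): the
generic wrapper is what would later allow an MPU build to supply its
own definition without touching common callers such as arch_cpu_up(),
e.g.:

    /* Sketch of a possible MPU-side definition; the real MPU code is
     * not part of this series. */
    void update_mm_mapping(bool enable)
    {
        /* Nothing to do here if an MPU system keeps no identity
         * mapping that needs toggling. */
    }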

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- Rebase on top of xen/arm: mm: add missing extern variable declaration
v4:
- Extract the renaming part from the original patch:
  "[v3,13/52] xen/mmu: extract mmu-specific codes from mm.c/mm.h"
---
 xen/arch/arm/arm32/head.S           |  4 ++--
 xen/arch/arm/arm64/mmu/head.S       |  2 +-
 xen/arch/arm/arm64/mmu/mm.c         | 11 ++++++++---
 xen/arch/arm/arm64/smpboot.c        |  6 +++---
 xen/arch/arm/include/asm/arm64/mm.h |  7 ++++---
 xen/arch/arm/include/asm/mm.h       | 12 +++++++-----
 xen/arch/arm/mmu/mm.c               | 20 ++++++++++----------
 xen/arch/arm/smpboot.c              |  4 ++--
 8 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 33b038e7e0..03ab68578a 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -238,11 +238,11 @@ GLOBAL(init_secondary)
 secondary_switched:
         /*
          * Non-boot CPUs need to move on to the proper pagetables, which were
-         * setup in init_secondary_pagetables.
+         * setup in init_secondary_mm.
          *
          * XXX: This is not compliant with the Arm Arm.
          */
-        mov_w r4, init_ttbr          /* VA of HTTBR value stashed by CPU 0 */
+        mov_w r4, init_mm            /* VA of HTTBR value stashed by CPU 0 */
         ldrd  r4, r5, [r4]           /* Actual value */
         dsb
         mcrr  CP64(r4, r5, HTTBR)
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index ba2ddd7e67..58d91c9088 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -302,7 +302,7 @@ ENDPROC(enable_mmu)
 ENTRY(enable_secondary_cpu_mm)
         mov   x5, lr
 
-        load_paddr x0, init_ttbr
+        load_paddr x0, init_mm
         ldr   x0, [x0]
 
         bl    enable_mmu
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index 78b7c7eb00..ed0fc5ff7b 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -106,7 +106,7 @@ void __init arch_setup_page_tables(void)
     prepare_runtime_identity_mapping();
 }
 
-void update_identity_mapping(bool enable)
+static void update_identity_mapping(bool enable)
 {
     paddr_t id_addr = virt_to_maddr(_start);
     int rc;
@@ -120,6 +120,11 @@ void update_identity_mapping(bool enable)
     BUG_ON(rc);
 }
 
+void update_mm_mapping(bool enable)
+{
+    update_identity_mapping(enable);
+}
+
 extern void switch_ttbr_id(uint64_t ttbr);
 
 typedef void (switch_ttbr_fn)(uint64_t ttbr);
@@ -131,7 +136,7 @@ void __init switch_ttbr(uint64_t ttbr)
     lpae_t pte;
 
     /* Enable the identity mapping in the boot page tables */
-    update_identity_mapping(true);
+    update_mm_mapping(true);
 
     /* Enable the identity mapping in the runtime page tables */
     pte = pte_of_xenaddr((vaddr_t)switch_ttbr_id);
@@ -148,7 +153,7 @@ void __init switch_ttbr(uint64_t ttbr)
      * Note it is not necessary to disable it in the boot page tables
      * because they are not going to be used by this CPU anymore.
      */
-    update_identity_mapping(false);
+    update_mm_mapping(false);
 }
 
 /*
diff --git a/xen/arch/arm/arm64/smpboot.c b/xen/arch/arm/arm64/smpboot.c
index 9637f42469..2b1d086a1e 100644
--- a/xen/arch/arm/arm64/smpboot.c
+++ b/xen/arch/arm/arm64/smpboot.c
@@ -111,18 +111,18 @@ int arch_cpu_up(int cpu)
     if ( !smp_enable_ops[cpu].prepare_cpu )
         return -ENODEV;
 
-    update_identity_mapping(true);
+    update_mm_mapping(true);
 
     rc = smp_enable_ops[cpu].prepare_cpu(cpu);
     if ( rc )
-        update_identity_mapping(false);
+        update_mm_mapping(false);
 
     return rc;
 }
 
 void arch_cpu_up_finish(void)
 {
-    update_identity_mapping(false);
+    update_mm_mapping(false);
 }
 
 /*
diff --git a/xen/arch/arm/include/asm/arm64/mm.h b/xen/arch/arm/include/asm/arm64/mm.h
index e0bd23a6ed..7a389c4b21 100644
--- a/xen/arch/arm/include/asm/arm64/mm.h
+++ b/xen/arch/arm/include/asm/arm64/mm.h
@@ -15,13 +15,14 @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 void arch_setup_page_tables(void);
 
 /*
- * Enable/disable the identity mapping in the live page-tables (i.e.
- * the one pointed by TTBR_EL2).
+ * In MMU system, enable/disable the identity mapping in the live
+ * page-tables (i.e. the one pointed by TTBR_EL2) through
+ * update_identity_mapping().
  *
  * Note that nested call (e.g. enable=true, enable=true) is not
  * supported.
  */
-void update_identity_mapping(bool enable);
+void update_mm_mapping(bool enable);
 
 #endif /* __ARM_ARM64_MM_H__ */
 
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index dc1458b047..8084c62c01 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -170,7 +170,7 @@ struct page_info
 #define PGC_need_scrub    PGC_allocated
 
 /* Non-boot CPUs use this to find the correct pagetables. */
-extern uint64_t init_ttbr;
+extern uint64_t init_mm;
 
 #ifdef CONFIG_ARM_32
 #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
@@ -205,11 +205,13 @@ extern void setup_pagetables(unsigned long boot_phys_offset);
 extern void *early_fdt_map(paddr_t fdt_paddr);
 /* Remove early mappings */
 extern void remove_early_mappings(void);
-/* Allocate and initialise pagetables for a secondary CPU. Sets init_ttbr to the
- * new page table */
-extern int init_secondary_pagetables(int cpu);
+/*
+ * Allocate and initialise pagetables for a secondary CPU. Sets init_mm to the
+ * new page table
+ */
+extern int init_secondary_mm(int cpu);
 /* Switch secondary CPUS to its own pagetables and finalise MMU setup */
-extern void mmu_init_secondary_cpu(void);
+extern void mm_init_secondary_cpu(void);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
diff --git a/xen/arch/arm/mmu/mm.c b/xen/arch/arm/mmu/mm.c
index 1d6267e6c5..7486c35ec0 100644
--- a/xen/arch/arm/mmu/mm.c
+++ b/xen/arch/arm/mmu/mm.c
@@ -106,7 +106,7 @@ DEFINE_BOOT_PAGE_TABLE(xen_fixmap);
 static DEFINE_PAGE_TABLES(xen_xenmap, XEN_NR_ENTRIES(2));
 
 /* Non-boot CPUs use this to find the correct pagetables. */
-uint64_t init_ttbr;
+uint64_t init_mm;
 
 static paddr_t phys_offset;
 
@@ -492,18 +492,18 @@ static void clear_boot_pagetables(void)
 }
 
 #ifdef CONFIG_ARM_64
-int init_secondary_pagetables(int cpu)
+int init_secondary_mm(int cpu)
 {
     clear_boot_pagetables();
 
-    /* Set init_ttbr for this CPU coming up. All CPUs share a single set of
+    /* Set init_mm for this CPU coming up. All CPUs share a single set of
      * pagetables, but rewrite it each time for consistency with 32 bit. */
-    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
-    clean_dcache(init_ttbr);
+    init_mm = (uintptr_t) xen_pgtable + phys_offset;
+    clean_dcache(init_mm);
     return 0;
 }
 #else
-int init_secondary_pagetables(int cpu)
+int init_secondary_mm(int cpu)
 {
     lpae_t *first;
 
@@ -529,16 +529,16 @@ int init_secondary_pagetables(int cpu)
 
     clear_boot_pagetables();
 
-    /* Set init_ttbr for this CPU coming up */
-    init_ttbr = __pa(first);
-    clean_dcache(init_ttbr);
+    /* Set init_mm for this CPU coming up */
+    init_mm = __pa(first);
+    clean_dcache(init_mm);
 
     return 0;
 }
 #endif
 
 /* MMU setup for secondary CPUS (which already have paging enabled) */
-void mmu_init_secondary_cpu(void)
+void mm_init_secondary_cpu(void)
 {
     xen_pt_enforce_wnx();
 }
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index e107b86b7b..8bcdbea66c 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -359,7 +359,7 @@ void start_secondary(void)
      */
     update_system_features(&current_cpu_data);
 
-    mmu_init_secondary_cpu();
+    mm_init_secondary_cpu();
 
     gic_init_secondary_cpu();
 
@@ -448,7 +448,7 @@ int __cpu_up(unsigned int cpu)
 
     printk("Bringing up CPU%d\n", cpu);
 
-    rc = init_secondary_pagetables(cpu);
+    rc = init_secondary_mm(cpu);
     if ( rc < 0 )
         return rc;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (8 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21 21:19   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h} Henry Wang
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen, Henry Wang

From: Penny Zheng <penny.zheng@arm.com>

setup_mm() is used by Xen to set up the memory management subsystem at
boot time, covering the boot allocator, direct-mapping, xenheap
initialization, the frametable and static memory pages.

Some of these components, such as the boot allocator, can be inherited
seamlessly by the later MPU system, whilst others, such as the xenheap,
need a different implementation for MPU. Some components, such as
direct-mapping, are specific to the MMU only.

In this commit, move the MMU-specific components into mmu/setup.c, in
preparation for implementing an MPU version of setup_mm() in a future
commit. Also, make init_pdx(), init_staticmem_pages(), setup_mm(), and
populate_boot_allocator() public for the future MPU implementation.
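
As a rough illustration (hypothetical, not added by this commit), a
future MPU implementation could then reuse the newly public helpers
along these lines; the call order shown is only indicative:

    /* Sketch of a possible mpu/setup.c counterpart of setup_mm(). */
    void __init setup_mm(void)
    {
        init_pdx();
        init_staticmem_pages();
        populate_boot_allocator();
        /* MPU-specific heap and frametable setup would go here. */
    }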

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- No change
v4:
- No change
---
 xen/arch/arm/include/asm/setup.h |   5 +
 xen/arch/arm/mmu/Makefile        |   1 +
 xen/arch/arm/mmu/setup.c         | 339 +++++++++++++++++++++++++++++++
 xen/arch/arm/setup.c             | 326 +----------------------------
 4 files changed, 349 insertions(+), 322 deletions(-)
 create mode 100644 xen/arch/arm/mmu/setup.c

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index f0f64d228c..0922549631 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -156,6 +156,11 @@ struct bootcmdline *boot_cmdline_find_by_kind(bootmodule_kind kind);
 struct bootcmdline * boot_cmdline_find_by_name(const char *name);
 const char *boot_module_kind_as_string(bootmodule_kind kind);
 
+extern void init_pdx(void);
+extern void init_staticmem_pages(void);
+extern void populate_boot_allocator(void);
+extern void setup_mm(void);
+
 extern uint32_t hyp_traps_vector[];
 void init_traps(void);
 
diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
index b18cec4836..4aa1fb466d 100644
--- a/xen/arch/arm/mmu/Makefile
+++ b/xen/arch/arm/mmu/Makefile
@@ -1 +1,2 @@
 obj-y += mm.o
+obj-y += setup.o
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
new file mode 100644
index 0000000000..e05cca3f86
--- /dev/null
+++ b/xen/arch/arm/mmu/setup.c
@@ -0,0 +1,339 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/arch/arm/mmu/setup.c
+ *
+ * MMU-specific early bringup code for an ARMv7-A with virt extensions.
+ */
+
+#include <xen/init.h>
+#include <xen/serial.h>
+#include <xen/libfdt/libfdt-xen.h>
+#include <xen/mm.h>
+#include <xen/param.h>
+#include <xen/pfn.h>
+#include <asm/fixmap.h>
+#include <asm/page.h>
+#include <asm/setup.h>
+
+#ifdef CONFIG_ARM_32
+static unsigned long opt_xenheap_megabytes __initdata;
+integer_param("xenheap_megabytes", opt_xenheap_megabytes);
+
+/*
+ * Returns the end address of the highest region in the range s..e
+ * with required size and alignment that does not conflict with the
+ * modules from first_mod to nr_modules.
+ *
+ * For non-recursive callers first_mod should normally be 0 (all
+ * modules and Xen itself) or 1 (all modules but not Xen).
+ */
+static paddr_t __init consider_modules(paddr_t s, paddr_t e,
+                                       uint32_t size, paddr_t align,
+                                       int first_mod)
+{
+    const struct bootmodules *mi = &bootinfo.modules;
+    int i;
+    int nr;
+
+    s = (s+align-1) & ~(align-1);
+    e = e & ~(align-1);
+
+    if ( s > e ||  e - s < size )
+        return 0;
+
+    /* First check the boot modules */
+    for ( i = first_mod; i < mi->nr_mods; i++ )
+    {
+        paddr_t mod_s = mi->module[i].start;
+        paddr_t mod_e = mod_s + mi->module[i].size;
+
+        if ( s < mod_e && mod_s < e )
+        {
+            mod_e = consider_modules(mod_e, e, size, align, i+1);
+            if ( mod_e )
+                return mod_e;
+
+            return consider_modules(s, mod_s, size, align, i+1);
+        }
+    }
+
+    /* Now check any fdt reserved areas. */
+
+    nr = fdt_num_mem_rsv(device_tree_flattened);
+
+    for ( ; i < mi->nr_mods + nr; i++ )
+    {
+        paddr_t mod_s, mod_e;
+
+        if ( fdt_get_mem_rsv_paddr(device_tree_flattened,
+                                   i - mi->nr_mods,
+                                   &mod_s, &mod_e ) < 0 )
+            /* If we can't read it, pretend it doesn't exist... */
+            continue;
+
+        /* fdt_get_mem_rsv_paddr returns length */
+        mod_e += mod_s;
+
+        if ( s < mod_e && mod_s < e )
+        {
+            mod_e = consider_modules(mod_e, e, size, align, i+1);
+            if ( mod_e )
+                return mod_e;
+
+            return consider_modules(s, mod_s, size, align, i+1);
+        }
+    }
+
+    /*
+     * i is the current bootmodule we are evaluating, across all
+     * possible kinds of bootmodules.
+     *
+     * When retrieving the corresponding reserved-memory addresses, we
+     * need to index the bootinfo.reserved_mem bank starting from 0, and
+     * only counting the reserved-memory modules. Hence, we need to use
+     * i - nr.
+     */
+    nr += mi->nr_mods;
+    for ( ; i - nr < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        paddr_t r_s = bootinfo.reserved_mem.bank[i - nr].start;
+        paddr_t r_e = r_s + bootinfo.reserved_mem.bank[i - nr].size;
+
+        if ( s < r_e && r_s < e )
+        {
+            r_e = consider_modules(r_e, e, size, align, i + 1);
+            if ( r_e )
+                return r_e;
+
+            return consider_modules(s, r_s, size, align, i + 1);
+        }
+    }
+    return e;
+}
+
+/*
+ * Find a contiguous region that fits in the static heap region with
+ * required size and alignment, and return the end address of the region
+ * if found otherwise 0.
+ */
+static paddr_t __init fit_xenheap_in_static_heap(uint32_t size, paddr_t align)
+{
+    unsigned int i;
+    paddr_t end = 0, aligned_start, aligned_end;
+    paddr_t bank_start, bank_size, bank_end;
+
+    for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
+            continue;
+
+        bank_start = bootinfo.reserved_mem.bank[i].start;
+        bank_size = bootinfo.reserved_mem.bank[i].size;
+        bank_end = bank_start + bank_size;
+
+        if ( bank_size < size )
+            continue;
+
+        aligned_end = bank_end & ~(align - 1);
+        aligned_start = (aligned_end - size) & ~(align - 1);
+
+        if ( aligned_start > bank_start )
+            /*
+             * Allocate the xenheap as high as possible to keep low-memory
+             * available (assuming the admin supplied region below 4GB)
+             * for other use (e.g. domain memory allocation).
+             */
+            end = max(end, aligned_end);
+    }
+
+    return end;
+}
+
+void __init setup_mm(void)
+{
+    paddr_t ram_start, ram_end, ram_size, e, bank_start, bank_end, bank_size;
+    paddr_t static_heap_end = 0, static_heap_size = 0;
+    unsigned long heap_pages, xenheap_pages, domheap_pages;
+    unsigned int i;
+    const uint32_t ctr = READ_CP32(CTR);
+
+    if ( !bootinfo.mem.nr_banks )
+        panic("No memory bank\n");
+
+    /* We only support instruction caches implementing the IVIPT extension. */
+    if ( ((ctr >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK) == ICACHE_POLICY_AIVIVT )
+        panic("AIVIVT instruction cache not supported\n");
+
+    init_pdx();
+
+    ram_start = bootinfo.mem.bank[0].start;
+    ram_size  = bootinfo.mem.bank[0].size;
+    ram_end   = ram_start + ram_size;
+
+    for ( i = 1; i < bootinfo.mem.nr_banks; i++ )
+    {
+        bank_start = bootinfo.mem.bank[i].start;
+        bank_size = bootinfo.mem.bank[i].size;
+        bank_end = bank_start + bank_size;
+
+        ram_size  = ram_size + bank_size;
+        ram_start = min(ram_start,bank_start);
+        ram_end   = max(ram_end,bank_end);
+    }
+
+    total_pages = ram_size >> PAGE_SHIFT;
+
+    if ( bootinfo.static_heap )
+    {
+        for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
+        {
+            if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
+                continue;
+
+            bank_start = bootinfo.reserved_mem.bank[i].start;
+            bank_size = bootinfo.reserved_mem.bank[i].size;
+            bank_end = bank_start + bank_size;
+
+            static_heap_size += bank_size;
+            static_heap_end = max(static_heap_end, bank_end);
+        }
+
+        heap_pages = static_heap_size >> PAGE_SHIFT;
+    }
+    else
+        heap_pages = total_pages;
+
+    /*
+     * If the user has not requested otherwise via the command line
+     * then locate the xenheap using these constraints:
+     *
+     *  - must be contiguous
+     *  - must be 32 MiB aligned
+     *  - must not include Xen itself or the boot modules
+     *  - must be at most 1GB or 1/32 the total RAM in the system (or static
+     *    heap if enabled) if less
+     *  - must be at least 32M
+     *
+     * We try to allocate the largest xenheap possible within these
+     * constraints.
+     */
+    if ( opt_xenheap_megabytes )
+        xenheap_pages = opt_xenheap_megabytes << (20-PAGE_SHIFT);
+    else
+    {
+        xenheap_pages = (heap_pages/32 + 0x1fffUL) & ~0x1fffUL;
+        xenheap_pages = max(xenheap_pages, 32UL<<(20-PAGE_SHIFT));
+        xenheap_pages = min(xenheap_pages, 1UL<<(30-PAGE_SHIFT));
+    }
+
+    do
+    {
+        e = bootinfo.static_heap ?
+            fit_xenheap_in_static_heap(pfn_to_paddr(xenheap_pages), MB(32)) :
+            consider_modules(ram_start, ram_end,
+                             pfn_to_paddr(xenheap_pages),
+                             32<<20, 0);
+        if ( e )
+            break;
+
+        xenheap_pages >>= 1;
+    } while ( !opt_xenheap_megabytes && xenheap_pages > 32<<(20-PAGE_SHIFT) );
+
+    if ( ! e )
+        panic("Not enough space for xenheap\n");
+
+    domheap_pages = heap_pages - xenheap_pages;
+
+    printk("Xen heap: %"PRIpaddr"-%"PRIpaddr" (%lu pages%s)\n",
+           e - (pfn_to_paddr(xenheap_pages)), e, xenheap_pages,
+           opt_xenheap_megabytes ? ", from command-line" : "");
+    printk("Dom heap: %lu pages\n", domheap_pages);
+
+    /*
+     * We need some memory to allocate the page-tables used for the
+     * directmap mappings. So populate the boot allocator first.
+     *
+     * This requires us to set directmap_mfn_{start, end} first so the
+     * direct-mapped Xenheap region can be avoided.
+     */
+    directmap_mfn_start = _mfn((e >> PAGE_SHIFT) - xenheap_pages);
+    directmap_mfn_end = mfn_add(directmap_mfn_start, xenheap_pages);
+
+    populate_boot_allocator();
+
+    setup_directmap_mappings(mfn_x(directmap_mfn_start), xenheap_pages);
+
+    /* Frame table covers all of RAM region, including holes */
+    setup_frametable_mappings(ram_start, ram_end);
+    max_page = PFN_DOWN(ram_end);
+
+    /*
+     * The allocators may need to use map_domain_page() (such as for
+     * scrubbing pages). So we need to prepare the domheap area first.
+     */
+    if ( !init_domheap_mappings(smp_processor_id()) )
+        panic("CPU%u: Unable to prepare the domheap page-tables\n",
+              smp_processor_id());
+
+    /* Add xenheap memory that was not already added to the boot allocator. */
+    init_xenheap_pages(mfn_to_maddr(directmap_mfn_start),
+                       mfn_to_maddr(directmap_mfn_end));
+
+    init_staticmem_pages();
+}
+#else /* CONFIG_ARM_64 */
+void __init setup_mm(void)
+{
+    const struct meminfo *banks = &bootinfo.mem;
+    paddr_t ram_start = INVALID_PADDR;
+    paddr_t ram_end = 0;
+    paddr_t ram_size = 0;
+    unsigned int i;
+
+    init_pdx();
+
+    /*
+     * We need some memory to allocate the page-tables used for the directmap
+     * mappings. But some regions may contain memory already allocated
+     * for other uses (e.g. modules, reserved-memory...).
+     *
+     * For simplicity, add all the free regions in the boot allocator.
+     */
+    populate_boot_allocator();
+
+    total_pages = 0;
+
+    for ( i = 0; i < banks->nr_banks; i++ )
+    {
+        const struct membank *bank = &banks->bank[i];
+        paddr_t bank_end = bank->start + bank->size;
+
+        ram_size = ram_size + bank->size;
+        ram_start = min(ram_start, bank->start);
+        ram_end = max(ram_end, bank_end);
+
+        setup_directmap_mappings(PFN_DOWN(bank->start),
+                                 PFN_DOWN(bank->size));
+    }
+
+    total_pages += ram_size >> PAGE_SHIFT;
+
+    directmap_virt_end = XENHEAP_VIRT_START + ram_end - ram_start;
+    directmap_mfn_start = maddr_to_mfn(ram_start);
+    directmap_mfn_end = maddr_to_mfn(ram_end);
+
+    setup_frametable_mappings(ram_start, ram_end);
+    max_page = PFN_DOWN(ram_end);
+
+    init_staticmem_pages();
+}
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 44ccea03ca..b3dea41099 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -2,7 +2,7 @@
 /*
  * xen/arch/arm/setup.c
  *
- * Early bringup code for an ARMv7-A with virt extensions.
+ * Early bringup code for an ARMv7-A/ARM64v8R with virt extensions.
  *
  * Tim Deegan <tim@xen.org>
  * Copyright (c) 2011 Citrix Systems.
@@ -58,11 +58,6 @@ struct cpuinfo_arm __read_mostly system_cpuinfo;
 bool __read_mostly acpi_disabled;
 #endif
 
-#ifdef CONFIG_ARM_32
-static unsigned long opt_xenheap_megabytes __initdata;
-integer_param("xenheap_megabytes", opt_xenheap_megabytes);
-#endif
-
 domid_t __read_mostly max_init_domid;
 
 static __used void init_done(void)
@@ -547,138 +542,6 @@ static void * __init relocate_fdt(paddr_t dtb_paddr, size_t dtb_size)
     return fdt;
 }
 
-#ifdef CONFIG_ARM_32
-/*
- * Returns the end address of the highest region in the range s..e
- * with required size and alignment that does not conflict with the
- * modules from first_mod to nr_modules.
- *
- * For non-recursive callers first_mod should normally be 0 (all
- * modules and Xen itself) or 1 (all modules but not Xen).
- */
-static paddr_t __init consider_modules(paddr_t s, paddr_t e,
-                                       uint32_t size, paddr_t align,
-                                       int first_mod)
-{
-    const struct bootmodules *mi = &bootinfo.modules;
-    int i;
-    int nr;
-
-    s = (s+align-1) & ~(align-1);
-    e = e & ~(align-1);
-
-    if ( s > e ||  e - s < size )
-        return 0;
-
-    /* First check the boot modules */
-    for ( i = first_mod; i < mi->nr_mods; i++ )
-    {
-        paddr_t mod_s = mi->module[i].start;
-        paddr_t mod_e = mod_s + mi->module[i].size;
-
-        if ( s < mod_e && mod_s < e )
-        {
-            mod_e = consider_modules(mod_e, e, size, align, i+1);
-            if ( mod_e )
-                return mod_e;
-
-            return consider_modules(s, mod_s, size, align, i+1);
-        }
-    }
-
-    /* Now check any fdt reserved areas. */
-
-    nr = fdt_num_mem_rsv(device_tree_flattened);
-
-    for ( ; i < mi->nr_mods + nr; i++ )
-    {
-        paddr_t mod_s, mod_e;
-
-        if ( fdt_get_mem_rsv_paddr(device_tree_flattened,
-                                   i - mi->nr_mods,
-                                   &mod_s, &mod_e ) < 0 )
-            /* If we can't read it, pretend it doesn't exist... */
-            continue;
-
-        /* fdt_get_mem_rsv_paddr returns length */
-        mod_e += mod_s;
-
-        if ( s < mod_e && mod_s < e )
-        {
-            mod_e = consider_modules(mod_e, e, size, align, i+1);
-            if ( mod_e )
-                return mod_e;
-
-            return consider_modules(s, mod_s, size, align, i+1);
-        }
-    }
-
-    /*
-     * i is the current bootmodule we are evaluating, across all
-     * possible kinds of bootmodules.
-     *
-     * When retrieving the corresponding reserved-memory addresses, we
-     * need to index the bootinfo.reserved_mem bank starting from 0, and
-     * only counting the reserved-memory modules. Hence, we need to use
-     * i - nr.
-     */
-    nr += mi->nr_mods;
-    for ( ; i - nr < bootinfo.reserved_mem.nr_banks; i++ )
-    {
-        paddr_t r_s = bootinfo.reserved_mem.bank[i - nr].start;
-        paddr_t r_e = r_s + bootinfo.reserved_mem.bank[i - nr].size;
-
-        if ( s < r_e && r_s < e )
-        {
-            r_e = consider_modules(r_e, e, size, align, i + 1);
-            if ( r_e )
-                return r_e;
-
-            return consider_modules(s, r_s, size, align, i + 1);
-        }
-    }
-    return e;
-}
-
-/*
- * Find a contiguous region that fits in the static heap region with
- * required size and alignment, and return the end address of the region
- * if found otherwise 0.
- */
-static paddr_t __init fit_xenheap_in_static_heap(uint32_t size, paddr_t align)
-{
-    unsigned int i;
-    paddr_t end = 0, aligned_start, aligned_end;
-    paddr_t bank_start, bank_size, bank_end;
-
-    for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
-    {
-        if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
-            continue;
-
-        bank_start = bootinfo.reserved_mem.bank[i].start;
-        bank_size = bootinfo.reserved_mem.bank[i].size;
-        bank_end = bank_start + bank_size;
-
-        if ( bank_size < size )
-            continue;
-
-        aligned_end = bank_end & ~(align - 1);
-        aligned_start = (aligned_end - size) & ~(align - 1);
-
-        if ( aligned_start > bank_start )
-            /*
-             * Allocate the xenheap as high as possible to keep low-memory
-             * available (assuming the admin supplied region below 4GB)
-             * for other use (e.g. domain memory allocation).
-             */
-            end = max(end, aligned_end);
-    }
-
-    return end;
-}
-#endif
-
 /*
  * Return the end of the non-module region starting at s. In other
  * words return s the start of the next modules after s.
@@ -713,7 +576,7 @@ static paddr_t __init next_module(paddr_t s, paddr_t *end)
     return lowest;
 }
 
-static void __init init_pdx(void)
+void __init init_pdx(void)
 {
     paddr_t bank_start, bank_size, bank_end;
 
@@ -758,7 +621,7 @@ static void __init init_pdx(void)
 }
 
 /* Static memory initialization */
-static void __init init_staticmem_pages(void)
+void __init init_staticmem_pages(void)
 {
 #ifdef CONFIG_STATIC_MEMORY
     unsigned int bank;
@@ -792,7 +655,7 @@ static void __init init_staticmem_pages(void)
  * allocator with the corresponding regions only, but with Xenheap excluded
  * on arm32.
  */
-static void __init populate_boot_allocator(void)
+void __init populate_boot_allocator(void)
 {
     unsigned int i;
     const struct meminfo *banks = &bootinfo.mem;
@@ -861,187 +724,6 @@ static void __init populate_boot_allocator(void)
     }
 }
 
-#ifdef CONFIG_ARM_32
-static void __init setup_mm(void)
-{
-    paddr_t ram_start, ram_end, ram_size, e, bank_start, bank_end, bank_size;
-    paddr_t static_heap_end = 0, static_heap_size = 0;
-    unsigned long heap_pages, xenheap_pages, domheap_pages;
-    unsigned int i;
-    const uint32_t ctr = READ_CP32(CTR);
-
-    if ( !bootinfo.mem.nr_banks )
-        panic("No memory bank\n");
-
-    /* We only supports instruction caches implementing the IVIPT extension. */
-    if ( ((ctr >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK) == ICACHE_POLICY_AIVIVT )
-        panic("AIVIVT instruction cache not supported\n");
-
-    init_pdx();
-
-    ram_start = bootinfo.mem.bank[0].start;
-    ram_size  = bootinfo.mem.bank[0].size;
-    ram_end   = ram_start + ram_size;
-
-    for ( i = 1; i < bootinfo.mem.nr_banks; i++ )
-    {
-        bank_start = bootinfo.mem.bank[i].start;
-        bank_size = bootinfo.mem.bank[i].size;
-        bank_end = bank_start + bank_size;
-
-        ram_size  = ram_size + bank_size;
-        ram_start = min(ram_start,bank_start);
-        ram_end   = max(ram_end,bank_end);
-    }
-
-    total_pages = ram_size >> PAGE_SHIFT;
-
-    if ( bootinfo.static_heap )
-    {
-        for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
-        {
-            if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
-                continue;
-
-            bank_start = bootinfo.reserved_mem.bank[i].start;
-            bank_size = bootinfo.reserved_mem.bank[i].size;
-            bank_end = bank_start + bank_size;
-
-            static_heap_size += bank_size;
-            static_heap_end = max(static_heap_end, bank_end);
-        }
-
-        heap_pages = static_heap_size >> PAGE_SHIFT;
-    }
-    else
-        heap_pages = total_pages;
-
-    /*
-     * If the user has not requested otherwise via the command line
-     * then locate the xenheap using these constraints:
-     *
-     *  - must be contiguous
-     *  - must be 32 MiB aligned
-     *  - must not include Xen itself or the boot modules
-     *  - must be at most 1GB or 1/32 the total RAM in the system (or static
-          heap if enabled) if less
-     *  - must be at least 32M
-     *
-     * We try to allocate the largest xenheap possible within these
-     * constraints.
-     */
-    if ( opt_xenheap_megabytes )
-        xenheap_pages = opt_xenheap_megabytes << (20-PAGE_SHIFT);
-    else
-    {
-        xenheap_pages = (heap_pages/32 + 0x1fffUL) & ~0x1fffUL;
-        xenheap_pages = max(xenheap_pages, 32UL<<(20-PAGE_SHIFT));
-        xenheap_pages = min(xenheap_pages, 1UL<<(30-PAGE_SHIFT));
-    }
-
-    do
-    {
-        e = bootinfo.static_heap ?
-            fit_xenheap_in_static_heap(pfn_to_paddr(xenheap_pages), MB(32)) :
-            consider_modules(ram_start, ram_end,
-                             pfn_to_paddr(xenheap_pages),
-                             32<<20, 0);
-        if ( e )
-            break;
-
-        xenheap_pages >>= 1;
-    } while ( !opt_xenheap_megabytes && xenheap_pages > 32<<(20-PAGE_SHIFT) );
-
-    if ( ! e )
-        panic("Not enough space for xenheap\n");
-
-    domheap_pages = heap_pages - xenheap_pages;
-
-    printk("Xen heap: %"PRIpaddr"-%"PRIpaddr" (%lu pages%s)\n",
-           e - (pfn_to_paddr(xenheap_pages)), e, xenheap_pages,
-           opt_xenheap_megabytes ? ", from command-line" : "");
-    printk("Dom heap: %lu pages\n", domheap_pages);
-
-    /*
-     * We need some memory to allocate the page-tables used for the
-     * directmap mappings. So populate the boot allocator first.
-     *
-     * This requires us to set directmap_mfn_{start, end} first so the
-     * direct-mapped Xenheap region can be avoided.
-     */
-    directmap_mfn_start = _mfn((e >> PAGE_SHIFT) - xenheap_pages);
-    directmap_mfn_end = mfn_add(directmap_mfn_start, xenheap_pages);
-
-    populate_boot_allocator();
-
-    setup_directmap_mappings(mfn_x(directmap_mfn_start), xenheap_pages);
-
-    /* Frame table covers all of RAM region, including holes */
-    setup_frametable_mappings(ram_start, ram_end);
-    max_page = PFN_DOWN(ram_end);
-
-    /*
-     * The allocators may need to use map_domain_page() (such as for
-     * scrubbing pages). So we need to prepare the domheap area first.
-     */
-    if ( !init_domheap_mappings(smp_processor_id()) )
-        panic("CPU%u: Unable to prepare the domheap page-tables\n",
-              smp_processor_id());
-
-    /* Add xenheap memory that was not already added to the boot allocator. */
-    init_xenheap_pages(mfn_to_maddr(directmap_mfn_start),
-                       mfn_to_maddr(directmap_mfn_end));
-
-    init_staticmem_pages();
-}
-#else /* CONFIG_ARM_64 */
-static void __init setup_mm(void)
-{
-    const struct meminfo *banks = &bootinfo.mem;
-    paddr_t ram_start = INVALID_PADDR;
-    paddr_t ram_end = 0;
-    paddr_t ram_size = 0;
-    unsigned int i;
-
-    init_pdx();
-
-    /*
-     * We need some memory to allocate the page-tables used for the directmap
-     * mappings. But some regions may contain memory already allocated
-     * for other uses (e.g. modules, reserved-memory...).
-     *
-     * For simplicity, add all the free regions in the boot allocator.
-     */
-    populate_boot_allocator();
-
-    total_pages = 0;
-
-    for ( i = 0; i < banks->nr_banks; i++ )
-    {
-        const struct membank *bank = &banks->bank[i];
-        paddr_t bank_end = bank->start + bank->size;
-
-        ram_size = ram_size + bank->size;
-        ram_start = min(ram_start, bank->start);
-        ram_end = max(ram_end, bank_end);
-
-        setup_directmap_mappings(PFN_DOWN(bank->start),
-                                 PFN_DOWN(bank->size));
-    }
-
-    total_pages += ram_size >> PAGE_SHIFT;
-
-    directmap_virt_end = XENHEAP_VIRT_START + ram_end - ram_start;
-    directmap_mfn_start = maddr_to_mfn(ram_start);
-    directmap_mfn_end = maddr_to_mfn(ram_end);
-
-    setup_frametable_mappings(ram_start, ram_end);
-    max_page = PFN_DOWN(ram_end);
-
-    init_staticmem_pages();
-}
-#endif
-
 static bool __init is_dom0less_mode(void)
 {
     struct bootmodules *mods = &bootinfo.modules;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (9 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-22 18:01   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c Henry Wang
  2023-08-14  4:25 ` [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU Henry Wang
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen, Henry Wang

From: Penny Zheng <penny.zheng@arm.com>

The current P2M implementation is designed for MMU systems only.
We move the MMU-specific code into mmu/p2m.c, and only keep generic
code, such as the VMID allocator, in p2m.c. We also move MMU-specific
definitions and declarations, such as p2m_tlb_flush_sync(), to
mmu/p2m.h. Also expose the previously static functions
p2m_vmid_allocator_init(), p2m_alloc_vmid(), __p2m_set_entry() and
setup_virt_paging_one() for further MPU usage.

With the code movement, the global variable max_vmid is used in
multiple files instead of a single one (and will also be used in the
MPU P2M implementation), so declare it in the header and drop the
"static" from its definition.

Add an #ifdef CONFIG_MMU guard around the p2m_tlb_flush_sync() call in
p2m_write_unlock(), since the future MPU work does not need it.
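
The p2m_write_unlock() change itself is part of the full patch; as a
reading aid, the end result is expected to look roughly like the
sketch below (an illustration only, using the CONFIG_MMU option
introduced earlier in this series):

    static inline void p2m_write_unlock(struct p2m_domain *p2m)
    {
    #ifdef CONFIG_MMU
        /*
         * The final flush is done with the P2M write lock taken to avoid
         * someone else modifying the P2M while the TLB invalidation is
         * in progress. The MPU-based P2M has no such TLBs to invalidate,
         * so the call is only needed when the MMU is in use.
         */
        p2m_tlb_flush_sync(p2m);
    #endif

        write_unlock(&p2m->lock);
    }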

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- No change
v4:
- Rework the patch to drop the unnecessary changes.
- Rework the commit msg a bit.
v3:
- remove MPU stubs
- adapt to the introduction of new directories: mmu/
v2:
- new commit
---
 xen/arch/arm/include/asm/mmu/p2m.h |   18 +
 xen/arch/arm/include/asm/p2m.h     |   33 +-
 xen/arch/arm/mmu/Makefile          |    1 +
 xen/arch/arm/mmu/p2m.c             | 1610 +++++++++++++++++++++++++
 xen/arch/arm/p2m.c                 | 1772 ++--------------------------
 5 files changed, 1745 insertions(+), 1689 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/mmu/p2m.h
 create mode 100644 xen/arch/arm/mmu/p2m.c

diff --git a/xen/arch/arm/include/asm/mmu/p2m.h b/xen/arch/arm/include/asm/mmu/p2m.h
new file mode 100644
index 0000000000..f829e325ce
--- /dev/null
+++ b/xen/arch/arm/include/asm/mmu/p2m.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARM_MMU_P2M_H__
+#define __ARM_MMU_P2M_H__
+
+struct p2m_domain;
+void p2m_force_tlb_flush_sync(struct p2m_domain *p2m);
+void p2m_tlb_flush_sync(struct p2m_domain *p2m);
+
+#endif /* __ARM_MMU_P2M_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index 940495d42b..d3419bdfe1 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -19,6 +19,22 @@ extern unsigned int p2m_root_level;
 #define P2M_ROOT_ORDER    p2m_root_order
 #define P2M_ROOT_LEVEL p2m_root_level
 
+#define MAX_VMID_8_BIT  (1UL << 8)
+#define MAX_VMID_16_BIT (1UL << 16)
+
+#define INVALID_VMID 0 /* VMID 0 is reserved */
+
+#ifdef CONFIG_ARM_64
+extern unsigned int max_vmid;
+/* VMID is by default 8 bit width on AArch64 */
+#define MAX_VMID       max_vmid
+#else
+/* VMID is always 8 bit width on AArch32 */
+#define MAX_VMID        MAX_VMID_8_BIT
+#endif
+
+#define P2M_ROOT_PAGES    (1<<P2M_ROOT_ORDER)
+
 struct domain;
 
 extern void memory_type_changed(struct domain *);
@@ -156,6 +172,10 @@ typedef enum {
 #endif
 #include <xen/p2m-common.h>
 
+#ifdef CONFIG_MMU
+#include <asm/mmu/p2m.h>
+#endif
+
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
     /*
@@ -180,7 +200,11 @@ void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
  */
 void p2m_restrict_ipa_bits(unsigned int ipa_bits);
 
+void p2m_vmid_allocator_init(void);
+int p2m_alloc_vmid(struct domain *d);
+
 /* Second stage paging setup, to be called on all CPUs */
+void setup_virt_paging_one(void *data);
 void setup_virt_paging(void);
 
 /* Init the datastructures for later use by the p2m code */
@@ -242,8 +266,6 @@ static inline int p2m_is_write_locked(struct p2m_domain *p2m)
     return rw_is_write_locked(&p2m->lock);
 }
 
-void p2m_tlb_flush_sync(struct p2m_domain *p2m);
-
 /* Look up the MFN corresponding to a domain's GFN. */
 mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t);
 
@@ -269,6 +291,13 @@ int p2m_set_entry(struct p2m_domain *p2m,
                   p2m_type_t t,
                   p2m_access_t a);
 
+int __p2m_set_entry(struct p2m_domain *p2m,
+                    gfn_t sgfn,
+                    unsigned int page_order,
+                    mfn_t smfn,
+                    p2m_type_t t,
+                    p2m_access_t a);
+
 bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn);
 
 void p2m_clear_root_pages(struct p2m_domain *p2m);
diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
index 4aa1fb466d..a4f07ab90a 100644
--- a/xen/arch/arm/mmu/Makefile
+++ b/xen/arch/arm/mmu/Makefile
@@ -1,2 +1,3 @@
 obj-y += mm.o
+obj-y += p2m.o
 obj-y += setup.o
diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
new file mode 100644
index 0000000000..a916e2318c
--- /dev/null
+++ b/xen/arch/arm/mmu/p2m.c
@@ -0,0 +1,1610 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/domain_page.h>
+#include <xen/ioreq.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+
+#include <asm/alternative.h>
+#include <asm/event.h>
+#include <asm/flushtlb.h>
+#include <asm/page.h>
+
+unsigned int __read_mostly p2m_root_order;
+unsigned int __read_mostly p2m_root_level;
+
+static mfn_t __read_mostly empty_root_mfn;
+
+static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
+{
+    return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
+}
+
+static struct page_info *p2m_alloc_page(struct domain *d)
+{
+    struct page_info *pg;
+
+    /*
+     * For hardware domain, there should be no limit in the number of pages that
+     * can be allocated, so that the kernel may take advantage of the extended
+     * regions. Hence, allocate p2m pages for hardware domains from heap.
+     */
+    if ( is_hardware_domain(d) )
+    {
+        pg = alloc_domheap_page(NULL, 0);
+        if ( pg == NULL )
+            printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
+    }
+    else
+    {
+        spin_lock(&d->arch.paging.lock);
+        pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
+        spin_unlock(&d->arch.paging.lock);
+    }
+
+    return pg;
+}
+
+static void p2m_free_page(struct domain *d, struct page_info *pg)
+{
+    if ( is_hardware_domain(d) )
+        free_domheap_page(pg);
+    else
+    {
+        spin_lock(&d->arch.paging.lock);
+        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
+        spin_unlock(&d->arch.paging.lock);
+    }
+}
+
+/* Return the size of the pool, in bytes. */
+int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
+{
+    *size = (uint64_t)ACCESS_ONCE(d->arch.paging.p2m_total_pages) << PAGE_SHIFT;
+    return 0;
+}
+
+/*
+ * Set the pool of pages to the required number of pages.
+ * Returns 0 for success, non-zero for failure.
+ * Call with d->arch.paging.lock held.
+ */
+int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
+{
+    struct page_info *pg;
+
+    ASSERT(spin_is_locked(&d->arch.paging.lock));
+
+    for ( ; ; )
+    {
+        if ( d->arch.paging.p2m_total_pages < pages )
+        {
+            /* Need to allocate more memory from domheap */
+            pg = alloc_domheap_page(NULL, 0);
+            if ( pg == NULL )
+            {
+                printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
+                return -ENOMEM;
+            }
+            ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
+                d->arch.paging.p2m_total_pages + 1;
+            page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
+        }
+        else if ( d->arch.paging.p2m_total_pages > pages )
+        {
+            /* Need to return memory to domheap */
+            pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
+            if( pg )
+            {
+                ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
+                    d->arch.paging.p2m_total_pages - 1;
+                free_domheap_page(pg);
+            }
+            else
+            {
+                printk(XENLOG_ERR
+                       "Failed to free P2M pages, P2M freelist is empty.\n");
+                return -ENOMEM;
+            }
+        }
+        else
+            break;
+
+        /* Check to see if we need to yield and try again */
+        if ( preempted && general_preempt_check() )
+        {
+            *preempted = true;
+            return -ERESTART;
+        }
+    }
+
+    return 0;
+}
+
+int arch_set_paging_mempool_size(struct domain *d, uint64_t size)
+{
+    unsigned long pages = size >> PAGE_SHIFT;
+    bool preempted = false;
+    int rc;
+
+    if ( (size & ~PAGE_MASK) ||          /* Non page-sized request? */
+         pages != (size >> PAGE_SHIFT) ) /* 32-bit overflow? */
+        return -EINVAL;
+
+    spin_lock(&d->arch.paging.lock);
+    rc = p2m_set_allocation(d, pages, &preempted);
+    spin_unlock(&d->arch.paging.lock);
+
+    ASSERT(preempted == (rc == -ERESTART));
+
+    return rc;
+}
+
+int p2m_teardown_allocation(struct domain *d)
+{
+    int ret = 0;
+    bool preempted = false;
+
+    spin_lock(&d->arch.paging.lock);
+    if ( d->arch.paging.p2m_total_pages != 0 )
+    {
+        ret = p2m_set_allocation(d, 0, &preempted);
+        if ( preempted )
+        {
+            spin_unlock(&d->arch.paging.lock);
+            return -ERESTART;
+        }
+        ASSERT(d->arch.paging.p2m_total_pages == 0);
+    }
+    spin_unlock(&d->arch.paging.lock);
+
+    return ret;
+}
+
+void p2m_dump_info(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    p2m_read_lock(p2m);
+    printk("p2m mappings for domain %d (vmid %d):\n",
+           d->domain_id, p2m->vmid);
+    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
+    printk("  1G mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[1], p2m->stats.shattered[1]);
+    printk("  2M mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[2], p2m->stats.shattered[2]);
+    printk("  4K mappings: %ld\n", p2m->stats.mappings[3]);
+    p2m_read_unlock(p2m);
+}
+
+/*
+ * p2m_save_state and p2m_restore_state work in pair to workaround
+ * ARM64_WORKAROUND_AT_SPECULATE. p2m_save_state will set-up VTTBR to
+ * point to the empty page-tables to stop allocating TLB entries.
+ */
+void p2m_save_state(struct vcpu *p)
+{
+    p->arch.sctlr = READ_SYSREG(SCTLR_EL1);
+
+    if ( cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+    {
+        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
+        /*
+         * Ensure VTTBR_EL2 is correctly synchronized so we can restore
+         * the next vCPU context without worrying about AT instruction
+         * speculation.
+         */
+        isb();
+    }
+}
+
+void p2m_restore_state(struct vcpu *n)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(n->domain);
+    uint8_t *last_vcpu_ran;
+
+    if ( is_idle_vcpu(n) )
+        return;
+
+    WRITE_SYSREG(n->arch.sctlr, SCTLR_EL1);
+    WRITE_SYSREG(n->arch.hcr_el2, HCR_EL2);
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE: VTTBR_EL2 should be restored after all
+     * registers associated to EL1/EL0 translations regime have been
+     * synchronized.
+     */
+    asm volatile(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_AT_SPECULATE));
+    WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
+
+    last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
+
+    /*
+     * While we are restoring an out-of-context translation regime
+     * we still need to ensure:
+     *  - VTTBR_EL2 is synchronized before flushing the TLBs
+     *  - All registers for EL1 are synchronized before executing an AT
+     *  instructions targeting S1/S2.
+     */
+    isb();
+
+    /*
+     * Flush local TLB for the domain to prevent wrong TLB translation
+     * when running multiple vCPU of the same domain on a single pCPU.
+     */
+    if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
+        flush_guest_tlb_local();
+
+    *last_vcpu_ran = n->vcpu_id;
+}
+
+/*
+ * Force a synchronous P2M TLB flush.
+ *
+ * Must be called with the p2m lock held.
+ */
+void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    unsigned long flags = 0;
+    uint64_t ovttbr;
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    /*
+     * ARM only provides an instruction to flush TLBs for the current
+     * VMID. So switch to the VTTBR of a given P2M if different.
+     */
+    ovttbr = READ_SYSREG64(VTTBR_EL2);
+    if ( ovttbr != p2m->vttbr )
+    {
+        uint64_t vttbr;
+
+        local_irq_save(flags);
+
+        /*
+         * ARM64_WORKAROUND_AT_SPECULATE: We need to stop AT to allocate
+         * TLBs entries because the context is partially modified. We
+         * only need the VMID for flushing the TLBs, so we can generate
+         * a new VTTBR with the VMID to flush and the empty root table.
+         */
+        if ( !cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+            vttbr = p2m->vttbr;
+        else
+            vttbr = generate_vttbr(p2m->vmid, empty_root_mfn);
+
+        WRITE_SYSREG64(vttbr, VTTBR_EL2);
+
+        /* Ensure VTTBR_EL2 is synchronized before flushing the TLBs */
+        isb();
+    }
+
+    flush_guest_tlb();
+
+    if ( ovttbr != READ_SYSREG64(VTTBR_EL2) )
+    {
+        WRITE_SYSREG64(ovttbr, VTTBR_EL2);
+        /* Ensure VTTBR_EL2 is back in place before continuing. */
+        isb();
+        local_irq_restore(flags);
+    }
+
+    p2m->need_flush = false;
+}
+
+void p2m_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    if ( p2m->need_flush )
+        p2m_force_tlb_flush_sync(p2m);
+}
+
+/*
+ * Find and map the root page table. The caller is responsible for
+ * unmapping the table.
+ *
+ * The function will return NULL if the offset of the root table is
+ * invalid.
+ */
+static lpae_t *p2m_get_root_pointer(struct p2m_domain *p2m,
+                                    gfn_t gfn)
+{
+    unsigned long root_table;
+
+    /*
+     * While the root table index is the offset from the previous level,
+     * we can't use (P2M_ROOT_LEVEL - 1) because the root level might be
+     * 0. Yet we still want to check if all the unused bits are zeroed.
+     */
+    root_table = gfn_x(gfn) >> (XEN_PT_LEVEL_ORDER(P2M_ROOT_LEVEL) +
+                                XEN_PT_LPAE_SHIFT);
+    if ( root_table >= P2M_ROOT_PAGES )
+        return NULL;
+
+    return __map_domain_page(p2m->root + root_table);
+}
+
+/*
+ * Lookup the MFN corresponding to a domain's GFN.
+ * Lookup mem access in the radix tree.
+ * The entry associated with the GFN is considered valid.
+ */
+static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m, gfn_t gfn)
+{
+    void *ptr;
+
+    if ( !p2m->mem_access_enabled )
+        return p2m->default_access;
+
+    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
+    if ( !ptr )
+        return p2m_access_rwx;
+    else
+        return radix_tree_ptr_to_int(ptr);
+}
+
+/*
+ * In the case of the P2M, the valid bit is used for other purpose. Use
+ * the type to check whether an entry is valid.
+ */
+static inline bool p2m_is_valid(lpae_t pte)
+{
+    return pte.p2m.type != p2m_invalid;
+}
+
+/*
+ * lpae_is_* helpers don't check whether the valid bit is set in the
+ * PTE. Provide our own overlay to check the valid bit.
+ */
+static inline bool p2m_is_mapping(lpae_t pte, unsigned int level)
+{
+    return p2m_is_valid(pte) && lpae_is_mapping(pte, level);
+}
+
+static inline bool p2m_is_superpage(lpae_t pte, unsigned int level)
+{
+    return p2m_is_valid(pte) && lpae_is_superpage(pte, level);
+}
+
+#define GUEST_TABLE_MAP_FAILED 0
+#define GUEST_TABLE_SUPER_PAGE 1
+#define GUEST_TABLE_NORMAL_PAGE 2
+
+static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry);
+
+/*
+ * Take the currently mapped table, find the corresponding GFN entry,
+ * and map the next table, if available. The previous table will be
+ * unmapped if the next level was mapped (e.g GUEST_TABLE_NORMAL_PAGE
+ * returned).
+ *
+ * The read_only parameters indicates whether intermediate tables should
+ * be allocated when not present.
+ *
+ * Return values:
+ *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
+ *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
+ */
+static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
+                          unsigned int level, lpae_t **table,
+                          unsigned int offset)
+{
+    lpae_t *entry;
+    int ret;
+    mfn_t mfn;
+
+    entry = *table + offset;
+
+    if ( !p2m_is_valid(*entry) )
+    {
+        if ( read_only )
+            return GUEST_TABLE_MAP_FAILED;
+
+        ret = p2m_create_table(p2m, entry);
+        if ( ret )
+            return GUEST_TABLE_MAP_FAILED;
+    }
+
+    /* The function p2m_next_level is never called at the 3rd level */
+    ASSERT(level < 3);
+    if ( p2m_is_mapping(*entry, level) )
+        return GUEST_TABLE_SUPER_PAGE;
+
+    mfn = lpae_get_mfn(*entry);
+
+    unmap_domain_page(*table);
+    *table = map_domain_page(mfn);
+
+    return GUEST_TABLE_NORMAL_PAGE;
+}
+
+/*
+ * Get the details of a given gfn.
+ *
+ * If the entry is present, the associated MFN will be returned and the
+ * access and type filled up. The page_order will correspond to the
+ * order of the mapping in the page table (i.e it could be a superpage).
+ *
+ * If the entry is not present, INVALID_MFN will be returned and the
+ * page_order will be set according to the order of the invalid range.
+ *
+ * valid will contain the value of bit[0] (e.g valid bit) of the
+ * entry.
+ */
+mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+                    p2m_type_t *t, p2m_access_t *a,
+                    unsigned int *page_order,
+                    bool *valid)
+{
+    paddr_t addr = gfn_to_gaddr(gfn);
+    unsigned int level = 0;
+    lpae_t entry, *table;
+    int rc;
+    mfn_t mfn = INVALID_MFN;
+    p2m_type_t _t;
+    DECLARE_OFFSETS(offsets, addr);
+
+    ASSERT(p2m_is_locked(p2m));
+    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
+
+    /* Allow t to be NULL */
+    t = t ?: &_t;
+
+    *t = p2m_invalid;
+
+    if ( valid )
+        *valid = false;
+
+    /* XXX: Check if the mapping is lower than the mapped gfn */
+
+    /* This gfn is higher than the highest the p2m map currently holds */
+    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
+    {
+        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
+            if ( (gfn_x(gfn) & (XEN_PT_LEVEL_MASK(level) >> PAGE_SHIFT)) >
+                 gfn_x(p2m->max_mapped_gfn) )
+                break;
+
+        goto out;
+    }
+
+    table = p2m_get_root_pointer(p2m, gfn);
+
+    /*
+     * the table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    if ( !table )
+    {
+        ASSERT_UNREACHABLE();
+        level = P2M_ROOT_LEVEL;
+        goto out;
+    }
+
+    for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
+    {
+        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+            goto out_unmap;
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    entry = table[offsets[level]];
+
+    if ( p2m_is_valid(entry) )
+    {
+        *t = entry.p2m.type;
+
+        if ( a )
+            *a = p2m_mem_access_radix_get(p2m, gfn);
+
+        mfn = lpae_get_mfn(entry);
+        /*
+         * The entry may point to a superpage. Find the MFN associated
+         * to the GFN.
+         */
+        mfn = mfn_add(mfn,
+                      gfn_x(gfn) & ((1UL << XEN_PT_LEVEL_ORDER(level)) - 1));
+
+        if ( valid )
+            *valid = lpae_is_valid(entry);
+    }
+
+out_unmap:
+    unmap_domain_page(table);
+
+out:
+    if ( page_order )
+        *page_order = XEN_PT_LEVEL_ORDER(level);
+
+    return mfn;
+}
+
+static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a)
+{
+    /* First apply type permissions */
+    switch ( t )
+    {
+    case p2m_ram_rw:
+        e->p2m.xn = 0;
+        e->p2m.write = 1;
+        break;
+
+    case p2m_ram_ro:
+        e->p2m.xn = 0;
+        e->p2m.write = 0;
+        break;
+
+    case p2m_iommu_map_rw:
+    case p2m_map_foreign_rw:
+    case p2m_grant_map_rw:
+    case p2m_mmio_direct_dev:
+    case p2m_mmio_direct_nc:
+    case p2m_mmio_direct_c:
+        e->p2m.xn = 1;
+        e->p2m.write = 1;
+        break;
+
+    case p2m_iommu_map_ro:
+    case p2m_map_foreign_ro:
+    case p2m_grant_map_ro:
+    case p2m_invalid:
+        e->p2m.xn = 1;
+        e->p2m.write = 0;
+        break;
+
+    case p2m_max_real_type:
+        BUG();
+        break;
+    }
+
+    /* Then restrict with access permissions */
+    switch ( a )
+    {
+    case p2m_access_rwx:
+        break;
+    case p2m_access_wx:
+        e->p2m.read = 0;
+        break;
+    case p2m_access_rw:
+        e->p2m.xn = 1;
+        break;
+    case p2m_access_w:
+        e->p2m.read = 0;
+        e->p2m.xn = 1;
+        break;
+    case p2m_access_rx:
+    case p2m_access_rx2rw:
+        e->p2m.write = 0;
+        break;
+    case p2m_access_x:
+        e->p2m.write = 0;
+        e->p2m.read = 0;
+        break;
+    case p2m_access_r:
+        e->p2m.write = 0;
+        e->p2m.xn = 1;
+        break;
+    case p2m_access_n:
+    case p2m_access_n2rwx:
+        e->p2m.read = e->p2m.write = 0;
+        e->p2m.xn = 1;
+        break;
+    }
+}
+
+static lpae_t mfn_to_p2m_entry(mfn_t mfn, p2m_type_t t, p2m_access_t a)
+{
+    /*
+     * sh, xn and write bit will be defined in the following switches
+     * based on mattr and t.
+     */
+    lpae_t e = (lpae_t) {
+        .p2m.af = 1,
+        .p2m.read = 1,
+        .p2m.table = 1,
+        .p2m.valid = 1,
+        .p2m.type = t,
+    };
+
+    BUILD_BUG_ON(p2m_max_real_type > (1 << 4));
+
+    switch ( t )
+    {
+    case p2m_mmio_direct_dev:
+        e.p2m.mattr = MATTR_DEV;
+        e.p2m.sh = LPAE_SH_OUTER;
+        break;
+
+    case p2m_mmio_direct_c:
+        e.p2m.mattr = MATTR_MEM;
+        e.p2m.sh = LPAE_SH_OUTER;
+        break;
+
+    /*
+     * ARM ARM: Overlaying the shareability attribute (DDI
+     * 0406C.b B3-1376 to 1377)
+     *
+     * A memory region with a resultant memory type attribute of Normal,
+     * and a resultant cacheability attribute of Inner Non-cacheable,
+     * Outer Non-cacheable, must have a resultant shareability attribute
+     * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
+     *
+     * On ARMv8 shareability is ignored and explicitly treated as Outer
+     * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
+     * See the note for table D4-40, in page 1788 of the ARM DDI 0487A.j.
+     */
+    case p2m_mmio_direct_nc:
+        e.p2m.mattr = MATTR_MEM_NC;
+        e.p2m.sh = LPAE_SH_OUTER;
+        break;
+
+    default:
+        e.p2m.mattr = MATTR_MEM;
+        e.p2m.sh = LPAE_SH_INNER;
+    }
+
+    p2m_set_permission(&e, t, a);
+
+    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
+
+    lpae_set_mfn(e, mfn);
+
+    return e;
+}
+
+/* Generate table entry with correct attributes. */
+static lpae_t page_to_p2m_table(struct page_info *page)
+{
+    /*
+     * The access value does not matter because the hardware will ignore
+     * the permission fields for table entry.
+     *
+     * We use p2m_ram_rw so the entry has a valid type. This is important
+     * for p2m_is_valid() to return valid on table entries.
+     */
+    return mfn_to_p2m_entry(page_to_mfn(page), p2m_ram_rw, p2m_access_rwx);
+}
+
+static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool clean_pte)
+{
+    write_pte(p, pte);
+    if ( clean_pte )
+        clean_dcache(*p);
+}
+
+static inline void p2m_remove_pte(lpae_t *p, bool clean_pte)
+{
+    lpae_t pte;
+
+    memset(&pte, 0x00, sizeof(pte));
+    p2m_write_pte(p, pte, clean_pte);
+}
+
+/* Allocate a new page table page and hook it in via the given entry. */
+static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry)
+{
+    struct page_info *page;
+    lpae_t *p;
+
+    ASSERT(!p2m_is_valid(*entry));
+
+    page = p2m_alloc_page(p2m->domain);
+    if ( page == NULL )
+        return -ENOMEM;
+
+    page_list_add(page, &p2m->pages);
+
+    p = __map_domain_page(page);
+    clear_page(p);
+
+    if ( p2m->clean_pte )
+        clean_dcache_va_range(p, PAGE_SIZE);
+
+    unmap_domain_page(p);
+
+    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
+
+    return 0;
+}
+
+static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
+                                    p2m_access_t a)
+{
+    int rc;
+
+    if ( !p2m->mem_access_enabled )
+        return 0;
+
+    if ( p2m_access_rwx == a )
+    {
+        radix_tree_delete(&p2m->mem_access_settings, gfn_x(gfn));
+        return 0;
+    }
+
+    rc = radix_tree_insert(&p2m->mem_access_settings, gfn_x(gfn),
+                           radix_tree_int_to_ptr(a));
+    if ( rc == -EEXIST )
+    {
+        /* If a setting already exists, change it to the new one */
+        radix_tree_replace_slot(
+            radix_tree_lookup_slot(
+                &p2m->mem_access_settings, gfn_x(gfn)),
+            radix_tree_int_to_ptr(a));
+        rc = 0;
+    }
+
+    return rc;
+}
+
+/*
+ * Put any references on the single 4K page referenced by pte.
+ * TODO: Handle superpages, for now we only take special references for leaf
+ * pages (specifically foreign ones, which can't be super mapped today).
+ */
+static void p2m_put_l3_page(const lpae_t pte)
+{
+    mfn_t mfn = lpae_get_mfn(pte);
+
+    ASSERT(p2m_is_valid(pte));
+
+    /*
+     * TODO: Handle other p2m types
+     *
+     * It's safe to do the put_page here because page_alloc will
+     * flush the TLBs if the page is reallocated before the end of
+     * this loop.
+     */
+    if ( p2m_is_foreign(pte.p2m.type) )
+    {
+        ASSERT(mfn_valid(mfn));
+        put_page(mfn_to_page(mfn));
+    }
+    /* Detect the xenheap page and mark the stored GFN as invalid. */
+    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
+}
+
+/* Free lpae sub-tree behind an entry */
+static void p2m_free_entry(struct p2m_domain *p2m,
+                           lpae_t entry, unsigned int level)
+{
+    unsigned int i;
+    lpae_t *table;
+    mfn_t mfn;
+    struct page_info *pg;
+
+    /* Nothing to do if the entry is invalid. */
+    if ( !p2m_is_valid(entry) )
+        return;
+
+    if ( p2m_is_superpage(entry, level) || (level == 3) )
+    {
+#ifdef CONFIG_IOREQ_SERVER
+        /*
+         * If this gets called then either the entry was replaced by an entry
+         * with a different base (valid case) or the shattering of a superpage
+         * has failed (error case).
+         * So, at worst, the spurious mapcache invalidation might be sent.
+         */
+        if ( p2m_is_ram(entry.p2m.type) &&
+             domain_has_ioreq_server(p2m->domain) )
+            ioreq_request_mapcache_invalidate(p2m->domain);
+#endif
+
+        p2m->stats.mappings[level]--;
+        /* Nothing to do if the entry is a super-page. */
+        if ( level == 3 )
+            p2m_put_l3_page(entry);
+        return;
+    }
+
+    table = map_domain_page(lpae_get_mfn(entry));
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+        p2m_free_entry(p2m, *(table + i), level + 1);
+
+    unmap_domain_page(table);
+
+    /*
+     * Make sure all the references in the TLB have been removed before
+     * freeing the intermediate page table.
+     * XXX: Should we defer the free of the page table to avoid the
+     * flush?
+     */
+    p2m_tlb_flush_sync(p2m);
+
+    mfn = lpae_get_mfn(entry);
+    ASSERT(mfn_valid(mfn));
+
+    pg = mfn_to_page(mfn);
+
+    page_list_del(pg, &p2m->pages);
+    p2m_free_page(p2m->domain, pg);
+}
+
+static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
+                                unsigned int level, unsigned int target,
+                                const unsigned int *offsets)
+{
+    struct page_info *page;
+    unsigned int i;
+    lpae_t pte, *table;
+    bool rv = true;
+
+    /* Convenience aliases */
+    mfn_t mfn = lpae_get_mfn(*entry);
+    unsigned int next_level = level + 1;
+    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
+
+    /*
+     * This should only be called with target != level and the entry is
+     * a superpage.
+     */
+    ASSERT(level < target);
+    ASSERT(p2m_is_superpage(*entry, level));
+
+    page = p2m_alloc_page(p2m->domain);
+    if ( !page )
+        return false;
+
+    page_list_add(page, &p2m->pages);
+    table = __map_domain_page(page);
+
+    /*
+     * We are either splitting a first level 1G page into 512 second level
+     * 2M pages, or a second level 2M page into 512 third level 4K pages.
+     */
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        lpae_t *new_entry = table + i;
+
+        /*
+         * Use the content of the superpage entry and override
+         * the necessary fields. So the correct permission are kept.
+         */
+        pte = *entry;
+        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
+
+        /*
+         * First and second level pages set p2m.table = 0, but third
+         * level entries set p2m.table = 1.
+         */
+        pte.p2m.table = (next_level == 3);
+
+        write_pte(new_entry, pte);
+    }
+
+    /* Update stats */
+    p2m->stats.shattered[level]++;
+    p2m->stats.mappings[level]--;
+    p2m->stats.mappings[next_level] += XEN_PT_LPAE_ENTRIES;
+
+    /*
+     * Shatter superpage in the page to the level we want to make the
+     * changes.
+     * This is done outside the loop to avoid checking the offset to
+     * know whether the entry should be shattered for every entry.
+     */
+    if ( next_level != target )
+        rv = p2m_split_superpage(p2m, table + offsets[next_level],
+                                 level + 1, target, offsets);
+
+    if ( p2m->clean_pte )
+        clean_dcache_va_range(table, PAGE_SIZE);
+
+    unmap_domain_page(table);
+
+    /*
+     * Even if we failed, we should install the newly allocated LPAE
+     * entry. The caller will be in charge to free the sub-tree.
+     */
+    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
+
+    return rv;
+}
+
+/*
+ * Insert an entry in the p2m. This should be called with a mapping
+ * equal to a page/superpage (4K, 2M, 1G).
+ */
+int __p2m_set_entry(struct p2m_domain *p2m,
+                    gfn_t sgfn,
+                    unsigned int page_order,
+                    mfn_t smfn,
+                    p2m_type_t t,
+                    p2m_access_t a)
+{
+    unsigned int level = 0;
+    unsigned int target = 3 - (page_order / XEN_PT_LPAE_SHIFT);
+    lpae_t *entry, *table, orig_pte;
+    int rc;
+    /* A mapping is removed if the MFN is invalid. */
+    bool removing_mapping = mfn_eq(smfn, INVALID_MFN);
+    DECLARE_OFFSETS(offsets, gfn_to_gaddr(sgfn));
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    /*
+     * Check if the level target is valid: we only support
+     * 4K - 2M - 1G mapping.
+     */
+    ASSERT(target > 0 && target <= 3);
+
+    table = p2m_get_root_pointer(p2m, sgfn);
+    if ( !table )
+        return -EINVAL;
+
+    for ( level = P2M_ROOT_LEVEL; level < target; level++ )
+    {
+        /*
+         * Don't try to allocate intermediate page table if the mapping
+         * is about to be removed.
+         */
+        rc = p2m_next_level(p2m, removing_mapping,
+                            level, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+        {
+            /*
+             * We are here because p2m_next_level has failed to map
+             * the intermediate page table (e.g the table does not exist
+             * and the p2m tree is read-only). It is a valid case
+             * when removing a mapping as it may not exist in the
+             * page table. In this case, just ignore it.
+             */
+            rc = removing_mapping ?  0 : -ENOENT;
+            goto out;
+        }
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    entry = table + offsets[level];
+
+    /*
+     * If we are here with level < target, we must be at a leaf node,
+     * and we need to break up the superpage.
+     */
+    if ( level < target )
+    {
+        /* We need to split the original page. */
+        lpae_t split_pte = *entry;
+
+        ASSERT(p2m_is_superpage(*entry, level));
+
+        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
+        {
+            /*
+             * The current super-page is still in-place, so re-increment
+             * the stats.
+             */
+            p2m->stats.mappings[level]++;
+
+            /* Free the allocated sub-tree */
+            p2m_free_entry(p2m, split_pte, level);
+
+            rc = -ENOMEM;
+            goto out;
+        }
+
+        /*
+         * Follow the break-before-make sequence to update the entry.
+         * For more details see (D4.7.1 in ARM DDI 0487A.j).
+         */
+        p2m_remove_pte(entry, p2m->clean_pte);
+        p2m_force_tlb_flush_sync(p2m);
+
+        p2m_write_pte(entry, split_pte, p2m->clean_pte);
+
+        /* then move to the level we want to make real changes */
+        for ( ; level < target; level++ )
+        {
+            rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+
+            /*
+             * The entry should be found and either be a table
+             * or a superpage if level 3 is not targeted
+             */
+            ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
+                   (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
+        }
+
+        entry = table + offsets[level];
+    }
+
+    /*
+     * We should always be there with the correct level because
+     * all the intermediate tables have been installed if necessary.
+     */
+    ASSERT(level == target);
+
+    orig_pte = *entry;
+
+    /*
+     * The radix-tree can only work on 4KB. This is only used when
+     * memaccess is enabled and during shutdown.
+     */
+    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
+           p2m->domain->is_dying);
+    /*
+     * The access type should always be p2m_access_rwx when the mapping
+     * is removed.
+     */
+    ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
+    /*
+     * Update the mem access permission before updating the P2M, so we
+     * don't have to revert the mapping if it fails.
+     */
+    rc = p2m_mem_access_radix_set(p2m, sgfn, a);
+    if ( rc )
+        goto out;
+
+    /*
+     * Always remove the entry in order to follow the break-before-make
+     * sequence when updating the translation table (D4.7.1 in ARM DDI
+     * 0487A.j).
+     */
+    if ( lpae_is_valid(orig_pte) || removing_mapping )
+        p2m_remove_pte(entry, p2m->clean_pte);
+
+    if ( removing_mapping )
+        /* Flush can be deferred if the entry is removed */
+        p2m->need_flush |= !!lpae_is_valid(orig_pte);
+    else
+    {
+        lpae_t pte = mfn_to_p2m_entry(smfn, t, a);
+
+        if ( level < 3 )
+            pte.p2m.table = 0; /* Superpage entry */
+
+        /*
+         * It is necessary to flush the TLB before writing the new entry
+         * to keep coherency when the previous entry was valid.
+         *
+         * However, it can be deferred when only the permissions are
+         * changed (e.g. in the case of memaccess).
+         */
+        if ( lpae_is_valid(orig_pte) )
+        {
+            if ( likely(!p2m->mem_access_enabled) ||
+                 P2M_CLEAR_PERM(pte) != P2M_CLEAR_PERM(orig_pte) )
+                p2m_force_tlb_flush_sync(p2m);
+            else
+                p2m->need_flush = true;
+        }
+        else if ( !p2m_is_valid(orig_pte) ) /* new mapping */
+            p2m->stats.mappings[level]++;
+
+        p2m_write_pte(entry, pte, p2m->clean_pte);
+
+        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
+                                      gfn_add(sgfn, (1UL << page_order) - 1));
+        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
+    }
+
+    if ( is_iommu_enabled(p2m->domain) &&
+         (lpae_is_valid(orig_pte) || lpae_is_valid(*entry)) )
+    {
+        unsigned int flush_flags = 0;
+
+        if ( lpae_is_valid(orig_pte) )
+            flush_flags |= IOMMU_FLUSHF_modified;
+        if ( lpae_is_valid(*entry) )
+            flush_flags |= IOMMU_FLUSHF_added;
+
+        rc = iommu_iotlb_flush(p2m->domain, _dfn(gfn_x(sgfn)),
+                               1UL << page_order, flush_flags);
+    }
+    else
+        rc = 0;
+
+    /*
+     * Free the entry only if the original pte was valid and the base
+     * is different (to avoid freeing when permission is changed).
+     */
+    if ( p2m_is_valid(orig_pte) &&
+         !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
+        p2m_free_entry(p2m, orig_pte, level);
+
+out:
+    unmap_domain_page(table);
+
+    return rc;
+}
+
+int p2m_set_entry(struct p2m_domain *p2m,
+                  gfn_t sgfn,
+                  unsigned long nr,
+                  mfn_t smfn,
+                  p2m_type_t t,
+                  p2m_access_t a)
+{
+    int rc = 0;
+
+    /*
+     * Any reference taken by the P2M mappings (e.g. foreign mapping) will
+     * be dropped in relinquish_p2m_mapping(). As the P2M will still
+     * be accessible afterwards, we need to prevent mappings from being
+     * added when the domain is dying.
+     */
+    if ( unlikely(p2m->domain->is_dying) )
+        return -ENOMEM;
+
+    while ( nr )
+    {
+        unsigned long mask;
+        unsigned long order;
+
+        /*
+         * Don't take into account the MFN when removing a mapping (i.e.
+         * INVALID_MFN) to calculate the correct target order.
+         *
+         * XXX: Support superpage mappings if nr is not aligned to a
+         * superpage size.
+         */
+        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
+        mask |= gfn_x(sgfn) | nr;
+
+        /* Always map 4k by 4k when memaccess is enabled */
+        if ( unlikely(p2m->mem_access_enabled) )
+            order = THIRD_ORDER;
+        else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
+            order = FIRST_ORDER;
+        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
+            order = SECOND_ORDER;
+        else
+            order = THIRD_ORDER;
+
+        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
+        if ( rc )
+            break;
+
+        sgfn = gfn_add(sgfn, (1 << order));
+        if ( !mfn_eq(smfn, INVALID_MFN) )
+           smfn = mfn_add(smfn, (1 << order));
+
+        nr -= (1 << order);
+    }
+
+    return rc;
+}
+
+/* Invalidate all entries in the table. The p2m should be write locked. */
+static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn)
+{
+    lpae_t *table;
+    unsigned int i;
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    table = map_domain_page(mfn);
+
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        lpae_t pte = table[i];
+
+        /*
+         * Writing an entry can be expensive because it may involve
+         * cleaning the cache. So avoid updating the entry if the valid
+         * bit is already cleared.
+         */
+        if ( !pte.p2m.valid )
+            continue;
+
+        pte.p2m.valid = 0;
+
+        p2m_write_pte(&table[i], pte, p2m->clean_pte);
+    }
+
+    unmap_domain_page(table);
+
+    p2m->need_flush = true;
+}
+
+/*
+ * Invalidate all entries in the root page-tables. This is
+ * useful to get a fault on entry and take an action.
+ *
+ * p2m_invalidate_root() should not be called when the P2M is shared with
+ * the IOMMU because it will cause IOMMU faults.
+ */
+void p2m_invalidate_root(struct p2m_domain *p2m)
+{
+    unsigned int i;
+
+    ASSERT(!iommu_use_hap_pt(p2m->domain));
+
+    p2m_write_lock(p2m);
+
+    for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
+        p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i));
+
+    p2m_write_unlock(p2m);
+}
+
+/*
+ * Resolve any translation fault due to a change in the p2m. This
+ * includes break-before-make and the valid bit being cleared.
+ */
+bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    unsigned int level = 0;
+    bool resolved = false;
+    lpae_t entry, *table;
+
+    /* Convenience aliases */
+    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
+
+    p2m_write_lock(p2m);
+
+    /* This gfn is higher than the highest gfn the p2m map currently holds */
+    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
+        goto out;
+
+    table = p2m_get_root_pointer(p2m, gfn);
+    /*
+     * The table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    if ( !table )
+    {
+        ASSERT_UNREACHABLE();
+        goto out;
+    }
+
+    /*
+     * Go down the page-tables until an entry has the valid bit unset or
+     * a block/page entry has been hit.
+     */
+    for ( level = P2M_ROOT_LEVEL; level <= 3; level++ )
+    {
+        int rc;
+
+        entry = table[offsets[level]];
+
+        if ( level == 3 )
+            break;
+
+        /* Stop as soon as we hit an entry with the valid bit unset. */
+        if ( !lpae_is_valid(entry) )
+            break;
+
+        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+            goto out_unmap;
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    /*
+     * If the valid bit of the entry is set, it means someone was playing with
+     * the Stage-2 page table. Nothing to do; mark the fault as resolved.
+     */
+    if ( lpae_is_valid(entry) )
+    {
+        resolved = true;
+        goto out_unmap;
+    }
+
+    /*
+     * The valid bit is unset. If the entry is still not valid then the fault
+     * cannot be resolved; exit and report it.
+     */
+    if ( !p2m_is_valid(entry) )
+        goto out_unmap;
+
+    /*
+     * Now we have an entry with valid bit unset, but still valid from
+     * the P2M point of view.
+     *
+     * If an entry is pointing to a table, each entry of the table will
+     * have their valid bit cleared. This allows a function to clear the
+     * full p2m with just a couple of writes. The valid bit will then be
+     * propagated on the fault.
+     * If an entry is pointing to a block/page, no work to do for now.
+     */
+    if ( lpae_is_table(entry, level) )
+        p2m_invalidate_table(p2m, lpae_get_mfn(entry));
+
+    /*
+     * Now that the work on the entry is done, set the valid bit to prevent
+     * another fault on that entry.
+     */
+    resolved = true;
+    entry.p2m.valid = 1;
+
+    p2m_write_pte(table + offsets[level], entry, p2m->clean_pte);
+
+    /*
+     * No need to flush the TLBs as the modified entry had the valid bit
+     * unset.
+     */
+
+out_unmap:
+    unmap_domain_page(table);
+
+out:
+    p2m_write_unlock(p2m);
+
+    return resolved;
+}
+
+static struct page_info *p2m_allocate_root(void)
+{
+    struct page_info *page;
+    unsigned int i;
+
+    page = alloc_domheap_pages(NULL, P2M_ROOT_ORDER, 0);
+    if ( page == NULL )
+        return NULL;
+
+    /* Clear all the first level (root) pages */
+    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
+        clear_and_clean_page(page + i);
+
+    return page;
+}
+
+static int p2m_alloc_table(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    p2m->root = p2m_allocate_root();
+    if ( !p2m->root )
+        return -ENOMEM;
+
+    p2m->vttbr = generate_vttbr(p2m->vmid, page_to_mfn(p2m->root));
+
+    /*
+     * Make sure that all TLB entries corresponding to the new VMID are
+     * flushed before using it.
+     */
+    p2m_write_lock(p2m);
+    p2m_force_tlb_flush_sync(p2m);
+    p2m_write_unlock(p2m);
+
+    return 0;
+}
+
+int p2m_teardown(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    unsigned long count = 0;
+    struct page_info *pg;
+    int rc = 0;
+
+    p2m_write_lock(p2m);
+
+    while ( (pg = page_list_remove_head(&p2m->pages)) )
+    {
+        p2m_free_page(p2m->domain, pg);
+        count++;
+        /* Arbitrarily preempt every 512 iterations */
+        if ( !(count % 512) && hypercall_preempt_check() )
+        {
+            rc = -ERESTART;
+            break;
+        }
+    }
+
+    p2m_write_unlock(p2m);
+
+    return rc;
+}
+
+int p2m_init(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+    unsigned int cpu;
+
+    rwlock_init(&p2m->lock);
+    spin_lock_init(&d->arch.paging.lock);
+    INIT_PAGE_LIST_HEAD(&p2m->pages);
+    INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
+
+    p2m->vmid = INVALID_VMID;
+    p2m->max_mapped_gfn = _gfn(0);
+    p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
+
+    p2m->default_access = p2m_access_rwx;
+    p2m->mem_access_enabled = false;
+    radix_tree_init(&p2m->mem_access_settings);
+
+    /*
+     * Some IOMMUs don't support coherent PT walk. When the p2m is
+     * shared with the IOMMU, Xen has to make sure that the PT changes
+     * have reached memory.
+     */
+    p2m->clean_pte = is_iommu_enabled(d) &&
+        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+    /*
+     * Make sure that the chosen type is able to store any vCPU ID
+     * between 0 and the maximum number of virtual CPUs supported, as
+     * well as INVALID_VCPU_ID.
+     */
+    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
+    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < INVALID_VCPU_ID);
+
+    for_each_possible_cpu(cpu)
+       p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
+    /*
+     * "Trivial" initialisation is now complete.  Set the backpointer so
+     * p2m_teardown() and friends know to do something.
+     */
+    p2m->domain = d;
+
+    rc = p2m_alloc_vmid(d);
+    if ( rc )
+        return rc;
+
+    rc = p2m_alloc_table(d);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+/* VTCR value to be configured by all CPUs. Set only once by the boot CPU */
+static register_t __read_mostly vtcr;
+
+void setup_virt_paging_one(void *data)
+{
+    WRITE_SYSREG(vtcr, VTCR_EL2);
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from
+     * entries related to the EL1/EL0 translation regime until a guest vCPU
+     * is running. For that, we need to set up VTTBR to point to an empty
+     * page-table and turn on stage-2 translation. The TLB entries
+     * associated with the EL1/EL0 translation regime will also be flushed
+     * in case an AT instruction was speculated beforehand.
+     */
+    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+    {
+        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
+        WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_VM, HCR_EL2);
+        isb();
+
+        flush_all_guests_tlb_local();
+    }
+}
+
+void __init setup_virt_paging(void)
+{
+    /* Setup Stage 2 address translation */
+    register_t val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
+
+    static const struct {
+        unsigned int pabits; /* Physical Address Size */
+        unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
+        unsigned int root_order; /* Page order of the root of the p2m */
+        unsigned int sl0;    /* Desired SL0, maximum in comment */
+    } pa_range_info[] __initconst = {
+        /* T0SZ minimum and SL0 maximum from ARM DDI 0487H.a Table D5-6 */
+        /*      PA size, t0sz(min), root-order, sl0(max) */
+#ifdef CONFIG_ARM_64
+        [0] = { 32,      32/*32*/,  0,          1 },
+        [1] = { 36,      28/*28*/,  0,          1 },
+        [2] = { 40,      24/*24*/,  1,          1 },
+        [3] = { 42,      22/*22*/,  3,          1 },
+        [4] = { 44,      20/*20*/,  0,          2 },
+        [5] = { 48,      16/*16*/,  0,          2 },
+        [6] = { 52,      12/*12*/,  4,          2 },
+        [7] = { 0 }  /* Invalid */
+#else
+        { 32,      0/*0*/,    0,          1 },
+        { 40,      24/*24*/,  1,          1 }
+#endif
+    };
+
+    unsigned int i;
+    unsigned int pa_range = 0x10; /* Larger than any possible value */
+
+#ifdef CONFIG_ARM_32
+    /*
+     * Typecast pa_range_info[].t0sz into arm32 bit variant.
+     *
+     * VTCR.T0SZ is bits [3:0] and S(sign extension), bit[4] for arm32.
+     * Thus, pa_range_info[].t0sz is translated to its arm32 variant using
+     * struct bitfields.
+     */
+    struct
+    {
+        signed int val:5;
+    } t0sz_32;
+#else
+    /*
+     * Restrict "p2m_ipa_bits" if needed. As P2M table is always configured
+     * with IPA bits == PA bits, compare against "pabits".
+     */
+    if ( pa_range_info[system_cpuinfo.mm64.pa_range].pabits < p2m_ipa_bits )
+        p2m_ipa_bits = pa_range_info[system_cpuinfo.mm64.pa_range].pabits;
+
+    /*
+     * CPU info sanitization made sure we support 16-bit VMIDs only if
+     * all cores support it.
+     */
+    if ( system_cpuinfo.mm64.vmid_bits == MM64_VMID_16_BITS_SUPPORT )
+        max_vmid = MAX_VMID_16_BIT;
+#endif
+
+    /* Choose suitable "pa_range" according to the resulted "p2m_ipa_bits". */
+    for ( i = 0; i < ARRAY_SIZE(pa_range_info); i++ )
+    {
+        if ( p2m_ipa_bits == pa_range_info[i].pabits )
+        {
+            pa_range = i;
+            break;
+        }
+    }
+
+    /* Check if we found the associated entry in the array */
+    if ( pa_range >= ARRAY_SIZE(pa_range_info) || !pa_range_info[pa_range].pabits )
+        panic("%u-bit P2M is not supported\n", p2m_ipa_bits);
+
+#ifdef CONFIG_ARM_64
+    val |= VTCR_PS(pa_range);
+    val |= VTCR_TG0_4K;
+
+    /* Set the VS bit only if 16 bit VMID is supported. */
+    if ( MAX_VMID == MAX_VMID_16_BIT )
+        val |= VTCR_VS;
+#endif
+
+    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
+    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
+
+    p2m_root_order = pa_range_info[pa_range].root_order;
+    p2m_root_level = 2 - pa_range_info[pa_range].sl0;
+
+#ifdef CONFIG_ARM_64
+    p2m_ipa_bits = 64 - pa_range_info[pa_range].t0sz;
+#else
+    t0sz_32.val = pa_range_info[pa_range].t0sz;
+    p2m_ipa_bits = 32 - t0sz_32.val;
+#endif
+
+    printk("P2M: %d-bit IPA with %d-bit PA and %d-bit VMID\n",
+           p2m_ipa_bits,
+           pa_range_info[pa_range].pabits,
+           ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
+
+    printk("P2M: %d levels with order-%d root, VTCR 0x%"PRIregister"\n",
+           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
+
+    p2m_vmid_allocator_init();
+
+    /* It is not allowed to concatenate a level zero root */
+    BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
+    vtcr = val;
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE requires allocating a root table
+     * with all entries zeroed.
+     */
+    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+    {
+        struct page_info *root;
+
+        root = p2m_allocate_root();
+        if ( !root )
+            panic("Unable to allocate root table for ARM64_WORKAROUND_AT_SPECULATE\n");
+
+        empty_root_mfn = page_to_mfn(root);
+    }
+
+    setup_virt_paging_one(NULL);
+    smp_call_function(setup_virt_paging_one, NULL, 1);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index de32a2d638..64dcfcb05b 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,1466 +1,136 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <xen/cpu.h>
-#include <xen/domain_page.h>
 #include <xen/iocap.h>
-#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
 
-#include <asm/alternative.h>
 #include <asm/event.h>
 #include <asm/flushtlb.h>
 #include <asm/guest_walk.h>
 #include <asm/page.h>
 #include <asm/traps.h>
 
-#define MAX_VMID_8_BIT  (1UL << 8)
-#define MAX_VMID_16_BIT (1UL << 16)
-
-#define INVALID_VMID 0 /* VMID 0 is reserved */
-
-unsigned int __read_mostly p2m_root_order;
-unsigned int __read_mostly p2m_root_level;
 #ifdef CONFIG_ARM_64
-static unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
-/* VMID is by default 8 bit width on AArch64 */
-#define MAX_VMID       max_vmid
-#else
-/* VMID is always 8 bit width on AArch32 */
-#define MAX_VMID        MAX_VMID_8_BIT
-#endif
-
-#define P2M_ROOT_PAGES    (1<<P2M_ROOT_ORDER)
-
-/*
- * Set to the maximum configured support for IPA bits, so the number of IPA bits can be
- * restricted by external entity (e.g. IOMMU).
- */
-unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
-
-static mfn_t __read_mostly empty_root_mfn;
-
-static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
-{
-    return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
-}
-
-static struct page_info *p2m_alloc_page(struct domain *d)
-{
-    struct page_info *pg;
-
-    /*
-     * For hardware domain, there should be no limit in the number of pages that
-     * can be allocated, so that the kernel may take advantage of the extended
-     * regions. Hence, allocate p2m pages for hardware domains from heap.
-     */
-    if ( is_hardware_domain(d) )
-    {
-        pg = alloc_domheap_page(NULL, 0);
-        if ( pg == NULL )
-            printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
-    }
-    else
-    {
-        spin_lock(&d->arch.paging.lock);
-        pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
-        spin_unlock(&d->arch.paging.lock);
-    }
-
-    return pg;
-}
-
-static void p2m_free_page(struct domain *d, struct page_info *pg)
-{
-    if ( is_hardware_domain(d) )
-        free_domheap_page(pg);
-    else
-    {
-        spin_lock(&d->arch.paging.lock);
-        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
-        spin_unlock(&d->arch.paging.lock);
-    }
-}
-
-/* Return the size of the pool, in bytes. */
-int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
-{
-    *size = (uint64_t)ACCESS_ONCE(d->arch.paging.p2m_total_pages) << PAGE_SHIFT;
-    return 0;
-}
-
-/*
- * Set the pool of pages to the required number of pages.
- * Returns 0 for success, non-zero for failure.
- * Call with d->arch.paging.lock held.
- */
-int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
-{
-    struct page_info *pg;
-
-    ASSERT(spin_is_locked(&d->arch.paging.lock));
-
-    for ( ; ; )
-    {
-        if ( d->arch.paging.p2m_total_pages < pages )
-        {
-            /* Need to allocate more memory from domheap */
-            pg = alloc_domheap_page(NULL, 0);
-            if ( pg == NULL )
-            {
-                printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
-                return -ENOMEM;
-            }
-            ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
-                d->arch.paging.p2m_total_pages + 1;
-            page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
-        }
-        else if ( d->arch.paging.p2m_total_pages > pages )
-        {
-            /* Need to return memory to domheap */
-            pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
-            if( pg )
-            {
-                ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
-                    d->arch.paging.p2m_total_pages - 1;
-                free_domheap_page(pg);
-            }
-            else
-            {
-                printk(XENLOG_ERR
-                       "Failed to free P2M pages, P2M freelist is empty.\n");
-                return -ENOMEM;
-            }
-        }
-        else
-            break;
-
-        /* Check to see if we need to yield and try again */
-        if ( preempted && general_preempt_check() )
-        {
-            *preempted = true;
-            return -ERESTART;
-        }
-    }
-
-    return 0;
-}
-
-int arch_set_paging_mempool_size(struct domain *d, uint64_t size)
-{
-    unsigned long pages = size >> PAGE_SHIFT;
-    bool preempted = false;
-    int rc;
-
-    if ( (size & ~PAGE_MASK) ||          /* Non page-sized request? */
-         pages != (size >> PAGE_SHIFT) ) /* 32-bit overflow? */
-        return -EINVAL;
-
-    spin_lock(&d->arch.paging.lock);
-    rc = p2m_set_allocation(d, pages, &preempted);
-    spin_unlock(&d->arch.paging.lock);
-
-    ASSERT(preempted == (rc == -ERESTART));
-
-    return rc;
-}
-
-int p2m_teardown_allocation(struct domain *d)
-{
-    int ret = 0;
-    bool preempted = false;
-
-    spin_lock(&d->arch.paging.lock);
-    if ( d->arch.paging.p2m_total_pages != 0 )
-    {
-        ret = p2m_set_allocation(d, 0, &preempted);
-        if ( preempted )
-        {
-            spin_unlock(&d->arch.paging.lock);
-            return -ERESTART;
-        }
-        ASSERT(d->arch.paging.p2m_total_pages == 0);
-    }
-    spin_unlock(&d->arch.paging.lock);
-
-    return ret;
-}
-
-/* Unlock the flush and do a P2M TLB flush if necessary */
-void p2m_write_unlock(struct p2m_domain *p2m)
-{
-    /*
-     * The final flush is done with the P2M write lock taken to avoid
-     * someone else modifying the P2M wbefore the TLB invalidation has
-     * completed.
-     */
-    p2m_tlb_flush_sync(p2m);
-
-    write_unlock(&p2m->lock);
-}
-
-void p2m_dump_info(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    p2m_read_lock(p2m);
-    printk("p2m mappings for domain %d (vmid %d):\n",
-           d->domain_id, p2m->vmid);
-    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
-    printk("  1G mappings: %ld (shattered %ld)\n",
-           p2m->stats.mappings[1], p2m->stats.shattered[1]);
-    printk("  2M mappings: %ld (shattered %ld)\n",
-           p2m->stats.mappings[2], p2m->stats.shattered[2]);
-    printk("  4K mappings: %ld\n", p2m->stats.mappings[3]);
-    p2m_read_unlock(p2m);
-}
-
-void memory_type_changed(struct domain *d)
-{
-}
-
-void dump_p2m_lookup(struct domain *d, paddr_t addr)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    printk("dom%d IPA 0x%"PRIpaddr"\n", d->domain_id, addr);
-
-    printk("P2M @ %p mfn:%#"PRI_mfn"\n",
-           p2m->root, mfn_x(page_to_mfn(p2m->root)));
-
-    dump_pt_walk(page_to_maddr(p2m->root), addr,
-                 P2M_ROOT_LEVEL, P2M_ROOT_PAGES);
-}
-
-/*
- * p2m_save_state and p2m_restore_state work in pair to workaround
- * ARM64_WORKAROUND_AT_SPECULATE. p2m_save_state will set-up VTTBR to
- * point to the empty page-tables to stop allocating TLB entries.
- */
-void p2m_save_state(struct vcpu *p)
-{
-    p->arch.sctlr = READ_SYSREG(SCTLR_EL1);
-
-    if ( cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-    {
-        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
-        /*
-         * Ensure VTTBR_EL2 is correctly synchronized so we can restore
-         * the next vCPU context without worrying about AT instruction
-         * speculation.
-         */
-        isb();
-    }
-}
-
-void p2m_restore_state(struct vcpu *n)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(n->domain);
-    uint8_t *last_vcpu_ran;
-
-    if ( is_idle_vcpu(n) )
-        return;
-
-    WRITE_SYSREG(n->arch.sctlr, SCTLR_EL1);
-    WRITE_SYSREG(n->arch.hcr_el2, HCR_EL2);
-
-    /*
-     * ARM64_WORKAROUND_AT_SPECULATE: VTTBR_EL2 should be restored after all
-     * registers associated to EL1/EL0 translations regime have been
-     * synchronized.
-     */
-    asm volatile(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_AT_SPECULATE));
-    WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
-
-    last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
-
-    /*
-     * While we are restoring an out-of-context translation regime
-     * we still need to ensure:
-     *  - VTTBR_EL2 is synchronized before flushing the TLBs
-     *  - All registers for EL1 are synchronized before executing an AT
-     *  instructions targeting S1/S2.
-     */
-    isb();
-
-    /*
-     * Flush local TLB for the domain to prevent wrong TLB translation
-     * when running multiple vCPU of the same domain on a single pCPU.
-     */
-    if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
-        flush_guest_tlb_local();
-
-    *last_vcpu_ran = n->vcpu_id;
-}
-
-/*
- * Force a synchronous P2M TLB flush.
- *
- * Must be called with the p2m lock held.
- */
-static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
-{
-    unsigned long flags = 0;
-    uint64_t ovttbr;
-
-    ASSERT(p2m_is_write_locked(p2m));
-
-    /*
-     * ARM only provides an instruction to flush TLBs for the current
-     * VMID. So switch to the VTTBR of a given P2M if different.
-     */
-    ovttbr = READ_SYSREG64(VTTBR_EL2);
-    if ( ovttbr != p2m->vttbr )
-    {
-        uint64_t vttbr;
-
-        local_irq_save(flags);
-
-        /*
-         * ARM64_WORKAROUND_AT_SPECULATE: We need to stop AT to allocate
-         * TLBs entries because the context is partially modified. We
-         * only need the VMID for flushing the TLBs, so we can generate
-         * a new VTTBR with the VMID to flush and the empty root table.
-         */
-        if ( !cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-            vttbr = p2m->vttbr;
-        else
-            vttbr = generate_vttbr(p2m->vmid, empty_root_mfn);
-
-        WRITE_SYSREG64(vttbr, VTTBR_EL2);
-
-        /* Ensure VTTBR_EL2 is synchronized before flushing the TLBs */
-        isb();
-    }
-
-    flush_guest_tlb();
-
-    if ( ovttbr != READ_SYSREG64(VTTBR_EL2) )
-    {
-        WRITE_SYSREG64(ovttbr, VTTBR_EL2);
-        /* Ensure VTTBR_EL2 is back in place before continuing. */
-        isb();
-        local_irq_restore(flags);
-    }
-
-    p2m->need_flush = false;
-}
-
-void p2m_tlb_flush_sync(struct p2m_domain *p2m)
-{
-    if ( p2m->need_flush )
-        p2m_force_tlb_flush_sync(p2m);
-}
-
-/*
- * Find and map the root page table. The caller is responsible for
- * unmapping the table.
- *
- * The function will return NULL if the offset of the root table is
- * invalid.
- */
-static lpae_t *p2m_get_root_pointer(struct p2m_domain *p2m,
-                                    gfn_t gfn)
-{
-    unsigned long root_table;
-
-    /*
-     * While the root table index is the offset from the previous level,
-     * we can't use (P2M_ROOT_LEVEL - 1) because the root level might be
-     * 0. Yet we still want to check if all the unused bits are zeroed.
-     */
-    root_table = gfn_x(gfn) >> (XEN_PT_LEVEL_ORDER(P2M_ROOT_LEVEL) +
-                                XEN_PT_LPAE_SHIFT);
-    if ( root_table >= P2M_ROOT_PAGES )
-        return NULL;
-
-    return __map_domain_page(p2m->root + root_table);
-}
-
-/*
- * Lookup the MFN corresponding to a domain's GFN.
- * Lookup mem access in the ratrix tree.
- * The entries associated to the GFN is considered valid.
- */
-static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m, gfn_t gfn)
-{
-    void *ptr;
-
-    if ( !p2m->mem_access_enabled )
-        return p2m->default_access;
-
-    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
-    if ( !ptr )
-        return p2m_access_rwx;
-    else
-        return radix_tree_ptr_to_int(ptr);
-}
-
-/*
- * In the case of the P2M, the valid bit is used for other purpose. Use
- * the type to check whether an entry is valid.
- */
-static inline bool p2m_is_valid(lpae_t pte)
-{
-    return pte.p2m.type != p2m_invalid;
-}
-
-/*
- * lpae_is_* helpers don't check whether the valid bit is set in the
- * PTE. Provide our own overlay to check the valid bit.
- */
-static inline bool p2m_is_mapping(lpae_t pte, unsigned int level)
-{
-    return p2m_is_valid(pte) && lpae_is_mapping(pte, level);
-}
-
-static inline bool p2m_is_superpage(lpae_t pte, unsigned int level)
-{
-    return p2m_is_valid(pte) && lpae_is_superpage(pte, level);
-}
-
-#define GUEST_TABLE_MAP_FAILED 0
-#define GUEST_TABLE_SUPER_PAGE 1
-#define GUEST_TABLE_NORMAL_PAGE 2
-
-static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry);
-
-/*
- * Take the currently mapped table, find the corresponding GFN entry,
- * and map the next table, if available. The previous table will be
- * unmapped if the next level was mapped (e.g GUEST_TABLE_NORMAL_PAGE
- * returned).
- *
- * The read_only parameters indicates whether intermediate tables should
- * be allocated when not present.
- *
- * Return values:
- *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
- *  was empty, or allocating a new page failed.
- *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
- *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
- */
-static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
-                          unsigned int level, lpae_t **table,
-                          unsigned int offset)
-{
-    lpae_t *entry;
-    int ret;
-    mfn_t mfn;
-
-    entry = *table + offset;
-
-    if ( !p2m_is_valid(*entry) )
-    {
-        if ( read_only )
-            return GUEST_TABLE_MAP_FAILED;
-
-        ret = p2m_create_table(p2m, entry);
-        if ( ret )
-            return GUEST_TABLE_MAP_FAILED;
-    }
-
-    /* The function p2m_next_level is never called at the 3rd level */
-    ASSERT(level < 3);
-    if ( p2m_is_mapping(*entry, level) )
-        return GUEST_TABLE_SUPER_PAGE;
-
-    mfn = lpae_get_mfn(*entry);
-
-    unmap_domain_page(*table);
-    *table = map_domain_page(mfn);
-
-    return GUEST_TABLE_NORMAL_PAGE;
-}
-
-/*
- * Get the details of a given gfn.
- *
- * If the entry is present, the associated MFN will be returned and the
- * access and type filled up. The page_order will correspond to the
- * order of the mapping in the page table (i.e it could be a superpage).
- *
- * If the entry is not present, INVALID_MFN will be returned and the
- * page_order will be set according to the order of the invalid range.
- *
- * valid will contain the value of bit[0] (e.g valid bit) of the
- * entry.
- */
-mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
-                    p2m_type_t *t, p2m_access_t *a,
-                    unsigned int *page_order,
-                    bool *valid)
-{
-    paddr_t addr = gfn_to_gaddr(gfn);
-    unsigned int level = 0;
-    lpae_t entry, *table;
-    int rc;
-    mfn_t mfn = INVALID_MFN;
-    p2m_type_t _t;
-    DECLARE_OFFSETS(offsets, addr);
-
-    ASSERT(p2m_is_locked(p2m));
-    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
-
-    /* Allow t to be NULL */
-    t = t ?: &_t;
-
-    *t = p2m_invalid;
-
-    if ( valid )
-        *valid = false;
-
-    /* XXX: Check if the mapping is lower than the mapped gfn */
-
-    /* This gfn is higher than the highest the p2m map currently holds */
-    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
-    {
-        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
-            if ( (gfn_x(gfn) & (XEN_PT_LEVEL_MASK(level) >> PAGE_SHIFT)) >
-                 gfn_x(p2m->max_mapped_gfn) )
-                break;
-
-        goto out;
-    }
-
-    table = p2m_get_root_pointer(p2m, gfn);
-
-    /*
-     * the table should always be non-NULL because the gfn is below
-     * p2m->max_mapped_gfn and the root table pages are always present.
-     */
-    if ( !table )
-    {
-        ASSERT_UNREACHABLE();
-        level = P2M_ROOT_LEVEL;
-        goto out;
-    }
-
-    for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
-    {
-        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
-        if ( rc == GUEST_TABLE_MAP_FAILED )
-            goto out_unmap;
-        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    entry = table[offsets[level]];
-
-    if ( p2m_is_valid(entry) )
-    {
-        *t = entry.p2m.type;
-
-        if ( a )
-            *a = p2m_mem_access_radix_get(p2m, gfn);
-
-        mfn = lpae_get_mfn(entry);
-        /*
-         * The entry may point to a superpage. Find the MFN associated
-         * to the GFN.
-         */
-        mfn = mfn_add(mfn,
-                      gfn_x(gfn) & ((1UL << XEN_PT_LEVEL_ORDER(level)) - 1));
-
-        if ( valid )
-            *valid = lpae_is_valid(entry);
-    }
-
-out_unmap:
-    unmap_domain_page(table);
-
-out:
-    if ( page_order )
-        *page_order = XEN_PT_LEVEL_ORDER(level);
-
-    return mfn;
-}
-
-mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
-{
-    mfn_t mfn;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    p2m_read_lock(p2m);
-    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL, NULL);
-    p2m_read_unlock(p2m);
-
-    return mfn;
-}
-
-struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn,
-                                        p2m_type_t *t)
-{
-    struct page_info *page;
-    p2m_type_t p2mt;
-    mfn_t mfn = p2m_lookup(d, gfn, &p2mt);
-
-    if ( t )
-        *t = p2mt;
-
-    if ( !p2m_is_any_ram(p2mt) )
-        return NULL;
-
-    if ( !mfn_valid(mfn) )
-        return NULL;
-
-    page = mfn_to_page(mfn);
-
-    /*
-     * get_page won't work on foreign mapping because the page doesn't
-     * belong to the current domain.
-     */
-    if ( p2m_is_foreign(p2mt) )
-    {
-        struct domain *fdom = page_get_owner_and_reference(page);
-        ASSERT(fdom != NULL);
-        ASSERT(fdom != d);
-        return page;
-    }
-
-    return get_page(page, d) ? page : NULL;
-}
-
-int guest_physmap_mark_populate_on_demand(struct domain *d,
-                                          unsigned long gfn,
-                                          unsigned int order)
-{
-    return -ENOSYS;
-}
-
-unsigned long p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn,
-                                           unsigned int order)
-{
-    return 0;
-}
-
-static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a)
-{
-    /* First apply type permissions */
-    switch ( t )
-    {
-    case p2m_ram_rw:
-        e->p2m.xn = 0;
-        e->p2m.write = 1;
-        break;
-
-    case p2m_ram_ro:
-        e->p2m.xn = 0;
-        e->p2m.write = 0;
-        break;
-
-    case p2m_iommu_map_rw:
-    case p2m_map_foreign_rw:
-    case p2m_grant_map_rw:
-    case p2m_mmio_direct_dev:
-    case p2m_mmio_direct_nc:
-    case p2m_mmio_direct_c:
-        e->p2m.xn = 1;
-        e->p2m.write = 1;
-        break;
-
-    case p2m_iommu_map_ro:
-    case p2m_map_foreign_ro:
-    case p2m_grant_map_ro:
-    case p2m_invalid:
-        e->p2m.xn = 1;
-        e->p2m.write = 0;
-        break;
-
-    case p2m_max_real_type:
-        BUG();
-        break;
-    }
-
-    /* Then restrict with access permissions */
-    switch ( a )
-    {
-    case p2m_access_rwx:
-        break;
-    case p2m_access_wx:
-        e->p2m.read = 0;
-        break;
-    case p2m_access_rw:
-        e->p2m.xn = 1;
-        break;
-    case p2m_access_w:
-        e->p2m.read = 0;
-        e->p2m.xn = 1;
-        break;
-    case p2m_access_rx:
-    case p2m_access_rx2rw:
-        e->p2m.write = 0;
-        break;
-    case p2m_access_x:
-        e->p2m.write = 0;
-        e->p2m.read = 0;
-        break;
-    case p2m_access_r:
-        e->p2m.write = 0;
-        e->p2m.xn = 1;
-        break;
-    case p2m_access_n:
-    case p2m_access_n2rwx:
-        e->p2m.read = e->p2m.write = 0;
-        e->p2m.xn = 1;
-        break;
-    }
-}
-
-static lpae_t mfn_to_p2m_entry(mfn_t mfn, p2m_type_t t, p2m_access_t a)
-{
-    /*
-     * sh, xn and write bit will be defined in the following switches
-     * based on mattr and t.
-     */
-    lpae_t e = (lpae_t) {
-        .p2m.af = 1,
-        .p2m.read = 1,
-        .p2m.table = 1,
-        .p2m.valid = 1,
-        .p2m.type = t,
-    };
-
-    BUILD_BUG_ON(p2m_max_real_type > (1 << 4));
-
-    switch ( t )
-    {
-    case p2m_mmio_direct_dev:
-        e.p2m.mattr = MATTR_DEV;
-        e.p2m.sh = LPAE_SH_OUTER;
-        break;
-
-    case p2m_mmio_direct_c:
-        e.p2m.mattr = MATTR_MEM;
-        e.p2m.sh = LPAE_SH_OUTER;
-        break;
-
-    /*
-     * ARM ARM: Overlaying the shareability attribute (DDI
-     * 0406C.b B3-1376 to 1377)
-     *
-     * A memory region with a resultant memory type attribute of Normal,
-     * and a resultant cacheability attribute of Inner Non-cacheable,
-     * Outer Non-cacheable, must have a resultant shareability attribute
-     * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
-     *
-     * On ARMv8 shareability is ignored and explicitly treated as Outer
-     * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
-     * See the note for table D4-40, in page 1788 of the ARM DDI 0487A.j.
-     */
-    case p2m_mmio_direct_nc:
-        e.p2m.mattr = MATTR_MEM_NC;
-        e.p2m.sh = LPAE_SH_OUTER;
-        break;
-
-    default:
-        e.p2m.mattr = MATTR_MEM;
-        e.p2m.sh = LPAE_SH_INNER;
-    }
-
-    p2m_set_permission(&e, t, a);
-
-    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
-
-    lpae_set_mfn(e, mfn);
-
-    return e;
-}
-
-/* Generate table entry with correct attributes. */
-static lpae_t page_to_p2m_table(struct page_info *page)
-{
-    /*
-     * The access value does not matter because the hardware will ignore
-     * the permission fields for table entry.
-     *
-     * We use p2m_ram_rw so the entry has a valid type. This is important
-     * for p2m_is_valid() to return valid on table entries.
-     */
-    return mfn_to_p2m_entry(page_to_mfn(page), p2m_ram_rw, p2m_access_rwx);
-}
-
-static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool clean_pte)
-{
-    write_pte(p, pte);
-    if ( clean_pte )
-        clean_dcache(*p);
-}
-
-static inline void p2m_remove_pte(lpae_t *p, bool clean_pte)
-{
-    lpae_t pte;
-
-    memset(&pte, 0x00, sizeof(pte));
-    p2m_write_pte(p, pte, clean_pte);
-}
-
-/* Allocate a new page table page and hook it in via the given entry. */
-static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry)
-{
-    struct page_info *page;
-    lpae_t *p;
-
-    ASSERT(!p2m_is_valid(*entry));
-
-    page = p2m_alloc_page(p2m->domain);
-    if ( page == NULL )
-        return -ENOMEM;
-
-    page_list_add(page, &p2m->pages);
-
-    p = __map_domain_page(page);
-    clear_page(p);
-
-    if ( p2m->clean_pte )
-        clean_dcache_va_range(p, PAGE_SIZE);
-
-    unmap_domain_page(p);
-
-    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
-
-    return 0;
-}
-
-static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
-                                    p2m_access_t a)
-{
-    int rc;
-
-    if ( !p2m->mem_access_enabled )
-        return 0;
-
-    if ( p2m_access_rwx == a )
-    {
-        radix_tree_delete(&p2m->mem_access_settings, gfn_x(gfn));
-        return 0;
-    }
-
-    rc = radix_tree_insert(&p2m->mem_access_settings, gfn_x(gfn),
-                           radix_tree_int_to_ptr(a));
-    if ( rc == -EEXIST )
-    {
-        /* If a setting already exists, change it to the new one */
-        radix_tree_replace_slot(
-            radix_tree_lookup_slot(
-                &p2m->mem_access_settings, gfn_x(gfn)),
-            radix_tree_int_to_ptr(a));
-        rc = 0;
-    }
-
-    return rc;
-}
-
-/*
- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
-{
-    mfn_t mfn = lpae_get_mfn(pte);
-
-    ASSERT(p2m_is_valid(pte));
-
-    /*
-     * TODO: Handle other p2m types
-     *
-     * It's safe to do the put_page here because page_alloc will
-     * flush the TLBs if the page is reallocated before the end of
-     * this loop.
-     */
-    if ( p2m_is_foreign(pte.p2m.type) )
-    {
-        ASSERT(mfn_valid(mfn));
-        put_page(mfn_to_page(mfn));
-    }
-    /* Detect the xenheap page and mark the stored GFN as invalid. */
-    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
-        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
-}
-
-/* Free lpae sub-tree behind an entry */
-static void p2m_free_entry(struct p2m_domain *p2m,
-                           lpae_t entry, unsigned int level)
-{
-    unsigned int i;
-    lpae_t *table;
-    mfn_t mfn;
-    struct page_info *pg;
-
-    /* Nothing to do if the entry is invalid. */
-    if ( !p2m_is_valid(entry) )
-        return;
-
-    if ( p2m_is_superpage(entry, level) || (level == 3) )
-    {
-#ifdef CONFIG_IOREQ_SERVER
-        /*
-         * If this gets called then either the entry was replaced by an entry
-         * with a different base (valid case) or the shattering of a superpage
-         * has failed (error case).
-         * So, at worst, the spurious mapcache invalidation might be sent.
-         */
-        if ( p2m_is_ram(entry.p2m.type) &&
-             domain_has_ioreq_server(p2m->domain) )
-            ioreq_request_mapcache_invalidate(p2m->domain);
-#endif
-
-        p2m->stats.mappings[level]--;
-        /* Nothing to do if the entry is a super-page. */
-        if ( level == 3 )
-            p2m_put_l3_page(entry);
-        return;
-    }
-
-    table = map_domain_page(lpae_get_mfn(entry));
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-        p2m_free_entry(p2m, *(table + i), level + 1);
-
-    unmap_domain_page(table);
-
-    /*
-     * Make sure all the references in the TLB have been removed before
-     * freing the intermediate page table.
-     * XXX: Should we defer the free of the page table to avoid the
-     * flush?
-     */
-    p2m_tlb_flush_sync(p2m);
-
-    mfn = lpae_get_mfn(entry);
-    ASSERT(mfn_valid(mfn));
-
-    pg = mfn_to_page(mfn);
-
-    page_list_del(pg, &p2m->pages);
-    p2m_free_page(p2m->domain, pg);
-}
-
-static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
-                                unsigned int level, unsigned int target,
-                                const unsigned int *offsets)
-{
-    struct page_info *page;
-    unsigned int i;
-    lpae_t pte, *table;
-    bool rv = true;
-
-    /* Convenience aliases */
-    mfn_t mfn = lpae_get_mfn(*entry);
-    unsigned int next_level = level + 1;
-    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
-
-    /*
-     * This should only be called with target != level and the entry is
-     * a superpage.
-     */
-    ASSERT(level < target);
-    ASSERT(p2m_is_superpage(*entry, level));
-
-    page = p2m_alloc_page(p2m->domain);
-    if ( !page )
-        return false;
-
-    page_list_add(page, &p2m->pages);
-    table = __map_domain_page(page);
-
-    /*
-     * We are either splitting a first level 1G page into 512 second level
-     * 2M pages, or a second level 2M page into 512 third level 4K pages.
-     */
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-    {
-        lpae_t *new_entry = table + i;
-
-        /*
-         * Use the content of the superpage entry and override
-         * the necessary fields. So the correct permission are kept.
-         */
-        pte = *entry;
-        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
-
-        /*
-         * First and second level pages set p2m.table = 0, but third
-         * level entries set p2m.table = 1.
-         */
-        pte.p2m.table = (next_level == 3);
-
-        write_pte(new_entry, pte);
-    }
-
-    /* Update stats */
-    p2m->stats.shattered[level]++;
-    p2m->stats.mappings[level]--;
-    p2m->stats.mappings[next_level] += XEN_PT_LPAE_ENTRIES;
-
-    /*
-     * Shatter superpage in the page to the level we want to make the
-     * changes.
-     * This is done outside the loop to avoid checking the offset to
-     * know whether the entry should be shattered for every entry.
-     */
-    if ( next_level != target )
-        rv = p2m_split_superpage(p2m, table + offsets[next_level],
-                                 level + 1, target, offsets);
-
-    if ( p2m->clean_pte )
-        clean_dcache_va_range(table, PAGE_SIZE);
-
-    unmap_domain_page(table);
-
-    /*
-     * Even if we failed, we should install the newly allocated LPAE
-     * entry. The caller will be in charge to free the sub-tree.
-     */
-    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
-
-    return rv;
-}
-
-/*
- * Insert an entry in the p2m. This should be called with a mapping
- * equal to a page/superpage (4K, 2M, 1G).
- */
-static int __p2m_set_entry(struct p2m_domain *p2m,
-                           gfn_t sgfn,
-                           unsigned int page_order,
-                           mfn_t smfn,
-                           p2m_type_t t,
-                           p2m_access_t a)
-{
-    unsigned int level = 0;
-    unsigned int target = 3 - (page_order / XEN_PT_LPAE_SHIFT);
-    lpae_t *entry, *table, orig_pte;
-    int rc;
-    /* A mapping is removed if the MFN is invalid. */
-    bool removing_mapping = mfn_eq(smfn, INVALID_MFN);
-    DECLARE_OFFSETS(offsets, gfn_to_gaddr(sgfn));
-
-    ASSERT(p2m_is_write_locked(p2m));
-
-    /*
-     * Check if the level target is valid: we only support
-     * 4K - 2M - 1G mapping.
-     */
-    ASSERT(target > 0 && target <= 3);
-
-    table = p2m_get_root_pointer(p2m, sgfn);
-    if ( !table )
-        return -EINVAL;
-
-    for ( level = P2M_ROOT_LEVEL; level < target; level++ )
-    {
-        /*
-         * Don't try to allocate intermediate page table if the mapping
-         * is about to be removed.
-         */
-        rc = p2m_next_level(p2m, removing_mapping,
-                            level, &table, offsets[level]);
-        if ( rc == GUEST_TABLE_MAP_FAILED )
-        {
-            /*
-             * We are here because p2m_next_level has failed to map
-             * the intermediate page table (e.g the table does not exist
-             * and they p2m tree is read-only). It is a valid case
-             * when removing a mapping as it may not exist in the
-             * page table. In this case, just ignore it.
-             */
-            rc = removing_mapping ?  0 : -ENOENT;
-            goto out;
-        }
-        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    entry = table + offsets[level];
-
-    /*
-     * If we are here with level < target, we must be at a leaf node,
-     * and we need to break up the superpage.
-     */
-    if ( level < target )
-    {
-        /* We need to split the original page. */
-        lpae_t split_pte = *entry;
-
-        ASSERT(p2m_is_superpage(*entry, level));
-
-        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
-        {
-            /*
-             * The current super-page is still in-place, so re-increment
-             * the stats.
-             */
-            p2m->stats.mappings[level]++;
-
-            /* Free the allocated sub-tree */
-            p2m_free_entry(p2m, split_pte, level);
-
-            rc = -ENOMEM;
-            goto out;
-        }
-
-        /*
-         * Follow the break-before-sequence to update the entry.
-         * For more details see (D4.7.1 in ARM DDI 0487A.j).
-         */
-        p2m_remove_pte(entry, p2m->clean_pte);
-        p2m_force_tlb_flush_sync(p2m);
-
-        p2m_write_pte(entry, split_pte, p2m->clean_pte);
-
-        /* then move to the level we want to make real changes */
-        for ( ; level < target; level++ )
-        {
-            rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
-
-            /*
-             * The entry should be found and either be a table
-             * or a superpage if level 3 is not targeted
-             */
-            ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
-                   (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
-        }
-
-        entry = table + offsets[level];
-    }
-
-    /*
-     * We should always be there with the correct level because
-     * all the intermediate tables have been installed if necessary.
-     */
-    ASSERT(level == target);
-
-    orig_pte = *entry;
-
-    /*
-     * The radix-tree can only work on 4KB. This is only used when
-     * memaccess is enabled and during shutdown.
-     */
-    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
-           p2m->domain->is_dying);
-    /*
-     * The access type should always be p2m_access_rwx when the mapping
-     * is removed.
-     */
-    ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
-    /*
-     * Update the mem access permission before update the P2M. So we
-     * don't have to revert the mapping if it has failed.
-     */
-    rc = p2m_mem_access_radix_set(p2m, sgfn, a);
-    if ( rc )
-        goto out;
-
-    /*
-     * Always remove the entry in order to follow the break-before-make
-     * sequence when updating the translation table (D4.7.1 in ARM DDI
-     * 0487A.j).
-     */
-    if ( lpae_is_valid(orig_pte) || removing_mapping )
-        p2m_remove_pte(entry, p2m->clean_pte);
-
-    if ( removing_mapping )
-        /* Flush can be deferred if the entry is removed */
-        p2m->need_flush |= !!lpae_is_valid(orig_pte);
-    else
-    {
-        lpae_t pte = mfn_to_p2m_entry(smfn, t, a);
-
-        if ( level < 3 )
-            pte.p2m.table = 0; /* Superpage entry */
-
-        /*
-         * It is necessary to flush the TLB before writing the new entry
-         * to keep coherency when the previous entry was valid.
-         *
-         * Although, it could be defered when only the permissions are
-         * changed (e.g in case of memaccess).
-         */
-        if ( lpae_is_valid(orig_pte) )
-        {
-            if ( likely(!p2m->mem_access_enabled) ||
-                 P2M_CLEAR_PERM(pte) != P2M_CLEAR_PERM(orig_pte) )
-                p2m_force_tlb_flush_sync(p2m);
-            else
-                p2m->need_flush = true;
-        }
-        else if ( !p2m_is_valid(orig_pte) ) /* new mapping */
-            p2m->stats.mappings[level]++;
-
-        p2m_write_pte(entry, pte, p2m->clean_pte);
-
-        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
-                                      gfn_add(sgfn, (1UL << page_order) - 1));
-        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
-    }
-
-    if ( is_iommu_enabled(p2m->domain) &&
-         (lpae_is_valid(orig_pte) || lpae_is_valid(*entry)) )
-    {
-        unsigned int flush_flags = 0;
-
-        if ( lpae_is_valid(orig_pte) )
-            flush_flags |= IOMMU_FLUSHF_modified;
-        if ( lpae_is_valid(*entry) )
-            flush_flags |= IOMMU_FLUSHF_added;
-
-        rc = iommu_iotlb_flush(p2m->domain, _dfn(gfn_x(sgfn)),
-                               1UL << page_order, flush_flags);
-    }
-    else
-        rc = 0;
-
-    /*
-     * Free the entry only if the original pte was valid and the base
-     * is different (to avoid freeing when permission is changed).
-     */
-    if ( p2m_is_valid(orig_pte) &&
-         !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
-        p2m_free_entry(p2m, orig_pte, level);
-
-out:
-    unmap_domain_page(table);
-
-    return rc;
-}
-
-int p2m_set_entry(struct p2m_domain *p2m,
-                  gfn_t sgfn,
-                  unsigned long nr,
-                  mfn_t smfn,
-                  p2m_type_t t,
-                  p2m_access_t a)
-{
-    int rc = 0;
-
-    /*
-     * Any reference taken by the P2M mappings (e.g. foreign mapping) will
-     * be dropped in relinquish_p2m_mapping(). As the P2M will still
-     * be accessible after, we need to prevent mapping to be added when the
-     * domain is dying.
-     */
-    if ( unlikely(p2m->domain->is_dying) )
-        return -ENOMEM;
-
-    while ( nr )
-    {
-        unsigned long mask;
-        unsigned long order;
-
-        /*
-         * Don't take into account the MFN when removing mapping (i.e
-         * MFN_INVALID) to calculate the correct target order.
-         *
-         * XXX: Support superpage mappings if nr is not aligned to a
-         * superpage size.
-         */
-        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
-        mask |= gfn_x(sgfn) | nr;
-
-        /* Always map 4k by 4k when memaccess is enabled */
-        if ( unlikely(p2m->mem_access_enabled) )
-            order = THIRD_ORDER;
-        else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
-            order = FIRST_ORDER;
-        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
-            order = SECOND_ORDER;
-        else
-            order = THIRD_ORDER;
-
-        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
-        if ( rc )
-            break;
-
-        sgfn = gfn_add(sgfn, (1 << order));
-        if ( !mfn_eq(smfn, INVALID_MFN) )
-           smfn = mfn_add(smfn, (1 << order));
-
-        nr -= (1 << order);
-    }
-
-    return rc;
-}
-
-/* Invalidate all entries in the table. The p2m should be write locked. */
-static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn)
-{
-    lpae_t *table;
-    unsigned int i;
-
-    ASSERT(p2m_is_write_locked(p2m));
-
-    table = map_domain_page(mfn);
-
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-    {
-        lpae_t pte = table[i];
-
-        /*
-         * Writing an entry can be expensive because it may involve
-         * cleaning the cache. So avoid updating the entry if the valid
-         * bit is already cleared.
-         */
-        if ( !pte.p2m.valid )
-            continue;
-
-        pte.p2m.valid = 0;
-
-        p2m_write_pte(&table[i], pte, p2m->clean_pte);
-    }
-
-    unmap_domain_page(table);
-
-    p2m->need_flush = true;
-}
+unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
+#endif
 
 /*
- * The domain will not be scheduled anymore, so in theory we should
- * not need to flush the TLBs. Do it for safety purpose.
- * Note that all the devices have already been de-assigned. So we don't
- * need to flush the IOMMU TLB here.
+ * Set to the maximum configured support for IPA bits, so the number of IPA
+ * bits can be restricted by an external entity (e.g. IOMMU).
  */
-void p2m_clear_root_pages(struct p2m_domain *p2m)
-{
-    unsigned int i;
-
-    p2m_write_lock(p2m);
-
-    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
-        clear_and_clean_page(p2m->root + i);
+unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
 
-    p2m_force_tlb_flush_sync(p2m);
+/* Unlock the p2m write lock and do a P2M TLB flush if necessary */
+void p2m_write_unlock(struct p2m_domain *p2m)
+{
+#ifdef CONFIG_MMU
+    /*
+     * The final flush is done with the P2M write lock taken to avoid
+     * someone else modifying the P2M before the TLB invalidation has
+     * completed.
+     */
+    p2m_tlb_flush_sync(p2m);
+#endif
 
-    p2m_write_unlock(p2m);
+    write_unlock(&p2m->lock);
 }
 
-/*
- * Invalidate all entries in the root page-tables. This is
- * useful to get fault on entry and do an action.
- *
- * p2m_invalid_root() should not be called when the P2M is shared with
- * the IOMMU because it will cause IOMMU fault.
- */
-void p2m_invalidate_root(struct p2m_domain *p2m)
+void memory_type_changed(struct domain *d)
 {
-    unsigned int i;
+}
 
-    ASSERT(!iommu_use_hap_pt(p2m->domain));
+void dump_p2m_lookup(struct domain *d, paddr_t addr)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
-    p2m_write_lock(p2m);
+    printk("dom%d IPA 0x%"PRIpaddr"\n", d->domain_id, addr);
 
-    for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
-        p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i));
+    printk("P2M @ %p mfn:%#"PRI_mfn"\n",
+           p2m->root, mfn_x(page_to_mfn(p2m->root)));
 
-    p2m_write_unlock(p2m);
+    dump_pt_walk(page_to_maddr(p2m->root), addr,
+                 P2M_ROOT_LEVEL, P2M_ROOT_PAGES);
 }
 
-/*
- * Resolve any translation fault due to change in the p2m. This
- * includes break-before-make and valid bit cleared.
- */
-bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
+mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
 {
+    mfn_t mfn;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned int level = 0;
-    bool resolved = false;
-    lpae_t entry, *table;
-
-    /* Convenience aliases */
-    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
-
-    p2m_write_lock(p2m);
 
-    /* This gfn is higher than the highest the p2m map currently holds */
-    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
-        goto out;
+    p2m_read_lock(p2m);
+    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL, NULL);
+    p2m_read_unlock(p2m);
 
-    table = p2m_get_root_pointer(p2m, gfn);
-    /*
-     * The table should always be non-NULL because the gfn is below
-     * p2m->max_mapped_gfn and the root table pages are always present.
-     */
-    if ( !table )
-    {
-        ASSERT_UNREACHABLE();
-        goto out;
-    }
+    return mfn;
+}
 
-    /*
-     * Go down the page-tables until an entry has the valid bit unset or
-     * a block/page entry has been hit.
-     */
-    for ( level = P2M_ROOT_LEVEL; level <= 3; level++ )
-    {
-        int rc;
+struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn,
+                                        p2m_type_t *t)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    mfn_t mfn = p2m_lookup(d, gfn, &p2mt);
 
-        entry = table[offsets[level]];
+    if ( t )
+        *t = p2mt;
 
-        if ( level == 3 )
-            break;
+    if ( !p2m_is_any_ram(p2mt) )
+        return NULL;
 
-        /* Stop as soon as we hit an entry with the valid bit unset. */
-        if ( !lpae_is_valid(entry) )
-            break;
+    if ( !mfn_valid(mfn) )
+        return NULL;
 
-        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
-        if ( rc == GUEST_TABLE_MAP_FAILED )
-            goto out_unmap;
-        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
-            break;
-    }
+    page = mfn_to_page(mfn);
 
     /*
-     * If the valid bit of the entry is set, it means someone was playing with
-     * the Stage-2 page table. Nothing to do and mark the fault as resolved.
+     * get_page won't work on foreign mapping because the page doesn't
+     * belong to the current domain.
      */
-    if ( lpae_is_valid(entry) )
+    if ( p2m_is_foreign(p2mt) )
     {
-        resolved = true;
-        goto out_unmap;
+        struct domain *fdom = page_get_owner_and_reference(page);
+        ASSERT(fdom != NULL);
+        ASSERT(fdom != d);
+        return page;
     }
 
-    /*
-     * The valid bit is unset. If the entry is still not valid then the fault
-     * cannot be resolved, exit and report it.
-     */
-    if ( !p2m_is_valid(entry) )
-        goto out_unmap;
+    return get_page(page, d) ? page : NULL;
+}
 
-    /*
-     * Now we have an entry with valid bit unset, but still valid from
-     * the P2M point of view.
-     *
-     * If an entry is pointing to a table, each entry of the table will
-     * have their valid bit cleared. This allows a function to clear the
-     * full p2m with just a couple of writes. The valid bit will then be
-     * propagated on the fault.
-     * If an entry is pointing to a block/page, no work to do for now.
-     */
-    if ( lpae_is_table(entry, level) )
-        p2m_invalidate_table(p2m, lpae_get_mfn(entry));
+int guest_physmap_mark_populate_on_demand(struct domain *d,
+                                          unsigned long gfn,
+                                          unsigned int order)
+{
+    return -ENOSYS;
+}
 
-    /*
-     * Now that the work on the entry is done, set the valid bit to prevent
-     * another fault on that entry.
-     */
-    resolved = true;
-    entry.p2m.valid = 1;
+unsigned long p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn,
+                                           unsigned int order)
+{
+    return 0;
+}
 
-    p2m_write_pte(table + offsets[level], entry, p2m->clean_pte);
+/*
+ * The domain will not be scheduled anymore, so in theory we should
+ * not need to flush the TLBs. Do it for safety purpose.
+ * Note that all the devices have already been de-assigned. So we don't
+ * need to flush the IOMMU TLB here.
+ */
+void p2m_clear_root_pages(struct p2m_domain *p2m)
+{
+    unsigned int i;
 
-    /*
-     * No need to flush the TLBs as the modified entry had the valid bit
-     * unset.
-     */
+    p2m_write_lock(p2m);
 
-out_unmap:
-    unmap_domain_page(table);
+    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
+        clear_and_clean_page(p2m->root + i);
 
-out:
-    p2m_write_unlock(p2m);
+#ifdef CONFIG_MMU
+    p2m_force_tlb_flush_sync(p2m);
+#endif
 
-    return resolved;
+    p2m_write_unlock(p2m);
 }
 
 int p2m_insert_mapping(struct domain *d, gfn_t start_gfn, unsigned long nr,
@@ -1612,44 +282,6 @@ int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
     return rc;
 }
 
-static struct page_info *p2m_allocate_root(void)
-{
-    struct page_info *page;
-    unsigned int i;
-
-    page = alloc_domheap_pages(NULL, P2M_ROOT_ORDER, 0);
-    if ( page == NULL )
-        return NULL;
-
-    /* Clear both first level pages */
-    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
-        clear_and_clean_page(page + i);
-
-    return page;
-}
-
-static int p2m_alloc_table(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    p2m->root = p2m_allocate_root();
-    if ( !p2m->root )
-        return -ENOMEM;
-
-    p2m->vttbr = generate_vttbr(p2m->vmid, page_to_mfn(p2m->root));
-
-    /*
-     * Make sure that all TLBs corresponding to the new VMID are flushed
-     * before using it
-     */
-    p2m_write_lock(p2m);
-    p2m_force_tlb_flush_sync(p2m);
-    p2m_write_unlock(p2m);
-
-    return 0;
-}
-
-
 static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
 
 /*
@@ -1660,7 +292,7 @@ static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
  */
 static unsigned long *vmid_mask;
 
-static void p2m_vmid_allocator_init(void)
+void p2m_vmid_allocator_init(void)
 {
     /*
      * allocate space for vmid_mask based on MAX_VMID
@@ -1673,7 +305,7 @@ static void p2m_vmid_allocator_init(void)
     set_bit(INVALID_VMID, vmid_mask);
 }
 
-static int p2m_alloc_vmid(struct domain *d)
+int p2m_alloc_vmid(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
@@ -1713,32 +345,6 @@ static void p2m_free_vmid(struct domain *d)
     spin_unlock(&vmid_alloc_lock);
 }
 
-int p2m_teardown(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned long count = 0;
-    struct page_info *pg;
-    int rc = 0;
-
-    p2m_write_lock(p2m);
-
-    while ( (pg = page_list_remove_head(&p2m->pages)) )
-    {
-        p2m_free_page(p2m->domain, pg);
-        count++;
-        /* Arbitrarily preempt every 512 iterations */
-        if ( !(count % 512) && hypercall_preempt_check() )
-        {
-            rc = -ERESTART;
-            break;
-        }
-    }
-
-    p2m_write_unlock(p2m);
-
-    return rc;
-}
-
 void p2m_final_teardown(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
@@ -1771,61 +377,6 @@ void p2m_final_teardown(struct domain *d)
     p2m->domain = NULL;
 }
 
-int p2m_init(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    int rc;
-    unsigned int cpu;
-
-    rwlock_init(&p2m->lock);
-    spin_lock_init(&d->arch.paging.lock);
-    INIT_PAGE_LIST_HEAD(&p2m->pages);
-    INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
-
-    p2m->vmid = INVALID_VMID;
-    p2m->max_mapped_gfn = _gfn(0);
-    p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
-
-    p2m->default_access = p2m_access_rwx;
-    p2m->mem_access_enabled = false;
-    radix_tree_init(&p2m->mem_access_settings);
-
-    /*
-     * Some IOMMUs don't support coherent PT walk. When the p2m is
-     * shared with the CPU, Xen has to make sure that the PT changes have
-     * reached the memory
-     */
-    p2m->clean_pte = is_iommu_enabled(d) &&
-        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
-
-    /*
-     * Make sure that the type chosen is able to store a vCPU ID
-     * between 0 and the maximum number of virtual CPUs supported, as
-     * well as INVALID_VCPU_ID.
-     */
-    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
-    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0])* 8)) < INVALID_VCPU_ID);
-
-    for_each_possible_cpu(cpu)
-       p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
-
-    /*
-     * "Trivial" initialisation is now complete.  Set the backpointer so
-     * p2m_teardown() and friends know to do something.
-     */
-    p2m->domain = d;
-
-    rc = p2m_alloc_vmid(d);
-    if ( rc )
-        return rc;
-
-    rc = p2m_alloc_table(d);
-    if ( rc )
-        return rc;
-
-    return 0;
-}
-
 /*
  * The function will go through the p2m and remove page reference when it
  * is required. The mapping will be removed from the p2m.
@@ -2217,159 +768,6 @@ void __init p2m_restrict_ipa_bits(unsigned int ipa_bits)
         p2m_ipa_bits = ipa_bits;
 }
 
-/* VTCR value to be configured by all CPUs. Set only once by the boot CPU */
-static register_t __read_mostly vtcr;
-
-static void setup_virt_paging_one(void *data)
-{
-    WRITE_SYSREG(vtcr, VTCR_EL2);
-
-    /*
-     * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from
-     * entries related to EL1/EL0 translation regime until a guest vCPU
-     * is running. For that, we need to set-up VTTBR to point to an empty
-     * page-table and turn on stage-2 translation. The TLB entries
-     * associated with EL1/EL0 translation regime will also be flushed in case
-     * an AT instruction was speculated before hand.
-     */
-    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-    {
-        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
-        WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_VM, HCR_EL2);
-        isb();
-
-        flush_all_guests_tlb_local();
-    }
-}
-
-void __init setup_virt_paging(void)
-{
-    /* Setup Stage 2 address translation */
-    register_t val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
-
-    static const struct {
-        unsigned int pabits; /* Physical Address Size */
-        unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
-        unsigned int root_order; /* Page order of the root of the p2m */
-        unsigned int sl0;    /* Desired SL0, maximum in comment */
-    } pa_range_info[] __initconst = {
-        /* T0SZ minimum and SL0 maximum from ARM DDI 0487H.a Table D5-6 */
-        /*      PA size, t0sz(min), root-order, sl0(max) */
-#ifdef CONFIG_ARM_64
-        [0] = { 32,      32/*32*/,  0,          1 },
-        [1] = { 36,      28/*28*/,  0,          1 },
-        [2] = { 40,      24/*24*/,  1,          1 },
-        [3] = { 42,      22/*22*/,  3,          1 },
-        [4] = { 44,      20/*20*/,  0,          2 },
-        [5] = { 48,      16/*16*/,  0,          2 },
-        [6] = { 52,      12/*12*/,  4,          2 },
-        [7] = { 0 }  /* Invalid */
-#else
-        { 32,      0/*0*/,    0,          1 },
-        { 40,      24/*24*/,  1,          1 }
-#endif
-    };
-
-    unsigned int i;
-    unsigned int pa_range = 0x10; /* Larger than any possible value */
-
-#ifdef CONFIG_ARM_32
-    /*
-     * Typecast pa_range_info[].t0sz into arm32 bit variant.
-     *
-     * VTCR.T0SZ is bits [3:0] and S(sign extension), bit[4] for arm32.
-     * Thus, pa_range_info[].t0sz is translated to its arm32 variant using
-     * struct bitfields.
-     */
-    struct
-    {
-        signed int val:5;
-    } t0sz_32;
-#else
-    /*
-     * Restrict "p2m_ipa_bits" if needed. As P2M table is always configured
-     * with IPA bits == PA bits, compare against "pabits".
-     */
-    if ( pa_range_info[system_cpuinfo.mm64.pa_range].pabits < p2m_ipa_bits )
-        p2m_ipa_bits = pa_range_info[system_cpuinfo.mm64.pa_range].pabits;
-
-    /*
-     * cpu info sanitization made sure we support 16bits VMID only if all
-     * cores are supporting it.
-     */
-    if ( system_cpuinfo.mm64.vmid_bits == MM64_VMID_16_BITS_SUPPORT )
-        max_vmid = MAX_VMID_16_BIT;
-#endif
-
-    /* Choose suitable "pa_range" according to the resulted "p2m_ipa_bits". */
-    for ( i = 0; i < ARRAY_SIZE(pa_range_info); i++ )
-    {
-        if ( p2m_ipa_bits == pa_range_info[i].pabits )
-        {
-            pa_range = i;
-            break;
-        }
-    }
-
-    /* Check if we found the associated entry in the array */
-    if ( pa_range >= ARRAY_SIZE(pa_range_info) || !pa_range_info[pa_range].pabits )
-        panic("%u-bit P2M is not supported\n", p2m_ipa_bits);
-
-#ifdef CONFIG_ARM_64
-    val |= VTCR_PS(pa_range);
-    val |= VTCR_TG0_4K;
-
-    /* Set the VS bit only if 16 bit VMID is supported. */
-    if ( MAX_VMID == MAX_VMID_16_BIT )
-        val |= VTCR_VS;
-#endif
-
-    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
-    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
-
-    p2m_root_order = pa_range_info[pa_range].root_order;
-    p2m_root_level = 2 - pa_range_info[pa_range].sl0;
-
-#ifdef CONFIG_ARM_64
-    p2m_ipa_bits = 64 - pa_range_info[pa_range].t0sz;
-#else
-    t0sz_32.val = pa_range_info[pa_range].t0sz;
-    p2m_ipa_bits = 32 - t0sz_32.val;
-#endif
-
-    printk("P2M: %d-bit IPA with %d-bit PA and %d-bit VMID\n",
-           p2m_ipa_bits,
-           pa_range_info[pa_range].pabits,
-           ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
-
-    printk("P2M: %d levels with order-%d root, VTCR 0x%"PRIregister"\n",
-           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
-
-    p2m_vmid_allocator_init();
-
-    /* It is not allowed to concatenate a level zero root */
-    BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
-    vtcr = val;
-
-    /*
-     * ARM64_WORKAROUND_AT_SPECULATE requires to allocate root table
-     * with all entries zeroed.
-     */
-    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-    {
-        struct page_info *root;
-
-        root = p2m_allocate_root();
-        if ( !root )
-            panic("Unable to allocate root table for ARM64_WORKAROUND_AT_SPECULATE\n");
-
-        empty_root_mfn = page_to_mfn(root);
-    }
-
-    setup_virt_paging_one(NULL);
-    smp_call_function(setup_virt_paging_one, NULL, 1);
-}
-
 static int cpu_virt_paging_callback(struct notifier_block *nfb,
                                     unsigned long action,
                                     void *hcpu)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (10 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h} Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-21 21:31   ` Julien Grall
  2023-08-14  4:25 ` [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU Henry Wang
  12 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Wei Chen, Henry Wang

From: Penny Zheng <Penny.Zheng@arm.com>

Function copy_from_paddr() is declared in asm/setup.h, so it is better
implemented in setup.c.

The current copy_from_paddr() implementation is MMU-specific, so this
commit moves copy_from_paddr() into mmu/setup.c. This also makes it
easier to introduce an MPU version of copy_from_paddr() in a later
commit.
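
For illustration only, a minimal sketch of what that later MPU-side variant
could look like, assuming RAM stays directly accessible without a fixmap
(the exact behaviour is a placeholder and not part of this patch):

```
/*
 * Hypothetical MPU-side counterpart (sketch): with no page tables there
 * is no fixmap to juggle, so the copy collapses to a plain memcpy plus a
 * cache clean of the destination.
 */
void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
{
    void *src = maddr_to_virt(paddr);

    memcpy(dst, src, len);
    clean_dcache_va_range(dst, len);
}
```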

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v5:
- No change
v4:
- No change
v3:
- new commit
---
 xen/arch/arm/kernel.c    | 27 ---------------------------
 xen/arch/arm/mmu/setup.c | 27 +++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 508c54824d..0d433a32e7 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -41,33 +41,6 @@ struct minimal_dtb_header {
 
 #define DTB_MAGIC 0xd00dfeedU
 
-/**
- * copy_from_paddr - copy data from a physical address
- * @dst: destination virtual address
- * @paddr: source physical address
- * @len: length to copy
- */
-void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
-{
-    void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
-
-    while (len) {
-        unsigned long l, s;
-
-        s = paddr & (PAGE_SIZE-1);
-        l = min(PAGE_SIZE - s, len);
-
-        set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
-        memcpy(dst, src + s, l);
-        clean_dcache_va_range(dst, l);
-        clear_fixmap(FIXMAP_MISC);
-
-        paddr += l;
-        dst += l;
-        len -= l;
-    }
-}
-
 static void __init place_modules(struct kernel_info *info,
                                  paddr_t kernbase, paddr_t kernend)
 {
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index e05cca3f86..889ada6b87 100644
--- a/xen/arch/arm/mmu/setup.c
+++ b/xen/arch/arm/mmu/setup.c
@@ -329,6 +329,33 @@ void __init setup_mm(void)
 }
 #endif
 
+/*
+ * copy_from_paddr - copy data from a physical address
+ * @dst: destination virtual address
+ * @paddr: source physical address
+ * @len: length to copy
+ */
+void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
+{
+    void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
+
+    while (len) {
+        unsigned long l, s;
+
+        s = paddr & (PAGE_SIZE-1);
+        l = min(PAGE_SIZE - s, len);
+
+        set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+        memcpy(dst, src + s, l);
+        clean_dcache_va_range(dst, l);
+        clear_fixmap(FIXMAP_MISC);
+
+        paddr += l;
+        dst += l;
+        len -= l;
+    }
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
                   ` (11 preceding siblings ...)
  2023-08-14  4:25 ` [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c Henry Wang
@ 2023-08-14  4:25 ` Henry Wang
  2023-08-14  7:08   ` Jan Beulich
  2023-08-21 21:34   ` Julien Grall
  12 siblings, 2 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-14  4:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Penny Zheng, Jan Beulich, Paul Durrant, Roger Pau Monné,
	Stefano Stabellini, Julien Grall, Bertrand Marquis, Penny Zheng,
	Wei Chen, Henry Wang

From: Penny Zheng <Penny.Zheng@arm.com>

SMMU subsystem is only supported in MMU system, so we make it dependent
on CONFIG_HAS_MMU.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v5:
- Add Acked-by tag from Jan.
v4:
- No change
v3:
- new patch
---
 xen/drivers/passthrough/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
index 864fcf3b0c..ebb350bc37 100644
--- a/xen/drivers/passthrough/Kconfig
+++ b/xen/drivers/passthrough/Kconfig
@@ -5,6 +5,7 @@ config HAS_PASSTHROUGH
 if ARM
 config ARM_SMMU
 	bool "ARM SMMUv1 and v2 driver"
+	depends on MMU
 	default y
 	---help---
 	  Support for implementations of the ARM System MMU architecture
@@ -15,7 +16,7 @@ config ARM_SMMU
 
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
-	depends on ARM_64 && (!ACPI || BROKEN)
+	depends on ARM_64 && (!ACPI || BROKEN) && MMU
 	---help---
 	 Support for implementations of the ARM System MMU architecture
 	 version 3. Driver is in experimental stage and should not be used in
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-14  4:25 ` [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU Henry Wang
@ 2023-08-14  7:08   ` Jan Beulich
  2023-08-14  7:10     ` Henry Wang
  2023-08-21 21:34   ` Julien Grall
  1 sibling, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2023-08-14  7:08 UTC (permalink / raw)
  To: Henry Wang
  Cc: Penny Zheng, Paul Durrant, Roger Pau Monné,
	Stefano Stabellini, Julien Grall, Bertrand Marquis, Wei Chen,
	xen-devel

On 14.08.2023 06:25, Henry Wang wrote:
> From: Penny Zheng <Penny.Zheng@arm.com>
> 
> SMMU subsystem is only supported in MMU system, so we make it dependent
> on CONFIG_HAS_MMU.

Nit: Stale "HAS" infix?

Jan

> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> v5:
> - Add Acked-by tag from Jan.
> v4:
> - No change
> v3:
> - new patch
> ---
>  xen/drivers/passthrough/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
> index 864fcf3b0c..ebb350bc37 100644
> --- a/xen/drivers/passthrough/Kconfig
> +++ b/xen/drivers/passthrough/Kconfig
> @@ -5,6 +5,7 @@ config HAS_PASSTHROUGH
>  if ARM
>  config ARM_SMMU
>  	bool "ARM SMMUv1 and v2 driver"
> +	depends on MMU
>  	default y
>  	---help---
>  	  Support for implementations of the ARM System MMU architecture
> @@ -15,7 +16,7 @@ config ARM_SMMU
>  
>  config ARM_SMMU_V3
>  	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
> -	depends on ARM_64 && (!ACPI || BROKEN)
> +	depends on ARM_64 && (!ACPI || BROKEN) && MMU
>  	---help---
>  	 Support for implementations of the ARM System MMU architecture
>  	 version 3. Driver is in experimental stage and should not be used in



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-14  7:08   ` Jan Beulich
@ 2023-08-14  7:10     ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-14  7:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Penny Zheng, Paul Durrant, Roger Pau Monné,
	Stefano Stabellini, Julien Grall, Bertrand Marquis, Wei Chen,
	xen-devel

Hi Jan,

> On Aug 14, 2023, at 15:08, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 14.08.2023 06:25, Henry Wang wrote:
>> From: Penny Zheng <Penny.Zheng@arm.com>
>> 
>> SMMU subsystem is only supported in MMU system, so we make it dependent
>> on CONFIG_HAS_MMU.
> 
> Nit: Stale "HAS" infix?

Ah…Nice catch, sorry about that, will fix that in v6 if the series needs changes
in other patches.

Kind regards,
Henry

> 
> Jan
> 
>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v5:
>> - Add Acked-by tag from Jan.
>> v4:
>> - No change
>> v3:
>> - new patch
>> ---
>> xen/drivers/passthrough/Kconfig | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
>> index 864fcf3b0c..ebb350bc37 100644
>> --- a/xen/drivers/passthrough/Kconfig
>> +++ b/xen/drivers/passthrough/Kconfig
>> @@ -5,6 +5,7 @@ config HAS_PASSTHROUGH
>> if ARM
>> config ARM_SMMU
>> bool "ARM SMMUv1 and v2 driver"
>> + depends on MMU
>> default y
>> ---help---
>>  Support for implementations of the ARM System MMU architecture
>> @@ -15,7 +16,7 @@ config ARM_SMMU
>> 
>> config ARM_SMMU_V3
>> bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
>> - depends on ARM_64 && (!ACPI || BROKEN)
>> + depends on ARM_64 && (!ACPI || BROKEN) && MMU
>> ---help---
>> Support for implementations of the ARM System MMU architecture
>> version 3. Driver is in experimental stage and should not be used in
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm()
  2023-08-14  4:25 ` [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm() Henry Wang
@ 2023-08-21  8:33   ` Julien Grall
  2023-08-21  8:40     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21  8:33 UTC (permalink / raw)
  To: Henry Wang, xen-devel

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> At the moment, on MMU system, enable_mmu() will return to an
> address in the 1:1 mapping, then each path is responsible to
> switch to virtual runtime mapping. Then remove_identity_mapping()
> is called on the boot CPU to remove all 1:1 mapping.
> 
> Since remove_identity_mapping() is not necessary on Non-MMU system,
> and we also avoid creating empty function for Non-MMU system, trying
> to keep only one codeflow in arm64/head.S, we move path switch and
> remove_identity_mapping() in enable_mmu() on MMU system.
> 
> As the remove_identity_mapping should only be called for the boot
> CPU only, so we introduce enable_boot_cpu_mm() for boot CPU and
> enable_secondary_cpu_mm() for secondary CPUs in this patch.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Penny Zheng <penny.zheng@arm.com> > Signed-off-by: Henry Wang <Henry.Wang@arm.com>

One remark below. With or without it addressed:

Reviewed-by: Julien Grall <jgrall@amazon.com>

[...]

> +/*
> + * Enable mm (turn on the data cache and the MMU) for secondary CPUs.
> + * The function will return to the virtual address provided in LR (e.g. the
> + * runtime mapping).
> + *
> + * Inputs:
> + *   lr : Virtual address to return to.
> + *
> + * Clobbers x0 - x5
> + */
> +enable_secondary_cpu_mm:
> +        mov   x5, lr
> +
> +        load_paddr x0, init_ttbr
> +        ldr   x0, [x0]
> +
> +        bl    enable_mmu
> +        mov   lr, x5
> +
> +        /* Return to the virtual address requested by the caller. */
> +        ret
> +ENDPROC(enable_secondary_cpu_mm)

NIT: enable_mmu() could directly return to the virtual address. This 
would reduce the function to:

load_paddr x0, init_ttbr
ldr   x0, [x0]

/* Return to the virtual address requested by the caller.
b enable_mmu
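
Spelled out, the reduced helper would be roughly (an untested sketch):

```
enable_secondary_cpu_mm:
        load_paddr x0, init_ttbr
        ldr   x0, [x0]

        /* enable_mmu() returns to the virtual address held in lr. */
        b     enable_mmu
ENDPROC(enable_secondary_cpu_mm)
```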

> +
> +/*
> + * Enable mm (turn on the data cache and the MMU) for the boot CPU.
> + * The function will return to the virtual address provided in LR (e.g. the
> + * runtime mapping).
> + *
> + * Inputs:
> + *   lr : Virtual address to return to.
> + *
> + * Clobbers x0 - x5
> + */
> +enable_boot_cpu_mm:
> +        mov   x5, lr
> +
> +        bl    create_page_tables
> +        load_paddr x0, boot_pgtable
> +
> +        bl    enable_mmu
> +
> +        /*
> +         * The MMU is turned on and we are in the 1:1 mapping. Switch
> +         * to the runtime mapping.
> +         */
> +        ldr   x0, =1f
> +        br    x0
> +1:
> +        mov   lr, x5
> +        /*
> +         * The 1:1 map may clash with other parts of the Xen virtual memory
> +         * layout. As it is not used anymore, remove it completely to avoid
> +         * having to worry about replacing existing mapping afterwards.
> +         * Function will return to the virtual address requested by the caller.
> +         */
> +        b     remove_identity_mapping
> +ENDPROC(enable_boot_cpu_mm)
> +
>   /*
>    * Remove the 1:1 map from the page-tables. It is not easy to keep track
>    * where the 1:1 map was mapped, so we will look for the top-level entry

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm()
  2023-08-21  8:33   ` Julien Grall
@ 2023-08-21  8:40     ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-21  8:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Wei Chen, Penny Zheng, Bertrand Marquis, Stefano Stabellini

Hi Julien,

> On Aug 21, 2023, at 16:33, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Wei Chen <wei.chen@arm.com>
>> At the moment, on MMU system, enable_mmu() will return to an
>> address in the 1:1 mapping, then each path is responsible to
>> switch to virtual runtime mapping. Then remove_identity_mapping()
>> is called on the boot CPU to remove all 1:1 mapping.
>> Since remove_identity_mapping() is not necessary on Non-MMU system,
>> and we also avoid creating empty function for Non-MMU system, trying
>> to keep only one codeflow in arm64/head.S, we move path switch and
>> remove_identity_mapping() in enable_mmu() on MMU system.
>> As the remove_identity_mapping should only be called for the boot
>> CPU only, so we introduce enable_boot_cpu_mm() for boot CPU and
>> enable_secondary_cpu_mm() for secondary CPUs in this patch.
>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>> Signed-off-by: Penny Zheng <penny.zheng@arm.com> > Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> 
> One remark below. With or without it addressed:
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Thanks, I will take this tag with ...

> 
> [...]
> 
>> +/*
>> + * Enable mm (turn on the data cache and the MMU) for secondary CPUs.
>> + * The function will return to the virtual address provided in LR (e.g. the
>> + * runtime mapping).
>> + *
>> + * Inputs:
>> + *   lr : Virtual address to return to.
>> + *
>> + * Clobbers x0 - x5
>> + */
>> +enable_secondary_cpu_mm:
>> +        mov   x5, lr
>> +
>> +        load_paddr x0, init_ttbr
>> +        ldr   x0, [x0]
>> +
>> +        bl    enable_mmu
>> +        mov   lr, x5
>> +
>> +        /* Return to the virtual address requested by the caller. */
>> +        ret
>> +ENDPROC(enable_secondary_cpu_mm)
> 
> NIT: enable_mmu() could directly return to the virtual address. This would reduce the function to:
> 
> load_paddr x0, init_ttbr
> ldr   x0, [x0]
> 
> /* Return to the virtual address requested by the caller.
> b enable_mmu

…this fixed in v6 since I think there is likely to be a v6, and I think I also need
to address the commit message nit pointed out by Jan in the last patch.

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option
  2023-08-14  4:25 ` [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option Henry Wang
@ 2023-08-21  8:43   ` Julien Grall
  2023-08-21  8:45     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21  8:43 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Wei Chen, Penny Zheng,
	Volodymyr Babchuk, Julien Grall

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> There are two types of memory system architectures available for
> Arm-based systems, namely the Virtual Memory System Architecture (VMSA)
> and the Protected Memory System Architecture (PMSA). According to
> ARM DDI 0487G.a, A VMSA provides a Memory Management Unit (MMU) that
> controls address translation, access permissions, and memory attribute
> determination and checking, for memory accesses made by the PE. And
> refer to ARM DDI 0600A.c, the PMSA supports a unified memory protection
> scheme where an Memory Protection Unit (MPU) manages instruction and
> data access. Currently, Xen only suuports VMSA.

Typo: s/suuports/supports/

> 
> Introduce a Kconfig option CONFIG_MMU, which is currently default
> set to y and unselectable because currently only VMSA is supported.

NIT: It would be worth explicitly mentioning that this will be used in
follow-up patches. Otherwise one will wonder what the goal of introducing
an unused config is.

Or it could have been merged in the first patch splitting the MMU code 
so we don't introduce a config without any use.

> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S
  2023-08-14  4:25 ` [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S Henry Wang
@ 2023-08-21  8:44   ` Julien Grall
  2023-08-21  8:54     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21  8:44 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Ayan Kumar Halder, Julien Grall

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> We want to reuse head.S for MPU systems, but there are some
> code are implemented for MMU systems only. We will move such
> code to another MMU specific file. But before that we will
> do some indentations fix in this patch to make them be easier
> for reviewing:
> 1. Fix the indentations and incorrect style of code comments.
> 2. Fix the indentations for .text.header section.
> 3. Rename puts() to asm_puts() for global export
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Does this patch depend on the first two? If not, I will commit it before v6.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option
  2023-08-21  8:43   ` Julien Grall
@ 2023-08-21  8:45     ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-21  8:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Wei Chen,
	Penny Zheng, Volodymyr Babchuk, Julien Grall

Hi Julien,

> On Aug 21, 2023, at 16:43, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> There are two types of memory system architectures available for
>> Arm-based systems, namely the Virtual Memory System Architecture (VMSA)
>> and the Protected Memory System Architecture (PMSA). According to
>> ARM DDI 0487G.a, A VMSA provides a Memory Management Unit (MMU) that
>> controls address translation, access permissions, and memory attribute
>> determination and checking, for memory accesses made by the PE. And
>> refer to ARM DDI 0600A.c, the PMSA supports a unified memory protection
>> scheme where an Memory Protection Unit (MPU) manages instruction and
>> data access. Currently, Xen only suuports VMSA.
> 
> Typo: s/suuports/supports/

Oops, sorry about this, will fix in v6.

> 
>> Introduce a Kconfig option CONFIG_MMU, which is currently default
>> set to y and unselectable because currently only VMSA is supported.
> 
> NIT: It would be worth explicitly mentioning that this will be used in follow-up patches. Otherwise one will wonder what the goal of introducing an unused config is.
> 
> Or it could have been merged in the first patch splitting the MMU code so we don't introduce a config without any use.

Sure, I will think about this and fix in v6.

> 
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> 
> Acked-by: Julien Grall <jgrall@amazon.com>

Thanks!

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S
  2023-08-21  8:44   ` Julien Grall
@ 2023-08-21  8:54     ` Henry Wang
  2023-08-21 17:04       ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-21  8:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Wei Chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Ayan Kumar Halder, Julien Grall

Hi Julien,

> On Aug 21, 2023, at 16:44, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Wei Chen <wei.chen@arm.com>
>> We want to reuse head.S for MPU systems, but there are some
>> code are implemented for MMU systems only. We will move such
>> code to another MMU specific file. But before that we will
>> do some indentations fix in this patch to make them be easier
>> for reviewing:
>> 1. Fix the indentations and incorrect style of code comments.
>> 2. Fix the indentations for .text.header section.
>> 3. Rename puts() to asm_puts() for global export
>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
>> Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
>> Reviewed-by: Julien Grall <jgrall@amazon.com>
> 
> Does this patch depend on the first two? If not, I will commit it before v6.

Good point, no this patch is independent from the first two. Also I just
tested applying this patch on top of staging and building with and without
Earlyprintk. Xen and Dom0 boot fine on FVP for both cases.

So please commit this patch if you have time. Thanks!

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  2023-08-14  4:25 ` [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S Henry Wang
@ 2023-08-21  9:18   ` Julien Grall
  2023-08-21  9:29     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21  9:18 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Penny Zheng,
	Volodymyr Babchuk, Wei Chen

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> The MMU specific code in head.S will not be used on MPU systems.
> Instead of introducing more #ifdefs which will bring complexity
> to the code, move MMU related code to mmu/head.S and keep common
> code in head.S. Two notes while moving:
> - As "fail" in original head.S is very simple and this name is too
>    easy to be conflicted, duplicate it in mmu/head.S instead of
>    exporting it.
> - Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
>    setup_fixmap to please the compiler after the code movement.

I am not sure I understand why you are saying "to please the compiler" 
here. Isn't it necessary for the linker (not the compiler) to find the 
function? And therefore there is no pleasing (as in this is not a bug in 
the toolchain).

Other than that, the split looks good to me.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables()
  2023-08-14  4:25 ` [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables() Henry Wang
@ 2023-08-21  9:22   ` Julien Grall
  2023-08-21  9:30     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21  9:22 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Wei Chen, Penny Zheng,
	Volodymyr Babchuk

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> The original assembly setup_fixmap() is actually doing two seperate
> tasks, one is enabling the early UART when earlyprintk on, and the
> other is to set up the fixmap (even when earlyprintk is off).
> 
> Per discussion in [1], since commit
> 9d267c049d92 ("xen/arm64: Rework the memory layout"), there is no
> chance that the fixmap and the mapping of early UART will clash with
> the 1:1 mapping. Therefore the mapping of both the fixmap and the
> early UART can be moved to the end of create_pagetables().
> 
> No functional change intended.

I would drop this sentence because the fixmap is now prepared much 
earlier in the code. So there is technically some functional change.

> 
> [1] https://lore.kernel.org/xen-devel/78862bb8-fd7f-5a51-a7ae-3c5b5998ed80@xen.org/
> 
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  2023-08-21  9:18   ` Julien Grall
@ 2023-08-21  9:29     ` Henry Wang
  2023-08-21 10:16       ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-21  9:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Penny Zheng,
	Volodymyr Babchuk, Wei Chen

Hi Julien,

> On Aug 21, 2023, at 17:18, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> The MMU specific code in head.S will not be used on MPU systems.
>> Instead of introducing more #ifdefs which will bring complexity
>> to the code, move MMU related code to mmu/head.S and keep common
>> code in head.S. Two notes while moving:
>> - As "fail" in original head.S is very simple and this name is too
>>   easy to be conflicted, duplicate it in mmu/head.S instead of
>>   exporting it.
>> - Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
>>   setup_fixmap to please the compiler after the code movement.
> 
> I am not sure I understand why you are saying "to please the compiler" here. Isn't it necessary for the linker (not the compiler) to find the function? And therefore there is no pleasing (as in this is not a bug in the toolchain).

Yes it meant to be linker, sorry for the confusion. What I want to express is
without the ENTRY(), for example if we remove the ENTRY() around the
setup_fixmap(), we will have:

```
aarch64-none-linux-gnu-ld: prelink.o: in function `primary_switched':
/home/xinwan02/repos_for_development/xen_playground/xen/xen/arch/arm/arm64/head.S:278: undefined reference to `setup_fixmap'
/home/xinwan02/repos_for_development/xen_playground/xen/xen/arch/arm/arm64/head.S:278:(.text.header+0x1a0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `setup_fixmap'
make[2]: *** [arch/arm/Makefile:95: xen-syms] Error 1
make[1]: *** [build.mk:90: xen] Error 2
make: *** [Makefile:598: xen] Error 2
``` 
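
In other words, ENTRY() makes the label global so the branch from head.S can
be resolved at link time. Roughly, ENTRY(setup_fixmap) expands to something
like the below (a sketch; see the arch's ENTRY() definition for the exact
form):

```
        .globl setup_fixmap
        .align 4
setup_fixmap:
```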

I will use the word “linker” in v6 if you agree.

> 
> Other than that, the split looks good to me.

May I please take this as a Reviewed-by tag? I will add the tag if you are
happy with that.

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables()
  2023-08-21  9:22   ` Julien Grall
@ 2023-08-21  9:30     ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-21  9:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Wei Chen,
	Penny Zheng, Volodymyr Babchuk



> On Aug 21, 2023, at 17:22, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> The original assembly setup_fixmap() is actually doing two seperate
>> tasks, one is enabling the early UART when earlyprintk on, and the
>> other is to set up the fixmap (even when earlyprintk is off).
>> Per discussion in [1], since commit
>> 9d267c049d92 ("xen/arm64: Rework the memory layout"), there is no
>> chance that the fixmap and the mapping of early UART will clash with
>> the 1:1 mapping. Therefore the mapping of both the fixmap and the
>> early UART can be moved to the end of create_pagetables().
>> No functional change intended.
> 
> I would drop this sentence because the fixmap is now prepared much earlier in the code. So there is technically some functional change.

Sure, I will drop this sentence in v6.

> 
>> [1] https://lore.kernel.org/xen-devel/78862bb8-fd7f-5a51-a7ae-3c5b5998ed80@xen.org/
>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Thanks!

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  2023-08-21  9:29     ` Henry Wang
@ 2023-08-21 10:16       ` Julien Grall
  2023-08-21 10:21         ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21 10:16 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Penny Zheng,
	Volodymyr Babchuk, Wei Chen



On 21/08/2023 10:29, Henry Wang wrote:
>> On Aug 21, 2023, at 17:18, Julien Grall <julien@xen.org> wrote:
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> The MMU specific code in head.S will not be used on MPU systems.
>>> Instead of introducing more #ifdefs which will bring complexity
>>> to the code, move MMU related code to mmu/head.S and keep common
>>> code in head.S. Two notes while moving:
>>> - As "fail" in original head.S is very simple and this name is too
>>>    easy to be conflicted, duplicate it in mmu/head.S instead of
>>>    exporting it.
>>> - Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
>>>    setup_fixmap to please the compiler after the code movement.
>>
>> I am not sure I understand why you are saying "to please the compiler" here. Isn't it necessary for the linker (not the compiler) to find the function? And therefore there is no pleasing (as in this is not a bug in the toolchain).
> 
> Yes it meant to be linker, sorry for the confusion. What I want to express is
> without the ENTRY(), for example if we remove the ENTRY() around the
> setup_fixmap(), we will have:
> 
> ```
> aarch64-none-linux-gnu-ld: prelink.o: in function `primary_switched':
> /home/xinwan02/repos_for_development/xen_playground/xen/xen/arch/arm/arm64/head.S:278: undefined reference to `setup_fixmap'
> /home/xinwan02/repos_for_development/xen_playground/xen/xen/arch/arm/arm64/head.S:278:(.text.header+0x1a0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `setup_fixmap'
> make[2]: *** [arch/arm/Makefile:95: xen-syms] Error 1
> make[1]: *** [build.mk:90: xen] Error 2
> make: *** [Makefile:598: xen] Error 2
> ```
> 
> I will use the word “linker” in v6 if you agree.

The sentence also needs to be reworded. How about:

"Use ENTRY() for ... as they will be used externally."

> 
>>
>> Other than that, the split looks good to me.
> 
> May I please take this as a Reviewed-by tag? I will add the tag if you are
> happy with that.

Sure. Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  2023-08-21 10:16       ` Julien Grall
@ 2023-08-21 10:21         ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-21 10:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Penny Zheng,
	Volodymyr Babchuk, Wei Chen

Hi Julien,

> On Aug 21, 2023, at 18:16, Julien Grall <julien@xen.org> wrote:
> On 21/08/2023 10:29, Henry Wang wrote:
>>> On Aug 21, 2023, at 17:18, Julien Grall <julien@xen.org> wrote:
>>> On 14/08/2023 05:25, Henry Wang wrote:
>>>> The MMU specific code in head.S will not be used on MPU systems.
>>>> Instead of introducing more #ifdefs which will bring complexity
>>>> to the code, move MMU related code to mmu/head.S and keep common
>>>> code in head.S. Two notes while moving:
>>>> - As "fail" in original head.S is very simple and this name is too
>>>>   easy to be conflicted, duplicate it in mmu/head.S instead of
>>>>   exporting it.
>>>> - Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
>>>>   setup_fixmap to please the compiler after the code movement.
>>> 
>>> I am not sure I understand why you are saying "to please the compiler" here. Isn't it necessary for the linker (not the compiler) to find the function? And therefore there is no pleasing (as in this is not a bug in the toolchain).
>> Yes it meant to be linker, sorry for the confusion. What I want to express is
>> without the ENTRY(), for example if we remove the ENTRY() around the
>> setup_fixmap(), we will have:
>> ```
>> aarch64-none-linux-gnu-ld: prelink.o: in function `primary_switched':
>> /home/xinwan02/repos_for_development/xen_playground/xen/xen/arch/arm/arm64/head.S:278: undefined reference to `setup_fixmap'
>> /home/xinwan02/repos_for_development/xen_playground/xen/xen/arch/arm/arm64/head.S:278:(.text.header+0x1a0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `setup_fixmap'
>> make[2]: *** [arch/arm/Makefile:95: xen-syms] Error 1
>> make[1]: *** [build.mk:90: xen] Error 2
>> make: *** [Makefile:598: xen] Error 2
>> ```
>> I will use the word “linker” in v6 if you agree.
> 
> The sentence also needs to be reworded. How about:
> 
> "Use ENTRY() for ... as they will be used externally."

Sure, I will use the suggested sentence.

> 
>>> 
>>> Other than that, the split looks good to me.
>> May I please take this as a Reviewed-by tag? I will add the tag if you are
>> happy with that.
> 
> Sure. Reviewed-by: Julien Grall <jgrall@amazon.com>

Thanks!

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S
  2023-08-21  8:54     ` Henry Wang
@ 2023-08-21 17:04       ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-21 17:04 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Wei Chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng, Ayan Kumar Halder, Julien Grall

Hi Henry,

On 21/08/2023 09:54, Henry Wang wrote:
>> On Aug 21, 2023, at 16:44, Julien Grall <julien@xen.org> wrote:
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> From: Wei Chen <wei.chen@arm.com>
>>> We want to reuse head.S for MPU systems, but there are some
>>> code are implemented for MMU systems only. We will move such
>>> code to another MMU specific file. But before that we will
>>> do some indentations fix in this patch to make them be easier
>>> for reviewing:
>>> 1. Fix the indentations and incorrect style of code comments.
>>> 2. Fix the indentations for .text.header section.
>>> 3. Rename puts() to asm_puts() for global export
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
>>> Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
>>> Reviewed-by: Julien Grall <jgrall@amazon.com>
>>
>> Does this patch depend on the first two? If not, I will commit it before v6.
> 
> Good point, no this patch is independent from the first two. Also I just
> tested applying this patch on top of staging and building with and without
> Earlyprintk. Xen and Dom0 boot fine on FVP for both cases.

Thanks for confirming. It is now ...

> 
> So please commit this patch if you have time. Thanks!

... committed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 07/13] xen/arm: Extract MMU-specific code
  2023-08-14  4:25 ` [PATCH v5 07/13] xen/arm: Extract MMU-specific code Henry Wang
@ 2023-08-21 17:57   ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-21 17:57 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Wei Chen, Penny Zheng,
	Volodymyr Babchuk

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> Currently, most of the MMU-specific code is in mm.{c,h}. To make the
> mm extendable, this commit extract the MMU-specific code by firstly:
> - Create a arch/arm/include/asm/mmu/ subdir.
> - Create a arch/arm/mmu/ subdir.
> 
> Then move the MMU-specific code to above mmu subdir, which includes
> below changes:
> - Move arch/arm/arm64/mm.c to arch/arm/arm64/mmu/mm.c
> - Move MMU-related declaration in arch/arm/include/asm/mm.h to
>    arch/arm/include/asm/mmu/mm.h
> - Move the MMU-related declaration dump_pt_walk() in asm/page.h
While I agree that dump_pt_walk() is better defined in mm.h, I am
not entirely sure about ...

>    and pte_of_xenaddr() in asm/setup.h to the new asm/mmu/mm.h.

... pte_of_xenaddr(). It was defined in setup.h because this is an 
helper that is only intended to be used during early boot.


That said, it is probably not worth creating a new helper for that. So I
would suggest at least marking pte_of_xenaddr() __init to make the
intended usage clear.
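
Something as simple as the below would do (a sketch, assuming the current
phys_offset-based body is kept as-is):

```
/* Boot-time only, so annotate the definition in mmu/mm.c accordingly. */
lpae_t __init pte_of_xenaddr(vaddr_t va)
{
    paddr_t ma = va + phys_offset;

    return mfn_to_xen_entry(maddr_to_mfn(ma), MT_NORMAL);
}
```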

> - Move MMU-related code in arch/arm/mm.c to arch/arm/mmu/mm.c.
> 
> Also modify the build system (Makefiles in this case) to pick above
> mentioned code changes.
> 
> This patch is a pure code movement, no functional change intended.
> 
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> ---
> With the code movement of this patch, the descriptions on top of
> xen/arch/arm/mm.c and xen/arch/arm/mmu/mm.c might need some changes,
> suggestions?
> v5:
> - Rebase on top of xen/arm: Introduce CONFIG_MMU Kconfig option and
>    xen/arm: mm: add missing extern variable declaration
> v4:
> - Rework "[v3,13/52] xen/mmu: extract mmu-specific codes from
>    mm.c/mm.h" with the lastest staging branch, only do the code movement
>    in this patch to ease the review.
> ---
>   xen/arch/arm/Makefile             |    1 +
>   xen/arch/arm/arm64/Makefile       |    1 -
>   xen/arch/arm/arm64/mmu/Makefile   |    1 +
>   xen/arch/arm/arm64/{ => mmu}/mm.c |    0
>   xen/arch/arm/include/asm/mm.h     |   20 +-
>   xen/arch/arm/include/asm/mmu/mm.h |   55 ++
>   xen/arch/arm/include/asm/page.h   |   15 -
>   xen/arch/arm/include/asm/setup.h  |    3 -
>   xen/arch/arm/mm.c                 | 1119 ----------------------------
>   xen/arch/arm/mmu/Makefile         |    1 +
>   xen/arch/arm/mmu/mm.c             | 1146 +++++++++++++++++++++++++++++

I noticed you transferred everything into mm.c. But I think some parts
could go in arm{32,64}/mmu/mm.c.

>   11 files changed, 1208 insertions(+), 1154 deletions(-)
>   rename xen/arch/arm/arm64/{ => mmu}/mm.c (100%)
>   create mode 100644 xen/arch/arm/include/asm/mmu/mm.h
>   create mode 100644 xen/arch/arm/mmu/Makefile
>   create mode 100644 xen/arch/arm/mmu/mm.c

(I haven't checked if the code was moved correctly. I only checked if 
the split makes sense).

To ease the review, I think this patch can be split into more piecemeal
patches. The first two pieces would be:
   * Patch #1 transfers xen_pt_update()/dump_pt_walk() and their dependencies
   * Patch #2 transfers root page-table allocation

Then you can have one for each of the smaller functions.

[...]

> diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mmu/mm.c
> similarity index 100%
> rename from xen/arch/arm/arm64/mm.c
> rename to xen/arch/arm/arm64/mmu/mm.c
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index aaacba3f04..dc1458b047 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -14,6 +14,10 @@
>   # error "unknown ARM variant"
>   #endif
>   
> +#ifdef CONFIG_MMU
> +#include <asm/mmu/mm.h>

I am guessing you will need to include <asm/mpu/mm.h> at some point. So 
I would add a:

#else
# error "Unknown memory management layout"

This would make it easier to find out where includes might be missing.

[...]

> @@ -1233,11 +119,6 @@ int map_pages_to_xen(unsigned long virt,
>       return xen_pt_update(virt, mfn, nr_mfns, flags);
>   }
>   

[...]

> +/* MMU setup for secondary CPUS (which already have paging enabled) */
> +void mmu_init_secondary_cpu(void)
> +{
> +    xen_pt_enforce_wnx();
> +}
> +
> +#ifdef CONFIG_ARM_32

Rather than #ifdef, I would prefer if we move each implementation to 
arm32/mmu/mm.c and ...

> +/*
> + * Set up the direct-mapped xenheap:
> + * up to 1GB of contiguous, always-mapped memory.
> + */
> +void __init setup_directmap_mappings(unsigned long base_mfn,
> +                                     unsigned long nr_mfns)
> +{
> +    int rc;
> +
> +    rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
> +                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> +    if ( rc )
> +        panic("Unable to setup the directmap mappings.\n");
> +
> +    /* Record where the directmap is, for translation routines. */
> +    directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
> +}
> +#else /* CONFIG_ARM_64 */
> +/* Map the region in the directmap area. */
> +void __init setup_directmap_mappings(unsigned long base_mfn,
> +                                     unsigned long nr_mfns)

... arm64/mmu/mm.c.

> +{
> +    int rc;
> +
> +    /* First call sets the directmap physical and virtual offset. */
> +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> +    {
> +        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
> +
> +        directmap_mfn_start = _mfn(base_mfn);
> +        directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
> +        /*
> +         * The base address may not be aligned to the first level
> +         * size (e.g. 1GB when using 4KB pages). This would prevent
> +         * superpage mappings for all the regions because the virtual
> +         * address and machine address should both be suitably aligned.
> +         *
> +         * Prevent that by offsetting the start of the directmap virtual
> +         * address.
> +         */
> +        directmap_virt_start = DIRECTMAP_VIRT_START +
> +            (base_mfn - mfn_gb) * PAGE_SIZE;
> +    }
> +
> +    if ( base_mfn < mfn_x(directmap_mfn_start) )
> +        panic("cannot add directmap mapping at %lx below heap start %lx\n",
> +              base_mfn, mfn_x(directmap_mfn_start));
> +
> +    rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
> +                          _mfn(base_mfn), nr_mfns,
> +                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> +    if ( rc )
> +        panic("Unable to setup the directmap mappings.\n");
> +}
> +#endif
> +
> +/* Map a frame table to cover physical addresses ps through pe */
> +void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)

I looked at the implementation for the MPU and the code is mainly the same. 
So can we keep this code in common and just ...

> +{
> +    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
> +                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
> +    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
> +    mfn_t base_mfn;
> +    const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32);
> +    int rc;
> +
> +    /*
> +     * The size of paddr_t should be sufficient for the complete range of
> +     * physical address.
> +     */
> +    BUILD_BUG_ON((sizeof(paddr_t) * BITS_PER_BYTE) < PADDR_BITS);
> +    BUILD_BUG_ON(sizeof(struct page_info) != PAGE_INFO_SIZE);
> +
> +    if ( frametable_size > FRAMETABLE_SIZE )
> +        panic("The frametable cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
> +              ps, pe);
> +
> +    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
> +    /* Round up to 2M or 32M boundary, as appropriate. */
> +    frametable_size = ROUNDUP(frametable_size, mapping_size);
> +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
> +
> +    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> +                          frametable_size >> PAGE_SHIFT,
> +                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);

abstract the frametable mapping? This would also make it clear that the 
BUILD_BUG_ON() above are not specific to the MMU code. A minimal sketch 
follows the quoted hunk below.

> +    if ( rc )
> +        panic("Unable to setup the frametable mappings.\n");
> +
> +    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
> +    memset(&frame_table[nr_pdxs], -1,
> +           frametable_size - (nr_pdxs * sizeof(struct page_info)));
> +
> +    frametable_virt_end = FRAMETABLE_VIRT_START + (nr_pdxs * sizeof(struct page_info));
> +}
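Something along these lines (a rough sketch; arch_frametable_map() is a
made-up name, only meant to show how setup_frametable_mappings() could stay
common):

```c
/*
 * Sketch only: arch_frametable_map() is a hypothetical per-layout helper,
 * so that setup_frametable_mappings() itself can live in common code.
 */

/* MMU implementation (e.g. in mmu/mm.c) */
int __init arch_frametable_map(mfn_t base_mfn, unsigned long nr_pages)
{
    return map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn, nr_pages,
                            PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
}
```

The common setup_frametable_mappings() would then call arch_frametable_map()
instead of map_pages_to_xen() directly, and the MPU code would provide its own
implementation of the helper.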

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system
  2023-08-14  4:25 ` [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system Henry Wang
@ 2023-08-21 18:14   ` Julien Grall
  2023-08-22  2:42     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21 18:14 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Bertrand Marquis, Wei Chen,
	Volodymyr Babchuk

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Penny Zheng <penny.zheng@arm.com>
> 
> fixmap and pmap are MMU-specific features, so fold them to MMU system.
> Do the folding for pmap by moving the HAS_PMAP Kconfig selection under
> HAS_MMU. Do the folding for fixmap by moving the implementation of
> virt_to_fix() to mmu/mm.c, so that unnecessary stubs can be avoided.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> ---
> v5:
> - Rebase on top of xen/arm: Introduce CONFIG_MMU Kconfig option
> v4:
> - Rework "[v3,11/52] xen/arm: mmu: fold FIXMAP into MMU system",
>    change the order of this patch and avoid introducing stubs.
> ---
>   xen/arch/arm/Kconfig              | 2 +-
>   xen/arch/arm/include/asm/fixmap.h | 7 +------
>   xen/arch/arm/mmu/mm.c             | 7 +++++++
>   3 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index eb0413336b..8a7b79b4b5 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -15,7 +15,6 @@ config ARM
>   	select HAS_DEVICE_TREE
>   	select HAS_PASSTHROUGH
>   	select HAS_PDX
> -	select HAS_PMAP
>   	select HAS_UBSAN
>   	select IOMMU_FORCE_PT_SHARE
>   
> @@ -61,6 +60,7 @@ config PADDR_BITS
>   
>   config MMU
>   	def_bool y
> +	select HAS_PMAP
>   
>   source "arch/Kconfig"
>   
> diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
> index 734eb9b1d4..5d5de6995a 100644
> --- a/xen/arch/arm/include/asm/fixmap.h
> +++ b/xen/arch/arm/include/asm/fixmap.h
> @@ -36,12 +36,7 @@ extern void clear_fixmap(unsigned int map);
>   
>   #define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
>   
> -static inline unsigned int virt_to_fix(vaddr_t vaddr)
> -{
> -    BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
> -
> -    return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
> -}
> +extern unsigned int virt_to_fix(vaddr_t vaddr);

AFAICT, virt_to_fix() is not going to be implemented for the MPU code. 
This implies that no-one should call it.

Also, none of the definitions in fixmap.h actually makes sense for the 
MPU. I would prefer if we instead try to limit the include of fixmap to 
when this is strictly necessary. Looking for the inclusion in staging I 
could find:

42sh> ack "\#include" | ack "fixmap" | ack -v x86
arch/arm/acpi/lib.c:28:#include <asm/fixmap.h>
arch/arm/kernel.c:19:#include <asm/fixmap.h>
arch/arm/mm.c:27:#include <asm/fixmap.h>
arch/arm/include/asm/fixmap.h:7:#include <xen/acpi.h>
arch/arm/include/asm/fixmap.h:8:#include <xen/pmap.h>
arch/arm/include/asm/pmap.h:6:#include <asm/fixmap.h>
arch/arm/include/asm/early_printk.h:14:#include <asm/fixmap.h>
common/efi/boot.c:30:#include <asm/fixmap.h>
common/pmap.c:7:#include <asm/fixmap.h>
drivers/acpi/apei/erst.c:36:#include <asm/fixmap.h>
drivers/acpi/apei/apei-io.c:32:#include <asm/fixmap.h>
drivers/char/xhci-dbc.c:30:#include <asm/fixmap.h>
drivers/char/ehci-dbgp.c:16:#include <asm/fixmap.h>
drivers/char/ns16550.c:40:#include <asm/fixmap.h>
drivers/char/xen_pv_console.c:28:#include <asm/fixmap.h>

Some of them are gone after your rework. The only remaining one we care 
about is in kernel.c (but I think it can be removed after your series).

So I think it would be feasible to not touch fixmap.h at all.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability
  2023-08-14  4:25 ` [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability Henry Wang
@ 2023-08-21 18:32   ` Julien Grall
  2023-08-24  9:46     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21 18:32 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Bertrand Marquis, Wei Chen,
	Volodymyr Babchuk

Hi,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Penny Zheng <penny.zheng@arm.com>
> 
> As preparation for MPU support, which will use some variables/functions
> for both MMU and MPU system, We rename the affected variable/function
> to more generic names:
> - init_ttbr -> init_mm,

You moved init_ttbr to mmu/mm.c. So why does this need to be renamed?

And if you really planned to use it for the MPU code, then init_ttbr 
should not have been moved.

> - mmu_init_secondary_cpu() -> mm_init_secondary_cpu()
> - init_secondary_pagetables() -> init_secondary_mm()

The original names were not great, but the new ones are a lot more 
confusing as they seem to just be a reshuffle of words.

mm_init_secondary_cpu() is only setting the WxN bit. For the MMU, I 
think it can be done much earlier. Do you have anything to add to it? If 
not, then I would consider getting rid of it.

For init_secondary_mm(), I would rename it to prepare_secondary_mm().

> - Add a wrapper update_mm_mapping() for MMU system's
>    update_identity_mapping()
> 
> Modify the related in-code comment to reflect above changes, take the
> opportunity to fix the incorrect coding style of the in-code comments.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> ---
> v5:
> - Rebase on top of xen/arm: mm: add missing extern variable declaration
> v4:
> - Extract the renaming part from the original patch:
>    "[v3,13/52] xen/mmu: extract mmu-specific codes from mm.c/mm.h"
> ---
>   xen/arch/arm/arm32/head.S           |  4 ++--
>   xen/arch/arm/arm64/mmu/head.S       |  2 +-
>   xen/arch/arm/arm64/mmu/mm.c         | 11 ++++++++---
>   xen/arch/arm/arm64/smpboot.c        |  6 +++---
>   xen/arch/arm/include/asm/arm64/mm.h |  7 ++++---
>   xen/arch/arm/include/asm/mm.h       | 12 +++++++-----
>   xen/arch/arm/mmu/mm.c               | 20 ++++++++++----------
>   xen/arch/arm/smpboot.c              |  4 ++--
>   8 files changed, 37 insertions(+), 29 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
> index 33b038e7e0..03ab68578a 100644
> --- a/xen/arch/arm/arm32/head.S
> +++ b/xen/arch/arm/arm32/head.S
> @@ -238,11 +238,11 @@ GLOBAL(init_secondary)
>   secondary_switched:
>           /*
>            * Non-boot CPUs need to move on to the proper pagetables, which were
> -         * setup in init_secondary_pagetables.
> +         * setup in init_secondary_mm.
>            *
>            * XXX: This is not compliant with the Arm Arm.
>            */
> -        mov_w r4, init_ttbr          /* VA of HTTBR value stashed by CPU 0 */
> +        mov_w r4, init_mm            /* VA of HTTBR value stashed by CPU 0 */
>           ldrd  r4, r5, [r4]           /* Actual value */
>           dsb
>           mcrr  CP64(r4, r5, HTTBR)
> diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
> index ba2ddd7e67..58d91c9088 100644
> --- a/xen/arch/arm/arm64/mmu/head.S
> +++ b/xen/arch/arm/arm64/mmu/head.S
> @@ -302,7 +302,7 @@ ENDPROC(enable_mmu)
>   ENTRY(enable_secondary_cpu_mm)
>           mov   x5, lr
>   
> -        load_paddr x0, init_ttbr
> +        load_paddr x0, init_mm
>           ldr   x0, [x0]
>   
>           bl    enable_mmu
> diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
> index 78b7c7eb00..ed0fc5ff7b 100644
> --- a/xen/arch/arm/arm64/mmu/mm.c
> +++ b/xen/arch/arm/arm64/mmu/mm.c
> @@ -106,7 +106,7 @@ void __init arch_setup_page_tables(void)
>       prepare_runtime_identity_mapping();
>   }
>   
> -void update_identity_mapping(bool enable)
> +static void update_identity_mapping(bool enable)

Why not simply rename this function to update_mm_mapping()? But...

>   {
>       paddr_t id_addr = virt_to_maddr(_start);
>       int rc;
> @@ -120,6 +120,11 @@ void update_identity_mapping(bool enable)
>       BUG_ON(rc);
>   }
>   
> +void update_mm_mapping(bool enable)

... the new name is quite confusing. What is the mapping it is referring to?

If you don't want to keep update_identity_mapping(), then I would 
consider simply wrapping...

> +{
> +    update_identity_mapping(enable);
> +}
> +
>   extern void switch_ttbr_id(uint64_t ttbr);
>   
>   typedef void (switch_ttbr_fn)(uint64_t ttbr);
> @@ -131,7 +136,7 @@ void __init switch_ttbr(uint64_t ttbr)
>       lpae_t pte;
>   
>       /* Enable the identity mapping in the boot page tables */
> -    update_identity_mapping(true);
> +    update_mm_mapping(true);
>   
>       /* Enable the identity mapping in the runtime page tables */
>       pte = pte_of_xenaddr((vaddr_t)switch_ttbr_id);
> @@ -148,7 +153,7 @@ void __init switch_ttbr(uint64_t ttbr)
>        * Note it is not necessary to disable it in the boot page tables
>        * because they are not going to be used by this CPU anymore.
>        */
> -    update_identity_mapping(false);
> +    update_mm_mapping(false);
>   }
>   
>   /*
> diff --git a/xen/arch/arm/arm64/smpboot.c b/xen/arch/arm/arm64/smpboot.c
> index 9637f42469..2b1d086a1e 100644
> --- a/xen/arch/arm/arm64/smpboot.c
> +++ b/xen/arch/arm/arm64/smpboot.c
> @@ -111,18 +111,18 @@ int arch_cpu_up(int cpu)
>       if ( !smp_enable_ops[cpu].prepare_cpu )
>           return -ENODEV;
>   
> -    update_identity_mapping(true);
> +    update_mm_mapping(true);

... with #ifdef CONFIG_MMU here...

>   
>       rc = smp_enable_ops[cpu].prepare_cpu(cpu);
>       if ( rc )
> -        update_identity_mapping(false);
> +        update_mm_mapping(false);

... here and ...


>   
>       return rc;
>   }
>   
>   void arch_cpu_up_finish(void)
>   {
> -    update_identity_mapping(false);
> +    update_mm_mapping(false);

... here.
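i.e. something like this (sketch only, untested, and keeping the original
update_identity_mapping() name):

```c
/* Sketch of the suggested #ifdef approach in arm64/smpboot.c. */
int arch_cpu_up(int cpu)
{
    int rc;

    if ( !smp_enable_ops[cpu].prepare_cpu )
        return -ENODEV;

#ifdef CONFIG_MMU
    update_identity_mapping(true);
#endif

    rc = smp_enable_ops[cpu].prepare_cpu(cpu);

#ifdef CONFIG_MMU
    if ( rc )
        update_identity_mapping(false);
#endif

    return rc;
}
```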

>   }
>   
>   /*
> diff --git a/xen/arch/arm/include/asm/arm64/mm.h b/xen/arch/arm/include/asm/arm64/mm.h
> index e0bd23a6ed..7a389c4b21 100644
> --- a/xen/arch/arm/include/asm/arm64/mm.h
> +++ b/xen/arch/arm/include/asm/arm64/mm.h
> @@ -15,13 +15,14 @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
>   void arch_setup_page_tables(void);
>   
>   /*
> - * Enable/disable the identity mapping in the live page-tables (i.e.
> - * the one pointed by TTBR_EL2).
> + * In MMU system, enable/disable the identity mapping in the live
> + * page-tables (i.e. the one pointed by TTBR_EL2) through
> + * update_identity_mapping().
>    *
>    * Note that nested call (e.g. enable=true, enable=true) is not
>    * supported.
>    */
> -void update_identity_mapping(bool enable);
> +void update_mm_mapping(bool enable);
>   
>   #endif /* __ARM_ARM64_MM_H__ */
>   
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index dc1458b047..8084c62c01 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -170,7 +170,7 @@ struct page_info
>   #define PGC_need_scrub    PGC_allocated
>   
>   /* Non-boot CPUs use this to find the correct pagetables. */
> -extern uint64_t init_ttbr;
> +extern uint64_t init_mm;
>   
>   #ifdef CONFIG_ARM_32
>   #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
> @@ -205,11 +205,13 @@ extern void setup_pagetables(unsigned long boot_phys_offset);
>   extern void *early_fdt_map(paddr_t fdt_paddr);
>   /* Remove early mappings */
>   extern void remove_early_mappings(void);
> -/* Allocate and initialise pagetables for a secondary CPU. Sets init_ttbr to the
> - * new page table */
> -extern int init_secondary_pagetables(int cpu);
> +/*
> + * Allocate and initialise pagetables for a secondary CPU. Sets init_mm to the
> + * new page table
> + */
> +extern int init_secondary_mm(int cpu);
>   /* Switch secondary CPUS to its own pagetables and finalise MMU setup */

Regardless of what I wrote above, this comment is not accurate anymore as 
we don't switch the page tables for the secondary CPUs. We only 
enable WxN.

In any case, this comment would need to be reworded to be more generic.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c
  2023-08-14  4:25 ` [PATCH v5 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c Henry Wang
@ 2023-08-21 21:19   ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-21 21:19 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen

Hi,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Penny Zheng <penny.zheng@arm.com>
> 
> setup_mm is used for Xen to setup memory management subsystem at boot
> time, like boot allocator, direct-mapping, xenheap initialization,
> frametable and static memory pages.
> 
> We could inherit some components seamlessly in later MPU system like
> boot allocator, whilst we need to implement some components differently
> in MPU, like xenheap, etc. There are some components that is specific
> to MMU only, like direct-mapping.
> 
> In the commit, we move MMU-specific components into mmu/setup.c, in
> preparation of implementing MPU version of setup_mm later in future
> commit. Also, make init_pdx(), init_staticmem_pages(), setup_mm(), and
> populate_boot_allocator() public for future MPU inplementation.

Typo: s/inplementation/implementation/

> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> ---
> v5:
> - No change
> v4:
> - No change
> ---
>   xen/arch/arm/include/asm/setup.h |   5 +
>   xen/arch/arm/mmu/Makefile        |   1 +
>   xen/arch/arm/mmu/setup.c         | 339 +++++++++++++++++++++++++++++++
>   xen/arch/arm/setup.c             | 326 +----------------------------
>   4 files changed, 349 insertions(+), 322 deletions(-)
>   create mode 100644 xen/arch/arm/mmu/setup.c
> 
> diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
> index f0f64d228c..0922549631 100644
> --- a/xen/arch/arm/include/asm/setup.h
> +++ b/xen/arch/arm/include/asm/setup.h
> @@ -156,6 +156,11 @@ struct bootcmdline *boot_cmdline_find_by_kind(bootmodule_kind kind);
>   struct bootcmdline * boot_cmdline_find_by_name(const char *name);
>   const char *boot_module_kind_as_string(bootmodule_kind kind);
>   
> +extern void init_pdx(void);
> +extern void init_staticmem_pages(void);
> +extern void populate_boot_allocator(void);
> +extern void setup_mm(void);

Please avoid the 'extern' for new function declaration.

> +
>   extern uint32_t hyp_traps_vector[];
>   void init_traps(void);
>   
> diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
> index b18cec4836..4aa1fb466d 100644
> --- a/xen/arch/arm/mmu/Makefile
> +++ b/xen/arch/arm/mmu/Makefile
> @@ -1 +1,2 @@
>   obj-y += mm.o
> +obj-y += setup.o
> diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
> new file mode 100644
> index 0000000000..e05cca3f86
> --- /dev/null
> +++ b/xen/arch/arm/mmu/setup.c
> @@ -0,0 +1,339 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * xen/arch/arm/mmu/setup.c
> + *
> + * MMU-specific early bringup code for an ARMv7-A with virt extensions.

You modify the comment in arm/setup.c to mention ARMv8 but not here. The 
former feels somewhat unrelated.

> + */
> +
> +#include <xen/init.h>
> +#include <xen/serial.h>
> +#include <xen/libfdt/libfdt-xen.h>
> +#include <xen/mm.h>
> +#include <xen/param.h>
> +#include <xen/pfn.h>
> +#include <asm/fixmap.h>
> +#include <asm/page.h>
> +#include <asm/setup.h>
> +
> +#ifdef CONFIG_ARM_32

AFAICT, mmu/mm.c has nothing in common between arm32 and arm64. So can we 
introduce arm{32,64}/mmu/setup.c and move the respective code there?

[...]

> +void __init setup_mm(void)
> +{
> +    paddr_t ram_start, ram_end, ram_size, e, bank_start, bank_end, bank_size;
> +    paddr_t static_heap_end = 0, static_heap_size = 0;
> +    unsigned long heap_pages, xenheap_pages, domheap_pages;
> +    unsigned int i;
> +    const uint32_t ctr = READ_CP32(CTR);
> +
> +    if ( !bootinfo.mem.nr_banks )
> +        panic("No memory bank\n");
> +
> +    /* We only supports instruction caches implementing the IVIPT extension. */
> +    if ( ((ctr >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK) == ICACHE_POLICY_AIVIVT )
> +        panic("AIVIVT instruction cache not supported\n");

This check is unlikely to belong in the MMU setup.c.

> +
> +    init_pdx();
> +
> +    ram_start = bootinfo.mem.bank[0].start;
> +    ram_size  = bootinfo.mem.bank[0].size;
> +    ram_end   = ram_start + ram_size;
> +
> +    for ( i = 1; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        bank_start = bootinfo.mem.bank[i].start;
> +        bank_size = bootinfo.mem.bank[i].size;
> +        bank_end = bank_start + bank_size;
> +
> +        ram_size  = ram_size + bank_size;
> +        ram_start = min(ram_start,bank_start);
> +        ram_end   = max(ram_end,bank_end);
> +    }
> +
> +    total_pages = ram_size >> PAGE_SHIFT;
> +
> +    if ( bootinfo.static_heap )
> +    {
> +        for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
> +        {
> +            if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
> +                continue;
> +
> +            bank_start = bootinfo.reserved_mem.bank[i].start;
> +            bank_size = bootinfo.reserved_mem.bank[i].size;
> +            bank_end = bank_start + bank_size;
> +
> +            static_heap_size += bank_size;
> +            static_heap_end = max(static_heap_end, bank_end);
> +        }
> +
> +        heap_pages = static_heap_size >> PAGE_SHIFT;
> +    }
> +    else
> +        heap_pages = total_pages;
> +
> +    /*
> +     * If the user has not requested otherwise via the command line
> +     * then locate the xenheap using these constraints:
> +     *
> +     *  - must be contiguous
> +     *  - must be 32 MiB aligned
> +     *  - must not include Xen itself or the boot modules
> +     *  - must be at most 1GB or 1/32 the total RAM in the system (or static
> +          heap if enabled) if less

While you are moving the code, can you add the missing '*'? This could be 
done as a follow-up patch as well.

[...]

> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 44ccea03ca..b3dea41099 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -2,7 +2,7 @@
>   /*
>    * xen/arch/arm/setup.c
>    *
> - * Early bringup code for an ARMv7-A with virt extensions.
> + * Early bringup code for an ARMv7-A/ARM64v8R with virt extensions.

You are not yet supporting ARMv8-R. But we are supporting ARMv8-A, which 
is not mentioned here. That said, I don't really see the benefit of 
mentioning the major revision of the Arm Arm we support. For instance, 
we are likely going to be able to boot on Armv9.

So I would just remove everything from 'for' onwards.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-14  4:25 ` [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c Henry Wang
@ 2023-08-21 21:31   ` Julien Grall
  2023-08-22  7:44     ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21 21:31 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen

Hi,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Penny Zheng <Penny.Zheng@arm.com>
> 
> Function copy_from_paddr() is defined in asm/setup.h, so it is better
> to be implemented in setup.c.

I don't agree with this reasoning. We use setup.h to declare prototypes 
for functions that are outside of setup.c as well.

Looking at the overall series, it is unclear to me what the difference is 
between mmu/mm.c and mmu/setup.c. I know this is technically 
not a new problem, but as we split the code, it would be a good 
opportunity to have a better split.

For instance, we have setup_mm() defined in setup.c but 
setup_pagetables() in mm.c. Both are technically related to memory 
management, so having them in separate files is a bit odd.

I also don't like the idea of ending up with a massive mm.c file again. So 
maybe we need a split like:
   * File 1: Boot CPU0 MM bringup (mmu/setup.c)
   * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
   * File 3: Page table updates (mmu/pt.c)

Ideally, file 1 should contain only init code/data so it can be marked as 
.init. So the static pagetables may want to be defined in mmu/pt.c.
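In terms of the build system, that would presumably end up as something like
the sketch below (a hypothetical layout of xen/arch/arm/mmu/Makefile, not what
the series currently does):

```
# Hypothetical xen/arch/arm/mmu/Makefile after such a split
obj-y += pt.o
obj-y += setup.o
obj-y += smpboot.o
```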

Bertrand, Stefano, any thoughts?

[...]

> diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
> index e05cca3f86..889ada6b87 100644
> --- a/xen/arch/arm/mmu/setup.c
> +++ b/xen/arch/arm/mmu/setup.c
> @@ -329,6 +329,33 @@ void __init setup_mm(void)
>   }
>   #endif
>   
> +/*

Why did the second '*' disappear?

> + * copy_from_paddr - copy data from a physical address
> + * @dst: destination virtual address
> + * @paddr: source physical address
> + * @len: length to copy
> + */
> +void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
> +{
> +    void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
> +
> +    while (len) {
> +        unsigned long l, s;
> +
> +        s = paddr & (PAGE_SIZE-1);
> +        l = min(PAGE_SIZE - s, len);
> +
> +        set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
> +        memcpy(dst, src + s, l);
> +        clean_dcache_va_range(dst, l);
> +        clear_fixmap(FIXMAP_MISC);
> +
> +        paddr += l;
> +        dst += l;
> +        len -= l;
> +    }
> +}
> +
>   /*
>    * Local variables:
>    * mode: C

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-14  4:25 ` [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU Henry Wang
  2023-08-14  7:08   ` Jan Beulich
@ 2023-08-21 21:34   ` Julien Grall
  2023-08-22  2:11     ` Henry Wang
  1 sibling, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-21 21:34 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Penny Zheng, Jan Beulich, Paul Durrant, Roger Pau Monné,
	Stefano Stabellini, Bertrand Marquis, Wei Chen

Hi,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Penny Zheng <Penny.Zheng@arm.com>
> 
> SMMU subsystem is only supported in MMU system, so we make it dependent
> on CONFIG_HAS_MMU.

"only supported" as in it doesn't work with Xen or the HW is not 
supporting it?

Also, I am not entirely convinced that anything in passthrough would 
properly work with the MPU. At least none of the IOMMU drivers do. So I 
would consider completely disabling HAS_PASSTHROUGH.

> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> v5:
> - Add Acked-by tag from Jan.
> v4:
> - No change
> v3:
> - new patch
> ---
>   xen/drivers/passthrough/Kconfig | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
> index 864fcf3b0c..ebb350bc37 100644
> --- a/xen/drivers/passthrough/Kconfig
> +++ b/xen/drivers/passthrough/Kconfig
> @@ -5,6 +5,7 @@ config HAS_PASSTHROUGH
>   if ARM
>   config ARM_SMMU
>   	bool "ARM SMMUv1 and v2 driver"
> +	depends on MMU
>   	default y
>   	---help---
>   	  Support for implementations of the ARM System MMU architecture
> @@ -15,7 +16,7 @@ config ARM_SMMU
>   
>   config ARM_SMMU_V3
>   	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
> -	depends on ARM_64 && (!ACPI || BROKEN)
> +	depends on ARM_64 && (!ACPI || BROKEN) && MMU
>   	---help---
>   	 Support for implementations of the ARM System MMU architecture
>   	 version 3. Driver is in experimental stage and should not be used in

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-21 21:34   ` Julien Grall
@ 2023-08-22  2:11     ` Henry Wang
  2023-08-22  8:18       ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-22  2:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Jan Beulich, Paul Durrant,
	Roger Pau Monné,
	Stefano Stabellini, Bertrand Marquis, Wei Chen

Hi Julien,

> On Aug 22, 2023, at 05:34, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Penny Zheng <Penny.Zheng@arm.com>
>> SMMU subsystem is only supported in MMU system, so we make it dependent
>> on CONFIG_HAS_MMU.
> 
> "only supported" as in it doesn't work with Xen or the HW is not supporting it?

I think there is currently no hardware combination of MPU + SMMU, but
theoretically this is a valid combination since the SMMU supports linear
mapping. So would the below reword look good to you:

“Currently the hardware use case of connecting SMMU to MPU system is rarely
seen, so we make CONFIG_ARM_SMMU and CONFIG_ARM_SMMU_V3
dependent on CONFIG_MMU." 

> 
> Also, I am not entirely convinced that anything in passthrough would properly work with MPU. At least none of the IOMMU drivers are. So I would consider to completely disable HAS_PASSTHROUGH.

I agree. Do you think adding the below additional diff to this patch makes sense?
If so, I guess I would also need to mention this in the commit message.

```
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 8a7b79b4b5..fd29d14ed6 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -13,7 +13,7 @@ config ARM
        def_bool y
        select HAS_ALTERNATIVE
        select HAS_DEVICE_TREE
-       select HAS_PASSTHROUGH
+       select HAS_PASSTHROUGH if MMU
        select HAS_PDX
        select HAS_UBSAN
        select IOMMU_FORCE_PT_SHARE
```

Kind regards,
Henry


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system
  2023-08-21 18:14   ` Julien Grall
@ 2023-08-22  2:42     ` Henry Wang
  2023-08-22  8:06       ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-22  2:42 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk

Hi Julien,

> On Aug 22, 2023, at 02:14, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Penny Zheng <penny.zheng@arm.com>
>> 
>>  diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
>> index 734eb9b1d4..5d5de6995a 100644
>> --- a/xen/arch/arm/include/asm/fixmap.h
>> +++ b/xen/arch/arm/include/asm/fixmap.h
>> @@ -36,12 +36,7 @@ extern void clear_fixmap(unsigned int map);
>>    #define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
>>  -static inline unsigned int virt_to_fix(vaddr_t vaddr)
>> -{
>> -    BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
>> -
>> -    return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
>> -}
>> +extern unsigned int virt_to_fix(vaddr_t vaddr);
> 
> AFAICT, virt_to_fix() is not going to be implemented for the MPU code. This implies that no-one should call it.
> 
> Also, none of the definitions in fixmap.h actually makes sense for the MPU. I would prefer if we instead try to lmit the include of fixmap to when this is strictly necessary. Looking for the inclusion in staging I could find:
> 
> 42sh> ack "\#include" | ack "fixmap" | ack -v x86
> arch/arm/acpi/lib.c:28:#include <asm/fixmap.h>
> arch/arm/kernel.c:19:#include <asm/fixmap.h>
> arch/arm/mm.c:27:#include <asm/fixmap.h>
> arch/arm/include/asm/fixmap.h:7:#include <xen/acpi.h>
> arch/arm/include/asm/fixmap.h:8:#include <xen/pmap.h>
> arch/arm/include/asm/pmap.h:6:#include <asm/fixmap.h>
> arch/arm/include/asm/early_printk.h:14:#include <asm/fixmap.h>
> common/efi/boot.c:30:#include <asm/fixmap.h>
> common/pmap.c:7:#include <asm/fixmap.h>
> drivers/acpi/apei/erst.c:36:#include <asm/fixmap.h>
> drivers/acpi/apei/apei-io.c:32:#include <asm/fixmap.h>
> drivers/char/xhci-dbc.c:30:#include <asm/fixmap.h>
> drivers/char/ehci-dbgp.c:16:#include <asm/fixmap.h>
> drivers/char/ns16550.c:40:#include <asm/fixmap.h>
> drivers/char/xen_pv_console.c:28:#include <asm/fixmap.h>
> 
> Some of them are gone after your rework. The only remaining that we care are in kernel.h (but I think it can be removed after your series).

I think you are correct, so I reverted the virt_to_fix() change from this patch,
deleted the include of asm/fixmap.h in kernel.c and made this patch the second-to-last
patch of the series. The build of Xen works fine.

Does below updated patch look good to you?
```
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index eb0413336b..8a7b79b4b5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -15,7 +15,6 @@ config ARM
        select HAS_DEVICE_TREE
        select HAS_PASSTHROUGH
        select HAS_PDX
-       select HAS_PMAP
        select HAS_UBSAN
        select IOMMU_FORCE_PT_SHARE

@@ -61,6 +60,7 @@ config PADDR_BITS

 config MMU
        def_bool y
+       select HAS_PMAP

 source "arch/Kconfig"

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 0d433a32e7..bc3e5bd6f9 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -16,7 +16,6 @@
 #include <xen/vmap.h>

 #include <asm/byteorder.h>
-#include <asm/fixmap.h>
 #include <asm/kernel.h>
 #include <asm/setup.h>
```

I will update the commit message accordingly.

Kind regards,
Henry

> 
> So I think it would be feasible to not touch fixmap.h at all.
> 
> Cheers,
> 
> -- 
> Julien Grall
> 



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-21 21:31   ` Julien Grall
@ 2023-08-22  7:44     ` Henry Wang
  2023-08-22  8:42       ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-22  7:44 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Bertrand Marquis
  Cc: Xen-devel, Penny Zheng, Volodymyr Babchuk, Wei Chen

Hi Julien, Stefano, Bertrand,

> On Aug 22, 2023, at 05:31, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Penny Zheng <Penny.Zheng@arm.com>
>> Function copy_from_paddr() is defined in asm/setup.h, so it is better
>> to be implemented in setup.c.
> 
> I don't agree with this reasoning. We used setup.h to declare prototype for function that are out of setup.c.
> 
> Looking at the overall of this series, it is unclear to me what is the difference between mmu/mm.c and mmu/setup.c. I know this is technically not a new problem but as we split the code, it would be a good opportunity to have a better split.
> 
> For instance, we have setup_mm() defined in setup.c but setup_pagetables() and mm.c. Both are technically related to the memory management. So having them in separate file is a bit odd.

Skimming through the comments, it looks like we have a common problem
in patches 7, 9, 10 and 12 about how to move/split the code. So instead of having
the discussion in each patch, I would like to propose that we discuss all
of these in a common place here.
> 
> I also don't like the idea of having again a massive mm.c files. So maybe we need a split like:
>  * File 1: Boot CPU0 MM bringup (mmu/setup.c)
>  * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
>  * File 3: Page tables update. (mmu/pt.c)
> 
> Ideally file 1 should contain only init code/data so it can be marked as .init. So the static pagetables may want to be defined in mmu/pt.c.

So based on Julien’s suggestion, Penny and I worked a bit on the current
functions in “arch/arm/mm.c” and we would like to propose the below split
scheme. Would you please comment on whether it makes sense to you?
Thanks!

"""
static void __init __maybe_unused build_assertions()      -> arch/arm/mm.c
static lpae_t *xen_map_table()                            -> mmu/pt.c
static void xen_unmap_table()                             -> mmu/pt.c
void dump_pt_walk()                                       -> mmu/pt.c
void dump_hyp_walk()                                      -> mmu/pt.c
lpae_t mfn_to_xen_entry()                                 -> mmu/pt.c
void set_fixmap()                                         -> mmu/pt.c  
void clear_fixmap()                                       -> mmu/pt.c
void flush_page_to_ram()                                  -> arch/arm/mm.c?
lpae_t pte_of_xenaddr()                                   -> mmu/pt.c
void * __init early_fdt_map()                             -> mmu/setup.c
void __init remove_early_mappings()                       -> mmu/setup.c
static void xen_pt_enforce_wnx()                          -> mmu/pt.c, export it
static void clear_table()                                 -> mmu/smpboot.c
void __init setup_pagetables()                            -> mmu/setup.c
static void clear_boot_pagetables()                       -> mmu/smpboot.c
int init_secondary_pagetables()                           -> mmu/smpboot.c
void mmu_init_secondary_cpu()                             -> mmu/smpboot.c
void __init setup_directmap_mappings()                    -> mmu/setup.c
void __init setup_frametable_mappings()                   -> mmu/setup.c
void *__init arch_vmap_virt_end()                         -> arch/arm/mm.c or mmu/setup.c?
void *ioremap_attr()                                      -> mmu/pt.c
void *ioremap()                                           -> mmu/pt.c
static int create_xen_table()                             -> mmu/pt.c 
static int xen_pt_next_level()                            -> mmu/pt.c
static bool xen_pt_check_entry()                          -> mmu/pt.c 
static int xen_pt_update_entry()                          -> mmu/pt.c
static int xen_pt_mapping_level()                         -> mmu/pt.c 
static unsigned int xen_pt_check_contig()                 -> mmu/pt.c 
static int xen_pt_update()                                -> mmu/pt.c 
int map_pages_to_xen()                                    -> mmu/pt.c 
int __init populate_pt_range()                            -> mmu/pt.c
int destroy_xen_mappings()                                -> mmu/pt.c
int modify_xen_mappings()                                 -> mmu/pt.c
void free_init_memory()                                   -> mmu/setup.c
void arch_dump_shared_mem_info()                          -> arch/arm/mm.c
int steal_page()                                          -> arch/arm/mm.c
int page_is_ram_type()                                    -> arch/arm/mm.c
unsigned long domain_get_maximum_gpfn()                   -> arch/arm/mm.c
void share_xen_page_with_guest()                          -> arch/arm/mm.c
int xenmem_add_to_physmap_one()                           -> arch/arm/mm.c
long arch_memory_op()                                     -> arch/arm/mm.c
static struct domain *page_get_owner_and_nr_reference()   -> arch/arm/mm.c
struct domain *page_get_owner_and_reference()             -> arch/arm/mm.c
void put_page_nr()                                        -> arch/arm/mm.c
void put_page()                                           -> arch/arm/mm.c
bool get_page_nr()                                        -> arch/arm/mm.c
bool get_page()                                           -> arch/arm/mm.c
int get_page_type()                                       -> arch/arm/mm.c
void put_page_type()                                      -> arch/arm/mm.c
int create_grant_host_mapping()                           -> arch/arm/mm.c
int replace_grant_host_mapping()                          -> arch/arm/mm.c
bool is_iomem_page()                                      -> arch/arm/mm.c
void clear_and_clean_page()                               -> arch/arm/mm.c
unsigned long get_upper_mfn_bound()                       -> arch/arm/mm.c
"""

> 
> Bertrand, Stefano, any thoughts?
> 
> [...]
> 
>> diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
>> index e05cca3f86..889ada6b87 100644
>> --- a/xen/arch/arm/mmu/setup.c
>> +++ b/xen/arch/arm/mmu/setup.c
>> @@ -329,6 +329,33 @@ void __init setup_mm(void)
>>  }
>>  #endif
>>  +/*
> 
> Why did the second '*' disappear?

According to the CODING_STYLE, we should use something like this:

/*
 * Example, multi-line comment block.
 *
 * Note beginning and end markers on separate lines and leading '*'.
 */

Instead of "/**” in the beginning. But I think you made a point, I need
to mention that I took the opportunity to fix the comment style in commit
message.

Kind regards,
Henry

> 
>> + * copy_from_paddr - copy data from a physical address
>> + * @dst: destination virtual address
>> + * @paddr: source physical address
>> + * @len: length to copy
>> + */
>> +void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
>> +{
>> 
> 
> Cheers,
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system
  2023-08-22  2:42     ` Henry Wang
@ 2023-08-22  8:06       ` Julien Grall
  2023-08-22  8:08         ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-22  8:06 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk

Hi,

On 22/08/2023 03:42, Henry Wang wrote:
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index 0d433a32e7..bc3e5bd6f9 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -16,7 +16,6 @@
>   #include <xen/vmap.h>
> 
>   #include <asm/byteorder.h>
> -#include <asm/fixmap.h>
>   #include <asm/kernel.h>
>   #include <asm/setup.h>
> ```

The changes in kernel.c should go in patch #12 where the fixmap user is 
moved out.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system
  2023-08-22  8:06       ` Julien Grall
@ 2023-08-22  8:08         ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-22  8:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk

Hi Julien,

> On Aug 22, 2023, at 16:06, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 22/08/2023 03:42, Henry Wang wrote:
>> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
>> index 0d433a32e7..bc3e5bd6f9 100644
>> --- a/xen/arch/arm/kernel.c
>> +++ b/xen/arch/arm/kernel.c
>> @@ -16,7 +16,6 @@
>>  #include <xen/vmap.h>
>>  #include <asm/byteorder.h>
>> -#include <asm/fixmap.h>
>>  #include <asm/kernel.h>
>>  #include <asm/setup.h>
>> ```
> 
> The changes in kernel.c should go in patch #12 where the fixmap user is moved out.

Thanks, sounds good. I will fix it in v6.

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-22  2:11     ` Henry Wang
@ 2023-08-22  8:18       ` Julien Grall
  2023-08-22  8:48         ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-22  8:18 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Penny Zheng, Jan Beulich, Paul Durrant,
	Roger Pau Monné,
	Stefano Stabellini, Bertrand Marquis, Wei Chen

On 22/08/2023 03:11, Henry Wang wrote:
> Hi Julien,

Hi,

>> On Aug 22, 2023, at 05:34, Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> From: Penny Zheng <Penny.Zheng@arm.com>
>>> SMMU subsystem is only supported in MMU system, so we make it dependent
>>> on CONFIG_HAS_MMU.
>>
>> "only supported" as in it doesn't work with Xen or the HW is not supporting it?
> 
> I think currently there are no hardware combination of MPU + SMMU, but
> theoretically I think this is a valid combination since SMMU supports the linear
> mapping. So would below reword looks good to you:
> 
> “Currently the hardware use case of connecting SMMU to MPU system is rarely
> seen, so we make CONFIG_ARM_SMMU and CONFIG_ARM_SMMU_V3
> dependent on CONFIG_MMU."

I read this as there might be an MPU system with an SMMU in development. What 
you want to explain is why we can't let the developer select the 
SMMU driver on an MPU system.

From my understanding, this is because the drivers expect to use 
the page-tables and that concept doesn't exist in an MPU system. So the 
drivers are not ready for the MPU.

> 
>>
>> Also, I am not entirely convinced that anything in passthrough would properly work with MPU. At least none of the IOMMU drivers are. So I would consider to completely disable HAS_PASSTHROUGH.
> 
> I agree, do you think adding below addition diff to this patch makes sense to you?

I think it should be a replacement because none of the IOMMU drivers 
works for the MPU. So I would rather we avoid adding "depends 
on" to all of them (even if there are only 3) for now.

> If so I guess would also need to mention this in commit message.

Did you confirm that Xen MPU still build without HAS_PASSTHROUGH?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-22  7:44     ` Henry Wang
@ 2023-08-22  8:42       ` Julien Grall
  2023-08-22  8:54         ` Henry Wang
  2023-08-23  0:10         ` Stefano Stabellini
  0 siblings, 2 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-22  8:42 UTC (permalink / raw)
  To: Henry Wang, Stefano Stabellini, Bertrand Marquis
  Cc: Xen-devel, Penny Zheng, Volodymyr Babchuk, Wei Chen

On 22/08/2023 08:44, Henry Wang wrote:
>> On Aug 22, 2023, at 05:31, Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> From: Penny Zheng <Penny.Zheng@arm.com>
>>> Function copy_from_paddr() is defined in asm/setup.h, so it is better
>>> to be implemented in setup.c.
>>
>> I don't agree with this reasoning. We used setup.h to declare prototype for function that are out of setup.c.
>>
>> Looking at the overall of this series, it is unclear to me what is the difference between mmu/mm.c and mmu/setup.c. I know this is technically not a new problem but as we split the code, it would be a good opportunity to have a better split.
>>
>> For instance, we have setup_mm() defined in setup.c but setup_pagetables() and mm.c. Both are technically related to the memory management. So having them in separate file is a bit odd.
> 
> Skimming through the comments, it looks like we have a common problem
> in patch 7, 9, 10, 12 about how to move/split the code. So instead of having
> the discussion in each patch, I would like to propose that we can discuss all
> of these in a common place here.

+1.

> 
>>
>> I also don't like the idea of having again a massive mm.c files. So maybe we need a split like:
>>   * File 1: Boot CPU0 MM bringup (mmu/setup.c)
>>   * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
>>   * File 3: Page tables update. (mmu/pt.c)
>>
>> Ideally file 1 should contain only init code/data so it can be marked as .init. So the static pagetables may want to be defined in mmu/pt.c.
> 
> So based on Julien’s suggestion, Penny and I worked a bit on the current
> functions in “arch/arm/mm.c” and we would like to propose below split
> scheme, would you please comment on if below makes sense to you,
> thanks!
> 
> """
> static void __init __maybe_unused build_assertions()      -> arch/arm/mm.c

All the existing build assertions seem to be MMU-specific. So shouldn't 
they be moved to mmu/mm.c?

> static lpae_t *xen_map_table()                            -> mmu/pt.c
> static void xen_unmap_table()                             -> mmu/pt.c
> void dump_pt_walk()                                       -> mmu/pt.c
> void dump_hyp_walk()                                      -> mmu/pt.c
> lpae_t mfn_to_xen_entry()                                 -> mmu/pt.c
> void set_fixmap()                                         -> mmu/pt.c
> void clear_fixmap()                                       -> mmu/pt.c
> void flush_page_to_ram()                                  -> arch/arm/mm.c?

I think it should stay in arch/arm/mm.c because you will probably need 
to clean a page even on MPU systems.

> lpae_t pte_of_xenaddr()                                   -> mmu/pt.c
> void * __init early_fdt_map()                             -> mmu/setup.c
> void __init remove_early_mappings()                       -> mmu/setup.c
> static void xen_pt_enforce_wnx()                          -> mmu/pt.c, export it

AFAIU, it would be called from smpboot.c and setup.c. For the former, 
the caller is mmu_init_secondary_cpu() which I think can be folded in 
head.S.

If we do that, then xen_pt_enforce_wnx() can be moved in setup.c and 
doesn't need to be exported.

> static void clear_table()                                 -> mmu/smpboot.c
> void __init setup_pagetables()                            -> mmu/setup.c
> static void clear_boot_pagetables()                       -> mmu/smpboot.c
> int init_secondary_pagetables()                           -> mmu/smpboot.c
> void mmu_init_secondary_cpu()                             -> mmu/smpboot.c
> void __init setup_directmap_mappings()                    -> mmu/setup.c
> void __init setup_frametable_mappings()                   -> mmu/setup.c
> void *__init arch_vmap_virt_end()                         -> arch/arm/mm.c or mmu/setup.c?

AFAIU, the VMAP will not be used for MPU systems. So this would want to 
go in mmu/. I am ok with setup.c.

> void *ioremap_attr()                                      -> mmu/pt.c
> void *ioremap()                                           -> mmu/pt.c
> static int create_xen_table()                             -> mmu/pt.c
> static int xen_pt_next_level()                            -> mmu/pt.c
> static bool xen_pt_check_entry()                          -> mmu/pt.c
> static int xen_pt_update_entry()                          -> mmu/pt.c
> static int xen_pt_mapping_level()                         -> mmu/pt.c
> static unsigned int xen_pt_check_contig()                 -> mmu/pt.c
> static int xen_pt_update()                                -> mmu/pt.c
> int map_pages_to_xen()                                    -> mmu/pt.c
> int __init populate_pt_range()                            -> mmu/pt.c
> int destroy_xen_mappings()                                -> mmu/pt.c
> int modify_xen_mappings()                                 -> mmu/pt.c
> void free_init_memory()                                   -> mmu/setup.c
> void arch_dump_shared_mem_info()                          -> arch/arm/mm.c
> int steal_page()                                          -> arch/arm/mm.c
> int page_is_ram_type()                                    -> arch/arm/mm.c
> unsigned long domain_get_maximum_gpfn()                   -> arch/arm/mm.c
> void share_xen_page_with_guest()                          -> arch/arm/mm.c
> int xenmem_add_to_physmap_one()                           -> arch/arm/mm.c
> long arch_memory_op()                                     -> arch/arm/mm.c
> static struct domain *page_get_owner_and_nr_reference()   -> arch/arm/mm.c
> struct domain *page_get_owner_and_reference()             -> arch/arm/mm.c
> void put_page_nr()                                        -> arch/arm/mm.c
> void put_page()                                           -> arch/arm/mm.c
> bool get_page_nr()                                        -> arch/arm/mm.c
> bool get_page()                                           -> arch/arm/mm.c
> int get_page_type()                                       -> arch/arm/mm.c
> void put_page_type()                                      -> arch/arm/mm.c
> int create_grant_host_mapping()                           -> arch/arm/mm.c
> int replace_grant_host_mapping()                          -> arch/arm/mm.c
> bool is_iomem_page()                                      -> arch/arm/mm.c
> void clear_and_clean_page()                               -> arch/arm/mm.c
> unsigned long get_upper_mfn_bound()                       -> arch/arm/mm.c
> """

The placement of all the ones I didn't comment on looks fine to me. It 
would be good to have a second opinion from either Bertrand or Stefano 
before you start moving the functions.

>>> diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
>>> index e05cca3f86..889ada6b87 100644
>>> --- a/xen/arch/arm/mmu/setup.c
>>> +++ b/xen/arch/arm/mmu/setup.c
>>> @@ -329,6 +329,33 @@ void __init setup_mm(void)
>>>   }
>>>   #endif
>>>   +/*
>>
>> Why did the second '*' disappear?
> 
> According to the CODING_STYLE, we should use something like this:
> 
> /*
>   * Example, multi-line comment block.
>   *
>   * Note beginning and end markers on separate lines and leading '*'.
>   */
> 
> Instead of "/**” in the beginning. But I think you made a point, I need
> to mention that I took the opportunity to fix the comment style in commit
> message.

We have started to use /** which I think is for Doxygen (see the PDX 
series). So I think the CODING_STYLE needs to be updated rather than 
removing the extra *.
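i.e. the comment in question would simply keep its kernel-doc shape and gain
the extra '*' (illustrative only, reusing the copy_from_paddr() comment from
the quoted patch):

```c
/**
 * copy_from_paddr - copy data from a physical address
 * @dst: destination virtual address
 * @paddr: source physical address
 * @len: length to copy
 */
```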

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-22  8:18       ` Julien Grall
@ 2023-08-22  8:48         ` Henry Wang
  2023-08-22  8:55           ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-22  8:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Jan Beulich, Paul Durrant,
	Roger Pau Monné,
	Stefano Stabellini, Bertrand Marquis, Wei Chen

Hi Julien,

> On Aug 22, 2023, at 16:18, Julien Grall <julien@xen.org> wrote:
> 
> On 22/08/2023 03:11, Henry Wang wrote:
>> Hi Julien,
> 
> Hi,
> 
>>> On Aug 22, 2023, at 05:34, Julien Grall <julien@xen.org> wrote:
>>> 
>>> Hi,
>>> 
>>> On 14/08/2023 05:25, Henry Wang wrote:
>>>> From: Penny Zheng <Penny.Zheng@arm.com>
>>>> SMMU subsystem is only supported in MMU system, so we make it dependent
>>>> on CONFIG_HAS_MMU.
>>> 
>>> "only supported" as in it doesn't work with Xen or the HW is not supporting it?
>> I think currently there are no hardware combination of MPU + SMMU, but
>> theoretically I think this is a valid combination since SMMU supports the linear
>> mapping. So would below reword looks good to you:
>> “Currently the hardware use case of connecting SMMU to MPU system is rarely
>> seen, so we make CONFIG_ARM_SMMU and CONFIG_ARM_SMMU_V3
>> dependent on CONFIG_MMU."
> 
> I read this as there might be MPU system with SMMU in development. What you want to explain is why we can't let the developper to select the SMMU driver on an MPU system.
> 
> From my understanding this is because the drivers are expecting to use the page-tables and the concept doesn't exist in the MPU system. So the drivers are not ready for the MPU.

I agree.

> 
>>> 
>>> Also, I am not entirely convinced that anything in passthrough would properly work with MPU. At least none of the IOMMU drivers are. So I would consider to completely disable HAS_PASSTHROUGH.
>> I agree, do you think adding below addition diff to this patch makes sense to you?
> 
> I think it should be a replacement because none of the IOMMU drivers works for the MPU. So I would rather prefer if we avoid adding "depends on" on all of them (even if there are only 3) for now.

I am a bit confused. I read your above explanation of the commit message as you
agreeing with the idea of making the SMMU driver not selectable on an MPU system. If we
replace this with “select HAS_PASSTHROUGH if MMU”, the SMMU driver will be
selectable on an MPU system.

But...


> 
>> If so I guess would also need to mention this in commit message.
> 
> Did you confirm that Xen MPU still build without HAS_PASSTHROUGH?

…this is a good catch. No, MPU is not buildable without HAS_PASSTHROUGH; we
will get:

```
In file included from ./include/xen/sched.h:12,
from ./include/xen/iocap.h:10,
from arch/arm/p2m.c:3:
arch/arm/p2m.c: In function 'p2m_set_way_flush':
./include/xen/iommu.h:366:40: error: 'struct domain' has no member named 'iommu'
366 | #define dom_iommu(d) (&(d)->iommu)
| ^~
./include/xen/iommu.h:371:36: note: in expansion of macro 'dom_iommu'
371 | #define iommu_use_hap_pt(d) (dom_iommu(d)->hap_pt_share)
| ^~~~~~~~~
arch/arm/p2m.c:617:10: note: in expansion of macro 'iommu_use_hap_pt'
617 | if ( iommu_use_hap_pt(current->domain) )
| ^~~~~~~~~~~~~~~~
In file included from ./include/xen/sched.h:12,
from ./arch/arm/include/asm/grant_table.h:7,
from ./include/xen/grant_table.h:29,
from arch/arm/domain.c:4:
arch/arm/domain.c: In function 'arch_domain_creation_finished':
./include/xen/iommu.h:366:40: error: 'struct domain' has no member named 'iommu'
366 | #define dom_iommu(d) (&(d)->iommu)
| ^~
./include/xen/iommu.h:371:36: note: in expansion of macro 'dom_iommu'
371 | #define iommu_use_hap_pt(d) (dom_iommu(d)->hap_pt_share)
| ^~~~~~~~~
arch/arm/domain.c:880:11: note: in expansion of macro 'iommu_use_hap_pt'
880 | if ( !iommu_use_hap_pt(d) )
| ^~~~~~~~~~~~~~~~
CC arch/arm/shutdown.o
CC lib/strlen.o
CC arch/arm/smp.o
CC arch/arm/smpboot.o
CC common/xmalloc_tlsf.o
CC lib/strncasecmp.o
make[2]: *** [Rules.mk:247: arch/arm/p2m.o] Error 1
make[2]: *** Waiting for unfinished jobs....
```
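
Just to make sure we are both looking at the same thing, the two shapes I can
think of would roughly be the below. This is an untested sketch, not the actual
patch: prompts, option placement and the exact dependency names are only
illustrative.

```
# Option A (what this patch does): keep HAS_PASSTHROUGH, but make the
# SMMU drivers unselectable on non-MMU builds
# (sketch of the drivers' Kconfig entries, other lines elided):
config ARM_SMMU
	bool "ARM SMMUv1/v2 driver"
	depends on MMU

config ARM_SMMU_V3
	bool "ARM SMMUv3 driver"
	depends on MMU

# Option B (the replacement): only enable the whole IOMMU infrastructure
# on MMU systems (sketch of xen/arch/arm/Kconfig):
config ARM
	def_bool y
	select HAS_PASSTHROUGH if MMU
```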

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall




* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-22  8:42       ` Julien Grall
@ 2023-08-22  8:54         ` Henry Wang
  2023-08-23  0:10         ` Stefano Stabellini
  1 sibling, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-22  8:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Bertrand Marquis, Xen-devel, Penny Zheng,
	Volodymyr Babchuk, Wei Chen

Hi Julien,

> On Aug 22, 2023, at 16:42, Julien Grall <julien@xen.org> wrote:
> 
> On 22/08/2023 08:44, Henry Wang wrote:
>>> On Aug 22, 2023, at 05:31, Julien Grall <julien@xen.org> wrote:
>>> 
>>> Hi,
>>> 
>>> On 14/08/2023 05:25, Henry Wang wrote:
>>>> From: Penny Zheng <Penny.Zheng@arm.com>
>>>> Function copy_from_paddr() is defined in asm/setup.h, so it is better
>>>> to be implemented in setup.c.
>>> 
>>> I don't agree with this reasoning. We used setup.h to declare prototypes for functions that are outside of setup.c.
>>> 
>>> Looking at this series overall, it is unclear to me what the difference is between mmu/mm.c and mmu/setup.c. I know this is technically not a new problem but as we split the code, it would be a good opportunity to have a better split.
>>> 
>>> For instance, we have setup_mm() defined in setup.c but setup_pagetables() in mm.c. Both are technically related to memory management. So having them in separate files is a bit odd.
>> Skimming through the comments, it looks like we have a common problem
>> in patch 7, 9, 10, 12 about how to move/split the code. So instead of having
>> the discussion in each patch, I would like to propose that we can discuss all
>> of these in a common place here.
> 
> +1.
> 
>>> 
>>> I also don't like the idea of having again a massive mm.c files. So maybe we need a split like:
>>>  * File 1: Boot CPU0 MM bringup (mmu/setup.c)
>>>  * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
>>>  * File 3: Page tables update. (mmu/pt.c)
>>> 
>>> Ideally file 1 should contain only init code/data so it can be marked as .init. So the static pagetables may want to be defined in mmu/pt.c.
>> So based on Julien’s suggestion, Penny and I worked a bit on the current
>> functions in “arch/arm/mm.c” and we would like to propose below split
>> scheme, would you please comment on if below makes sense to you,
>> thanks!
>> """
>> static void __init __maybe_unused build_assertions()      -> arch/arm/mm.c
> 
> All the existing build assertions seems to be MMU specific. So shouldn't they be moved to mmu/mm.c.
> 
>> static lpae_t *xen_map_table()                            -> mmu/pt.c
>> static void xen_unmap_table()                             -> mmu/pt.c
>> void dump_pt_walk()                                       -> mmu/pt.c
>> void dump_hyp_walk()                                      -> mmu/pt.c
>> lpae_t mfn_to_xen_entry()                                 -> mmu/pt.c
>> void set_fixmap()                                         -> mmu/pt.c
>> void clear_fixmap()                                       -> mmu/pt.c
>> void flush_page_to_ram()                                  -> arch/arm/mm.c?
> 
> I think it should stay in arch/arm/mm.c because you will probably need to clean a page even on MPU systems.
> 
>> lpae_t pte_of_xenaddr()                                   -> mmu/pt.c
>> void * __init early_fdt_map()                             -> mmu/setup.c
>> void __init remove_early_mappings()                       -> mmu/setup.c
>> static void xen_pt_enforce_wnx()                          -> mmu/pt.c, export it
> 
> AFAIU, it would be called from smpboot.c and setup.c. For the former, the caller is mmu_init_secondary_cpu() which I think can be folded in head.S.
> 
> If we do that, then xen_pt_enforce_wnx() can be moved in setup.c and doesn't need to be exported.
> 
>> static void clear_table()                                 -> mmu/smpboot.c
>> void __init setup_pagetables()                            -> mmu/setup.c
>> static void clear_boot_pagetables()                       -> mmu/smpboot.c
>> int init_secondary_pagetables()                           -> mmu/smpboot.c
>> void mmu_init_secondary_cpu()                             -> mmu/smpboot.c
>> void __init setup_directmap_mappings()                    -> mmu/setup.c
>> void __init setup_frametable_mappings()                   -> mmu/setup.c
>> void *__init arch_vmap_virt_end()                         -> arch/arm/mm.c or mmu/setup.c?
> 
> AFAIU, the VMAP will not be used for MPU systems. So this would want to go in mmu/. I am ok with setup.c.
> 
>> void *ioremap_attr()                                      -> mmu/pt.c
>> void *ioremap()                                           -> mmu/pt.c
>> static int create_xen_table()                             -> mmu/pt.c
>> static int xen_pt_next_level()                            -> mmu/pt.c
>> static bool xen_pt_check_entry()                          -> mmu/pt.c
>> static int xen_pt_update_entry()                          -> mmu/pt.c
>> static int xen_pt_mapping_level()                         -> mmu/pt.c
>> static unsigned int xen_pt_check_contig()                 -> mmu/pt.c
>> static int xen_pt_update()                                -> mmu/pt.c
>> int map_pages_to_xen()                                    -> mmu/pt.c
>> int __init populate_pt_range()                            -> mmu/pt.c
>> int destroy_xen_mappings()                                -> mmu/pt.c
>> int modify_xen_mappings()                                 -> mmu/pt.c
>> void free_init_memory()                                   -> mmu/setup.c
>> void arch_dump_shared_mem_info()                          -> arch/arm/mm.c
>> int steal_page()                                          -> arch/arm/mm.c
>> int page_is_ram_type()                                    -> arch/arm/mm.c
>> unsigned long domain_get_maximum_gpfn()                   -> arch/arm/mm.c
>> void share_xen_page_with_guest()                          -> arch/arm/mm.c
>> int xenmem_add_to_physmap_one()                           -> arch/arm/mm.c
>> long arch_memory_op()                                     -> arch/arm/mm.c
>> static struct domain *page_get_owner_and_nr_reference()   -> arch/arm/mm.c
>> struct domain *page_get_owner_and_reference()             -> arch/arm/mm.c
>> void put_page_nr()                                        -> arch/arm/mm.c
>> void put_page()                                           -> arch/arm/mm.c
>> bool get_page_nr()                                        -> arch/arm/mm.c
>> bool get_page()                                           -> arch/arm/mm.c
>> int get_page_type()                                       -> arch/arm/mm.c
>> void put_page_type()                                      -> arch/arm/mm.c
>> int create_grant_host_mapping()                           -> arch/arm/mm.c
>> int replace_grant_host_mapping()                          -> arch/arm/mm.c
>> bool is_iomem_page()                                      -> arch/arm/mm.c
>> void clear_and_clean_page()                               -> arch/arm/mm.c
>> unsigned long get_upper_mfn_bound()                       -> arch/arm/mm.c
>> """
> 
> The placement of all the ones I didn't comment on look fine to me. It would be good to have a second opinion from either Bertrand or Stefano before you start moving the functions.

I agree with all your above comments, and sure, I can wait for a few days for the other
maintainers' opinions before starting to move the code.

> 
>>>> diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
>>>> index e05cca3f86..889ada6b87 100644
>>>> --- a/xen/arch/arm/mmu/setup.c
>>>> +++ b/xen/arch/arm/mmu/setup.c
>>>> @@ -329,6 +329,33 @@ void __init setup_mm(void)
>>>>  }
>>>>  #endif
>>>>  +/*
>>> 
>>> Why did the second '*' disappear?
>> According to the CODING_STYLE, we should use something like this:
>> /*
>>  * Example, multi-line comment block.
>>  *
>>  * Note beginning and end markers on separate lines and leading '*'.
>>  */
>> Instead of "/**” in the beginning. But I think you made a point, I need
>> to mention that I took the opportunity to fix the comment style in commit
>> message.
> 
> We have started to use /** which I think is for Doxygen (see the PDX series). So I think the CODING_STYLE needs to be updated rather than removing the extra *.

Ahhh thanks for the context, I totally forgot the Doxygen…Then I will use
"/**” in v6.

Kind regards,
Henry

> 
> Cheers,
> 
> -- 
> Julien Grall



* Re: [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU
  2023-08-22  8:48         ` Henry Wang
@ 2023-08-22  8:55           ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-22  8:55 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Penny Zheng, Jan Beulich, Paul Durrant,
	Roger Pau Monné,
	Stefano Stabellini, Bertrand Marquis, Wei Chen



On 22/08/2023 09:48, Henry Wang wrote:
> Hi Julien,
> 
>> On Aug 22, 2023, at 16:18, Julien Grall <julien@xen.org> wrote:
>>
>> On 22/08/2023 03:11, Henry Wang wrote:
>>> Hi Julien,
>>
>> Hi,
>>
>>>> On Aug 22, 2023, at 05:34, Julien Grall <julien@xen.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 14/08/2023 05:25, Henry Wang wrote:
>>>>> From: Penny Zheng <Penny.Zheng@arm.com>
>>>>> SMMU subsystem is only supported in MMU system, so we make it dependent
>>>>> on CONFIG_HAS_MMU.
>>>>
>>>> "only supported" as in it doesn't work with Xen or the HW is not supporting it?
>>> I think currently there is no hardware combination of MPU + SMMU, but
>>> theoretically I think this is a valid combination since SMMU supports the linear
>>> mapping. So would the below reword look good to you:
>>> “Currently the hardware use case of connecting SMMU to MPU system is rarely
>>> seen, so we make CONFIG_ARM_SMMU and CONFIG_ARM_SMMU_V3
>>> dependent on CONFIG_MMU."
>>
>> I read this as there might be an MPU system with SMMU in development. What you want to explain is why we can't let the developer select the SMMU driver on an MPU system.
>>
>>  From my understanding this is because the drivers are expecting to use the page-tables and the concept doesn't exist in the MPU system. So the drivers are not ready for the MPU.
> 
> I agree.
> 
>>
>>>>
>>>> Also, I am not entirely convinced that anything in passthrough would properly work with MPU. At least none of the IOMMU drivers are. So I would consider to completely disable HAS_PASSTHROUGH.
>>> I agree, do you think adding below addition diff to this patch makes sense to you?
>>
>> I think it should be a replacement because none of the IOMMU drivers works for the MPU. So I would rather prefer if we avoid adding "depends on" on all of them (even if there are only 3) for now.
> 
> I am a bit confused: I read your above explanation of the commit message as you
> agreeing with the idea of making the SMMU driver not selectable on an MPU system.

No. I would rather prefer if HAS_PASSTHROUGH is completely disabled 
because I doubt the IOMMU infrastructure will work without any change 
for the MPU.

So it sounds incorrect to enable HAS_PASSTHROUGH until one of you 
confirm it works.

Cheers,

-- 
Julien Grall



* Re: [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-14  4:25 ` [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h} Henry Wang
@ 2023-08-22 18:01   ` Julien Grall
  2023-08-23  1:41     ` Henry Wang
  2023-08-23  3:47     ` Penny Zheng
  0 siblings, 2 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-22 18:01 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen

Hi Henry,

On 14/08/2023 05:25, Henry Wang wrote:
> From: Penny Zheng <penny.zheng@arm.com>
> 
> Current P2M implementation is designed for MMU system only.
> We move the MMU-specific codes into mmu/p2m.c, and only keep generic
> codes in p2m.c, like VMID allocator, etc. We also move MMU-specific
> definitions and declarations to mmu/p2m.h, such as p2m_tlb_flush_sync().
> Also expose previously static functions p2m_vmid_allocator_init(),
> p2m_alloc_vmid(), __p2m_set_entry() and setup_virt_paging_one()

Looking at the code, it seems that you need to keep exposing
__p2m_set_entry() because of p2m_relinquish_mapping(). However, it is
not clear how this code is supposed to work for the MPU. So should we
instead move p2m_relinquish_mapping() to mmu/p2m.c?

Other functions which doesn't seem to make sense in p2m.c are:
   * p2m_clear_root_pages(): AFAIU there is no concept of root in the 
MPU. This also means that we possibly want to move out anything specific 
to the MMU from 'struct p2m'. This could be done separately.
   * p2m_flush_vm(): This is built with MMU in mind as we can use the 
page-table to track access pages. You don't have that fine granularity 
in the MPU.

> for futher MPU usage.

typo: futher/further/

> 
> With the code movement, global variable max_vmid is used in multiple
> files instead of a single file (and will be used in MPU P2M
> implementation), declare it in the header and remove the "static" of
> this variable.
> 
> Add #ifdef CONFIG_HAS_MMU to p2m_write_unlock() since future MPU
> work does not need p2m_tlb_flush_sync().

And there are no specific barrier required? Overall, I am not sure I 
like the #ifdef rather than providing a stub helper.

If the others like the idea of the #ifdef, I think a comment on top would
be necessary to explain why there is nothing to do in the context of the
MPU.

Cheers,

-- 
Julien Grall



* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-22  8:42       ` Julien Grall
  2023-08-22  8:54         ` Henry Wang
@ 2023-08-23  0:10         ` Stefano Stabellini
  2023-08-23  0:58           ` Henry Wang
  2023-08-23 17:59           ` Julien Grall
  1 sibling, 2 replies; 57+ messages in thread
From: Stefano Stabellini @ 2023-08-23  0:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Henry Wang, Stefano Stabellini, Bertrand Marquis, Xen-devel,
	Penny Zheng, Volodymyr Babchuk, Wei Chen

On Tue, 22 Aug 2023, Julien Grall wrote:
> > > I also don't like the idea of having again a massive mm.c files. So maybe
> > > we need a split like:
> > >   * File 1: Boot CPU0 MM bringup (mmu/setup.c)
> > >   * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
> > >   * File 3: Page tables update. (mmu/pt.c)
> > > 
> > > Ideally file 1 should contain only init code/data so it can be marked as
> > > .init. So the static pagetables may want to be defined in mmu/pt.c.
> > 
> > So based on Julien’s suggestion, Penny and I worked a bit on the current
> > functions in “arch/arm/mm.c” and we would like to propose below split
> > scheme, would you please comment on if below makes sense to you,
> > thanks!
> > 
> > """
> > static void __init __maybe_unused build_assertions()      -> arch/arm/mm.c
> 
> All the existing build assertions seems to be MMU specific. So shouldn't they
> be moved to mmu/mm.c.
> 
> > static lpae_t *xen_map_table()                            -> mmu/pt.c
> > static void xen_unmap_table()                             -> mmu/pt.c
> > void dump_pt_walk()                                       -> mmu/pt.c
> > void dump_hyp_walk()                                      -> mmu/pt.c
> > lpae_t mfn_to_xen_entry()                                 -> mmu/pt.c
> > void set_fixmap()                                         -> mmu/pt.c
> > void clear_fixmap()                                       -> mmu/pt.c
> > void flush_page_to_ram()                                  -> arch/arm/mm.c?
> 
> I think it should stay in arch/arm/mm.c because you will probably need to
> clean a page even on MPU systems.

I take you are referring to flush_page_to_ram() only, and not the other
functions above


> > lpae_t pte_of_xenaddr()                                   -> mmu/pt.c
> > void * __init early_fdt_map()                             -> mmu/setup.c
> > void __init remove_early_mappings()                       -> mmu/setup.c
> > static void xen_pt_enforce_wnx()                          -> mmu/pt.c,
> > export it
> 
> AFAIU, it would be called from smpboot.c and setup.c. For the former, the
> caller is mmu_init_secondary_cpu() which I think can be folded in head.S.
> 
> If we do that, then xen_pt_enforce_wnx() can be moved in setup.c and doesn't
> need to be exported.
> 
> > static void clear_table()                                 -> mmu/smpboot.c
> > void __init setup_pagetables()                            -> mmu/setup.c
> > static void clear_boot_pagetables()                       -> mmu/smpboot.c

Why clear_table() and clear_boot_pagetables() in mmu/smpboot.c rather
than mmu/setup.c? It is OK either way as it doesn't seem to make much of a
difference, but I am curious.


> > int init_secondary_pagetables()                           -> mmu/smpboot.c
> > void mmu_init_secondary_cpu()                             -> mmu/smpboot.c
> > void __init setup_directmap_mappings()                    -> mmu/setup.c
> > void __init setup_frametable_mappings()                   -> mmu/setup.c
> > void *__init arch_vmap_virt_end()                         -> arch/arm/mm.c
> > or mmu/setup.c?
> 
> AFAIU, the VMAP will not be used for MPU systems. So this would want to go in
> mmu/. I am ok with setup.c.
> 
> > void *ioremap_attr()                                      -> mmu/pt.c
> > void *ioremap()                                           -> mmu/pt.c
> > static int create_xen_table()                             -> mmu/pt.c
> > static int xen_pt_next_level()                            -> mmu/pt.c
> > static bool xen_pt_check_entry()                          -> mmu/pt.c
> > static int xen_pt_update_entry()                          -> mmu/pt.c
> > static int xen_pt_mapping_level()                         -> mmu/pt.c
> > static unsigned int xen_pt_check_contig()                 -> mmu/pt.c
> > static int xen_pt_update()                                -> mmu/pt.c
> > int map_pages_to_xen()                                    -> mmu/pt.c
> > int __init populate_pt_range()                            -> mmu/pt.c
> > int destroy_xen_mappings()                                -> mmu/pt.c
> > int modify_xen_mappings()                                 -> mmu/pt.c
> > void free_init_memory()                                   -> mmu/setup.c
> > void arch_dump_shared_mem_info()                          -> arch/arm/mm.c
> > int steal_page()                                          -> arch/arm/mm.c
> > int page_is_ram_type()                                    -> arch/arm/mm.c
> > unsigned long domain_get_maximum_gpfn()                   -> arch/arm/mm.c
> > void share_xen_page_with_guest()                          -> arch/arm/mm.c
> > int xenmem_add_to_physmap_one()                           -> arch/arm/mm.c
> > long arch_memory_op()                                     -> arch/arm/mm.c
> > static struct domain *page_get_owner_and_nr_reference()   -> arch/arm/mm.c
> > struct domain *page_get_owner_and_reference()             -> arch/arm/mm.c
> > void put_page_nr()                                        -> arch/arm/mm.c
> > void put_page()                                           -> arch/arm/mm.c
> > bool get_page_nr()                                        -> arch/arm/mm.c
> > bool get_page()                                           -> arch/arm/mm.c
> > int get_page_type()                                       -> arch/arm/mm.c
> > void put_page_type()                                      -> arch/arm/mm.c
> > int create_grant_host_mapping()                           -> arch/arm/mm.c
> > int replace_grant_host_mapping()                          -> arch/arm/mm.c
> > bool is_iomem_page()                                      -> arch/arm/mm.c
> > void clear_and_clean_page()                               -> arch/arm/mm.c
> > unsigned long get_upper_mfn_bound()                       -> arch/arm/mm.c
> > """
> 
> The placement of all the ones I didn't comment on look fine to me. It would be
> good to have a second opinion from either Bertrand or Stefano before you start
> moving the functions.

It looks good in principle and it also looks like a great clean up. It
is not always super clear to me the distinction between mmu/pt.c,
mmu/smpboot.c and mmu/setup.c but it doesn't seem important. The
distinction between mmu/* and arch/arm/mm.c is clear and looks fine to
me.

I am looking forward to this!


* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-23  0:10         ` Stefano Stabellini
@ 2023-08-23  0:58           ` Henry Wang
  2023-08-23 17:59           ` Julien Grall
  1 sibling, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-23  0:58 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Bertrand Marquis, Xen-devel, Penny Zheng,
	Volodymyr Babchuk, Wei Chen

Hi Stefano,

> On Aug 23, 2023, at 08:10, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Tue, 22 Aug 2023, Julien Grall wrote:
>>>> I also don't like the idea of having again a massive mm.c files. So maybe
>>>> we need a split like:
>>>>  * File 1: Boot CPU0 MM bringup (mmu/setup.c)
>>>>  * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
>>>>  * File 3: Page tables update. (mmu/pt.c)
>>>> 
>>>> Ideally file 1 should contain only init code/data so it can be marked as
>>>> .init. So the static pagetables may want to be defined in mmu/pt.c.
>>> 
>>> So based on Julien’s suggestion, Penny and I worked a bit on the current
>>> functions in “arch/arm/mm.c” and we would like to propose below split
>>> scheme, would you please comment on if below makes sense to you,
>>> thanks!
>>> 
>>> """
>>> static void __init __maybe_unused build_assertions()      -> arch/arm/mm.c
>> 
>> All the existing build assertions seems to be MMU specific. So shouldn't they
>> be moved to mmu/mm.c.
>> 
>>> static lpae_t *xen_map_table()                            -> mmu/pt.c
>>> static void xen_unmap_table()                             -> mmu/pt.c
>>> void dump_pt_walk()                                       -> mmu/pt.c
>>> void dump_hyp_walk()                                      -> mmu/pt.c
>>> lpae_t mfn_to_xen_entry()                                 -> mmu/pt.c
>>> void set_fixmap()                                         -> mmu/pt.c
>>> void clear_fixmap()                                       -> mmu/pt.c
>>> void flush_page_to_ram()                                  -> arch/arm/mm.c?
>> 
>> I think it should stay in arch/arm/mm.c because you will probably need to
>> clean a page even on MPU systems.
> 
> I take you are referring to flush_page_to_ram() only, and not the other
> functions above
> 
> 
>>> lpae_t pte_of_xenaddr()                                   -> mmu/pt.c
>>> void * __init early_fdt_map()                             -> mmu/setup.c
>>> void __init remove_early_mappings()                       -> mmu/setup.c
>>> static void xen_pt_enforce_wnx()                          -> mmu/pt.c,
>>> export it
>> 
>> AFAIU, it would be called from smpboot.c and setup.c. For the former, the
>> caller is mmu_init_secondary_cpu() which I think can be folded in head.S.
>> 
>> If we do that, then xen_pt_enforce_wnx() can be moved in setup.c and doesn't
>> need to be exported.
>> 
>>> static void clear_table()                                 -> mmu/smpboot.c
>>> void __init setup_pagetables()                            -> mmu/setup.c
>>> static void clear_boot_pagetables()                       -> mmu/smpboot.c
> 
> Why clear_table() and clear_boot_pagetables() in mmu/smpboot.c rather
> than mmu/setup.c? It is OK either way as it doesn't seem to make much of a
> difference, but I am curious.

I think it is because of the below call sequence:
init_secondary_mm() -> clear_boot_pagetables() -> clear_table()

We have the suggestion from Julien that:
"File 2: Secondary CPUs MM bringup (mmu/smpboot.c)”, and hence
I suggested smpboot.c.

>> 
>> The placement of all the ones I didn't comment on look fine to me. It would be
>> good to have a second opinion from either Bertrand or Stefano before you start
>> moving the functions.
> 
> It looks good in principle and it also looks like a great clean up. It
> is not always super clear to me the distinction between mmu/pt.c,
> mmu/smpboot.c and mmu/setup.c but it doesn't seem important. The
> distinction between mmu/* and arch/arm/mm.c is clear and looks fine to
> me.

I generally followed:
"* File 1: Boot CPU0 MM bringup (mmu/setup.c)
 * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
 * File 3: Page tables update. (mmu/pt.c)"

> 
> I am looking forward to this!

+1, will post the updated patch soon!

Kind regards,
Henry




* Re: [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-22 18:01   ` Julien Grall
@ 2023-08-23  1:41     ` Henry Wang
  2023-08-23 18:08       ` Julien Grall
  2023-08-23  3:47     ` Penny Zheng
  1 sibling, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-23  1:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen

Hi Julien,

> On Aug 23, 2023, at 02:01, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Penny Zheng <penny.zheng@arm.com>
>> Current P2M implementation is designed for MMU system only.
>> We move the MMU-specific codes into mmu/p2m.c, and only keep generic
>> codes in p2m.c, like VMID allocator, etc. We also move MMU-specific
>> definitions and declarations to mmu/p2m.h, such as p2m_tlb_flush_sync().
>> Also expose previously static functions p2m_vmid_allocator_init(),
>> p2m_alloc_vmid(), __p2m_set_entry() and setup_virt_paging_one()
> 
> Looking at the code, it seems that you need to keep exposing __p2m_set_entry() because of p2m_relinquish_mapping(). However, it is not clear how this code is supposed to work for the MPU. So should we instead move p2m_relinquish_mapping() to mmu/p2m.c?

Sure, I will try that.

> 
> Other functions which doesn't seem to make sense in p2m.c are:
>  * p2m_clear_root_pages(): AFAIU there is no concept of root in the MPU. This also means that we possibly want to move out anything specific to the MMU from 'struct p2m'. This could be done separately.
>  * p2m_flush_vm(): This is built with MMU in mind as we can use the page-table to track access pages. You don't have that fine granularity in the MPU.

I agree, will also move these to mmu/ in v6.

> 
>> for futher MPU usage.
> 
> typo: futher/further/

Thanks, will fix.

> 
>> With the code movement, global variable max_vmid is used in multiple
>> files instead of a single file (and will be used in MPU P2M
>> implementation), declare it in the header and remove the "static" of
>> this variable.
>> Add #ifdef CONFIG_HAS_MMU to p2m_write_unlock() since future MPU
>> work does not need p2m_tlb_flush_sync().
> 
> And there are no specific barrier required? Overall, I am not sure I like the #ifdef rather than providing a stub helper.

I think for MPU systems we don’t need to flush the TLB, hence the #ifdef. Do you mean we should
provide a stub helper of p2m_tlb_flush_sync() for MPU? If so I think maybe the naming of this stub
helper is not really ideal?

Kind regards,
Henry

> 
> If the others like the idea of the #ifdef, I think a comment on top would be necessary to explain why there is nothing to do in the context of the MPU.
> 
> Cheers,
> 
> -- 
> Julien Grall



* Re: [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-22 18:01   ` Julien Grall
  2023-08-23  1:41     ` Henry Wang
@ 2023-08-23  3:47     ` Penny Zheng
  2023-08-23 21:39       ` Julien Grall
  1 sibling, 1 reply; 57+ messages in thread
From: Penny Zheng @ 2023-08-23  3:47 UTC (permalink / raw)
  To: Julien Grall, Henry Wang, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Wei Chen

Hi Julien

On 2023/8/23 02:01, Julien Grall wrote:
> Hi Henry,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Penny Zheng <penny.zheng@arm.com>
>>
>> Current P2M implementation is designed for MMU system only.
>> We move the MMU-specific codes into mmu/p2m.c, and only keep generic
>> codes in p2m.c, like VMID allocator, etc. We also move MMU-specific
>> definitions and declarations to mmu/p2m.h, such as p2m_tlb_flush_sync().
>> Also expose previously static functions p2m_vmid_allocator_init(),
>> p2m_alloc_vmid(), __p2m_set_entry() and setup_virt_paging_one()
> 
> Looking at the code, it seems that you need to keep exposing
> __p2m_set_entry() because of p2m_relinquish_mapping(). However, it is
> not clear how this code is supposed to work for the MPU. So should we
> instead move p2m_relinquish_mapping() to mmu/p2m.c?
> 

p2m->root stores the per-domain P2M table, which is actually an array of MPU
regions (pr_t). So maybe we should relinquish the mapping region by region,
instead of page by page. Nevertheless, p2m_relinquish_mapping() shall be
moved to mmu/p2m.c and we need an MPU version of it.

> Other functions which doesn't seem to make sense in p2m.c are:
>    * p2m_clear_root_pages(): AFAIU there is no concept of root in the 
> MPU. This also means that we possibly want to move out anything specific 
> to the MMU from 'struct p2m'. This could be done separately.

Current MPU implementation is re-using p2m->root to store P2M table.
Do you agree on this, or should we create a new variable, like 
p2m->mpu_table, for MPU P2M table only?
How we treat p2m_clear_root_pages also decides how we destroy P2M at 
domain destruction stage, current MPU flow is as follows:
```
     PROGRESS(mapping):
         ret = relinquish_p2m_mapping(d);
         if ( ret )
             return ret;

     PROGRESS(p2m_root):
         /*
          * We are about to free the intermediate page-tables, so clear the
          * root to prevent any walk to use them.
          */
         p2m_clear_root_pages(&d->arch.p2m);

#ifdef CONFIG_HAS_PAGING_MEMPOOL
     PROGRESS(p2m):
         ret = p2m_teardown(d);
         if ( ret )
             return ret;

     PROGRESS(p2m_pool):
         ret = p2m_teardown_allocation(d);
         if( ret )
             return ret;
#endif
```
We guard MMU-specific intermediate page-tables destroy with the new 
Kconfig CONFIG_HAS_PAGING_MEMPOOL, check 
https://gitlab.com/xen-project/people/henryw/xen/-/commit/7ff6d351e65f43fc34ea694adea0e030a51b1576
for more details.

If we destroy MPU P2M table in relinquish_p2m_mapping, region by region,
we could provide empty stub for p2m_clear_root_pages, and move it to 
mmu/p2m.c
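
Something like the below for the MPU side (sketch only):

```
/* Hypothetical stub, if the real implementation moves to mmu/p2m.c: */
void p2m_clear_root_pages(struct p2m_domain *p2m)
{
    /*
     * Nothing to do: there are no intermediate page-tables on an MPU
     * system, the region array is torn down region by region in
     * relinquish_p2m_mapping().
     */
}
```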

>    * p2m_flush_vm(): This is built with MMU in mind as we can use the 
> page-table to track access pages. You don't have that fine granularity 
> in the MPU.
> 

Understood

>> for futher MPU usage.
> 
> typo: futher/further/
> 
>>
>> With the code movement, global variable max_vmid is used in multiple
>> files instead of a single file (and will be used in MPU P2M
>> implementation), declare it in the header and remove the "static" of
>> this variable.
>>
>> Add #ifdef CONFIG_HAS_MMU to p2m_write_unlock() since future MPU
>> work does not need p2m_tlb_flush_sync().
> 
> And there are no specific barrier required? Overall, I am not sure I 
> like the #ifdef rather than providing a stub helper.
> 
> If the others like the idea of the #ifdef, I think a comment on top would
> be necessary to explain why there is nothing to do in the context of the
> MPU.
> 
> Cheers,
> 



* Re: [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c
  2023-08-23  0:10         ` Stefano Stabellini
  2023-08-23  0:58           ` Henry Wang
@ 2023-08-23 17:59           ` Julien Grall
  1 sibling, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-23 17:59 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Henry Wang, Bertrand Marquis, Xen-devel, Penny Zheng,
	Volodymyr Babchuk, Wei Chen

Hi Stefano,

On 23/08/2023 01:10, Stefano Stabellini wrote:
> On Tue, 22 Aug 2023, Julien Grall wrote:
>>>> I also don't like the idea of having again a massive mm.c files. So maybe
>>>> we need a split like:
>>>>    * File 1: Boot CPU0 MM bringup (mmu/setup.c)
>>>>    * File 2: Secondary CPUs MM bringup (mmu/smpboot.c)
>>>>    * File 3: Page tables update. (mmu/pt.c)
>>>>
>>>> Ideally file 1 should contain only init code/data so it can be marked as
>>>> .init. So the static pagetables may want to be defined in mmu/pt.c.
>>>
>>> So based on Julien’s suggestion, Penny and I worked a bit on the current
>>> functions in “arch/arm/mm.c” and we would like to propose below split
>>> scheme, would you please comment on if below makes sense to you,
>>> thanks!
>>>
>>> """
>>> static void __init __maybe_unused build_assertions()      -> arch/arm/mm.c
>>
>> All the existing build assertions seems to be MMU specific. So shouldn't they
>> be moved to mmu/mm.c.
>>
>>> static lpae_t *xen_map_table()                            -> mmu/pt.c
>>> static void xen_unmap_table()                             -> mmu/pt.c
>>> void dump_pt_walk()                                       -> mmu/pt.c
>>> void dump_hyp_walk()                                      -> mmu/pt.c
>>> lpae_t mfn_to_xen_entry()                                 -> mmu/pt.c
>>> void set_fixmap()                                         -> mmu/pt.c
>>> void clear_fixmap()                                       -> mmu/pt.c
>>> void flush_page_to_ram()                                  -> arch/arm/mm.c?
>>
>> I think it should stay in arch/arm/mm.c because you will probably need to
>> clean a page even on MPU systems.
> 
> I take you are referring to flush_page_to_ram() only, and not the other
> functions above

That's correct.

> 
> 
>>> lpae_t pte_of_xenaddr()                                   -> mmu/pt.c
>>> void * __init early_fdt_map()                             -> mmu/setup.c
>>> void __init remove_early_mappings()                       -> mmu/setup.c
>>> static void xen_pt_enforce_wnx()                          -> mmu/pt.c,
>>> export it
>>
>> AFAIU, it would be called from smpboot.c and setup.c. For the former, the
>> caller is mmu_init_secondary_cpu() which I think can be folded in head.S.
>>
>> If we do that, then xen_pt_enforce_wnx() can be moved in setup.c and doesn't
>> need to be exported.
>>
>>> static void clear_table()                                 -> mmu/smpboot.c
>>> void __init setup_pagetables()                            -> mmu/setup.c
>>> static void clear_boot_pagetables()                       -> mmu/smpboot.c
> 
> Why clear_table() and clear_boot_pagetables() in mmu/smpboot.c rather
> than mmu/setup.c? It is OK either way as it doesn't seem to make much of a
> difference, but I am curious.

I initially wondered the same. But then I didn't comment because 
clear_boot_pagetables() is only used in order to prepare the page-tables 
for the secondary boot CPU.

Also, even if today we don't support CPU hotplug, there is nothing
preventing us from doing so (in fact there was a series on the ML for that).
This means clear_table() & co would need to be outside of the init 
section and we would need to remove the check that all 
variables/functions defined in setup.c are residing in it.

Saying that, we should not need to clear the boot page tables anymore 
for arm64 because secondary CPUs will directly jump to the runtime 
page-tables. So the code could be cleaned up a bit. Anyway, this is not 
a request for this series and could be done afterwards by someone else.

Cheers,

-- 
Julien Grall



* Re: [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-23  1:41     ` Henry Wang
@ 2023-08-23 18:08       ` Julien Grall
  2023-08-23 21:17         ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-23 18:08 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen



On 23/08/2023 02:41, Henry Wang wrote:
> Hi Julien,

Hi Henry,

>> On Aug 23, 2023, at 02:01, Julien Grall <julien@xen.org> wrote:
>>
>> Hi Henry,
>>
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> From: Penny Zheng <penny.zheng@arm.com>
>>> Current P2M implementation is designed for MMU system only.
>>> We move the MMU-specific codes into mmu/p2m.c, and only keep generic
>>> codes in p2m.c, like VMID allocator, etc. We also move MMU-specific
>>> definitions and declarations to mmu/p2m.h, such as p2m_tlb_flush_sync().
>>> Also expose previously static functions p2m_vmid_allocator_init(),
>>> p2m_alloc_vmid(), __p2m_set_entry() and setup_virt_paging_one()
>>
>> Looking at the code, it seems that you need to keep exposing __p2m_set_entry() because of p2m_relinquish_mapping(). However, it is not clear how this code is supposed to work for the MPU. So should we instead move p2m_relinquish_mapping() to mmu/p2m.c?
> 
> Sure, I will try that.
> 
>>
>> Other functions which doesn't seem to make sense in p2m.c are:
>>   * p2m_clear_root_pages(): AFAIU there is no concept of root in the MPU. This also means that we possibly want to move out anything specific to the MMU from 'struct p2m'. This could be done separately.
>>   * p2m_flush_vm(): This is built with MMU in mind as we can use the page-table to track access pages. You don't have that fine granularity in the MPU.
> 
> I agree, will also move these to mmu/ in v6.
> 
>>
>>> for futher MPU usage.
>>
>> typo: futher/further/
> 
> Thanks, will fix.
> 
>>
>>> With the code movement, global variable max_vmid is used in multiple
>>> files instead of a single file (and will be used in MPU P2M
>>> implementation), declare it in the header and remove the "static" of
>>> this variable.
>>> Add #ifdef CONFIG_HAS_MMU to p2m_write_unlock() since future MPU
>>> work does not need p2m_tlb_flush_sync().
>>
>> And there are no specific barrier required? Overall, I am not sure I like the #ifdef rather than providing a stub helper.
> 
> I think for MPU systems we don’t need to flush the TLB, hence the #ifdef.

I wasn't necessarily thinking about a TLB flush but instead a DSB/DMB.
At least for the MMU case, I think that in theory we need a DSB when
there is no TLB flush, to ensure new entries in the page-tables are seen
before p2m_write_unlock() completes.

So far we are getting away with it because write_pte() always has a barrier
after. But at some point, I would like to remove it as this is a massive
hammer.

> Do you mean we should
> provide a stub helper of p2m_tlb_flush_sync() for MPU? If so I think maybe the naming of this stub
> helper is not really ideal?

See above. I am trying to understand the expected sequence when updating 
the MPU tables. Are you going to add barriers after every update to the 
entry?

Having a helper would also be a good place to explain why some 
synchronization is not needed. I am not sure about a name though.

Maybe p2m_sync() and p2m_force_sync()?
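
To make it more concrete, the shape I have in mind is roughly the below. This
is a very rough sketch: the helper name and whether the MPU side needs any
barrier at all are exactly the open questions above.

```
/* mmu/p2m.h: the MMU flavour keeps today's behaviour. */
static inline void p2m_force_sync(struct p2m_domain *p2m)
{
    p2m_tlb_flush_sync(p2m);
}

/*
 * mpu/p2m.h would instead carry an (almost) empty flavour:
 *
 * static inline void p2m_force_sync(struct p2m_domain *p2m)
 * {
 *     // No TLBs to invalidate for the protection regions; add a dsb()
 *     // here if the updates must be visible before the lock is released.
 * }
 */

/* p2m.c then stays common and free of #ifdef: */
void p2m_write_unlock(struct p2m_domain *p2m)
{
    p2m_force_sync(p2m);
    write_unlock(&p2m->lock);
}
```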

Cheers,

-- 
Julien Grall



* Re: [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-23 18:08       ` Julien Grall
@ 2023-08-23 21:17         ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-23 21:17 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Wei Chen

Hi,

On 23/08/2023 19:08, Julien Grall wrote:
>>>> With the code movement, global variable max_vmid is used in multiple
>>>> files instead of a single file (and will be used in MPU P2M
>>>> implementation), declare it in the header and remove the "static" of
>>>> this variable.
>>>> Add #ifdef CONFIG_HAS_MMU to p2m_write_unlock() since future MPU
>>>> work does not need p2m_tlb_flush_sync().
>>>
>>> And there are no specific barrier required? Overall, I am not sure I 
>>> like the #ifdef rather than providing a stub helper.
>>
>> I think for MPU systems we don’t need to flush the TLB, hence the #ifdef.
> 
> I wasn't necessarily thinking about a TLB flush but instead a DSB/DMB.
> At least for the MMU case, I think that in theory we need a DSB when
> there is no TLB flush, to ensure new entries in the page-tables are seen
> before p2m_write_unlock() completes.
> 
> So far we are getting away with it because write_pte() always has a barrier
> after. But at some point, I would like to remove it as this is a massive
> hammer.
> 
>> Do you mean we should
>> provide a stub helper of p2m_tlb_flush_sync() for MPU? If so I think 
>> maybe the naming of this stub
>> helper is not really ideal?
> 
> See above. I am trying to understand the expected sequence when updating 
> the MPU tables. Are you going to add barriers after every update to the 
> entry?

I have looked at your branch mpu_v5. In theory the P2M can be modified 
at any point of the life-cycle of the domain. This means that another 
pCPU may have the regions loaded.

If that's the case then you would likely want to ensure the entries are 
synchronized. The easiest way would be to pause/unpause the domain when 
taking/releasing the lock. There might be other ways, but this indicates 
that a helper would be needed for the MPU.

That said, it is not clear to me if there is any use-case where you 
would want to update the P2M at runtime. If none are known, then you 
might be able to simply return an error if the P2M is modified after the 
domain was created.
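
For instance (a hypothetical MPU-side setter, only to illustrate the idea; 
the name and signature are made up):

```
static int mpu_p2m_set_entry(struct p2m_domain *p2m, gfn_t sgfn,
                             unsigned long nr, mfn_t smfn,
                             p2m_type_t t, p2m_access_t a)
{
    /*
     * d->creation_finished becomes true once the toolstack has finished
     * building the domain. After that point the region array may be
     * live on other pCPUs, so refuse the update rather than trying to
     * synchronise it.
     */
    if ( p2m->domain->creation_finished )
        return -EOPNOTSUPP;

    /* ... update the protection region array here ... */

    return 0;
}
```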

Cheers,

-- 
Julien Grall



* Re: [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  2023-08-23  3:47     ` Penny Zheng
@ 2023-08-23 21:39       ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2023-08-23 21:39 UTC (permalink / raw)
  To: Penny Zheng, Henry Wang, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Wei Chen



On 23/08/2023 04:47, Penny Zheng wrote:
> Hi Julien

Hi Penny,

> On 2023/8/23 02:01, Julien Grall wrote:
>> Hi Henry,
>>
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> From: Penny Zheng <penny.zheng@arm.com>
>>>
>>> Current P2M implementation is designed for MMU system only.
>>> We move the MMU-specific codes into mmu/p2m.c, and only keep generic
>>> codes in p2m.c, like VMID allocator, etc. We also move MMU-specific
>>> definitions and declarations to mmu/p2m.h, such as p2m_tlb_flush_sync().
>>> Also expose previously static functions p2m_vmid_allocator_init(),
>>> p2m_alloc_vmid(), __p2m_set_entry() and setup_virt_paging_one()
>>
>> Looking at the code, it seems that you need to keep exposing
>> __p2m_set_entry() because of p2m_relinquish_mapping(). However, it is
>> not clear how this code is supposed to work for the MPU. So should we
>> instead move p2m_relinquish_mapping() to mmu/p2m.c?
>>
> 
> p2m->root stores the per-domain P2M table, which is actually an array of MPU
> regions (pr_t). So maybe we should relinquish the mapping region by region,
> instead of page by page. Nevertheless, p2m_relinquish_mapping() shall be
> moved to mmu/p2m.c and we need an MPU version of it.
> 
>> Other functions which doesn't seem to make sense in p2m.c are:
>>    * p2m_clear_root_pages(): AFAIU there is no concept of root in the 
>> MPU. This also means that we possibly want to move out anything 
>> specific to the MMU from 'struct p2m'. This could be done separately.
> 
> Current MPU implementation is re-using p2m->root to store P2M table.
> Do you agree on this, or should we create a new variable, like 
> p2m->mpu_table, for MPU P2M table only?

I have looked at the mpu_v5 tree to check how you use p2m->root. The 
common pattern is:

table = (pr_t *)page_to_virt(p2m->root);

AFAICT the "root" is always mapped in your case. So it is a bit 
inefficient to have to convert from a page to virt every single time you 
need to modify the P2M.

Therefore it would be better to introduce something more specific to the 
MPU. Rather than introduce a field in the structure p2m, it would be 
better if we introduce a structure that would be defined in 
{mmu,mpu}/p2m.h, would include their specific fields and be embedded in 
struct p2m.
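
Roughly like the below (the names are invented, this is only to illustrate 
the shape, not a concrete proposal):

```
/* mmu/p2m.h would define the MMU flavour of the structure ... */
struct p2m_impl {
    struct page_info *root;      /* pages backing the root page-tables */
};

/* ... while mpu/p2m.h would define the MPU flavour under the same name:
 *
 * struct p2m_impl {
 *     pr_t *regions;            // always-mapped array of protection regions
 *     unsigned int nr_regions;
 * };
 */

/* asm/p2m.h then includes the right header and embeds it: */
struct p2m_domain {
    /* common fields (lock, VMID, pool, ...) elided */
    struct p2m_impl impl;
};
```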

> How we treat p2m_clear_root_pages also decides how we destroy P2M at 
> domain destruction stage, current MPU flow is as follows:
> ```
>      PROGRESS(mapping):
>          ret = relinquish_p2m_mapping(d);
>          if ( ret )
>              return ret;
> 
>      PROGRESS(p2m_root):
>          /*
>           * We are about to free the intermediate page-tables, so clear the
>           * root to prevent any walk to use them.
>           */
>          p2m_clear_root_pages(&d->arch.p2m);
> 
> #ifdef CONFIG_HAS_PAGING_MEMPOOL
>      PROGRESS(p2m):
>          ret = p2m_teardown(d);
>          if ( ret )
>              return ret;
> 
>      PROGRESS(p2m_pool):
>          ret = p2m_teardown_allocation(d);
>          if( ret )
>              return ret;
> #endif
> ```
> We guard MMU-specific intermediate page-tables destroy with the new 
> Kconfig CONFIG_HAS_PAGING_MEMPOOL, check 
> https://gitlab.com/xen-project/people/henryw/xen/-/commit/7ff6d351e65f43fc34ea694adea0e030a51b1576
> for more details.
> 
> If we destroy MPU P2M table in relinquish_p2m_mapping, region by region,

It would seem to be better to handle it region by region. Note that you 
will still need to handle preemption and that may happen in the middle 
of the region (in particular if they are big).
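
Something along these lines (a hand-wavy sketch: the impl.* fields and 
p2m_free_region() are invented just for illustration):

```
int relinquish_p2m_mapping(struct domain *d)
{
    struct p2m_domain *p2m = p2m_get_hostp2m(d);
    unsigned int i;

    p2m_write_lock(p2m);

    for ( i = p2m->impl.next_to_relinquish; i < p2m->impl.nr_regions; i++ )
    {
        /*
         * Drop the references taken on the pages backing this region.
         * For very large regions this helper may itself need to check
         * for preemption part-way through.
         */
        p2m_free_region(p2m, &p2m->impl.regions[i]);

        if ( hypercall_preempt_check() )
        {
            p2m->impl.next_to_relinquish = i + 1;   /* resume point */
            p2m_write_unlock(p2m);
            return -ERESTART;
        }
    }

    p2m_write_unlock(p2m);
    return 0;
}
```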

Cheers,

-- 
Julien Grall



* Re: [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability
  2023-08-21 18:32   ` Julien Grall
@ 2023-08-24  9:46     ` Henry Wang
  2023-08-24 10:19       ` Julien Grall
  0 siblings, 1 reply; 57+ messages in thread
From: Henry Wang @ 2023-08-24  9:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk

Hi Julien,

> On Aug 22, 2023, at 02:32, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 14/08/2023 05:25, Henry Wang wrote:
>> From: Penny Zheng <penny.zheng@arm.com>
>> As preparation for MPU support, which will use some variables/functions
>> for both MMU and MPU system, We rename the affected variable/function
>> to more generic names:
>> - init_ttbr -> init_mm,
> 
> You moved init_ttbr to mm/mmu.c. So why does this need to be renamed?
> 
> And if you really planned to use it for the MPU code. Then init_ttbr should not have been moved.

You are correct. I think we need to use the “init_mm” for MPU SMP support,
so I would not move this variable in v6.

> 
>> - mmu_init_secondary_cpu() -> mm_init_secondary_cpu()
>> - init_secondary_pagetables() -> init_secondary_mm()
> 
> The original names were not great, but the new ones are a lot more confusing as they seem to just be a reshuffle of words.
> 
> mm_init_secondary_cpu() is only setting the WxN bit. For the MMU, I think it can be done much earlier. Do you have anything to add in it? If not, then I would consider to get rid of it.

I’ve got rid of mmu_init_secondary_cpu() function in my local v6 as it is now
folded to the assembly code.

> 
> For init_secondary_mm(), I would renamed it to prepare_secondary_mm().

Sure, thanks for the name suggestion.

> 
>>  -void update_identity_mapping(bool enable)
>> +static void update_identity_mapping(bool enable)
> 
> Why not simply renaming this function to update_mm_mapping()? But...
> 
>>  {
>>      paddr_t id_addr = virt_to_maddr(_start);
>>      int rc;
>> @@ -120,6 +120,11 @@ void update_identity_mapping(bool enable)
>>      BUG_ON(rc);
>>  }
>>  +void update_mm_mapping(bool enable)
> 
> ... the new name it quite confusing. What is the mapping it is referring to?

So I checked the MPU SMP support code and now I think I understand the reason
why update_mm_mapping() was introduced:

In the future we eventually need to support SMP for MMU systems, which means
we need to call arch_cpu_up() and arch_cpu_up_finish(). These two functions call
update_identity_mapping(). Since we believe "identity mapping" is a MMU concept,
we changed this to generic name "mm mapping” as arch_cpu_up() and 
arch_cpu_up_finish() is a shared path between MMU and MPU.

But I think MPU won’t use update_mm_mapping() function at all, so I wonder do
you prefer creating an empty stub update_identity_mapping() for MPU? Or use #ifdef
as suggested in your previous email...

> 
> If you don't want to keep update_identity_mapping(), then I would consider to simply wrap...

…here and ...

> 
>> +{
>> +    update_identity_mapping(enable);
>> +}
>> +
>>  extern void switch_ttbr_id(uint64_t ttbr);
>>    typedef void (switch_ttbr_fn)(uint64_t ttbr);
>> @@ -131,7 +136,7 @@ void __init switch_ttbr(uint64_t ttbr)
>>      lpae_t pte;
>>        /* Enable the identity mapping in the boot page tables */
>> -    update_identity_mapping(true);
>> +    update_mm_mapping(true);
>>        /* Enable the identity mapping in the runtime page tables */
>>      pte = pte_of_xenaddr((vaddr_t)switch_ttbr_id);
>> @@ -148,7 +153,7 @@ void __init switch_ttbr(uint64_t ttbr)
>>       * Note it is not necessary to disable it in the boot page tables
>>       * because they are not going to be used by this CPU anymore.
>>       */
>> -    update_identity_mapping(false);
>> +    update_mm_mapping(false);
>>  }
>>    /*
>> diff --git a/xen/arch/arm/arm64/smpboot.c b/xen/arch/arm/arm64/smpboot.c
>> index 9637f42469..2b1d086a1e 100644
>> --- a/xen/arch/arm/arm64/smpboot.c
>> +++ b/xen/arch/arm/arm64/smpboot.c
>> @@ -111,18 +111,18 @@ int arch_cpu_up(int cpu)
>>      if ( !smp_enable_ops[cpu].prepare_cpu )
>>          return -ENODEV;
>>  -    update_identity_mapping(true);
>> +    update_mm_mapping(true);
> 
> ... with #ifdef CONFIG_MMU here...
> 
>>        rc = smp_enable_ops[cpu].prepare_cpu(cpu);
>>      if ( rc )
>> -        update_identity_mapping(false);
>> +        update_mm_mapping(false);
> 
> ... here and ...
> 
> 
>>        return rc;
>>  }
>>    void arch_cpu_up_finish(void)
>>  {
>> -    update_identity_mapping(false);
>> +    update_mm_mapping(false);
> 
> ... here.

…all here?
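
I.e. something like the below (untested, just to check I read the suggestion 
correctly):

```
int arch_cpu_up(int cpu)
{
    int rc;

    if ( !smp_enable_ops[cpu].prepare_cpu )
        return -ENODEV;

#ifdef CONFIG_MMU
    update_identity_mapping(true);
#endif

    rc = smp_enable_ops[cpu].prepare_cpu(cpu);

#ifdef CONFIG_MMU
    if ( rc )
        update_identity_mapping(false);
#endif

    return rc;
}

void arch_cpu_up_finish(void)
{
#ifdef CONFIG_MMU
    update_identity_mapping(false);
#endif
}
```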

Kind regards,
Henry




* Re: [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability
  2023-08-24  9:46     ` Henry Wang
@ 2023-08-24 10:19       ` Julien Grall
  2023-08-24 11:18         ` Henry Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2023-08-24 10:19 UTC (permalink / raw)
  To: Henry Wang
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk

Hi Henry,

On 24/08/2023 10:46, Henry Wang wrote:
>> On Aug 22, 2023, at 02:32, Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> On 14/08/2023 05:25, Henry Wang wrote:
>>> From: Penny Zheng <penny.zheng@arm.com>
>>> As preparation for MPU support, which will use some variables/functions
>>> for both MMU and MPU system, We rename the affected variable/function
>>> to more generic names:
>>> - init_ttbr -> init_mm,
>>
>> You moved init_ttbr to mm/mmu.c. So why does this need to be renamed?
>>
>> And if you really planned to use it for the MPU code. Then init_ttbr should not have been moved.
> 
> You are correct. I think we need to use the “init_mm” for MPU SMP support,
> so I would not move this variable in v6.

Your branch mpu_v5 doesn't seem to contain any use. But I would expect 
that the common is never going to use the variable. Also, at the moment 
it is 64-bit but I don't see why it would be necessary to be bigger than 
32-bit on 32-bit.

So I think it would be preferable if init_ttbr is moved to mm/mmu.c. You
can then introduce an MPU-specific variable.

In general, only variables that will be used by common code should be 
defined in common. All the rest should be defined in their specific 
directory.

>>> - mmu_init_secondary_cpu() -> mm_init_secondary_cpu()
>>> - init_secondary_pagetables() -> init_secondary_mm()
>>
>> The original names were not great, but the new ones are a lot more confusing as they seem to just be a reshuffle of words.
>>
>> mm_init_secondary_cpu() is only setting the WxN bit. For the MMU, I think it can be done much earlier. Do you have anything to add in it? If not, then I would consider to get rid of it.
> 
> I’ve got rid of mmu_init_secondary_cpu() function in my local v6 as it is now
> folded to the assembly code.
> 
>>
>> For init_secondary_mm(), I would renamed it to prepare_secondary_mm().
> 
> Sure, thanks for the name suggestion.
> 
>>
>>>   -void update_identity_mapping(bool enable)
>>> +static void update_identity_mapping(bool enable)
>>
>> Why not simply renaming this function to update_mm_mapping()? But...
>>
>>>   {
>>>       paddr_t id_addr = virt_to_maddr(_start);
>>>       int rc;
>>> @@ -120,6 +120,11 @@ void update_identity_mapping(bool enable)
>>>       BUG_ON(rc);
>>>   }
>>>   +void update_mm_mapping(bool enable)
>>
>> ... the new name it quite confusing. What is the mapping it is referring to?
> 
> So I checked the MPU SMP support code and now I think I understand the reason
> why update_mm_mapping() was introduced:
> 
> In the future we eventually need to support SMP for MMU systems, which means
> we need to call arch_cpu_up() and arch_cpu_up_finish(). These two functions call
> update_identity_mapping(). Since we believe "identity mapping" is a MMU concept,
> we changed this to generic name "mm mapping” as arch_cpu_up() and
> arch_cpu_up_finish() is a shared path between MMU and MPU.

The function is today called "update_identity_mapping()" because this is 
what the implementation does on arm64. But the goal of this function is 
to make sure that any mapping necessary for bringing up secondary CPUs is 
present.

So if you don't need similar work for the MPU then I would go with...

> 
> But I think MPU won’t use update_mm_mapping() function at all, so I wonder do
> you prefer creating an empty stub update_identity_mapping() for MPU? Or use #ifdef
> as suggested in your previous email...


... #ifdef. I have some preliminary work where the call to 
update_identity_mapping() may end up being moved somewhere else, as the 
page-tables would not be shared between pCPUs anymore. So the logic will 
need some rework (see [1]).

Cheers,


[1] https://lore.kernel.org/all/20221216114853.8227-21-julien@xen.org/

-- 
Julien Grall



* Re: [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability
  2023-08-24 10:19       ` Julien Grall
@ 2023-08-24 11:18         ` Henry Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Henry Wang @ 2023-08-24 11:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Penny Zheng, Stefano Stabellini, Bertrand Marquis,
	Wei Chen, Volodymyr Babchuk

Hi Julien,

> On Aug 24, 2023, at 18:19, Julien Grall <julien@xen.org> wrote:
> 
> Hi Henry,
> 
> On 24/08/2023 10:46, Henry Wang wrote:
>>> On Aug 22, 2023, at 02:32, Julien Grall <julien@xen.org> wrote:
>>> 
>>> Hi,
>>> 
>>> On 14/08/2023 05:25, Henry Wang wrote:
>>>> From: Penny Zheng <penny.zheng@arm.com>
>>>> As preparation for MPU support, which will use some variables/functions
>>>> for both MMU and MPU system, We rename the affected variable/function
>>>> to more generic names:
>>>> - init_ttbr -> init_mm,
>>> 
>>> You moved init_ttbr to mm/mmu.c. So why does this need to be renamed?
>>> 
>>> And if you really planned to use it for the MPU code. Then init_ttbr should not have been moved.
>> You are correct. I think we need to use the “init_mm” for MPU SMP support,
>> so I would not move this variable in v6.
> 
> Your branch mpu_v5 doesn't seem to contain any use of it. But I would expect that the common code is never going to use the variable. Also, at the moment it is 64-bit, but I don't see why it would need to be bigger than 32-bit on 32-bit architectures.
> 
> So I think it would be preferable if init_ttbr is moved to mm/mmu.c. You
> can then introduce an MPU-specific variable.

Sounds good to me.

> 
> In general, only variables that will be used by the common code should be defined in common code. All the rest should be defined in their specific directory.

Got it :))
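
So, to check my understanding, something along these lines (assuming the
mmu/ layout this series introduces; the MPU variable name is only a
placeholder to illustrate the split, not something from the patches):

/* mmu/mm.c: only the MMU boot path consumes the boot TTBR, so the
 * variable stays with the MMU code rather than in common code. */
uint64_t init_ttbr;

/* A future mpu/ file would then define its own equivalent (for example a
 * placeholder such as "init_mpu_region_info") instead of reusing a
 * generically named variable in common code. */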

> 
>>>> - mmu_init_secondary_cpu() -> mm_init_secondary_cpu()
>>>> - init_secondary_pagetables() -> init_secondary_mm()
>>> 
>>> The original names were not great, but the new ones are a lot more confusing as they seem to just be a reshuffle of words.
>>> 
>>> mm_init_secondary_cpu() only sets the WXN bit. For the MMU, I think it can be done much earlier. Do you have anything to add to it? If not, then I would consider getting rid of it.
>> I’ve got rid of mmu_init_secondary_cpu() function in my local v6 as it is now
>> folded to the assembly code.
>>> 
>>> For init_secondary_mm(), I would rename it to prepare_secondary_mm().
>> Sure, thanks for the name suggestion.
>>> 
>>>>  -void update_identity_mapping(bool enable)
>>>> +static void update_identity_mapping(bool enable)
>>> 
>>> Why not simply rename this function to update_mm_mapping()? But...
>>> 
>>>>  {
>>>>      paddr_t id_addr = virt_to_maddr(_start);
>>>>      int rc;
>>>> @@ -120,6 +120,11 @@ void update_identity_mapping(bool enable)
>>>>      BUG_ON(rc);
>>>>  }
>>>>  +void update_mm_mapping(bool enable)
>>> 
>>> ... the new name is quite confusing. What mapping is it referring to?
>> So I checked the MPU SMP support code and now I think I understand the reason
>> why update_mm_mapping() was introduced:
>> In the future we will eventually need to support SMP for MPU systems, which means
>> we need to call arch_cpu_up() and arch_cpu_up_finish(). These two functions call
>> update_identity_mapping(). Since we believe "identity mapping" is an MMU concept,
>> we changed this to the generic name "mm mapping" as arch_cpu_up() and
>> arch_cpu_up_finish() are a shared path between MMU and MPU.
> 
> The function is today called "update_identity_mapping()" because that is what the implementation does on arm64. But the goal of this function is to make sure that any mappings necessary for bringing up secondary CPUs are present.
> 
> So if you don't need similar work for the MPU, then I would go with...
> 
>> But I think the MPU won't use the update_mm_mapping() function at all, so I wonder whether
>> you prefer creating an empty stub update_identity_mapping() for the MPU, or using #ifdef
>> as suggested in your previous email...
> 
> 
> ... #ifdef. I have some preliminary work where the call to update_identity_mapping() may end up being moved somewhere else, as the page-tables would not be shared between pCPUs anymore. So the logic will need some rework (see [1]).

Thanks for sharing this info. I will drop the modification to update_identity_mapping()
from this patch.

Kind regards,
Henry


> 
> Cheers,
> 
> 
> [1] https://lore.kernel.org/all/20221216114853.8227-21-julien@xen.org/
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2023-08-24 11:19 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-14  4:25 [PATCH v5 00/13] xen/arm: Split MMU code as the prepration of MPU work Henry Wang
2023-08-14  4:25 ` [PATCH v5 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm() Henry Wang
2023-08-21  8:33   ` Julien Grall
2023-08-21  8:40     ` Henry Wang
2023-08-14  4:25 ` [PATCH v5 02/13] xen/arm: Introduce CONFIG_MMU Kconfig option Henry Wang
2023-08-21  8:43   ` Julien Grall
2023-08-21  8:45     ` Henry Wang
2023-08-14  4:25 ` [PATCH v5 03/13] xen/arm64: prepare for moving MMU related code from head.S Henry Wang
2023-08-21  8:44   ` Julien Grall
2023-08-21  8:54     ` Henry Wang
2023-08-21 17:04       ` Julien Grall
2023-08-14  4:25 ` [PATCH v5 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S Henry Wang
2023-08-21  9:18   ` Julien Grall
2023-08-21  9:29     ` Henry Wang
2023-08-21 10:16       ` Julien Grall
2023-08-21 10:21         ` Henry Wang
2023-08-14  4:25 ` [PATCH v5 05/13] xen/arm: Move MMU related definitions from config.h to mmu/layout.h Henry Wang
2023-08-14  4:25 ` [PATCH v5 06/13] xen/arm64: Fold setup_fixmap() to create_page_tables() Henry Wang
2023-08-21  9:22   ` Julien Grall
2023-08-21  9:30     ` Henry Wang
2023-08-14  4:25 ` [PATCH v5 07/13] xen/arm: Extract MMU-specific code Henry Wang
2023-08-21 17:57   ` Julien Grall
2023-08-14  4:25 ` [PATCH v5 08/13] xen/arm: Fold pmap and fixmap into MMU system Henry Wang
2023-08-21 18:14   ` Julien Grall
2023-08-22  2:42     ` Henry Wang
2023-08-22  8:06       ` Julien Grall
2023-08-22  8:08         ` Henry Wang
2023-08-14  4:25 ` [PATCH v5 09/13] xen/arm: mm: Use generic variable/function names for extendability Henry Wang
2023-08-21 18:32   ` Julien Grall
2023-08-24  9:46     ` Henry Wang
2023-08-24 10:19       ` Julien Grall
2023-08-24 11:18         ` Henry Wang
2023-08-14  4:25 ` [PATCH v5 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c Henry Wang
2023-08-21 21:19   ` Julien Grall
2023-08-14  4:25 ` [PATCH v5 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h} Henry Wang
2023-08-22 18:01   ` Julien Grall
2023-08-23  1:41     ` Henry Wang
2023-08-23 18:08       ` Julien Grall
2023-08-23 21:17         ` Julien Grall
2023-08-23  3:47     ` Penny Zheng
2023-08-23 21:39       ` Julien Grall
2023-08-14  4:25 ` [PATCH v5 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c Henry Wang
2023-08-21 21:31   ` Julien Grall
2023-08-22  7:44     ` Henry Wang
2023-08-22  8:42       ` Julien Grall
2023-08-22  8:54         ` Henry Wang
2023-08-23  0:10         ` Stefano Stabellini
2023-08-23  0:58           ` Henry Wang
2023-08-23 17:59           ` Julien Grall
2023-08-14  4:25 ` [PATCH v5 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU Henry Wang
2023-08-14  7:08   ` Jan Beulich
2023-08-14  7:10     ` Henry Wang
2023-08-21 21:34   ` Julien Grall
2023-08-22  2:11     ` Henry Wang
2023-08-22  8:18       ` Julien Grall
2023-08-22  8:48         ` Henry Wang
2023-08-22  8:55           ` Julien Grall
