* [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1
@ 2023-01-13  5:28 Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h Penny Zheng
                   ` (42 more replies)
  0 siblings, 43 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Roger Pau Monné

The Armv8-R architecture profile was designed to support use cases
that have a high sensitivity to deterministic execution (e.g. fuel
injection, brake control, drive trains, motor control, etc.).

Arm announced Armv8-R in 2013; it is the latest generation of the Arm
architecture targeted at the Real-time profile. It introduces
virtualization at the highest security level while retaining the
Protected Memory System Architecture (PMSA) based on a Memory
Protection Unit (MPU). In 2020, Arm announced Cortex-R82, the first
64-bit Arm Cortex-R processor, based on Armv8-R64. The latest
Armv8-R64 documentation can be found at [1]. The main features of the
Armv8-R64 architecture are:
  - An exception model that is compatible with the Armv8-A model.
  - Virtualization with support for guest operating systems.
  - PMSA virtualization using MPUs in EL2 (see the sketch after this
    list).
  - Support for the 64-bit A64 instruction set.
  - Support for up to 48-bit physical addressing.
  - Support for three Exception Levels (ELs):
        - Secure EL2 - the highest privilege
        - Secure EL1 - rich OS (MMU) or RTOS (MPU)
        - Secure EL0 - application workloads
  - Support for only a single Security state - Secure.
  - The MPU in EL1 & EL2 is configurable; the MMU in EL1 is configurable.
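
To make "PMSA virtualization using MPUs in EL2" more concrete, below
is a minimal, illustrative sketch (not taken from this series) of how
a single EL2 MPU region is programmed on Armv8-R64: a region is
selected through PRSELR_EL2 and then described by a base register
(PRBAR_EL2) and a limit register (PRLAR_EL2). The attribute handling
here is a simplified placeholder; see DDI 0600 [1] for the real field
layouts, and note that older assemblers may only accept the
S3_4_C6_Cn_m encodings of these registers.

    /* Hypothetical helper, for illustration only. */
    #include <stdint.h>

    #define PRLAR_EL2_EN    0x1UL  /* region enable bit in PRLAR_EL2 */

    static inline void write_el2_mpu_region(uint64_t sel, uint64_t base,
                                            uint64_t limit, uint64_t attr)
    {
        /* Select which EL2 MPU region to program. */
        asm volatile("msr prselr_el2, %0\n\tisb" : : "r" (sel));
        /* 64-byte aligned base address plus access attributes. */
        asm volatile("msr prbar_el2, %0" : : "r" (base | attr));
        /* Inclusive, 64-byte aligned limit plus the enable bit. */
        asm volatile("msr prlar_el2, %0\n\tisb" : : "r" (limit | PRLAR_EL2_EN));
    }

Unlike the MMU case there is no page-table walk: the hardware checks
every access against a small, fixed set of such regions, which is what
makes the MPU attractive for deterministic real-time use cases.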

This patch series implements the Armv8-R64 MPU support for Xen, based
on the discussion in
"Proposal for Porting Xen to Armv8-R64 - DraftC" [2].

We will implement the Armv8-R64 and MPU support in three stages:
1. Boot Xen itself to the idle thread, without creating any guests.
2. Support booting MPU and MMU domains on Armv8-R64 Xen.
3. Support SMP and other advanced Xen features on Armv8-R64.

As guest support is not implemented in this part#1 series of the MPU
support, Xen cannot create any guest at boot time. So in this patch
series, we provide an extra DNM (do-not-merge) commit at the end for
users to test booting Xen to idle on an MPU system.

We have split these patches into several parts; this series is
part#1. v1 can be found in [3], and the full PoC in [4]. More
software for Armv8-R64 can be found in [5].

[1] https://developer.arm.com/documentation/ddi0600/latest
[2] https://lists.xenproject.org/archives/html/xen-devel/2022-05/msg00643.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg00289.html
[4] https://gitlab.com/xen-project/people/weic/xen/-/tree/integration/mpu_v2
[5] https://armv8r64-refstack.docs.arm.com/en/v5.0/

Penny Zheng (28):
  xen/mpu: build up start-of-day Xen MPU memory region map
  xen/mpu: introduce helpers for MPU enablement
  xen/mpu: introduce unified function setup_early_uart to map early UART
  xen/arm64: head: Jump to the runtime mapping in enable_mm()
  xen/arm: introduce setup_mm_mappings
  xen/mpu: plump virt/maddr/mfn convertion in MPU system
  xen/mpu: introduce helper access_protection_region
  xen/mpu: populate a new region in Xen MPU mapping table
  xen/mpu: plump early_fdt_map in MPU systems
  xen/arm: move MMU-specific setup_mm to setup_mmu.c
  xen/mpu: implement MPU version of setup_mm in setup_mpu.c
  xen/mpu: initialize frametable in MPU system
  xen/mpu: introduce "mpu,xxx-memory-section"
  xen/mpu: map MPU guest memory section before static memory
    initialization
  xen/mpu: destroy an existing entry in Xen MPU memory mapping table
  xen/mpu: map device memory resource in MPU system
  xen/mpu: map boot module section in MPU system
  xen/mpu: introduce mpu_memory_section_contains for address range check
  xen/mpu: disable VMAP sub-system for MPU systems
  xen/mpu: disable FIXMAP in MPU system
  xen/mpu: implement MPU version of ioremap_xxx
  xen/mpu: free init memory in MPU system
  xen/mpu: destroy boot modules and early FDT mapping in MPU system
  xen/mpu: Use secure hypervisor timer for AArch64v8R
  xen/mpu: move MMU specific P2M code to p2m_mmu.c
  xen/mpu: implement setup_virt_paging for MPU system
  xen/mpu: re-order xen_mpumap in arch_init_finialize
  xen/mpu: add Kconfig option to enable Armv8-R AArch64 support

Wei Chen (13):
  xen/arm: remove xen_phys_start and xenheap_phys_end from config.h
  xen/arm: make ARM_EFI selectable for Arm64
  xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA
  xen/arm: add an option to define Xen start address for Armv8-R
  xen/arm64: prepare for moving MMU related code from head.S
  xen/arm64: move MMU related code from head.S to head_mmu.S
  xen/arm64: add .text.idmap for Xen identity map sections
  xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv-8R
  xen/arm: decouple copy_from_paddr with FIXMAP
  xen/arm: split MMU and MPU config files from config.h
  xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
  xen/arm: check mapping status and attributes for MPU copy_from_paddr
  xen/mpu: make Xen boot to idle on MPU systems(DNM)

 xen/arch/arm/Kconfig                      |   44 +-
 xen/arch/arm/Makefile                     |   17 +-
 xen/arch/arm/arm64/Makefile               |    5 +
 xen/arch/arm/arm64/head.S                 |  466 +----
 xen/arch/arm/arm64/head_mmu.S             |  399 ++++
 xen/arch/arm/arm64/head_mpu.S             |  394 ++++
 xen/arch/arm/bootfdt.c                    |   13 +-
 xen/arch/arm/domain_build.c               |    4 +
 xen/arch/arm/include/asm/alternative.h    |   15 +
 xen/arch/arm/include/asm/arm64/flushtlb.h |   25 +
 xen/arch/arm/include/asm/arm64/macros.h   |   51 +
 xen/arch/arm/include/asm/arm64/mpu.h      |  174 ++
 xen/arch/arm/include/asm/arm64/sysregs.h  |   77 +
 xen/arch/arm/include/asm/config.h         |  105 +-
 xen/arch/arm/include/asm/config_mmu.h     |  112 +
 xen/arch/arm/include/asm/config_mpu.h     |   25 +
 xen/arch/arm/include/asm/cpregs.h         |    4 +-
 xen/arch/arm/include/asm/cpuerrata.h      |   12 +
 xen/arch/arm/include/asm/cpufeature.h     |    7 +
 xen/arch/arm/include/asm/early_printk.h   |   13 +
 xen/arch/arm/include/asm/fixmap.h         |   28 +-
 xen/arch/arm/include/asm/flushtlb.h       |   22 +
 xen/arch/arm/include/asm/mm.h             |   78 +-
 xen/arch/arm/include/asm/mm_mmu.h         |   77 +
 xen/arch/arm/include/asm/mm_mpu.h         |   54 +
 xen/arch/arm/include/asm/p2m.h            |   27 +-
 xen/arch/arm/include/asm/p2m_mmu.h        |   28 +
 xen/arch/arm/include/asm/processor.h      |   13 +
 xen/arch/arm/include/asm/setup.h          |   39 +
 xen/arch/arm/kernel.c                     |   31 +-
 xen/arch/arm/mm.c                         | 1340 +-----------
 xen/arch/arm/mm_mmu.c                     | 1376 +++++++++++++
 xen/arch/arm/mm_mpu.c                     | 1056 ++++++++++
 xen/arch/arm/p2m.c                        | 2282 +--------------------
 xen/arch/arm/p2m_mmu.c                    | 2257 ++++++++++++++++++++
 xen/arch/arm/p2m_mpu.c                    |  274 +++
 xen/arch/arm/platforms/Kconfig            |   16 +-
 xen/arch/arm/setup.c                      |  394 +---
 xen/arch/arm/setup_mmu.c                  |  391 ++++
 xen/arch/arm/setup_mpu.c                  |  208 ++
 xen/arch/arm/time.c                       |   14 +-
 xen/arch/arm/traps.c                      |    2 +
 xen/arch/arm/xen.lds.S                    |   10 +-
 xen/arch/x86/Kconfig                      |    1 +
 xen/common/Kconfig                        |    6 +
 xen/common/Makefile                       |    2 +-
 xen/include/xen/vmap.h                    |   93 +-
 47 files changed, 7500 insertions(+), 4581 deletions(-)
 create mode 100644 xen/arch/arm/arm64/head_mmu.S
 create mode 100644 xen/arch/arm/arm64/head_mpu.S
 create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
 create mode 100644 xen/arch/arm/include/asm/config_mmu.h
 create mode 100644 xen/arch/arm/include/asm/config_mpu.h
 create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
 create mode 100644 xen/arch/arm/include/asm/mm_mpu.h
 create mode 100644 xen/arch/arm/include/asm/p2m_mmu.h
 create mode 100644 xen/arch/arm/mm_mmu.c
 create mode 100644 xen/arch/arm/mm_mpu.c
 create mode 100644 xen/arch/arm/p2m_mmu.c
 create mode 100644 xen/arch/arm/p2m_mpu.c
 create mode 100644 xen/arch/arm/setup_mmu.c
 create mode 100644 xen/arch/arm/setup_mpu.c

-- 
2.25.1




* [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-13 10:06   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64 Penny Zheng
                   ` (41 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Julien Grall

From: Wei Chen <wei.chen@arm.com>

These two variables are stale: they only have declarations in
config.h, they have no definition, and no code uses them. So in this
patch, we remove them from config.h.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
---
v1 -> v2:
1. Add Acked-by.
---
 xen/arch/arm/include/asm/config.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
index 0fefed1b8a..25a625ff08 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -172,8 +172,6 @@
 #define STACK_SIZE  (PAGE_SIZE << STACK_ORDER)
 
 #ifndef __ASSEMBLY__
-extern unsigned long xen_phys_start;
-extern unsigned long xenheap_phys_end;
 extern unsigned long frametable_virt_end;
 #endif
 
-- 
2.25.1




* [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-17 23:09   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA Penny Zheng
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

Currently, ARM_EFI is mandatorily selected by Arm64. Even if users
know for sure that their images will not start in an EFI environment,
they cannot disable the EFI support for Arm64. This means there will
be about 3K lines of unused code in their images.

So in this patch, we make ARM_EFI selectable for Arm64, and based on
that, we can use CONFIG_ARM_EFI to gate the EFI-specific code in
head.S for those images that will not be booted in an EFI
environment.
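
As an illustration only (not part of the patch itself), a user
building Xen for a platform without UEFI firmware could then drop the
stub at configuration time with a xen/.config fragment along these
lines:

    CONFIG_ARM_64=y
    # CONFIG_ARM_EFI is not set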

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. New patch
---
 xen/arch/arm/Kconfig      | 10 ++++++++--
 xen/arch/arm/arm64/head.S | 15 +++++++++++++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 239d3aed3c..ace7178c9a 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -7,7 +7,6 @@ config ARM_64
 	def_bool y
 	depends on !ARM_32
 	select 64BIT
-	select ARM_EFI
 	select HAS_FAST_MULTIPLY
 
 config ARM
@@ -37,7 +36,14 @@ config ACPI
 	  an alternative to device tree on ARM64.
 
 config ARM_EFI
-	bool
+	bool "UEFI boot service support"
+	depends on ARM_64
+	default y
+	help
+	  This option provides support for boot services through
+	  UEFI firmware. A UEFI stub is provided to allow Xen to
+	  be booted as an EFI application. This is only useful for
+	  Xen that may run on systems that have UEFI firmware.
 
 config GICV3
 	bool "GICv3 driver"
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index ad014716db..93f9b0b9d5 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -22,8 +22,11 @@
 
 #include <asm/page.h>
 #include <asm/early_printk.h>
+
+#ifdef CONFIG_ARM_EFI
 #include <efi/efierr.h>
 #include <asm/arm64/efibind.h>
+#endif
 
 #define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
 #define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
@@ -172,8 +175,10 @@ efi_head:
         .byte   0x52
         .byte   0x4d
         .byte   0x64
-        .long   pe_header - efi_head        /* Offset to the PE header. */
-
+#ifndef CONFIG_ARM_EFI
+        .long   0                    /* 0 means no PE header. */
+#else
+        .long   pe_header - efi_head /* Offset to the PE header. */
         /*
          * Add the PE/COFF header to the file.  The address of this header
          * is at offset 0x3c in the file, and is part of Linux "Image"
@@ -279,6 +284,8 @@ section_table:
         .short  0                /* NumberOfLineNumbers  (0 for executables) */
         .long   0xe0500020       /* Characteristics (section flags) */
         .align  5
+#endif /* CONFIG_ARM_EFI */
+
 real_start:
         /* BSS should be zeroed when booting without EFI */
         mov   x26, #0                /* x26 := skip_zero_bss */
@@ -913,6 +920,8 @@ putn:   ret
 ENTRY(lookup_processor_type)
         mov  x0, #0
         ret
+
+#ifdef CONFIG_ARM_EFI
 /*
  *  Function to transition from EFI loader in C, to Xen entry point.
  *  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
@@ -971,6 +980,8 @@ ENTRY(efi_xen_start)
         b     real_start_efi
 ENDPROC(efi_xen_start)
 
+#endif /* CONFIG_ARM_EFI */
+
 /*
  * Local variables:
  * mode: ASM
-- 
2.25.1




* [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64 Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-17 23:16   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R Penny Zheng
                   ` (39 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

From the Arm ARM Supplement for Armv8-R AArch64 (DDI 0600A) [1],
section D1.6.2 "TLB maintenance instructions", we know that Armv8-R
AArch64 permits an implementation to cache stage 1 VMSAv8-64 and
stage 2 PMSAv8-64 attributes as a common entry for the Secure EL1&0
translation regime. But Xen itself runs with stage 1 PMSAv8-64 on
Armv8-R AArch64, and the EL2 MPU updates for stage 1 PMSAv8-64 will
not be cached in TLB entries. So we don't need any TLB invalidation
for Xen itself in EL2.

So in this patch, we stub the Xen TLB helpers with empty functions on
MPU systems (PMSA), but still keep the guest TLB helpers, because
when a guest runs in EL1 with VMSAv8-64 (MMU), guest TLB invalidation
is still needed. We also need some policy to distinguish MPU and MMU
guests; this will be done later, in the guest support for Armv8-R
AArch64.

[1] https://developer.arm.com/documentation/ddi0600/ac

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. No change.
---
 xen/arch/arm/include/asm/arm64/flushtlb.h | 25 +++++++++++++++++++++++
 xen/arch/arm/include/asm/flushtlb.h       | 22 ++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/xen/arch/arm/include/asm/arm64/flushtlb.h b/xen/arch/arm/include/asm/arm64/flushtlb.h
index 7c54315187..fe445f6831 100644
--- a/xen/arch/arm/include/asm/arm64/flushtlb.h
+++ b/xen/arch/arm/include/asm/arm64/flushtlb.h
@@ -51,6 +51,8 @@ TLB_HELPER(flush_all_guests_tlb_local, alle1);
 /* Flush innershareable TLBs, all VMIDs, non-hypervisor mode */
 TLB_HELPER(flush_all_guests_tlb, alle1is);
 
+#ifndef CONFIG_HAS_MPU
+
 /* Flush all hypervisor mappings from the TLB of the local processor. */
 TLB_HELPER(flush_xen_tlb_local, alle2);
 
@@ -66,6 +68,29 @@ static inline void __flush_xen_tlb_one(vaddr_t va)
     asm volatile("tlbi vae2is, %0;" : : "r" (va>>PAGE_SHIFT) : "memory");
 }
 
+#else
+
+/*
+ * When Xen is running with stage 1 PMSAv8-64 on MPU systems. The EL2 MPU
+ * updates for stage1 PMSAv8-64 will not be cached in TLB entries. So we
+ * don't need any TLB invalidation for Xen itself in EL2. See Arm ARM
+ * Supplement of Armv8-R AArch64 (DDI 0600A), section D1.6.2 TLB maintenance
+ * instructions for more details.
+ */
+static inline void flush_xen_tlb_local(void)
+{
+}
+
+static inline void  __flush_xen_tlb_one_local(vaddr_t va)
+{
+}
+
+static inline void __flush_xen_tlb_one(vaddr_t va)
+{
+}
+
+#endif /* CONFIG_HAS_MPU */
+
 #endif /* __ASM_ARM_ARM64_FLUSHTLB_H__ */
 /*
  * Local variables:
diff --git a/xen/arch/arm/include/asm/flushtlb.h b/xen/arch/arm/include/asm/flushtlb.h
index 125a141975..4b8bf65281 100644
--- a/xen/arch/arm/include/asm/flushtlb.h
+++ b/xen/arch/arm/include/asm/flushtlb.h
@@ -28,6 +28,7 @@ static inline void page_set_tlbflush_timestamp(struct page_info *page)
 /* Flush specified CPUs' TLBs */
 void arch_flush_tlb_mask(const cpumask_t *mask);
 
+#ifndef CONFIG_HAS_MPU
 /*
  * Flush a range of VA's hypervisor mappings from the TLB of the local
  * processor.
@@ -66,6 +67,27 @@ static inline void flush_xen_tlb_range_va(vaddr_t va,
     isb();
 }
 
+#else
+
+/*
+ * When Xen is running with stage 1 PMSAv8-64 on MPU systems. The EL2 MPU
+ * updates for stage1 PMSAv8-64 will not be cached in TLB entries. So we
+ * don't need any TLB invalidation for Xen itself in EL2. See Arm ARM
+ * Supplement of Armv8-R AArch64 (DDI 0600A), section D1.6.2 TLB maintenance
+ * instructions for more details.
+ */
+static inline void flush_xen_tlb_range_va_local(vaddr_t va,
+                                                unsigned long size)
+{
+}
+
+static inline void flush_xen_tlb_range_va(vaddr_t va,
+                                          unsigned long size)
+{
+}
+
+#endif /* CONFIG_HAS_MPU */
+
 #endif /* __ASM_ARM_FLUSHTLB_H__ */
 /*
  * Local variables:
-- 
2.25.1




* [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (2 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-17 23:24   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S Penny Zheng
                   ` (38 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei . Xie

From: Wei Chen <wei.chen@arm.com>

On Armv8-A, Xen has a fixed virtual start address (and link address)
for all Armv8-A platforms. On an MMU based system, Xen can map its
load address to this virtual start address, so on Armv8-A platforms
the Xen start address does not need to be configurable. But on
Armv8-R platforms there is no MMU to map the load address to a fixed
virtual address, and different platforms have very different address
space layouts. So Xen cannot use a fixed physical address on MPU
based systems and needs to have it configurable.

In this patch we introduce a Kconfig option for users to define the
Xen start address for Armv8-R. Users can enter the address at
configuration time, or select a tailored platform config file from
arch/arm/configs.

As we introduce Armv8-R platforms to Xen, the existing Arm64
platforms should not be listed in the Armv8-R platform list, so we
add a !ARM_V8R dependency to these platforms.
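
For illustration, a tailored Armv8-R platform config could then pin
the start address like this (the file name and the address below are
made-up examples, not values mandated by this patch):

    # xen/arch/arm/configs/fvp_baser64_defconfig (hypothetical)
    CONFIG_ARM_V8R=y
    CONFIG_FVP_BASER=y
    CONFIG_XEN_START_ADDRESS=0x10000000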

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Jiamei.Xie <jiamei.xie@arm.com>
---
v1 -> v2:
1. Remove the platform header fvp_baser.h.
2. Remove the default start address for fvp_baser64.
3. Remove the description of default address from commit log.
4. Change HAS_MPU to ARM_V8R for the Xen start address dependency.
   Whether or not an Armv8-R board has an MPU, it always needs to
   specify the start address.
---
 xen/arch/arm/Kconfig           |  8 ++++++++
 xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ace7178c9a..c6b6b612d1 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -145,6 +145,14 @@ config TEE
 	  This option enables generic TEE mediators support. It allows guests
 	  to access real TEE via one of TEE mediators implemented in XEN.
 
+config XEN_START_ADDRESS
+	hex "Xen start address: keep default to use platform defined address"
+	default 0
+	depends on ARM_V8R
+	help
+	  This option allows to set the customized address at which Xen will be
+	  linked on MPU systems. This address must be aligned to a page size.
+
 source "arch/arm/tee/Kconfig"
 
 config STATIC_SHM
diff --git a/xen/arch/arm/platforms/Kconfig b/xen/arch/arm/platforms/Kconfig
index c93a6b2756..0904793a0b 100644
--- a/xen/arch/arm/platforms/Kconfig
+++ b/xen/arch/arm/platforms/Kconfig
@@ -1,6 +1,7 @@
 choice
 	prompt "Platform Support"
 	default ALL_PLAT
+	default FVP_BASER if ARM_V8R
 	---help---
 	Choose which hardware platform to enable in Xen.
 
@@ -8,13 +9,14 @@ choice
 
 config ALL_PLAT
 	bool "All Platforms"
+	depends on !ARM_V8R
 	---help---
 	Enable support for all available hardware platforms. It doesn't
 	automatically select any of the related drivers.
 
 config QEMU
 	bool "QEMU aarch virt machine support"
-	depends on ARM_64
+	depends on ARM_64 && !ARM_V8R
 	select GICV3
 	select HAS_PL011
 	---help---
@@ -23,7 +25,7 @@ config QEMU
 
 config RCAR3
 	bool "Renesas RCar3 support"
-	depends on ARM_64
+	depends on ARM_64 && !ARM_V8R
 	select HAS_SCIF
 	select IPMMU_VMSA
 	---help---
@@ -31,14 +33,22 @@ config RCAR3
 
 config MPSOC
 	bool "Xilinx Ultrascale+ MPSoC support"
-	depends on ARM_64
+	depends on ARM_64 && !ARM_V8R
 	select HAS_CADENCE_UART
 	select ARM_SMMU
 	---help---
 	Enable all the required drivers for Xilinx Ultrascale+ MPSoC
 
+config FVP_BASER
+	bool "Fixed Virtual Platform BaseR support"
+	depends on ARM_V8R
+	help
+	  Enable platform specific configurations for Fixed Virtual
+	  Platform BaseR
+
 config NO_PLAT
 	bool "No Platforms"
+	depends on !ARM_V8R
 	---help---
 	Do not enable specific support for any platform.
 
-- 
2.25.1




* [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (3 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-17 23:37   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 06/40] xen/arm64: move MMU related code from head.S to head_mmu.S Penny Zheng
                   ` (37 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

We want to reuse head.S for MPU systems, but some of its code is
implemented for MMU systems only. We will move such code to another,
MMU-specific file. But before that, this patch does some preparation
to make the move easier to review:
1. Fix the indentation of code comments.
2. Export some symbols that will be accessed outside of the file
   scope (see the sketch below).
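
As a reminder of what this export means in practice, here is a
simplified sketch of what the GLOBAL()/ENTRY() annotations used in
head.S roughly expand to (illustrative only; the exact definitions
live in Xen's common headers):

    /* Simplified, illustrative definitions -- not verbatim Xen code. */
    #define ALIGN           .align 4
    #define GLOBAL(name)    .globl name; name:
    #define ENTRY(name)     .globl name; ALIGN; name:

Turning a local label such as create_page_tables: into
ENTRY(create_page_tables) therefore makes the symbol global, so the
code can still be referenced after it is moved out of head.S in the
next patch.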

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. New patch.
---
 xen/arch/arm/arm64/head.S | 40 +++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 93f9b0b9d5..b2214bc5e3 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -136,22 +136,22 @@
         add \xb, \xb, x20
 .endm
 
-        .section .text.header, "ax", %progbits
-        /*.aarch64*/
+.section .text.header, "ax", %progbits
+/*.aarch64*/
 
-        /*
-         * Kernel startup entry point.
-         * ---------------------------
-         *
-         * The requirements are:
-         *   MMU = off, D-cache = off, I-cache = on or off,
-         *   x0 = physical address to the FDT blob.
-         *
-         * This must be the very first address in the loaded image.
-         * It should be linked at XEN_VIRT_START, and loaded at any
-         * 4K-aligned address.  All of text+data+bss must fit in 2MB,
-         * or the initial pagetable code below will need adjustment.
-         */
+/*
+ * Kernel startup entry point.
+ * ---------------------------
+ *
+ * The requirements are:
+ *   MMU = off, D-cache = off, I-cache = on or off,
+ *   x0 = physical address to the FDT blob.
+ *
+ * This must be the very first address in the loaded image.
+ * It should be linked at XEN_VIRT_START, and loaded at any
+ * 4K-aligned address.  All of text+data+bss must fit in 2MB,
+ * or the initial pagetable code below will need adjustment.
+ */
 
 GLOBAL(start)
         /*
@@ -586,7 +586,7 @@ ENDPROC(cpu_init)
  *
  * Clobbers x0 - x4
  */
-create_page_tables:
+ENTRY(create_page_tables)
         /* Prepare the page-tables for mapping Xen */
         ldr   x0, =XEN_VIRT_START
         create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
@@ -680,7 +680,7 @@ ENDPROC(create_page_tables)
  *
  * Clobbers x0 - x3
  */
-enable_mmu:
+ENTRY(enable_mmu)
         PRINT("- Turning on paging -\r\n")
 
         /*
@@ -714,7 +714,7 @@ ENDPROC(enable_mmu)
  *
  * Clobbers x0 - x1
  */
-remove_identity_mapping:
+ENTRY(remove_identity_mapping)
         /*
          * Find the zeroeth slot used. Remove the entry from zeroeth
          * table if the slot is not XEN_ZEROETH_SLOT.
@@ -775,7 +775,7 @@ ENDPROC(remove_identity_mapping)
  *
  * Clobbers x0 - x3
  */
-setup_fixmap:
+ENTRY(setup_fixmap)
 #ifdef CONFIG_EARLY_PRINTK
         /* Add UART to the fixmap table */
         ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
@@ -871,7 +871,7 @@ ENDPROC(init_uart)
  * x0: Nul-terminated string to print.
  * x23: Early UART base address
  * Clobbers x0-x1 */
-puts:
+ENTRY(puts)
         early_uart_ready x23, 1
         ldrb  w1, [x0], #1           /* Load next char */
         cbz   w1, 1f                 /* Exit on nul */
-- 
2.25.1




* [PATCH v2 06/40] xen/arm64: move MMU related code from head.S to head_mmu.S
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (4 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections Penny Zheng
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Henry Wang

From: Wei Chen <wei.chen@arm.com>

There is a lot of MMU-specific code in head.S that will not be used
on MPU systems. If we used #ifdef to gate it, the code would become
messy and hard to maintain. So we move the MMU-related code to
head_mmu.S, and keep the common code in head.S.

Some assembly macros that will later be shared by MMU and MPU are
moved to macros.h.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v1 -> v2:
1. Move macros to macros.h
2. Remove the indention modification
3. Duplicate "fail" instead of exporting it.
---
 xen/arch/arm/arm64/Makefile             |   3 +
 xen/arch/arm/arm64/head.S               | 383 ------------------------
 xen/arch/arm/arm64/head_mmu.S           | 372 +++++++++++++++++++++++
 xen/arch/arm/include/asm/arm64/macros.h |  51 ++++
 4 files changed, 426 insertions(+), 383 deletions(-)
 create mode 100644 xen/arch/arm/arm64/head_mmu.S

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 6d507da0d4..22da2f54b5 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -8,6 +8,9 @@ obj-y += domctl.o
 obj-y += domain.o
 obj-y += entry.o
 obj-y += head.o
+ifneq ($(CONFIG_HAS_MPU),y)
+obj-y += head_mmu.o
+endif
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += smc.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index b2214bc5e3..5cfa47279b 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -28,17 +28,6 @@
 #include <asm/arm64/efibind.h>
 #endif
 
-#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
-#define PT_MEM_L3 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
-#define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
-
-/* Convenience defines to get slot used by Xen mapping. */
-#define XEN_ZEROETH_SLOT    zeroeth_table_offset(XEN_VIRT_START)
-#define XEN_FIRST_SLOT      first_table_offset(XEN_VIRT_START)
-#define XEN_SECOND_SLOT     second_table_offset(XEN_VIRT_START)
-
 #define __HEAD_FLAG_PAGE_SIZE   ((PAGE_SHIFT - 10) / 2)
 
 #define __HEAD_FLAG_PHYS_BASE   1
@@ -85,57 +74,6 @@
  *  x30 - lr
  */
 
-#ifdef CONFIG_EARLY_PRINTK
-/*
- * Macro to print a string to the UART, if there is one.
- *
- * Clobbers x0 - x3
- */
-#define PRINT(_s)          \
-        mov   x3, lr ;     \
-        adr   x0, 98f ;    \
-        bl    puts    ;    \
-        mov   lr, x3 ;     \
-        RODATA_STR(98, _s)
-
-/*
- * Macro to print the value of register \xb
- *
- * Clobbers x0 - x4
- */
-.macro print_reg xb
-        mov   x0, \xb
-        mov   x4, lr
-        bl    putn
-        mov   lr, x4
-.endm
-
-#else /* CONFIG_EARLY_PRINTK */
-#define PRINT(s)
-
-.macro print_reg xb
-.endm
-
-#endif /* !CONFIG_EARLY_PRINTK */
-
-/*
- * Pseudo-op for PC relative adr <reg>, <symbol> where <symbol> is
- * within the range +/- 4GB of the PC.
- *
- * @dst: destination register (64 bit wide)
- * @sym: name of the symbol
- */
-.macro  adr_l, dst, sym
-        adrp \dst, \sym
-        add  \dst, \dst, :lo12:\sym
-.endm
-
-/* Load the physical address of a symbol into xb */
-.macro load_paddr xb, sym
-        ldr \xb, =\sym
-        add \xb, \xb, x20
-.endm
-
 .section .text.header, "ax", %progbits
 /*.aarch64*/
 
@@ -500,296 +438,6 @@ cpu_init:
         ret
 ENDPROC(cpu_init)
 
-/*
- * Macro to find the slot number at a given page-table level
- *
- * slot:     slot computed
- * virt:     virtual address
- * lvl:      page-table level
- */
-.macro get_table_slot, slot, virt, lvl
-        ubfx  \slot, \virt, #XEN_PT_LEVEL_SHIFT(\lvl), #XEN_PT_LPAE_SHIFT
-.endm
-
-/*
- * Macro to create a page table entry in \ptbl to \tbl
- *
- * ptbl:    table symbol where the entry will be created
- * tbl:     table symbol to point to
- * virt:    virtual address
- * lvl:     page-table level
- * tmp1:    scratch register
- * tmp2:    scratch register
- * tmp3:    scratch register
- *
- * Preserves \virt
- * Clobbers \tmp1, \tmp2, \tmp3
- *
- * Also use x20 for the phys offset.
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro create_table_entry, ptbl, tbl, virt, lvl, tmp1, tmp2, tmp3
-        get_table_slot \tmp1, \virt, \lvl   /* \tmp1 := slot in \tlb */
-
-        load_paddr \tmp2, \tbl
-        mov   \tmp3, #PT_PT                 /* \tmp3 := right for linear PT */
-        orr   \tmp3, \tmp3, \tmp2           /*          + \tlb paddr */
-
-        adr_l \tmp2, \ptbl
-
-        str   \tmp3, [\tmp2, \tmp1, lsl #3]
-.endm
-
-/*
- * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
- * level table (i.e page granularity) is supported.
- *
- * ptbl:     table symbol where the entry will be created
- * virt:    virtual address
- * phys:    physical address (should be page aligned)
- * tmp1:    scratch register
- * tmp2:    scratch register
- * tmp3:    scratch register
- * type:    mapping type. If not specified it will be normal memory (PT_MEM_L3)
- *
- * Preserves \virt, \phys
- * Clobbers \tmp1, \tmp2, \tmp3
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro create_mapping_entry, ptbl, virt, phys, tmp1, tmp2, tmp3, type=PT_MEM_L3
-        and   \tmp3, \phys, #THIRD_MASK     /* \tmp3 := PAGE_ALIGNED(phys) */
-
-        get_table_slot \tmp1, \virt, 3      /* \tmp1 := slot in \tlb */
-
-        mov   \tmp2, #\type                 /* \tmp2 := right for section PT */
-        orr   \tmp2, \tmp2, \tmp3           /*          + PAGE_ALIGNED(phys) */
-
-        adr_l \tmp3, \ptbl
-
-        str   \tmp2, [\tmp3, \tmp1, lsl #3]
-.endm
-
-/*
- * Rebuild the boot pagetable's first-level entries. The structure
- * is described in mm.c.
- *
- * After the CPU enables paging it will add the fixmap mapping
- * to these page tables, however this may clash with the 1:1
- * mapping. So each CPU must rebuild the page tables here with
- * the 1:1 in place.
- *
- * Inputs:
- *   x19: paddr(start)
- *   x20: phys offset
- *
- * Clobbers x0 - x4
- */
-ENTRY(create_page_tables)
-        /* Prepare the page-tables for mapping Xen */
-        ldr   x0, =XEN_VIRT_START
-        create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
-        create_table_entry boot_first, boot_second, x0, 1, x1, x2, x3
-        create_table_entry boot_second, boot_third, x0, 2, x1, x2, x3
-
-        /* Map Xen */
-        adr_l x4, boot_third
-
-        lsr   x2, x19, #THIRD_SHIFT  /* Base address for 4K mapping */
-        lsl   x2, x2, #THIRD_SHIFT
-        mov   x3, #PT_MEM_L3         /* x2 := Section map */
-        orr   x2, x2, x3
-
-        /* ... map of vaddr(start) in boot_third */
-        mov   x1, xzr
-1:      str   x2, [x4, x1]           /* Map vaddr(start) */
-        add   x2, x2, #PAGE_SIZE     /* Next page */
-        add   x1, x1, #8             /* Next slot */
-        cmp   x1, #(XEN_PT_LPAE_ENTRIES<<3) /* 512 entries per page */
-        b.lt  1b
-
-        /*
-         * If Xen is loaded at exactly XEN_VIRT_START then we don't
-         * need an additional 1:1 mapping, the virtual mapping will
-         * suffice.
-         */
-        cmp   x19, #XEN_VIRT_START
-        bne   1f
-        ret
-1:
-        /*
-         * Setup the 1:1 mapping so we can turn the MMU on. Note that
-         * only the first page of Xen will be part of the 1:1 mapping.
-         */
-
-        /*
-         * Find the zeroeth slot used. If the slot is not
-         * XEN_ZEROETH_SLOT, then the 1:1 mapping will use its own set of
-         * page-tables from the first level.
-         */
-        get_table_slot x0, x19, 0       /* x0 := zeroeth slot */
-        cmp   x0, #XEN_ZEROETH_SLOT
-        beq   1f
-        create_table_entry boot_pgtable, boot_first_id, x19, 0, x0, x1, x2
-        b     link_from_first_id
-
-1:
-        /*
-         * Find the first slot used. If the slot is not XEN_FIRST_SLOT,
-         * then the 1:1 mapping will use its own set of page-tables from
-         * the second level.
-         */
-        get_table_slot x0, x19, 1      /* x0 := first slot */
-        cmp   x0, #XEN_FIRST_SLOT
-        beq   1f
-        create_table_entry boot_first, boot_second_id, x19, 1, x0, x1, x2
-        b     link_from_second_id
-
-1:
-        /*
-         * Find the second slot used. If the slot is XEN_SECOND_SLOT, then the
-         * 1:1 mapping will use its own set of page-tables from the
-         * third level. For slot XEN_SECOND_SLOT, Xen is not yet able to handle
-         * it.
-         */
-        get_table_slot x0, x19, 2     /* x0 := second slot */
-        cmp   x0, #XEN_SECOND_SLOT
-        beq   virtphys_clash
-        create_table_entry boot_second, boot_third_id, x19, 2, x0, x1, x2
-        b     link_from_third_id
-
-link_from_first_id:
-        create_table_entry boot_first_id, boot_second_id, x19, 1, x0, x1, x2
-link_from_second_id:
-        create_table_entry boot_second_id, boot_third_id, x19, 2, x0, x1, x2
-link_from_third_id:
-        create_mapping_entry boot_third_id, x19, x19, x0, x1, x2
-        ret
-
-virtphys_clash:
-        /* Identity map clashes with boot_third, which we cannot handle yet */
-        PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
-        b     fail
-ENDPROC(create_page_tables)
-
-/*
- * Turn on the Data Cache and the MMU. The function will return on the 1:1
- * mapping. In other word, the caller is responsible to switch to the runtime
- * mapping.
- *
- * Clobbers x0 - x3
- */
-ENTRY(enable_mmu)
-        PRINT("- Turning on paging -\r\n")
-
-        /*
-         * The state of the TLBs is unknown before turning on the MMU.
-         * Flush them to avoid stale one.
-         */
-        tlbi  alle2                  /* Flush hypervisor TLBs */
-        dsb   nsh
-
-        /* Write Xen's PT's paddr into TTBR0_EL2 */
-        load_paddr x0, boot_pgtable
-        msr   TTBR0_EL2, x0
-        isb
-
-        mrs   x0, SCTLR_EL2
-        orr   x0, x0, #SCTLR_Axx_ELx_M  /* Enable MMU */
-        orr   x0, x0, #SCTLR_Axx_ELx_C  /* Enable D-cache */
-        dsb   sy                     /* Flush PTE writes and finish reads */
-        msr   SCTLR_EL2, x0          /* now paging is enabled */
-        isb                          /* Now, flush the icache */
-        ret
-ENDPROC(enable_mmu)
-
-/*
- * Remove the 1:1 map from the page-tables. It is not easy to keep track
- * where the 1:1 map was mapped, so we will look for the top-level entry
- * exclusive to the 1:1 map and remove it.
- *
- * Inputs:
- *   x19: paddr(start)
- *
- * Clobbers x0 - x1
- */
-ENTRY(remove_identity_mapping)
-        /*
-         * Find the zeroeth slot used. Remove the entry from zeroeth
-         * table if the slot is not XEN_ZEROETH_SLOT.
-         */
-        get_table_slot x1, x19, 0       /* x1 := zeroeth slot */
-        cmp   x1, #XEN_ZEROETH_SLOT
-        beq   1f
-        /* It is not in slot XEN_ZEROETH_SLOT, remove the entry. */
-        ldr   x0, =boot_pgtable         /* x0 := root table */
-        str   xzr, [x0, x1, lsl #3]
-        b     identity_mapping_removed
-
-1:
-        /*
-         * Find the first slot used. Remove the entry for the first
-         * table if the slot is not XEN_FIRST_SLOT.
-         */
-        get_table_slot x1, x19, 1       /* x1 := first slot */
-        cmp   x1, #XEN_FIRST_SLOT
-        beq   1f
-        /* It is not in slot XEN_FIRST_SLOT, remove the entry. */
-        ldr   x0, =boot_first           /* x0 := first table */
-        str   xzr, [x0, x1, lsl #3]
-        b     identity_mapping_removed
-
-1:
-        /*
-         * Find the second slot used. Remove the entry for the first
-         * table if the slot is not XEN_SECOND_SLOT.
-         */
-        get_table_slot x1, x19, 2       /* x1 := second slot */
-        cmp   x1, #XEN_SECOND_SLOT
-        beq   identity_mapping_removed
-        /* It is not in slot 1, remove the entry */
-        ldr   x0, =boot_second          /* x0 := second table */
-        str   xzr, [x0, x1, lsl #3]
-
-identity_mapping_removed:
-        /* See asm/arm64/flushtlb.h for the explanation of the sequence. */
-        dsb   nshst
-        tlbi  alle2
-        dsb   nsh
-        isb
-
-        ret
-ENDPROC(remove_identity_mapping)
-
-/*
- * Map the UART in the fixmap (when earlyprintk is used) and hook the
- * fixmap table in the page tables.
- *
- * The fixmap cannot be mapped in create_page_tables because it may
- * clash with the 1:1 mapping.
- *
- * Inputs:
- *   x20: Physical offset
- *   x23: Early UART base physical address
- *
- * Clobbers x0 - x3
- */
-ENTRY(setup_fixmap)
-#ifdef CONFIG_EARLY_PRINTK
-        /* Add UART to the fixmap table */
-        ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
-        create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
-#endif
-        /* Map fixmap into boot_second */
-        ldr   x0, =FIXMAP_ADDR(0)
-        create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
-        /* Ensure any page table updates made above have occurred. */
-        dsb   nshst
-
-        ret
-ENDPROC(setup_fixmap)
-
 /*
  * Setup the initial stack and jump to the C world
  *
@@ -818,37 +466,6 @@ fail:   PRINT("- Boot failed -\r\n")
         b     1b
 ENDPROC(fail)
 
-GLOBAL(_end_boot)
-
-/*
- * Switch TTBR
- *
- * x0    ttbr
- *
- * TODO: This code does not comply with break-before-make.
- */
-ENTRY(switch_ttbr)
-        dsb   sy                     /* Ensure the flushes happen before
-                                      * continuing */
-        isb                          /* Ensure synchronization with previous
-                                      * changes to text */
-        tlbi   alle2                 /* Flush hypervisor TLB */
-        ic     iallu                 /* Flush I-cache */
-        dsb    sy                    /* Ensure completion of TLB flush */
-        isb
-
-        msr    TTBR0_EL2, x0
-
-        isb                          /* Ensure synchronization with previous
-                                      * changes to text */
-        tlbi   alle2                 /* Flush hypervisor TLB */
-        ic     iallu                 /* Flush I-cache */
-        dsb    sy                    /* Ensure completion of TLB flush */
-        isb
-
-        ret
-ENDPROC(switch_ttbr)
-
 #ifdef CONFIG_EARLY_PRINTK
 /*
  * Initialize the UART. Should only be called on the boot CPU.
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
new file mode 100644
index 0000000000..e2c8f07140
--- /dev/null
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -0,0 +1,372 @@
+/*
+ * xen/arch/arm/head_mmu.S
+ *
+ * Start-of-day code for an ARMv8-A.
+ *
+ * Ian Campbell <ian.campbell@citrix.com>
+ * Copyright (c) 2012 Citrix Systems.
+ *
+ * Based on ARMv7-A head.S by
+ * Tim Deegan <tim@xen.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <asm/page.h>
+#include <asm/early_printk.h>
+
+#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
+#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
+#define PT_MEM_L3 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
+#define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
+#define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
+
+/* Convenience defines to get slot used by Xen mapping. */
+#define XEN_ZEROETH_SLOT    zeroeth_table_offset(XEN_VIRT_START)
+#define XEN_FIRST_SLOT      first_table_offset(XEN_VIRT_START)
+#define XEN_SECOND_SLOT     second_table_offset(XEN_VIRT_START)
+
+/*
+ * Macro to find the slot number at a given page-table level
+ *
+ * slot:     slot computed
+ * virt:     virtual address
+ * lvl:      page-table level
+ */
+.macro get_table_slot, slot, virt, lvl
+        ubfx  \slot, \virt, #XEN_PT_LEVEL_SHIFT(\lvl), #XEN_PT_LPAE_SHIFT
+.endm
+
+/*
+ * Macro to create a page table entry in \ptbl to \tbl
+ *
+ * ptbl:    table symbol where the entry will be created
+ * tbl:     table symbol to point to
+ * virt:    virtual address
+ * lvl:     page-table level
+ * tmp1:    scratch register
+ * tmp2:    scratch register
+ * tmp3:    scratch register
+ *
+ * Preserves \virt
+ * Clobbers \tmp1, \tmp2, \tmp3
+ *
+ * Also use x20 for the phys offset.
+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro create_table_entry, ptbl, tbl, virt, lvl, tmp1, tmp2, tmp3
+        get_table_slot \tmp1, \virt, \lvl   /* \tmp1 := slot in \tlb */
+
+        load_paddr \tmp2, \tbl
+        mov   \tmp3, #PT_PT                 /* \tmp3 := right for linear PT */
+        orr   \tmp3, \tmp3, \tmp2           /*          + \tlb paddr */
+
+        adr_l \tmp2, \ptbl
+
+        str   \tmp3, [\tmp2, \tmp1, lsl #3]
+.endm
+
+/*
+ * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
+ * level table (i.e page granularity) is supported.
+ *
+ * ptbl:     table symbol where the entry will be created
+ * virt:    virtual address
+ * phys:    physical address (should be page aligned)
+ * tmp1:    scratch register
+ * tmp2:    scratch register
+ * tmp3:    scratch register
+ * type:    mapping type. If not specified it will be normal memory (PT_MEM_L3)
+ *
+ * Preserves \virt, \phys
+ * Clobbers \tmp1, \tmp2, \tmp3
+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro create_mapping_entry, ptbl, virt, phys, tmp1, tmp2, tmp3, type=PT_MEM_L3
+        and   \tmp3, \phys, #THIRD_MASK     /* \tmp3 := PAGE_ALIGNED(phys) */
+
+        get_table_slot \tmp1, \virt, 3      /* \tmp1 := slot in \tlb */
+
+        mov   \tmp2, #\type                 /* \tmp2 := right for section PT */
+        orr   \tmp2, \tmp2, \tmp3           /*          + PAGE_ALIGNED(phys) */
+
+        adr_l \tmp3, \ptbl
+
+        str   \tmp2, [\tmp3, \tmp1, lsl #3]
+.endm
+
+.section .text.header, "ax", %progbits
+/*.aarch64*/
+
+/*
+ * Rebuild the boot pagetable's first-level entries. The structure
+ * is described in mm.c.
+ *
+ * After the CPU enables paging it will add the fixmap mapping
+ * to these page tables, however this may clash with the 1:1
+ * mapping. So each CPU must rebuild the page tables here with
+ * the 1:1 in place.
+ *
+ * Inputs:
+ *   x19: paddr(start)
+ *   x20: phys offset
+ *
+ * Clobbers x0 - x4
+ */
+ENTRY(create_page_tables)
+        /* Prepare the page-tables for mapping Xen */
+        ldr   x0, =XEN_VIRT_START
+        create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
+        create_table_entry boot_first, boot_second, x0, 1, x1, x2, x3
+        create_table_entry boot_second, boot_third, x0, 2, x1, x2, x3
+
+        /* Map Xen */
+        adr_l x4, boot_third
+
+        lsr   x2, x19, #THIRD_SHIFT  /* Base address for 4K mapping */
+        lsl   x2, x2, #THIRD_SHIFT
+        mov   x3, #PT_MEM_L3         /* x2 := Section map */
+        orr   x2, x2, x3
+
+        /* ... map of vaddr(start) in boot_third */
+        mov   x1, xzr
+1:      str   x2, [x4, x1]           /* Map vaddr(start) */
+        add   x2, x2, #PAGE_SIZE     /* Next page */
+        add   x1, x1, #8             /* Next slot */
+        cmp   x1, #(XEN_PT_LPAE_ENTRIES<<3) /* 512 entries per page */
+        b.lt  1b
+
+        /*
+         * If Xen is loaded at exactly XEN_VIRT_START then we don't
+         * need an additional 1:1 mapping, the virtual mapping will
+         * suffice.
+         */
+        cmp   x19, #XEN_VIRT_START
+        bne   1f
+        ret
+1:
+        /*
+         * Setup the 1:1 mapping so we can turn the MMU on. Note that
+         * only the first page of Xen will be part of the 1:1 mapping.
+         */
+
+        /*
+         * Find the zeroeth slot used. If the slot is not
+         * XEN_ZEROETH_SLOT, then the 1:1 mapping will use its own set of
+         * page-tables from the first level.
+         */
+        get_table_slot x0, x19, 0       /* x0 := zeroeth slot */
+        cmp   x0, #XEN_ZEROETH_SLOT
+        beq   1f
+        create_table_entry boot_pgtable, boot_first_id, x19, 0, x0, x1, x2
+        b     link_from_first_id
+
+1:
+        /*
+         * Find the first slot used. If the slot is not XEN_FIRST_SLOT,
+         * then the 1:1 mapping will use its own set of page-tables from
+         * the second level.
+         */
+        get_table_slot x0, x19, 1      /* x0 := first slot */
+        cmp   x0, #XEN_FIRST_SLOT
+        beq   1f
+        create_table_entry boot_first, boot_second_id, x19, 1, x0, x1, x2
+        b     link_from_second_id
+
+1:
+        /*
+         * Find the second slot used. If the slot is XEN_SECOND_SLOT, then the
+         * 1:1 mapping will use its own set of page-tables from the
+         * third level. For slot XEN_SECOND_SLOT, Xen is not yet able to handle
+         * it.
+         */
+        get_table_slot x0, x19, 2     /* x0 := second slot */
+        cmp   x0, #XEN_SECOND_SLOT
+        beq   virtphys_clash
+        create_table_entry boot_second, boot_third_id, x19, 2, x0, x1, x2
+        b     link_from_third_id
+
+link_from_first_id:
+        create_table_entry boot_first_id, boot_second_id, x19, 1, x0, x1, x2
+link_from_second_id:
+        create_table_entry boot_second_id, boot_third_id, x19, 2, x0, x1, x2
+link_from_third_id:
+        create_mapping_entry boot_third_id, x19, x19, x0, x1, x2
+        ret
+
+virtphys_clash:
+        /* Identity map clashes with boot_third, which we cannot handle yet */
+        PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
+        b     fail
+ENDPROC(create_page_tables)
+
+/*
+ * Turn on the Data Cache and the MMU. The function will return on the 1:1
+ * mapping. In other word, the caller is responsible to switch to the runtime
+ * mapping.
+ *
+ * Clobbers x0 - x3
+ */
+ENTRY(enable_mmu)
+        PRINT("- Turning on paging -\r\n")
+
+        /*
+         * The state of the TLBs is unknown before turning on the MMU.
+         * Flush them to avoid stale one.
+         */
+        tlbi  alle2                  /* Flush hypervisor TLBs */
+        dsb   nsh
+
+        /* Write Xen's PT's paddr into TTBR0_EL2 */
+        load_paddr x0, boot_pgtable
+        msr   TTBR0_EL2, x0
+        isb
+
+        mrs   x0, SCTLR_EL2
+        orr   x0, x0, #SCTLR_Axx_ELx_M  /* Enable MMU */
+        orr   x0, x0, #SCTLR_Axx_ELx_C  /* Enable D-cache */
+        dsb   sy                     /* Flush PTE writes and finish reads */
+        msr   SCTLR_EL2, x0          /* now paging is enabled */
+        isb                          /* Now, flush the icache */
+        ret
+ENDPROC(enable_mmu)
+
+/*
+ * Remove the 1:1 map from the page-tables. It is not easy to keep track
+ * where the 1:1 map was mapped, so we will look for the top-level entry
+ * exclusive to the 1:1 map and remove it.
+ *
+ * Inputs:
+ *   x19: paddr(start)
+ *
+ * Clobbers x0 - x1
+ */
+ENTRY(remove_identity_mapping)
+        /*
+         * Find the zeroeth slot used. Remove the entry from zeroeth
+         * table if the slot is not XEN_ZEROETH_SLOT.
+         */
+        get_table_slot x1, x19, 0       /* x1 := zeroeth slot */
+        cmp   x1, #XEN_ZEROETH_SLOT
+        beq   1f
+        /* It is not in slot XEN_ZEROETH_SLOT, remove the entry. */
+        ldr   x0, =boot_pgtable         /* x0 := root table */
+        str   xzr, [x0, x1, lsl #3]
+        b     identity_mapping_removed
+
+1:
+        /*
+         * Find the first slot used. Remove the entry for the first
+         * table if the slot is not XEN_FIRST_SLOT.
+         */
+        get_table_slot x1, x19, 1       /* x1 := first slot */
+        cmp   x1, #XEN_FIRST_SLOT
+        beq   1f
+        /* It is not in slot XEN_FIRST_SLOT, remove the entry. */
+        ldr   x0, =boot_first           /* x0 := first table */
+        str   xzr, [x0, x1, lsl #3]
+        b     identity_mapping_removed
+
+1:
+        /*
+         * Find the second slot used. Remove the entry for the first
+         * table if the slot is not XEN_SECOND_SLOT.
+         */
+        get_table_slot x1, x19, 2       /* x1 := second slot */
+        cmp   x1, #XEN_SECOND_SLOT
+        beq   identity_mapping_removed
+        /* It is not in slot 1, remove the entry */
+        ldr   x0, =boot_second          /* x0 := second table */
+        str   xzr, [x0, x1, lsl #3]
+
+identity_mapping_removed:
+        /* See asm/arm64/flushtlb.h for the explanation of the sequence. */
+        dsb   nshst
+        tlbi  alle2
+        dsb   nsh
+        isb
+
+        ret
+ENDPROC(remove_identity_mapping)
+
+/*
+ * Map the UART in the fixmap (when earlyprintk is used) and hook the
+ * fixmap table in the page tables.
+ *
+ * The fixmap cannot be mapped in create_page_tables because it may
+ * clash with the 1:1 mapping.
+ *
+ * Inputs:
+ *   x20: Physical offset
+ *   x23: Early UART base physical address
+ *
+ * Clobbers x0 - x3
+ */
+ENTRY(setup_fixmap)
+#ifdef CONFIG_EARLY_PRINTK
+        /* Add UART to the fixmap table */
+        ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
+        create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
+#endif
+        /* Map fixmap into boot_second */
+        ldr   x0, =FIXMAP_ADDR(0)
+        create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
+        /* Ensure any page table updates made above have occurred. */
+        dsb   nshst
+
+        ret
+ENDPROC(setup_fixmap)
+
+/* Fail-stop */
+fail:   PRINT("- Boot failed -\r\n")
+1:      wfe
+        b     1b
+ENDPROC(fail)
+
+GLOBAL(_end_boot)
+
+/*
+ * Switch TTBR
+ *
+ * x0    ttbr
+ *
+ * TODO: This code does not comply with break-before-make.
+ */
+ENTRY(switch_ttbr)
+        dsb   sy                     /* Ensure the flushes happen before
+                                      * continuing */
+        isb                          /* Ensure synchronization with previous
+                                      * changes to text */
+        tlbi   alle2                 /* Flush hypervisor TLB */
+        ic     iallu                 /* Flush I-cache */
+        dsb    sy                    /* Ensure completion of TLB flush */
+        isb
+
+        msr    TTBR0_EL2, x0
+
+        isb                          /* Ensure synchronization with previous
+                                      * changes to text */
+        tlbi   alle2                 /* Flush hypervisor TLB */
+        ic     iallu                 /* Flush I-cache */
+        dsb    sy                    /* Ensure completion of TLB flush */
+        isb
+
+        ret
+ENDPROC(switch_ttbr)
+
+/*
+ * Local variables:
+ * mode: ASM
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/arm64/macros.h b/xen/arch/arm/include/asm/arm64/macros.h
index 140e223b4c..f28c124e66 100644
--- a/xen/arch/arm/include/asm/arm64/macros.h
+++ b/xen/arch/arm/include/asm/arm64/macros.h
@@ -32,6 +32,57 @@
         hint    #22
     .endm
 
+#ifdef CONFIG_EARLY_PRINTK
+/*
+ * Macro to print a string to the UART, if there is one.
+ *
+ * Clobbers x0 - x3
+ */
+#define PRINT(_s)          \
+        mov   x3, lr ;     \
+        adr   x0, 98f ;    \
+        bl    puts    ;    \
+        mov   lr, x3 ;     \
+        RODATA_STR(98, _s)
+
+/*
+ * Macro to print the value of register \xb
+ *
+ * Clobbers x0 - x4
+ */
+.macro print_reg xb
+        mov   x0, \xb
+        mov   x4, lr
+        bl    putn
+        mov   lr, x4
+.endm
+
+#else /* CONFIG_EARLY_PRINTK */
+#define PRINT(s)
+
+.macro print_reg xb
+.endm
+
+#endif /* !CONFIG_EARLY_PRINTK */
+
+/*
+ * Pseudo-op for PC relative adr <reg>, <symbol> where <symbol> is
+ * within the range +/- 4GB of the PC.
+ *
+ * @dst: destination register (64 bit wide)
+ * @sym: name of the symbol
+ */
+.macro  adr_l, dst, sym
+        adrp \dst, \sym
+        add  \dst, \dst, :lo12:\sym
+.endm
+
+/* Load the physical address of a symbol into xb */
+.macro load_paddr xb, sym
+        ldr \xb, =\sym
+        add \xb, \xb, x20
+.endm
+
 /*
  * Register aliases.
  */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (5 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 06/40] xen/arm64: move MMU related code from head.S to head_mmu.S Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-17 23:46   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv8-R Penny Zheng
                   ` (35 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

Only the first 4KB of the Xen image will be mapped as identity
(PA == VA). At the moment, Xen guarantees this by having
everything that needs to be used in the identity mapping
in head.S before _end_boot and checking at link time that this
fits in 4KB.

In a previous patch, we moved the MMU code out of head.S and
added .text.header to the new file so that all identity map
code still lands in the first 4KB. However, the placement of
these two files within that 4KB depends on the build tools.
Currently, the object order in the Makefile ensures that head.S
comes first, but a different toolchain or build configuration
may not preserve that order.

In this patch we introduce .text.idmap in head_mmu.S and place
this section after .text.header in the linker script, to ensure
the code of head_mmu.S is laid out after the code of head.S.

After this change, some code that does not belong to the
identity map would still end up before _end_boot, because
_end_boot has been moved to head_mmu.S and therefore all code
in head.S is placed before it. So this patch also adds a .text
section directive at the place where _end_boot originally was
in head.S; all code after that point in head.S is no longer
part of the identity map section.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. New patch.
---
 xen/arch/arm/arm64/head.S     | 6 ++++++
 xen/arch/arm/arm64/head_mmu.S | 2 +-
 xen/arch/arm/xen.lds.S        | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 5cfa47279b..782bd1f94c 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -466,6 +466,12 @@ fail:   PRINT("- Boot failed -\r\n")
         b     1b
 ENDPROC(fail)
 
+/*
+ * Code that does not need to be in the identity map section
+ * is put back into the normal .text section.
+ */
+.section .text, "ax", %progbits
+
 #ifdef CONFIG_EARLY_PRINTK
 /*
  * Initialize the UART. Should only be called on the boot CPU.
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index e2c8f07140..6ff13c751c 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -105,7 +105,7 @@
         str   \tmp2, [\tmp3, \tmp1, lsl #3]
 .endm
 
-.section .text.header, "ax", %progbits
+.section .text.idmap, "ax", %progbits
 /*.aarch64*/
 
 /*
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 92c2984052..bc45ea2c65 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -33,6 +33,7 @@ SECTIONS
   .text : {
         _stext = .;            /* Text section */
        *(.text.header)
+       *(.text.idmap)
 
        *(.text.cold)
        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv8-R
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (6 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-17 23:49   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 09/40] xen/arm: decouple copy_from_paddr from FIXMAP Penny Zheng
                   ` (34 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

There is no VMSA support on Armv8-R AArch64, so we cannot map the early
UART to FIXMAP_CONSOLE. Instead, we use PA == VA to define
EARLY_UART_VIRTUAL_ADDRESS on Armv8-R AArch64.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
1. New patch
---
 xen/arch/arm/include/asm/early_printk.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
index c5149b2976..44a230853f 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -15,10 +15,22 @@
 
 #ifdef CONFIG_EARLY_PRINTK
 
+#ifdef CONFIG_ARM_V8R
+
+/*
+ * For Armv8-R, there is no VMSA support in EL2, so we use VA == PA
+ * for EARLY_UART_VIRTUAL_ADDRESS.
+ */
+#define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
+
+#else
+
 /* need to add the uart address offset in page to the fixmap address */
 #define EARLY_UART_VIRTUAL_ADDRESS \
     (FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
 
+#endif /* CONFIG_ARM_V8R */
+
 #endif /* !CONFIG_EARLY_PRINTK */
 
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 09/40] xen/arm: decouple copy_from_paddr from FIXMAP
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (7 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv8-R Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h Penny Zheng
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

copy_from_paddr will map a page into Xen's FIXMAP_MISC area for
temporary access. But systems that do not support VMSA cannot
implement set_fixmap/clear_fixmap, which means they cannot always
use the same virtual address for the source.

In this case, we introduce two helpers to decouple copy_from_paddr
from set_fixmap/clear_fixmap. map_page_to_xen_misc can still
return the same virtual address as before on VMSA systems, and it
can return a different address on non-VMSA systems.
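
As an illustration only (not part of this patch), a non-VMSA/MPU
implementation of the two helpers could simply hand back the direct
address of the page, since PA == VA on such systems. The sketch below
assumes the existing maddr_to_virt()/mfn_to_maddr() helpers and is a
sketch, not the actual MPU implementation:

void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
{
    /* No VMSA: nothing to map, the 1:1 address can be used directly. */
    return maddr_to_virt(mfn_to_maddr(mfn));
}

void unmap_page_from_xen_misc(void)
{
    /* Nothing to tear down on a non-VMSA system. */
}

With that, copy_from_paddr() works unmodified on both kinds of systems,
which is the point of routing it through these helpers.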

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. New patch
---
 xen/arch/arm/include/asm/setup.h |  4 ++++
 xen/arch/arm/kernel.c            | 13 +++++++------
 xen/arch/arm/mm.c                | 12 ++++++++++++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index a926f30a2b..4f39a1aa0a 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -119,6 +119,10 @@ extern struct bootinfo bootinfo;
 
 extern domid_t max_init_domid;
 
+/* Map a page to misc area */
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes);
+/* Unmap the page from misc area */
+void unmap_page_from_xen_misc(void);
 void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
 
 size_t estimate_efi_size(unsigned int mem_nr_banks);
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 23b840ea9e..0475d8fae7 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -49,18 +49,19 @@ struct minimal_dtb_header {
  */
 void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
 {
-    void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
-
-    while (len) {
+    while ( len )
+    {
+        void *src;
         unsigned long l, s;
 
-        s = paddr & (PAGE_SIZE-1);
+        s = paddr & (PAGE_SIZE - 1);
         l = min(PAGE_SIZE - s, len);
 
-        set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+        src = map_page_to_xen_misc(maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+        ASSERT(src != NULL);
         memcpy(dst, src + s, l);
         clean_dcache_va_range(dst, l);
-        clear_fixmap(FIXMAP_MISC);
+        unmap_page_from_xen_misc();
 
         paddr += l;
         dst += l;
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 0fc6f2992d..8f15814c5e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -355,6 +355,18 @@ void clear_fixmap(unsigned int map)
     BUG_ON(res != 0);
 }
 
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
+{
+    set_fixmap(FIXMAP_MISC, mfn, attributes);
+
+    return fix_to_virt(FIXMAP_MISC);
+}
+
+void unmap_page_from_xen_misc(void)
+{
+    clear_fixmap(FIXMAP_MISC);
+}
+
 void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
     void *v = map_domain_page(_mfn(mfn));
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (8 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 09/40] xen/arm: decouple copy_from_paddr from FIXMAP Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-19 14:20   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map Penny Zheng
                   ` (32 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

Xen defines some global configuration macros for Arm in
config.h. We still want to use it for Armv8-R systems, but
some of the address-related macros there are defined for
MMU systems only and will not be used by MPU systems.
Adding ifdefery with CONFIG_HAS_MPU to gate these macros
would result in messy and hard-to-read/maintain code.

So we keep the common definitions in config.h, but move the
virtual-address-related definitions to a new file,
config_mmu.h, and use a new file, config_mpu.h, to store the
definitions for MPU systems. To avoid spreading #ifdef
everywhere, we keep the same definition names for MPU
systems, like XEN_VIRT_START and HYPERVISOR_VIRT_START,
but the definition contents are MPU specific.
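
As a purely hypothetical illustration of why the names are kept, common
code like the helper below can use XEN_VIRT_START and build for both
layouts, with config.h pulling in the right header. The helper itself
does not exist in the tree and is only a sketch:

#include <asm/config.h>   /* selects config_mmu.h or config_mpu.h */

/* Hypothetical common helper: no #ifdef CONFIG_HAS_MPU needed here. */
static inline bool addr_is_in_xen_image(vaddr_t va)
{
    /* The 2MB bound is only illustrative. */
    return (va >= XEN_VIRT_START) && (va < XEN_VIRT_START + MB(2));
}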

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. Remove duplicated FIXMAP definitions from config_mmu.h
---
 xen/arch/arm/include/asm/config.h     | 103 +++--------------------
 xen/arch/arm/include/asm/config_mmu.h | 112 ++++++++++++++++++++++++++
 xen/arch/arm/include/asm/config_mpu.h |  25 ++++++
 3 files changed, 147 insertions(+), 93 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/config_mmu.h
 create mode 100644 xen/arch/arm/include/asm/config_mpu.h

diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
index 25a625ff08..86d8142959 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -48,6 +48,12 @@
 
 #define INVALID_VCPU_ID MAX_VIRT_CPUS
 
+/* Used for calculating PDX */
+#ifdef CONFIG_ARM_64
+#define FRAMETABLE_SIZE        GB(32)
+#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
+#endif
+
 #define __LINUX_ARM_ARCH__ 7
 #define CONFIG_AEABI
 
@@ -71,99 +77,10 @@
 #include <xen/const.h>
 #include <xen/page-size.h>
 
-/*
- * Common ARM32 and ARM64 layout:
- *   0  -   2M   Unmapped
- *   2M -   4M   Xen text, data, bss
- *   4M -   6M   Fixmap: special-purpose 4K mapping slots
- *   6M -  10M   Early boot mapping of FDT
- *   10M - 12M   Livepatch vmap (if compiled in)
- *
- * ARM32 layout:
- *   0  -  12M   <COMMON>
- *
- *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
- * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
- *                    space
- *
- *   1G -   2G   Xenheap: always-mapped memory
- *   2G -   4G   Domheap: on-demand-mapped
- *
- * ARM64 layout:
- * 0x0000000000000000 - 0x0000007fffffffff (512GB, L0 slot [0])
- *   0  -  12M   <COMMON>
- *
- *   1G -   2G   VMAP: ioremap and early_ioremap
- *
- *  32G -  64G   Frametable: 24 bytes per page for 5.3TB of RAM
- *
- * 0x0000008000000000 - 0x00007fffffffffff (127.5TB, L0 slots [1..255])
- *  Unused
- *
- * 0x0000800000000000 - 0x000084ffffffffff (5TB, L0 slots [256..265])
- *  1:1 mapping of RAM
- *
- * 0x0000850000000000 - 0x0000ffffffffffff (123TB, L0 slots [266..511])
- *  Unused
- */
-
-#define XEN_VIRT_START         _AT(vaddr_t,0x00200000)
-#define FIXMAP_ADDR(n)        (_AT(vaddr_t,0x00400000) + (n) * PAGE_SIZE)
-
-#define BOOT_FDT_VIRT_START    _AT(vaddr_t,0x00600000)
-#define BOOT_FDT_VIRT_SIZE     _AT(vaddr_t, MB(4))
-
-#ifdef CONFIG_LIVEPATCH
-#define LIVEPATCH_VMAP_START   _AT(vaddr_t,0x00a00000)
-#define LIVEPATCH_VMAP_SIZE    _AT(vaddr_t, MB(2))
-#endif
-
-#define HYPERVISOR_VIRT_START  XEN_VIRT_START
-
-#ifdef CONFIG_ARM_32
-
-#define CONFIG_SEPARATE_XENHEAP 1
-
-#define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
-#define FRAMETABLE_SIZE        MB(128-32)
-#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
-#define FRAMETABLE_VIRT_END    (FRAMETABLE_VIRT_START + FRAMETABLE_SIZE - 1)
-
-#define VMAP_VIRT_START        _AT(vaddr_t,0x10000000)
-#define VMAP_VIRT_SIZE         _AT(vaddr_t, GB(1) - MB(256))
-
-#define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
-#define XENHEAP_VIRT_SIZE      _AT(vaddr_t, GB(1))
-
-#define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
-#define DOMHEAP_VIRT_SIZE      _AT(vaddr_t, GB(2))
-
-#define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
-
-/* Number of domheap pagetable pages required at the second level (2MB mappings) */
-#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
-
-#else /* ARM_64 */
-
-#define SLOT0_ENTRY_BITS  39
-#define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
-#define SLOT0_ENTRY_SIZE  SLOT0(1)
-
-#define VMAP_VIRT_START  GB(1)
-#define VMAP_VIRT_SIZE   GB(1)
-
-#define FRAMETABLE_VIRT_START  GB(32)
-#define FRAMETABLE_SIZE        GB(32)
-#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
-
-#define DIRECTMAP_VIRT_START   SLOT0(256)
-#define DIRECTMAP_SIZE         (SLOT0_ENTRY_SIZE * (265-256))
-#define DIRECTMAP_VIRT_END     (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
-
-#define XENHEAP_VIRT_START     directmap_virt_start
-
-#define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
-
+#ifdef CONFIG_HAS_MPU
+#include <asm/config_mpu.h>
+#else
+#include <asm/config_mmu.h>
 #endif
 
 #define NR_hypercalls 64
diff --git a/xen/arch/arm/include/asm/config_mmu.h b/xen/arch/arm/include/asm/config_mmu.h
new file mode 100644
index 0000000000..c12ff25cf4
--- /dev/null
+++ b/xen/arch/arm/include/asm/config_mmu.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/******************************************************************************
+ * config_mmu.h
+ *
+ * A Linux-style configuration list, which may only be included by config.h
+ */
+
+#ifndef __ARM_CONFIG_MMU_H__
+#define __ARM_CONFIG_MMU_H__
+
+/*
+ * Common ARM32 and ARM64 layout:
+ *   0  -   2M   Unmapped
+ *   2M -   4M   Xen text, data, bss
+ *   4M -   6M   Fixmap: special-purpose 4K mapping slots
+ *   6M -  10M   Early boot mapping of FDT
+ *   10M - 12M   Livepatch vmap (if compiled in)
+ *
+ * ARM32 layout:
+ *   0  -  12M   <COMMON>
+ *
+ *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
+ * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
+ *                    space
+ *
+ *   1G -   2G   Xenheap: always-mapped memory
+ *   2G -   4G   Domheap: on-demand-mapped
+ *
+ * ARM64 layout:
+ * 0x0000000000000000 - 0x0000007fffffffff (512GB, L0 slot [0])
+ *   0  -  12M   <COMMON>
+ *
+ *   1G -   2G   VMAP: ioremap and early_ioremap
+ *
+ *  32G -  64G   Frametable: 24 bytes per page for 5.3TB of RAM
+ *
+ * 0x0000008000000000 - 0x00007fffffffffff (127.5TB, L0 slots [1..255])
+ *  Unused
+ *
+ * 0x0000800000000000 - 0x000084ffffffffff (5TB, L0 slots [256..265])
+ *  1:1 mapping of RAM
+ *
+ * 0x0000850000000000 - 0x0000ffffffffffff (123TB, L0 slots [266..511])
+ *  Unused
+ */
+
+#define XEN_VIRT_START         _AT(vaddr_t,0x00200000)
+#define FIXMAP_ADDR(n)        (_AT(vaddr_t,0x00400000) + (n) * PAGE_SIZE)
+
+#define BOOT_FDT_VIRT_START    _AT(vaddr_t,0x00600000)
+#define BOOT_FDT_VIRT_SIZE     _AT(vaddr_t, MB(4))
+
+#ifdef CONFIG_LIVEPATCH
+#define LIVEPATCH_VMAP_START   _AT(vaddr_t,0x00a00000)
+#define LIVEPATCH_VMAP_SIZE    _AT(vaddr_t, MB(2))
+#endif
+
+#define HYPERVISOR_VIRT_START  XEN_VIRT_START
+
+#ifdef CONFIG_ARM_32
+
+#define CONFIG_SEPARATE_XENHEAP 1
+
+#define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
+#define FRAMETABLE_SIZE        MB(128-32)
+#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
+#define FRAMETABLE_VIRT_END    (FRAMETABLE_VIRT_START + FRAMETABLE_SIZE - 1)
+
+#define VMAP_VIRT_START        _AT(vaddr_t,0x10000000)
+#define VMAP_VIRT_SIZE         _AT(vaddr_t, GB(1) - MB(256))
+
+#define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
+#define XENHEAP_VIRT_SIZE      _AT(vaddr_t, GB(1))
+
+#define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
+#define DOMHEAP_VIRT_SIZE      _AT(vaddr_t, GB(2))
+
+#define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
+
+/* Number of domheap pagetable pages required at the second level (2MB mappings) */
+#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
+
+#else /* ARM_64 */
+
+#define SLOT0_ENTRY_BITS  39
+#define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
+#define SLOT0_ENTRY_SIZE  SLOT0(1)
+
+#define VMAP_VIRT_START  GB(1)
+#define VMAP_VIRT_SIZE   GB(1)
+
+#define FRAMETABLE_VIRT_START  GB(32)
+
+#define DIRECTMAP_VIRT_START   SLOT0(256)
+#define DIRECTMAP_SIZE         (SLOT0_ENTRY_SIZE * (265-256))
+#define DIRECTMAP_VIRT_END     (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
+
+#define XENHEAP_VIRT_START     directmap_virt_start
+
+#define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
+
+#endif
+
+#endif /* __ARM_CONFIG_MMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/config_mpu.h b/xen/arch/arm/include/asm/config_mpu.h
new file mode 100644
index 0000000000..6b52b11ef7
--- /dev/null
+++ b/xen/arch/arm/include/asm/config_mpu.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * config_mpu.h: A Linux-style configuration list for Arm MPU systems,
+ *               which may only be included by config.h
+ */
+
+#ifndef __ARM_CONFIG_MPU_H__
+#define __ARM_CONFIG_MPU_H__
+
+#define XEN_START_ADDRESS CONFIG_XEN_START_ADDRESS
+
+/*
+ * All MPU platforms need to provide a XEN_START_ADDRESS for the linker.
+ * This address indicates where the Xen image will be loaded and run
+ * from, and it must be aligned to PAGE_SIZE.
+ */
+#if (XEN_START_ADDRESS % PAGE_SIZE) != 0
+#error "XEN_START_ADDRESS must be aligned to PAGE_SIZE"
+#endif
+
+#define XEN_VIRT_START         _AT(paddr_t, XEN_START_ADDRESS)
+
+#define HYPERVISOR_VIRT_START  XEN_VIRT_START
+
+#endif /* __ARM_CONFIG_MPU_H__ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (9 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-19 10:18   ` Ayan Kumar Halder
  2023-01-19 15:04   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement Penny Zheng
                   ` (31 subsequent siblings)
  42 siblings, 2 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk

From: Penny Zheng <penny.zheng@arm.com>

The start-of-day Xen MPU memory region layout shall be as follows:

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
......
xen_mpumap[max_xen_mpumap - 2]: Xen init data
xen_mpumap[max_xen_mpumap - 1]: Xen init text

max_xen_mpumap refers to the number of regions supported by the EL2 MPU.
The layout shall stay consistent with what is described in xen.lds.S,
otherwise the code below needs adjustment.

As MMU and MPU systems use different functions to create the boot
memory management data, instead of introducing extra #ifdefs in the
main code flow, we introduce a neutral name, prepare_early_mappings,
for both, which also replaces create_page_tables on MMU.
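
For reference, the region composition done below in assembly (base plus
attribute bits, and a rounded-up inclusive limit) corresponds roughly to
the C sketch below using the pr_t/prbar_t/prlar_t types introduced by
this patch. The constants mirror REGION_DATA_PRBAR and
REGION_NORMAL_PRLAR from head_mpu.S; the function itself is only an
illustration, not part of the patch:

static pr_t make_xen_data_region(paddr_t base, paddr_t limit)
{
    pr_t region;

    /* REGION_DATA_PRBAR: SH=11 AP=00 XN=10 */
    region.prbar.bits = (base & MPU_REGION_MASK) | 0x32;

    /* Limit is rounded up to a page boundary and is inclusive. */
    limit = (limit + PAGE_SIZE - 1) & PAGE_MASK;
    /* REGION_NORMAL_PRLAR: NS=0 ATTR=111 EN=1 */
    region.prlar.bits = ((limit - 1) & MPU_REGION_MASK) | 0x0f;

    return region;
}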

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/arm64/Makefile              |   2 +
 xen/arch/arm/arm64/head.S                |  17 +-
 xen/arch/arm/arm64/head_mmu.S            |   4 +-
 xen/arch/arm/arm64/head_mpu.S            | 323 +++++++++++++++++++++++
 xen/arch/arm/include/asm/arm64/mpu.h     |  63 +++++
 xen/arch/arm/include/asm/arm64/sysregs.h |  49 ++++
 xen/arch/arm/mm_mpu.c                    |  48 ++++
 xen/arch/arm/xen.lds.S                   |   4 +
 8 files changed, 502 insertions(+), 8 deletions(-)
 create mode 100644 xen/arch/arm/arm64/head_mpu.S
 create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
 create mode 100644 xen/arch/arm/mm_mpu.c

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 22da2f54b5..438c9737ad 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -10,6 +10,8 @@ obj-y += entry.o
 obj-y += head.o
 ifneq ($(CONFIG_HAS_MPU),y)
 obj-y += head_mmu.o
+else
+obj-y += head_mpu.o
 endif
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 782bd1f94c..145e3d53dc 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -68,9 +68,9 @@
  *  x24 -
  *  x25 -
  *  x26 - skip_zero_bss (boot cpu only)
- *  x27 -
- *  x28 -
- *  x29 -
+ *  x27 - region selector (mpu only)
+ *  x28 - prbar (mpu only)
+ *  x29 - prlar (mpu only)
  *  x30 - lr
  */
 
@@ -82,7 +82,7 @@
  * ---------------------------
  *
  * The requirements are:
- *   MMU = off, D-cache = off, I-cache = on or off,
+ *   MMU/MPU = off, D-cache = off, I-cache = on or off,
  *   x0 = physical address to the FDT blob.
  *
  * This must be the very first address in the loaded image.
@@ -252,7 +252,12 @@ real_start_efi:
 
         bl    check_cpu_mode
         bl    cpu_init
-        bl    create_page_tables
+
+        /*
+         * Create boot memory management data: page tables for MMU systems
+         * and memory regions for MPU systems.
+         */
+        bl    prepare_early_mappings
         bl    enable_mmu
 
         /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
@@ -310,7 +315,7 @@ GLOBAL(init_secondary)
 #endif
         bl    check_cpu_mode
         bl    cpu_init
-        bl    create_page_tables
+        bl    prepare_early_mappings
         bl    enable_mmu
 
         /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index 6ff13c751c..2346f755df 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -123,7 +123,7 @@
  *
  * Clobbers x0 - x4
  */
-ENTRY(create_page_tables)
+ENTRY(prepare_early_mappings)
         /* Prepare the page-tables for mapping Xen */
         ldr   x0, =XEN_VIRT_START
         create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
@@ -208,7 +208,7 @@ virtphys_clash:
         /* Identity map clashes with boot_third, which we cannot handle yet */
         PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
         b     fail
-ENDPROC(create_page_tables)
+ENDPROC(prepare_early_mappings)
 
 /*
  * Turn on the Data Cache and the MMU. The function will return on the 1:1
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
new file mode 100644
index 0000000000..0b97ce4646
--- /dev/null
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -0,0 +1,323 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Start-of-day code for an Armv8-R AArch64 MPU system.
+ */
+
+#include <asm/arm64/mpu.h>
+#include <asm/early_printk.h>
+#include <asm/page.h>
+
+/*
+ * One entry in the Xen MPU memory region mapping table (xen_mpumap) is a
+ * pr_t structure of 16 bytes, so entry n lives at byte offset n << 4.
+ */
+#define MPU_ENTRY_SHIFT         0x4
+
+#define REGION_SEL_MASK         0xf
+
+#define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
+#define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
+#define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
+
+#define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
+
+/*
+ * Macro to round up the section address to be PAGE_SIZE aligned.
+ * Each section (e.g. .text, .data, etc) in xen.lds.S is page-aligned,
+ * which is usually guarded with ". = ALIGN(PAGE_SIZE)" at its start
+ * or at its end.
+ */
+.macro roundup_section, xb
+        add   \xb, \xb, #(PAGE_SIZE-1)
+        and   \xb, \xb, #PAGE_MASK
+.endm
+
+/*
+ * Macro to create a new MPU memory region entry, which is a structure
+ * of pr_t,  in \prmap.
+ *
+ * Inputs:
+ * prmap:   mpu memory region map table symbol
+ * sel:     region selector
+ * prbar:   preserve value for PRBAR_EL2
+ * prlar:   preserve value for PRLAR_EL2
+ *
+ * Clobbers \tmp1, \tmp2
+ *
+ */
+.macro create_mpu_entry prmap, sel, prbar, prlar, tmp1, tmp2
+    mov   \tmp2, \sel
+    lsl   \tmp2, \tmp2, #MPU_ENTRY_SHIFT
+    adr_l \tmp1, \prmap
+    /* Write the first 8 bytes(prbar_t) of pr_t */
+    str   \prbar, [\tmp1, \tmp2]
+
+    add   \tmp2, \tmp2, #8
+    /* Write the last 8 bytes(prlar_t) of pr_t */
+    str   \prlar, [\tmp1, \tmp2]
+.endm
+
+/*
+ * Macro to store the maximum number of regions supported by the EL2 MPU
+ * in max_xen_mpumap, which is identified by MPUIR_EL2.
+ *
+ * Outputs:
+ * nr_regions: preserve the maximum number of regions supported by the EL2 MPU
+ *
+ * Clobbers \tmp1
+ *
+ */
+.macro read_max_el2_regions, nr_regions, tmp1
+    load_paddr \tmp1, max_xen_mpumap
+    mrs   \nr_regions, MPUIR_EL2
+    isb
+    str   \nr_regions, [\tmp1]
+.endm
+
+/*
+ * Macro to prepare and set a MPU memory region
+ *
+ * Inputs:
+ * base:        base address symbol (should be page-aligned)
+ * limit:       limit address symbol
+ * sel:         region selector
+ * prbar:       store computed PRBAR_EL2 value
+ * prlar:       store computed PRLAR_EL2 value
+ * attr_prbar:  PRBAR_EL2-related memory attributes. If not specified it will be REGION_DATA_PRBAR
+ * attr_prlar:  PRLAR_EL2-related memory attributes. If not specified it will be REGION_NORMAL_PRLAR
+ *
+ * Clobber \tmp1
+ *
+ */
+.macro prepare_xen_region, base, limit, sel, prbar, prlar, tmp1, attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR
+    /* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
+    load_paddr \prbar, \base
+    and   \prbar, \prbar, #MPU_REGION_MASK
+    mov   \tmp1, #\attr_prbar
+    orr   \prbar, \prbar, \tmp1
+
+    /* Prepare value for PRLAR_EL2 reg and preserve it in \prlar.*/
+    load_paddr \prlar, \limit
+    /* Round up limit address to be PAGE_SIZE aligned */
+    roundup_section \prlar
+    /* Limit address should be inclusive */
+    sub   \prlar, \prlar, #1
+    and   \prlar, \prlar, #MPU_REGION_MASK
+    mov   \tmp1, #\attr_prlar
+    orr   \prlar, \prlar, \tmp1
+
+    mov   x27, \sel
+    mov   x28, \prbar
+    mov   x29, \prlar
+    /*
+     * x27, x28, x29 are dedicated registers used as
+     * inputs for the write_pr function
+     */
+    bl    write_pr
+.endm
+
+.section .text.idmap, "ax", %progbits
+
+/*
+ * ENTRY to configure an EL2 MPU memory region.
+ * ARMv8-R AArch64 supports at most 255 MPU protection regions.
+ * See section G1.3.18 of the reference manual for ARMv8-R AArch64,
+ * PRBAR<n>_EL2 and PRLAR<n>_EL2 provides access to the EL2 MPU region
+ * determined by the value of 'n' and PRSELR_EL2.REGION as
+ * PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
+ * For example to access regions from 16 to 31 (0b10000 to 0b11111):
+ * - Set PRSELR_EL2 to 0b1xxxx
+ * - Region 16 configuration is accessible through PRBAR0_EL2 and PRLAR0_EL2
+ * - Region 17 configuration is accessible through PRBAR1_EL2 and PRLAR1_EL2
+ * - Region 18 configuration is accessible through PRBAR2_EL2 and PRLAR2_EL2
+ * - ...
+ * - Region 31 configuration is accessible through PRBAR15_EL2 and PRLAR15_EL2
+ *
+ * Inputs:
+ * x27: region selector
+ * x28: preserve value for PRBAR_EL2
+ * x29: preserve value for PRLAR_EL2
+ *
+ */
+ENTRY(write_pr)
+    msr   PRSELR_EL2, x27
+    dsb   sy
+    and   x27, x27, #REGION_SEL_MASK
+    cmp   x27, #0
+    bne   1f
+    msr   PRBAR0_EL2, x28
+    msr   PRLAR0_EL2, x29
+    b     out
+1:
+    cmp   x27, #1
+    bne   2f
+    msr   PRBAR1_EL2, x28
+    msr   PRLAR1_EL2, x29
+    b     out
+2:
+    cmp   x27, #2
+    bne   3f
+    msr   PRBAR2_EL2, x28
+    msr   PRLAR2_EL2, x29
+    b     out
+3:
+    cmp   x27, #3
+    bne   4f
+    msr   PRBAR3_EL2, x28
+    msr   PRLAR3_EL2, x29
+    b     out
+4:
+    cmp   x27, #4
+    bne   5f
+    msr   PRBAR4_EL2, x28
+    msr   PRLAR4_EL2, x29
+    b     out
+5:
+    cmp   x27, #5
+    bne   6f
+    msr   PRBAR5_EL2, x28
+    msr   PRLAR5_EL2, x29
+    b     out
+6:
+    cmp   x27, #6
+    bne   7f
+    msr   PRBAR6_EL2, x28
+    msr   PRLAR6_EL2, x29
+    b     out
+7:
+    cmp   x27, #7
+    bne   8f
+    msr   PRBAR7_EL2, x28
+    msr   PRLAR7_EL2, x29
+    b     out
+8:
+    cmp   x27, #8
+    bne   9f
+    msr   PRBAR8_EL2, x28
+    msr   PRLAR8_EL2, x29
+    b     out
+9:
+    cmp   x27, #9
+    bne   10f
+    msr   PRBAR9_EL2, x28
+    msr   PRLAR9_EL2, x29
+    b     out
+10:
+    cmp   x27, #10
+    bne   11f
+    msr   PRBAR10_EL2, x28
+    msr   PRLAR10_EL2, x29
+    b     out
+11:
+    cmp   x27, #11
+    bne   12f
+    msr   PRBAR11_EL2, x28
+    msr   PRLAR11_EL2, x29
+    b     out
+12:
+    cmp   x27, #12
+    bne   13f
+    msr   PRBAR12_EL2, x28
+    msr   PRLAR12_EL2, x29
+    b     out
+13:
+    cmp   x27, #13
+    bne   14f
+    msr   PRBAR13_EL2, x28
+    msr   PRLAR13_EL2, x29
+    b     out
+14:
+    cmp   x27, #14
+    bne   15f
+    msr   PRBAR14_EL2, x28
+    msr   PRLAR14_EL2, x29
+    b     out
+15:
+    msr   PRBAR15_EL2, x28
+    msr   PRLAR15_EL2, x29
+out:
+    isb
+    ret
+ENDPROC(write_pr)
+
+/*
+ * Static start-of-day Xen EL2 MPU memory region layout.
+ *
+ *     xen_mpumap[0] : Xen text
+ *     xen_mpumap[1] : Xen read-only data
+ *     xen_mpumap[2] : Xen read-only after init data
+ *     xen_mpumap[3] : Xen read-write data
+ *     xen_mpumap[4] : Xen BSS
+ *     ......
+ *     xen_mpumap[max_xen_mpumap - 2]: Xen init data
+ *     xen_mpumap[max_xen_mpumap - 1]: Xen init text
+ *
+ * Clobbers x0 - x6
+ *
+ * It shall stay consistent with what is described in xen.lds.S,
+ * otherwise the code below needs adjustment.
+ * It shall also follow the rule of putting fixed MPU memory regions at
+ * the front and the others at the rear; the latter here mainly refers
+ * to boot-only regions, like the Xen init text region.
+ */
+ENTRY(prepare_early_mappings)
+    /* Save LR, as write_pr will be called later like a nested function */
+    mov   x6, lr
+
+    /* x0: region sel */
+    mov   x0, xzr
+    /* Xen text section. */
+    prepare_xen_region _stext, _etext, x0, x1, x2, x3, attr_prbar=REGION_TEXT_PRBAR
+    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
+
+    add   x0, x0, #1
+    /* Xen read-only data section. */
+    prepare_xen_region _srodata, _erodata, x0, x1, x2, x3, attr_prbar=REGION_RO_PRBAR
+    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
+
+    add   x0, x0, #1
+    /* Xen read-only after init data section. */
+    prepare_xen_region __ro_after_init_start, __ro_after_init_end, x0, x1, x2, x3
+    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
+
+    add   x0, x0, #1
+    /* Xen read-write data section. */
+    prepare_xen_region __data_begin, __init_begin, x0, x1, x2, x3
+    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
+
+    read_max_el2_regions x5, x3 /* x5: max_mpumap */
+    sub   x5, x5, #1
+    /* Xen init text section. */
+    prepare_xen_region _sinittext, _einittext, x5, x1, x2, x3, attr_prbar=REGION_TEXT_PRBAR
+    create_mpu_entry xen_mpumap, x5, x1, x2, x3, x4
+
+    sub   x5, x5, #1
+    /* Xen init data section. */
+    prepare_xen_region __init_data_begin, __init_end, x5, x1, x2, x3
+    create_mpu_entry xen_mpumap, x5, x1, x2, x3, x4
+
+    add   x0, x0, #1
+    /* Xen BSS section. */
+    prepare_xen_region __bss_start, __bss_end, x0, x1, x2, x3
+    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
+
+    /* Update next_fixed_region_idx and next_transient_region_idx */
+    load_paddr x3, next_fixed_region_idx
+    add   x0, x0, #1
+    str   x0, [x3]
+    load_paddr x4, next_transient_region_idx
+    sub   x5, x5, #1
+    str   x5, [x4]
+
+    mov   lr, x6
+    ret
+ENDPROC(prepare_early_mappings)
+
+GLOBAL(_end_boot)
+
+/*
+ * Local variables:
+ * mode: ASM
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
new file mode 100644
index 0000000000..c945dd53db
--- /dev/null
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * mpu.h: Arm Memory Protection Region definitions.
+ */
+
+#ifndef __ARM64_MPU_H__
+#define __ARM64_MPU_H__
+
+#define MPU_REGION_SHIFT  6
+#define MPU_REGION_ALIGN  (_AC(1, UL) << MPU_REGION_SHIFT)
+#define MPU_REGION_MASK   (~(MPU_REGION_ALIGN - 1))
+
+/*
+ * MPUIR_EL2.Region identifies the number of regions supported by the EL2 MPU.
+ * It is an 8-bit field, so there are at most 255 MPU memory regions.
+ */
+#define ARM_MAX_MPU_MEMORY_REGIONS 255
+
+#ifndef __ASSEMBLY__
+
+/* Protection Region Base Address Register */
+typedef union {
+    struct __packed {
+        unsigned long xn:2;       /* Execute-Never */
+        unsigned long ap:2;       /* Access Permission */
+        unsigned long sh:2;       /* Shareability */
+        unsigned long base:42;    /* Base Address */
+        unsigned long pad:16;
+    } reg;
+    uint64_t bits;
+} prbar_t;
+
+/* Protection Region Limit Address Register */
+typedef union {
+    struct __packed {
+        unsigned long en:1;     /* Region enable */
+        unsigned long ai:3;     /* Memory Attribute Index */
+        unsigned long ns:1;     /* Not-Secure */
+        unsigned long res:1;    /* Reserved 0 by hardware */
+        unsigned long limit:42; /* Limit Address */
+        unsigned long pad:16;
+    } reg;
+    uint64_t bits;
+} prlar_t;
+
+/* MPU Protection Region */
+typedef struct {
+    prbar_t prbar;
+    prlar_t prlar;
+} pr_t;
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ARM64_MPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
index 4638999514..aca9bca5b1 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -458,6 +458,55 @@
 #define ZCR_ELx_LEN_SIZE             9
 #define ZCR_ELx_LEN_MASK             0x1ff
 
+/* System registers for Armv8-R AArch64 */
+#ifdef CONFIG_HAS_MPU
+
+/* EL2 MPU Protection Region Base Address Register encode */
+#define PRBAR_EL2   S3_4_C6_C8_0
+#define PRBAR0_EL2  S3_4_C6_C8_0
+#define PRBAR1_EL2  S3_4_C6_C8_4
+#define PRBAR2_EL2  S3_4_C6_C9_0
+#define PRBAR3_EL2  S3_4_C6_C9_4
+#define PRBAR4_EL2  S3_4_C6_C10_0
+#define PRBAR5_EL2  S3_4_C6_C10_4
+#define PRBAR6_EL2  S3_4_C6_C11_0
+#define PRBAR7_EL2  S3_4_C6_C11_4
+#define PRBAR8_EL2  S3_4_C6_C12_0
+#define PRBAR9_EL2  S3_4_C6_C12_4
+#define PRBAR10_EL2 S3_4_C6_C13_0
+#define PRBAR11_EL2 S3_4_C6_C13_4
+#define PRBAR12_EL2 S3_4_C6_C14_0
+#define PRBAR13_EL2 S3_4_C6_C14_4
+#define PRBAR14_EL2 S3_4_C6_C15_0
+#define PRBAR15_EL2 S3_4_C6_C15_4
+
+/* EL2 MPU Protection Region Limit Address Register encode */
+#define PRLAR_EL2   S3_4_C6_C8_1
+#define PRLAR0_EL2  S3_4_C6_C8_1
+#define PRLAR1_EL2  S3_4_C6_C8_5
+#define PRLAR2_EL2  S3_4_C6_C9_1
+#define PRLAR3_EL2  S3_4_C6_C9_5
+#define PRLAR4_EL2  S3_4_C6_C10_1
+#define PRLAR5_EL2  S3_4_C6_C10_5
+#define PRLAR6_EL2  S3_4_C6_C11_1
+#define PRLAR7_EL2  S3_4_C6_C11_5
+#define PRLAR8_EL2  S3_4_C6_C12_1
+#define PRLAR9_EL2  S3_4_C6_C12_5
+#define PRLAR10_EL2 S3_4_C6_C13_1
+#define PRLAR11_EL2 S3_4_C6_C13_5
+#define PRLAR12_EL2 S3_4_C6_C14_1
+#define PRLAR13_EL2 S3_4_C6_C14_5
+#define PRLAR14_EL2 S3_4_C6_C15_1
+#define PRLAR15_EL2 S3_4_C6_C15_5
+
+/* MPU Protection Region Selection Register encode */
+#define PRSELR_EL2 S3_4_C6_C2_1
+
+/* MPU Type registers encode */
+#define MPUIR_EL2 S3_4_C0_C0_4
+
+#endif
+
 /* Access to system registers */
 
 #define WRITE_SYSREG64(v, name) do {                    \
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
new file mode 100644
index 0000000000..43e9a1be4d
--- /dev/null
+++ b/xen/arch/arm/mm_mpu.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/arch/arm/mm_mpu.c
+ *
+ * MPU based memory management code for Armv8-R AArch64.
+ *
+ * Copyright (C) 2022 Arm Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/init.h>
+#include <xen/page-size.h>
+#include <asm/arm64/mpu.h>
+
+/* Xen MPU memory region mapping table. */
+pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
+     xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
+
+/* Index into MPU memory region map for fixed regions, ascending from zero. */
+uint64_t __ro_after_init next_fixed_region_idx;
+/*
+ * Index into MPU memory region map for transient regions, like boot-only
+ * region, which descends from max_xen_mpumap.
+ */
+uint64_t __ro_after_init next_transient_region_idx;
+
+/* Maximum number of supported MPU memory regions by the EL2 MPU. */
+uint64_t __ro_after_init max_xen_mpumap;
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index bc45ea2c65..79965a3c17 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -91,6 +91,8 @@ SECTIONS
       __ro_after_init_end = .;
   } : text
 
+  . = ALIGN(PAGE_SIZE);
+  __data_begin = .;
   .data.read_mostly : {
        /* Exception table */
        __start___ex_table = .;
@@ -157,7 +159,9 @@ SECTIONS
        *(.altinstr_replacement)
   } :text
   . = ALIGN(PAGE_SIZE);
+
   .init.data : {
+       __init_data_begin = .;            /* Init data */
        *(.init.rodata)
        *(.init.rodata.*)
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (10 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-23 17:07   ` Ayan Kumar Halder
  2023-01-24 18:54   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART Penny Zheng
                   ` (30 subsequent siblings)
  42 siblings, 2 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

We need a new helper for Xen to enable the MPU at boot time.
The new helper is semantically consistent with the original enable_mmu.

If the Background region is enabled, the MPU uses the default memory
map as the Background region to generate the memory attributes when
the MPU is disabled. Since the default memory map of the Armv8-R
AArch64 architecture is IMPLEMENTATION DEFINED, we always turn the
Background region off.

In this patch, we also introduce a neutral name, enable_mm, for
Xen to enable the MMU/MPU. This helps keep a single code flow
in head.S.
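
For reference only, the C-level equivalent of the SCTLR_EL2 update done
by the MPU enable_mm below would look roughly like this, using Xen's
READ_SYSREG/WRITE_SYSREG accessors and the SCTLR_Axx_ELx_* bits. The
function name is made up for illustration; the real code stays in
assembly because it runs from head.S before the C environment is set up:

static void enable_mpu_and_dcache(void)
{
    register_t sctlr = READ_SYSREG(SCTLR_EL2);

    sctlr |= SCTLR_Axx_ELx_M;    /* Enable MPU */
    sctlr |= SCTLR_Axx_ELx_C;    /* Enable D-cache */
    sctlr |= SCTLR_Axx_ELx_WXN;  /* Enable WXN */

    dsb(sy);
    WRITE_SYSREG(sctlr, SCTLR_EL2);
    isb();
}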

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/arm64/head.S     |  5 +++--
 xen/arch/arm/arm64/head_mmu.S |  4 ++--
 xen/arch/arm/arm64/head_mpu.S | 19 +++++++++++++++++++
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 145e3d53dc..7f3f973468 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -258,7 +258,8 @@ real_start_efi:
          * and memory regions for MPU systems.
          */
         bl    prepare_early_mappings
-        bl    enable_mmu
+        /* Turn on MMU or MPU */
+        bl    enable_mm
 
         /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
         ldr   x0, =primary_switched
@@ -316,7 +317,7 @@ GLOBAL(init_secondary)
         bl    check_cpu_mode
         bl    cpu_init
         bl    prepare_early_mappings
-        bl    enable_mmu
+        bl    enable_mm
 
         /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
         ldr   x0, =secondary_switched
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index 2346f755df..b59c40495f 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -217,7 +217,7 @@ ENDPROC(prepare_early_mappings)
  *
  * Clobbers x0 - x3
  */
-ENTRY(enable_mmu)
+ENTRY(enable_mm)
         PRINT("- Turning on paging -\r\n")
 
         /*
@@ -239,7 +239,7 @@ ENTRY(enable_mmu)
         msr   SCTLR_EL2, x0          /* now paging is enabled */
         isb                          /* Now, flush the icache */
         ret
-ENDPROC(enable_mmu)
+ENDPROC(enable_mm)
 
 /*
  * Remove the 1:1 map from the page-tables. It is not easy to keep track
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
index 0b97ce4646..e2ac69b0cc 100644
--- a/xen/arch/arm/arm64/head_mpu.S
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -315,6 +315,25 @@ ENDPROC(prepare_early_mappings)
 
 GLOBAL(_end_boot)
 
+/*
+ * Enable EL2 MPU and data cache
+ * If the Background region is enabled, then the MPU uses the default memory
+ * map as the Background region for generating the memory
+ * attributes when MPU is disabled.
+ * Since the default memory map of the Armv8-R AArch64 architecture is
+ * IMPLEMENTATION DEFINED, we intend to turn off the Background region here.
+ */
+ENTRY(enable_mm)
+    mrs   x0, SCTLR_EL2
+    orr   x0, x0, #SCTLR_Axx_ELx_M    /* Enable MPU */
+    orr   x0, x0, #SCTLR_Axx_ELx_C    /* Enable D-cache */
+    orr   x0, x0, #SCTLR_Axx_ELx_WXN  /* Enable WXN */
+    dsb   sy
+    msr   SCTLR_EL2, x0
+    isb
+    ret
+ENDPROC(enable_mm)
+
 /*
  * Local variables:
  * mode: ASM
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (11 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-24 19:09   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm() Penny Zheng
                   ` (29 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

On MMU systems, we map the UART in the fixmap (when earlyprintk is
used). On MPU systems, however, we map the UART with a transient MPU
memory region.

So we introduce a new unified function, setup_early_uart, to replace
the previous setup_fixmap.
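
Transient regions such as this one are allocated downwards from the top
of the MPU map (next_transient_region_idx, introduced earlier in this
series), while fixed regions grow upwards from index 0. Expressed in C,
the bookkeeping the assembly keeps in those two variables amounts to
something like the sketch below (the helper names are made up purely
for illustration):

/* Fixed regions: allocate upwards from 0. */
static uint64_t alloc_fixed_region_idx(void)
{
    return next_fixed_region_idx++;
}

/* Transient regions (e.g. the early UART): allocate downwards. */
static uint64_t alloc_transient_region_idx(void)
{
    return next_transient_region_idx--;
}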

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/arm64/head.S               |  2 +-
 xen/arch/arm/arm64/head_mmu.S           |  4 +-
 xen/arch/arm/arm64/head_mpu.S           | 52 +++++++++++++++++++++++++
 xen/arch/arm/include/asm/early_printk.h |  1 +
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 7f3f973468..a92883319d 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -272,7 +272,7 @@ primary_switched:
          * afterwards.
          */
         bl    remove_identity_mapping
-        bl    setup_fixmap
+        bl    setup_early_uart
 #ifdef CONFIG_EARLY_PRINTK
         /* Use a virtual address to access the UART. */
         ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index b59c40495f..a19b7c873d 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
  *
  * Clobbers x0 - x3
  */
-ENTRY(setup_fixmap)
+ENTRY(setup_early_uart)
 #ifdef CONFIG_EARLY_PRINTK
         /* Add UART to the fixmap table */
         ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
@@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
         dsb   nshst
 
         ret
-ENDPROC(setup_fixmap)
+ENDPROC(setup_early_uart)
 
 /* Fail-stop */
 fail:   PRINT("- Boot failed -\r\n")
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
index e2ac69b0cc..72d1e0863d 100644
--- a/xen/arch/arm/arm64/head_mpu.S
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -18,8 +18,10 @@
 #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
 #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
 #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
+#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
 
 #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
+#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
 
 /*
  * Macro to round up the section address to be PAGE_SIZE aligned
@@ -334,6 +336,56 @@ ENTRY(enable_mm)
     ret
 ENDPROC(enable_mm)
 
+/*
+ * Map the early UART with a new transient MPU memory region.
+ *
+ * x27: region selector
+ * x28: prbar
+ * x29: prlar
+ *
+ * Clobbers x0 - x4
+ *
+ */
+ENTRY(setup_early_uart)
+#ifdef CONFIG_EARLY_PRINTK
+    /* Save LR, as write_pr will be called later like a nested function */
+    mov   x3, lr
+
+    /*
+     * The MPU region for the early UART is transient: it will be replaced
+     * by the specific device memory layout once the FDT is parsed.
+     */
+    load_paddr x0, next_transient_region_idx
+    ldr   x4, [x0]
+
+    ldr   x28, =CONFIG_EARLY_UART_BASE_ADDRESS
+    and   x28, x28, #MPU_REGION_MASK
+    mov   x1, #REGION_DEVICE_PRBAR
+    orr   x28, x28, x1
+
+    ldr x29, =(CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE)
+    roundup_section x29
+    /* Limit address is inclusive */
+    sub   x29, x29, #1
+    and   x29, x29, #MPU_REGION_MASK
+    mov   x2, #REGION_DEVICE_PRLAR
+    orr   x29, x29, x2
+
+    mov   x27, x4
+    bl    write_pr
+
+    /* Create a new entry in xen_mpumap for early UART */
+    create_mpu_entry xen_mpumap, x4, x28, x29, x1, x2
+
+    /* Update next_transient_region_idx */
+    sub   x4, x4, #1
+    str   x4, [x0]
+
+    mov   lr, x3
+    ret
+#endif
+ENDPROC(setup_early_uart)
+
 /*
  * Local variables:
  * mode: ASM
diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
index 44a230853f..d87623e6d5 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -22,6 +22,7 @@
  * for EARLY_UART_VIRTUAL_ADDRESS.
  */
 #define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
+#define EARLY_UART_SIZE            0x1000
 
 #else
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm()
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (12 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 21:13   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h Penny Zheng
                   ` (28 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

At the moment, on MMU systems, enable_mm() returns to an address in
the 1:1 mapping, and each caller is then responsible for switching to the
runtime virtual mapping. remove_identity_mapping() is then called to
remove the whole 1:1 mapping.

Since remove_identity_mapping() is not necessary on MPU systems, and we
want to avoid creating an empty stub for them while keeping a single code
flow in arm64/head.S, we move the mapping switch and the call to
remove_identity_mapping() into enable_mm() on MMU systems.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/arm64/head.S     | 28 +++++++++++++---------------
 xen/arch/arm/arm64/head_mmu.S | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index a92883319d..6358305f03 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -258,20 +258,15 @@ real_start_efi:
          * and memory regions for MPU systems.
          */
         bl    prepare_early_mappings
+        /*
+         * Address in the runtime mapping to jump to after the
+         * MMU/MPU is enabled
+         */
+        ldr   lr, =primary_switched
         /* Turn on MMU or MPU */
-        bl    enable_mm
+        b    enable_mm
 
-        /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
-        ldr   x0, =primary_switched
-        br    x0
 primary_switched:
-        /*
-         * The 1:1 map may clash with other parts of the Xen virtual memory
-         * layout. As it is not used anymore, remove it completely to
-         * avoid having to worry about replacing existing mapping
-         * afterwards.
-         */
-        bl    remove_identity_mapping
         bl    setup_early_uart
 #ifdef CONFIG_EARLY_PRINTK
         /* Use a virtual address to access the UART. */
@@ -317,11 +312,14 @@ GLOBAL(init_secondary)
         bl    check_cpu_mode
         bl    cpu_init
         bl    prepare_early_mappings
-        bl    enable_mm
 
-        /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
-        ldr   x0, =secondary_switched
-        br    x0
+        /*
+         * Address in the runtime mapping to jump to after the
+         * MMU/MPU is enabled
+         */
+        ldr   lr, =secondary_switched
+        b    enable_mm
+
 secondary_switched:
         /*
          * Non-boot CPUs need to move on to the proper pagetables, which were
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index a19b7c873d..c9e83bbe2d 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -211,9 +211,11 @@ virtphys_clash:
 ENDPROC(prepare_early_mappings)
 
 /*
- * Turn on the Data Cache and the MMU. The function will return on the 1:1
- * mapping. In other word, the caller is responsible to switch to the runtime
- * mapping.
+ * Turn on the Data Cache and the MMU. The function will return
+ * to the virtual address provided in LR (e.g. the runtime mapping).
+ *
+ * Inputs:
+ * lr(x30): Virtual address to return to
  *
  * Clobbers x0 - x3
  */
@@ -238,6 +240,31 @@ ENTRY(enable_mm)
         dsb   sy                     /* Flush PTE writes and finish reads */
         msr   SCTLR_EL2, x0          /* now paging is enabled */
         isb                          /* Now, flush the icache */
+
+        /*
+         * The MMU is turned on and we are in the 1:1 mapping. Switch
+         * to the runtime mapping.
+         */
+        ldr   x0, =1f
+        br    x0
+1:
+        /*
+         * The 1:1 map may clash with other parts of the Xen virtual memory
+         * layout. As it is not used anymore, remove it completely to
+         * avoid having to worry about replacing existing mapping
+         * afterwards.
+         *
+         * On return this will jump to the virtual address requested by
+         * the caller
+         */
+        b     remove_identity_mapping
+
+        /*
+         * This point might never be reached, as the "ret" in
+         * remove_identity_mapping will already return to the address held
+         * in LR. But keeping a ret here is safer in case the "ret" in
+         * remove_identity_mapping is removed in the future.
+         */
         ret
 ENDPROC(enable_mm)
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (13 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm() Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 21:30   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings Penny Zheng
                   ` (27 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng

From: Wei Chen <wei.chen@arm.com>

To make the code readable and maintainable, we move MMU-specific
memory management code from mm.c to mm_mmu.c and MMU-specific
definitions from mm.h to mm_mmu.h.
Later we will add mm_mpu.h and extend mm_mpu.c for MPU-specific
memory management code.
This will avoid lots of #ifdefs in the memory management code and
header files.
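
For context, the MPU-side counterpart header is expected to expose the
same neutral entry point. A hypothetical mm_mpu.h along the lines below
is assumed here purely for illustration; the actual header and the name
of the MPU setup function are introduced later in the series and may
differ:

/* Hypothetical sketch, not part of this patch. */
#ifndef __ARCH_ARM_MM_MPU__
#define __ARCH_ARM_MM_MPU__

/* Boot-time MPU region setup, mirroring setup_pagetables() on MMU. */
extern void setup_protection_regions(unsigned long boot_phys_offset);
#define setup_mm_mappings(boot_phys_offset) \
    setup_protection_regions(boot_phys_offset)

#endif /* __ARCH_ARM_MM_MPU__ */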

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
---
 xen/arch/arm/Makefile             |    5 +
 xen/arch/arm/include/asm/mm.h     |   19 +-
 xen/arch/arm/include/asm/mm_mmu.h |   35 +
 xen/arch/arm/mm.c                 | 1352 +---------------------------
 xen/arch/arm/mm_mmu.c             | 1376 +++++++++++++++++++++++++++++
 xen/arch/arm/mm_mpu.c             |   67 ++
 6 files changed, 1488 insertions(+), 1366 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
 create mode 100644 xen/arch/arm/mm_mmu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 4d076b278b..21188b207f 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -37,6 +37,11 @@ obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += mem_access.o
 obj-y += mm.o
+ifneq ($(CONFIG_HAS_MPU), y)
+obj-y += mm_mmu.o
+else
+obj-y += mm_mpu.o
+endif
 obj-y += monitor.o
 obj-y += p2m.o
 obj-y += percpu.o
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 68adcac9fa..1b9fdb6ff5 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -154,13 +154,6 @@ struct page_info
 #define _PGC_need_scrub   _PGC_allocated
 #define PGC_need_scrub    PGC_allocated
 
-extern mfn_t directmap_mfn_start, directmap_mfn_end;
-extern vaddr_t directmap_virt_end;
-#ifdef CONFIG_ARM_64
-extern vaddr_t directmap_virt_start;
-extern unsigned long directmap_base_pdx;
-#endif
-
 #ifdef CONFIG_ARM_32
 #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
 #define is_xen_heap_mfn(mfn) ({                                 \
@@ -192,8 +185,6 @@ extern unsigned long total_pages;
 
 #define PDX_GROUP_SHIFT SECOND_SHIFT
 
-/* Boot-time pagetable setup */
-extern void setup_pagetables(unsigned long boot_phys_offset);
 /* Map FDT in boot pagetable */
 extern void *early_fdt_map(paddr_t fdt_paddr);
 /* Remove early mappings */
@@ -203,12 +194,6 @@ extern void remove_early_mappings(void);
 extern int init_secondary_pagetables(int cpu);
 /* Switch secondary CPUS to its own pagetables and finalise MMU setup */
 extern void mmu_init_secondary_cpu(void);
-/*
- * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
- * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
- * For Arm64, map the region in the directmap area.
- */
-extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long nr_mfns);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
@@ -256,6 +241,10 @@ static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
 #define vmap_to_mfn(va)     maddr_to_mfn(virt_to_maddr((vaddr_t)va))
 #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
+#ifndef CONFIG_HAS_MPU
+#include <asm/mm_mmu.h>
+#endif
+
 /* Page-align address and convert to frame number format */
 #define paddr_to_pfn_aligned(paddr)    paddr_to_pfn(PAGE_ALIGN(paddr))
 
diff --git a/xen/arch/arm/include/asm/mm_mmu.h b/xen/arch/arm/include/asm/mm_mmu.h
new file mode 100644
index 0000000000..a5e63d8af8
--- /dev/null
+++ b/xen/arch/arm/include/asm/mm_mmu.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARCH_ARM_MM_MMU__
+#define __ARCH_ARM_MM_MMU__
+
+extern mfn_t directmap_mfn_start, directmap_mfn_end;
+extern vaddr_t directmap_virt_end;
+#ifdef CONFIG_ARM_64
+extern vaddr_t directmap_virt_start;
+extern unsigned long directmap_base_pdx;
+#endif
+
+/* Boot-time pagetable setup */
+extern void setup_pagetables(unsigned long boot_phys_offset);
+#define setup_mm_mappings(boot_phys_offset) setup_pagetables(boot_phys_offset)
+
+/* Non-boot CPUs use this to find the correct pagetables. */
+extern uint64_t init_ttbr;
+/*
+ * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
+ * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
+ * For Arm64, map the region in the directmap area.
+ */
+extern void setup_directmap_mappings(unsigned long base_mfn,
+                                     unsigned long nr_mfns);
+
+#endif /* __ARCH_ARM_MM_MMU__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 8f15814c5e..e1ce2a62dc 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -2,371 +2,24 @@
 /*
  * xen/arch/arm/mm.c
  *
- * MMU code for an ARMv7-A with virt extensions.
+ * Memory management common code for MMU and MPU systems.
  *
  * Tim Deegan <tim@xen.org>
  * Copyright (c) 2011 Citrix Systems.
  */
 
 #include <xen/domain_page.h>
-#include <xen/errno.h>
 #include <xen/grant_table.h>
-#include <xen/guest_access.h>
-#include <xen/init.h>
-#include <xen/libfdt/libfdt.h>
-#include <xen/mm.h>
-#include <xen/pfn.h>
-#include <xen/pmap.h>
 #include <xen/sched.h>
-#include <xen/sizes.h>
 #include <xen/types.h>
-#include <xen/vmap.h>
 
 #include <xsm/xsm.h>
 
-#include <asm/fixmap.h>
-#include <asm/setup.h>
-
-#include <public/memory.h>
-
-/* Override macros from asm/page.h to make them work with mfn_t */
-#undef virt_to_mfn
-#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
-#undef mfn_to_virt
-#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
-
-#ifdef NDEBUG
-static inline void
-__attribute__ ((__format__ (__printf__, 1, 2)))
-mm_printk(const char *fmt, ...) {}
-#else
-#define mm_printk(fmt, args...)             \
-    do                                      \
-    {                                       \
-        dprintk(XENLOG_ERR, fmt, ## args);  \
-        WARN();                             \
-    } while (0)
-#endif
-
-/* Static start-of-day pagetables that we use before the allocators
- * are up. These are used by all CPUs during bringup before switching
- * to the CPUs own pagetables.
- *
- * These pagetables have a very simple structure. They include:
- *  - 2MB worth of 4K mappings of xen at XEN_VIRT_START, boot_first and
- *    boot_second are used to populate the tables down to boot_third
- *    which contains the actual mapping.
- *  - a 1:1 mapping of xen at its current physical address. This uses a
- *    section mapping at whichever of boot_{pgtable,first,second}
- *    covers that physical address.
- *
- * For the boot CPU these mappings point to the address where Xen was
- * loaded by the bootloader. For secondary CPUs they point to the
- * relocated copy of Xen for the benefit of secondary CPUs.
- *
- * In addition to the above for the boot CPU the device-tree is
- * initially mapped in the boot misc slot. This mapping is not present
- * for secondary CPUs.
- *
- * Finally, if EARLY_PRINTK is enabled then xen_fixmap will be mapped
- * by the CPU once it has moved off the 1:1 mapping.
- */
-DEFINE_BOOT_PAGE_TABLE(boot_pgtable);
-#ifdef CONFIG_ARM_64
-DEFINE_BOOT_PAGE_TABLE(boot_first);
-DEFINE_BOOT_PAGE_TABLE(boot_first_id);
-#endif
-DEFINE_BOOT_PAGE_TABLE(boot_second_id);
-DEFINE_BOOT_PAGE_TABLE(boot_third_id);
-DEFINE_BOOT_PAGE_TABLE(boot_second);
-DEFINE_BOOT_PAGE_TABLE(boot_third);
-
-/* Main runtime page tables */
-
-/*
- * For arm32 xen_pgtable are per-PCPU and are allocated before
- * bringing up each CPU. For arm64 xen_pgtable is common to all PCPUs.
- *
- * xen_second, xen_fixmap and xen_xenmap are always shared between all
- * PCPUs.
- */
-
-#ifdef CONFIG_ARM_64
-#define HYP_PT_ROOT_LEVEL 0
-static DEFINE_PAGE_TABLE(xen_pgtable);
-static DEFINE_PAGE_TABLE(xen_first);
-#define THIS_CPU_PGTABLE xen_pgtable
-#else
-#define HYP_PT_ROOT_LEVEL 1
-/* Per-CPU pagetable pages */
-/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
-DEFINE_PER_CPU(lpae_t *, xen_pgtable);
-#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
-/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
-static DEFINE_PAGE_TABLE(cpu0_pgtable);
-#endif
-
-/* Common pagetable leaves */
-/* Second level page table used to cover Xen virtual address space */
-static DEFINE_PAGE_TABLE(xen_second);
-/* Third level page table used for fixmap */
-DEFINE_BOOT_PAGE_TABLE(xen_fixmap);
-/*
- * Third level page table used to map Xen itself with the XN bit set
- * as appropriate.
- */
-static DEFINE_PAGE_TABLE(xen_xenmap);
-
-/* Non-boot CPUs use this to find the correct pagetables. */
-uint64_t init_ttbr;
-
-static paddr_t phys_offset;
-
-/* Limits of the Xen heap */
-mfn_t directmap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;
-mfn_t directmap_mfn_end __read_mostly;
-vaddr_t directmap_virt_end __read_mostly;
-#ifdef CONFIG_ARM_64
-vaddr_t directmap_virt_start __read_mostly;
-unsigned long directmap_base_pdx __read_mostly;
-#endif
-
 unsigned long frametable_base_pdx __read_mostly;
-unsigned long frametable_virt_end __read_mostly;
 
 unsigned long max_page;
 unsigned long total_pages;
 
-extern char __init_begin[], __init_end[];
-
-/* Checking VA memory layout alignment. */
-static void __init __maybe_unused build_assertions(void)
-{
-    /* 2MB aligned regions */
-    BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
-    BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
-    /* 1GB aligned regions */
-#ifdef CONFIG_ARM_32
-    BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
-#else
-    BUILD_BUG_ON(DIRECTMAP_VIRT_START & ~FIRST_MASK);
-#endif
-    /* Page table structure constraints */
-#ifdef CONFIG_ARM_64
-    BUILD_BUG_ON(zeroeth_table_offset(XEN_VIRT_START));
-#endif
-    BUILD_BUG_ON(first_table_offset(XEN_VIRT_START));
-#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
-    BUILD_BUG_ON(DOMHEAP_VIRT_START & ~FIRST_MASK);
-#endif
-    /*
-     * The boot code expects the regions XEN_VIRT_START, FIXMAP_ADDR(0),
-     * BOOT_FDT_VIRT_START to use the same 0th (arm64 only) and 1st
-     * slot in the page tables.
-     */
-#define CHECK_SAME_SLOT(level, virt1, virt2) \
-    BUILD_BUG_ON(level##_table_offset(virt1) != level##_table_offset(virt2))
-
-#ifdef CONFIG_ARM_64
-    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, FIXMAP_ADDR(0));
-    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, BOOT_FDT_VIRT_START);
-#endif
-    CHECK_SAME_SLOT(first, XEN_VIRT_START, FIXMAP_ADDR(0));
-    CHECK_SAME_SLOT(first, XEN_VIRT_START, BOOT_FDT_VIRT_START);
-
-#undef CHECK_SAME_SLOT
-}
-
-static lpae_t *xen_map_table(mfn_t mfn)
-{
-    /*
-     * During early boot, map_domain_page() may be unusable. Use the
-     * PMAP to map temporarily a page-table.
-     */
-    if ( system_state == SYS_STATE_early_boot )
-        return pmap_map(mfn);
-
-    return map_domain_page(mfn);
-}
-
-static void xen_unmap_table(const lpae_t *table)
-{
-    /*
-     * During early boot, xen_map_table() will not use map_domain_page()
-     * but the PMAP.
-     */
-    if ( system_state == SYS_STATE_early_boot )
-        pmap_unmap(table);
-    else
-        unmap_domain_page(table);
-}
-
-void dump_pt_walk(paddr_t ttbr, paddr_t addr,
-                  unsigned int root_level,
-                  unsigned int nr_root_tables)
-{
-    static const char *level_strs[4] = { "0TH", "1ST", "2ND", "3RD" };
-    const mfn_t root_mfn = maddr_to_mfn(ttbr);
-    const unsigned int offsets[4] = {
-        zeroeth_table_offset(addr),
-        first_table_offset(addr),
-        second_table_offset(addr),
-        third_table_offset(addr)
-    };
-    lpae_t pte, *mapping;
-    unsigned int level, root_table;
-
-#ifdef CONFIG_ARM_32
-    BUG_ON(root_level < 1);
-#endif
-    BUG_ON(root_level > 3);
-
-    if ( nr_root_tables > 1 )
-    {
-        /*
-         * Concatenated root-level tables. The table number will be
-         * the offset at the previous level. It is not possible to
-         * concatenate a level-0 root.
-         */
-        BUG_ON(root_level == 0);
-        root_table = offsets[root_level - 1];
-        printk("Using concatenated root table %u\n", root_table);
-        if ( root_table >= nr_root_tables )
-        {
-            printk("Invalid root table offset\n");
-            return;
-        }
-    }
-    else
-        root_table = 0;
-
-    mapping = xen_map_table(mfn_add(root_mfn, root_table));
-
-    for ( level = root_level; ; level++ )
-    {
-        if ( offsets[level] > XEN_PT_LPAE_ENTRIES )
-            break;
-
-        pte = mapping[offsets[level]];
-
-        printk("%s[0x%03x] = 0x%"PRIpaddr"\n",
-               level_strs[level], offsets[level], pte.bits);
-
-        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
-            break;
-
-        /* For next iteration */
-        xen_unmap_table(mapping);
-        mapping = xen_map_table(lpae_get_mfn(pte));
-    }
-
-    xen_unmap_table(mapping);
-}
-
-void dump_hyp_walk(vaddr_t addr)
-{
-    uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
-
-    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
-           "on CPU%d via TTBR 0x%016"PRIx64"\n",
-           addr, smp_processor_id(), ttbr);
-
-    dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
-}
-
-lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr)
-{
-    lpae_t e = (lpae_t) {
-        .pt = {
-            .valid = 1,           /* Mappings are present */
-            .table = 0,           /* Set to 1 for links and 4k maps */
-            .ai = attr,
-            .ns = 1,              /* Hyp mode is in the non-secure world */
-            .up = 1,              /* See below */
-            .ro = 0,              /* Assume read-write */
-            .af = 1,              /* No need for access tracking */
-            .ng = 1,              /* Makes TLB flushes easier */
-            .contig = 0,          /* Assume non-contiguous */
-            .xn = 1,              /* No need to execute outside .text */
-            .avail = 0,           /* Reference count for domheap mapping */
-        }};
-    /*
-     * For EL2 stage-1 page table, up (aka AP[1]) is RES1 as the translation
-     * regime applies to only one exception level (see D4.4.4 and G4.6.1
-     * in ARM DDI 0487B.a). If this changes, remember to update the
-     * hard-coded values in head.S too.
-     */
-
-    switch ( attr )
-    {
-    case MT_NORMAL_NC:
-        /*
-         * ARM ARM: Overlaying the shareability attribute (DDI
-         * 0406C.b B3-1376 to 1377)
-         *
-         * A memory region with a resultant memory type attribute of Normal,
-         * and a resultant cacheability attribute of Inner Non-cacheable,
-         * Outer Non-cacheable, must have a resultant shareability attribute
-         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
-         *
-         * On ARMv8 sharability is ignored and explicitly treated as Outer
-         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
-         */
-        e.pt.sh = LPAE_SH_OUTER;
-        break;
-    case MT_DEVICE_nGnRnE:
-    case MT_DEVICE_nGnRE:
-        /*
-         * Shareability is ignored for non-Normal memory, Outer is as
-         * good as anything.
-         *
-         * On ARMv8 sharability is ignored and explicitly treated as Outer
-         * Shareable for any device memory type.
-         */
-        e.pt.sh = LPAE_SH_OUTER;
-        break;
-    default:
-        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
-        break;
-    }
-
-    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
-
-    lpae_set_mfn(e, mfn);
-
-    return e;
-}
-
-/* Map a 4k page in a fixmap entry */
-void set_fixmap(unsigned int map, mfn_t mfn, unsigned int flags)
-{
-    int res;
-
-    res = map_pages_to_xen(FIXMAP_ADDR(map), mfn, 1, flags);
-    BUG_ON(res != 0);
-}
-
-/* Remove a mapping from a fixmap entry */
-void clear_fixmap(unsigned int map)
-{
-    int res;
-
-    res = destroy_xen_mappings(FIXMAP_ADDR(map), FIXMAP_ADDR(map) + PAGE_SIZE);
-    BUG_ON(res != 0);
-}
-
-void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
-{
-    set_fixmap(FIXMAP_MISC, mfn, attributes);
-
-    return fix_to_virt(FIXMAP_MISC);
-}
-
-void unmap_page_from_xen_misc(void)
-{
-    clear_fixmap(FIXMAP_MISC);
-}
-
 void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
     void *v = map_domain_page(_mfn(mfn));
@@ -386,878 +39,6 @@ void flush_page_to_ram(unsigned long mfn, bool sync_icache)
         invalidate_icache();
 }
 
-static inline lpae_t pte_of_xenaddr(vaddr_t va)
-{
-    paddr_t ma = va + phys_offset;
-
-    return mfn_to_xen_entry(maddr_to_mfn(ma), MT_NORMAL);
-}
-
-void * __init early_fdt_map(paddr_t fdt_paddr)
-{
-    /* We are using 2MB superpage for mapping the FDT */
-    paddr_t base_paddr = fdt_paddr & SECOND_MASK;
-    paddr_t offset;
-    void *fdt_virt;
-    uint32_t size;
-    int rc;
-
-    /*
-     * Check whether the physical FDT address is set and meets the minimum
-     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
-     * least 8 bytes so that we always access the magic and size fields
-     * of the FDT header after mapping the first chunk, double check if
-     * that is indeed the case.
-     */
-    BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
-    if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
-        return NULL;
-
-    /* The FDT is mapped using 2MB superpage */
-    BUILD_BUG_ON(BOOT_FDT_VIRT_START % SZ_2M);
-
-    rc = map_pages_to_xen(BOOT_FDT_VIRT_START, maddr_to_mfn(base_paddr),
-                          SZ_2M >> PAGE_SHIFT,
-                          PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to map the device-tree.\n");
-
-
-    offset = fdt_paddr % SECOND_SIZE;
-    fdt_virt = (void *)BOOT_FDT_VIRT_START + offset;
-
-    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
-        return NULL;
-
-    size = fdt_totalsize(fdt_virt);
-    if ( size > MAX_FDT_SIZE )
-        return NULL;
-
-    if ( (offset + size) > SZ_2M )
-    {
-        rc = map_pages_to_xen(BOOT_FDT_VIRT_START + SZ_2M,
-                              maddr_to_mfn(base_paddr + SZ_2M),
-                              SZ_2M >> PAGE_SHIFT,
-                              PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
-        if ( rc )
-            panic("Unable to map the device-tree\n");
-    }
-
-    return fdt_virt;
-}
-
-void __init remove_early_mappings(void)
-{
-    int rc;
-
-    /* destroy the _PAGE_BLOCK mapping */
-    rc = modify_xen_mappings(BOOT_FDT_VIRT_START,
-                             BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE,
-                             _PAGE_BLOCK);
-    BUG_ON(rc);
-}
-
-/*
- * After boot, Xen page-tables should not contain mapping that are both
- * Writable and eXecutables.
- *
- * This should be called on each CPU to enforce the policy.
- */
-static void xen_pt_enforce_wnx(void)
-{
-    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
-    /*
-     * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
-     * before flushing the TLBs.
-     */
-    isb();
-    flush_xen_tlb_local();
-}
-
-extern void switch_ttbr(uint64_t ttbr);
-
-/* Clear a translation table and clean & invalidate the cache */
-static void clear_table(void *table)
-{
-    clear_page(table);
-    clean_and_invalidate_dcache_va_range(table, PAGE_SIZE);
-}
-
-/* Boot-time pagetable setup.
- * Changes here may need matching changes in head.S */
-void __init setup_pagetables(unsigned long boot_phys_offset)
-{
-    uint64_t ttbr;
-    lpae_t pte, *p;
-    int i;
-
-    phys_offset = boot_phys_offset;
-
-#ifdef CONFIG_ARM_64
-    p = (void *) xen_pgtable;
-    p[0] = pte_of_xenaddr((uintptr_t)xen_first);
-    p[0].pt.table = 1;
-    p[0].pt.xn = 0;
-    p = (void *) xen_first;
-#else
-    p = (void *) cpu0_pgtable;
-#endif
-
-    /* Map xen second level page-table */
-    p[0] = pte_of_xenaddr((uintptr_t)(xen_second));
-    p[0].pt.table = 1;
-    p[0].pt.xn = 0;
-
-    /* Break up the Xen mapping into 4k pages and protect them separately. */
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-    {
-        vaddr_t va = XEN_VIRT_START + (i << PAGE_SHIFT);
-
-        if ( !is_kernel(va) )
-            break;
-        pte = pte_of_xenaddr(va);
-        pte.pt.table = 1; /* 4k mappings always have this bit set */
-        if ( is_kernel_text(va) || is_kernel_inittext(va) )
-        {
-            pte.pt.xn = 0;
-            pte.pt.ro = 1;
-        }
-        if ( is_kernel_rodata(va) )
-            pte.pt.ro = 1;
-        xen_xenmap[i] = pte;
-    }
-
-    /* Initialise xen second level entries ... */
-    /* ... Xen's text etc */
-
-    pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
-    pte.pt.table = 1;
-    xen_second[second_table_offset(XEN_VIRT_START)] = pte;
-
-    /* ... Fixmap */
-    pte = pte_of_xenaddr((vaddr_t)xen_fixmap);
-    pte.pt.table = 1;
-    xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
-
-#ifdef CONFIG_ARM_64
-    ttbr = (uintptr_t) xen_pgtable + phys_offset;
-#else
-    ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
-#endif
-
-    switch_ttbr(ttbr);
-
-    xen_pt_enforce_wnx();
-
-#ifdef CONFIG_ARM_32
-    per_cpu(xen_pgtable, 0) = cpu0_pgtable;
-#endif
-}
-
-static void clear_boot_pagetables(void)
-{
-    /*
-     * Clear the copy of the boot pagetables. Each secondary CPU
-     * rebuilds these itself (see head.S).
-     */
-    clear_table(boot_pgtable);
-#ifdef CONFIG_ARM_64
-    clear_table(boot_first);
-    clear_table(boot_first_id);
-#endif
-    clear_table(boot_second);
-    clear_table(boot_third);
-}
-
-#ifdef CONFIG_ARM_64
-int init_secondary_pagetables(int cpu)
-{
-    clear_boot_pagetables();
-
-    /* Set init_ttbr for this CPU coming up. All CPus share a single setof
-     * pagetables, but rewrite it each time for consistency with 32 bit. */
-    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
-    clean_dcache(init_ttbr);
-    return 0;
-}
-#else
-int init_secondary_pagetables(int cpu)
-{
-    lpae_t *first;
-
-    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
-
-    if ( !first )
-    {
-        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
-        return -ENOMEM;
-    }
-
-    /* Initialise root pagetable from root of boot tables */
-    memcpy(first, cpu0_pgtable, PAGE_SIZE);
-    per_cpu(xen_pgtable, cpu) = first;
-
-    if ( !init_domheap_mappings(cpu) )
-    {
-        printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
-        per_cpu(xen_pgtable, cpu) = NULL;
-        free_xenheap_page(first);
-        return -ENOMEM;
-    }
-
-    clear_boot_pagetables();
-
-    /* Set init_ttbr for this CPU coming up */
-    init_ttbr = __pa(first);
-    clean_dcache(init_ttbr);
-
-    return 0;
-}
-#endif
-
-/* MMU setup for secondary CPUS (which already have paging enabled) */
-void mmu_init_secondary_cpu(void)
-{
-    xen_pt_enforce_wnx();
-}
-
-#ifdef CONFIG_ARM_32
-/*
- * Set up the direct-mapped xenheap:
- * up to 1GB of contiguous, always-mapped memory.
- */
-void __init setup_directmap_mappings(unsigned long base_mfn,
-                                     unsigned long nr_mfns)
-{
-    int rc;
-
-    rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the directmap mappings.\n");
-
-    /* Record where the directmap is, for translation routines. */
-    directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
-}
-#else /* CONFIG_ARM_64 */
-/* Map the region in the directmap area. */
-void __init setup_directmap_mappings(unsigned long base_mfn,
-                                     unsigned long nr_mfns)
-{
-    int rc;
-
-    /* First call sets the directmap physical and virtual offset. */
-    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
-    {
-        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
-
-        directmap_mfn_start = _mfn(base_mfn);
-        directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
-        /*
-         * The base address may not be aligned to the first level
-         * size (e.g. 1GB when using 4KB pages). This would prevent
-         * superpage mappings for all the regions because the virtual
-         * address and machine address should both be suitably aligned.
-         *
-         * Prevent that by offsetting the start of the directmap virtual
-         * address.
-         */
-        directmap_virt_start = DIRECTMAP_VIRT_START +
-            (base_mfn - mfn_gb) * PAGE_SIZE;
-    }
-
-    if ( base_mfn < mfn_x(directmap_mfn_start) )
-        panic("cannot add directmap mapping at %lx below heap start %lx\n",
-              base_mfn, mfn_x(directmap_mfn_start));
-
-    rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
-                          _mfn(base_mfn), nr_mfns,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the directmap mappings.\n");
-}
-#endif
-
-/* Map a frame table to cover physical addresses ps through pe */
-void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
-{
-    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
-                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
-    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
-    mfn_t base_mfn;
-    const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32);
-    int rc;
-
-    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
-    /* Round up to 2M or 32M boundary, as appropriate. */
-    frametable_size = ROUNDUP(frametable_size, mapping_size);
-    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
-
-    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
-                          frametable_size >> PAGE_SHIFT,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the frametable mappings.\n");
-
-    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
-    memset(&frame_table[nr_pdxs], -1,
-           frametable_size - (nr_pdxs * sizeof(struct page_info)));
-
-    frametable_virt_end = FRAMETABLE_VIRT_START + (nr_pdxs * sizeof(struct page_info));
-}
-
-void *__init arch_vmap_virt_end(void)
-{
-    return (void *)(VMAP_VIRT_START + VMAP_VIRT_SIZE);
-}
-
-/*
- * This function should only be used to remap device address ranges
- * TODO: add a check to verify this assumption
- */
-void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
-{
-    mfn_t mfn = _mfn(PFN_DOWN(pa));
-    unsigned int offs = pa & (PAGE_SIZE - 1);
-    unsigned int nr = PFN_UP(offs + len);
-    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
-
-    if ( ptr == NULL )
-        return NULL;
-
-    return ptr + offs;
-}
-
-void *ioremap(paddr_t pa, size_t len)
-{
-    return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
-}
-
-static int create_xen_table(lpae_t *entry)
-{
-    mfn_t mfn;
-    void *p;
-    lpae_t pte;
-
-    if ( system_state != SYS_STATE_early_boot )
-    {
-        struct page_info *pg = alloc_domheap_page(NULL, 0);
-
-        if ( pg == NULL )
-            return -ENOMEM;
-
-        mfn = page_to_mfn(pg);
-    }
-    else
-        mfn = alloc_boot_pages(1, 1);
-
-    p = xen_map_table(mfn);
-    clear_page(p);
-    xen_unmap_table(p);
-
-    pte = mfn_to_xen_entry(mfn, MT_NORMAL);
-    pte.pt.table = 1;
-    write_pte(entry, pte);
-
-    return 0;
-}
-
-#define XEN_TABLE_MAP_FAILED 0
-#define XEN_TABLE_SUPER_PAGE 1
-#define XEN_TABLE_NORMAL_PAGE 2
-
-/*
- * Take the currently mapped table, find the corresponding entry,
- * and map the next table, if available.
- *
- * The read_only parameters indicates whether intermediate tables should
- * be allocated when not present.
- *
- * Return values:
- *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
- *  was empty, or allocating a new page failed.
- *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
- *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
- */
-static int xen_pt_next_level(bool read_only, unsigned int level,
-                             lpae_t **table, unsigned int offset)
-{
-    lpae_t *entry;
-    int ret;
-    mfn_t mfn;
-
-    entry = *table + offset;
-
-    if ( !lpae_is_valid(*entry) )
-    {
-        if ( read_only )
-            return XEN_TABLE_MAP_FAILED;
-
-        ret = create_xen_table(entry);
-        if ( ret )
-            return XEN_TABLE_MAP_FAILED;
-    }
-
-    /* The function xen_pt_next_level is never called at the 3rd level */
-    if ( lpae_is_mapping(*entry, level) )
-        return XEN_TABLE_SUPER_PAGE;
-
-    mfn = lpae_get_mfn(*entry);
-
-    xen_unmap_table(*table);
-    *table = xen_map_table(mfn);
-
-    return XEN_TABLE_NORMAL_PAGE;
-}
-
-/* Sanity check of the entry */
-static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
-                               unsigned int flags)
-{
-    /* Sanity check when modifying an entry. */
-    if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
-    {
-        /* We don't allow modifying an invalid entry. */
-        if ( !lpae_is_valid(entry) )
-        {
-            mm_printk("Modifying invalid entry is not allowed.\n");
-            return false;
-        }
-
-        /* We don't allow modifying a table entry */
-        if ( !lpae_is_mapping(entry, level) )
-        {
-            mm_printk("Modifying a table entry is not allowed.\n");
-            return false;
-        }
-
-        /* We don't allow changing memory attributes. */
-        if ( entry.pt.ai != PAGE_AI_MASK(flags) )
-        {
-            mm_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
-                      entry.pt.ai, PAGE_AI_MASK(flags));
-            return false;
-        }
-
-        /* We don't allow modifying entry with contiguous bit set. */
-        if ( entry.pt.contig )
-        {
-            mm_printk("Modifying entry with contiguous bit set is not allowed.\n");
-            return false;
-        }
-    }
-    /* Sanity check when inserting a mapping */
-    else if ( flags & _PAGE_PRESENT )
-    {
-        /* We should be here with a valid MFN. */
-        ASSERT(!mfn_eq(mfn, INVALID_MFN));
-
-        /*
-         * We don't allow replacing any valid entry.
-         *
-         * Note that the function xen_pt_update() relies on this
-         * assumption and will skip the TLB flush. The function will need
-         * to be updated if the check is relaxed.
-         */
-        if ( lpae_is_valid(entry) )
-        {
-            if ( lpae_is_mapping(entry, level) )
-                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
-                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
-            else
-                mm_printk("Trying to replace a table with a mapping.\n");
-            return false;
-        }
-    }
-    /* Sanity check when removing a mapping. */
-    else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
-    {
-        /* We should be here with an invalid MFN. */
-        ASSERT(mfn_eq(mfn, INVALID_MFN));
-
-        /* We don't allow removing a table */
-        if ( lpae_is_table(entry, level) )
-        {
-            mm_printk("Removing a table is not allowed.\n");
-            return false;
-        }
-
-        /* We don't allow removing a mapping with contiguous bit set. */
-        if ( entry.pt.contig )
-        {
-            mm_printk("Removing entry with contiguous bit set is not allowed.\n");
-            return false;
-        }
-    }
-    /* Sanity check when populating the page-table. No check so far. */
-    else
-    {
-        ASSERT(flags & _PAGE_POPULATE);
-        /* We should be here with an invalid MFN */
-        ASSERT(mfn_eq(mfn, INVALID_MFN));
-    }
-
-    return true;
-}
-
-/* Update an entry at the level @target. */
-static int xen_pt_update_entry(mfn_t root, unsigned long virt,
-                               mfn_t mfn, unsigned int target,
-                               unsigned int flags)
-{
-    int rc;
-    unsigned int level;
-    lpae_t *table;
-    /*
-     * The intermediate page tables are read-only when the MFN is not valid
-     * and we are not populating page table.
-     * This means we either modify permissions or remove an entry.
-     */
-    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
-    lpae_t pte, *entry;
-
-    /* convenience aliases */
-    DECLARE_OFFSETS(offsets, (paddr_t)virt);
-
-    /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
-    ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
-
-    table = xen_map_table(root);
-    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
-    {
-        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
-        if ( rc == XEN_TABLE_MAP_FAILED )
-        {
-            /*
-             * We are here because xen_pt_next_level has failed to map
-             * the intermediate page table (e.g the table does not exist
-             * and the pt is read-only). It is a valid case when
-             * removing a mapping as it may not exist in the page table.
-             * In this case, just ignore it.
-             */
-            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
-            {
-                mm_printk("%s: Unable to map level %u\n", __func__, level);
-                rc = -ENOENT;
-                goto out;
-            }
-            else
-            {
-                rc = 0;
-                goto out;
-            }
-        }
-        else if ( rc != XEN_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    if ( level != target )
-    {
-        mm_printk("%s: Shattering superpage is not supported\n", __func__);
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
-
-    entry = table + offsets[level];
-
-    rc = -EINVAL;
-    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
-        goto out;
-
-    /* If we are only populating page-table, then we are done. */
-    rc = 0;
-    if ( flags & _PAGE_POPULATE )
-        goto out;
-
-    /* We are removing the page */
-    if ( !(flags & _PAGE_PRESENT) )
-        memset(&pte, 0x00, sizeof(pte));
-    else
-    {
-        /* We are inserting a mapping => Create new pte. */
-        if ( !mfn_eq(mfn, INVALID_MFN) )
-        {
-            pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
-
-            /*
-             * First and second level pages set pte.pt.table = 0, but
-             * third level entries set pte.pt.table = 1.
-             */
-            pte.pt.table = (level == 3);
-        }
-        else /* We are updating the permission => Copy the current pte. */
-            pte = *entry;
-
-        /* Set permission */
-        pte.pt.ro = PAGE_RO_MASK(flags);
-        pte.pt.xn = PAGE_XN_MASK(flags);
-        /* Set contiguous bit */
-        pte.pt.contig = !!(flags & _PAGE_CONTIG);
-    }
-
-    write_pte(entry, pte);
-
-    rc = 0;
-
-out:
-    xen_unmap_table(table);
-
-    return rc;
-}
-
-/* Return the level where mapping should be done */
-static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
-                                unsigned int flags)
-{
-    unsigned int level;
-    unsigned long mask;
-
-    /*
-      * Don't take into account the MFN when removing mapping (i.e
-      * MFN_INVALID) to calculate the correct target order.
-      *
-      * Per the Arm Arm, `vfn` and `mfn` must be both superpage aligned.
-      * They are or-ed together and then checked against the size of
-      * each level.
-      *
-      * `left` is not included and checked separately to allow
-      * superpage mapping even if it is not properly aligned (the
-      * user may have asked to map 2MB + 4k).
-      */
-     mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
-     mask |= vfn;
-
-     /*
-      * Always use level 3 mapping unless the caller request block
-      * mapping.
-      */
-     if ( likely(!(flags & _PAGE_BLOCK)) )
-         level = 3;
-     else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) &&
-               (nr >= BIT(FIRST_ORDER, UL)) )
-         level = 1;
-     else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) &&
-               (nr >= BIT(SECOND_ORDER, UL)) )
-         level = 2;
-     else
-         level = 3;
-
-     return level;
-}
-
-#define XEN_PT_4K_NR_CONTIG 16
-
-/*
- * Check whether the contiguous bit can be set. Return the number of
- * contiguous entry allowed. If not allowed, return 1.
- */
-static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn,
-                                        unsigned int level, unsigned long left,
-                                        unsigned int flags)
-{
-    unsigned long nr_contig;
-
-    /*
-     * Allow the contiguous bit to set when the caller requests block
-     * mapping.
-     */
-    if ( !(flags & _PAGE_BLOCK) )
-        return 1;
-
-    /*
-     * We don't allow to remove mapping with the contiguous bit set.
-     * So shortcut the logic and directly return 1.
-     */
-    if ( mfn_eq(mfn, INVALID_MFN) )
-        return 1;
-
-    /*
-     * The number of contiguous entries varies depending on the page
-     * granularity used. The logic below assumes 4KB.
-     */
-    BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
-
-    /*
-     * In order to enable the contiguous bit, we should have enough entries
-     * to map left and both the virtual and physical address should be
-     * aligned to the size of 16 translation tables entries.
-     */
-    nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG;
-
-    if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) )
-        return 1;
-
-    return XEN_PT_4K_NR_CONTIG;
-}
-
-static DEFINE_SPINLOCK(xen_pt_lock);
-
-static int xen_pt_update(unsigned long virt,
-                         mfn_t mfn,
-                         /* const on purpose as it is used for TLB flush */
-                         const unsigned long nr_mfns,
-                         unsigned int flags)
-{
-    int rc = 0;
-    unsigned long vfn = virt >> PAGE_SHIFT;
-    unsigned long left = nr_mfns;
-
-    /*
-     * For arm32, page-tables are different on each CPUs. Yet, they share
-     * some common mappings. It is assumed that only common mappings
-     * will be modified with this function.
-     *
-     * XXX: Add a check.
-     */
-    const mfn_t root = maddr_to_mfn(READ_SYSREG64(TTBR0_EL2));
-
-    /*
-     * The hardware was configured to forbid mapping both writeable and
-     * executable.
-     * When modifying/creating mapping (i.e _PAGE_PRESENT is set),
-     * prevent any update if this happen.
-     */
-    if ( (flags & _PAGE_PRESENT) && !PAGE_RO_MASK(flags) &&
-         !PAGE_XN_MASK(flags) )
-    {
-        mm_printk("Mappings should not be both Writeable and Executable.\n");
-        return -EINVAL;
-    }
-
-    if ( flags & _PAGE_CONTIG )
-    {
-        mm_printk("_PAGE_CONTIG is an internal only flag.\n");
-        return -EINVAL;
-    }
-
-    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
-    {
-        mm_printk("The virtual address is not aligned to the page-size.\n");
-        return -EINVAL;
-    }
-
-    spin_lock(&xen_pt_lock);
-
-    while ( left )
-    {
-        unsigned int order, level, nr_contig, new_flags;
-
-        level = xen_pt_mapping_level(vfn, mfn, left, flags);
-        order = XEN_PT_LEVEL_ORDER(level);
-
-        ASSERT(left >= BIT(order, UL));
-
-        /*
-         * Check if we can set the contiguous mapping and update the
-         * flags accordingly.
-         */
-        nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
-        new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
-
-        for ( ; nr_contig > 0; nr_contig-- )
-        {
-            rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level,
-                                     new_flags);
-            if ( rc )
-                break;
-
-            vfn += 1U << order;
-            if ( !mfn_eq(mfn, INVALID_MFN) )
-                mfn = mfn_add(mfn, 1U << order);
-
-            left -= (1U << order);
-        }
-
-        if ( rc )
-            break;
-    }
-
-    /*
-     * The TLBs flush can be safely skipped when a mapping is inserted
-     * as we don't allow mapping replacement (see xen_pt_check_entry()).
-     *
-     * For all the other cases, the TLBs will be flushed unconditionally
-     * even if the mapping has failed. This is because we may have
-     * partially modified the PT. This will prevent any unexpected
-     * behavior afterwards.
-     */
-    if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) )
-        flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
-
-    spin_unlock(&xen_pt_lock);
-
-    return rc;
-}
-
-int map_pages_to_xen(unsigned long virt,
-                     mfn_t mfn,
-                     unsigned long nr_mfns,
-                     unsigned int flags)
-{
-    return xen_pt_update(virt, mfn, nr_mfns, flags);
-}
-
-int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
-{
-    return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
-}
-
-int destroy_xen_mappings(unsigned long s, unsigned long e)
-{
-    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
-    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
-    ASSERT(s <= e);
-    return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, 0);
-}
-
-int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
-{
-    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
-    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
-    ASSERT(s <= e);
-    return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, flags);
-}
-
-/* Release all __init and __initdata ranges to be reused */
-void free_init_memory(void)
-{
-    paddr_t pa = virt_to_maddr(__init_begin);
-    unsigned long len = __init_end - __init_begin;
-    uint32_t insn;
-    unsigned int i, nr = len / sizeof(insn);
-    uint32_t *p;
-    int rc;
-
-    rc = modify_xen_mappings((unsigned long)__init_begin,
-                             (unsigned long)__init_end, PAGE_HYPERVISOR_RW);
-    if ( rc )
-        panic("Unable to map RW the init section (rc = %d)\n", rc);
-
-    /*
-     * From now on, init will not be used for execution anymore,
-     * so nuke the instruction cache to remove entries related to init.
-     */
-    invalidate_icache_local();
-
-#ifdef CONFIG_ARM_32
-    /* udf instruction i.e (see A8.8.247 in ARM DDI 0406C.c) */
-    insn = 0xe7f000f0;
-#else
-    insn = AARCH64_BREAK_FAULT;
-#endif
-    p = (uint32_t *)__init_begin;
-    for ( i = 0; i < nr; i++ )
-        *(p + i) = insn;
-
-    rc = destroy_xen_mappings((unsigned long)__init_begin,
-                              (unsigned long)__init_end);
-    if ( rc )
-        panic("Unable to remove the init section (rc = %d)\n", rc);
-
-    init_domheap_pages(pa, pa + len);
-    printk("Freed %ldkB init memory.\n", (long)(__init_end-__init_begin)>>10);
-}
-
 void arch_dump_shared_mem_info(void)
 {
 }
@@ -1319,137 +100,6 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
     spin_unlock(&d->page_alloc_lock);
 }
 
-int xenmem_add_to_physmap_one(
-    struct domain *d,
-    unsigned int space,
-    union add_to_physmap_extra extra,
-    unsigned long idx,
-    gfn_t gfn)
-{
-    mfn_t mfn = INVALID_MFN;
-    int rc;
-    p2m_type_t t;
-    struct page_info *page = NULL;
-
-    switch ( space )
-    {
-    case XENMAPSPACE_grant_table:
-        rc = gnttab_map_frame(d, idx, gfn, &mfn);
-        if ( rc )
-            return rc;
-
-        /* Need to take care of the reference obtained in gnttab_map_frame(). */
-        page = mfn_to_page(mfn);
-        t = p2m_ram_rw;
-
-        break;
-    case XENMAPSPACE_shared_info:
-        if ( idx != 0 )
-            return -EINVAL;
-
-        mfn = virt_to_mfn(d->shared_info);
-        t = p2m_ram_rw;
-
-        break;
-    case XENMAPSPACE_gmfn_foreign:
-    {
-        struct domain *od;
-        p2m_type_t p2mt;
-
-        od = get_pg_owner(extra.foreign_domid);
-        if ( od == NULL )
-            return -ESRCH;
-
-        if ( od == d )
-        {
-            put_pg_owner(od);
-            return -EINVAL;
-        }
-
-        rc = xsm_map_gmfn_foreign(XSM_TARGET, d, od);
-        if ( rc )
-        {
-            put_pg_owner(od);
-            return rc;
-        }
-
-        /* Take reference to the foreign domain page.
-         * Reference will be released in XENMEM_remove_from_physmap */
-        page = get_page_from_gfn(od, idx, &p2mt, P2M_ALLOC);
-        if ( !page )
-        {
-            put_pg_owner(od);
-            return -EINVAL;
-        }
-
-        if ( p2m_is_ram(p2mt) )
-            t = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : p2m_map_foreign_ro;
-        else
-        {
-            put_page(page);
-            put_pg_owner(od);
-            return -EINVAL;
-        }
-
-        mfn = page_to_mfn(page);
-
-        put_pg_owner(od);
-        break;
-    }
-    case XENMAPSPACE_dev_mmio:
-        rc = map_dev_mmio_page(d, gfn, _mfn(idx));
-        return rc;
-
-    default:
-        return -ENOSYS;
-    }
-
-    /*
-     * Map at new location. Here we need to map xenheap RAM page differently
-     * because we need to store the valid GFN and make sure that nothing was
-     * mapped before (the stored GFN is invalid). And these actions need to be
-     * performed with the P2M lock held. The guest_physmap_add_entry() is just
-     * a wrapper on top of p2m_set_entry().
-     */
-    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
-        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
-    else
-    {
-        struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-        p2m_write_lock(p2m);
-        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), INVALID_GFN) )
-        {
-            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
-            if ( !rc )
-                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
-        }
-        else
-            /*
-             * Mandate the caller to first unmap the page before mapping it
-             * again. This is to prevent Xen creating an unwanted hole in
-             * the P2M. For instance, this could happen if the firmware stole
-             * a RAM address for mapping the shared_info page into but forgot
-             * to unmap it afterwards.
-             */
-            rc = -EBUSY;
-        p2m_write_unlock(p2m);
-    }
-
-    /*
-     * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
-     * to drop the reference we took earlier. In all other cases we need to
-     * drop any reference we took earlier (perhaps indirectly).
-     */
-    if ( space == XENMAPSPACE_gmfn_foreign ? rc : page != NULL )
-    {
-        ASSERT(page != NULL);
-        put_page(page);
-    }
-
-    return rc;
-}
-
 long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     switch ( op )
diff --git a/xen/arch/arm/mm_mmu.c b/xen/arch/arm/mm_mmu.c
new file mode 100644
index 0000000000..72b4909766
--- /dev/null
+++ b/xen/arch/arm/mm_mmu.c
@@ -0,0 +1,1376 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * xen/arch/arm/mm_mmu.c
+ *
+ * MMU code for an ARMv7-A with virt extensions.
+ *
+ * Tim Deegan <tim@xen.org>
+ * Copyright (c) 2011 Citrix Systems.
+ */
+
+#include <xen/domain_page.h>
+#include <xen/errno.h>
+#include <xen/grant_table.h>
+#include <xen/guest_access.h>
+#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/pmap.h>
+#include <xen/sched.h>
+#include <xen/sizes.h>
+#include <xen/types.h>
+#include <xen/vmap.h>
+
+#include <xsm/xsm.h>
+
+#include <asm/fixmap.h>
+#include <asm/setup.h>
+
+#include <public/memory.h>
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef virt_to_mfn
+#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
+
+#ifdef NDEBUG
+static inline void
+__attribute__ ((__format__ (__printf__, 1, 2)))
+mm_printk(const char *fmt, ...) {}
+#else
+#define mm_printk(fmt, args...)             \
+    do                                      \
+    {                                       \
+        dprintk(XENLOG_ERR, fmt, ## args);  \
+        WARN();                             \
+    } while (0)
+#endif
+
+/* Static start-of-day pagetables that we use before the allocators
+ * are up. These are used by all CPUs during bringup before switching
+ * to the CPUs own pagetables.
+ *
+ * These pagetables have a very simple structure. They include:
+ *  - 2MB worth of 4K mappings of xen at XEN_VIRT_START, boot_first and
+ *    boot_second are used to populate the tables down to boot_third
+ *    which contains the actual mapping.
+ *  - a 1:1 mapping of xen at its current physical address. This uses a
+ *    section mapping at whichever of boot_{pgtable,first,second}
+ *    covers that physical address.
+ *
+ * For the boot CPU these mappings point to the address where Xen was
+ * loaded by the bootloader. For secondary CPUs they point to the
+ * relocated copy of Xen for the benefit of secondary CPUs.
+ *
+ * In addition to the above for the boot CPU the device-tree is
+ * initially mapped in the boot misc slot. This mapping is not present
+ * for secondary CPUs.
+ *
+ * Finally, if EARLY_PRINTK is enabled then xen_fixmap will be mapped
+ * by the CPU once it has moved off the 1:1 mapping.
+ */
+DEFINE_BOOT_PAGE_TABLE(boot_pgtable);
+#ifdef CONFIG_ARM_64
+DEFINE_BOOT_PAGE_TABLE(boot_first);
+DEFINE_BOOT_PAGE_TABLE(boot_first_id);
+#endif
+DEFINE_BOOT_PAGE_TABLE(boot_second_id);
+DEFINE_BOOT_PAGE_TABLE(boot_third_id);
+DEFINE_BOOT_PAGE_TABLE(boot_second);
+DEFINE_BOOT_PAGE_TABLE(boot_third);
+
+/* Main runtime page tables */
+
+/*
+ * For arm32 xen_pgtable are per-PCPU and are allocated before
+ * bringing up each CPU. For arm64 xen_pgtable is common to all PCPUs.
+ *
+ * xen_second, xen_fixmap and xen_xenmap are always shared between all
+ * PCPUs.
+ */
+
+#ifdef CONFIG_ARM_64
+#define HYP_PT_ROOT_LEVEL 0
+static DEFINE_PAGE_TABLE(xen_pgtable);
+static DEFINE_PAGE_TABLE(xen_first);
+#define THIS_CPU_PGTABLE xen_pgtable
+#else
+#define HYP_PT_ROOT_LEVEL 1
+/* Per-CPU pagetable pages */
+/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+DEFINE_PER_CPU(lpae_t *, xen_pgtable);
+#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
+/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
+static DEFINE_PAGE_TABLE(cpu0_pgtable);
+#endif
+
+/* Common pagetable leaves */
+/* Second level page table used to cover Xen virtual address space */
+static DEFINE_PAGE_TABLE(xen_second);
+/* Third level page table used for fixmap */
+DEFINE_BOOT_PAGE_TABLE(xen_fixmap);
+/*
+ * Third level page table used to map Xen itself with the XN bit set
+ * as appropriate.
+ */
+static DEFINE_PAGE_TABLE(xen_xenmap);
+
+/* Non-boot CPUs use this to find the correct pagetables. */
+uint64_t init_ttbr;
+
+static paddr_t phys_offset;
+
+/* Limits of the Xen heap */
+mfn_t directmap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;
+mfn_t directmap_mfn_end __read_mostly;
+vaddr_t directmap_virt_end __read_mostly;
+#ifdef CONFIG_ARM_64
+vaddr_t directmap_virt_start __read_mostly;
+unsigned long directmap_base_pdx __read_mostly;
+#endif
+
+unsigned long frametable_virt_end __read_mostly;
+
+extern char __init_begin[], __init_end[];
+
+/* Checking VA memory layout alignment. */
+static void __init __maybe_unused build_assertions(void)
+{
+    /* 2MB aligned regions */
+    BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
+    BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
+    /* 1GB aligned regions */
+#ifdef CONFIG_ARM_32
+    BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
+#else
+    BUILD_BUG_ON(DIRECTMAP_VIRT_START & ~FIRST_MASK);
+#endif
+    /* Page table structure constraints */
+#ifdef CONFIG_ARM_64
+    BUILD_BUG_ON(zeroeth_table_offset(XEN_VIRT_START));
+#endif
+    BUILD_BUG_ON(first_table_offset(XEN_VIRT_START));
+#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
+    BUILD_BUG_ON(DOMHEAP_VIRT_START & ~FIRST_MASK);
+#endif
+    /*
+     * The boot code expects the regions XEN_VIRT_START, FIXMAP_ADDR(0),
+     * BOOT_FDT_VIRT_START to use the same 0th (arm64 only) and 1st
+     * slot in the page tables.
+     */
+#define CHECK_SAME_SLOT(level, virt1, virt2) \
+    BUILD_BUG_ON(level##_table_offset(virt1) != level##_table_offset(virt2))
+
+#ifdef CONFIG_ARM_64
+    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, FIXMAP_ADDR(0));
+    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, BOOT_FDT_VIRT_START);
+#endif
+    CHECK_SAME_SLOT(first, XEN_VIRT_START, FIXMAP_ADDR(0));
+    CHECK_SAME_SLOT(first, XEN_VIRT_START, BOOT_FDT_VIRT_START);
+
+#undef CHECK_SAME_SLOT
+}
+
+static lpae_t *xen_map_table(mfn_t mfn)
+{
+    /*
+     * During early boot, map_domain_page() may be unusable. Use the
+     * PMAP to map temporarily a page-table.
+     */
+    if ( system_state == SYS_STATE_early_boot )
+        return pmap_map(mfn);
+
+    return map_domain_page(mfn);
+}
+
+static void xen_unmap_table(const lpae_t *table)
+{
+    /*
+     * During early boot, xen_map_table() will not use map_domain_page()
+     * but the PMAP.
+     */
+    if ( system_state == SYS_STATE_early_boot )
+        pmap_unmap(table);
+    else
+        unmap_domain_page(table);
+}
+
+void dump_pt_walk(paddr_t ttbr, paddr_t addr,
+                  unsigned int root_level,
+                  unsigned int nr_root_tables)
+{
+    static const char *level_strs[4] = { "0TH", "1ST", "2ND", "3RD" };
+    const mfn_t root_mfn = maddr_to_mfn(ttbr);
+    const unsigned int offsets[4] = {
+        zeroeth_table_offset(addr),
+        first_table_offset(addr),
+        second_table_offset(addr),
+        third_table_offset(addr)
+    };
+    lpae_t pte, *mapping;
+    unsigned int level, root_table;
+
+#ifdef CONFIG_ARM_32
+    BUG_ON(root_level < 1);
+#endif
+    BUG_ON(root_level > 3);
+
+    if ( nr_root_tables > 1 )
+    {
+        /*
+         * Concatenated root-level tables. The table number will be
+         * the offset at the previous level. It is not possible to
+         * concatenate a level-0 root.
+         */
+        BUG_ON(root_level == 0);
+        root_table = offsets[root_level - 1];
+        printk("Using concatenated root table %u\n", root_table);
+        if ( root_table >= nr_root_tables )
+        {
+            printk("Invalid root table offset\n");
+            return;
+        }
+    }
+    else
+        root_table = 0;
+
+    mapping = xen_map_table(mfn_add(root_mfn, root_table));
+
+    for ( level = root_level; ; level++ )
+    {
+        if ( offsets[level] > XEN_PT_LPAE_ENTRIES )
+            break;
+
+        pte = mapping[offsets[level]];
+
+        printk("%s[0x%03x] = 0x%"PRIpaddr"\n",
+               level_strs[level], offsets[level], pte.bits);
+
+        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
+            break;
+
+        /* For next iteration */
+        xen_unmap_table(mapping);
+        mapping = xen_map_table(lpae_get_mfn(pte));
+    }
+
+    xen_unmap_table(mapping);
+}
+
+void dump_hyp_walk(vaddr_t addr)
+{
+    uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
+
+    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
+           "on CPU%d via TTBR 0x%016"PRIx64"\n",
+           addr, smp_processor_id(), ttbr);
+
+    dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
+}
+
+lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr)
+{
+    lpae_t e = (lpae_t) {
+        .pt = {
+            .valid = 1,           /* Mappings are present */
+            .table = 0,           /* Set to 1 for links and 4k maps */
+            .ai = attr,
+            .ns = 1,              /* Hyp mode is in the non-secure world */
+            .up = 1,              /* See below */
+            .ro = 0,              /* Assume read-write */
+            .af = 1,              /* No need for access tracking */
+            .ng = 1,              /* Makes TLB flushes easier */
+            .contig = 0,          /* Assume non-contiguous */
+            .xn = 1,              /* No need to execute outside .text */
+            .avail = 0,           /* Reference count for domheap mapping */
+        }};
+    /*
+     * For EL2 stage-1 page table, up (aka AP[1]) is RES1 as the translation
+     * regime applies to only one exception level (see D4.4.4 and G4.6.1
+     * in ARM DDI 0487B.a). If this changes, remember to update the
+     * hard-coded values in head.S too.
+     */
+
+    switch ( attr )
+    {
+    case MT_NORMAL_NC:
+        /*
+         * ARM ARM: Overlaying the shareability attribute (DDI
+         * 0406C.b B3-1376 to 1377)
+         *
+         * A memory region with a resultant memory type attribute of Normal,
+         * and a resultant cacheability attribute of Inner Non-cacheable,
+         * Outer Non-cacheable, must have a resultant shareability attribute
+         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
+         *
+         * On ARMv8 sharability is ignored and explicitly treated as Outer
+         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
+         */
+        e.pt.sh = LPAE_SH_OUTER;
+        break;
+    case MT_DEVICE_nGnRnE:
+    case MT_DEVICE_nGnRE:
+        /*
+         * Shareability is ignored for non-Normal memory, Outer is as
+         * good as anything.
+         *
+         * On ARMv8 sharability is ignored and explicitly treated as Outer
+         * Shareable for any device memory type.
+         */
+        e.pt.sh = LPAE_SH_OUTER;
+        break;
+    default:
+        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
+        break;
+    }
+
+    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
+
+    lpae_set_mfn(e, mfn);
+
+    return e;
+}
+
+/* Map a 4k page in a fixmap entry */
+void set_fixmap(unsigned int map, mfn_t mfn, unsigned int flags)
+{
+    int res;
+
+    res = map_pages_to_xen(FIXMAP_ADDR(map), mfn, 1, flags);
+    BUG_ON(res != 0);
+}
+
+/* Remove a mapping from a fixmap entry */
+void clear_fixmap(unsigned int map)
+{
+    int res;
+
+    res = destroy_xen_mappings(FIXMAP_ADDR(map), FIXMAP_ADDR(map) + PAGE_SIZE);
+    BUG_ON(res != 0);
+}
+
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
+{
+    set_fixmap(FIXMAP_MISC, mfn, attributes);
+
+    return fix_to_virt(FIXMAP_MISC);
+}
+
+void unmap_page_from_xen_misc(void)
+{
+    clear_fixmap(FIXMAP_MISC);
+}
+
+static inline lpae_t pte_of_xenaddr(vaddr_t va)
+{
+    paddr_t ma = va + phys_offset;
+
+    return mfn_to_xen_entry(maddr_to_mfn(ma), MT_NORMAL);
+}
+
+void * __init early_fdt_map(paddr_t fdt_paddr)
+{
+    /* We are using 2MB superpage for mapping the FDT */
+    paddr_t base_paddr = fdt_paddr & SECOND_MASK;
+    paddr_t offset;
+    void *fdt_virt;
+    uint32_t size;
+    int rc;
+
+    /*
+     * Check whether the physical FDT address is set and meets the minimum
+     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
+     * least 8 bytes so that we always access the magic and size fields
+     * of the FDT header after mapping the first chunk, double check if
+     * that is indeed the case.
+     */
+    BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
+    if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
+        return NULL;
+
+    /* The FDT is mapped using 2MB superpage */
+    BUILD_BUG_ON(BOOT_FDT_VIRT_START % SZ_2M);
+
+    rc = map_pages_to_xen(BOOT_FDT_VIRT_START, maddr_to_mfn(base_paddr),
+                          SZ_2M >> PAGE_SHIFT,
+                          PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to map the device-tree.\n");
+
+
+    offset = fdt_paddr % SECOND_SIZE;
+    fdt_virt = (void *)BOOT_FDT_VIRT_START + offset;
+
+    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
+        return NULL;
+
+    size = fdt_totalsize(fdt_virt);
+    if ( size > MAX_FDT_SIZE )
+        return NULL;
+
+    if ( (offset + size) > SZ_2M )
+    {
+        rc = map_pages_to_xen(BOOT_FDT_VIRT_START + SZ_2M,
+                              maddr_to_mfn(base_paddr + SZ_2M),
+                              SZ_2M >> PAGE_SHIFT,
+                              PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
+        if ( rc )
+            panic("Unable to map the device-tree\n");
+    }
+
+    return fdt_virt;
+}
+
+void __init remove_early_mappings(void)
+{
+    int rc;
+
+    /* destroy the _PAGE_BLOCK mapping */
+    rc = modify_xen_mappings(BOOT_FDT_VIRT_START,
+                             BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE,
+                             _PAGE_BLOCK);
+    BUG_ON(rc);
+}
+
+/*
+ * After boot, Xen page-tables should not contain mapping that are both
+ * Writable and eXecutables.
+ *
+ * This should be called on each CPU to enforce the policy.
+ */
+static void xen_pt_enforce_wnx(void)
+{
+    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
+    /*
+     * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
+     * before flushing the TLBs.
+     */
+    isb();
+    flush_xen_tlb_local();
+}
+
+extern void switch_ttbr(uint64_t ttbr);
+
+/* Clear a translation table and clean & invalidate the cache */
+static void clear_table(void *table)
+{
+    clear_page(table);
+    clean_and_invalidate_dcache_va_range(table, PAGE_SIZE);
+}
+
+/* Boot-time pagetable setup.
+ * Changes here may need matching changes in head.S */
+void __init setup_pagetables(unsigned long boot_phys_offset)
+{
+    uint64_t ttbr;
+    lpae_t pte, *p;
+    int i;
+
+    phys_offset = boot_phys_offset;
+
+#ifdef CONFIG_ARM_64
+    p = (void *) xen_pgtable;
+    p[0] = pte_of_xenaddr((uintptr_t)xen_first);
+    p[0].pt.table = 1;
+    p[0].pt.xn = 0;
+    p = (void *) xen_first;
+#else
+    p = (void *) cpu0_pgtable;
+#endif
+
+    /* Map xen second level page-table */
+    p[0] = pte_of_xenaddr((uintptr_t)(xen_second));
+    p[0].pt.table = 1;
+    p[0].pt.xn = 0;
+
+    /* Break up the Xen mapping into 4k pages and protect them separately. */
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        vaddr_t va = XEN_VIRT_START + (i << PAGE_SHIFT);
+
+        if ( !is_kernel(va) )
+            break;
+        pte = pte_of_xenaddr(va);
+        pte.pt.table = 1; /* 4k mappings always have this bit set */
+        if ( is_kernel_text(va) || is_kernel_inittext(va) )
+        {
+            pte.pt.xn = 0;
+            pte.pt.ro = 1;
+        }
+        if ( is_kernel_rodata(va) )
+            pte.pt.ro = 1;
+        xen_xenmap[i] = pte;
+    }
+
+    /* Initialise xen second level entries ... */
+    /* ... Xen's text etc */
+
+    pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
+    pte.pt.table = 1;
+    xen_second[second_table_offset(XEN_VIRT_START)] = pte;
+
+    /* ... Fixmap */
+    pte = pte_of_xenaddr((vaddr_t)xen_fixmap);
+    pte.pt.table = 1;
+    xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
+
+#ifdef CONFIG_ARM_64
+    ttbr = (uintptr_t) xen_pgtable + phys_offset;
+#else
+    ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
+#endif
+
+    switch_ttbr(ttbr);
+
+    xen_pt_enforce_wnx();
+
+#ifdef CONFIG_ARM_32
+    per_cpu(xen_pgtable, 0) = cpu0_pgtable;
+#endif
+}
+
+static void clear_boot_pagetables(void)
+{
+    /*
+     * Clear the copy of the boot pagetables. Each secondary CPU
+     * rebuilds these itself (see head.S).
+     */
+    clear_table(boot_pgtable);
+#ifdef CONFIG_ARM_64
+    clear_table(boot_first);
+    clear_table(boot_first_id);
+#endif
+    clear_table(boot_second);
+    clear_table(boot_third);
+}
+
+#ifdef CONFIG_ARM_64
+int init_secondary_pagetables(int cpu)
+{
+    clear_boot_pagetables();
+
+    /* Set init_ttbr for this CPU coming up. All CPUs share a single set of
+     * pagetables, but rewrite it each time for consistency with 32 bit. */
+    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
+    clean_dcache(init_ttbr);
+    return 0;
+}
+#else
+int init_secondary_pagetables(int cpu)
+{
+    lpae_t *first;
+
+    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
+
+    if ( !first )
+    {
+        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
+        return -ENOMEM;
+    }
+
+    /* Initialise root pagetable from root of boot tables */
+    memcpy(first, cpu0_pgtable, PAGE_SIZE);
+    per_cpu(xen_pgtable, cpu) = first;
+
+    if ( !init_domheap_mappings(cpu) )
+    {
+        printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
+        per_cpu(xen_pgtable, cpu) = NULL;
+        free_xenheap_page(first);
+        return -ENOMEM;
+    }
+
+    clear_boot_pagetables();
+
+    /* Set init_ttbr for this CPU coming up */
+    init_ttbr = __pa(first);
+    clean_dcache(init_ttbr);
+
+    return 0;
+}
+#endif
+
+/* MMU setup for secondary CPUS (which already have paging enabled) */
+void mmu_init_secondary_cpu(void)
+{
+    xen_pt_enforce_wnx();
+}
+
+#ifdef CONFIG_ARM_32
+/*
+ * Set up the direct-mapped xenheap:
+ * up to 1GB of contiguous, always-mapped memory.
+ */
+void __init setup_directmap_mappings(unsigned long base_mfn,
+                                     unsigned long nr_mfns)
+{
+    int rc;
+
+    rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
+                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to setup the directmap mappings.\n");
+
+    /* Record where the directmap is, for translation routines. */
+    directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
+}
+#else /* CONFIG_ARM_64 */
+/* Map the region in the directmap area. */
+void __init setup_directmap_mappings(unsigned long base_mfn,
+                                     unsigned long nr_mfns)
+{
+    int rc;
+
+    /* First call sets the directmap physical and virtual offset. */
+    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
+    {
+        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
+
+        directmap_mfn_start = _mfn(base_mfn);
+        directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
+        /*
+         * The base address may not be aligned to the first level
+         * size (e.g. 1GB when using 4KB pages). This would prevent
+         * superpage mappings for all the regions because the virtual
+         * address and machine address should both be suitably aligned.
+         *
+         * Prevent that by offsetting the start of the directmap virtual
+         * address.
+         */
+        directmap_virt_start = DIRECTMAP_VIRT_START +
+            (base_mfn - mfn_gb) * PAGE_SIZE;
+    }
+
+    if ( base_mfn < mfn_x(directmap_mfn_start) )
+        panic("cannot add directmap mapping at %lx below heap start %lx\n",
+              base_mfn, mfn_x(directmap_mfn_start));
+
+    rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
+                          _mfn(base_mfn), nr_mfns,
+                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to setup the directmap mappings.\n");
+}
+#endif
+
+/* Map a frame table to cover physical addresses ps through pe */
+void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
+{
+    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
+                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
+    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
+    mfn_t base_mfn;
+    const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32);
+    int rc;
+
+    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
+    /* Round up to 2M or 32M boundary, as appropriate. */
+    frametable_size = ROUNDUP(frametable_size, mapping_size);
+    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
+
+    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
+                          frametable_size >> PAGE_SHIFT,
+                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+    if ( rc )
+        panic("Unable to setup the frametable mappings.\n");
+
+    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
+    memset(&frame_table[nr_pdxs], -1,
+           frametable_size - (nr_pdxs * sizeof(struct page_info)));
+
+    frametable_virt_end = FRAMETABLE_VIRT_START + (nr_pdxs * sizeof(struct page_info));
+}
+
+void *__init arch_vmap_virt_end(void)
+{
+    return (void *)(VMAP_VIRT_START + VMAP_VIRT_SIZE);
+}
+
+/*
+ * This function should only be used to remap device address ranges
+ * TODO: add a check to verify this assumption
+ */
+void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
+{
+    mfn_t mfn = _mfn(PFN_DOWN(pa));
+    unsigned int offs = pa & (PAGE_SIZE - 1);
+    unsigned int nr = PFN_UP(offs + len);
+    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
+
+    if ( ptr == NULL )
+        return NULL;
+
+    return ptr + offs;
+}
+
+void *ioremap(paddr_t pa, size_t len)
+{
+    return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
+}
+
+static int create_xen_table(lpae_t *entry)
+{
+    mfn_t mfn;
+    void *p;
+    lpae_t pte;
+
+    if ( system_state != SYS_STATE_early_boot )
+    {
+        struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+        if ( pg == NULL )
+            return -ENOMEM;
+
+        mfn = page_to_mfn(pg);
+    }
+    else
+        mfn = alloc_boot_pages(1, 1);
+
+    p = xen_map_table(mfn);
+    clear_page(p);
+    xen_unmap_table(p);
+
+    pte = mfn_to_xen_entry(mfn, MT_NORMAL);
+    pte.pt.table = 1;
+    write_pte(entry, pte);
+
+    return 0;
+}
+
+#define XEN_TABLE_MAP_FAILED 0
+#define XEN_TABLE_SUPER_PAGE 1
+#define XEN_TABLE_NORMAL_PAGE 2
+
+/*
+ * Take the currently mapped table, find the corresponding entry,
+ * and map the next table, if available.
+ *
+ * The read_only parameters indicates whether intermediate tables should
+ * be allocated when not present.
+ *
+ * Return values:
+ *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
+ *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
+ */
+static int xen_pt_next_level(bool read_only, unsigned int level,
+                             lpae_t **table, unsigned int offset)
+{
+    lpae_t *entry;
+    int ret;
+    mfn_t mfn;
+
+    entry = *table + offset;
+
+    if ( !lpae_is_valid(*entry) )
+    {
+        if ( read_only )
+            return XEN_TABLE_MAP_FAILED;
+
+        ret = create_xen_table(entry);
+        if ( ret )
+            return XEN_TABLE_MAP_FAILED;
+    }
+
+    /* The function xen_pt_next_level is never called at the 3rd level */
+    if ( lpae_is_mapping(*entry, level) )
+        return XEN_TABLE_SUPER_PAGE;
+
+    mfn = lpae_get_mfn(*entry);
+
+    xen_unmap_table(*table);
+    *table = xen_map_table(mfn);
+
+    return XEN_TABLE_NORMAL_PAGE;
+}
+
+/* Sanity check of the entry */
+static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
+                               unsigned int flags)
+{
+    /* Sanity check when modifying an entry. */
+    if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
+    {
+        /* We don't allow modifying an invalid entry. */
+        if ( !lpae_is_valid(entry) )
+        {
+            mm_printk("Modifying invalid entry is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow modifying a table entry */
+        if ( !lpae_is_mapping(entry, level) )
+        {
+            mm_printk("Modifying a table entry is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow changing memory attributes. */
+        if ( entry.pt.ai != PAGE_AI_MASK(flags) )
+        {
+            mm_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
+                      entry.pt.ai, PAGE_AI_MASK(flags));
+            return false;
+        }
+
+        /* We don't allow modifying entry with contiguous bit set. */
+        if ( entry.pt.contig )
+        {
+            mm_printk("Modifying entry with contiguous bit set is not allowed.\n");
+            return false;
+        }
+    }
+    /* Sanity check when inserting a mapping */
+    else if ( flags & _PAGE_PRESENT )
+    {
+        /* We should be here with a valid MFN. */
+        ASSERT(!mfn_eq(mfn, INVALID_MFN));
+
+        /*
+         * We don't allow replacing any valid entry.
+         *
+         * Note that the function xen_pt_update() relies on this
+         * assumption and will skip the TLB flush. The function will need
+         * to be updated if the check is relaxed.
+         */
+        if ( lpae_is_valid(entry) )
+        {
+            if ( lpae_is_mapping(entry, level) )
+                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
+                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
+            else
+                mm_printk("Trying to replace a table with a mapping.\n");
+            return false;
+        }
+    }
+    /* Sanity check when removing a mapping. */
+    else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
+    {
+        /* We should be here with an invalid MFN. */
+        ASSERT(mfn_eq(mfn, INVALID_MFN));
+
+        /* We don't allow removing a table */
+        if ( lpae_is_table(entry, level) )
+        {
+            mm_printk("Removing a table is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow removing a mapping with contiguous bit set. */
+        if ( entry.pt.contig )
+        {
+            mm_printk("Removing entry with contiguous bit set is not allowed.\n");
+            return false;
+        }
+    }
+    /* Sanity check when populating the page-table. No check so far. */
+    else
+    {
+        ASSERT(flags & _PAGE_POPULATE);
+        /* We should be here with an invalid MFN */
+        ASSERT(mfn_eq(mfn, INVALID_MFN));
+    }
+
+    return true;
+}
+
+/* Update an entry at the level @target. */
+static int xen_pt_update_entry(mfn_t root, unsigned long virt,
+                               mfn_t mfn, unsigned int target,
+                               unsigned int flags)
+{
+    int rc;
+    unsigned int level;
+    lpae_t *table;
+    /*
+     * The intermediate page tables are read-only when the MFN is not valid
+     * and we are not populating page table.
+     * This means we either modify permissions or remove an entry.
+     */
+    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
+    lpae_t pte, *entry;
+
+    /* convenience aliases */
+    DECLARE_OFFSETS(offsets, (paddr_t)virt);
+
+    /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
+    ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
+
+    table = xen_map_table(root);
+    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
+    {
+        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
+        if ( rc == XEN_TABLE_MAP_FAILED )
+        {
+            /*
+             * We are here because xen_pt_next_level has failed to map
+             * the intermediate page table (e.g the table does not exist
+             * and the pt is read-only). It is a valid case when
+             * removing a mapping as it may not exist in the page table.
+             * In this case, just ignore it.
+             */
+            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
+            {
+                mm_printk("%s: Unable to map level %u\n", __func__, level);
+                rc = -ENOENT;
+                goto out;
+            }
+            else
+            {
+                rc = 0;
+                goto out;
+            }
+        }
+        else if ( rc != XEN_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    if ( level != target )
+    {
+        mm_printk("%s: Shattering superpage is not supported\n", __func__);
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    entry = table + offsets[level];
+
+    rc = -EINVAL;
+    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
+        goto out;
+
+    /* If we are only populating page-table, then we are done. */
+    rc = 0;
+    if ( flags & _PAGE_POPULATE )
+        goto out;
+
+    /* We are removing the page */
+    if ( !(flags & _PAGE_PRESENT) )
+        memset(&pte, 0x00, sizeof(pte));
+    else
+    {
+        /* We are inserting a mapping => Create new pte. */
+        if ( !mfn_eq(mfn, INVALID_MFN) )
+        {
+            pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
+
+            /*
+             * First and second level pages set pte.pt.table = 0, but
+             * third level entries set pte.pt.table = 1.
+             */
+            pte.pt.table = (level == 3);
+        }
+        else /* We are updating the permission => Copy the current pte. */
+            pte = *entry;
+
+        /* Set permission */
+        pte.pt.ro = PAGE_RO_MASK(flags);
+        pte.pt.xn = PAGE_XN_MASK(flags);
+        /* Set contiguous bit */
+        pte.pt.contig = !!(flags & _PAGE_CONTIG);
+    }
+
+    write_pte(entry, pte);
+
+    rc = 0;
+
+out:
+    xen_unmap_table(table);
+
+    return rc;
+}
+
+/* Return the level where mapping should be done */
+static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
+                                unsigned int flags)
+{
+    unsigned int level;
+    unsigned long mask;
+
+    /*
+      * Don't take into account the MFN when removing mapping (i.e
+      * MFN_INVALID) to calculate the correct target order.
+      *
+      * Per the Arm Arm, `vfn` and `mfn` must be both superpage aligned.
+      * They are or-ed together and then checked against the size of
+      * each level.
+      *
+      * `left` is not included and checked separately to allow
+      * superpage mapping even if it is not properly aligned (the
+      * user may have asked to map 2MB + 4k).
+      */
+     mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
+     mask |= vfn;
+
+     /*
+      * Always use level 3 mapping unless the caller request block
+      * mapping.
+      */
+     if ( likely(!(flags & _PAGE_BLOCK)) )
+         level = 3;
+     else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) &&
+               (nr >= BIT(FIRST_ORDER, UL)) )
+         level = 1;
+     else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) &&
+               (nr >= BIT(SECOND_ORDER, UL)) )
+         level = 2;
+     else
+         level = 3;
+
+     return level;
+}
+
+#define XEN_PT_4K_NR_CONTIG 16
+
+/*
+ * Check whether the contiguous bit can be set. Return the number of
+ * contiguous entry allowed. If not allowed, return 1.
+ */
+static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn,
+                                        unsigned int level, unsigned long left,
+                                        unsigned int flags)
+{
+    unsigned long nr_contig;
+
+    /*
+     * Allow the contiguous bit to set when the caller requests block
+     * mapping.
+     */
+    if ( !(flags & _PAGE_BLOCK) )
+        return 1;
+
+    /*
+     * We don't allow to remove mapping with the contiguous bit set.
+     * So shortcut the logic and directly return 1.
+     */
+    if ( mfn_eq(mfn, INVALID_MFN) )
+        return 1;
+
+    /*
+     * The number of contiguous entries varies depending on the page
+     * granularity used. The logic below assumes 4KB.
+     */
+    BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
+
+    /*
+     * In order to enable the contiguous bit, we should have enough entries
+     * to map left and both the virtual and physical address should be
+     * aligned to the size of 16 translation tables entries.
+     */
+    nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG;
+
+    if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) )
+        return 1;
+
+    return XEN_PT_4K_NR_CONTIG;
+}
+
+static DEFINE_SPINLOCK(xen_pt_lock);
+
+static int xen_pt_update(unsigned long virt,
+                         mfn_t mfn,
+                         /* const on purpose as it is used for TLB flush */
+                         const unsigned long nr_mfns,
+                         unsigned int flags)
+{
+    int rc = 0;
+    unsigned long vfn = virt >> PAGE_SHIFT;
+    unsigned long left = nr_mfns;
+
+    /*
+     * For arm32, page-tables are different on each CPUs. Yet, they share
+     * some common mappings. It is assumed that only common mappings
+     * will be modified with this function.
+     *
+     * XXX: Add a check.
+     */
+    const mfn_t root = maddr_to_mfn(READ_SYSREG64(TTBR0_EL2));
+
+    /*
+     * The hardware was configured to forbid mapping both writeable and
+     * executable.
+     * When modifying/creating mapping (i.e _PAGE_PRESENT is set),
+     * prevent any update if this happen.
+     */
+    if ( (flags & _PAGE_PRESENT) && !PAGE_RO_MASK(flags) &&
+         !PAGE_XN_MASK(flags) )
+    {
+        mm_printk("Mappings should not be both Writeable and Executable.\n");
+        return -EINVAL;
+    }
+
+    if ( flags & _PAGE_CONTIG )
+    {
+        mm_printk("_PAGE_CONTIG is an internal only flag.\n");
+        return -EINVAL;
+    }
+
+    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
+    {
+        mm_printk("The virtual address is not aligned to the page-size.\n");
+        return -EINVAL;
+    }
+
+    spin_lock(&xen_pt_lock);
+
+    while ( left )
+    {
+        unsigned int order, level, nr_contig, new_flags;
+
+        level = xen_pt_mapping_level(vfn, mfn, left, flags);
+        order = XEN_PT_LEVEL_ORDER(level);
+
+        ASSERT(left >= BIT(order, UL));
+
+        /*
+         * Check if we can set the contiguous mapping and update the
+         * flags accordingly.
+         */
+        nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
+        new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
+
+        for ( ; nr_contig > 0; nr_contig-- )
+        {
+            rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level,
+                                     new_flags);
+            if ( rc )
+                break;
+
+            vfn += 1U << order;
+            if ( !mfn_eq(mfn, INVALID_MFN) )
+                mfn = mfn_add(mfn, 1U << order);
+
+            left -= (1U << order);
+        }
+
+        if ( rc )
+            break;
+    }
+
+    /*
+     * The TLBs flush can be safely skipped when a mapping is inserted
+     * as we don't allow mapping replacement (see xen_pt_check_entry()).
+     *
+     * For all the other cases, the TLBs will be flushed unconditionally
+     * even if the mapping has failed. This is because we may have
+     * partially modified the PT. This will prevent any unexpected
+     * behavior afterwards.
+     */
+    if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) )
+        flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
+
+    spin_unlock(&xen_pt_lock);
+
+    return rc;
+}
+
+int map_pages_to_xen(unsigned long virt,
+                     mfn_t mfn,
+                     unsigned long nr_mfns,
+                     unsigned int flags)
+{
+    return xen_pt_update(virt, mfn, nr_mfns, flags);
+}
+
+int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
+{
+    return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
+}
+
+int destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
+    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
+    ASSERT(s <= e);
+    return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, 0);
+}
+
+int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
+{
+    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
+    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
+    ASSERT(s <= e);
+    return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, flags);
+}
+
+/* Release all __init and __initdata ranges to be reused */
+void free_init_memory(void)
+{
+    paddr_t pa = virt_to_maddr(__init_begin);
+    unsigned long len = __init_end - __init_begin;
+    uint32_t insn;
+    unsigned int i, nr = len / sizeof(insn);
+    uint32_t *p;
+    int rc;
+
+    rc = modify_xen_mappings((unsigned long)__init_begin,
+                             (unsigned long)__init_end, PAGE_HYPERVISOR_RW);
+    if ( rc )
+        panic("Unable to map RW the init section (rc = %d)\n", rc);
+
+    /*
+     * From now on, init will not be used for execution anymore,
+     * so nuke the instruction cache to remove entries related to init.
+     */
+    invalidate_icache_local();
+
+#ifdef CONFIG_ARM_32
+    /* udf instruction i.e (see A8.8.247 in ARM DDI 0406C.c) */
+    insn = 0xe7f000f0;
+#else
+    insn = AARCH64_BREAK_FAULT;
+#endif
+    p = (uint32_t *)__init_begin;
+    for ( i = 0; i < nr; i++ )
+        *(p + i) = insn;
+
+    rc = destroy_xen_mappings((unsigned long)__init_begin,
+                              (unsigned long)__init_end);
+    if ( rc )
+        panic("Unable to remove the init section (rc = %d)\n", rc);
+
+    init_domheap_pages(pa, pa + len);
+    printk("Freed %ldkB init memory.\n", (long)(__init_end-__init_begin)>>10);
+}
+
+int xenmem_add_to_physmap_one(
+    struct domain *d,
+    unsigned int space,
+    union add_to_physmap_extra extra,
+    unsigned long idx,
+    gfn_t gfn)
+{
+    mfn_t mfn = INVALID_MFN;
+    int rc;
+    p2m_type_t t;
+    struct page_info *page = NULL;
+
+    switch ( space )
+    {
+    case XENMAPSPACE_grant_table:
+        rc = gnttab_map_frame(d, idx, gfn, &mfn);
+        if ( rc )
+            return rc;
+
+        /* Need to take care of the reference obtained in gnttab_map_frame(). */
+        page = mfn_to_page(mfn);
+        t = p2m_ram_rw;
+
+        break;
+    case XENMAPSPACE_shared_info:
+        if ( idx != 0 )
+            return -EINVAL;
+
+        mfn = virt_to_mfn(d->shared_info);
+        t = p2m_ram_rw;
+
+        break;
+    case XENMAPSPACE_gmfn_foreign:
+    {
+        struct domain *od;
+        p2m_type_t p2mt;
+
+        od = get_pg_owner(extra.foreign_domid);
+        if ( od == NULL )
+            return -ESRCH;
+
+        if ( od == d )
+        {
+            put_pg_owner(od);
+            return -EINVAL;
+        }
+
+        rc = xsm_map_gmfn_foreign(XSM_TARGET, d, od);
+        if ( rc )
+        {
+            put_pg_owner(od);
+            return rc;
+        }
+
+        /* Take reference to the foreign domain page.
+         * Reference will be released in XENMEM_remove_from_physmap */
+        page = get_page_from_gfn(od, idx, &p2mt, P2M_ALLOC);
+        if ( !page )
+        {
+            put_pg_owner(od);
+            return -EINVAL;
+        }
+
+        if ( p2m_is_ram(p2mt) )
+            t = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : p2m_map_foreign_ro;
+        else
+        {
+            put_page(page);
+            put_pg_owner(od);
+            return -EINVAL;
+        }
+
+        mfn = page_to_mfn(page);
+
+        put_pg_owner(od);
+        break;
+    }
+    case XENMAPSPACE_dev_mmio:
+        rc = map_dev_mmio_page(d, gfn, _mfn(idx));
+        return rc;
+
+    default:
+        return -ENOSYS;
+    }
+
+    /*
+     * Map at new location. Here we need to map xenheap RAM page differently
+     * because we need to store the valid GFN and make sure that nothing was
+     * mapped before (the stored GFN is invalid). And these actions need to be
+     * performed with the P2M lock held. The guest_physmap_add_entry() is just
+     * a wrapper on top of p2m_set_entry().
+     */
+    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
+        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
+    else
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        p2m_write_lock(p2m);
+        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), INVALID_GFN) )
+        {
+            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
+            if ( !rc )
+                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
+        }
+        else
+            /*
+             * Mandate the caller to first unmap the page before mapping it
+             * again. This is to prevent Xen creating an unwanted hole in
+             * the P2M. For instance, this could happen if the firmware stole
+             * a RAM address for mapping the shared_info page into but forgot
+             * to unmap it afterwards.
+             */
+            rc = -EBUSY;
+        p2m_write_unlock(p2m);
+    }
+
+    /*
+     * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
+     * to drop the reference we took earlier. In all other cases we need to
+     * drop any reference we took earlier (perhaps indirectly).
+     */
+    if ( space == XENMAPSPACE_gmfn_foreign ? rc : page != NULL )
+    {
+        ASSERT(page != NULL);
+        put_page(page);
+    }
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 43e9a1be4d..87a12042cc 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -20,8 +20,10 @@
  */
 
 #include <xen/init.h>
+#include <xen/mm.h>
 #include <xen/page-size.h>
 #include <asm/arm64/mpu.h>
+#include <asm/page.h>
 
 /* Xen MPU memory region mapping table. */
 pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
@@ -38,6 +40,71 @@ uint64_t __ro_after_init next_transient_region_idx;
 /* Maximum number of supported MPU memory regions by the EL2 MPU. */
 uint64_t __ro_after_init max_xen_mpumap;
 
+/* TODO: Implementation on the first usage */
+void dump_hyp_walk(vaddr_t addr)
+{
+}
+
+void * __init early_fdt_map(paddr_t fdt_paddr)
+{
+    return NULL;
+}
+
+void __init remove_early_mappings(void)
+{
+}
+
+int init_secondary_pagetables(int cpu)
+{
+    return -ENOSYS;
+}
+
+void mmu_init_secondary_cpu(void)
+{
+}
+
+void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
+{
+    return NULL;
+}
+
+void *ioremap(paddr_t pa, size_t len)
+{
+    return NULL;
+}
+
+int map_pages_to_xen(unsigned long virt,
+                     mfn_t mfn,
+                     unsigned long nr_mfns,
+                     unsigned int flags)
+{
+    return -ENOSYS;
+}
+
+int destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+    return -ENOSYS;
+}
+
+int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
+{
+    return -ENOSYS;
+}
+
+void free_init_memory(void)
+{
+}
+
+int xenmem_add_to_physmap_one(
+    struct domain *d,
+    unsigned int space,
+    union add_to_physmap_extra extra,
+    unsigned long idx,
+    gfn_t gfn)
+{
+    return -ENOSYS;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (14 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 21:32   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 17/40] xen/mpu: plump virt/maddr/mfn convertion in MPU system Penny Zheng
                   ` (26 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

Function setup_pagetables is responsible for boot-time pagetable setup
in the MMU system.
But in the MPU system, the start-of-day Xen MPU memory region mapping
has already been built up at the very beginning, in assembly.

So in order to keep a single code flow in arm/setup.c, setup_mm_mappings,
with a more generic name, is introduced; it acts as an empty stub in
the MPU system.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/mm.h     |  2 ++
 xen/arch/arm/include/asm/mm_mpu.h | 16 ++++++++++++++++
 xen/arch/arm/setup.c              |  2 +-
 3 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/include/asm/mm_mpu.h

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 1b9fdb6ff5..9b4c07d965 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -243,6 +243,8 @@ static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
 
 #ifndef CONFIG_HAS_MPU
 #include <asm/mm_mmu.h>
+#else
+#include <asm/mm_mpu.h>
 #endif
 
 /* Page-align address and convert to frame number format */
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
new file mode 100644
index 0000000000..1f3cff7743
--- /dev/null
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARCH_ARM_MM_MPU__
+#define __ARCH_ARM_MM_MPU__
+
+#define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
+
+#endif /* __ARCH_ARM_MM_MPU__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 1f26f67b90..d7d200179c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -1003,7 +1003,7 @@ void __init start_xen(unsigned long boot_phys_offset,
     /* Initialize traps early allow us to get backtrace when an error occurred */
     init_traps();
 
-    setup_pagetables(boot_phys_offset);
+    setup_mm_mappings(boot_phys_offset);
 
     smp_clear_cpu_maps();
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 17/40] xen/mpu: plump virt/maddr/mfn convertion in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (15 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 21:36   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 18/40] xen/mpu: introduce helper access_protection_region Penny Zheng
                   ` (25 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

virt_to_maddr and maddr_to_virt are used widely in Xen code. So even
though there is no VMSA in the MPU system, we keep the interface names
so that the common code flow stays the same.

We move the existing virt/maddr conversion from mm.h to mm_mmu.h.
The MPU version of the virt/maddr conversion is simple, returning the
input address as the output.

We also override virt_to_mfn/mfn_to_virt in mm_mpu.c, the same way as
in mm_mmu.c. A minimal sketch of the resulting conversions follows.
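
A minimal sketch of what the identity mapping means for callers
(illustration only, not part of the patch; the example address is
arbitrary):

    /* With VA == PA on an MPU system, the conversions are identities. */
    void *va = (void *)0x80200000UL;   /* arbitrary hypervisor address */
    paddr_t ma = virt_to_maddr(va);    /* == 0x80200000 */
    mfn_t mfn = virt_to_mfn(va);       /* == maddr_to_mfn(ma) */
    ASSERT(maddr_to_virt(ma) == va);   /* round-trips unchanged */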

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/mm.h     | 26 --------------------------
 xen/arch/arm/include/asm/mm_mmu.h | 26 ++++++++++++++++++++++++++
 xen/arch/arm/include/asm/mm_mpu.h | 13 +++++++++++++
 xen/arch/arm/mm_mpu.c             |  6 ++++++
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 9b4c07d965..e29158028a 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -250,32 +250,6 @@ static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
 /* Page-align address and convert to frame number format */
 #define paddr_to_pfn_aligned(paddr)    paddr_to_pfn(PAGE_ALIGN(paddr))
 
-static inline paddr_t __virt_to_maddr(vaddr_t va)
-{
-    uint64_t par = va_to_par(va);
-    return (par & PADDR_MASK & PAGE_MASK) | (va & ~PAGE_MASK);
-}
-#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
-
-#ifdef CONFIG_ARM_32
-static inline void *maddr_to_virt(paddr_t ma)
-{
-    ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
-    ma -= mfn_to_maddr(directmap_mfn_start);
-    return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
-}
-#else
-static inline void *maddr_to_virt(paddr_t ma)
-{
-    ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
-           (DIRECTMAP_SIZE >> PAGE_SHIFT));
-    return (void *)(XENHEAP_VIRT_START -
-                    (directmap_base_pdx << PAGE_SHIFT) +
-                    ((ma & ma_va_bottom_mask) |
-                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
-}
-#endif
-
 /*
  * Translate a guest virtual address to a machine address.
  * Return the fault information if the translation has failed else 0.
diff --git a/xen/arch/arm/include/asm/mm_mmu.h b/xen/arch/arm/include/asm/mm_mmu.h
index a5e63d8af8..6d7e5ddde7 100644
--- a/xen/arch/arm/include/asm/mm_mmu.h
+++ b/xen/arch/arm/include/asm/mm_mmu.h
@@ -23,6 +23,32 @@ extern uint64_t init_ttbr;
 extern void setup_directmap_mappings(unsigned long base_mfn,
                                      unsigned long nr_mfns);
 
+static inline paddr_t __virt_to_maddr(vaddr_t va)
+{
+    uint64_t par = va_to_par(va);
+    return (par & PADDR_MASK & PAGE_MASK) | (va & ~PAGE_MASK);
+}
+#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
+
+#ifdef CONFIG_ARM_32
+static inline void *maddr_to_virt(paddr_t ma)
+{
+    ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
+    ma -= mfn_to_maddr(directmap_mfn_start);
+    return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
+}
+#else
+static inline void *maddr_to_virt(paddr_t ma)
+{
+    ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
+           (DIRECTMAP_SIZE >> PAGE_SHIFT));
+    return (void *)(XENHEAP_VIRT_START -
+                    (directmap_base_pdx << PAGE_SHIFT) +
+                    ((ma & ma_va_bottom_mask) |
+                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+}
+#endif
+
 #endif /* __ARCH_ARM_MM_MMU__ */
 
 /*
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
index 1f3cff7743..3a4b07f187 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -4,6 +4,19 @@
 
 #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
 
+static inline paddr_t __virt_to_maddr(vaddr_t va)
+{
+    /* In MPU system, VA == PA. */
+    return (paddr_t)va;
+}
+#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
+
+static inline void *maddr_to_virt(paddr_t ma)
+{
+    /* In MPU system, VA == PA. */
+    return (void *)ma;
+}
+
 #endif /* __ARCH_ARM_MM_MPU__ */
 
 /*
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 87a12042cc..c9e17ab6da 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -29,6 +29,12 @@
 pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
      xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
 
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef virt_to_mfn
+#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
+
 /* Index into MPU memory region map for fixed regions, ascending from zero. */
 uint64_t __ro_after_init next_fixed_region_idx;
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 18/40] xen/mpu: introduce helper access_protection_region
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (16 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 17/40] xen/mpu: plump virt/maddr/mfn convertion in MPU system Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-24 19:20   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table Penny Zheng
                   ` (24 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

Each EL2 MPU protection region can be configured using PRBAR<n>_EL2 and
PRLAR<n>_EL2.

This commit introduces a new helper, access_protection_region(), to
access an EL2 MPU protection region, supporting both read and write
operations.

As explained in section G1.3.18 of the reference manual for Armv8-R
AArch64, the system registers PRBAR<n>_EL2 and PRLAR<n>_EL2 provide
access to the EL2 MPU region determined by the value of 'n' and
PRSELR_EL2.REGION, as PRSELR_EL2.REGION<7:4>:n (n = 0, 1, 2, ..., 15);
a short sketch of this split follows the example below.
For example to access regions from 16 to 31:
- Set PRSELR_EL2 to 0b1xxxx
- Region 16 configuration is accessible through PRBAR0_EL2 and PRLAR0_EL2
- Region 17 configuration is accessible through PRBAR1_EL2 and PRLAR1_EL2
- Region 18 configuration is accessible through PRBAR2_EL2 and PRLAR2_EL2
- ...
- Region 31 configuration is accessible through PRBAR15_EL2 and PRLAR15_EL2
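
The split between the PRSELR_EL2 value and the PRBAR<n>_EL2/PRLAR<n>_EL2
pair can be sketched as follows (illustration only; the helper below
writes the full selector to PRSELR_EL2, whose low four bits are
don't-care, and uses those low four bits to pick the register pair):

    /* Illustration: region 18 (0b10010) under the scheme above. */
    uint64_t sel = 18;
    uint64_t group = sel & ~0xfULL; /* 16 -> PRSELR_EL2 (REGION<7:4>) */
    uint64_t index = sel & 0xfULL;  /* 2  -> PRBAR2_EL2 / PRLAR2_EL2 */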

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/mm_mpu.c | 151 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index c9e17ab6da..f2b494449c 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -46,6 +46,157 @@ uint64_t __ro_after_init next_transient_region_idx;
 /* Maximum number of supported MPU memory regions by the EL2 MPU. */
 uint64_t __ro_after_init max_xen_mpumap;
 
+/* Write a MPU protection region */
+#define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
+    uint64_t _sel = sel;                                                \
+    const pr_t *_pr = pr;                                               \
+    asm volatile(                                                       \
+        "msr "__stringify(PRSELR_EL2)", %0;" /* Selects the region */   \
+        "dsb sy;"                                                       \
+        "msr "__stringify(prbar_el2)", %1;" /* Write PRBAR<n>_EL2 */    \
+        "msr "__stringify(prlar_el2)", %2;" /* Write PRLAR<n>_EL2 */    \
+        "dsb sy;"                                                       \
+        : : "r" (_sel), "r" (_pr->prbar.bits), "r" (_pr->prlar.bits));  \
+})
+
+/* Read a MPU protection region */
+#define READ_PROTECTION_REGION(sel, prbar_el2, prlar_el2) ({            \
+    uint64_t _sel = sel;                                                \
+    pr_t _pr;                                                           \
+    asm volatile(                                                       \
+        "msr "__stringify(PRSELR_EL2)", %2;" /* Selects the region */   \
+        "dsb sy;"                                                       \
+        "mrs %0, "__stringify(prbar_el2)";" /* Read PRBAR<n>_EL2 */     \
+        "mrs %1, "__stringify(prlar_el2)";" /* Read PRLAR<n>_EL2 */     \
+        "dsb sy;"                                                       \
+        : "=r" (_pr.prbar.bits), "=r" (_pr.prlar.bits) : "r" (_sel));   \
+    _pr;                                                                \
+})
+
+/*
+ * Access an MPU protection region, for both read and write operations.
+ * Armv8-R AArch64 supports at most 255 MPU protection regions.
+ * See section G1.3.18 of the reference manual for Armv8-R AArch64:
+ * PRBAR<n>_EL2 and PRLAR<n>_EL2 provide access to the EL2 MPU region
+ * determined by the value of 'n' and PRSELR_EL2.REGION as
+ * PRSELR_EL2.REGION<7:4>:n (n = 0, 1, 2, ..., 15).
+ * For example to access regions from 16 to 31 (0b10000 to 0b11111):
+ * - Set PRSELR_EL2 to 0b1xxxx
+ * - Region 16 configuration is accessible through PRBAR0_ELx and PRLAR0_ELx
+ * - Region 17 configuration is accessible through PRBAR1_ELx and PRLAR1_ELx
+ * - Region 18 configuration is accessible through PRBAR2_ELx and PRLAR2_ELx
+ * - ...
+ * - Region 31 configuration is accessible through PRBAR15_ELx and PRLAR15_ELx
+ *
+ * @read: true if this is a read operation.
+ * @pr_read: MPU protection region returned by a read operation.
+ * @pr_write: const MPU protection region passed in by a write operation.
+ * @sel: MPU protection region selector
+ */
+static void access_protection_region(bool read, pr_t *pr_read,
+                                     const pr_t *pr_write, uint64_t sel)
+{
+    switch ( sel & 0xf )
+    {
+    case 0:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR0_EL2, PRLAR0_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR0_EL2, PRLAR0_EL2);
+        break;
+    case 1:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR1_EL2, PRLAR1_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR1_EL2, PRLAR1_EL2);
+        break;
+    case 2:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR2_EL2, PRLAR2_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR2_EL2, PRLAR2_EL2);
+        break;
+    case 3:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR3_EL2, PRLAR3_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR3_EL2, PRLAR3_EL2);
+        break;
+    case 4:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR4_EL2, PRLAR4_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR4_EL2, PRLAR4_EL2);
+        break;
+    case 5:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR5_EL2, PRLAR5_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR5_EL2, PRLAR5_EL2);
+        break;
+    case 6:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR6_EL2, PRLAR6_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR6_EL2, PRLAR6_EL2);
+        break;
+    case 7:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR7_EL2, PRLAR7_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR7_EL2, PRLAR7_EL2);
+        break;
+    case 8:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR8_EL2, PRLAR8_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR8_EL2, PRLAR8_EL2);
+        break;
+    case 9:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR9_EL2, PRLAR9_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR9_EL2, PRLAR9_EL2);
+        break;
+    case 10:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR10_EL2, PRLAR10_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR10_EL2, PRLAR10_EL2);
+        break;
+    case 11:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR11_EL2, PRLAR11_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR11_EL2, PRLAR11_EL2);
+        break;
+    case 12:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR12_EL2, PRLAR12_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR12_EL2, PRLAR12_EL2);
+        break;
+    case 13:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR13_EL2, PRLAR13_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR13_EL2, PRLAR13_EL2);
+        break;
+    case 14:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR14_EL2, PRLAR14_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR14_EL2, PRLAR14_EL2);
+        break;
+    case 15:
+        if ( read )
+            *pr_read = READ_PROTECTION_REGION(sel, PRBAR15_EL2, PRLAR15_EL2);
+        else
+            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR15_EL2, PRLAR15_EL2);
+        break;
+    }
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (17 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 18/40] xen/mpu: introduce helper access_protection_region Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 21:45   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems Penny Zheng
                   ` (23 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

The new helper xen_mpumap_update() is responsible for updating an entry
in the Xen MPU memory mapping table, including creating a new entry,
and updating or destroying an existing one.

This commit only covers populating a new entry in the Xen MPU mapping
table (xen_mpumap). The other operations will be introduced in the
following commits.

In xen_mpumap_update_entry(), we first check that the requested address
range [base, limit) is not already mapped. Then we use pr_of_xenaddr()
to build up the MPU memory region structure (pr_t).
Finally, we set the memory attributes and permissions based on @flags.

All region attributes are summarized in the single variable @flags,
laid out as follows:
[0:2] Memory attribute Index
[3:4] Execute Never
[5:6] Access Permission
[7]   Region Present
We also provide a set of definitions (REGION_HYPERVISOR_RW, etc.) that
combine the memory attribute and permission for common combinations; a
decode example follows.
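
For illustration only (not part of the patch), the masks introduced
below pull the individual fields back out of one of the combined
values; the numeric comments assume Xen's existing MT_NORMAL attribute
index:

    unsigned int flags = REGION_HYPERVISOR_RO; /* _REGION_NORMAL|_REGION_XN|_REGION_RO */
    unsigned int ai = REGION_AI_MASK(flags);   /* memory attribute index (MT_NORMAL) */
    unsigned int xn = REGION_XN_MASK(flags);   /* 0x2 == XN_ENABLED: never executable */
    unsigned int ap = REGION_AP_MASK(flags);   /* 0x2 == AP_RO_EL2: read-only at EL2 */
    bool present = flags & _REGION_PRESENT;    /* region should be mapped */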

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h |  72 +++++++
 xen/arch/arm/mm_mpu.c                | 276 ++++++++++++++++++++++++++-
 2 files changed, 340 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index c945dd53db..fcde6ad0db 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -16,6 +16,61 @@
  */
 #define ARM_MAX_MPU_MEMORY_REGIONS 255
 
+/* Access permission attributes. */
+/* Read/Write at EL2, No Access at EL1/EL0. */
+#define AP_RW_EL2 0x0
+/* Read/Write at EL2/EL1/EL0 all levels. */
+#define AP_RW_ALL 0x1
+/* Read-only at EL2, No Access at EL1/EL0. */
+#define AP_RO_EL2 0x2
+/* Read-only at EL2/EL1/EL0 all levels. */
+#define AP_RO_ALL 0x3
+
+/*
+ * Execute never.
+ * Stage 1 EL2 translation regime.
+ * XN[1] determines whether execution of the instruction fetched from the MPU
+ * memory region is permitted.
+ * Stage 2 EL1/EL0 translation regime.
+ * XN[0] determines whether execution of the instruction fetched from the MPU
+ * memory region is permitted.
+ */
+#define XN_DISABLED    0x0
+#define XN_P2M_ENABLED 0x1
+#define XN_ENABLED     0x2
+
+/*
+ * Layout of the flags used for updating Xen MPU region attributes
+ * [0:2] Memory attribute Index
+ * [3:4] Execute Never
+ * [5:6] Access Permission
+ * [7]   Region Present
+ */
+#define _REGION_AI_BIT            0
+#define _REGION_XN_BIT            3
+#define _REGION_AP_BIT            5
+#define _REGION_PRESENT_BIT       7
+#define _REGION_XN                (2U << _REGION_XN_BIT)
+#define _REGION_RO                (2U << _REGION_AP_BIT)
+#define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
+#define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
+#define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
+#define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
+#define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
+
+/*
+ * _REGION_NORMAL is a convenience define. It is not meant to be used
+ * outside of this header.
+ */
+#define _REGION_NORMAL            (MT_NORMAL|_REGION_PRESENT)
+
+#define REGION_HYPERVISOR_RW      (_REGION_NORMAL|_REGION_XN)
+#define REGION_HYPERVISOR_RO      (_REGION_NORMAL|_REGION_XN|_REGION_RO)
+
+#define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
+
+#define INVALID_REGION            (~0UL)
+
 #ifndef __ASSEMBLY__
 
 /* Protection Region Base Address Register */
@@ -49,6 +104,23 @@ typedef struct {
     prlar_t prlar;
 } pr_t;
 
+/* Access to set the base address of an MPU protection region (pr_t). */
+#define pr_set_base(pr, paddr) ({                           \
+    pr_t *_pr = pr;                                         \
+    _pr->prbar.reg.base = (paddr >> MPU_REGION_SHIFT);      \
+})
+
+/* Access to set the limit address of an MPU protection region (pr_t). */
+#define pr_set_limit(pr, paddr) ({                          \
+    pr_t *_pr = pr;                                         \
+    _pr->prlar.reg.limit = (paddr >> MPU_REGION_SHIFT);     \
+})
+
+#define region_is_valid(pr) ({                              \
+    pr_t *_pr = pr;                                         \
+    _pr->prlar.reg.en;                                      \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM64_MPU_H__ */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index f2b494449c..08720a7c19 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -22,9 +22,23 @@
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/page-size.h>
+#include <xen/spinlock.h>
 #include <asm/arm64/mpu.h>
 #include <asm/page.h>
 
+#ifdef NDEBUG
+static inline void
+__attribute__ ((__format__ (__printf__, 1, 2)))
+region_printk(const char *fmt, ...) {}
+#else
+#define region_printk(fmt, args...)         \
+    do                                      \
+    {                                       \
+        dprintk(XENLOG_ERR, fmt, ## args);  \
+        WARN();                             \
+    } while (0)
+#endif
+
 /* Xen MPU memory region mapping table. */
 pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
      xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
@@ -46,6 +60,8 @@ uint64_t __ro_after_init next_transient_region_idx;
 /* Maximum number of supported MPU memory regions by the EL2 MPU. */
 uint64_t __ro_after_init max_xen_mpumap;
 
+static DEFINE_SPINLOCK(xen_mpumap_lock);
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
     uint64_t _sel = sel;                                                \
@@ -73,6 +89,28 @@ uint64_t __ro_after_init max_xen_mpumap;
     _pr;                                                                \
 })
 
+/*
+ * At boot time, fixed MPU regions (e.g. the Xen text section) are
+ * added at the front, indexed by next_fixed_region_idx, while
+ * boot-only regions (e.g. the early FDT) are added at the rear,
+ * indexed by next_transient_region_idx.
+ * As more and more MPU regions are added, once the two indexes
+ * meet and pass each other, we have run out of the whole set of
+ * EL2 MPU memory regions.
+ */
+static bool __init xen_boot_mpu_regions_is_full(void)
+{
+    return next_transient_region_idx < next_fixed_region_idx;
+}
+
+static void __init update_boot_xen_mpumap_idx(uint64_t idx)
+{
+    if ( idx == next_transient_region_idx )
+        next_transient_region_idx--;
+    else
+        next_fixed_region_idx++;
+}
+
 /*
  * Access MPU protection region, including both read/write operations.
  * Armv8-R AArch64 at most supports 255 MPU protection regions.
@@ -197,6 +235,236 @@ static void access_protection_region(bool read, pr_t *pr_read,
     }
 }
 
+/*
+ * Standard entry for building up the structure of an MPU memory region (pr_t).
+ * It is the equivalent of mfn_to_xen_entry in the MMU system.
+ * base and limit both refer to inclusive addresses.
+ */
+static inline pr_t pr_of_xenaddr(paddr_t base, paddr_t limit, unsigned attr)
+{
+    prbar_t prbar;
+    prlar_t prlar;
+    pr_t region;
+
+    /* Build up value for PRBAR_EL2. */
+    prbar = (prbar_t) {
+        .reg = {
+            .ap = AP_RW_EL2,  /* Read/Write at EL2, no access at EL1/EL0. */
+            .xn = XN_ENABLED, /* No need to execute outside .text */
+        }};
+
+    switch ( attr )
+    {
+    case MT_NORMAL_NC:
+        /*
+         * ARM ARM: Overlaying the shareability attribute (DDI
+         * 0406C.b B3-1376 to 1377)
+         *
+         * A memory region with a resultant memory type attribute of normal,
+         * and a resultant cacheability attribute of Inner non-cacheable,
+         * outer non-cacheable, must have a resultant shareability attribute
+         * of outer shareable, otherwise shareability is UNPREDICTABLE.
+         *
+         * On ARMv8 shareability is ignored and explicitly treated as outer
+         * shareable for normal inner non-cacheable, outer non-cacheable.
+         */
+        prbar.reg.sh = LPAE_SH_OUTER;
+        break;
+    case MT_DEVICE_nGnRnE:
+    case MT_DEVICE_nGnRE:
+        /*
+         * Shareability is ignored for non-normal memory, Outer is as
+         * good as anything.
+         *
+         * On ARMv8 shareability is ignored and explicitly treated as outer
+         * shareable for any device memory type.
+         */
+        prbar.reg.sh = LPAE_SH_OUTER;
+        break;
+    default:
+        /* Xen mappings are SMP coherent */
+        prbar.reg.sh = LPAE_SH_INNER;
+        break;
+    }
+
+    /* Build up value for PRLAR_EL2. */
+    prlar = (prlar_t) {
+        .reg = {
+            .ns = 0,        /* Hyp mode is in secure world */
+            .ai = attr,
+            .en = 1,        /* Region enabled */
+        }};
+
+    /* Build up MPU memory region. */
+    region = (pr_t) {
+        .prbar = prbar,
+        .prlar = prlar,
+    };
+
+    /* Set base address and limit address. */
+    pr_set_base(&region, base);
+    pr_set_limit(&region, limit);
+
+    return region;
+}
+
+#define MPUMAP_REGION_FAILED    0
+#define MPUMAP_REGION_FOUND     1
+#define MPUMAP_REGION_INCLUSIVE 2
+#define MPUMAP_REGION_OVERLAP   3
+
+/*
+ * Check whether the memory range [base, limit] is mapped in the MPU memory
+ * region table \mpu. Only the address range is considered; memory attributes
+ * and permissions are not checked here.
+ * If a match is found, the associated index will be filled in.
+ * If no matching entry is present, INVALID_REGION will be set in \index.
+ *
+ * Make sure that parameters \base and \limit both refer to
+ * inclusive addresses.
+ *
+ * Return values:
+ *  MPUMAP_REGION_FAILED: no mapping and no overmapping
+ *  MPUMAP_REGION_FOUND: find an exact match in address
+ *  MPUMAP_REGION_INCLUSIVE: find an inclusive match in address
+ *  MPUMAP_REGION_OVERLAP: overlap with the existing mapping
+ */
+static int mpumap_contain_region(pr_t *mpu, uint64_t nr_regions,
+                                 paddr_t base, paddr_t limit, uint64_t *index)
+{
+    uint64_t i = 0;
+    uint64_t _index = INVALID_REGION;
+
+    /* Allow index to be NULL */
+    index = index ?: &_index;
+
+    for ( ; i < nr_regions; i++ )
+    {
+        paddr_t iter_base = pr_get_base(&mpu[i]);
+        paddr_t iter_limit = pr_get_limit(&mpu[i]);
+
+        /* Found an exact valid match */
+        if ( (iter_base == base) && (iter_limit == limit) &&
+             region_is_valid(&mpu[i]) )
+        {
+            *index = i;
+            return MPUMAP_REGION_FOUND;
+        }
+
+        /* No overlapping */
+        if ( (iter_limit < base) || (iter_base > limit) )
+            continue;
+        /* Inclusive and valid */
+        else if ( (base >= iter_base) && (limit <= iter_limit) &&
+                  region_is_valid(&mpu[i]) )
+        {
+            *index = i;
+            return MPUMAP_REGION_INCLUSIVE;
+        }
+        else
+        {
+            region_printk("Range 0x%"PRIpaddr" - 0x%"PRIpaddr" overlaps with the existing region 0x%"PRIpaddr" - 0x%"PRIpaddr"\n",
+                          base, limit, iter_base, iter_limit);
+            return MPUMAP_REGION_OVERLAP;
+        }
+    }
+
+    return MPUMAP_REGION_FAILED;
+}
+
+/*
+ * Update the MPU memory region mapping for the range [@base, @limit).
+ * @base:  base address (inclusive)
+ * @limit: limit address (exclusive)
+ * @flags: region attributes, a combination of REGION_HYPERVISOR_xx
+ */
+static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
+                                   unsigned int flags)
+{
+    uint64_t idx;
+    int rc;
+
+    rc = mpumap_contain_region(xen_mpumap, max_xen_mpumap, base, limit - 1,
+                               &idx);
+    if ( rc == MPUMAP_REGION_OVERLAP )
+        return -EINVAL;
+
+    /* We are inserting a mapping => Create new region. */
+    if ( flags & _REGION_PRESENT )
+    {
+        if ( rc != MPUMAP_REGION_FAILED )
+            return -EINVAL;
+
+        if ( xen_boot_mpu_regions_is_full() )
+        {
+            region_printk("There is no room left in EL2 MPU memory region mapping\n");
+            return -ENOMEM;
+        }
+
+        /* During boot time, the default index is next_fixed_region_idx. */
+        if ( system_state <= SYS_STATE_active )
+            idx = next_fixed_region_idx;
+
+        xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1, REGION_AI_MASK(flags));
+        /* Set permission */
+        xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
+        xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
+
+        /* Update and enable the region */
+        access_protection_region(false, NULL, (const pr_t*)(&xen_mpumap[idx]),
+                                 idx);
+
+        if ( system_state <= SYS_STATE_active )
+            update_boot_xen_mpumap_idx(idx);
+    }
+
+    return 0;
+}
+
+static int xen_mpumap_update(paddr_t base, paddr_t limit, unsigned int flags)
+{
+    int rc;
+
+    /*
+     * The hardware was configured to forbid mappings that are both
+     * writeable and executable.
+     * When modifying/creating a mapping (i.e. _REGION_PRESENT is set),
+     * prevent any update that would introduce such a mapping.
+     */
+    if ( (flags & _REGION_PRESENT) && !REGION_RO_MASK(flags) &&
+         !REGION_XN_MASK(flags) )
+    {
+        region_printk("Mappings should not be both Writeable and Executable.\n");
+        return -EINVAL;
+    }
+
+    if ( !IS_ALIGNED(base, PAGE_SIZE) || !IS_ALIGNED(limit, PAGE_SIZE) )
+    {
+        region_printk("base address 0x%"PRIpaddr", or limit address 0x%"PRIpaddr" is not page aligned.\n",
+                      base, limit);
+        return -EINVAL;
+    }
+
+    spin_lock(&xen_mpumap_lock);
+
+    rc = xen_mpumap_update_entry(base, limit, flags);
+
+    spin_unlock(&xen_mpumap_lock);
+
+    return rc;
+}
+
+int map_pages_to_xen(unsigned long virt,
+                     mfn_t mfn,
+                     unsigned long nr_mfns,
+                     unsigned int flags)
+{
+    ASSERT(virt == mfn_to_maddr(mfn));
+
+    return xen_mpumap_update(mfn_to_maddr(mfn),
+                             mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
@@ -230,14 +498,6 @@ void *ioremap(paddr_t pa, size_t len)
     return NULL;
 }
 
-int map_pages_to_xen(unsigned long virt,
-                     mfn_t mfn,
-                     unsigned long nr_mfns,
-                     unsigned int flags)
-{
-    return -ENOSYS;
-}
-
 int destroy_xen_mappings(unsigned long s, unsigned long e)
 {
     return -ENOSYS;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (18 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 21:52   ` Julien Grall
  2023-02-06 10:11   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 21/40] xen/arm: move MMU-specific setup_mm to setup_mmu.c Penny Zheng
                   ` (22 subsequent siblings)
  42 siblings, 2 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In an MPU system, the device tree binary can either be packed with the
Xen image through CONFIG_DTB_FILE, or provided by the bootloader through x0.

In an MPU system, each section in xen.lds.S is PAGE_SIZE aligned.
So, in order not to overlap with the preceding BSS section, the dtb
section must be made page-aligned too.
We add ". = ALIGN(PAGE_SIZE);" at the head of the dtb section to achieve this.

In this commit, we map the early FDT with a transient MPU memory region
at the rear, using REGION_HYPERVISOR_BOOT.
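
As a usage illustration only (not part of this patch; the exact call
site is an assumption), the boot path is expected to consume the new
mapping roughly as below. Since VA == PA on MPU, the returned pointer
simply aliases the physical address:

    /* Hypothetical call site, e.g. in start_xen(). */
    void *fdt_virt = early_fdt_map(fdt_paddr);

    if ( !fdt_virt )
        panic("Invalid device tree blob at physical address %#"PRIpaddr"\n",
              fdt_paddr);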

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h |  5 +++
 xen/arch/arm/mm_mpu.c                | 63 +++++++++++++++++++++++++---
 xen/arch/arm/xen.lds.S               |  5 ++-
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index fcde6ad0db..b85e420a90 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -45,18 +45,22 @@
  * [3:4] Execute Never
  * [5:6] Access Permission
  * [7]   Region Present
+ * [8]   Boot-only Region
  */
 #define _REGION_AI_BIT            0
 #define _REGION_XN_BIT            3
 #define _REGION_AP_BIT            5
 #define _REGION_PRESENT_BIT       7
+#define _REGION_BOOTONLY_BIT      8
 #define _REGION_XN                (2U << _REGION_XN_BIT)
 #define _REGION_RO                (2U << _REGION_AP_BIT)
 #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
+#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
 #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
 #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
 #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
 #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
+#define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
 
 /*
  * _REGION_NORMAL is a convenience define. It is not meant to be used
@@ -68,6 +72,7 @@
 #define REGION_HYPERVISOR_RO      (_REGION_NORMAL|_REGION_XN|_REGION_RO)
 
 #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
+#define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
 
 #define INVALID_REGION            (~0UL)
 
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 08720a7c19..b34dbf4515 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -20,11 +20,16 @@
  */
 
 #include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
 #include <xen/page-size.h>
+#include <xen/pfn.h>
+#include <xen/sizes.h>
 #include <xen/spinlock.h>
 #include <asm/arm64/mpu.h>
+#include <asm/early_printk.h>
 #include <asm/page.h>
+#include <asm/setup.h>
 
 #ifdef NDEBUG
 static inline void
@@ -62,6 +67,8 @@ uint64_t __ro_after_init max_xen_mpumap;
 
 static DEFINE_SPINLOCK(xen_mpumap_lock);
 
+static paddr_t dtb_paddr;
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
     uint64_t _sel = sel;                                                \
@@ -403,7 +410,16 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
 
         /* During boot time, the default index is next_fixed_region_idx. */
         if ( system_state <= SYS_STATE_active )
-            idx = next_fixed_region_idx;
+        {
+            /*
+             * If it is a boot-only region (i.e. a region for the early FDT),
+             * it shall be added from the tail, for the late-init re-organization.
+             */
+            if ( REGION_BOOTONLY_MASK(flags) )
+                idx = next_transient_region_idx;
+            else
+                idx = next_fixed_region_idx;
+        }
 
         xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1, REGION_AI_MASK(flags));
         /* Set permission */
@@ -465,14 +481,51 @@ int map_pages_to_xen(unsigned long virt,
                              mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
 }
 
-/* TODO: Implementation on the first usage */
-void dump_hyp_walk(vaddr_t addr)
+void * __init early_fdt_map(paddr_t fdt_paddr)
 {
+    void *fdt_virt;
+    uint32_t size;
+
+    /*
+     * Check whether the physical FDT address is set and meets the minimum
+     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
+     * least 8 bytes so that we always access the magic and size fields
+     * of the FDT header after mapping the first chunk, double check if
+     * that is indeed the case.
+     */
+    BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
+    if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
+        return NULL;
+
+    dtb_paddr = fdt_paddr;
+    /*
+     * In an MPU system, the device tree binary can be packed with the Xen
+     * image through CONFIG_DTB_FILE, or provided by the bootloader through x0.
+     * Map the FDT with a transient MPU memory region of MAX_FDT_SIZE.
+     * After that, we can check the FDT header (magic and total size).
+     */
+    if ( map_pages_to_xen(round_pgdown(fdt_paddr),
+                          maddr_to_mfn(round_pgdown(fdt_paddr)),
+                          round_pgup(MAX_FDT_SIZE) >> PAGE_SHIFT,
+                          REGION_HYPERVISOR_BOOT) )
+        panic("Unable to map the device-tree.\n");
+
+    /* VA == PA */
+    fdt_virt = maddr_to_virt(fdt_paddr);
+
+    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
+        return NULL;
+
+    size = fdt_totalsize(fdt_virt);
+    if ( size > MAX_FDT_SIZE )
+        return NULL;
+
+    return fdt_virt;
 }
 
-void * __init early_fdt_map(paddr_t fdt_paddr)
+/* TODO: Implementation on the first usage */
+void dump_hyp_walk(vaddr_t addr)
 {
-    return NULL;
 }
 
 void __init remove_early_mappings(void)
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 79965a3c17..0565e22a1f 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -218,7 +218,10 @@ SECTIONS
   _end = . ;
 
   /* Section for the device tree blob (if any). */
-  .dtb : { *(.dtb) } :text
+  .dtb : {
+      . = ALIGN(PAGE_SIZE);
+      *(.dtb)
+  } :text
 
   DWARF2_DEBUG_SECTIONS
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 21/40] xen/arm: move MMU-specific setup_mm to setup_mmu.c
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (19 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 22/40] xen/mpu: implement MPU version of setup_mm in setup_mpu.c Penny Zheng
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

setup_mm is used by Xen to set up the memory management subsystem:
the boot allocator, direct-mapping, xenheap, frametable and static
memory pages.
Some components, like the boot allocator, can be inherited seamlessly in
an MPU system; others, like the xenheap, need a different implementation
for MPU; and some, like direct-mapping, do not apply to an MPU system
at all.

In this commit, we move setup_mm and its related functions and
variables to setup_mmu.c, in preparation for implementing the MPU
version of setup_mm in future commits.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Makefile            |   3 +
 xen/arch/arm/include/asm/setup.h |   5 +
 xen/arch/arm/setup.c             | 326 +---------------------------
 xen/arch/arm/setup_mmu.c         | 350 +++++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+), 322 deletions(-)
 create mode 100644 xen/arch/arm/setup_mmu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 21188b207f..adeb17b7ab 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -51,6 +51,9 @@ obj-y += physdev.o
 obj-y += processor.o
 obj-y += psci.o
 obj-y += setup.o
+ifneq ($(CONFIG_HAS_MPU), y)
+obj-y += setup_mmu.o
+endif
 obj-y += shutdown.o
 obj-y += smp.o
 obj-y += smpboot.o
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 4f39a1aa0a..8f353b67f8 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -158,6 +158,11 @@ struct bootcmdline *boot_cmdline_find_by_kind(bootmodule_kind kind);
 struct bootcmdline * boot_cmdline_find_by_name(const char *name);
 const char *boot_module_kind_as_string(bootmodule_kind kind);
 
+extern void init_pdx(void);
+extern void init_staticmem_pages(void);
+extern void populate_boot_allocator(void);
+extern void setup_mm(void);
+
 extern uint32_t hyp_traps_vector[];
 void init_traps(void);
 
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index d7d200179c..3ebf9e9a5c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -2,7 +2,7 @@
 /*
  * xen/arch/arm/setup.c
  *
- * Early bringup code for an ARMv7-A with virt extensions.
+ * Early bringup code for an ARMv7-A/Armv8-R64 with virt extensions.
  *
  * Tim Deegan <tim@xen.org>
  * Copyright (c) 2011 Citrix Systems.
@@ -57,11 +57,6 @@ struct cpuinfo_arm __read_mostly system_cpuinfo;
 bool __read_mostly acpi_disabled;
 #endif
 
-#ifdef CONFIG_ARM_32
-static unsigned long opt_xenheap_megabytes __initdata;
-integer_param("xenheap_megabytes", opt_xenheap_megabytes);
-#endif
-
 domid_t __read_mostly max_init_domid;
 
 static __used void init_done(void)
@@ -455,138 +450,6 @@ static void * __init relocate_fdt(paddr_t dtb_paddr, size_t dtb_size)
     return fdt;
 }
 
-#ifdef CONFIG_ARM_32
-/*
- * Returns the end address of the highest region in the range s..e
- * with required size and alignment that does not conflict with the
- * modules from first_mod to nr_modules.
- *
- * For non-recursive callers first_mod should normally be 0 (all
- * modules and Xen itself) or 1 (all modules but not Xen).
- */
-static paddr_t __init consider_modules(paddr_t s, paddr_t e,
-                                       uint32_t size, paddr_t align,
-                                       int first_mod)
-{
-    const struct bootmodules *mi = &bootinfo.modules;
-    int i;
-    int nr;
-
-    s = (s+align-1) & ~(align-1);
-    e = e & ~(align-1);
-
-    if ( s > e ||  e - s < size )
-        return 0;
-
-    /* First check the boot modules */
-    for ( i = first_mod; i < mi->nr_mods; i++ )
-    {
-        paddr_t mod_s = mi->module[i].start;
-        paddr_t mod_e = mod_s + mi->module[i].size;
-
-        if ( s < mod_e && mod_s < e )
-        {
-            mod_e = consider_modules(mod_e, e, size, align, i+1);
-            if ( mod_e )
-                return mod_e;
-
-            return consider_modules(s, mod_s, size, align, i+1);
-        }
-    }
-
-    /* Now check any fdt reserved areas. */
-
-    nr = fdt_num_mem_rsv(device_tree_flattened);
-
-    for ( ; i < mi->nr_mods + nr; i++ )
-    {
-        paddr_t mod_s, mod_e;
-
-        if ( fdt_get_mem_rsv(device_tree_flattened,
-                             i - mi->nr_mods,
-                             &mod_s, &mod_e ) < 0 )
-            /* If we can't read it, pretend it doesn't exist... */
-            continue;
-
-        /* fdt_get_mem_rsv returns length */
-        mod_e += mod_s;
-
-        if ( s < mod_e && mod_s < e )
-        {
-            mod_e = consider_modules(mod_e, e, size, align, i+1);
-            if ( mod_e )
-                return mod_e;
-
-            return consider_modules(s, mod_s, size, align, i+1);
-        }
-    }
-
-    /*
-     * i is the current bootmodule we are evaluating, across all
-     * possible kinds of bootmodules.
-     *
-     * When retrieving the corresponding reserved-memory addresses, we
-     * need to index the bootinfo.reserved_mem bank starting from 0, and
-     * only counting the reserved-memory modules. Hence, we need to use
-     * i - nr.
-     */
-    nr += mi->nr_mods;
-    for ( ; i - nr < bootinfo.reserved_mem.nr_banks; i++ )
-    {
-        paddr_t r_s = bootinfo.reserved_mem.bank[i - nr].start;
-        paddr_t r_e = r_s + bootinfo.reserved_mem.bank[i - nr].size;
-
-        if ( s < r_e && r_s < e )
-        {
-            r_e = consider_modules(r_e, e, size, align, i + 1);
-            if ( r_e )
-                return r_e;
-
-            return consider_modules(s, r_s, size, align, i + 1);
-        }
-    }
-    return e;
-}
-
-/*
- * Find a contiguous region that fits in the static heap region with
- * required size and alignment, and return the end address of the region
- * if found otherwise 0.
- */
-static paddr_t __init fit_xenheap_in_static_heap(uint32_t size, paddr_t align)
-{
-    unsigned int i;
-    paddr_t end = 0, aligned_start, aligned_end;
-    paddr_t bank_start, bank_size, bank_end;
-
-    for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
-    {
-        if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
-            continue;
-
-        bank_start = bootinfo.reserved_mem.bank[i].start;
-        bank_size = bootinfo.reserved_mem.bank[i].size;
-        bank_end = bank_start + bank_size;
-
-        if ( bank_size < size )
-            continue;
-
-        aligned_end = bank_end & ~(align - 1);
-        aligned_start = (aligned_end - size) & ~(align - 1);
-
-        if ( aligned_start > bank_start )
-            /*
-             * Allocate the xenheap as high as possible to keep low-memory
-             * available (assuming the admin supplied region below 4GB)
-             * for other use (e.g. domain memory allocation).
-             */
-            end = max(end, aligned_end);
-    }
-
-    return end;
-}
-#endif
-
 /*
  * Return the end of the non-module region starting at s. In other
  * words return s the start of the next modules after s.
@@ -621,7 +484,7 @@ static paddr_t __init next_module(paddr_t s, paddr_t *end)
     return lowest;
 }
 
-static void __init init_pdx(void)
+void __init init_pdx(void)
 {
     paddr_t bank_start, bank_size, bank_end;
 
@@ -666,7 +529,7 @@ static void __init init_pdx(void)
 }
 
 /* Static memory initialization */
-static void __init init_staticmem_pages(void)
+void __init init_staticmem_pages(void)
 {
 #ifdef CONFIG_STATIC_MEMORY
     unsigned int bank;
@@ -700,7 +563,7 @@ static void __init init_staticmem_pages(void)
  * allocator with the corresponding regions only, but with Xenheap excluded
  * on arm32.
  */
-static void __init populate_boot_allocator(void)
+void __init populate_boot_allocator(void)
 {
     unsigned int i;
     const struct meminfo *banks = &bootinfo.mem;
@@ -769,187 +632,6 @@ static void __init populate_boot_allocator(void)
     }
 }
 
-#ifdef CONFIG_ARM_32
-static void __init setup_mm(void)
-{
-    paddr_t ram_start, ram_end, ram_size, e, bank_start, bank_end, bank_size;
-    paddr_t static_heap_end = 0, static_heap_size = 0;
-    unsigned long heap_pages, xenheap_pages, domheap_pages;
-    unsigned int i;
-    const uint32_t ctr = READ_CP32(CTR);
-
-    if ( !bootinfo.mem.nr_banks )
-        panic("No memory bank\n");
-
-    /* We only supports instruction caches implementing the IVIPT extension. */
-    if ( ((ctr >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK) == ICACHE_POLICY_AIVIVT )
-        panic("AIVIVT instruction cache not supported\n");
-
-    init_pdx();
-
-    ram_start = bootinfo.mem.bank[0].start;
-    ram_size  = bootinfo.mem.bank[0].size;
-    ram_end   = ram_start + ram_size;
-
-    for ( i = 1; i < bootinfo.mem.nr_banks; i++ )
-    {
-        bank_start = bootinfo.mem.bank[i].start;
-        bank_size = bootinfo.mem.bank[i].size;
-        bank_end = bank_start + bank_size;
-
-        ram_size  = ram_size + bank_size;
-        ram_start = min(ram_start,bank_start);
-        ram_end   = max(ram_end,bank_end);
-    }
-
-    total_pages = ram_size >> PAGE_SHIFT;
-
-    if ( bootinfo.static_heap )
-    {
-        for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
-        {
-            if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
-                continue;
-
-            bank_start = bootinfo.reserved_mem.bank[i].start;
-            bank_size = bootinfo.reserved_mem.bank[i].size;
-            bank_end = bank_start + bank_size;
-
-            static_heap_size += bank_size;
-            static_heap_end = max(static_heap_end, bank_end);
-        }
-
-        heap_pages = static_heap_size >> PAGE_SHIFT;
-    }
-    else
-        heap_pages = total_pages;
-
-    /*
-     * If the user has not requested otherwise via the command line
-     * then locate the xenheap using these constraints:
-     *
-     *  - must be contiguous
-     *  - must be 32 MiB aligned
-     *  - must not include Xen itself or the boot modules
-     *  - must be at most 1GB or 1/32 the total RAM in the system (or static
-          heap if enabled) if less
-     *  - must be at least 32M
-     *
-     * We try to allocate the largest xenheap possible within these
-     * constraints.
-     */
-    if ( opt_xenheap_megabytes )
-        xenheap_pages = opt_xenheap_megabytes << (20-PAGE_SHIFT);
-    else
-    {
-        xenheap_pages = (heap_pages/32 + 0x1fffUL) & ~0x1fffUL;
-        xenheap_pages = max(xenheap_pages, 32UL<<(20-PAGE_SHIFT));
-        xenheap_pages = min(xenheap_pages, 1UL<<(30-PAGE_SHIFT));
-    }
-
-    do
-    {
-        e = bootinfo.static_heap ?
-            fit_xenheap_in_static_heap(pfn_to_paddr(xenheap_pages), MB(32)) :
-            consider_modules(ram_start, ram_end,
-                             pfn_to_paddr(xenheap_pages),
-                             32<<20, 0);
-        if ( e )
-            break;
-
-        xenheap_pages >>= 1;
-    } while ( !opt_xenheap_megabytes && xenheap_pages > 32<<(20-PAGE_SHIFT) );
-
-    if ( ! e )
-        panic("Not enough space for xenheap\n");
-
-    domheap_pages = heap_pages - xenheap_pages;
-
-    printk("Xen heap: %"PRIpaddr"-%"PRIpaddr" (%lu pages%s)\n",
-           e - (pfn_to_paddr(xenheap_pages)), e, xenheap_pages,
-           opt_xenheap_megabytes ? ", from command-line" : "");
-    printk("Dom heap: %lu pages\n", domheap_pages);
-
-    /*
-     * We need some memory to allocate the page-tables used for the
-     * directmap mappings. So populate the boot allocator first.
-     *
-     * This requires us to set directmap_mfn_{start, end} first so the
-     * direct-mapped Xenheap region can be avoided.
-     */
-    directmap_mfn_start = _mfn((e >> PAGE_SHIFT) - xenheap_pages);
-    directmap_mfn_end = mfn_add(directmap_mfn_start, xenheap_pages);
-
-    populate_boot_allocator();
-
-    setup_directmap_mappings(mfn_x(directmap_mfn_start), xenheap_pages);
-
-    /* Frame table covers all of RAM region, including holes */
-    setup_frametable_mappings(ram_start, ram_end);
-    max_page = PFN_DOWN(ram_end);
-
-    /*
-     * The allocators may need to use map_domain_page() (such as for
-     * scrubbing pages). So we need to prepare the domheap area first.
-     */
-    if ( !init_domheap_mappings(smp_processor_id()) )
-        panic("CPU%u: Unable to prepare the domheap page-tables\n",
-              smp_processor_id());
-
-    /* Add xenheap memory that was not already added to the boot allocator. */
-    init_xenheap_pages(mfn_to_maddr(directmap_mfn_start),
-                       mfn_to_maddr(directmap_mfn_end));
-
-    init_staticmem_pages();
-}
-#else /* CONFIG_ARM_64 */
-static void __init setup_mm(void)
-{
-    const struct meminfo *banks = &bootinfo.mem;
-    paddr_t ram_start = INVALID_PADDR;
-    paddr_t ram_end = 0;
-    paddr_t ram_size = 0;
-    unsigned int i;
-
-    init_pdx();
-
-    /*
-     * We need some memory to allocate the page-tables used for the directmap
-     * mappings. But some regions may contain memory already allocated
-     * for other uses (e.g. modules, reserved-memory...).
-     *
-     * For simplicity, add all the free regions in the boot allocator.
-     */
-    populate_boot_allocator();
-
-    total_pages = 0;
-
-    for ( i = 0; i < banks->nr_banks; i++ )
-    {
-        const struct membank *bank = &banks->bank[i];
-        paddr_t bank_end = bank->start + bank->size;
-
-        ram_size = ram_size + bank->size;
-        ram_start = min(ram_start, bank->start);
-        ram_end = max(ram_end, bank_end);
-
-        setup_directmap_mappings(PFN_DOWN(bank->start),
-                                 PFN_DOWN(bank->size));
-    }
-
-    total_pages += ram_size >> PAGE_SHIFT;
-
-    directmap_virt_end = XENHEAP_VIRT_START + ram_end - ram_start;
-    directmap_mfn_start = maddr_to_mfn(ram_start);
-    directmap_mfn_end = maddr_to_mfn(ram_end);
-
-    setup_frametable_mappings(ram_start, ram_end);
-    max_page = PFN_DOWN(ram_end);
-
-    init_staticmem_pages();
-}
-#endif
-
 static bool __init is_dom0less_mode(void)
 {
     struct bootmodules *mods = &bootinfo.modules;
diff --git a/xen/arch/arm/setup_mmu.c b/xen/arch/arm/setup_mmu.c
new file mode 100644
index 0000000000..7e5d87f8bd
--- /dev/null
+++ b/xen/arch/arm/setup_mmu.c
@@ -0,0 +1,350 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/arch/arm/setup_mmu.c
+ *
+ * Early bringup code for an ARMv7-A with virt extensions.
+ *
+ * Tim Deegan <tim@xen.org>
+ * Copyright (c) 2011 Citrix Systems.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
+#include <xen/mm.h>
+#include <xen/param.h>
+#include <xen/pfn.h>
+#include <asm/page.h>
+#include <asm/setup.h>
+
+#ifdef CONFIG_ARM_32
+static unsigned long opt_xenheap_megabytes __initdata;
+integer_param("xenheap_megabytes", opt_xenheap_megabytes);
+
+/*
+ * Returns the end address of the highest region in the range s..e
+ * with required size and alignment that does not conflict with the
+ * modules from first_mod to nr_modules.
+ *
+ * For non-recursive callers first_mod should normally be 0 (all
+ * modules and Xen itself) or 1 (all modules but not Xen).
+ */
+static paddr_t __init consider_modules(paddr_t s, paddr_t e,
+                                       uint32_t size, paddr_t align,
+                                       int first_mod)
+{
+    const struct bootmodules *mi = &bootinfo.modules;
+    int i;
+    int nr;
+
+    s = (s+align-1) & ~(align-1);
+    e = e & ~(align-1);
+
+    if ( s > e ||  e - s < size )
+        return 0;
+
+    /* First check the boot modules */
+    for ( i = first_mod; i < mi->nr_mods; i++ )
+    {
+        paddr_t mod_s = mi->module[i].start;
+        paddr_t mod_e = mod_s + mi->module[i].size;
+
+        if ( s < mod_e && mod_s < e )
+        {
+            mod_e = consider_modules(mod_e, e, size, align, i+1);
+            if ( mod_e )
+                return mod_e;
+
+            return consider_modules(s, mod_s, size, align, i+1);
+        }
+    }
+
+    /* Now check any fdt reserved areas. */
+
+    nr = fdt_num_mem_rsv(device_tree_flattened);
+
+    for ( ; i < mi->nr_mods + nr; i++ )
+    {
+        paddr_t mod_s, mod_e;
+
+        if ( fdt_get_mem_rsv(device_tree_flattened,
+                             i - mi->nr_mods,
+                             &mod_s, &mod_e ) < 0 )
+            /* If we can't read it, pretend it doesn't exist... */
+            continue;
+
+        /* fdt_get_mem_rsv returns length */
+        mod_e += mod_s;
+
+        if ( s < mod_e && mod_s < e )
+        {
+            mod_e = consider_modules(mod_e, e, size, align, i+1);
+            if ( mod_e )
+                return mod_e;
+
+            return consider_modules(s, mod_s, size, align, i+1);
+        }
+    }
+
+    /*
+     * i is the current bootmodule we are evaluating, across all
+     * possible kinds of bootmodules.
+     *
+     * When retrieving the corresponding reserved-memory addresses, we
+     * need to index the bootinfo.reserved_mem bank starting from 0, and
+     * only counting the reserved-memory modules. Hence, we need to use
+     * i - nr.
+     */
+    nr += mi->nr_mods;
+    for ( ; i - nr < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        paddr_t r_s = bootinfo.reserved_mem.bank[i - nr].start;
+        paddr_t r_e = r_s + bootinfo.reserved_mem.bank[i - nr].size;
+
+        if ( s < r_e && r_s < e )
+        {
+            r_e = consider_modules(r_e, e, size, align, i + 1);
+            if ( r_e )
+                return r_e;
+
+            return consider_modules(s, r_s, size, align, i + 1);
+        }
+    }
+    return e;
+}
+
+/*
+ * Find a contiguous region that fits in the static heap region with
+ * required size and alignment, and return the end address of the region
+ * if found otherwise 0.
+ */
+static paddr_t __init fit_xenheap_in_static_heap(uint32_t size, paddr_t align)
+{
+    unsigned int i;
+    paddr_t end = 0, aligned_start, aligned_end;
+    paddr_t bank_start, bank_size, bank_end;
+
+    for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
+            continue;
+
+        bank_start = bootinfo.reserved_mem.bank[i].start;
+        bank_size = bootinfo.reserved_mem.bank[i].size;
+        bank_end = bank_start + bank_size;
+
+        if ( bank_size < size )
+            continue;
+
+        aligned_end = bank_end & ~(align - 1);
+        aligned_start = (aligned_end - size) & ~(align - 1);
+
+        if ( aligned_start > bank_start )
+            /*
+             * Allocate the xenheap as high as possible to keep low-memory
+             * available (assuming the admin supplied region below 4GB)
+             * for other use (e.g. domain memory allocation).
+             */
+            end = max(end, aligned_end);
+    }
+
+    return end;
+}
+
+void __init setup_mm(void)
+{
+    paddr_t ram_start, ram_end, ram_size, e, bank_start, bank_end, bank_size;
+    paddr_t static_heap_end = 0, static_heap_size = 0;
+    unsigned long heap_pages, xenheap_pages, domheap_pages;
+    unsigned int i;
+    const uint32_t ctr = READ_CP32(CTR);
+
+    if ( !bootinfo.mem.nr_banks )
+        panic("No memory bank\n");
+
+    /* We only supports instruction caches implementing the IVIPT extension. */
+    if ( ((ctr >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK) == ICACHE_POLICY_AIVIVT )
+        panic("AIVIVT instruction cache not supported\n");
+
+    init_pdx();
+
+    ram_start = bootinfo.mem.bank[0].start;
+    ram_size  = bootinfo.mem.bank[0].size;
+    ram_end   = ram_start + ram_size;
+
+    for ( i = 1; i < bootinfo.mem.nr_banks; i++ )
+    {
+        bank_start = bootinfo.mem.bank[i].start;
+        bank_size = bootinfo.mem.bank[i].size;
+        bank_end = bank_start + bank_size;
+
+        ram_size  = ram_size + bank_size;
+        ram_start = min(ram_start,bank_start);
+        ram_end   = max(ram_end,bank_end);
+    }
+
+    total_pages = ram_size >> PAGE_SHIFT;
+
+    if ( bootinfo.static_heap )
+    {
+        for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
+        {
+            if ( bootinfo.reserved_mem.bank[i].type != MEMBANK_STATIC_HEAP )
+                continue;
+
+            bank_start = bootinfo.reserved_mem.bank[i].start;
+            bank_size = bootinfo.reserved_mem.bank[i].size;
+            bank_end = bank_start + bank_size;
+
+            static_heap_size += bank_size;
+            static_heap_end = max(static_heap_end, bank_end);
+        }
+
+        heap_pages = static_heap_size >> PAGE_SHIFT;
+    }
+    else
+        heap_pages = total_pages;
+
+    /*
+     * If the user has not requested otherwise via the command line
+     * then locate the xenheap using these constraints:
+     *
+     *  - must be contiguous
+     *  - must be 32 MiB aligned
+     *  - must not include Xen itself or the boot modules
+     *  - must be at most 1GB or 1/32 the total RAM in the system (or static
+          heap if enabled) if less
+     *  - must be at least 32M
+     *
+     * We try to allocate the largest xenheap possible within these
+     * constraints.
+     */
+    if ( opt_xenheap_megabytes )
+        xenheap_pages = opt_xenheap_megabytes << (20-PAGE_SHIFT);
+    else
+    {
+        xenheap_pages = (heap_pages/32 + 0x1fffUL) & ~0x1fffUL;
+        xenheap_pages = max(xenheap_pages, 32UL<<(20-PAGE_SHIFT));
+        xenheap_pages = min(xenheap_pages, 1UL<<(30-PAGE_SHIFT));
+    }
+
+    do
+    {
+        e = bootinfo.static_heap ?
+            fit_xenheap_in_static_heap(pfn_to_paddr(xenheap_pages), MB(32)) :
+            consider_modules(ram_start, ram_end,
+                             pfn_to_paddr(xenheap_pages),
+                             32<<20, 0);
+        if ( e )
+            break;
+
+        xenheap_pages >>= 1;
+    } while ( !opt_xenheap_megabytes && xenheap_pages > 32<<(20-PAGE_SHIFT) );
+
+    if ( ! e )
+        panic("Not enough space for xenheap\n");
+
+    domheap_pages = heap_pages - xenheap_pages;
+
+    printk("Xen heap: %"PRIpaddr"-%"PRIpaddr" (%lu pages%s)\n",
+           e - (pfn_to_paddr(xenheap_pages)), e, xenheap_pages,
+           opt_xenheap_megabytes ? ", from command-line" : "");
+    printk("Dom heap: %lu pages\n", domheap_pages);
+
+    /*
+     * We need some memory to allocate the page-tables used for the
+     * directmap mappings. So populate the boot allocator first.
+     *
+     * This requires us to set directmap_mfn_{start, end} first so the
+     * direct-mapped Xenheap region can be avoided.
+     */
+    directmap_mfn_start = _mfn((e >> PAGE_SHIFT) - xenheap_pages);
+    directmap_mfn_end = mfn_add(directmap_mfn_start, xenheap_pages);
+
+    populate_boot_allocator();
+
+    setup_directmap_mappings(mfn_x(directmap_mfn_start), xenheap_pages);
+
+    /* Frame table covers all of RAM region, including holes */
+    setup_frametable_mappings(ram_start, ram_end);
+    max_page = PFN_DOWN(ram_end);
+
+    /*
+     * The allocators may need to use map_domain_page() (such as for
+     * scrubbing pages). So we need to prepare the domheap area first.
+     */
+    if ( !init_domheap_mappings(smp_processor_id()) )
+        panic("CPU%u: Unable to prepare the domheap page-tables\n",
+              smp_processor_id());
+
+    /* Add xenheap memory that was not already added to the boot allocator. */
+    init_xenheap_pages(mfn_to_maddr(directmap_mfn_start),
+                       mfn_to_maddr(directmap_mfn_end));
+
+    init_staticmem_pages();
+}
+#else /* CONFIG_ARM_64 */
+void __init setup_mm(void)
+{
+    const struct meminfo *banks = &bootinfo.mem;
+    paddr_t ram_start = INVALID_PADDR;
+    paddr_t ram_end = 0;
+    paddr_t ram_size = 0;
+    unsigned int i;
+
+    init_pdx();
+
+    /*
+     * We need some memory to allocate the page-tables used for the directmap
+     * mappings. But some regions may contain memory already allocated
+     * for other uses (e.g. modules, reserved-memory...).
+     *
+     * For simplicity, add all the free regions in the boot allocator.
+     */
+    populate_boot_allocator();
+
+    total_pages = 0;
+
+    for ( i = 0; i < banks->nr_banks; i++ )
+    {
+        const struct membank *bank = &banks->bank[i];
+        paddr_t bank_end = bank->start + bank->size;
+
+        ram_size = ram_size + bank->size;
+        ram_start = min(ram_start, bank->start);
+        ram_end = max(ram_end, bank_end);
+
+        setup_directmap_mappings(PFN_DOWN(bank->start),
+                                 PFN_DOWN(bank->size));
+    }
+
+    total_pages += ram_size >> PAGE_SHIFT;
+
+    directmap_virt_end = XENHEAP_VIRT_START + ram_end - ram_start;
+    directmap_mfn_start = maddr_to_mfn(ram_start);
+    directmap_mfn_end = maddr_to_mfn(ram_end);
+
+    setup_frametable_mappings(ram_start, ram_end);
+    max_page = PFN_DOWN(ram_end);
+
+    init_staticmem_pages();
+}
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 22/40] xen/mpu: implement MPU version of setup_mm in setup_mpu.c
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (20 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 21/40] xen/arm: move MMU-specific setup_mm to setup_mmu.c Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 23/40] xen/mpu: initialize frametable in MPU system Penny Zheng
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In an MPU system, system RAM shall be statically partitioned into
different functional sections in the Device Tree at the very beginning,
including the static xenheap, guest memory sections, boot-module
sections, etc.
So using a virtually contiguous memory region to direct-map the whole
system RAM is not applicable in an MPU system.

Function setup_static_mappings is introduced to set up the MPU memory
region mappings section by section, based on the static configuration
in the Device Tree.
This commit only handles the static xenheap mapping, which is
implemented in setup_staticheap_mappings. All the other static
memory section mappings will be introduced later.
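
For illustration only (the bank address and size below are made up, not
taken from this patch): a single static-heap bank described in the
Device Tree ends up as one fixed EL2 MPU region covering exactly that
range, via the map_pages_to_xen() path added earlier in this series.
Because VA == PA, the "virtual" argument must equal the machine address
of the mapped frames:

    /* Hypothetical static-heap bank: start 0x60000000, size 64MB. */
    if ( map_pages_to_xen(0x60000000UL,               /* VA == PA */
                          maddr_to_mfn(0x60000000UL),
                          MB(64) >> PAGE_SHIFT,
                          REGION_HYPERVISOR) )
        panic("mpu: failed to map static heap\n");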

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Makefile             |  2 +
 xen/arch/arm/include/asm/mm_mpu.h |  5 +++
 xen/arch/arm/mm_mpu.c             | 41 ++++++++++++++++++
 xen/arch/arm/setup_mpu.c          | 70 +++++++++++++++++++++++++++++++
 4 files changed, 118 insertions(+)
 create mode 100644 xen/arch/arm/setup_mpu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index adeb17b7ab..23dfbc3333 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -53,6 +53,8 @@ obj-y += psci.o
 obj-y += setup.o
 ifneq ($(CONFIG_HAS_MPU), y)
 obj-y += setup_mmu.o
+else
+obj-y += setup_mpu.o
 endif
 obj-y += shutdown.o
 obj-y += smp.o
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
index 3a4b07f187..fe6a828a50 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -3,6 +3,11 @@
 #define __ARCH_ARM_MM_MPU__
 
 #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
+/*
+ * Function setup_static_mappings() sets up MPU memory region mapping
+ * section by section based on static configuration in Device Tree.
+ */
+extern void setup_static_mappings(void);
 
 static inline paddr_t __virt_to_maddr(vaddr_t va)
 {
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index b34dbf4515..f057ee26df 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -523,6 +523,47 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
     return fdt_virt;
 }
 
+/*
+ * In an MPU system, the heap must be statically configured in the
+ * Device Tree through "xen,static-heap".
+ */
+static void __init setup_staticheap_mappings(void)
+{
+    unsigned int bank = 0;
+
+    for ( ; bank < bootinfo.reserved_mem.nr_banks; bank++ )
+    {
+        if ( bootinfo.reserved_mem.bank[bank].type == MEMBANK_STATIC_HEAP )
+        {
+            paddr_t bank_start = round_pgup(
+                                 bootinfo.reserved_mem.bank[bank].start);
+            paddr_t bank_size = round_pgdown(
+                                bootinfo.reserved_mem.bank[bank].size);
+
+            /* Map static heap with fixed MPU memory region */
+
+            if ( map_pages_to_xen(bank_start, maddr_to_mfn(bank_start),
+                                  bank_size >> PAGE_SHIFT,
+                                  REGION_HYPERVISOR) )
+                panic("mpu: failed to map static heap\n");
+        }
+    }
+}
+
+/*
+ * System RAM is statically partitioned into different functional
+ * sections in the Device Tree, including the static xenheap, guest
+ * memory sections, boot-module sections, etc.
+ * Function setup_static_mappings sets up the MPU memory region mappings
+ * section by section.
+ */
+void __init setup_static_mappings(void)
+{
+    setup_staticheap_mappings();
+
+    /* TODO: guest memory section, device memory section, boot-module section, etc */
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
new file mode 100644
index 0000000000..ca0d8237d5
--- /dev/null
+++ b/xen/arch/arm/setup_mpu.c
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * xen/arch/arm/setup_mpu.c
+ *
+ * Early bringup code for an Armv8-R with virt extensions.
+ *
+ * Copyright (C) 2022 Arm Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <asm/mm_mpu.h>
+#include <asm/page.h>
+#include <asm/setup.h>
+
+void __init setup_mm(void)
+{
+    paddr_t ram_start = ~0, ram_end = 0, ram_size = 0;
+    unsigned int bank;
+
+    if ( !bootinfo.mem.nr_banks )
+        panic("No memory bank\n");
+
+    init_pdx();
+
+    populate_boot_allocator();
+
+    total_pages = 0;
+    for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ )
+    {
+        paddr_t bank_start = round_pgup(bootinfo.mem.bank[bank].start);
+        paddr_t bank_size = bootinfo.mem.bank[bank].size;
+        paddr_t bank_end = round_pgdown(bank_start + bank_size);
+
+        ram_size = ram_size + bank_size;
+        ram_start = min(ram_start, bank_start);
+        ram_end = max(ram_end, bank_end);
+    }
+
+    setup_static_mappings();
+
+    total_pages += ram_size >> PAGE_SHIFT;
+    max_page = PFN_DOWN(ram_end);
+
+    setup_frametable_mappings(ram_start, ram_end);
+
+    init_staticmem_pages();
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 23/40] xen/mpu: initialize frametable in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (21 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 22/40] xen/mpu: implement MPU version of setup_mm in setup_mpu.c Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-05 22:07   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 24/40] xen/mpu: introduce "mpu,xxx-memory-section" Penny Zheng
                   ` (19 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

Xen uses the page as the smallest granularity for memory management,
and we want to follow the same concept in an MPU system.
That is, the page_info structure, and the frametable used for storing
and managing page_info, are also required in an MPU system.

In an MPU system, since there is no virtual address translation
(VA == PA), we cannot use a fixed virtual address (FRAMETABLE_VIRT_START)
to map the frametable like an MMU system does.
Instead, we define the variable "struct page_info *frame_table" as the
frametable pointer, and ask the boot allocator to allocate memory for
the frametable.

Once the frametable is successfully initialized, the conversions between
machine frame number/machine address/"virtual address" and the page-info
structure are available too, e.g. mfn_to_page/maddr_to_page/virt_to_page.
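
As a small sketch of what becomes possible once the frametable is set up
(the address below is made up, and a single contiguous RAM bank is
assumed so the pdx arithmetic stays straightforward):

    /* Hypothetical machine address of a page inside a mapped RAM bank. */
    paddr_t ma = 0x40001000UL;
    struct page_info *pg = maddr_to_page(ma);

    ASSERT(page_to_maddr(pg) == ma);
    /* VA == PA on MPU, so the "virtual" lookup resolves to the same entry. */
    ASSERT(virt_to_page((void *)ma) == pg);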

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/mm.h     | 15 ---------------
 xen/arch/arm/include/asm/mm_mmu.h | 16 ++++++++++++++++
 xen/arch/arm/include/asm/mm_mpu.h | 17 +++++++++++++++++
 xen/arch/arm/mm_mpu.c             | 25 +++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index e29158028a..7969ec9f98 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -176,7 +176,6 @@ struct page_info
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
-#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 /* PDX of the first page in the frame table. */
 extern unsigned long frametable_base_pdx;
 
@@ -280,20 +279,6 @@ static inline uint64_t gvirt_to_maddr(vaddr_t va, paddr_t *pa,
 #define virt_to_mfn(va)     __virt_to_mfn(va)
 #define mfn_to_virt(mfn)    __mfn_to_virt(mfn)
 
-/* Convert between Xen-heap virtual addresses and page-info structures. */
-static inline struct page_info *virt_to_page(const void *v)
-{
-    unsigned long va = (unsigned long)v;
-    unsigned long pdx;
-
-    ASSERT(va >= XENHEAP_VIRT_START);
-    ASSERT(va < directmap_virt_end);
-
-    pdx = (va - XENHEAP_VIRT_START) >> PAGE_SHIFT;
-    pdx += mfn_to_pdx(directmap_mfn_start);
-    return frame_table + pdx - frametable_base_pdx;
-}
-
 static inline void *page_to_virt(const struct page_info *pg)
 {
     return mfn_to_virt(mfn_x(page_to_mfn(pg)));
diff --git a/xen/arch/arm/include/asm/mm_mmu.h b/xen/arch/arm/include/asm/mm_mmu.h
index 6d7e5ddde7..bc1b04c4c7 100644
--- a/xen/arch/arm/include/asm/mm_mmu.h
+++ b/xen/arch/arm/include/asm/mm_mmu.h
@@ -23,6 +23,8 @@ extern uint64_t init_ttbr;
 extern void setup_directmap_mappings(unsigned long base_mfn,
                                      unsigned long nr_mfns);
 
+#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
+
 static inline paddr_t __virt_to_maddr(vaddr_t va)
 {
     uint64_t par = va_to_par(va);
@@ -49,6 +51,20 @@ static inline void *maddr_to_virt(paddr_t ma)
 }
 #endif
 
+/* Convert between Xen-heap virtual addresses and page-info structures. */
+static inline struct page_info *virt_to_page(const void *v)
+{
+    unsigned long va = (unsigned long)v;
+    unsigned long pdx;
+
+    ASSERT(va >= XENHEAP_VIRT_START);
+    ASSERT(va < directmap_virt_end);
+
+    pdx = (va - XENHEAP_VIRT_START) >> PAGE_SHIFT;
+    pdx += mfn_to_pdx(directmap_mfn_start);
+    return frame_table + pdx - frametable_base_pdx;
+}
+
 #endif /* __ARCH_ARM_MM_MMU__ */
 
 /*
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
index fe6a828a50..eebd5b5d35 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -9,6 +9,8 @@
  */
 extern void setup_static_mappings(void);
 
+extern struct page_info *frame_table;
+
 static inline paddr_t __virt_to_maddr(vaddr_t va)
 {
     /* In MPU system, VA == PA. */
@@ -22,6 +24,21 @@ static inline void *maddr_to_virt(paddr_t ma)
     return (void *)ma;
 }
 
+/* Convert a virtual address to its page-info structure. */
+static inline struct page_info *virt_to_page(const void *v)
+{
+    unsigned long va = (unsigned long)v;
+    unsigned long pdx;
+
+    /*
+     * In an MPU system, VA == PA, so virt_to_maddr() returns the
+     * input address unchanged.
+     */
+    pdx = mfn_to_pdx(maddr_to_mfn(virt_to_maddr(va)));
+
+    return frame_table + pdx - frametable_base_pdx;
+}
+
 #endif /* __ARCH_ARM_MM_MPU__ */
 
 /*
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index f057ee26df..7b282be4fb 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -69,6 +69,8 @@ static DEFINE_SPINLOCK(xen_mpumap_lock);
 
 static paddr_t dtb_paddr;
 
+struct page_info *frame_table;
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
     uint64_t _sel = sel;                                                \
@@ -564,6 +566,29 @@ void __init setup_static_mappings(void)
     /* TODO: guest memory section, device memory section, boot-module section, etc */
 }
 
+/* Map a frame table to cover physical addresses ps through pe */
+void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
+{
+    mfn_t base_mfn;
+    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
+                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
+    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
+
+    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
+    frametable_size = ROUNDUP(frametable_size, PAGE_SIZE);
+    /*
+     * Since VA == PA in MPU and the static heap has already been mapped
+     * in setup_staticheap_mappings(), the "virtual address" of the
+     * frame table follows directly from the allocated MFN.
+     */
+    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 1);
+    frame_table = (struct page_info *)mfn_to_virt(base_mfn);
+
+    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
+    memset(&frame_table[nr_pdxs], -1,
+           frametable_size - (nr_pdxs * sizeof(struct page_info)));
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 24/40] xen/mpu: introduce "mpu,xxx-memory-section"
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (22 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 23/40] xen/mpu: initialize frametable in MPU system Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-01-13  5:28 ` [PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization Penny Zheng
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In an MPU system, all kinds of resources, including system resources and
domain resources, must be statically configured in the Device Tree, e.g.
guest RAM must be statically allocated through the "xen,static-mem"
property under the domain node.

However, due to the limited number of MPU protection regions and the
wide variety of resources, we could easily exhaust all MPU protection
regions very quickly.
So we introduce a set of new properties, "mpu,xxx-memory-section",
to mitigate the impact.
Each property limits the available host address range of one kind of
system/domain resource.

This commit also introduces "mpu,guest-memory-section" as an example,
for limiting the scattering of static memory used as guest RAM.
In an MPU system, guest RAM shall not only be statically configured
through the "xen,static-mem" property, but shall also be defined inside
"mpu,guest-memory-section".

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/bootfdt.c           | 13 ++++---
 xen/arch/arm/include/asm/setup.h | 24 +++++++++++++
 xen/arch/arm/setup_mpu.c         | 58 ++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 0085c28d74..d7a5dd0ede 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -59,10 +59,10 @@ void __init device_tree_get_reg(const __be32 **cell, u32 address_cells,
     *size = dt_next_cell(size_cells, cell);
 }
 
-static int __init device_tree_get_meminfo(const void *fdt, int node,
-                                          const char *prop_name,
-                                          u32 address_cells, u32 size_cells,
-                                          void *data, enum membank_type type)
+int __init device_tree_get_meminfo(const void *fdt, int node,
+                                   const char *prop_name,
+                                   u32 address_cells, u32 size_cells,
+                                   void *data, enum membank_type type)
 {
     const struct fdt_property *prop;
     unsigned int i, banks;
@@ -315,6 +315,11 @@ static int __init process_chosen_node(const void *fdt, int node,
         bootinfo.static_heap = true;
     }
 
+#ifdef CONFIG_HAS_MPU
+    if ( process_mpuinfo(fdt, node, address_cells, size_cells) )
+        return -EINVAL;
+#endif
+
     printk("Checking for initrd in /chosen\n");
 
     prop = fdt_get_property(fdt, node, "linux,initrd-start", &len);
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 8f353b67f8..3581f8f990 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -172,6 +172,11 @@ void device_tree_get_reg(const __be32 **cell, u32 address_cells,
 u32 device_tree_get_u32(const void *fdt, int node,
                         const char *prop_name, u32 dflt);
 
+int device_tree_get_meminfo(const void *fdt, int node,
+                            const char *prop_name,
+                            u32 address_cells, u32 size_cells,
+                            void *data, enum membank_type type);
+
 int map_range_to_domain(const struct dt_device_node *dev,
                         u64 addr, u64 len, void *data);
 
@@ -185,6 +190,25 @@ struct init_info
     unsigned int cpuid;
 };
 
+#ifdef CONFIG_HAS_MPU
+/* Index of MPU memory section */
+enum mpu_section_info {
+    MSINFO_GUEST,
+    MSINFO_MAX
+};
+
+extern const char *mpu_section_info_str[MSINFO_MAX];
+
+struct mpuinfo {
+    struct meminfo sections[MSINFO_MAX];
+};
+
+extern struct mpuinfo mpuinfo;
+
+extern int process_mpuinfo(const void *fdt, int node, uint32_t address_cells,
+                           uint32_t size_cells);
+#endif /* CONFIG_HAS_MPU */
+
 #endif
 /*
  * Local variables:
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index ca0d8237d5..09a38a34a4 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -20,12 +20,70 @@
  */
 
 #include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
 #include <asm/mm_mpu.h>
 #include <asm/page.h>
 #include <asm/setup.h>
 
+const char *mpu_section_info_str[MSINFO_MAX] = {
+    "mpu,guest-memory-section",
+};
+
+/*
+ * mpuinfo stores mpu memory section info, which is configured under
+ * "mpu,xxx-memory-section" in Device Tree.
+ */
+struct mpuinfo __initdata mpuinfo;
+
+/*
+ * Due to the limited number of MPU protection regions and the wide variety
+ * of resources, "mpu,xxx-memory-section" is introduced to mitigate the impact.
+ * Each property limits the available host address range of one kind of
+ * system/domain resource.
+ *
+ * "mpu,guest-memory-section": guest RAM must be statically allocated
+ * through "xen,static-mem" property in MPU system. "mpu,guest-memory-section"
+ * limits the scattering of "xen,static-mem", as users could not define
+ * a "xen,static-mem" outside "mpu,guest-memory-section".
+ */
+static int __init process_mpu_memory_section(const void *fdt, int node,
+                                             const char *name, void *data,
+                                             uint32_t address_cells,
+                                             uint32_t size_cells)
+{
+    if ( !fdt_get_property(fdt, node, name, NULL) )
+        return -EINVAL;
+
+    return device_tree_get_meminfo(fdt, node, name, address_cells, size_cells,
+                                   data, MEMBANK_DEFAULT);
+}
+
+int __init process_mpuinfo(const void *fdt, int node,
+                           uint32_t address_cells, uint32_t size_cells)
+{
+    uint8_t idx = 0;
+    const char *prop_name;
+
+    for ( ; idx < MSINFO_MAX; idx++ )
+    {
+        prop_name = mpu_section_info_str[idx];
+
+        printk("Checking for %s in /chosen\n", prop_name);
+
+        if ( process_mpu_memory_section(fdt, node, prop_name,
+                                        &mpuinfo.sections[idx],
+                                        address_cells, size_cells) )
+        {
+            printk(XENLOG_ERR "fdt: failed to process %s\n", prop_name);
+            return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
 void __init setup_mm(void)
 {
     paddr_t ram_start = ~0, ram_end = 0, ram_size = 0;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (23 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 24/40] xen/mpu: introduce "mpu,xxx-memory-section" Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-09 10:51   ` Julien Grall
  2023-01-13  5:28 ` [PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table Penny Zheng
                   ` (17 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

The previous commit introduced a new device tree property,
"mpu,guest-memory-section", to define the MPU guest memory section, which
mitigates the scattering of statically-configured guest RAM.

We only need to set up an MPU memory region mapping for the MPU guest memory
section to have access to all guest RAM.
This should happen before static memory initialization
(init_staticmem_pages()).

The MPU memory region for the MPU guest memory section gets switched out when
the idle vcpu leaves the hypervisor, to avoid region overlapping if the vcpu
later enters guest mode. Conversely, it gets switched in when the idle vcpu
enters the hypervisor.
We introduce a new bit, "region.prlar.sw" (struct pr_t region), to indicate
this behaviour.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h | 14 ++++++---
 xen/arch/arm/mm_mpu.c                | 47 +++++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index b85e420a90..0044bbf05d 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -45,22 +45,26 @@
  * [3:4] Execute Never
  * [5:6] Access Permission
  * [7]   Region Present
- * [8]   Boot-only Region
+ * [8:9] 0b00: Fixed Region; 0b01: Boot-only Region;
+ *       0b10: Region needs switching out/in during vcpu context switch;
  */
 #define _REGION_AI_BIT            0
 #define _REGION_XN_BIT            3
 #define _REGION_AP_BIT            5
 #define _REGION_PRESENT_BIT       7
-#define _REGION_BOOTONLY_BIT      8
+#define _REGION_TRANSIENT_BIT     8
 #define _REGION_XN                (2U << _REGION_XN_BIT)
 #define _REGION_RO                (2U << _REGION_AP_BIT)
 #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
-#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
+#define _REGION_BOOTONLY          (1U << _REGION_TRANSIENT_BIT)
+#define _REGION_SWITCH            (2U << _REGION_TRANSIENT_BIT)
 #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
 #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
 #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
 #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
 #define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
+#define REGION_SWITCH_MASK(x)     (((x) >> _REGION_TRANSIENT_BIT) & 0x2U)
+#define REGION_TRANSIENT_MASK(x)  (((x) >> _REGION_TRANSIENT_BIT) & 0x3U)
 
 /*
  * _REGION_NORMAL is convenience define. It is not meant to be used
@@ -73,6 +77,7 @@
 
 #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
 #define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
+#define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
 
 #define INVALID_REGION            (~0UL)
 
@@ -98,7 +103,8 @@ typedef union {
         unsigned long ns:1;     /* Not-Secure */
         unsigned long res:1;    /* Reserved 0 by hardware */
         unsigned long limit:42; /* Limit Address */
-        unsigned long pad:16;
+        unsigned long pad:15;
+        unsigned long sw:1;     /* Region gets switched out/in during vcpu context switch? */
     } reg;
     uint64_t bits;
 } prlar_t;
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 7b282be4fb..d2e19e836c 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -71,6 +71,10 @@ static paddr_t dtb_paddr;
 
 struct page_info *frame_table;
 
+static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
+    REGION_HYPERVISOR_SWITCH,
+};
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
     uint64_t _sel = sel;                                                \
@@ -414,10 +418,13 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
         if ( system_state <= SYS_STATE_active )
         {
             /*
-             * If it is a boot-only region (i.e. region for early FDT),
-             * it shall be added from the tail for late init re-organizing
+             * If it is a transient region, including boot-only region
+             * (i.e. region for early FDT), and region which needs switching
+             * in/out during vcpu context switch(i.e. region for guest memory
+             * section), it shall be added from the tail for late init
+             * re-organizing
              */
-            if ( REGION_BOOTONLY_MASK(flags) )
+            if ( REGION_TRANSIENT_MASK(flags) )
                 idx = next_transient_region_idx;
             else
                 idx = next_fixed_region_idx;
@@ -427,6 +434,13 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
         /* Set permission */
         xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
         xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
+        /*
+         * Bit sw indicates that the region gets switched out when the idle
+         * vcpu leaves hypervisor mode, and switched in when the idle vcpu
+         * enters hypervisor mode.
+         */
+        if ( REGION_SWITCH_MASK(flags) )
+            xen_mpumap[idx].prlar.reg.sw = 1;
 
         /* Update and enable the region */
         access_protection_region(false, NULL, (const pr_t*)(&xen_mpumap[idx]),
@@ -552,6 +566,29 @@ static void __init setup_staticheap_mappings(void)
     }
 }
 
+static void __init map_mpu_memory_section_on_boot(enum mpu_section_info type,
+                                                  unsigned int flags)
+{
+    unsigned int i = 0;
+
+    for ( ; i < mpuinfo.sections[type].nr_banks; i++ )
+    {
+        paddr_t start = round_pgup(
+                        mpuinfo.sections[type].bank[i].start);
+        paddr_t size = round_pgdown(mpuinfo.sections[type].bank[i].size);
+
+        /*
+         * Map MPU memory section with transient MPU memory region,
+         * as they are either boot-only, or will be switched out/in
+         * during vcpu context switch(i.e. guest memory section).
+         */
+        if ( map_pages_to_xen(start, maddr_to_mfn(start), size >> PAGE_SHIFT,
+                              flags) )
+            panic("mpu: failed to map MPU memory section %s\n",
+                  mpu_section_info_str[type]);
+    }
+}
+
 /*
  * System RAM is statically partitioned into different functionality
  * section in Device Tree, including static xenheap, guest memory
@@ -563,7 +600,9 @@ void __init setup_static_mappings(void)
 {
     setup_staticheap_mappings();
 
-    /* TODO: guest memory section, device memory section, boot-module section, etc */
+    for ( uint8_t i = MSINFO_GUEST; i < MSINFO_MAX; i++ )
+        map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]);
+    /* TODO: device memory section, boot-module section, etc */
 }
 
 /* Map a frame table to cover physical addresses ps through pe */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (24 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization Penny Zheng
@ 2023-01-13  5:28 ` Penny Zheng
  2023-02-09 10:57   ` Julien Grall
  2023-01-13  5:29 ` [PATCH v2 27/40] xen/mpu: map device memory resource in MPU system Penny Zheng
                   ` (16 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:28 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

This commit expands xen_mpumap_update/xen_mpumap_update_entry to support
destroying an existing entry.

We define a new helper, control_mpu_region_from_index(), to enable/disable
an MPU region based on its index. If the index is within [0, 31], we can
quickly disable the MPU region through PRENR_EL2, which provides direct
access to the PRLAR_EL2.EN bits of EL2 MPU regions.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h     | 20 ++++++
 xen/arch/arm/include/asm/arm64/sysregs.h |  3 +
 xen/arch/arm/mm_mpu.c                    | 77 ++++++++++++++++++++++--
 3 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index 0044bbf05d..c1dea1c8e9 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -16,6 +16,8 @@
  */
 #define ARM_MAX_MPU_MEMORY_REGIONS 255
 
+#define MPU_PRENR_BITS    32
+
 /* Access permission attributes. */
 /* Read/Write at EL2, No Access at EL1/EL0. */
 #define AP_RW_EL2 0x0
@@ -132,6 +134,24 @@ typedef struct {
     _pr->prlar.reg.en;                                      \
 })
 
+/*
+ * Access to get base address of MPU protection region(pr_t).
+ * The base address shall be zero extended.
+ */
+#define pr_get_base(pr) ({                                  \
+    pr_t *_pr = pr;                                         \
+    (uint64_t)_pr->prbar.reg.base << MPU_REGION_SHIFT;      \
+})
+
+/*
+ * Access to get limit address of MPU protection region(pr_t).
+ * The limit address shall be concatenated with 0x3f.
+ */
+#define pr_get_limit(pr) ({                                        \
+    pr_t *_pr = pr;                                                \
+    (uint64_t)((_pr->prlar.reg.limit << MPU_REGION_SHIFT) | 0x3f); \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM64_MPU_H__ */
diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
index aca9bca5b1..c46daf6f69 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -505,6 +505,9 @@
 /* MPU Type registers encode */
 #define MPUIR_EL2 S3_4_C0_C0_4
 
+/* MPU Protection Region Enable Register encode */
+#define PRENR_EL2 S3_4_C6_C1_1
+
 #endif
 
 /* Access to system registers */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index d2e19e836c..3a0d110b13 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -385,6 +385,45 @@ static int mpumap_contain_region(pr_t *mpu, uint64_t nr_regions,
     return MPUMAP_REGION_FAILED;
 }
 
+/* Disable or enable EL2 MPU memory region at index #index */
+static void control_mpu_region_from_index(uint64_t index, bool enable)
+{
+    pr_t region;
+
+    access_protection_region(true, &region, NULL, index);
+    if ( (region_is_valid(&region) && enable) ||
+         (!region_is_valid(&region) && !enable) )
+    {
+        printk(XENLOG_WARNING
+               "mpu: MPU memory region[%lu] is already %s\n", index,
+               enable ? "enabled" : "disabled");
+        return;
+    }
+
+    /*
+     * ARM64v8R provides PRENR_EL2 to have direct access to the
+     * PRLAR_EL2.EN bits of EL2 MPU regions from 0 to 31.
+     */
+    if ( index < MPU_PRENR_BITS )
+    {
+        uint64_t orig, after;
+
+        orig = READ_SYSREG(PRENR_EL2);
+        if ( enable )
+            /* Set respective bit */
+            after = orig | (1UL << index);
+        else
+            /* Clear respective bit */
+            after = orig & (~(1UL << index));
+        WRITE_SYSREG(after, PRENR_EL2);
+    }
+    else
+    {
+        region.prlar.reg.en = enable ? 1 : 0;
+        access_protection_region(false, NULL, (const pr_t*)&region, index);
+    }
+}
+
 /*
  * Update an entry at the index @idx.
  * @base:  base address
@@ -449,6 +488,30 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
         if ( system_state <= SYS_STATE_active )
             update_boot_xen_mpumap_idx(idx);
     }
+    else
+    {
+        /*
+         * Currently, we only support destroying a *WHOLE* MPU memory region,
+         * part-region removing is not supported, as in worst case, it will
+         * lead to two fragments in result after destroying.
+         * part-region removing will be introduced only when actual usage
+         * comes.
+         */
+        if ( rc == MPUMAP_REGION_INCLUSIVE )
+        {
+            region_printk("mpu: part-region removing is not supported\n");
+            return -EINVAL;
+        }
+
+        /* We are removing the region */
+        if ( rc != MPUMAP_REGION_FOUND )
+            return -EINVAL;
+
+        control_mpu_region_from_index(idx, false);
+
+        /* Clear the according MPU memory region entry.*/
+        memset(&xen_mpumap[idx], 0, sizeof(pr_t));
+    }
 
     return 0;
 }
@@ -589,6 +652,15 @@ static void __init map_mpu_memory_section_on_boot(enum mpu_section_info type,
     }
 }
 
+int destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
+    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
+    ASSERT(s <= e);
+
+    return xen_mpumap_update(s, e, 0);
+}
+
 /*
  * System RAM is statically partitioned into different functionality
  * section in Device Tree, including static xenheap, guest memory
@@ -656,11 +728,6 @@ void *ioremap(paddr_t pa, size_t len)
     return NULL;
 }
 
-int destroy_xen_mappings(unsigned long s, unsigned long e)
-{
-    return -ENOSYS;
-}
-
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
 {
     return -ENOSYS;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 27/40] xen/mpu: map device memory resource in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (25 preceding siblings ...)
  2023-01-13  5:28 ` [PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 28/40] xen/mpu: map boot module section " Penny Zheng
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In an MPU system, we cannot afford to map a new MPU memory region for
each new device; that would exhaust the limited MPU memory regions
very quickly.

So we introduce `mpu,device-memory-section` for users to statically
configure the whole system device memory in the Device Tree with the
least number of memory regions. This section shall cover all devices
used by Xen, like the `UART`, `GIC`, etc. (see the example below).

Before we map `mpu,device-memory-section` with device memory attributes and
permissions (REGION_HYPERVISOR_NOCACHE), we shall destroy the mapping for the
early UART, which was set up in boot-time assembly, to avoid MPU memory
region overlapping.
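A made-up example (illustrative only; the base/size values are placeholders
and assume the root #address-cells and #size-cells are both <2>):

    chosen {
        ...
        /* one bank covering the MMIO window for the UART, GIC, etc. */
        mpu,device-memory-section = <0x0 0x08000000 0x0 0x08000000>;
    };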

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h |  6 ++++--
 xen/arch/arm/include/asm/setup.h     |  1 +
 xen/arch/arm/mm_mpu.c                | 14 +++++++++++++-
 xen/arch/arm/setup_mpu.c             |  5 +++++
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index c1dea1c8e9..8e8679bc82 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -69,10 +69,11 @@
 #define REGION_TRANSIENT_MASK(x)  (((x) >> _REGION_TRANSIENT_BIT) & 0x3U)
 
 /*
- * _REGION_NORMAL is convenience define. It is not meant to be used
- * outside of this header.
+ * _REGION_NORMAL and _REGION_DEVICE are convenience defines. They are not
+ * meant to be used outside of this header.
  */
 #define _REGION_NORMAL            (MT_NORMAL|_REGION_PRESENT)
+#define _REGION_DEVICE            (_REGION_XN|_REGION_PRESENT)
 
 #define REGION_HYPERVISOR_RW      (_REGION_NORMAL|_REGION_XN)
 #define REGION_HYPERVISOR_RO      (_REGION_NORMAL|_REGION_XN|_REGION_RO)
@@ -80,6 +81,7 @@
 #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
 #define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
 #define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
+#define REGION_HYPERVISOR_NOCACHE (_REGION_DEVICE|MT_DEVICE_nGnRE|_REGION_SWITCH)
 
 #define INVALID_REGION            (~0UL)
 
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 3581f8f990..b7a2225c25 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -194,6 +194,7 @@ struct init_info
 /* Index of MPU memory section */
 enum mpu_section_info {
     MSINFO_GUEST,
+    MSINFO_DEVICE,
     MSINFO_MAX
 };
 
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 3a0d110b13..1566ba60af 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -73,6 +73,7 @@ struct page_info *frame_table;
 
 static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
     REGION_HYPERVISOR_SWITCH,
+    REGION_HYPERVISOR_NOCACHE,
 };
 
 /* Write a MPU protection region */
@@ -673,8 +674,19 @@ void __init setup_static_mappings(void)
     setup_staticheap_mappings();
 
     for ( uint8_t i = MSINFO_GUEST; i < MSINFO_MAX; i++ )
+    {
+#ifdef CONFIG_EARLY_PRINTK
+        if ( i == MSINFO_DEVICE )
+            /*
+             * Destroy early UART mapping before mapping device memory section.
+             * WARNING:console will be inaccessible temporarily.
+             */
+            destroy_xen_mappings(CONFIG_EARLY_UART_BASE_ADDRESS,
+                                 CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE);
+#endif
         map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]);
-    /* TODO: device memory section, boot-module section, etc */
+    }
+    /* TODO: boot-module section, etc */
 }
 
 /* Map a frame table to cover physical addresses ps through pe */
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index 09a38a34a4..ec05542f68 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -29,6 +29,7 @@
 
 const char *mpu_section_info_str[MSINFO_MAX] = {
     "mpu,guest-memory-section",
+    "mpu,device-memory-section",
 };
 
 /*
@@ -47,6 +48,10 @@ struct mpuinfo __initdata mpuinfo;
  * through "xen,static-mem" property in MPU system. "mpu,guest-memory-section"
  * limits the scattering of "xen,static-mem", as users could not define
  * a "xen,static-mem" outside "mpu,guest-memory-section".
+ *
+ * "mpu,device-memory-section": this section draws the device memory layout
+ * with the least number of memory regions for all devices in system that will
+ * be used in Xen, like `UART`, `GIC`, etc.
  */
 static int __init process_mpu_memory_section(const void *fdt, int node,
                                              const char *name, void *data,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 28/40] xen/mpu: map boot module section in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (26 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 27/40] xen/mpu: map device memory resource in MPU system Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 29/40] xen/mpu: introduce mpu_memory_section_contains for address range check Penny Zheng
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In an MPU system, we cannot afford to map a new MPU memory region for
each new guest boot module; that would exhaust the limited MPU memory regions
very quickly.

So we introduce `mpu,boot-module-section` for users to statically configure
one big memory section, or very few memory sections, for all guests' boot
modules. Users shall make sure that any guest boot module defined in the
Device Tree is within the section, including the kernel module
(BOOTMOD_KERNEL), the device tree passthrough module (BOOTMOD_GUEST_DTB) and
the ramdisk module (BOOTMOD_RAMDISK). An example is shown below.
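A minimal sketch (addresses are invented; the domU/module nodes follow the
usual dom0less bindings and assume the root #address-cells and #size-cells
are both <2>):

    chosen {
        ...
        mpu,boot-module-section = <0x0 0x80000000 0x0 0x10000000>;

        domU1 {
            compatible = "xen,domain";
            ...
            module@80000000 {
                compatible = "multiboot,kernel", "multiboot,module";
                /* must lie within "mpu,boot-module-section" */
                reg = <0x0 0x80000000 0x0 0x02000000>;
            };
        };
    };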

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/setup.h | 1 +
 xen/arch/arm/mm_mpu.c            | 2 +-
 xen/arch/arm/setup_mpu.c         | 7 +++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index b7a2225c25..61f24b5848 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -195,6 +195,7 @@ struct init_info
 enum mpu_section_info {
     MSINFO_GUEST,
     MSINFO_DEVICE,
+    MSINFO_BOOTMODULE,
     MSINFO_MAX
 };
 
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 1566ba60af..ea64aa38e4 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -74,6 +74,7 @@ struct page_info *frame_table;
 static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
     REGION_HYPERVISOR_SWITCH,
     REGION_HYPERVISOR_NOCACHE,
+    REGION_HYPERVISOR_BOOT,
 };
 
 /* Write a MPU protection region */
@@ -686,7 +687,6 @@ void __init setup_static_mappings(void)
 #endif
         map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]);
     }
-    /* TODO: boot-module section, etc */
 }
 
 /* Map a frame table to cover physical addresses ps through pe */
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index ec05542f68..160934bf86 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -30,6 +30,7 @@
 const char *mpu_section_info_str[MSINFO_MAX] = {
     "mpu,guest-memory-section",
     "mpu,device-memory-section",
+    "mpu,boot-module-section",
 };
 
 /*
@@ -52,6 +53,12 @@ struct mpuinfo __initdata mpuinfo;
  * "mpu,device-memory-section": this section draws the device memory layout
  * with the least number of memory regions for all devices in system that will
  * be used in Xen, like `UART`, `GIC`, etc.
+ *
+ * "mpu,boot-module-section": this property uses one big memory section or
+ * very few memory sections to describe all guests' boot modules. Users shall
+ * make sure that any guest boot module defined in Device Tree is within
+ * the section, including kernel module(BOOTMOD_KERNEL), device tree
+ * passthrough module(BOOTMOD_GUEST_DTB), and ramdisk module(BOOTMOD_RAMDISK).
  */
 static int __init process_mpu_memory_section(const void *fdt, int node,
                                              const char *name, void *data,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 29/40] xen/mpu: introduce mpu_memory_section_contains for address range check
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (27 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 28/40] xen/mpu: map boot module section " Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems Penny Zheng
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

We have already introduced "mpu,xxx-memory-section" to limit the system/domain
configuration, so we shall add checks to verify the user's configuration.

We shall check that any guest boot module is within the boot module section,
including the kernel module (BOOTMOD_KERNEL), the device tree passthrough
module (BOOTMOD_GUEST_DTB) and the ramdisk module (BOOTMOD_RAMDISK).

We shall also check that any guest RAM defined through "xen,static-mem" is
within the guest memory section (see the example below).

The function mpu_memory_section_contains() is introduced to do the above
checks.
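For instance (addresses are made up; the root #address-cells and #size-cells
are assumed to be both <2>), the check passes only when every "xen,static-mem"
bank falls inside one of the "mpu,guest-memory-section" banks:

    chosen {
        mpu,guest-memory-section = <0x0 0x40000000 0x0 0x20000000>;

        domU1 {
            compatible = "xen,domain";
            #xen,static-mem-address-cells = <0x2>;
            #xen,static-mem-size-cells = <0x2>;
            ...
            /* 128MB inside the 512MB guest memory section above */
            xen,static-mem = <0x0 0x48000000 0x0 0x08000000>;
        };
    };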

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/domain_build.c      |  4 ++++
 xen/arch/arm/include/asm/setup.h |  2 ++
 xen/arch/arm/kernel.c            | 18 ++++++++++++++++++
 xen/arch/arm/setup_mpu.c         | 22 ++++++++++++++++++++++
 4 files changed, 46 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 829cea8de8..f48a3f679f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -546,6 +546,10 @@ static mfn_t __init acquire_static_memory_bank(struct domain *d,
                d, *psize);
         return INVALID_MFN;
     }
+#ifdef CONFIG_HAS_MPU
+    if ( !mpu_memory_section_contains(*pbase, *pbase + *psize, MSINFO_GUEST) )
+        return INVALID_MFN;
+#endif
 
     smfn = maddr_to_mfn(*pbase);
     res = acquire_domstatic_pages(d, smfn, PFN_DOWN(*psize), 0);
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 61f24b5848..d4c1336597 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -209,6 +209,8 @@ extern struct mpuinfo mpuinfo;
 
 extern int process_mpuinfo(const void *fdt, int node, uint32_t address_cells,
                            uint32_t size_cells);
+extern bool mpu_memory_section_contains(paddr_t s, paddr_t e,
+                                        enum mpu_section_info type);
 #endif /* CONFIG_HAS_MPU */
 
 #endif
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 0475d8fae7..ee7144ec13 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -467,6 +467,12 @@ int __init kernel_probe(struct kernel_info *info,
                 mod = boot_module_find_by_addr_and_kind(
                         BOOTMOD_KERNEL, kernel_addr);
                 info->kernel_bootmodule = mod;
+#ifdef CONFIG_HAS_MPU
+                if ( !mpu_memory_section_contains(mod->start,
+                                                  mod->start + mod->size,
+                                                  MSINFO_BOOTMODULE) )
+                    return -EINVAL;
+#endif
             }
             else if ( dt_device_is_compatible(node, "multiboot,ramdisk") )
             {
@@ -477,6 +483,12 @@ int __init kernel_probe(struct kernel_info *info,
                 dt_get_range(&val, node, &initrd_addr, &size);
                 info->initrd_bootmodule = boot_module_find_by_addr_and_kind(
                         BOOTMOD_RAMDISK, initrd_addr);
+#ifdef CONFIG_HAS_MPU
+                if ( !mpu_memory_section_contains(mod->start,
+                                                  mod->start + mod->size,
+                                                  MSINFO_BOOTMODULE) )
+                    return -EINVAL;
+#endif
             }
             else if ( dt_device_is_compatible(node, "multiboot,device-tree") )
             {
@@ -489,6 +501,12 @@ int __init kernel_probe(struct kernel_info *info,
                 dt_get_range(&val, node, &dtb_addr, &size);
                 info->dtb_bootmodule = boot_module_find_by_addr_and_kind(
                         BOOTMOD_GUEST_DTB, dtb_addr);
+#ifdef CONFIG_HAS_MPU
+                if ( !mpu_memory_section_contains(mod->start,
+                                                  mod->start + mod->size,
+                                                  MSINFO_BOOTMODULE) )
+                    return -EINVAL;
+#endif
             }
             else
                 continue;
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index 160934bf86..f7d74ea604 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -130,6 +130,28 @@ void __init setup_mm(void)
     init_staticmem_pages();
 }
 
+bool __init mpu_memory_section_contains(paddr_t s, paddr_t e,
+                                        enum mpu_section_info type)
+{
+    unsigned int i = 0;
+
+    for ( ; i < mpuinfo.sections[type].nr_banks; i++ )
+    {
+        paddr_t section_start = mpuinfo.sections[type].bank[i].start;
+        paddr_t section_size = mpuinfo.sections[type].bank[i].size;
+        paddr_t section_end = section_start + section_size;
+
+        /* range inclusive */
+        if ( s >= section_start && e <= section_end )
+            return true;
+    }
+
+    printk(XENLOG_ERR
+           "mpu: invalid range configuration 0x%"PRIpaddr" - 0x%"PRIpaddr", and it shall be within %s\n",
+           s, e, mpu_section_info_str[i]);
+    return false;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (28 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 29/40] xen/mpu: introduce mpu_memory_section_contains for address range check Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  9:39   ` Jan Beulich
  2023-01-13  5:29 ` [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system Penny Zheng
                   ` (12 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Roger Pau Monné,
	Penny Zheng

VMAP, in an MMU system, is used to remap a range of normal memory
or device memory to another virtual address with new attributes
for a specific purpose, like the ALTERNATIVE feature. Since there is
no virtual address translation support in an MPU system, we cannot
support VMAP there.

So in this patch, we disable VMAP for MPU systems; some features
depending on VMAP also need to be disabled at the same time, like
ALTERNATIVE and CPU ERRATA.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Kconfig                   |  3 +-
 xen/arch/arm/Makefile                  |  2 +-
 xen/arch/arm/include/asm/alternative.h | 15 +++++
 xen/arch/arm/include/asm/cpuerrata.h   | 12 ++++
 xen/arch/arm/setup.c                   |  7 +++
 xen/arch/x86/Kconfig                   |  1 +
 xen/common/Kconfig                     |  3 +
 xen/common/Makefile                    |  2 +-
 xen/include/xen/vmap.h                 | 81 ++++++++++++++++++++++++--
 9 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index c6b6b612d1..9230c8b885 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -11,12 +11,13 @@ config ARM_64
 
 config ARM
 	def_bool y
-	select HAS_ALTERNATIVE
+	select HAS_ALTERNATIVE if !ARM_V8R
 	select HAS_DEVICE_TREE
 	select HAS_PASSTHROUGH
 	select HAS_PDX
 	select HAS_PMAP
 	select IOMMU_FORCE_PT_SHARE
+	select HAS_VMAP if !ARM_V8R
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 23dfbc3333..c949661590 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_HAS_VPCI) += vpci.o
 
 obj-$(CONFIG_HAS_ALTERNATIVE) += alternative.o
 obj-y += bootfdt.init.o
-obj-y += cpuerrata.o
+obj-$(CONFIG_HAS_ALTERNATIVE) += cpuerrata.o
 obj-y += cpufeature.o
 obj-y += decode.o
 obj-y += device.o
diff --git a/xen/arch/arm/include/asm/alternative.h b/xen/arch/arm/include/asm/alternative.h
index 1eb4b60fbb..bc23d1d34f 100644
--- a/xen/arch/arm/include/asm/alternative.h
+++ b/xen/arch/arm/include/asm/alternative.h
@@ -8,6 +8,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <xen/errno.h>
 #include <xen/types.h>
 #include <xen/stringify.h>
 
@@ -28,8 +29,22 @@ typedef void (*alternative_cb_t)(const struct alt_instr *alt,
 				 const uint32_t *origptr, uint32_t *updptr,
 				 int nr_inst);
 
+#ifdef CONFIG_HAS_ALTERNATIVE
 void apply_alternatives_all(void);
 int apply_alternatives(const struct alt_instr *start, const struct alt_instr *end);
+#else
+static inline void apply_alternatives_all(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline int apply_alternatives(const struct alt_instr *start,
+                                     const struct alt_instr *end)
+{
+    ASSERT_UNREACHABLE();
+    return -EINVAL;
+}
+#endif /* !CONFIG_HAS_ALTERNATIVE */
 
 #define ALTINSTR_ENTRY(feature, cb)					      \
 	" .word 661b - .\n"				/* label           */ \
diff --git a/xen/arch/arm/include/asm/cpuerrata.h b/xen/arch/arm/include/asm/cpuerrata.h
index 8d7e7b9375..5d97f33763 100644
--- a/xen/arch/arm/include/asm/cpuerrata.h
+++ b/xen/arch/arm/include/asm/cpuerrata.h
@@ -4,8 +4,20 @@
 #include <asm/cpufeature.h>
 #include <asm/alternative.h>
 
+#ifdef CONFIG_HAS_ALTERNATIVE
 void check_local_cpu_errata(void);
 void enable_errata_workarounds(void);
+#else
+static inline void check_local_cpu_errata(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void enable_errata_workarounds(void)
+{
+    ASSERT_UNREACHABLE();
+}
+#endif /* !CONFIG_HAS_ALTERNATIVE */
 
 #define CHECK_WORKAROUND_HELPER(erratum, feature, arch)         \
 static inline bool check_workaround_##erratum(void)             \
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3ebf9e9a5c..0eac33e68c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -721,7 +721,9 @@ void __init start_xen(unsigned long boot_phys_offset,
      */
     system_state = SYS_STATE_boot;
 
+#ifdef CONFIG_HAS_VMAP
     vm_init();
+#endif
 
     if ( acpi_disabled )
     {
@@ -753,11 +755,13 @@ void __init start_xen(unsigned long boot_phys_offset,
     nr_cpu_ids = smp_get_max_cpus();
     printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", nr_cpu_ids);
 
+#ifdef CONFIG_HAS_ALTERNATIVE
     /*
      * Some errata relies on SMCCC version which is detected by psci_init()
      * (called from smp_init_cpus()).
      */
     check_local_cpu_errata();
+#endif
 
     check_local_cpu_features();
 
@@ -824,12 +828,15 @@ void __init start_xen(unsigned long boot_phys_offset,
 
     do_initcalls();
 
+
+#ifdef CONFIG_HAS_ALTERNATIVE
     /*
      * It needs to be called after do_initcalls to be able to use
      * stop_machine (tasklets initialized via an initcall).
      */
     apply_alternatives_all();
     enable_errata_workarounds();
+#endif
     enable_cpu_features();
 
     /* Create initial domain 0. */
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 6a7825f4ba..7f072cc603 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -28,6 +28,7 @@ config X86
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
 	select NEEDS_LIBELF
+	select HAS_VMAP
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index f1ea3199c8..ba16366a4b 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -61,6 +61,9 @@ config HAS_SCHED_GRANULARITY
 config HAS_UBSAN
 	bool
 
+config HAS_VMAP
+	bool
+
 config MEM_ACCESS_ALWAYS_ON
 	bool
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 9a3a12b12d..9d991effb2 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -50,7 +50,7 @@ obj-$(CONFIG_TRACEBUFFER) += trace.o
 obj-y += version.o
 obj-y += virtual_region.o
 obj-y += vm_event.o
-obj-y += vmap.o
+obj-$(CONFIG_HAS_VMAP) += vmap.o
 obj-y += vsprintf.o
 obj-y += wait.o
 obj-bin-y += warning.init.o
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index b0f7632e89..2e3ae0ca6a 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -1,15 +1,17 @@
-#if !defined(__XEN_VMAP_H__) && defined(VMAP_VIRT_START)
+#if !defined(__XEN_VMAP_H__) && (defined(VMAP_VIRT_START) || !defined(CONFIG_HAS_VMAP))
 #define __XEN_VMAP_H__
 
-#include <xen/mm-frame.h>
-#include <xen/page-size.h>
-
 enum vmap_region {
     VMAP_DEFAULT,
     VMAP_XEN,
     VMAP_REGION_NR,
 };
 
+#ifdef CONFIG_HAS_VMAP
+
+#include <xen/mm-frame.h>
+#include <xen/page-size.h>
+
 void vm_init_type(enum vmap_region type, void *start, void *end);
 
 void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
@@ -38,4 +40,75 @@ static inline void vm_init(void)
     vm_init_type(VMAP_DEFAULT, (void *)VMAP_VIRT_START, arch_vmap_virt_end());
 }
 
+#else /* !CONFIG_HAS_VMAP */
+
+static inline void vm_init_type(enum vmap_region type, void *start, void *end)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void *__vmap(const mfn_t *mfn, unsigned int granularity,
+                           unsigned int nr, unsigned int align,
+                           unsigned int flags, enum vmap_region type)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void *vmap(const mfn_t *mfn, unsigned int nr)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void vunmap(const void *va)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void *vmalloc(size_t size)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void *vmalloc_xen(size_t size)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void *vzalloc(size_t size)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void vfree(void *va)
+{
+    ASSERT_UNREACHABLE();
+}
+
+void __iomem *ioremap(paddr_t, size_t)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void iounmap(void __iomem *va)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void *arch_vmap_virt_end(void)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void vm_init(void)
+{
+    ASSERT_UNREACHABLE();
+}
+#endif  /* !CONFIG_HAS_VMAP */
 #endif /* __XEN_VMAP_H__ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (29 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  9:42   ` Jan Beulich
  2023-01-13 10:10   ` Jan Beulich
  2023-01-13  5:29 ` [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx Penny Zheng
                   ` (11 subsequent siblings)
  42 siblings, 2 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Penny Zheng

FIXMAP, in an MMU system, is used for special-purpose 4K mappings, like
mapping the early UART and temporarily mapping source ranges for copying
(copy_from_paddr), etc. As there is no VMSA in an MPU system, we do not
support FIXMAP there.

We use !CONFIG_HAS_FIXMAP to provide empty stubs for MPU systems.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Kconfig              |  3 ++-
 xen/arch/arm/include/asm/fixmap.h | 28 +++++++++++++++++++++++++---
 xen/common/Kconfig                |  3 +++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 9230c8b885..91491341c4 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -13,9 +13,10 @@ config ARM
 	def_bool y
 	select HAS_ALTERNATIVE if !ARM_V8R
 	select HAS_DEVICE_TREE
+	select HAS_FIXMAP if !ARM_V8R
 	select HAS_PASSTHROUGH
 	select HAS_PDX
-	select HAS_PMAP
+	select HAS_PMAP if !ARM_V8R
 	select IOMMU_FORCE_PT_SHARE
 	select HAS_VMAP if !ARM_V8R
 
diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
index d0c9a52c8c..f0f4eb57ac 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -4,9 +4,6 @@
 #ifndef __ASM_FIXMAP_H
 #define __ASM_FIXMAP_H
 
-#include <xen/acpi.h>
-#include <xen/pmap.h>
-
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
 #define FIXMAP_MISC     1  /* Ephemeral mappings of hardware */
@@ -22,6 +19,11 @@
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_HAS_FIXMAP
+
+#include <xen/acpi.h>
+#include <xen/pmap.h>
+
 /*
  * Direct access to xen_fixmap[] should only happen when {set,
  * clear}_fixmap() is unusable (e.g. where we would end up to
@@ -43,6 +45,26 @@ static inline unsigned int virt_to_fix(vaddr_t vaddr)
     return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
 }
 
+#else /* !CONFIG_HAS_FIXMAP */
+
+static inline void set_fixmap(unsigned int map, mfn_t mfn,
+                              unsigned int attributes)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void clear_fixmap(unsigned int map)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline unsigned int virt_to_fix(vaddr_t vaddr)
+{
+    ASSERT_UNREACHABLE();
+    return -EINVAL;
+}
+#endif /* !CONFIG_HAS_FIXMAP */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_FIXMAP_H */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index ba16366a4b..680dc6f59c 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -43,6 +43,9 @@ config HAS_EX_TABLE
 config HAS_FAST_MULTIPLY
 	bool
 
+config HAS_FIXMAP
+	bool
+
 config HAS_IOPORTS
 	bool
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (30 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  9:49   ` Jan Beulich
  2023-02-09 11:14   ` Julien Grall
  2023-01-13  5:29 ` [PATCH v2 33/40] xen/arm: check mapping status and attributes for MPU copy_from_paddr Penny Zheng
                   ` (10 subsequent siblings)
  42 siblings, 2 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Penny Zheng

The ioremap_xxx functions are normally used to remap device address ranges
in an MMU system during device driver initialization.

However, in an MPU system, virtual translation is not supported; the
device memory layout is statically configured in the Device Tree and mapped
at a very early stage.
So here we only add a check to verify this assumption.

To tolerate a few cases where the function is called to map a range for a
temporary copy, like ioremap_wc during kernel image loading, a region
attribute mismatch is treated as a warning rather than an error.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h |  1 +
 xen/arch/arm/include/asm/mm.h        | 16 ++++-
 xen/arch/arm/include/asm/mm_mpu.h    |  2 +
 xen/arch/arm/mm_mpu.c                | 88 ++++++++++++++++++++++++----
 xen/include/xen/vmap.h               | 12 ++++
 5 files changed, 106 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index 8e8679bc82..b4e50a9a0e 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -82,6 +82,7 @@
 #define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
 #define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
 #define REGION_HYPERVISOR_NOCACHE (_REGION_DEVICE|MT_DEVICE_nGnRE|_REGION_SWITCH)
+#define REGION_HYPERVISOR_WC      (_REGION_DEVICE|MT_NORMAL_NC)
 
 #define INVALID_REGION            (~0UL)
 
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 7969ec9f98..fa44cfc50d 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -14,6 +14,10 @@
 # error "unknown ARM variant"
 #endif
 
+#if defined(CONFIG_HAS_MPU)
+# include <asm/arm64/mpu.h>
+#endif
+
 /* Align Xen to a 2 MiB boundary. */
 #define XEN_PADDR_ALIGN (1 << 21)
 
@@ -198,19 +202,25 @@ extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
 void __iomem *ioremap_attr(paddr_t start, size_t len, unsigned int attributes);
 
+#ifndef CONFIG_HAS_MPU
+#define DEFINE_ATTRIBUTE(var)   (PAGE_##var)
+#else
+#define DEFINE_ATTRIBUTE(var)   (REGION_##var)
+#endif
+
 static inline void __iomem *ioremap_nocache(paddr_t start, size_t len)
 {
-    return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
+    return ioremap_attr(start, len, DEFINE_ATTRIBUTE(HYPERVISOR_NOCACHE));
 }
 
 static inline void __iomem *ioremap_cache(paddr_t start, size_t len)
 {
-    return ioremap_attr(start, len, PAGE_HYPERVISOR);
+    return ioremap_attr(start, len, DEFINE_ATTRIBUTE(HYPERVISOR));
 }
 
 static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
 {
-    return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
+    return ioremap_attr(start, len, DEFINE_ATTRIBUTE(HYPERVISOR_WC));
 }
 
 /* XXX -- account for base */
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
index eebd5b5d35..5aa61c43b6 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -2,6 +2,8 @@
 #ifndef __ARCH_ARM_MM_MPU__
 #define __ARCH_ARM_MM_MPU__
 
+#include <asm/arm64/mpu.h>
+
 #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
 /*
  * Function setup_static_mappings() sets up MPU memory region mapping
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index ea64aa38e4..7b54c87acf 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -712,32 +712,100 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
            frametable_size - (nr_pdxs * sizeof(struct page_info)));
 }
 
-/* TODO: Implementation on the first usage */
-void dump_hyp_walk(vaddr_t addr)
+static bool region_attribute_match(pr_t *region, unsigned int attributes)
 {
+    if ( region->prbar.reg.ap != REGION_AP_MASK(attributes) )
+    {
+        printk(XENLOG_ERR "region permission is not matched (0x%x -> 0x%x)\n",
+               region->prbar.reg.ap, REGION_AP_MASK(attributes));
+        return false;
+    }
+
+    if ( region->prbar.reg.xn != REGION_XN_MASK(attributes) )
+    {
+        printk(XENLOG_ERR "region execution permission is not matched (0x%x -> 0x%x)\n",
+               region->prbar.reg.xn, REGION_XN_MASK(attributes));
+        return false;
+    }
+
+    if ( region->prlar.reg.ai != REGION_AI_MASK(attributes) )
+    {
+        printk(XENLOG_ERR "region memory attributes is not matched (0x%x -> 0x%x)\n",
+               region->prlar.reg.ai, REGION_AI_MASK(attributes));
+        return false;
+    }
+
+    return true;
 }
 
-void __init remove_early_mappings(void)
+static bool check_region_and_attributes(paddr_t pa, size_t len,
+                                        unsigned int attributes,
+                                        const char *prefix)
+{
+    pr_t *region;
+    int rc;
+    uint64_t idx;
+
+    rc = mpumap_contain_region(xen_mpumap, max_xen_mpumap, pa, pa + len - 1,
+                               &idx);
+    if ( rc != MPUMAP_REGION_FOUND && rc != MPUMAP_REGION_INCLUSIVE )
+    {
+        region_printk("%s: range 0x%"PRIpaddr" - 0x%"PRIpaddr" has not been properly mapped\n",
+                      prefix, pa, pa + len - 1);
+        return false;
+    }
+
+    region = &xen_mpumap[idx];
+    /*
+     * To tolerate a few cases where the function is called to remap a range
+     * for a temporary copy, like ioremap_wc during kernel image loading, a
+     * permission mismatch is treated as a warning rather than an error.
+     */
+    if ( !region_attribute_match(region, attributes) )
+        printk(XENLOG_WARNING
+               "mpu: %s: range 0x%"PRIpaddr" - 0x%"PRIpaddr" attributes mismatched\n",
+               prefix, pa, pa + len - 1);
+
+    return true;
+}
+
+/*
+ * This function is normally used to remap device address ranges in an MMU
+ * system.
+ * However, in an MPU system, virtual translation is not supported and
+ * device memory is statically configured in the FDT and mapped at a very
+ * early stage.
+ * So here we only add a check to verify this assumption.
+ */
+void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
 {
+    if ( !check_region_and_attributes(pa, len, attributes, "ioremap") )
+        return NULL;
+
+    return maddr_to_virt(pa);
 }
 
-int init_secondary_pagetables(int cpu)
+void *ioremap(paddr_t pa, size_t len)
 {
-    return -ENOSYS;
+    return ioremap_attr(pa, len, REGION_HYPERVISOR_NOCACHE);
 }
 
-void mmu_init_secondary_cpu(void)
+/* TODO: Implementation on the first usage */
+void dump_hyp_walk(vaddr_t addr)
 {
 }
 
-void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
+void __init remove_early_mappings(void)
 {
-    return NULL;
 }
 
-void *ioremap(paddr_t pa, size_t len)
+int init_secondary_pagetables(int cpu)
+{
+    return -ENOSYS;
+}
+
+void mmu_init_secondary_cpu(void)
 {
-    return NULL;
 }
 
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 2e3ae0ca6a..fc56d02fc8 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -89,15 +89,27 @@ static inline void vfree(void *va)
     ASSERT_UNREACHABLE();
 }
 
+#ifdef CONFIG_HAS_MPU
+void __iomem *ioremap(paddr_t, size_t);
+#else
 void __iomem *ioremap(paddr_t, size_t)
 {
     ASSERT_UNREACHABLE();
     return NULL;
 }
+#endif
 
 static inline void iounmap(void __iomem *va)
 {
+#ifdef CONFIG_HAS_MPU
+    /*
+     * iounmap and ioremap are a couple, and as ioremap is only doing
+     * checking in MPU system, we do nothing and just return in iounmap
+     */
+    return;
+#else
     ASSERT_UNREACHABLE();
+#endif
 }
 
 static inline void *arch_vmap_virt_end(void)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 33/40] xen/arm: check mapping status and attributes for MPU copy_from_paddr
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (31 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 34/40] xen/mpu: free init memory in MPU system Penny Zheng
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Penny Zheng

From: Wei Chen <wei.chen@arm.com>

We introduce map_page_to_xen_misc()/unmap_page_from_xen_misc() to temporarily
map a page in the Xen misc area and gain access to it. However, in an MPU
system, all resources are statically configured in the Device Tree and
already mapped at a very early boot stage.

When enabling map_page_to_xen_misc for copy_from_paddr in an MPU system,
we need to check whether a given paddr is already properly mapped.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
---
 xen/arch/arm/kernel.c |  2 +-
 xen/arch/arm/mm_mpu.c | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index ee7144ec13..ce2b3347d7 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -57,7 +57,7 @@ void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
         s = paddr & (PAGE_SIZE - 1);
         l = min(PAGE_SIZE - s, len);
 
-        src = map_page_to_xen_misc(maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+        src = map_page_to_xen_misc(maddr_to_mfn(paddr), DEFINE_ATTRIBUTE(HYPERVISOR_WC));
         ASSERT(src != NULL);
         memcpy(dst, src + s, l);
         clean_dcache_va_range(dst, l);
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 7b54c87acf..0b720004ee 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -790,6 +790,27 @@ void *ioremap(paddr_t pa, size_t len)
     return ioremap_attr(pa, len, REGION_HYPERVISOR_NOCACHE);
 }
 
+/*
+ * In an MPU system, due to the limited number of MPU memory regions, all
+ * resources are statically configured in the Device Tree and mapped at a
+ * very early stage; dynamic temporary page mappings are not allowed.
+ * So in map_page_to_xen_misc we only need to check whether the page is
+ * already properly mapped with #attributes.
+ */
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
+{
+    paddr_t pa = mfn_to_maddr(mfn);
+
+    if ( !check_region_and_attributes(pa, PAGE_SIZE, attributes, "map_to_misc") )
+        return NULL;
+
+    return maddr_to_virt(pa);
+}
+
+void unmap_page_from_xen_misc(void)
+{
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 34/40] xen/mpu: free init memory in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (32 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 33/40] xen/arm: check mapping status and attributes for MPU copy_from_paddr Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-02-09 11:27   ` Julien Grall
  2023-01-13  5:29 ` [PATCH v2 35/40] xen/mpu: destroy boot modules and early FDT mapping " Penny Zheng
                   ` (8 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

This commit implements free_init_memory in the MPU system, keeping the
same strategy as the MMU system.

In order to insert BRK instructions into the init code section, which
aims to provoke a fault on purpose, we first change the init code
section's permission to RW.
Function modify_xen_mappings is introduced to modify the permission of
an existing valid MPU memory region.

Then we nuke the instruction cache to remove entries related to init
text.
Finally, we destroy the two MPU memory regions referring to init text
and init data using destroy_xen_mappings.
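
The sequence above can be summarised as follows. This is a simplified
sketch only (the real implementation is in the diff below); the helper
free_one_init_section() is hypothetical, and page rounding plus
virt/maddr conversions are omitted for brevity:

    static void free_one_init_section(unsigned long start, unsigned long end)
    {
        uint32_t *p = (uint32_t *)start;
        unsigned int i, nr = (end - start) / sizeof(uint32_t);

        /* Poison the section with BRK so any stale execution faults. */
        for ( i = 0; i < nr; i++ )
            p[i] = AARCH64_BREAK_FAULT;

        /* Drop the MPU memory region covering the section. */
        if ( destroy_xen_mappings(start, end) )
            panic("Unable to remove the init section\n");
    }

    void free_init_memory(void)
    {
        /* 1. Make the init text region writable so it can be poisoned. */
        modify_xen_mappings((unsigned long)_sinittext,
                            (unsigned long)_einittext, REGION_HYPERVISOR_RW);

        /* 2. Init code will never run again: drop stale I-cache entries. */
        invalidate_icache_local();

        /* 3. Poison and destroy the init text and init data regions. */
        free_one_init_section((unsigned long)_sinittext,
                              (unsigned long)_einittext);
        free_one_init_section((unsigned long)__init_data_begin,
                              (unsigned long)__init_end);
    }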

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/mm_mpu.c | 85 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 0b720004ee..de0c7d919a 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -20,6 +20,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/kernel.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
 #include <xen/page-size.h>
@@ -77,6 +78,8 @@ static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
     REGION_HYPERVISOR_BOOT,
 };
 
+extern char __init_data_begin[], __init_end[];
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
     uint64_t _sel = sel;                                                \
@@ -443,8 +446,41 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
     if ( rc == MPUMAP_REGION_OVERLAP )
         return -EINVAL;
 
+    /* We are updating the permission. */
+    if ( (flags & _REGION_PRESENT) && (rc == MPUMAP_REGION_FOUND ||
+                                       rc == MPUMAP_REGION_INCLUSIVE) )
+    {
+
+        /*
+         * Currently, we only support modifying a *WHOLE* MPU memory region.
+         * Part-region modification is not supported, as in the worst case it
+         * would leave three fragments behind after the modification.
+         * Part-region modification will be introduced only when an actual
+         * use case arises.
+         */
+        if ( rc == MPUMAP_REGION_INCLUSIVE )
+        {
+            region_printk("mpu: part-region modification is not supported\n");
+            return -EINVAL;
+        }
+
+        /* We don't allow changing memory attributes. */
+        if ( xen_mpumap[idx].prlar.reg.ai != REGION_AI_MASK(flags) )
+        {
+            region_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
+                          xen_mpumap[idx].prlar.reg.ai, REGION_AI_MASK(flags));
+            return -EINVAL;
+        }
+
+        /* Set new permission */
+        xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
+        xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
+
+        access_protection_region(false, NULL, (const pr_t*)(&xen_mpumap[idx]),
+                                 idx);
+    }
     /* We are inserting a mapping => Create new region. */
-    if ( flags & _REGION_PRESENT )
+    else if ( flags & _REGION_PRESENT )
     {
         if ( rc != MPUMAP_REGION_FAILED )
             return -EINVAL;
@@ -831,11 +867,56 @@ void mmu_init_secondary_cpu(void)
 
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
 {
-    return -ENOSYS;
+    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
+    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
+    ASSERT(s <= e);
+    return xen_mpumap_update(s, e, flags);
 }
 
 void free_init_memory(void)
 {
+    /* Kernel init text section. */
+    paddr_t init_text = virt_to_maddr(_sinittext);
+    paddr_t init_text_end = round_pgup(virt_to_maddr(_einittext));
+    /* Kernel init data. */
+    paddr_t init_data = virt_to_maddr(__init_data_begin);
+    paddr_t init_data_end = round_pgup(virt_to_maddr(__init_end));
+    unsigned long init_section[4] = {(unsigned long)init_text,
+                                     (unsigned long)init_text_end,
+                                     (unsigned long)init_data,
+                                     (unsigned long)init_data_end};
+    unsigned int nr_init = 2;
+    uint32_t insn = AARCH64_BREAK_FAULT;
+    unsigned int i = 0, j = 0;
+
+    /* Change kernel init text section to RW. */
+    modify_xen_mappings((unsigned long)init_text,
+                        (unsigned long)init_text_end, REGION_HYPERVISOR_RW);
+
+    /*
+     * From now on, init will not be used for execution anymore,
+     * so nuke the instruction cache to remove entries related to init.
+     */
+    invalidate_icache_local();
+
+    /* Destroy two MPU memory regions referring init text and init data. */
+    for ( ; i < nr_init; i++ )
+    {
+        uint32_t *p;
+        unsigned int nr, idx;
+        int rc;
+
+        idx = 2 * i;
+        p = (uint32_t *)init_section[idx];
+        nr = (init_section[idx + 1] - init_section[idx]) / sizeof(uint32_t);
+
+        for ( j = 0; j < nr; j++ )
+            *(p + j) = insn;
+
+        rc = destroy_xen_mappings(init_section[idx], init_section[idx + 1]);
+        if ( rc < 0 )
+            panic("Unable to remove the init section (rc = %d)\n", rc);
+    }
 }
 
 int xenmem_add_to_physmap_one(
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 35/40] xen/mpu: destroy boot modules and early FDT mapping in MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (33 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 34/40] xen/mpu: free init memory in MPU system Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for AArch64v8R Penny Zheng
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In an MMU system we free the memory of boot modules, like the kernel and
initramfs modules, back into the heap, but this is not applicable in an
MPU system: the heap must be statically configured in the Device Tree,
so it cannot grow.
In an MPU system we instead destroy the MPU memory regions of the boot
modules.

In the MPU version of remove_early_mappings, we destroy the MPU memory
region of the early FDT mapping.
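
The resulting MPU-side logic boils down to the following condensed
sketch (the full version, and the MMU variant moved to setup_mmu.c, are
in the diff below; mpuinfo and MSINFO_BOOTMODULE come from earlier
patches in this series):

    void __init discard_initial_modules(void)
    {
        unsigned int i;

        /*
         * Boot-module memory cannot be handed to the heap on MPU:
         * simply tear down the MPU regions that covered it.
         */
        for ( i = 0; i < mpuinfo.sections[MSINFO_BOOTMODULE].nr_banks; i++ )
        {
            paddr_t start = mpuinfo.sections[MSINFO_BOOTMODULE].bank[i].start;
            paddr_t size = mpuinfo.sections[MSINFO_BOOTMODULE].bank[i].size;

            if ( destroy_xen_mappings(start, start + size) )
                panic("mpu: Unable to destroy boot module section\n");
        }

        /* Then drop the early FDT MPU memory region as well. */
        remove_early_mappings();
    }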

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/mm_mpu.c    |  4 ++++
 xen/arch/arm/setup.c     | 25 -------------------------
 xen/arch/arm/setup_mmu.c | 25 +++++++++++++++++++++++++
 xen/arch/arm/setup_mpu.c | 26 ++++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index de0c7d919a..118bb11d1a 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -854,6 +854,10 @@ void dump_hyp_walk(vaddr_t addr)
 
 void __init remove_early_mappings(void)
 {
+    /* The early FDT was mapped with size MAX_FDT_SIZE in early_fdt_map(). */
+    if ( destroy_xen_mappings(round_pgdown(dtb_paddr),
+                              round_pgup(dtb_paddr + MAX_FDT_SIZE)) )
+        panic("Unable to destroy early Device-Tree mapping.\n");
 }
 
 int init_secondary_pagetables(int cpu)
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 0eac33e68c..49ba998f68 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -412,31 +412,6 @@ const char * __init boot_module_kind_as_string(bootmodule_kind kind)
     }
 }
 
-void __init discard_initial_modules(void)
-{
-    struct bootmodules *mi = &bootinfo.modules;
-    int i;
-
-    for ( i = 0; i < mi->nr_mods; i++ )
-    {
-        paddr_t s = mi->module[i].start;
-        paddr_t e = s + PAGE_ALIGN(mi->module[i].size);
-
-        if ( mi->module[i].kind == BOOTMOD_XEN )
-            continue;
-
-        if ( !mfn_valid(maddr_to_mfn(s)) ||
-             !mfn_valid(maddr_to_mfn(e)) )
-            continue;
-
-        fw_unreserved_regions(s, e, init_domheap_pages, 0);
-    }
-
-    mi->nr_mods = 0;
-
-    remove_early_mappings();
-}
-
 /* Relocate the FDT in Xen heap */
 static void * __init relocate_fdt(paddr_t dtb_paddr, size_t dtb_size)
 {
diff --git a/xen/arch/arm/setup_mmu.c b/xen/arch/arm/setup_mmu.c
index 7e5d87f8bd..611a60633e 100644
--- a/xen/arch/arm/setup_mmu.c
+++ b/xen/arch/arm/setup_mmu.c
@@ -340,6 +340,31 @@ void __init setup_mm(void)
 }
 #endif
 
+void __init discard_initial_modules(void)
+{
+    struct bootmodules *mi = &bootinfo.modules;
+    int i;
+
+    for ( i = 0; i < mi->nr_mods; i++ )
+    {
+        paddr_t s = mi->module[i].start;
+        paddr_t e = s + PAGE_ALIGN(mi->module[i].size);
+
+        if ( mi->module[i].kind == BOOTMOD_XEN )
+            continue;
+
+        if ( !mfn_valid(maddr_to_mfn(s)) ||
+             !mfn_valid(maddr_to_mfn(e)) )
+            continue;
+
+        fw_unreserved_regions(s, e, init_domheap_pages, 0);
+    }
+
+    mi->nr_mods = 0;
+
+    remove_early_mappings();
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index f7d74ea604..f47f1f39ee 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -152,6 +152,32 @@ bool __init mpu_memory_section_contains(paddr_t s, paddr_t e,
     return false;
 }
 
+void __init discard_initial_modules(void)
+{
+    unsigned int i = 0;
+
+    /*
+     * The xenheap in an MPU system must be statically configured in the FDT,
+     * so its base address and size cannot change and it cannot accept
+     * memory freed from boot modules.
+     * Instead, destroy the MPU memory regions of the boot module section,
+     * since they are of no use after boot.
+     */
+    for ( ; i < mpuinfo.sections[MSINFO_BOOTMODULE].nr_banks; i++ )
+    {
+        paddr_t start = mpuinfo.sections[MSINFO_BOOTMODULE].bank[i].start;
+        paddr_t size = mpuinfo.sections[MSINFO_BOOTMODULE].bank[i].size;
+        int rc;
+
+        rc = destroy_xen_mappings(start, start + size);
+        if ( rc )
+            panic("mpu: Unable to destroy boot module section 0x%"PRIpaddr"- 0x%"PRIpaddr"\n",
+                  start, start + size);
+    }
+
+    remove_early_mappings();
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for AArch64v8R
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (34 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 35/40] xen/mpu: destroy boot modules and early FDT mapping " Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-02-05 22:26   ` Julien Grall
  2023-01-13  5:29 ` [PATCH v2 37/40] xen/mpu: move MMU specific P2M code to p2m_mmu.c Penny Zheng
                   ` (6 subsequent siblings)
  42 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

As AArch64v8R only has one Security state, we have to use the secure EL2
hypervisor timer for Xen running in secure EL2.

In this patch, we introduce a Kconfig option ARM_SECURE_STATE.
With this new Kconfig option, we can re-define the timer's system
register names for the different Security states, while keeping the
timer code flow unchanged.
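
The mechanism is a compile-time alias: the timer code is written once
against CNTHPx_* names, which resolve to either the Secure (CNTHPS_*) or
the Non-secure (CNTHP_*) hypervisor timer registers. A stripped-down
sketch of the idea (encodings as in the diff below; disable_hyp_timer()
is only an illustrative helper):

    #ifdef CONFIG_ARM_SECURE_STATE
    /* Secure EL2 uses the CNTHPS_* registers. */
    #define CNTHPS_CTL_EL2   S3_4_C14_C5_1
    #define CNTHPx_CTL_EL2   CNTHPS_CTL_EL2
    #else
    /* Non-secure EL2 uses the usual CNTHP_* registers. */
    #define CNTHPx_CTL_EL2   CNTHP_CTL_EL2
    #endif

    /* Callers never need to know which Security state Xen runs in. */
    static inline void disable_hyp_timer(void)
    {
        WRITE_SYSREG(0, CNTHPx_CTL_EL2);
        isb();
    }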

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Kconfig                     |  7 +++++++
 xen/arch/arm/include/asm/arm64/sysregs.h | 21 ++++++++++++++++++++-
 xen/arch/arm/include/asm/cpregs.h        |  4 ++--
 xen/arch/arm/time.c                      | 14 +++++++-------
 4 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 91491341c4..ee942a33bc 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -47,6 +47,13 @@ config ARM_EFI
 	  be booted as an EFI application. This is only useful for
 	  Xen that may run on systems that have UEFI firmware.
 
+config ARM_SECURE_STATE
+	bool "Xen will run in Arm Secure State"
+	depends on ARM_V8R
+	help
+	  In this state, a Processing Element (PE) can access the secure
+	  physical address space, and the secure copy of banked registers.
+
 config GICV3
 	bool "GICv3 driver"
 	depends on !NEW_VGIC
diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
index c46daf6f69..9546e8e3d0 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -458,7 +458,6 @@
 #define ZCR_ELx_LEN_SIZE             9
 #define ZCR_ELx_LEN_MASK             0x1ff
 
-/* System registers for Armv8-R AArch64 */
 #ifdef CONFIG_HAS_MPU
 
 /* EL2 MPU Protection Region Base Address Register encode */
@@ -510,6 +509,26 @@
 
 #endif
 
+#ifdef CONFIG_ARM_SECURE_STATE
+/*
+ * The Armv8-R AArch64 architecture always executes code in Secure
+ * state with EL2 as the highest Exception level.
+ *
+ * Hypervisor timer registers for Secure EL2.
+ */
+#define CNTHPS_TVAL_EL2  S3_4_C14_C5_0
+#define CNTHPS_CTL_EL2   S3_4_C14_C5_1
+#define CNTHPS_CVAL_EL2  S3_4_C14_C5_2
+#define CNTHPx_TVAL_EL2  CNTHPS_TVAL_EL2
+#define CNTHPx_CTL_EL2   CNTHPS_CTL_EL2
+#define CNTHPx_CVAL_EL2  CNTHPS_CVAL_EL2
+#else
+/* Hypervisor timer registers for Non-Secure EL2. */
+#define CNTHPx_TVAL_EL2  CNTHP_TVAL_EL2
+#define CNTHPx_CTL_EL2   CNTHP_CTL_EL2
+#define CNTHPx_CVAL_EL2  CNTHP_CVAL_EL2
+#endif /* CONFIG_ARM_SECURE_STATE */
+
 /* Access to system registers */
 
 #define WRITE_SYSREG64(v, name) do {                    \
diff --git a/xen/arch/arm/include/asm/cpregs.h b/xen/arch/arm/include/asm/cpregs.h
index 6b083de204..a704677fbc 100644
--- a/xen/arch/arm/include/asm/cpregs.h
+++ b/xen/arch/arm/include/asm/cpregs.h
@@ -374,8 +374,8 @@
 #define CLIDR_EL1               CLIDR
 #define CNTFRQ_EL0              CNTFRQ
 #define CNTHCTL_EL2             CNTHCTL
-#define CNTHP_CTL_EL2           CNTHP_CTL
-#define CNTHP_CVAL_EL2          CNTHP_CVAL
+#define CNTHPx_CTL_EL2          CNTHP_CTL
+#define CNTHPx_CVAL_EL2         CNTHP_CVAL
 #define CNTKCTL_EL1             CNTKCTL
 #define CNTPCT_EL0              CNTPCT
 #define CNTP_CTL_EL0            CNTP_CTL
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 433d7be909..3bba733b83 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -196,13 +196,13 @@ int reprogram_timer(s_time_t timeout)
 
     if ( timeout == 0 )
     {
-        WRITE_SYSREG(0, CNTHP_CTL_EL2);
+        WRITE_SYSREG(0, CNTHPx_CTL_EL2);
         return 1;
     }
 
     deadline = ns_to_ticks(timeout) + boot_count;
-    WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
-    WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
+    WRITE_SYSREG64(deadline, CNTHPx_CVAL_EL2);
+    WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHPx_CTL_EL2);
     isb();
 
     /* No need to check for timers in the past; the Generic Timer fires
@@ -213,7 +213,7 @@ int reprogram_timer(s_time_t timeout)
 /* Handle the firing timer */
 static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
-    if ( unlikely(!(READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING)) )
+    if ( unlikely(!(READ_SYSREG(CNTHPx_CTL_EL2) & CNTx_CTL_PENDING)) )
         return;
 
     perfc_incr(hyp_timer_irqs);
@@ -222,7 +222,7 @@ static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
     raise_softirq(TIMER_SOFTIRQ);
 
     /* Disable the timer to avoid more interrupts */
-    WRITE_SYSREG(0, CNTHP_CTL_EL2);
+    WRITE_SYSREG(0, CNTHPx_CTL_EL2);
 }
 
 static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
@@ -281,7 +281,7 @@ void init_timer_interrupt(void)
     /* Do not let the VMs program the physical timer, only read the physical counter */
     WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
     WRITE_SYSREG(0, CNTP_CTL_EL0);    /* Physical timer disabled */
-    WRITE_SYSREG(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
+    WRITE_SYSREG(0, CNTHPx_CTL_EL2);   /* Hypervisor's timer disabled */
     isb();
 
     request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
@@ -301,7 +301,7 @@ void init_timer_interrupt(void)
 static void deinit_timer_interrupt(void)
 {
     WRITE_SYSREG(0, CNTP_CTL_EL0);    /* Disable physical timer */
-    WRITE_SYSREG(0, CNTHP_CTL_EL2);   /* Disable hypervisor's timer */
+    WRITE_SYSREG(0, CNTHPx_CTL_EL2);   /* Disable hypervisor's timer */
     isb();
 
     release_irq(timer_irq[TIMER_HYP_PPI], NULL);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 37/40] xen/mpu: move MMU specific P2M code to p2m_mmu.c
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (35 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for AArch64v8R Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 38/40] xen/mpu: implement setup_virt_paging for MPU system Penny Zheng
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

The current P2M implementation is designed for MMU systems. Only a small
part of it can be shared with MPU systems, like the P2M pool, IPA bits, etc.
We move the MMU-specific code into p2m_mmu.c, and place stub functions
in p2m_mpu.c which wait to be implemented on first usage. Generic code
stays in p2m.c.

We also move MMU-specific definitions to p2m_mmu.h, like P2M_ROOT_LEVEL
and the declaration of p2m_tlb_flush_sync.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Makefile              |    5 +
 xen/arch/arm/include/asm/p2m.h     |   17 +-
 xen/arch/arm/include/asm/p2m_mmu.h |   28 +
 xen/arch/arm/p2m.c                 | 2276 +--------------------------
 xen/arch/arm/p2m_mmu.c             | 2295 ++++++++++++++++++++++++++++
 xen/arch/arm/p2m_mpu.c             |  191 +++
 6 files changed, 2528 insertions(+), 2284 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/p2m_mmu.h
 create mode 100644 xen/arch/arm/p2m_mmu.c
 create mode 100644 xen/arch/arm/p2m_mpu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index c949661590..ea650db52b 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -44,6 +44,11 @@ obj-y += mm_mpu.o
 endif
 obj-y += monitor.o
 obj-y += p2m.o
+ifneq ($(CONFIG_HAS_MPU), y)
+obj-y += p2m_mmu.o
+else
+obj-y += p2m_mpu.o
+endif
 obj-y += percpu.o
 obj-y += platform.o
 obj-y += platform_hypercall.o
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index 91df922e1c..a430aca232 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -14,17 +14,6 @@
 /* Holds the bit size of IPAs in p2m tables.  */
 extern unsigned int p2m_ipa_bits;
 
-#ifdef CONFIG_ARM_64
-extern unsigned int p2m_root_order;
-extern unsigned int p2m_root_level;
-#define P2M_ROOT_ORDER    p2m_root_order
-#define P2M_ROOT_LEVEL p2m_root_level
-#else
-/* First level P2M is always 2 consecutive pages */
-#define P2M_ROOT_ORDER    1
-#define P2M_ROOT_LEVEL 1
-#endif
-
 struct domain;
 
 extern void memory_type_changed(struct domain *);
@@ -162,6 +151,10 @@ typedef enum {
 #endif
 #include <xen/p2m-common.h>
 
+#ifndef CONFIG_HAS_MPU
+#include <asm/p2m_mmu.h>
+#endif
+
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
     /*
@@ -252,8 +245,6 @@ static inline int p2m_is_write_locked(struct p2m_domain *p2m)
     return rw_is_write_locked(&p2m->lock);
 }
 
-void p2m_tlb_flush_sync(struct p2m_domain *p2m);
-
 /* Look up the MFN corresponding to a domain's GFN. */
 mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t);
 
diff --git a/xen/arch/arm/include/asm/p2m_mmu.h b/xen/arch/arm/include/asm/p2m_mmu.h
new file mode 100644
index 0000000000..a0f2440336
--- /dev/null
+++ b/xen/arch/arm/include/asm/p2m_mmu.h
@@ -0,0 +1,28 @@
+#ifndef _XEN_P2M_MMU_H
+#define _XEN_P2M_MMU_H
+
+#ifdef CONFIG_ARM_64
+extern unsigned int p2m_root_order;
+extern unsigned int p2m_root_level;
+#define P2M_ROOT_ORDER    p2m_root_order
+#define P2M_ROOT_LEVEL p2m_root_level
+#else
+/* First level P2M is always 2 consecutive pages */
+#define P2M_ROOT_ORDER    1
+#define P2M_ROOT_LEVEL 1
+#endif
+
+struct p2m_domain;
+
+void p2m_tlb_flush_sync(struct p2m_domain *p2m);
+
+#endif /* _XEN_P2M_MMU_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 948f199d84..42f51051e0 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,36 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#include <xen/cpu.h>
-#include <xen/domain_page.h>
-#include <xen/iocap.h>
-#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
-#include <xen/softirq.h>
 
-#include <asm/alternative.h>
 #include <asm/event.h>
-#include <asm/flushtlb.h>
-#include <asm/guest_walk.h>
 #include <asm/page.h>
-#include <asm/traps.h>
-
-#define MAX_VMID_8_BIT  (1UL << 8)
-#define MAX_VMID_16_BIT (1UL << 16)
-
-#define INVALID_VMID 0 /* VMID 0 is reserved */
-
-#ifdef CONFIG_ARM_64
-unsigned int __read_mostly p2m_root_order;
-unsigned int __read_mostly p2m_root_level;
-static unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
-/* VMID is by default 8 bit width on AArch64 */
-#define MAX_VMID       max_vmid
-#else
-/* VMID is always 8 bit width on AArch32 */
-#define MAX_VMID        MAX_VMID_8_BIT
-#endif
-
-#define P2M_ROOT_PAGES    (1<<P2M_ROOT_ORDER)
 
 /*
  * Set to the maximum configured support for IPA bits, so the number of IPA bits can be
@@ -38,50 +11,6 @@ static unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
  */
 unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
 
-static mfn_t __read_mostly empty_root_mfn;
-
-static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
-{
-    return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
-}
-
-static struct page_info *p2m_alloc_page(struct domain *d)
-{
-    struct page_info *pg;
-
-    /*
-     * For hardware domain, there should be no limit in the number of pages that
-     * can be allocated, so that the kernel may take advantage of the extended
-     * regions. Hence, allocate p2m pages for hardware domains from heap.
-     */
-    if ( is_hardware_domain(d) )
-    {
-        pg = alloc_domheap_page(NULL, 0);
-        if ( pg == NULL )
-            printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
-    }
-    else
-    {
-        spin_lock(&d->arch.paging.lock);
-        pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
-        spin_unlock(&d->arch.paging.lock);
-    }
-
-    return pg;
-}
-
-static void p2m_free_page(struct domain *d, struct page_info *pg)
-{
-    if ( is_hardware_domain(d) )
-        free_domheap_page(pg);
-    else
-    {
-        spin_lock(&d->arch.paging.lock);
-        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
-        spin_unlock(&d->arch.paging.lock);
-    }
-}
-
 /* Return the size of the pool, in bytes. */
 int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
 {
@@ -186,441 +115,10 @@ int p2m_teardown_allocation(struct domain *d)
     return ret;
 }
 
-/* Unlock the flush and do a P2M TLB flush if necessary */
-void p2m_write_unlock(struct p2m_domain *p2m)
-{
-    /*
-     * The final flush is done with the P2M write lock taken to avoid
-     * someone else modifying the P2M wbefore the TLB invalidation has
-     * completed.
-     */
-    p2m_tlb_flush_sync(p2m);
-
-    write_unlock(&p2m->lock);
-}
-
-void p2m_dump_info(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    p2m_read_lock(p2m);
-    printk("p2m mappings for domain %d (vmid %d):\n",
-           d->domain_id, p2m->vmid);
-    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
-    printk("  1G mappings: %ld (shattered %ld)\n",
-           p2m->stats.mappings[1], p2m->stats.shattered[1]);
-    printk("  2M mappings: %ld (shattered %ld)\n",
-           p2m->stats.mappings[2], p2m->stats.shattered[2]);
-    printk("  4K mappings: %ld\n", p2m->stats.mappings[3]);
-    p2m_read_unlock(p2m);
-}
-
 void memory_type_changed(struct domain *d)
 {
 }
 
-void dump_p2m_lookup(struct domain *d, paddr_t addr)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    printk("dom%d IPA 0x%"PRIpaddr"\n", d->domain_id, addr);
-
-    printk("P2M @ %p mfn:%#"PRI_mfn"\n",
-           p2m->root, mfn_x(page_to_mfn(p2m->root)));
-
-    dump_pt_walk(page_to_maddr(p2m->root), addr,
-                 P2M_ROOT_LEVEL, P2M_ROOT_PAGES);
-}
-
-/*
- * p2m_save_state and p2m_restore_state work in pair to workaround
- * ARM64_WORKAROUND_AT_SPECULATE. p2m_save_state will set-up VTTBR to
- * point to the empty page-tables to stop allocating TLB entries.
- */
-void p2m_save_state(struct vcpu *p)
-{
-    p->arch.sctlr = READ_SYSREG(SCTLR_EL1);
-
-    if ( cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-    {
-        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
-        /*
-         * Ensure VTTBR_EL2 is correctly synchronized so we can restore
-         * the next vCPU context without worrying about AT instruction
-         * speculation.
-         */
-        isb();
-    }
-}
-
-void p2m_restore_state(struct vcpu *n)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(n->domain);
-    uint8_t *last_vcpu_ran;
-
-    if ( is_idle_vcpu(n) )
-        return;
-
-    WRITE_SYSREG(n->arch.sctlr, SCTLR_EL1);
-    WRITE_SYSREG(n->arch.hcr_el2, HCR_EL2);
-
-    /*
-     * ARM64_WORKAROUND_AT_SPECULATE: VTTBR_EL2 should be restored after all
-     * registers associated to EL1/EL0 translations regime have been
-     * synchronized.
-     */
-    asm volatile(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_AT_SPECULATE));
-    WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
-
-    last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
-
-    /*
-     * While we are restoring an out-of-context translation regime
-     * we still need to ensure:
-     *  - VTTBR_EL2 is synchronized before flushing the TLBs
-     *  - All registers for EL1 are synchronized before executing an AT
-     *  instructions targeting S1/S2.
-     */
-    isb();
-
-    /*
-     * Flush local TLB for the domain to prevent wrong TLB translation
-     * when running multiple vCPU of the same domain on a single pCPU.
-     */
-    if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
-        flush_guest_tlb_local();
-
-    *last_vcpu_ran = n->vcpu_id;
-}
-
-/*
- * Force a synchronous P2M TLB flush.
- *
- * Must be called with the p2m lock held.
- */
-static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
-{
-    unsigned long flags = 0;
-    uint64_t ovttbr;
-
-    ASSERT(p2m_is_write_locked(p2m));
-
-    /*
-     * ARM only provides an instruction to flush TLBs for the current
-     * VMID. So switch to the VTTBR of a given P2M if different.
-     */
-    ovttbr = READ_SYSREG64(VTTBR_EL2);
-    if ( ovttbr != p2m->vttbr )
-    {
-        uint64_t vttbr;
-
-        local_irq_save(flags);
-
-        /*
-         * ARM64_WORKAROUND_AT_SPECULATE: We need to stop AT to allocate
-         * TLBs entries because the context is partially modified. We
-         * only need the VMID for flushing the TLBs, so we can generate
-         * a new VTTBR with the VMID to flush and the empty root table.
-         */
-        if ( !cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-            vttbr = p2m->vttbr;
-        else
-            vttbr = generate_vttbr(p2m->vmid, empty_root_mfn);
-
-        WRITE_SYSREG64(vttbr, VTTBR_EL2);
-
-        /* Ensure VTTBR_EL2 is synchronized before flushing the TLBs */
-        isb();
-    }
-
-    flush_guest_tlb();
-
-    if ( ovttbr != READ_SYSREG64(VTTBR_EL2) )
-    {
-        WRITE_SYSREG64(ovttbr, VTTBR_EL2);
-        /* Ensure VTTBR_EL2 is back in place before continuing. */
-        isb();
-        local_irq_restore(flags);
-    }
-
-    p2m->need_flush = false;
-}
-
-void p2m_tlb_flush_sync(struct p2m_domain *p2m)
-{
-    if ( p2m->need_flush )
-        p2m_force_tlb_flush_sync(p2m);
-}
-
-/*
- * Find and map the root page table. The caller is responsible for
- * unmapping the table.
- *
- * The function will return NULL if the offset of the root table is
- * invalid.
- */
-static lpae_t *p2m_get_root_pointer(struct p2m_domain *p2m,
-                                    gfn_t gfn)
-{
-    unsigned long root_table;
-
-    /*
-     * While the root table index is the offset from the previous level,
-     * we can't use (P2M_ROOT_LEVEL - 1) because the root level might be
-     * 0. Yet we still want to check if all the unused bits are zeroed.
-     */
-    root_table = gfn_x(gfn) >> (XEN_PT_LEVEL_ORDER(P2M_ROOT_LEVEL) +
-                                XEN_PT_LPAE_SHIFT);
-    if ( root_table >= P2M_ROOT_PAGES )
-        return NULL;
-
-    return __map_domain_page(p2m->root + root_table);
-}
-
-/*
- * Lookup the MFN corresponding to a domain's GFN.
- * Lookup mem access in the ratrix tree.
- * The entries associated to the GFN is considered valid.
- */
-static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m, gfn_t gfn)
-{
-    void *ptr;
-
-    if ( !p2m->mem_access_enabled )
-        return p2m->default_access;
-
-    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
-    if ( !ptr )
-        return p2m_access_rwx;
-    else
-        return radix_tree_ptr_to_int(ptr);
-}
-
-/*
- * In the case of the P2M, the valid bit is used for other purpose. Use
- * the type to check whether an entry is valid.
- */
-static inline bool p2m_is_valid(lpae_t pte)
-{
-    return pte.p2m.type != p2m_invalid;
-}
-
-/*
- * lpae_is_* helpers don't check whether the valid bit is set in the
- * PTE. Provide our own overlay to check the valid bit.
- */
-static inline bool p2m_is_mapping(lpae_t pte, unsigned int level)
-{
-    return p2m_is_valid(pte) && lpae_is_mapping(pte, level);
-}
-
-static inline bool p2m_is_superpage(lpae_t pte, unsigned int level)
-{
-    return p2m_is_valid(pte) && lpae_is_superpage(pte, level);
-}
-
-#define GUEST_TABLE_MAP_FAILED 0
-#define GUEST_TABLE_SUPER_PAGE 1
-#define GUEST_TABLE_NORMAL_PAGE 2
-
-static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry);
-
-/*
- * Take the currently mapped table, find the corresponding GFN entry,
- * and map the next table, if available. The previous table will be
- * unmapped if the next level was mapped (e.g GUEST_TABLE_NORMAL_PAGE
- * returned).
- *
- * The read_only parameters indicates whether intermediate tables should
- * be allocated when not present.
- *
- * Return values:
- *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
- *  was empty, or allocating a new page failed.
- *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
- *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
- */
-static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
-                          unsigned int level, lpae_t **table,
-                          unsigned int offset)
-{
-    lpae_t *entry;
-    int ret;
-    mfn_t mfn;
-
-    entry = *table + offset;
-
-    if ( !p2m_is_valid(*entry) )
-    {
-        if ( read_only )
-            return GUEST_TABLE_MAP_FAILED;
-
-        ret = p2m_create_table(p2m, entry);
-        if ( ret )
-            return GUEST_TABLE_MAP_FAILED;
-    }
-
-    /* The function p2m_next_level is never called at the 3rd level */
-    ASSERT(level < 3);
-    if ( p2m_is_mapping(*entry, level) )
-        return GUEST_TABLE_SUPER_PAGE;
-
-    mfn = lpae_get_mfn(*entry);
-
-    unmap_domain_page(*table);
-    *table = map_domain_page(mfn);
-
-    return GUEST_TABLE_NORMAL_PAGE;
-}
-
-/*
- * Get the details of a given gfn.
- *
- * If the entry is present, the associated MFN will be returned and the
- * access and type filled up. The page_order will correspond to the
- * order of the mapping in the page table (i.e it could be a superpage).
- *
- * If the entry is not present, INVALID_MFN will be returned and the
- * page_order will be set according to the order of the invalid range.
- *
- * valid will contain the value of bit[0] (e.g valid bit) of the
- * entry.
- */
-mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
-                    p2m_type_t *t, p2m_access_t *a,
-                    unsigned int *page_order,
-                    bool *valid)
-{
-    paddr_t addr = gfn_to_gaddr(gfn);
-    unsigned int level = 0;
-    lpae_t entry, *table;
-    int rc;
-    mfn_t mfn = INVALID_MFN;
-    p2m_type_t _t;
-    DECLARE_OFFSETS(offsets, addr);
-
-    ASSERT(p2m_is_locked(p2m));
-    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
-
-    /* Allow t to be NULL */
-    t = t ?: &_t;
-
-    *t = p2m_invalid;
-
-    if ( valid )
-        *valid = false;
-
-    /* XXX: Check if the mapping is lower than the mapped gfn */
-
-    /* This gfn is higher than the highest the p2m map currently holds */
-    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
-    {
-        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
-            if ( (gfn_x(gfn) & (XEN_PT_LEVEL_MASK(level) >> PAGE_SHIFT)) >
-                 gfn_x(p2m->max_mapped_gfn) )
-                break;
-
-        goto out;
-    }
-
-    table = p2m_get_root_pointer(p2m, gfn);
-
-    /*
-     * the table should always be non-NULL because the gfn is below
-     * p2m->max_mapped_gfn and the root table pages are always present.
-     */
-    if ( !table )
-    {
-        ASSERT_UNREACHABLE();
-        level = P2M_ROOT_LEVEL;
-        goto out;
-    }
-
-    for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
-    {
-        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
-        if ( rc == GUEST_TABLE_MAP_FAILED )
-            goto out_unmap;
-        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    entry = table[offsets[level]];
-
-    if ( p2m_is_valid(entry) )
-    {
-        *t = entry.p2m.type;
-
-        if ( a )
-            *a = p2m_mem_access_radix_get(p2m, gfn);
-
-        mfn = lpae_get_mfn(entry);
-        /*
-         * The entry may point to a superpage. Find the MFN associated
-         * to the GFN.
-         */
-        mfn = mfn_add(mfn,
-                      gfn_x(gfn) & ((1UL << XEN_PT_LEVEL_ORDER(level)) - 1));
-
-        if ( valid )
-            *valid = lpae_is_valid(entry);
-    }
-
-out_unmap:
-    unmap_domain_page(table);
-
-out:
-    if ( page_order )
-        *page_order = XEN_PT_LEVEL_ORDER(level);
-
-    return mfn;
-}
-
-mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
-{
-    mfn_t mfn;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    p2m_read_lock(p2m);
-    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL, NULL);
-    p2m_read_unlock(p2m);
-
-    return mfn;
-}
-
-struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn,
-                                        p2m_type_t *t)
-{
-    struct page_info *page;
-    p2m_type_t p2mt;
-    mfn_t mfn = p2m_lookup(d, gfn, &p2mt);
-
-    if ( t )
-        *t = p2mt;
-
-    if ( !p2m_is_any_ram(p2mt) )
-        return NULL;
-
-    if ( !mfn_valid(mfn) )
-        return NULL;
-
-    page = mfn_to_page(mfn);
-
-    /*
-     * get_page won't work on foreign mapping because the page doesn't
-     * belong to the current domain.
-     */
-    if ( p2m_is_foreign(p2mt) )
-    {
-        struct domain *fdom = page_get_owner_and_reference(page);
-        ASSERT(fdom != NULL);
-        ASSERT(fdom != d);
-        return page;
-    }
-
-    return get_page(page, d) ? page : NULL;
-}
-
 int guest_physmap_mark_populate_on_demand(struct domain *d,
                                           unsigned long gfn,
                                           unsigned int order)
@@ -634,1780 +132,16 @@ unsigned long p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn,
     return 0;
 }
 
-static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a)
-{
-    /* First apply type permissions */
-    switch ( t )
-    {
-    case p2m_ram_rw:
-        e->p2m.xn = 0;
-        e->p2m.write = 1;
-        break;
-
-    case p2m_ram_ro:
-        e->p2m.xn = 0;
-        e->p2m.write = 0;
-        break;
-
-    case p2m_iommu_map_rw:
-    case p2m_map_foreign_rw:
-    case p2m_grant_map_rw:
-    case p2m_mmio_direct_dev:
-    case p2m_mmio_direct_nc:
-    case p2m_mmio_direct_c:
-        e->p2m.xn = 1;
-        e->p2m.write = 1;
-        break;
-
-    case p2m_iommu_map_ro:
-    case p2m_map_foreign_ro:
-    case p2m_grant_map_ro:
-    case p2m_invalid:
-        e->p2m.xn = 1;
-        e->p2m.write = 0;
-        break;
-
-    case p2m_max_real_type:
-        BUG();
-        break;
-    }
-
-    /* Then restrict with access permissions */
-    switch ( a )
-    {
-    case p2m_access_rwx:
-        break;
-    case p2m_access_wx:
-        e->p2m.read = 0;
-        break;
-    case p2m_access_rw:
-        e->p2m.xn = 1;
-        break;
-    case p2m_access_w:
-        e->p2m.read = 0;
-        e->p2m.xn = 1;
-        break;
-    case p2m_access_rx:
-    case p2m_access_rx2rw:
-        e->p2m.write = 0;
-        break;
-    case p2m_access_x:
-        e->p2m.write = 0;
-        e->p2m.read = 0;
-        break;
-    case p2m_access_r:
-        e->p2m.write = 0;
-        e->p2m.xn = 1;
-        break;
-    case p2m_access_n:
-    case p2m_access_n2rwx:
-        e->p2m.read = e->p2m.write = 0;
-        e->p2m.xn = 1;
-        break;
-    }
-}
-
-static lpae_t mfn_to_p2m_entry(mfn_t mfn, p2m_type_t t, p2m_access_t a)
-{
-    /*
-     * sh, xn and write bit will be defined in the following switches
-     * based on mattr and t.
-     */
-    lpae_t e = (lpae_t) {
-        .p2m.af = 1,
-        .p2m.read = 1,
-        .p2m.table = 1,
-        .p2m.valid = 1,
-        .p2m.type = t,
-    };
-
-    BUILD_BUG_ON(p2m_max_real_type > (1 << 4));
-
-    switch ( t )
-    {
-    case p2m_mmio_direct_dev:
-        e.p2m.mattr = MATTR_DEV;
-        e.p2m.sh = LPAE_SH_OUTER;
-        break;
-
-    case p2m_mmio_direct_c:
-        e.p2m.mattr = MATTR_MEM;
-        e.p2m.sh = LPAE_SH_OUTER;
-        break;
-
-    /*
-     * ARM ARM: Overlaying the shareability attribute (DDI
-     * 0406C.b B3-1376 to 1377)
-     *
-     * A memory region with a resultant memory type attribute of Normal,
-     * and a resultant cacheability attribute of Inner Non-cacheable,
-     * Outer Non-cacheable, must have a resultant shareability attribute
-     * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
-     *
-     * On ARMv8 shareability is ignored and explicitly treated as Outer
-     * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
-     * See the note for table D4-40, in page 1788 of the ARM DDI 0487A.j.
-     */
-    case p2m_mmio_direct_nc:
-        e.p2m.mattr = MATTR_MEM_NC;
-        e.p2m.sh = LPAE_SH_OUTER;
-        break;
-
-    default:
-        e.p2m.mattr = MATTR_MEM;
-        e.p2m.sh = LPAE_SH_INNER;
-    }
-
-    p2m_set_permission(&e, t, a);
-
-    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
-
-    lpae_set_mfn(e, mfn);
-
-    return e;
-}
-
-/* Generate table entry with correct attributes. */
-static lpae_t page_to_p2m_table(struct page_info *page)
+void __init p2m_restrict_ipa_bits(unsigned int ipa_bits)
 {
     /*
-     * The access value does not matter because the hardware will ignore
-     * the permission fields for table entry.
-     *
-     * We use p2m_ram_rw so the entry has a valid type. This is important
-     * for p2m_is_valid() to return valid on table entries.
+     * Calculate the minimum of the maximum IPA bits that any external entity
+     * can support.
      */
-    return mfn_to_p2m_entry(page_to_mfn(page), p2m_ram_rw, p2m_access_rwx);
-}
-
-static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool clean_pte)
-{
-    write_pte(p, pte);
-    if ( clean_pte )
-        clean_dcache(*p);
-}
-
-static inline void p2m_remove_pte(lpae_t *p, bool clean_pte)
-{
-    lpae_t pte;
-
-    memset(&pte, 0x00, sizeof(pte));
-    p2m_write_pte(p, pte, clean_pte);
-}
-
-/* Allocate a new page table page and hook it in via the given entry. */
-static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry)
-{
-    struct page_info *page;
-    lpae_t *p;
-
-    ASSERT(!p2m_is_valid(*entry));
-
-    page = p2m_alloc_page(p2m->domain);
-    if ( page == NULL )
-        return -ENOMEM;
-
-    page_list_add(page, &p2m->pages);
-
-    p = __map_domain_page(page);
-    clear_page(p);
-
-    if ( p2m->clean_pte )
-        clean_dcache_va_range(p, PAGE_SIZE);
-
-    unmap_domain_page(p);
-
-    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
-
-    return 0;
-}
-
-static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
-                                    p2m_access_t a)
-{
-    int rc;
-
-    if ( !p2m->mem_access_enabled )
-        return 0;
-
-    if ( p2m_access_rwx == a )
-    {
-        radix_tree_delete(&p2m->mem_access_settings, gfn_x(gfn));
-        return 0;
-    }
-
-    rc = radix_tree_insert(&p2m->mem_access_settings, gfn_x(gfn),
-                           radix_tree_int_to_ptr(a));
-    if ( rc == -EEXIST )
-    {
-        /* If a setting already exists, change it to the new one */
-        radix_tree_replace_slot(
-            radix_tree_lookup_slot(
-                &p2m->mem_access_settings, gfn_x(gfn)),
-            radix_tree_int_to_ptr(a));
-        rc = 0;
-    }
-
-    return rc;
+    if ( ipa_bits < p2m_ipa_bits )
+        p2m_ipa_bits = ipa_bits;
 }
 
-/*
- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
-{
-    mfn_t mfn = lpae_get_mfn(pte);
-
-    ASSERT(p2m_is_valid(pte));
-
-    /*
-     * TODO: Handle other p2m types
-     *
-     * It's safe to do the put_page here because page_alloc will
-     * flush the TLBs if the page is reallocated before the end of
-     * this loop.
-     */
-    if ( p2m_is_foreign(pte.p2m.type) )
-    {
-        ASSERT(mfn_valid(mfn));
-        put_page(mfn_to_page(mfn));
-    }
-    /* Detect the xenheap page and mark the stored GFN as invalid. */
-    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
-        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
-}
-
-/* Free lpae sub-tree behind an entry */
-static void p2m_free_entry(struct p2m_domain *p2m,
-                           lpae_t entry, unsigned int level)
-{
-    unsigned int i;
-    lpae_t *table;
-    mfn_t mfn;
-    struct page_info *pg;
-
-    /* Nothing to do if the entry is invalid. */
-    if ( !p2m_is_valid(entry) )
-        return;
-
-    if ( p2m_is_superpage(entry, level) || (level == 3) )
-    {
-#ifdef CONFIG_IOREQ_SERVER
-        /*
-         * If this gets called then either the entry was replaced by an entry
-         * with a different base (valid case) or the shattering of a superpage
-         * has failed (error case).
-         * So, at worst, the spurious mapcache invalidation might be sent.
-         */
-        if ( p2m_is_ram(entry.p2m.type) &&
-             domain_has_ioreq_server(p2m->domain) )
-            ioreq_request_mapcache_invalidate(p2m->domain);
-#endif
-
-        p2m->stats.mappings[level]--;
-        /* Nothing to do if the entry is a super-page. */
-        if ( level == 3 )
-            p2m_put_l3_page(entry);
-        return;
-    }
-
-    table = map_domain_page(lpae_get_mfn(entry));
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-        p2m_free_entry(p2m, *(table + i), level + 1);
-
-    unmap_domain_page(table);
-
-    /*
-     * Make sure all the references in the TLB have been removed before
-     * freing the intermediate page table.
-     * XXX: Should we defer the free of the page table to avoid the
-     * flush?
-     */
-    p2m_tlb_flush_sync(p2m);
-
-    mfn = lpae_get_mfn(entry);
-    ASSERT(mfn_valid(mfn));
-
-    pg = mfn_to_page(mfn);
-
-    page_list_del(pg, &p2m->pages);
-    p2m_free_page(p2m->domain, pg);
-}
-
-static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
-                                unsigned int level, unsigned int target,
-                                const unsigned int *offsets)
-{
-    struct page_info *page;
-    unsigned int i;
-    lpae_t pte, *table;
-    bool rv = true;
-
-    /* Convenience aliases */
-    mfn_t mfn = lpae_get_mfn(*entry);
-    unsigned int next_level = level + 1;
-    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
-
-    /*
-     * This should only be called with target != level and the entry is
-     * a superpage.
-     */
-    ASSERT(level < target);
-    ASSERT(p2m_is_superpage(*entry, level));
-
-    page = p2m_alloc_page(p2m->domain);
-    if ( !page )
-        return false;
-
-    page_list_add(page, &p2m->pages);
-    table = __map_domain_page(page);
-
-    /*
-     * We are either splitting a first level 1G page into 512 second level
-     * 2M pages, or a second level 2M page into 512 third level 4K pages.
-     */
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-    {
-        lpae_t *new_entry = table + i;
-
-        /*
-         * Use the content of the superpage entry and override
-         * the necessary fields. So the correct permission are kept.
-         */
-        pte = *entry;
-        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
-
-        /*
-         * First and second level pages set p2m.table = 0, but third
-         * level entries set p2m.table = 1.
-         */
-        pte.p2m.table = (next_level == 3);
-
-        write_pte(new_entry, pte);
-    }
-
-    /* Update stats */
-    p2m->stats.shattered[level]++;
-    p2m->stats.mappings[level]--;
-    p2m->stats.mappings[next_level] += XEN_PT_LPAE_ENTRIES;
-
-    /*
-     * Shatter superpage in the page to the level we want to make the
-     * changes.
-     * This is done outside the loop to avoid checking the offset to
-     * know whether the entry should be shattered for every entry.
-     */
-    if ( next_level != target )
-        rv = p2m_split_superpage(p2m, table + offsets[next_level],
-                                 level + 1, target, offsets);
-
-    if ( p2m->clean_pte )
-        clean_dcache_va_range(table, PAGE_SIZE);
-
-    unmap_domain_page(table);
-
-    /*
-     * Even if we failed, we should install the newly allocated LPAE
-     * entry. The caller will be in charge to free the sub-tree.
-     */
-    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
-
-    return rv;
-}
-
-/*
- * Insert an entry in the p2m. This should be called with a mapping
- * equal to a page/superpage (4K, 2M, 1G).
- */
-static int __p2m_set_entry(struct p2m_domain *p2m,
-                           gfn_t sgfn,
-                           unsigned int page_order,
-                           mfn_t smfn,
-                           p2m_type_t t,
-                           p2m_access_t a)
-{
-    unsigned int level = 0;
-    unsigned int target = 3 - (page_order / XEN_PT_LPAE_SHIFT);
-    lpae_t *entry, *table, orig_pte;
-    int rc;
-    /* A mapping is removed if the MFN is invalid. */
-    bool removing_mapping = mfn_eq(smfn, INVALID_MFN);
-    DECLARE_OFFSETS(offsets, gfn_to_gaddr(sgfn));
-
-    ASSERT(p2m_is_write_locked(p2m));
-
-    /*
-     * Check if the level target is valid: we only support
-     * 4K - 2M - 1G mapping.
-     */
-    ASSERT(target > 0 && target <= 3);
-
-    table = p2m_get_root_pointer(p2m, sgfn);
-    if ( !table )
-        return -EINVAL;
-
-    for ( level = P2M_ROOT_LEVEL; level < target; level++ )
-    {
-        /*
-         * Don't try to allocate intermediate page table if the mapping
-         * is about to be removed.
-         */
-        rc = p2m_next_level(p2m, removing_mapping,
-                            level, &table, offsets[level]);
-        if ( rc == GUEST_TABLE_MAP_FAILED )
-        {
-            /*
-             * We are here because p2m_next_level has failed to map
-             * the intermediate page table (e.g the table does not exist
-             * and they p2m tree is read-only). It is a valid case
-             * when removing a mapping as it may not exist in the
-             * page table. In this case, just ignore it.
-             */
-            rc = removing_mapping ?  0 : -ENOENT;
-            goto out;
-        }
-        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    entry = table + offsets[level];
-
-    /*
-     * If we are here with level < target, we must be at a leaf node,
-     * and we need to break up the superpage.
-     */
-    if ( level < target )
-    {
-        /* We need to split the original page. */
-        lpae_t split_pte = *entry;
-
-        ASSERT(p2m_is_superpage(*entry, level));
-
-        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
-        {
-            /*
-             * The current super-page is still in-place, so re-increment
-             * the stats.
-             */
-            p2m->stats.mappings[level]++;
-
-            /* Free the allocated sub-tree */
-            p2m_free_entry(p2m, split_pte, level);
-
-            rc = -ENOMEM;
-            goto out;
-        }
-
-        /*
-         * Follow the break-before-sequence to update the entry.
-         * For more details see (D4.7.1 in ARM DDI 0487A.j).
-         */
-        p2m_remove_pte(entry, p2m->clean_pte);
-        p2m_force_tlb_flush_sync(p2m);
-
-        p2m_write_pte(entry, split_pte, p2m->clean_pte);
-
-        /* then move to the level we want to make real changes */
-        for ( ; level < target; level++ )
-        {
-            rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
-
-            /*
-             * The entry should be found and either be a table
-             * or a superpage if level 3 is not targeted
-             */
-            ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
-                   (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
-        }
-
-        entry = table + offsets[level];
-    }
-
-    /*
-     * We should always be there with the correct level because
-     * all the intermediate tables have been installed if necessary.
-     */
-    ASSERT(level == target);
-
-    orig_pte = *entry;
-
-    /*
-     * The radix-tree can only work on 4KB. This is only used when
-     * memaccess is enabled and during shutdown.
-     */
-    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
-           p2m->domain->is_dying);
-    /*
-     * The access type should always be p2m_access_rwx when the mapping
-     * is removed.
-     */
-    ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
-    /*
-     * Update the mem access permission before update the P2M. So we
-     * don't have to revert the mapping if it has failed.
-     */
-    rc = p2m_mem_access_radix_set(p2m, sgfn, a);
-    if ( rc )
-        goto out;
-
-    /*
-     * Always remove the entry in order to follow the break-before-make
-     * sequence when updating the translation table (D4.7.1 in ARM DDI
-     * 0487A.j).
-     */
-    if ( lpae_is_valid(orig_pte) || removing_mapping )
-        p2m_remove_pte(entry, p2m->clean_pte);
-
-    if ( removing_mapping )
-        /* Flush can be deferred if the entry is removed */
-        p2m->need_flush |= !!lpae_is_valid(orig_pte);
-    else
-    {
-        lpae_t pte = mfn_to_p2m_entry(smfn, t, a);
-
-        if ( level < 3 )
-            pte.p2m.table = 0; /* Superpage entry */
-
-        /*
-         * It is necessary to flush the TLB before writing the new entry
-         * to keep coherency when the previous entry was valid.
-         *
-         * Although, it could be defered when only the permissions are
-         * changed (e.g in case of memaccess).
-         */
-        if ( lpae_is_valid(orig_pte) )
-        {
-            if ( likely(!p2m->mem_access_enabled) ||
-                 P2M_CLEAR_PERM(pte) != P2M_CLEAR_PERM(orig_pte) )
-                p2m_force_tlb_flush_sync(p2m);
-            else
-                p2m->need_flush = true;
-        }
-        else if ( !p2m_is_valid(orig_pte) ) /* new mapping */
-            p2m->stats.mappings[level]++;
-
-        p2m_write_pte(entry, pte, p2m->clean_pte);
-
-        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
-                                      gfn_add(sgfn, (1UL << page_order) - 1));
-        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
-    }
-
-    if ( is_iommu_enabled(p2m->domain) &&
-         (lpae_is_valid(orig_pte) || lpae_is_valid(*entry)) )
-    {
-        unsigned int flush_flags = 0;
-
-        if ( lpae_is_valid(orig_pte) )
-            flush_flags |= IOMMU_FLUSHF_modified;
-        if ( lpae_is_valid(*entry) )
-            flush_flags |= IOMMU_FLUSHF_added;
-
-        rc = iommu_iotlb_flush(p2m->domain, _dfn(gfn_x(sgfn)),
-                               1UL << page_order, flush_flags);
-    }
-    else
-        rc = 0;
-
-    /*
-     * Free the entry only if the original pte was valid and the base
-     * is different (to avoid freeing when permission is changed).
-     */
-    if ( p2m_is_valid(orig_pte) &&
-         !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
-        p2m_free_entry(p2m, orig_pte, level);
-
-out:
-    unmap_domain_page(table);
-
-    return rc;
-}
-
-int p2m_set_entry(struct p2m_domain *p2m,
-                  gfn_t sgfn,
-                  unsigned long nr,
-                  mfn_t smfn,
-                  p2m_type_t t,
-                  p2m_access_t a)
-{
-    int rc = 0;
-
-    /*
-     * Any reference taken by the P2M mappings (e.g. foreign mapping) will
-     * be dropped in relinquish_p2m_mapping(). As the P2M will still
-     * be accessible after, we need to prevent mapping to be added when the
-     * domain is dying.
-     */
-    if ( unlikely(p2m->domain->is_dying) )
-        return -ENOMEM;
-
-    while ( nr )
-    {
-        unsigned long mask;
-        unsigned long order;
-
-        /*
-         * Don't take into account the MFN when removing mapping (i.e
-         * MFN_INVALID) to calculate the correct target order.
-         *
-         * XXX: Support superpage mappings if nr is not aligned to a
-         * superpage size.
-         */
-        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
-        mask |= gfn_x(sgfn) | nr;
-
-        /* Always map 4k by 4k when memaccess is enabled */
-        if ( unlikely(p2m->mem_access_enabled) )
-            order = THIRD_ORDER;
-        else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
-            order = FIRST_ORDER;
-        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
-            order = SECOND_ORDER;
-        else
-            order = THIRD_ORDER;
-
-        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
-        if ( rc )
-            break;
-
-        sgfn = gfn_add(sgfn, (1 << order));
-        if ( !mfn_eq(smfn, INVALID_MFN) )
-           smfn = mfn_add(smfn, (1 << order));
-
-        nr -= (1 << order);
-    }
-
-    return rc;
-}
-
-/* Invalidate all entries in the table. The p2m should be write locked. */
-static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn)
-{
-    lpae_t *table;
-    unsigned int i;
-
-    ASSERT(p2m_is_write_locked(p2m));
-
-    table = map_domain_page(mfn);
-
-    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-    {
-        lpae_t pte = table[i];
-
-        /*
-         * Writing an entry can be expensive because it may involve
-         * cleaning the cache. So avoid updating the entry if the valid
-         * bit is already cleared.
-         */
-        if ( !pte.p2m.valid )
-            continue;
-
-        pte.p2m.valid = 0;
-
-        p2m_write_pte(&table[i], pte, p2m->clean_pte);
-    }
-
-    unmap_domain_page(table);
-
-    p2m->need_flush = true;
-}
-
-/*
- * Invalidate all entries in the root page-tables. This is
- * useful to get fault on entry and do an action.
- *
- * p2m_invalidate_root() should not be called when the P2M is shared with
- * the IOMMU because it will cause IOMMU faults.
- */
-void p2m_invalidate_root(struct p2m_domain *p2m)
-{
-    unsigned int i;
-
-    ASSERT(!iommu_use_hap_pt(p2m->domain));
-
-    p2m_write_lock(p2m);
-
-    for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
-        p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i));
-
-    p2m_write_unlock(p2m);
-}
-
-/*
- * Resolve any translation fault due to change in the p2m. This
- * includes break-before-make and valid bit cleared.
- */
-bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned int level = 0;
-    bool resolved = false;
-    lpae_t entry, *table;
-
-    /* Convenience aliases */
-    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
-
-    p2m_write_lock(p2m);
-
-    /* This gfn is higher than the highest the p2m map currently holds */
-    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
-        goto out;
-
-    table = p2m_get_root_pointer(p2m, gfn);
-    /*
-     * The table should always be non-NULL because the gfn is below
-     * p2m->max_mapped_gfn and the root table pages are always present.
-     */
-    if ( !table )
-    {
-        ASSERT_UNREACHABLE();
-        goto out;
-    }
-
-    /*
-     * Go down the page-tables until an entry has the valid bit unset or
-     * a block/page entry has been hit.
-     */
-    for ( level = P2M_ROOT_LEVEL; level <= 3; level++ )
-    {
-        int rc;
-
-        entry = table[offsets[level]];
-
-        if ( level == 3 )
-            break;
-
-        /* Stop as soon as we hit an entry with the valid bit unset. */
-        if ( !lpae_is_valid(entry) )
-            break;
-
-        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
-        if ( rc == GUEST_TABLE_MAP_FAILED )
-            goto out_unmap;
-        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
-            break;
-    }
-
-    /*
-     * If the valid bit of the entry is set, it means someone was playing with
-     * the Stage-2 page table. Nothing to do and mark the fault as resolved.
-     */
-    if ( lpae_is_valid(entry) )
-    {
-        resolved = true;
-        goto out_unmap;
-    }
-
-    /*
-     * The valid bit is unset. If the entry is still not valid then the fault
-     * cannot be resolved, exit and report it.
-     */
-    if ( !p2m_is_valid(entry) )
-        goto out_unmap;
-
-    /*
-     * Now we have an entry with valid bit unset, but still valid from
-     * the P2M point of view.
-     *
-     * If an entry is pointing to a table, each entry of the table will
-     * have its valid bit cleared. This allows a function to clear the
-     * full p2m with just a couple of writes. The valid bit will then be
-     * propagated on the fault.
-     * If an entry is pointing to a block/page, no work to do for now.
-     */
-    if ( lpae_is_table(entry, level) )
-        p2m_invalidate_table(p2m, lpae_get_mfn(entry));
-
-    /*
-     * Now that the work on the entry is done, set the valid bit to prevent
-     * another fault on that entry.
-     */
-    resolved = true;
-    entry.p2m.valid = 1;
-
-    p2m_write_pte(table + offsets[level], entry, p2m->clean_pte);
-
-    /*
-     * No need to flush the TLBs as the modified entry had the valid bit
-     * unset.
-     */
-
-out_unmap:
-    unmap_domain_page(table);
-
-out:
-    p2m_write_unlock(p2m);
-
-    return resolved;
-}
-
-int p2m_insert_mapping(struct domain *d, gfn_t start_gfn, unsigned long nr,
-                       mfn_t mfn, p2m_type_t t)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    int rc;
-
-    p2m_write_lock(p2m);
-    rc = p2m_set_entry(p2m, start_gfn, nr, mfn, t, p2m->default_access);
-    p2m_write_unlock(p2m);
-
-    return rc;
-}
-
-static inline int p2m_remove_mapping(struct domain *d,
-                                     gfn_t start_gfn,
-                                     unsigned long nr,
-                                     mfn_t mfn)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned long i;
-    int rc;
-
-    p2m_write_lock(p2m);
-    /*
-     * Before removing the GFN - MFN mapping for any RAM pages make sure
-     * that there is no difference between what is already mapped and what
-     * is requested to be unmapped.
-     * If they don't match bail out early. For instance, this could happen
-     * if two CPUs are requesting to unmap the same P2M entry concurrently.
-     */
-    for ( i = 0; i < nr; )
-    {
-        unsigned int cur_order;
-        p2m_type_t t;
-        mfn_t mfn_return = p2m_get_entry(p2m, gfn_add(start_gfn, i), &t, NULL,
-                                         &cur_order, NULL);
-
-        if ( p2m_is_any_ram(t) &&
-             (!mfn_valid(mfn) || !mfn_eq(mfn_add(mfn, i), mfn_return)) )
-        {
-            rc = -EILSEQ;
-            goto out;
-        }
-
-        i += (1UL << cur_order) -
-             ((gfn_x(start_gfn) + i) & ((1UL << cur_order) - 1));
-    }
-
-    rc = p2m_set_entry(p2m, start_gfn, nr, INVALID_MFN,
-                       p2m_invalid, p2m_access_rwx);
-
-out:
-    p2m_write_unlock(p2m);
-
-    return rc;
-}
-
-int map_regions_p2mt(struct domain *d,
-                     gfn_t gfn,
-                     unsigned long nr,
-                     mfn_t mfn,
-                     p2m_type_t p2mt)
-{
-    return p2m_insert_mapping(d, gfn, nr, mfn, p2mt);
-}
-
-int unmap_regions_p2mt(struct domain *d,
-                       gfn_t gfn,
-                       unsigned long nr,
-                       mfn_t mfn)
-{
-    return p2m_remove_mapping(d, gfn, nr, mfn);
-}
-
-int map_mmio_regions(struct domain *d,
-                     gfn_t start_gfn,
-                     unsigned long nr,
-                     mfn_t mfn)
-{
-    return p2m_insert_mapping(d, start_gfn, nr, mfn, p2m_mmio_direct_dev);
-}
-
-int unmap_mmio_regions(struct domain *d,
-                       gfn_t start_gfn,
-                       unsigned long nr,
-                       mfn_t mfn)
-{
-    return p2m_remove_mapping(d, start_gfn, nr, mfn);
-}
-
-int map_dev_mmio_page(struct domain *d, gfn_t gfn, mfn_t mfn)
-{
-    int res;
-
-    if ( !iomem_access_permitted(d, mfn_x(mfn), mfn_x(mfn)) )
-        return 0;
-
-    res = p2m_insert_mapping(d, gfn, 1, mfn, p2m_mmio_direct_c);
-    if ( res < 0 )
-    {
-        printk(XENLOG_G_ERR "Unable to map MFN %#"PRI_mfn" in %pd\n",
-               mfn_x(mfn), d);
-        return res;
-    }
-
-    return 0;
-}
-
-int guest_physmap_add_entry(struct domain *d,
-                            gfn_t gfn,
-                            mfn_t mfn,
-                            unsigned long page_order,
-                            p2m_type_t t)
-{
-    return p2m_insert_mapping(d, gfn, (1 << page_order), mfn, t);
-}
-
-int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
-                              unsigned int page_order)
-{
-    return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
-}
-
-int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
-                          unsigned long gfn, mfn_t mfn)
-{
-    struct page_info *page = mfn_to_page(mfn);
-    int rc;
-
-    ASSERT(arch_acquire_resource_check(d));
-
-    if ( !get_page(page, fd) )
-        return -EINVAL;
-
-    /*
-     * It is valid to always use p2m_map_foreign_rw here as if this gets
-     * called then d != fd. A case when d == fd would be rejected by
-     * rcu_lock_remote_domain_by_id() earlier. Put a respective ASSERT()
-     * to catch incorrect usage in future.
-     */
-    ASSERT(d != fd);
-
-    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
-    if ( rc )
-        put_page(page);
-
-    return rc;
-}
-
-static struct page_info *p2m_allocate_root(void)
-{
-    struct page_info *page;
-    unsigned int i;
-
-    page = alloc_domheap_pages(NULL, P2M_ROOT_ORDER, 0);
-    if ( page == NULL )
-        return NULL;
-
-    /* Clear both first level pages */
-    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
-        clear_and_clean_page(page + i);
-
-    return page;
-}
-
-static int p2m_alloc_table(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    p2m->root = p2m_allocate_root();
-    if ( !p2m->root )
-        return -ENOMEM;
-
-    p2m->vttbr = generate_vttbr(p2m->vmid, page_to_mfn(p2m->root));
-
-    /*
-     * Make sure that all TLBs corresponding to the new VMID are flushed
-     * before using it
-     */
-    p2m_write_lock(p2m);
-    p2m_force_tlb_flush_sync(p2m);
-    p2m_write_unlock(p2m);
-
-    return 0;
-}
-
-
-static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
-
-/*
- * VTTBR_EL2 VMID field is 8 or 16 bits. AArch64 may support 16-bit VMID.
- * Using a bitmap here limits us to 256 or 65536 (for AArch64) concurrent
- * domains. The bitmap space will be allocated dynamically based on
- * whether 8 or 16 bit VMIDs are supported.
- */
-static unsigned long *vmid_mask;
-
-static void p2m_vmid_allocator_init(void)
-{
-    /*
-     * allocate space for vmid_mask based on MAX_VMID
-     */
-    vmid_mask = xzalloc_array(unsigned long, BITS_TO_LONGS(MAX_VMID));
-
-    if ( !vmid_mask )
-        panic("Could not allocate VMID bitmap space\n");
-
-    set_bit(INVALID_VMID, vmid_mask);
-}
-
-static int p2m_alloc_vmid(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    int rc, nr;
-
-    spin_lock(&vmid_alloc_lock);
-
-    nr = find_first_zero_bit(vmid_mask, MAX_VMID);
-
-    ASSERT(nr != INVALID_VMID);
-
-    if ( nr == MAX_VMID )
-    {
-        rc = -EBUSY;
-        printk(XENLOG_ERR "p2m.c: dom%d: VMID pool exhausted\n", d->domain_id);
-        goto out;
-    }
-
-    set_bit(nr, vmid_mask);
-
-    p2m->vmid = nr;
-
-    rc = 0;
-
-out:
-    spin_unlock(&vmid_alloc_lock);
-    return rc;
-}
-
-static void p2m_free_vmid(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    spin_lock(&vmid_alloc_lock);
-    if ( p2m->vmid != INVALID_VMID )
-        clear_bit(p2m->vmid, vmid_mask);
-
-    spin_unlock(&vmid_alloc_lock);
-}
-
-int p2m_teardown(struct domain *d, bool allow_preemption)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned long count = 0;
-    struct page_info *pg;
-    unsigned int i;
-    int rc = 0;
-
-    if ( page_list_empty(&p2m->pages) )
-        return 0;
-
-    p2m_write_lock(p2m);
-
-    /*
-     * We are about to free the intermediate page-tables, so clear the
-     * root to prevent any walk from using them.
-     */
-    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
-        clear_and_clean_page(p2m->root + i);
-
-    /*
-     * The domain will not be scheduled anymore, so in theory we should
-     * not need to flush the TLBs. Do it for safety purposes.
-     *
-     * Note that all the devices have already been de-assigned. So we don't
-     * need to flush the IOMMU TLB here.
-     */
-    p2m_force_tlb_flush_sync(p2m);
-
-    while ( (pg = page_list_remove_head(&p2m->pages)) )
-    {
-        p2m_free_page(p2m->domain, pg);
-        count++;
-        /* Arbitrarily preempt every 512 iterations */
-        if ( allow_preemption && !(count % 512) && hypercall_preempt_check() )
-        {
-            rc = -ERESTART;
-            break;
-        }
-    }
-
-    p2m_write_unlock(p2m);
-
-    return rc;
-}
-
-void p2m_final_teardown(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    /* p2m not actually initialized */
-    if ( !p2m->domain )
-        return;
-
-    /*
-     * No need to call relinquish_p2m_mapping() here because
-     * p2m_final_teardown() is called either after domain_relinquish_resources()
-     * where relinquish_p2m_mapping() has been called, or from failure path of
-     * domain_create()/arch_domain_create() where mappings that require
-     * p2m_put_l3_page() should never be created. For the latter case, also see
-     * comment on top of the p2m_set_entry() for more info.
-     */
-
-    BUG_ON(p2m_teardown(d, false));
-    ASSERT(page_list_empty(&p2m->pages));
-
-    while ( p2m_teardown_allocation(d) == -ERESTART )
-        continue; /* No preemption support here */
-    ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
-
-    if ( p2m->root )
-        free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
-
-    p2m->root = NULL;
-
-    p2m_free_vmid(d);
-
-    radix_tree_destroy(&p2m->mem_access_settings, NULL);
-
-    p2m->domain = NULL;
-}
-
-int p2m_init(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    int rc;
-    unsigned int cpu;
-
-    rwlock_init(&p2m->lock);
-    spin_lock_init(&d->arch.paging.lock);
-    INIT_PAGE_LIST_HEAD(&p2m->pages);
-    INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
-
-    p2m->vmid = INVALID_VMID;
-    p2m->max_mapped_gfn = _gfn(0);
-    p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
-
-    p2m->default_access = p2m_access_rwx;
-    p2m->mem_access_enabled = false;
-    radix_tree_init(&p2m->mem_access_settings);
-
-    /*
-     * Some IOMMUs don't support coherent PT walk. When the p2m is
-     * shared with the CPU, Xen has to make sure that the PT changes have
-     * reached the memory
-     */
-    p2m->clean_pte = is_iommu_enabled(d) &&
-        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
-
-    /*
-     * Make sure that the chosen type is able to store any vCPU ID between
-     * 0 and the maximum number of virtual CPUs supported, as well as
-     * INVALID_VCPU_ID.
-     */
-    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
-    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0])* 8)) < INVALID_VCPU_ID);
-
-    for_each_possible_cpu(cpu)
-       p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
-
-    /*
-     * "Trivial" initialisation is now complete.  Set the backpointer so
-     * p2m_teardown() and friends know to do something.
-     */
-    p2m->domain = d;
-
-    rc = p2m_alloc_vmid(d);
-    if ( rc )
-        return rc;
-
-    rc = p2m_alloc_table(d);
-    if ( rc )
-        return rc;
-
-    /*
-     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
-     * when the domain is created. Considering the worst case for page
-     * tables and keeping a buffer, populate 16 pages into the P2M pages
-     * pool here. For GICv3, the above-mentioned P2M mapping is not
-     * necessary, but since the 16 pages allocated here will not be wasted,
-     * populate them unconditionally.
-     */
-    spin_lock(&d->arch.paging.lock);
-    rc = p2m_set_allocation(d, 16, NULL);
-    spin_unlock(&d->arch.paging.lock);
-    if ( rc )
-        return rc;
-
-    return 0;
-}
-
-/*
- * The function will go through the p2m and remove page reference when it
- * is required. The mapping will be removed from the p2m.
- *
- * XXX: See whether the mapping can be left intact in the p2m.
- */
-int relinquish_p2m_mapping(struct domain *d)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned long count = 0;
-    p2m_type_t t;
-    int rc = 0;
-    unsigned int order;
-    gfn_t start, end;
-
-    BUG_ON(!d->is_dying);
-    /* No mappings can be added in the P2M after the P2M lock is released. */
-    p2m_write_lock(p2m);
-
-    start = p2m->lowest_mapped_gfn;
-    end = gfn_add(p2m->max_mapped_gfn, 1);
-
-    for ( ; gfn_x(start) < gfn_x(end);
-          start = gfn_next_boundary(start, order) )
-    {
-        mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, NULL);
-
-        count++;
-        /*
-         * Arbitrarily preempt every 512 iterations.
-         */
-        if ( !(count % 512) && hypercall_preempt_check() )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        /*
-         * p2m_set_entry will take care of removing reference on page
-         * when it is necessary and removing the mapping in the p2m.
-         */
-        if ( !mfn_eq(mfn, INVALID_MFN) )
-        {
-            /*
-             * For valid mapping, the start will always be aligned as
-             * entry will be removed whilst relinquishing.
-             */
-            rc = __p2m_set_entry(p2m, start, order, INVALID_MFN,
-                                 p2m_invalid, p2m_access_rwx);
-            if ( unlikely(rc) )
-            {
-                printk(XENLOG_G_ERR "Unable to remove mapping gfn=%#"PRI_gfn" order=%u from the p2m of domain %d\n", gfn_x(start), order, d->domain_id);
-                break;
-            }
-        }
-    }
-
-    /*
-     * Update lowest_mapped_gfn so on the next call we still start where
-     * we stopped.
-     */
-    p2m->lowest_mapped_gfn = start;
-
-    p2m_write_unlock(p2m);
-
-    return rc;
-}
-
-int p2m_cache_flush_range(struct domain *d, gfn_t *pstart, gfn_t end)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    gfn_t next_block_gfn;
-    gfn_t start = *pstart;
-    mfn_t mfn = INVALID_MFN;
-    p2m_type_t t;
-    unsigned int order;
-    int rc = 0;
-    /* Counter for preemption */
-    unsigned short count = 0;
-
-    /*
-     * The cache flush operation will invalidate the RAM assigned to the
-     * guest in a given range. It will not modify the page table and
-     * flushing the cache whilst the page is used by another CPU is
-     * fine. So using read-lock is fine here.
-     */
-    p2m_read_lock(p2m);
-
-    start = gfn_max(start, p2m->lowest_mapped_gfn);
-    end = gfn_min(end, gfn_add(p2m->max_mapped_gfn, 1));
-
-    next_block_gfn = start;
-
-    while ( gfn_x(start) < gfn_x(end) )
-    {
-        /*
-         * Cleaning the cache for the P2M may take a long time. So we
-         * need to be able to preempt. We will arbitrarily preempt every
-         * time count reaches 512 or above.
-         *
-         * The count will be incremented by:
-         *  - 1 on region skipped
-         *  - 10 for each page requiring a flush
-         */
-        if ( count >= 512 )
-        {
-            if ( softirq_pending(smp_processor_id()) )
-            {
-                rc = -ERESTART;
-                break;
-            }
-            count = 0;
-        }
-
-        /*
-         * We want to flush page by page as:
-         *  - it may not be possible to map the full block (can be up to 1GB)
-         *    in Xen memory
-         *  - we may want to do fine-grained preemption as flushing multiple
-         *    pages in one go may take a long time
-         *
-         * As p2m_get_entry is able to return the size of the mapping
-         * in the p2m, it is pointless to execute it for each page.
-         *
-         * We can optimize it by tracking the gfn of the next
-         * block. So we will only call p2m_get_entry for each block (can
-         * be up to 1GB).
-         */
-        if ( gfn_eq(start, next_block_gfn) )
-        {
-            bool valid;
-
-            mfn = p2m_get_entry(p2m, start, &t, NULL, &order, &valid);
-            next_block_gfn = gfn_next_boundary(start, order);
-
-            if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_any_ram(t) || !valid )
-            {
-                count++;
-                start = next_block_gfn;
-                continue;
-            }
-        }
-
-        count += 10;
-
-        flush_page_to_ram(mfn_x(mfn), false);
-
-        start = gfn_add(start, 1);
-        mfn = mfn_add(mfn, 1);
-    }
-
-    if ( rc != -ERESTART )
-        invalidate_icache();
-
-    p2m_read_unlock(p2m);
-
-    *pstart = start;
-
-    return rc;
-}
-
-/*
- * Clean & invalidate RAM associated to the guest vCPU.
- *
- * The function can only work with the current vCPU and should be called
- * with IRQ enabled as the vCPU could get preempted.
- */
-void p2m_flush_vm(struct vcpu *v)
-{
-    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
-    int rc;
-    gfn_t start = _gfn(0);
-
-    ASSERT(v == current);
-    ASSERT(local_irq_is_enabled());
-    ASSERT(v->arch.need_flush_to_ram);
-
-    do
-    {
-        rc = p2m_cache_flush_range(v->domain, &start, _gfn(ULONG_MAX));
-        if ( rc == -ERESTART )
-            do_softirq();
-    } while ( rc == -ERESTART );
-
-    if ( rc != 0 )
-        gprintk(XENLOG_WARNING,
-                "P2M has not been correctly cleaned (rc = %d)\n",
-                rc);
-
-    /*
-     * Invalidate the p2m to track which pages were modified by the guest
-     * between calls of p2m_flush_vm().
-     */
-    p2m_invalidate_root(p2m);
-
-    v->arch.need_flush_to_ram = false;
-}
-
-/*
- * See note at ARMv7 ARM B1.14.4 (DDI 0406C.c) (TL;DR: S/W ops are not
- * easily virtualized).
- *
- * Main problems:
- *  - S/W ops are local to a CPU (not broadcast)
- *  - We have line migration behind our back (speculation)
- *  - System caches don't support S/W at all (damn!)
- *
- * In the face of the above, the best we can do is to try and convert
- * S/W ops to VA ops. Because the guest is not allowed to infer the S/W
- * to PA mapping, it can only use S/W to nuke the whole cache, which is
- * rather a good thing for us.
- *
- * Also, it is only used when turning caches on/off ("The expected
- * usage of the cache maintenance instructions that operate by set/way
- * is associated with the powerdown and powerup of caches, if this is
- * required by the implementation.").
- *
- * We use the following policy:
- *  - If we trap a S/W operation, we enable VM trapping to detect
- *  caches being turned on/off, and do a full clean.
- *
- *  - We flush the caches on both caches being turned on and off.
- *
- *  - Once the caches are enabled, we stop trapping VM ops.
- */
-void p2m_set_way_flush(struct vcpu *v, struct cpu_user_regs *regs,
-                       const union hsr hsr)
-{
-    /* This function can only work with the current vCPU. */
-    ASSERT(v == current);
-
-    if ( iommu_use_hap_pt(current->domain) )
-    {
-        gprintk(XENLOG_ERR,
-                "The cache should be flushed by VA rather than by set/way.\n");
-        inject_undef_exception(regs, hsr);
-        return;
-    }
-
-    if ( !(v->arch.hcr_el2 & HCR_TVM) )
-    {
-        v->arch.need_flush_to_ram = true;
-        vcpu_hcr_set_flags(v, HCR_TVM);
-    }
-}
-
-void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
-{
-    bool now_enabled = vcpu_has_cache_enabled(v);
-
-    /* This function can only work with the current vCPU. */
-    ASSERT(v == current);
-
-    /*
-     * If switching the MMU+caches on, need to invalidate the caches.
-     * If switching it off, need to clean the caches.
-     * Clean + invalidate does the trick always.
-     */
-    if ( was_enabled != now_enabled )
-        v->arch.need_flush_to_ram = true;
-
-    /* Caches are now on, stop trapping VM ops (until a S/W op) */
-    if ( now_enabled )
-        vcpu_hcr_clear_flags(v, HCR_TVM);
-}
-
-mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
-{
-    return p2m_lookup(d, gfn, NULL);
-}
-
-struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va,
-                                    unsigned long flags)
-{
-    struct domain *d = v->domain;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    struct page_info *page = NULL;
-    paddr_t maddr = 0;
-    uint64_t par;
-    mfn_t mfn;
-    p2m_type_t t;
-
-    /*
-     * XXX: To support a different vCPU, we would need to load the
-     * VTTBR_EL2, TTBR0_EL1, TTBR1_EL1 and SCTLR_EL1
-     */
-    if ( v != current )
-        return NULL;
-
-    /*
-     * The lock is here to protect us against the break-before-make
-     * sequence used when updating the entry.
-     */
-    p2m_read_lock(p2m);
-    par = gvirt_to_maddr(va, &maddr, flags);
-    p2m_read_unlock(p2m);
-
-    /*
-     * gvirt_to_maddr may fail if the entry does not have the valid bit
-     * set. Fall back to the second method:
-     *  1) Translate the VA to IPA using software lookup -> Stage-1 page-table
-     *  may not be accessible because the stage-2 entries may have valid
-     *  bit unset.
-     *  2) Software lookup of the MFN
-     *
-     * Note that when memaccess is enabled, we instead call directly
-     * p2m_mem_access_check_and_get_page(...). Because the function is
-     * a variant of the methods described above, it will be able to
-     * handle entries with valid bit unset.
-     *
-     * TODO: Integrate more nicely memaccess with the rest of the
-     * function.
-     * TODO: Use the fault error in PAR_EL1 to avoid pointless
-     *  translation.
-     */
-    if ( par )
-    {
-        paddr_t ipa;
-        unsigned int s1_perms;
-
-        /*
-         * When memaccess is enabled, the translation GVA to MADDR may
-         * have failed because of a permission fault.
-         */
-        if ( p2m->mem_access_enabled )
-            return p2m_mem_access_check_and_get_page(va, flags, v);
-
-        /*
-         * The software stage-1 table walk can still fail, e.g., if the
-         * GVA is not mapped.
-         */
-        if ( !guest_walk_tables(v, va, &ipa, &s1_perms) )
-        {
-            dprintk(XENLOG_G_DEBUG,
-                    "%pv: Failed to walk page-table va %#"PRIvaddr"\n", v, va);
-            return NULL;
-        }
-
-        mfn = p2m_lookup(d, gaddr_to_gfn(ipa), &t);
-        if ( mfn_eq(INVALID_MFN, mfn) || !p2m_is_ram(t) )
-            return NULL;
-
-        /*
-         * Check permissions that are assumed by the caller. For instance
-         * in case of guestcopy, the caller assumes that the translated
-         * page can be accessed with the requested permissions. If this
-         * is not the case, we should fail.
-         *
-         * Please note that we do not check for the GV2M_EXEC
-         * permission. This is fine because the hardware-based translation
-         * instruction does not test for execute permissions.
-         */
-        if ( (flags & GV2M_WRITE) && !(s1_perms & GV2M_WRITE) )
-            return NULL;
-
-        if ( (flags & GV2M_WRITE) && t != p2m_ram_rw )
-            return NULL;
-    }
-    else
-        mfn = maddr_to_mfn(maddr);
-
-    if ( !mfn_valid(mfn) )
-    {
-        dprintk(XENLOG_G_DEBUG, "%pv: Invalid MFN %#"PRI_mfn"\n",
-                v, mfn_x(mfn));
-        return NULL;
-    }
-
-    page = mfn_to_page(mfn);
-    ASSERT(page);
-
-    if ( unlikely(!get_page(page, d)) )
-    {
-        dprintk(XENLOG_G_DEBUG, "%pv: Failing to acquire the MFN %#"PRI_mfn"\n",
-                v, mfn_x(maddr_to_mfn(maddr)));
-        return NULL;
-    }
-
-    return page;
-}
-
-void __init p2m_restrict_ipa_bits(unsigned int ipa_bits)
-{
-    /*
-     * Calculate the minimum of the maximum IPA bits that any external entity
-     * can support.
-     */
-    if ( ipa_bits < p2m_ipa_bits )
-        p2m_ipa_bits = ipa_bits;
-}
-
-/* VTCR value to be configured by all CPUs. Set only once by the boot CPU */
-static register_t __read_mostly vtcr;
-
-static void setup_virt_paging_one(void *data)
-{
-    WRITE_SYSREG(vtcr, VTCR_EL2);
-
-    /*
-     * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from
-     * entries related to EL1/EL0 translation regime until a guest vCPU
-     * is running. For that, we need to set up VTTBR to point to an empty
-     * page-table and turn on stage-2 translation. The TLB entries
-     * associated with EL1/EL0 translation regime will also be flushed in case
-     * an AT instruction was speculated beforehand.
-     */
-    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-    {
-        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
-        WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_VM, HCR_EL2);
-        isb();
-
-        flush_all_guests_tlb_local();
-    }
-}
-
-void __init setup_virt_paging(void)
-{
-    /* Setup Stage 2 address translation */
-    register_t val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
-
-#ifdef CONFIG_ARM_32
-    if ( p2m_ipa_bits < 40 )
-        panic("P2M: Not able to support %u-bit IPA at the moment\n",
-              p2m_ipa_bits);
-
-    printk("P2M: 40-bit IPA\n");
-    p2m_ipa_bits = 40;
-    val |= VTCR_T0SZ(0x18); /* 40 bit IPA */
-    val |= VTCR_SL0(0x1); /* P2M starts at first level */
-#else /* CONFIG_ARM_64 */
-    static const struct {
-        unsigned int pabits; /* Physical Address Size */
-        unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
-        unsigned int root_order; /* Page order of the root of the p2m */
-        unsigned int sl0;    /* Desired SL0, maximum in comment */
-    } pa_range_info[] __initconst = {
-        /* T0SZ minimum and SL0 maximum from ARM DDI 0487H.a Table D5-6 */
-        /*      PA size, t0sz(min), root-order, sl0(max) */
-        [0] = { 32,      32/*32*/,  0,          1 },
-        [1] = { 36,      28/*28*/,  0,          1 },
-        [2] = { 40,      24/*24*/,  1,          1 },
-        [3] = { 42,      22/*22*/,  3,          1 },
-        [4] = { 44,      20/*20*/,  0,          2 },
-        [5] = { 48,      16/*16*/,  0,          2 },
-        [6] = { 52,      12/*12*/,  4,          2 },
-        [7] = { 0 }  /* Invalid */
-    };
-
-    unsigned int i;
-    unsigned int pa_range = 0x10; /* Larger than any possible value */
-
-    /*
-     * Restrict "p2m_ipa_bits" if needed. As P2M table is always configured
-     * with IPA bits == PA bits, compare against "pabits".
-     */
-    if ( pa_range_info[system_cpuinfo.mm64.pa_range].pabits < p2m_ipa_bits )
-        p2m_ipa_bits = pa_range_info[system_cpuinfo.mm64.pa_range].pabits;
-
-    /*
-     * CPU info sanitization made sure we support 16-bit VMIDs only if all
-     * cores support it.
-     */
-    if ( system_cpuinfo.mm64.vmid_bits == MM64_VMID_16_BITS_SUPPORT )
-        max_vmid = MAX_VMID_16_BIT;
-
-    /* Choose suitable "pa_range" according to the resulted "p2m_ipa_bits". */
-    for ( i = 0; i < ARRAY_SIZE(pa_range_info); i++ )
-    {
-        if ( p2m_ipa_bits == pa_range_info[i].pabits )
-        {
-            pa_range = i;
-            break;
-        }
-    }
-
-    /* pa_range is 4 bits but we don't support all modes */
-    if ( pa_range >= ARRAY_SIZE(pa_range_info) || !pa_range_info[pa_range].pabits )
-        panic("Unknown encoding of ID_AA64MMFR0_EL1.PARange %x\n", pa_range);
-
-    val |= VTCR_PS(pa_range);
-    val |= VTCR_TG0_4K;
-
-    /* Set the VS bit only if 16 bit VMID is supported. */
-    if ( MAX_VMID == MAX_VMID_16_BIT )
-        val |= VTCR_VS;
-    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
-    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
-
-    p2m_root_order = pa_range_info[pa_range].root_order;
-    p2m_root_level = 2 - pa_range_info[pa_range].sl0;
-    p2m_ipa_bits = 64 - pa_range_info[pa_range].t0sz;
-
-    printk("P2M: %d-bit IPA with %d-bit PA and %d-bit VMID\n",
-           p2m_ipa_bits,
-           pa_range_info[pa_range].pabits,
-           ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
-#endif
-    printk("P2M: %d levels with order-%d root, VTCR 0x%"PRIregister"\n",
-           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
-
-    p2m_vmid_allocator_init();
-
-    /* It is not allowed to concatenate a level zero root */
-    BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
-    vtcr = val;
-
-    /*
-     * ARM64_WORKAROUND_AT_SPECULATE requires the root table to be
-     * allocated with all entries zeroed.
-     */
-    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
-    {
-        struct page_info *root;
-
-        root = p2m_allocate_root();
-        if ( !root )
-            panic("Unable to allocate root table for ARM64_WORKAROUND_AT_SPECULATE\n");
-
-        empty_root_mfn = page_to_mfn(root);
-    }
-
-    setup_virt_paging_one(NULL);
-    smp_call_function(setup_virt_paging_one, NULL, 1);
-}
-
-static int cpu_virt_paging_callback(struct notifier_block *nfb,
-                                    unsigned long action,
-                                    void *hcpu)
-{
-    switch ( action )
-    {
-    case CPU_STARTING:
-        ASSERT(system_state != SYS_STATE_boot);
-        setup_virt_paging_one(NULL);
-        break;
-    default:
-        break;
-    }
-
-    return NOTIFY_DONE;
-}
-
-static struct notifier_block cpu_virt_paging_nfb = {
-    .notifier_call = cpu_virt_paging_callback,
-};
-
-static int __init cpu_virt_paging_init(void)
-{
-    register_cpu_notifier(&cpu_virt_paging_nfb);
-
-    return 0;
-}
-/*
- * Initialization of the notifier has to be done at init rather than presmp_init
- * phase because the registered notifier is used to set up virtual paging for
- * non-boot CPUs after the initial virtual paging for all CPUs has already been
- * set up, i.e. when a non-boot CPU is hotplugged after the system has booted.
- * In other words, the notifier should be registered after virtual paging has
- * initially been set up (setup_virt_paging() is called from start_xen()). This
- * is required because the vtcr value has to be set before a notifier can fire.
- */
-__initcall(cpu_virt_paging_init);
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/p2m_mmu.c b/xen/arch/arm/p2m_mmu.c
new file mode 100644
index 0000000000..88a9d8f392
--- /dev/null
+++ b/xen/arch/arm/p2m_mmu.c
@@ -0,0 +1,2295 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/cpu.h>
+#include <xen/domain_page.h>
+#include <xen/iocap.h>
+#include <xen/ioreq.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/softirq.h>
+
+#include <asm/alternative.h>
+#include <asm/event.h>
+#include <asm/flushtlb.h>
+#include <asm/guest_walk.h>
+#include <asm/page.h>
+#include <asm/traps.h>
+
+#define MAX_VMID_8_BIT  (1UL << 8)
+#define MAX_VMID_16_BIT (1UL << 16)
+
+#define INVALID_VMID 0 /* VMID 0 is reserved */
+
+#ifdef CONFIG_ARM_64
+static unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
+/* VMID is by default 8 bit width on AArch64 */
+#define MAX_VMID       max_vmid
+#else
+/* VMID is always 8 bit width on AArch32 */
+#define MAX_VMID        MAX_VMID_8_BIT
+#endif
+
+#ifdef CONFIG_ARM_64
+unsigned int __read_mostly p2m_root_order;
+unsigned int __read_mostly p2m_root_level;
+#endif
+
+#define P2M_ROOT_PAGES    (1<<P2M_ROOT_ORDER)
+
+static mfn_t __read_mostly empty_root_mfn;
+
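+/*
+ * Build a VTTBR_EL2 value: the root table address in the lower bits and
+ * the VMID in bits [63:48].
+ */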
+static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
+{
+    return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
+}
+
+static struct page_info *p2m_alloc_page(struct domain *d)
+{
+    struct page_info *pg;
+
+    spin_lock(&d->arch.paging.lock);
+    /*
+     * For the hardware domain, there should be no limit on the number of
+     * pages that can be allocated, so that the kernel may take advantage of
+     * the extended regions. Hence, allocate p2m pages for the hardware
+     * domain from the heap.
+     */
+    if ( is_hardware_domain(d) )
+    {
+        pg = alloc_domheap_page(NULL, 0);
+        if ( pg == NULL )
+        {
+            printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
+            spin_unlock(&d->arch.paging.lock);
+            return NULL;
+        }
+    }
+    else
+    {
+        pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
+        if ( unlikely(!pg) )
+        {
+            spin_unlock(&d->arch.paging.lock);
+            return NULL;
+        }
+        d->arch.paging.p2m_total_pages--;
+    }
+    spin_unlock(&d->arch.paging.lock);
+
+    return pg;
+}
+
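+/*
+ * Return a P2M page either to the heap (hardware domain) or to the
+ * per-domain P2M pool (other domains).
+ */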
+static void p2m_free_page(struct domain *d, struct page_info *pg)
+{
+    spin_lock(&d->arch.paging.lock);
+    if ( is_hardware_domain(d) )
+        free_domheap_page(pg);
+    else
+    {
+        d->arch.paging.p2m_total_pages++;
+        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
+    }
+    spin_unlock(&d->arch.paging.lock);
+}
+
+/* Release the P2M write lock and do a P2M TLB flush if necessary */
+void p2m_write_unlock(struct p2m_domain *p2m)
+{
+    /*
+     * The final flush is done with the P2M write lock taken to avoid
+     * someone else modifying the P2M before the TLB invalidation has
+     * completed.
+     */
+    p2m_tlb_flush_sync(p2m);
+
+    write_unlock(&p2m->lock);
+}
+
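+/* Dump the P2M mapping statistics of a domain to the console. */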
+void p2m_dump_info(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    p2m_read_lock(p2m);
+    printk("p2m mappings for domain %d (vmid %d):\n",
+           d->domain_id, p2m->vmid);
+    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
+    printk("  1G mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[1], p2m->stats.shattered[1]);
+    printk("  2M mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[2], p2m->stats.shattered[2]);
+    printk("  4K mappings: %ld\n", p2m->stats.mappings[3]);
+    p2m_read_unlock(p2m);
+}
+
+void dump_p2m_lookup(struct domain *d, paddr_t addr)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    printk("dom%d IPA 0x%"PRIpaddr"\n", d->domain_id, addr);
+
+    printk("P2M @ %p mfn:%#"PRI_mfn"\n",
+           p2m->root, mfn_x(page_to_mfn(p2m->root)));
+
+    dump_pt_walk(page_to_maddr(p2m->root), addr,
+                 P2M_ROOT_LEVEL, P2M_ROOT_PAGES);
+}
+
+/*
+ * p2m_save_state and p2m_restore_state work as a pair to work around
+ * ARM64_WORKAROUND_AT_SPECULATE. p2m_save_state will set up VTTBR to
+ * point to the empty page-tables to stop allocating TLB entries.
+ */
+void p2m_save_state(struct vcpu *p)
+{
+    p->arch.sctlr = READ_SYSREG(SCTLR_EL1);
+
+    if ( cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+    {
+        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
+        /*
+         * Ensure VTTBR_EL2 is correctly synchronized so we can restore
+         * the next vCPU context without worrying about AT instruction
+         * speculation.
+         */
+        isb();
+    }
+}
+
+void p2m_restore_state(struct vcpu *n)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(n->domain);
+    uint8_t *last_vcpu_ran;
+
+    if ( is_idle_vcpu(n) )
+        return;
+
+    WRITE_SYSREG(n->arch.sctlr, SCTLR_EL1);
+    WRITE_SYSREG(n->arch.hcr_el2, HCR_EL2);
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE: VTTBR_EL2 should be restored after all
+     * registers associated to EL1/EL0 translations regime have been
+     * synchronized.
+     */
+    asm volatile(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_AT_SPECULATE));
+    WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
+
+    last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
+
+    /*
+     * While we are restoring an out-of-context translation regime
+     * we still need to ensure:
+     *  - VTTBR_EL2 is synchronized before flushing the TLBs
+     *  - All registers for EL1 are synchronized before executing an AT
+     *    instruction targeting S1/S2.
+     */
+    isb();
+
+    /*
+     * Flush local TLB for the domain to prevent wrong TLB translation
+     * when running multiple vCPU of the same domain on a single pCPU.
+     */
+    if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
+        flush_guest_tlb_local();
+
+    *last_vcpu_ran = n->vcpu_id;
+}
+
+/*
+ * Force a synchronous P2M TLB flush.
+ *
+ * Must be called with the p2m lock held.
+ */
+static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    unsigned long flags = 0;
+    uint64_t ovttbr;
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    /*
+     * ARM only provides an instruction to flush TLBs for the current
+     * VMID. So switch to the VTTBR of a given P2M if different.
+     */
+    ovttbr = READ_SYSREG64(VTTBR_EL2);
+    if ( ovttbr != p2m->vttbr )
+    {
+        uint64_t vttbr;
+
+        local_irq_save(flags);
+
+        /*
+         * ARM64_WORKAROUND_AT_SPECULATE: We need to stop AT from allocating
+         * TLB entries because the context is partially modified. We
+         * only need the VMID for flushing the TLBs, so we can generate
+         * a new VTTBR with the VMID to flush and the empty root table.
+         */
+        if ( !cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+            vttbr = p2m->vttbr;
+        else
+            vttbr = generate_vttbr(p2m->vmid, empty_root_mfn);
+
+        WRITE_SYSREG64(vttbr, VTTBR_EL2);
+
+        /* Ensure VTTBR_EL2 is synchronized before flushing the TLBs */
+        isb();
+    }
+
+    flush_guest_tlb();
+
+    if ( ovttbr != READ_SYSREG64(VTTBR_EL2) )
+    {
+        WRITE_SYSREG64(ovttbr, VTTBR_EL2);
+        /* Ensure VTTBR_EL2 is back in place before continuing. */
+        isb();
+        local_irq_restore(flags);
+    }
+
+    p2m->need_flush = false;
+}
+
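+/* Perform the deferred P2M TLB flush, if one is pending. */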
+void p2m_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    if ( p2m->need_flush )
+        p2m_force_tlb_flush_sync(p2m);
+}
+
+/*
+ * Find and map the root page table. The caller is responsible for
+ * unmapping the table.
+ *
+ * The function will return NULL if the offset of the root table is
+ * invalid.
+ */
+static lpae_t *p2m_get_root_pointer(struct p2m_domain *p2m,
+                                    gfn_t gfn)
+{
+    unsigned long root_table;
+
+    /*
+     * While the root table index is the offset from the previous level,
+     * we can't use (P2M_ROOT_LEVEL - 1) because the root level might be
+     * 0. Yet we still want to check if all the unused bits are zeroed.
+     */
+    root_table = gfn_x(gfn) >> (XEN_PT_LEVEL_ORDER(P2M_ROOT_LEVEL) +
+                                XEN_PT_LPAE_SHIFT);
+    if ( root_table >= P2M_ROOT_PAGES )
+        return NULL;
+
+    return __map_domain_page(p2m->root + root_table);
+}
+
+/*
+ * Look up the mem access setting of a domain's GFN in the radix tree.
+ * The entry associated with the GFN is considered valid.
+ */
+static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m, gfn_t gfn)
+{
+    void *ptr;
+
+    if ( !p2m->mem_access_enabled )
+        return p2m->default_access;
+
+    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
+    if ( !ptr )
+        return p2m_access_rwx;
+    else
+        return radix_tree_ptr_to_int(ptr);
+}
+
+/*
+ * In the case of the P2M, the valid bit is used for other purposes. Use
+ * the type to check whether an entry is valid.
+ */
+static inline bool p2m_is_valid(lpae_t pte)
+{
+    return pte.p2m.type != p2m_invalid;
+}
+
+/*
+ * lpae_is_* helpers don't check whether the valid bit is set in the
+ * PTE. Provide our own overlay to check the valid bit.
+ */
+static inline bool p2m_is_mapping(lpae_t pte, unsigned int level)
+{
+    return p2m_is_valid(pte) && lpae_is_mapping(pte, level);
+}
+
+static inline bool p2m_is_superpage(lpae_t pte, unsigned int level)
+{
+    return p2m_is_valid(pte) && lpae_is_superpage(pte, level);
+}
+
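+/* Return values of p2m_next_level() */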
+#define GUEST_TABLE_MAP_FAILED 0
+#define GUEST_TABLE_SUPER_PAGE 1
+#define GUEST_TABLE_NORMAL_PAGE 2
+
+static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry);
+
+/*
+ * Take the currently mapped table, find the corresponding GFN entry,
+ * and map the next table, if available. The previous table will be
+ * unmapped if the next level was mapped (e.g GUEST_TABLE_NORMAL_PAGE
+ * returned).
+ *
+ * The read_only parameter indicates whether intermediate tables may be
+ * allocated when not present: if it is set, the walk fails instead.
+ *
+ * Return values:
+ *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
+ *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
+ */
+static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
+                          unsigned int level, lpae_t **table,
+                          unsigned int offset)
+{
+    lpae_t *entry;
+    int ret;
+    mfn_t mfn;
+
+    entry = *table + offset;
+
+    if ( !p2m_is_valid(*entry) )
+    {
+        if ( read_only )
+            return GUEST_TABLE_MAP_FAILED;
+
+        ret = p2m_create_table(p2m, entry);
+        if ( ret )
+            return GUEST_TABLE_MAP_FAILED;
+    }
+
+    /* The function p2m_next_level is never called at the 3rd level */
+    ASSERT(level < 3);
+    if ( p2m_is_mapping(*entry, level) )
+        return GUEST_TABLE_SUPER_PAGE;
+
+    mfn = lpae_get_mfn(*entry);
+
+    unmap_domain_page(*table);
+    *table = map_domain_page(mfn);
+
+    return GUEST_TABLE_NORMAL_PAGE;
+}
+
+/*
+ * Get the details of a given gfn.
+ *
+ * If the entry is present, the associated MFN will be returned and the
+ * access and type filled up. The page_order will correspond to the
+ * order of the mapping in the page table (i.e it could be a superpage).
+ *
+ * If the entry is not present, INVALID_MFN will be returned and the
+ * page_order will be set according to the order of the invalid range.
+ *
+ * valid will contain the value of bit[0] (i.e. the valid bit) of the
+ * entry.
+ */
+mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+                    p2m_type_t *t, p2m_access_t *a,
+                    unsigned int *page_order,
+                    bool *valid)
+{
+    paddr_t addr = gfn_to_gaddr(gfn);
+    unsigned int level = 0;
+    lpae_t entry, *table;
+    int rc;
+    mfn_t mfn = INVALID_MFN;
+    p2m_type_t _t;
+    DECLARE_OFFSETS(offsets, addr);
+
+    ASSERT(p2m_is_locked(p2m));
+    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
+
+    /* Allow t to be NULL */
+    t = t ?: &_t;
+
+    *t = p2m_invalid;
+
+    if ( valid )
+        *valid = false;
+
+    /* XXX: Check if the mapping is lower than the mapped gfn */
+
+    /* This gfn is higher than the highest the p2m map currently holds */
+    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
+    {
+        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
+            if ( (gfn_x(gfn) & (XEN_PT_LEVEL_MASK(level) >> PAGE_SHIFT)) >
+                 gfn_x(p2m->max_mapped_gfn) )
+                break;
+
+        goto out;
+    }
+
+    table = p2m_get_root_pointer(p2m, gfn);
+
+    /*
+     * the table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    if ( !table )
+    {
+        ASSERT_UNREACHABLE();
+        level = P2M_ROOT_LEVEL;
+        goto out;
+    }
+
+    for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
+    {
+        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+            goto out_unmap;
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    entry = table[offsets[level]];
+
+    if ( p2m_is_valid(entry) )
+    {
+        *t = entry.p2m.type;
+
+        if ( a )
+            *a = p2m_mem_access_radix_get(p2m, gfn);
+
+        mfn = lpae_get_mfn(entry);
+        /*
+         * The entry may point to a superpage. Find the MFN associated
+         * to the GFN.
+         */
+        mfn = mfn_add(mfn,
+                      gfn_x(gfn) & ((1UL << XEN_PT_LEVEL_ORDER(level)) - 1));
+
+        if ( valid )
+            *valid = lpae_is_valid(entry);
+    }
+
+out_unmap:
+    unmap_domain_page(table);
+
+out:
+    if ( page_order )
+        *page_order = XEN_PT_LEVEL_ORDER(level);
+
+    return mfn;
+}
+
+mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
+{
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    p2m_read_lock(p2m);
+    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL, NULL);
+    p2m_read_unlock(p2m);
+
+    return mfn;
+}
+
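+/*
+ * Look up the page backing a GFN and take a reference on it. Return NULL
+ * if the GFN is not backed by RAM or the reference cannot be taken.
+ */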
+struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn,
+                                        p2m_type_t *t)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    mfn_t mfn = p2m_lookup(d, gfn, &p2mt);
+
+    if ( t )
+        *t = p2mt;
+
+    if ( !p2m_is_any_ram(p2mt) )
+        return NULL;
+
+    if ( !mfn_valid(mfn) )
+        return NULL;
+
+    page = mfn_to_page(mfn);
+
+    /*
+     * get_page won't work on foreign mapping because the page doesn't
+     * belong to the current domain.
+     */
+    if ( p2m_is_foreign(p2mt) )
+    {
+        struct domain *fdom = page_get_owner_and_reference(page);
+        ASSERT(fdom != NULL);
+        ASSERT(fdom != d);
+        return page;
+    }
+
+    return get_page(page, d) ? page : NULL;
+}
+
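+/*
+ * Compute the read/write/execute bits of an entry from the P2M type and
+ * then restrict them further with the requested access permission.
+ */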
+static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a)
+{
+    /* First apply type permissions */
+    switch ( t )
+    {
+    case p2m_ram_rw:
+        e->p2m.xn = 0;
+        e->p2m.write = 1;
+        break;
+
+    case p2m_ram_ro:
+        e->p2m.xn = 0;
+        e->p2m.write = 0;
+        break;
+
+    case p2m_iommu_map_rw:
+    case p2m_map_foreign_rw:
+    case p2m_grant_map_rw:
+    case p2m_mmio_direct_dev:
+    case p2m_mmio_direct_nc:
+    case p2m_mmio_direct_c:
+        e->p2m.xn = 1;
+        e->p2m.write = 1;
+        break;
+
+    case p2m_iommu_map_ro:
+    case p2m_map_foreign_ro:
+    case p2m_grant_map_ro:
+    case p2m_invalid:
+        e->p2m.xn = 1;
+        e->p2m.write = 0;
+        break;
+
+    case p2m_max_real_type:
+        BUG();
+        break;
+    }
+
+    /* Then restrict with access permissions */
+    switch ( a )
+    {
+    case p2m_access_rwx:
+        break;
+    case p2m_access_wx:
+        e->p2m.read = 0;
+        break;
+    case p2m_access_rw:
+        e->p2m.xn = 1;
+        break;
+    case p2m_access_w:
+        e->p2m.read = 0;
+        e->p2m.xn = 1;
+        break;
+    case p2m_access_rx:
+    case p2m_access_rx2rw:
+        e->p2m.write = 0;
+        break;
+    case p2m_access_x:
+        e->p2m.write = 0;
+        e->p2m.read = 0;
+        break;
+    case p2m_access_r:
+        e->p2m.write = 0;
+        e->p2m.xn = 1;
+        break;
+    case p2m_access_n:
+    case p2m_access_n2rwx:
+        e->p2m.read = e->p2m.write = 0;
+        e->p2m.xn = 1;
+        break;
+    }
+}
+
+static lpae_t mfn_to_p2m_entry(mfn_t mfn, p2m_type_t t, p2m_access_t a)
+{
+    /*
+     * sh, xn and write bit will be defined in the following switches
+     * based on mattr and t.
+     */
+    lpae_t e = (lpae_t) {
+        .p2m.af = 1,
+        .p2m.read = 1,
+        .p2m.table = 1,
+        .p2m.valid = 1,
+        .p2m.type = t,
+    };
+
+    BUILD_BUG_ON(p2m_max_real_type > (1 << 4));
+
+    switch ( t )
+    {
+    case p2m_mmio_direct_dev:
+        e.p2m.mattr = MATTR_DEV;
+        e.p2m.sh = LPAE_SH_OUTER;
+        break;
+
+    case p2m_mmio_direct_c:
+        e.p2m.mattr = MATTR_MEM;
+        e.p2m.sh = LPAE_SH_OUTER;
+        break;
+
+    /*
+     * ARM ARM: Overlaying the shareability attribute (DDI
+     * 0406C.b B3-1376 to 1377)
+     *
+     * A memory region with a resultant memory type attribute of Normal,
+     * and a resultant cacheability attribute of Inner Non-cacheable,
+     * Outer Non-cacheable, must have a resultant shareability attribute
+     * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
+     *
+     * On ARMv8 shareability is ignored and explicitly treated as Outer
+     * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
+     * See the note for table D4-40, in page 1788 of the ARM DDI 0487A.j.
+     */
+    case p2m_mmio_direct_nc:
+        e.p2m.mattr = MATTR_MEM_NC;
+        e.p2m.sh = LPAE_SH_OUTER;
+        break;
+
+    default:
+        e.p2m.mattr = MATTR_MEM;
+        e.p2m.sh = LPAE_SH_INNER;
+    }
+
+    p2m_set_permission(&e, t, a);
+
+    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
+
+    lpae_set_mfn(e, mfn);
+
+    return e;
+}
+
+/* Generate table entry with correct attributes. */
+static lpae_t page_to_p2m_table(struct page_info *page)
+{
+    /*
+     * The access value does not matter because the hardware will ignore
+     * the permission fields for table entry.
+     *
+     * We use p2m_ram_rw so the entry has a valid type. This is important
+     * for p2m_is_valid() to return valid on table entries.
+     */
+    return mfn_to_p2m_entry(page_to_mfn(page), p2m_ram_rw, p2m_access_rwx);
+}
+
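+/* Write a P2M entry, cleaning the data cache when clean_pte is set. */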
+static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool clean_pte)
+{
+    write_pte(p, pte);
+    if ( clean_pte )
+        clean_dcache(*p);
+}
+
+static inline void p2m_remove_pte(lpae_t *p, bool clean_pte)
+{
+    lpae_t pte;
+
+    memset(&pte, 0x00, sizeof(pte));
+    p2m_write_pte(p, pte, clean_pte);
+}
+
+/* Allocate a new page table page and hook it in via the given entry. */
+static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry)
+{
+    struct page_info *page;
+    lpae_t *p;
+
+    ASSERT(!p2m_is_valid(*entry));
+
+    page = p2m_alloc_page(p2m->domain);
+    if ( page == NULL )
+        return -ENOMEM;
+
+    page_list_add(page, &p2m->pages);
+
+    p = __map_domain_page(page);
+    clear_page(p);
+
+    if ( p2m->clean_pte )
+        clean_dcache_va_range(p, PAGE_SIZE);
+
+    unmap_domain_page(p);
+
+    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
+
+    return 0;
+}
+
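+/*
+ * Record the access permission of a GFN in the radix tree. The default
+ * p2m_access_rwx is never stored: any existing entry is deleted instead.
+ */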
+static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
+                                    p2m_access_t a)
+{
+    int rc;
+
+    if ( !p2m->mem_access_enabled )
+        return 0;
+
+    if ( p2m_access_rwx == a )
+    {
+        radix_tree_delete(&p2m->mem_access_settings, gfn_x(gfn));
+        return 0;
+    }
+
+    rc = radix_tree_insert(&p2m->mem_access_settings, gfn_x(gfn),
+                           radix_tree_int_to_ptr(a));
+    if ( rc == -EEXIST )
+    {
+        /* If a setting already exists, change it to the new one */
+        radix_tree_replace_slot(
+            radix_tree_lookup_slot(
+                &p2m->mem_access_settings, gfn_x(gfn)),
+            radix_tree_int_to_ptr(a));
+        rc = 0;
+    }
+
+    return rc;
+}
+
+/*
+ * Put any references on the single 4K page referenced by pte.
+ * TODO: Handle superpages, for now we only take special references for leaf
+ * pages (specifically foreign ones, which can't be super mapped today).
+ */
+static void p2m_put_l3_page(const lpae_t pte)
+{
+    mfn_t mfn = lpae_get_mfn(pte);
+
+    ASSERT(p2m_is_valid(pte));
+
+    /*
+     * TODO: Handle other p2m types
+     *
+     * It's safe to do the put_page here because page_alloc will
+     * flush the TLBs if the page is reallocated before the end of
+     * this loop.
+     */
+    if ( p2m_is_foreign(pte.p2m.type) )
+    {
+        ASSERT(mfn_valid(mfn));
+        put_page(mfn_to_page(mfn));
+    }
+    /* Detect the xenheap page and mark the stored GFN as invalid. */
+    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
+}
+
+/* Free lpae sub-tree behind an entry */
+static void p2m_free_entry(struct p2m_domain *p2m,
+                           lpae_t entry, unsigned int level)
+{
+    unsigned int i;
+    lpae_t *table;
+    mfn_t mfn;
+    struct page_info *pg;
+
+    /* Nothing to do if the entry is invalid. */
+    if ( !p2m_is_valid(entry) )
+        return;
+
+    if ( p2m_is_superpage(entry, level) || (level == 3) )
+    {
+#ifdef CONFIG_IOREQ_SERVER
+        /*
+         * If this gets called then either the entry was replaced by an entry
+         * with a different base (valid case) or the shattering of a superpage
+         * has failed (error case).
+         * So, at worst, the spurious mapcache invalidation might be sent.
+         */
+        if ( p2m_is_ram(entry.p2m.type) &&
+             domain_has_ioreq_server(p2m->domain) )
+            ioreq_request_mapcache_invalidate(p2m->domain);
+#endif
+
+        p2m->stats.mappings[level]--;
+        /* Nothing else to do for a superpage; only 4K pages hold references to put. */
+        if ( level == 3 )
+            p2m_put_l3_page(entry);
+        return;
+    }
+
+    table = map_domain_page(lpae_get_mfn(entry));
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+        p2m_free_entry(p2m, *(table + i), level + 1);
+
+    unmap_domain_page(table);
+
+    /*
+     * Make sure all the references in the TLB have been removed before
+     * freeing the intermediate page table.
+     * XXX: Should we defer the free of the page table to avoid the
+     * flush?
+     */
+    p2m_tlb_flush_sync(p2m);
+
+    mfn = lpae_get_mfn(entry);
+    ASSERT(mfn_valid(mfn));
+
+    pg = mfn_to_page(mfn);
+
+    page_list_del(pg, &p2m->pages);
+    p2m_free_page(p2m->domain, pg);
+}
+
+static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
+                                unsigned int level, unsigned int target,
+                                const unsigned int *offsets)
+{
+    struct page_info *page;
+    unsigned int i;
+    lpae_t pte, *table;
+    bool rv = true;
+
+    /* Convenience aliases */
+    mfn_t mfn = lpae_get_mfn(*entry);
+    unsigned int next_level = level + 1;
+    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
+
+    /*
+     * This should only be called with target != level and the entry is
+     * a superpage.
+     */
+    ASSERT(level < target);
+    ASSERT(p2m_is_superpage(*entry, level));
+
+    page = p2m_alloc_page(p2m->domain);
+    if ( !page )
+        return false;
+
+    page_list_add(page, &p2m->pages);
+    table = __map_domain_page(page);
+
+    /*
+     * We are either splitting a first level 1G page into 512 second level
+     * 2M pages, or a second level 2M page into 512 third level 4K pages.
+     */
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        lpae_t *new_entry = table + i;
+
+        /*
+         * Use the content of the superpage entry and override
+         * the necessary fields. So the correct permissions are kept.
+         */
+        pte = *entry;
+        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
+
+        /*
+         * First and second level pages set p2m.table = 0, but third
+         * level entries set p2m.table = 1.
+         */
+        pte.p2m.table = (next_level == 3);
+
+        write_pte(new_entry, pte);
+    }
+
+    /* Update stats */
+    p2m->stats.shattered[level]++;
+    p2m->stats.mappings[level]--;
+    p2m->stats.mappings[next_level] += XEN_PT_LPAE_ENTRIES;
+
+    /*
+     * Shatter the superpage down to the level at which we want to make the
+     * changes.
+     * This is done outside the loop to avoid checking the offset to
+     * know whether the entry should be shattered for every entry.
+     */
+    if ( next_level != target )
+        rv = p2m_split_superpage(p2m, table + offsets[next_level],
+                                 level + 1, target, offsets);
+
+    if ( p2m->clean_pte )
+        clean_dcache_va_range(table, PAGE_SIZE);
+
+    unmap_domain_page(table);
+
+    /*
+     * Even if we failed, we should install the newly allocated LPAE
+     * entry. The caller will be in charge of freeing the sub-tree.
+     */
+    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte);
+
+    return rv;
+}
+
+/*
+ * Insert an entry in the p2m. This should be called with a mapping
+ * equal to a page/superpage (4K, 2M, 1G).
+ */
+static int __p2m_set_entry(struct p2m_domain *p2m,
+                           gfn_t sgfn,
+                           unsigned int page_order,
+                           mfn_t smfn,
+                           p2m_type_t t,
+                           p2m_access_t a)
+{
+    unsigned int level = 0;
+    unsigned int target = 3 - (page_order / XEN_PT_LPAE_SHIFT);
+    lpae_t *entry, *table, orig_pte;
+    int rc;
+    /* A mapping is removed if the MFN is invalid. */
+    bool removing_mapping = mfn_eq(smfn, INVALID_MFN);
+    DECLARE_OFFSETS(offsets, gfn_to_gaddr(sgfn));
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    /*
+     * Check if the level target is valid: we only support
+     * 4K - 2M - 1G mapping.
+     */
+    ASSERT(target > 0 && target <= 3);
+
+    table = p2m_get_root_pointer(p2m, sgfn);
+    if ( !table )
+        return -EINVAL;
+
+    for ( level = P2M_ROOT_LEVEL; level < target; level++ )
+    {
+        /*
+         * Don't try to allocate intermediate page table if the mapping
+         * is about to be removed.
+         */
+        rc = p2m_next_level(p2m, removing_mapping,
+                            level, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+        {
+            /*
+             * We are here because p2m_next_level has failed to map
+             * the intermediate page table (e.g. the table does not exist
+             * and the p2m tree is read-only). It is a valid case
+             * when removing a mapping as it may not exist in the
+             * page table. In this case, just ignore it.
+             */
+            rc = removing_mapping ?  0 : -ENOENT;
+            goto out;
+        }
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    entry = table + offsets[level];
+
+    /*
+     * If we are here with level < target, we must be at a leaf node,
+     * and we need to break up the superpage.
+     */
+    if ( level < target )
+    {
+        /* We need to split the original page. */
+        lpae_t split_pte = *entry;
+
+        ASSERT(p2m_is_superpage(*entry, level));
+
+        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
+        {
+            /*
+             * The current super-page is still in-place, so re-increment
+             * the stats.
+             */
+            p2m->stats.mappings[level]++;
+
+            /* Free the allocated sub-tree */
+            p2m_free_entry(p2m, split_pte, level);
+
+            rc = -ENOMEM;
+            goto out;
+        }
+
+        /*
+     * Follow the break-before-make sequence to update the entry.
+         * For more details see (D4.7.1 in ARM DDI 0487A.j).
+         */
+        p2m_remove_pte(entry, p2m->clean_pte);
+        p2m_force_tlb_flush_sync(p2m);
+
+        p2m_write_pte(entry, split_pte, p2m->clean_pte);
+
+        /* then move to the level we want to make real changes */
+        for ( ; level < target; level++ )
+        {
+            rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+
+            /*
+             * The entry should be found and either be a table
+             * or a superpage if level 3 is not targeted
+             */
+            ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
+                   (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
+        }
+
+        entry = table + offsets[level];
+    }
+
+    /*
+     * We should always be there with the correct level because
+     * all the intermediate tables have been installed if necessary.
+     */
+    ASSERT(level == target);
+
+    orig_pte = *entry;
+
+    /*
+     * The radix-tree can only work on 4KB. This is only used when
+     * memaccess is enabled and during shutdown.
+     */
+    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
+           p2m->domain->is_dying);
+    /*
+     * The access type should always be p2m_access_rwx when the mapping
+     * is removed.
+     */
+    ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
+    /*
+     * Update the mem access permission before updating the P2M. So we
+     * don't have to revert the mapping if it has failed.
+     */
+    rc = p2m_mem_access_radix_set(p2m, sgfn, a);
+    if ( rc )
+        goto out;
+
+    /*
+     * Always remove the entry in order to follow the break-before-make
+     * sequence when updating the translation table (D4.7.1 in ARM DDI
+     * 0487A.j).
+     */
+    if ( lpae_is_valid(orig_pte) || removing_mapping )
+        p2m_remove_pte(entry, p2m->clean_pte);
+
+    if ( removing_mapping )
+        /* Flush can be deferred if the entry is removed */
+        p2m->need_flush |= !!lpae_is_valid(orig_pte);
+    else
+    {
+        lpae_t pte = mfn_to_p2m_entry(smfn, t, a);
+
+        if ( level < 3 )
+            pte.p2m.table = 0; /* Superpage entry */
+
+        /*
+         * It is necessary to flush the TLB before writing the new entry
+         * to keep coherency when the previous entry was valid.
+         *
+         * However, it could be deferred when only the permissions are
+         * changed (e.g in case of memaccess).
+         */
+        if ( lpae_is_valid(orig_pte) )
+        {
+            if ( likely(!p2m->mem_access_enabled) ||
+                 P2M_CLEAR_PERM(pte) != P2M_CLEAR_PERM(orig_pte) )
+                p2m_force_tlb_flush_sync(p2m);
+            else
+                p2m->need_flush = true;
+        }
+        else if ( !p2m_is_valid(orig_pte) ) /* new mapping */
+            p2m->stats.mappings[level]++;
+
+        p2m_write_pte(entry, pte, p2m->clean_pte);
+
+        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
+                                      gfn_add(sgfn, (1UL << page_order) - 1));
+        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
+    }
+
+    if ( is_iommu_enabled(p2m->domain) &&
+         (lpae_is_valid(orig_pte) || lpae_is_valid(*entry)) )
+    {
+        unsigned int flush_flags = 0;
+
+        if ( lpae_is_valid(orig_pte) )
+            flush_flags |= IOMMU_FLUSHF_modified;
+        if ( lpae_is_valid(*entry) )
+            flush_flags |= IOMMU_FLUSHF_added;
+
+        rc = iommu_iotlb_flush(p2m->domain, _dfn(gfn_x(sgfn)),
+                               1UL << page_order, flush_flags);
+    }
+    else
+        rc = 0;
+
+    /*
+     * Free the entry only if the original pte was valid and the base
+     * is different (to avoid freeing when permission is changed).
+     */
+    if ( p2m_is_valid(orig_pte) &&
+         !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
+        p2m_free_entry(p2m, orig_pte, level);
+
+out:
+    unmap_domain_page(table);
+
+    return rc;
+}
+
+int p2m_set_entry(struct p2m_domain *p2m,
+                  gfn_t sgfn,
+                  unsigned long nr,
+                  mfn_t smfn,
+                  p2m_type_t t,
+                  p2m_access_t a)
+{
+    int rc = 0;
+
+    /*
+     * Any reference taken by the P2M mappings (e.g. foreign mapping) will
+     * be dropped in relinquish_p2m_mapping(). As the P2M will still
+     * be accessible afterwards, we need to prevent mappings from being
+     * added when the domain is dying.
+     */
+    if ( unlikely(p2m->domain->is_dying) )
+        return -ENOMEM;
+
+    while ( nr )
+    {
+        unsigned long mask;
+        unsigned long order;
+
+        /*
+         * Don't take into account the MFN when removing a mapping (i.e.
+         * INVALID_MFN) to calculate the correct target order.
+         *
+         * XXX: Support superpage mappings if nr is not aligned to a
+         * superpage size.
+         */
+        mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
+        mask |= gfn_x(sgfn) | nr;
+
+        /* Always map 4k by 4k when memaccess is enabled */
+        if ( unlikely(p2m->mem_access_enabled) )
+            order = THIRD_ORDER;
+        else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
+            order = FIRST_ORDER;
+        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
+            order = SECOND_ORDER;
+        else
+            order = THIRD_ORDER;
+
+        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
+        if ( rc )
+            break;
+
+        sgfn = gfn_add(sgfn, (1 << order));
+        if ( !mfn_eq(smfn, INVALID_MFN) )
+           smfn = mfn_add(smfn, (1 << order));
+
+        nr -= (1 << order);
+    }
+
+    return rc;
+}
+
+/* Invalidate all entries in the table. The p2m should be write locked. */
+static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn)
+{
+    lpae_t *table;
+    unsigned int i;
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    table = map_domain_page(mfn);
+
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        lpae_t pte = table[i];
+
+        /*
+         * Writing an entry can be expensive because it may involve
+         * cleaning the cache. So avoid updating the entry if the valid
+         * bit is already cleared.
+         */
+        if ( !pte.p2m.valid )
+            continue;
+
+        pte.p2m.valid = 0;
+
+        p2m_write_pte(&table[i], pte, p2m->clean_pte);
+    }
+
+    unmap_domain_page(table);
+
+    p2m->need_flush = true;
+}
+
+/*
+ * Invalidate all entries in the root page-tables. This is
+ * useful to get a fault on entry and take an action.
+ *
+ * p2m_invalidate_root() should not be called when the P2M is shared with
+ * the IOMMU because it will cause an IOMMU fault.
+ */
+void p2m_invalidate_root(struct p2m_domain *p2m)
+{
+    unsigned int i;
+
+    ASSERT(!iommu_use_hap_pt(p2m->domain));
+
+    p2m_write_lock(p2m);
+
+    for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
+        p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i));
+
+    p2m_write_unlock(p2m);
+}
+
+/*
+ * Resolve any translation fault due to change in the p2m. This
+ * includes break-before-make and valid bit cleared.
+ */
+bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    unsigned int level = 0;
+    bool resolved = false;
+    lpae_t entry, *table;
+
+    /* Convenience aliases */
+    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
+
+    p2m_write_lock(p2m);
+
+    /* This gfn is higher than the highest the p2m map currently holds */
+    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
+        goto out;
+
+    table = p2m_get_root_pointer(p2m, gfn);
+    /*
+     * The table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    if ( !table )
+    {
+        ASSERT_UNREACHABLE();
+        goto out;
+    }
+
+    /*
+     * Go down the page-tables until an entry has the valid bit unset or
+     * a block/page entry has been hit.
+     */
+    for ( level = P2M_ROOT_LEVEL; level <= 3; level++ )
+    {
+        int rc;
+
+        entry = table[offsets[level]];
+
+        if ( level == 3 )
+            break;
+
+        /* Stop as soon as we hit an entry with the valid bit unset. */
+        if ( !lpae_is_valid(entry) )
+            break;
+
+        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+            goto out_unmap;
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    /*
+     * If the valid bit of the entry is set, it means someone was playing with
+     * the Stage-2 page table. Nothing to do and mark the fault as resolved.
+     */
+    if ( lpae_is_valid(entry) )
+    {
+        resolved = true;
+        goto out_unmap;
+    }
+
+    /*
+     * The valid bit is unset. If the entry is still not valid then the fault
+     * cannot be resolved, exit and report it.
+     */
+    if ( !p2m_is_valid(entry) )
+        goto out_unmap;
+
+    /*
+     * Now we have an entry with valid bit unset, but still valid from
+     * the P2M point of view.
+     *
+     * If an entry is pointing to a table, each entry of the table will
+     * have their valid bit cleared. This allows a function to clear the
+     * full p2m with just a couple of writes. The valid bit will then be
+     * propagated on the fault.
+     * If an entry is pointing to a block/page, no work to do for now.
+     */
+    if ( lpae_is_table(entry, level) )
+        p2m_invalidate_table(p2m, lpae_get_mfn(entry));
+
+    /*
+     * Now that the work on the entry is done, set the valid bit to prevent
+     * another fault on that entry.
+     */
+    resolved = true;
+    entry.p2m.valid = 1;
+
+    p2m_write_pte(table + offsets[level], entry, p2m->clean_pte);
+
+    /*
+     * No need to flush the TLBs as the modified entry had the valid bit
+     * unset.
+     */
+
+out_unmap:
+    unmap_domain_page(table);
+
+out:
+    p2m_write_unlock(p2m);
+
+    return resolved;
+}
+
+int p2m_insert_mapping(struct domain *d, gfn_t start_gfn, unsigned long nr,
+                       mfn_t mfn, p2m_type_t t)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+
+    p2m_write_lock(p2m);
+    rc = p2m_set_entry(p2m, start_gfn, nr, mfn, t, p2m->default_access);
+    p2m_write_unlock(p2m);
+
+    return rc;
+}
+
+static inline int p2m_remove_mapping(struct domain *d,
+                                     gfn_t start_gfn,
+                                     unsigned long nr,
+                                     mfn_t mfn)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    unsigned long i;
+    int rc;
+
+    p2m_write_lock(p2m);
+    /*
+     * Before removing the GFN - MFN mapping for any RAM pages make sure
+     * that there is no difference between what is already mapped and what
+     * is requested to be unmapped.
+     * If they don't match bail out early. For instance, this could happen
+     * if two CPUs are requesting to unmap the same P2M entry concurrently.
+     */
+    for ( i = 0; i < nr; )
+    {
+        unsigned int cur_order;
+        p2m_type_t t;
+        mfn_t mfn_return = p2m_get_entry(p2m, gfn_add(start_gfn, i), &t, NULL,
+                                         &cur_order, NULL);
+
+        if ( p2m_is_any_ram(t) &&
+             (!mfn_valid(mfn) || !mfn_eq(mfn_add(mfn, i), mfn_return)) )
+        {
+            rc = -EILSEQ;
+            goto out;
+        }
+
+        i += (1UL << cur_order) -
+             ((gfn_x(start_gfn) + i) & ((1UL << cur_order) - 1));
+    }
+
+    rc = p2m_set_entry(p2m, start_gfn, nr, INVALID_MFN,
+                       p2m_invalid, p2m_access_rwx);
+
+out:
+    p2m_write_unlock(p2m);
+
+    return rc;
+}
+
+int map_regions_p2mt(struct domain *d,
+                     gfn_t gfn,
+                     unsigned long nr,
+                     mfn_t mfn,
+                     p2m_type_t p2mt)
+{
+    return p2m_insert_mapping(d, gfn, nr, mfn, p2mt);
+}
+
+int unmap_regions_p2mt(struct domain *d,
+                       gfn_t gfn,
+                       unsigned long nr,
+                       mfn_t mfn)
+{
+    return p2m_remove_mapping(d, gfn, nr, mfn);
+}
+
+int map_mmio_regions(struct domain *d,
+                     gfn_t start_gfn,
+                     unsigned long nr,
+                     mfn_t mfn)
+{
+    return p2m_insert_mapping(d, start_gfn, nr, mfn, p2m_mmio_direct_dev);
+}
+
+int unmap_mmio_regions(struct domain *d,
+                       gfn_t start_gfn,
+                       unsigned long nr,
+                       mfn_t mfn)
+{
+    return p2m_remove_mapping(d, start_gfn, nr, mfn);
+}
+
+int map_dev_mmio_page(struct domain *d, gfn_t gfn, mfn_t mfn)
+{
+    int res;
+
+    if ( !iomem_access_permitted(d, mfn_x(mfn), mfn_x(mfn)) )
+        return 0;
+
+    res = p2m_insert_mapping(d, gfn, 1, mfn, p2m_mmio_direct_c);
+    if ( res < 0 )
+    {
+        printk(XENLOG_G_ERR "Unable to map MFN %#"PRI_mfn" in %pd\n",
+               mfn_x(mfn), d);
+        return res;
+    }
+
+    return 0;
+}
+
+int guest_physmap_add_entry(struct domain *d,
+                            gfn_t gfn,
+                            mfn_t mfn,
+                            unsigned long page_order,
+                            p2m_type_t t)
+{
+    return p2m_insert_mapping(d, gfn, (1 << page_order), mfn, t);
+}
+
+int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
+                              unsigned int page_order)
+{
+    return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
+}
+
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    ASSERT(arch_acquire_resource_check(d));
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    /*
+     * It is valid to always use p2m_map_foreign_rw here as if this gets
+     * called then d != fd. A case when d == fd would be rejected by
+     * rcu_lock_remote_domain_by_id() earlier. Put a respective ASSERT()
+     * to catch incorrect usage in future.
+     */
+    ASSERT(d != fd);
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
+static struct page_info *p2m_allocate_root(void)
+{
+    struct page_info *page;
+    unsigned int i;
+
+    page = alloc_domheap_pages(NULL, P2M_ROOT_ORDER, 0);
+    if ( page == NULL )
+        return NULL;
+
+    /* Clear both first level pages */
+    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
+        clear_and_clean_page(page + i);
+
+    return page;
+}
+
+static int p2m_alloc_table(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    p2m->root = p2m_allocate_root();
+    if ( !p2m->root )
+        return -ENOMEM;
+
+    p2m->vttbr = generate_vttbr(p2m->vmid, page_to_mfn(p2m->root));
+
+    /*
+     * Make sure that all TLBs corresponding to the new VMID are flushed
+     * before using it
+     */
+    p2m_write_lock(p2m);
+    p2m_force_tlb_flush_sync(p2m);
+    p2m_write_unlock(p2m);
+
+    return 0;
+}
+
+
+static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
+
+/*
+ * VTTBR_EL2 VMID field is 8 or 16 bits. AArch64 may support 16-bit VMID.
+ * Using a bitmap here limits us to 256 or 65536 (for AArch64) concurrent
+ * domains. The bitmap space will be allocated dynamically based on
+ * whether 8 or 16 bit VMIDs are supported.
+ */
+static unsigned long *vmid_mask;
+
+static void p2m_vmid_allocator_init(void)
+{
+    /*
+     * allocate space for vmid_mask based on MAX_VMID
+     */
+    vmid_mask = xzalloc_array(unsigned long, BITS_TO_LONGS(MAX_VMID));
+
+    if ( !vmid_mask )
+        panic("Could not allocate VMID bitmap space\n");
+
+    set_bit(INVALID_VMID, vmid_mask);
+}
+
+static int p2m_alloc_vmid(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    int rc, nr;
+
+    spin_lock(&vmid_alloc_lock);
+
+    nr = find_first_zero_bit(vmid_mask, MAX_VMID);
+
+    ASSERT(nr != INVALID_VMID);
+
+    if ( nr == MAX_VMID )
+    {
+        rc = -EBUSY;
+        printk(XENLOG_ERR "p2m.c: dom%d: VMID pool exhausted\n", d->domain_id);
+        goto out;
+    }
+
+    set_bit(nr, vmid_mask);
+
+    p2m->vmid = nr;
+
+    rc = 0;
+
+out:
+    spin_unlock(&vmid_alloc_lock);
+    return rc;
+}
+
+static void p2m_free_vmid(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    spin_lock(&vmid_alloc_lock);
+    if ( p2m->vmid != INVALID_VMID )
+        clear_bit(p2m->vmid, vmid_mask);
+
+    spin_unlock(&vmid_alloc_lock);
+}
+
+int p2m_teardown(struct domain *d, bool allow_preemption)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    unsigned long count = 0;
+    struct page_info *pg;
+    unsigned int i;
+    int rc = 0;
+
+    if ( page_list_empty(&p2m->pages) )
+        return 0;
+
+    p2m_write_lock(p2m);
+
+    /*
+     * We are about to free the intermediate page-tables, so clear the
+     * root to prevent any walk to use them.
+     */
+    for ( i = 0; i < P2M_ROOT_PAGES; i++ )
+        clear_and_clean_page(p2m->root + i);
+
+    /*
+     * The domain will not be scheduled anymore, so in theory we should
+     * not need to flush the TLBs. Do it for safety purposes.
+     *
+     * Note that all the devices have already been de-assigned. So we don't
+     * need to flush the IOMMU TLB here.
+     */
+    p2m_force_tlb_flush_sync(p2m);
+
+    while ( (pg = page_list_remove_head(&p2m->pages)) )
+    {
+        p2m_free_page(p2m->domain, pg);
+        count++;
+        /* Arbitrarily preempt every 512 iterations */
+        if ( allow_preemption && !(count % 512) && hypercall_preempt_check() )
+        {
+            rc = -ERESTART;
+            break;
+        }
+    }
+
+    p2m_write_unlock(p2m);
+
+    return rc;
+}
+
+void p2m_final_teardown(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /* p2m not actually initialized */
+    if ( !p2m->domain )
+        return;
+
+    /*
+     * No need to call relinquish_p2m_mapping() here because
+     * p2m_final_teardown() is called either after domain_relinquish_resources()
+     * where relinquish_p2m_mapping() has been called, or from failure path of
+     * domain_create()/arch_domain_create() where mappings that require
+     * p2m_put_l3_page() should never be created. For the latter case, also see
+     * comment on top of the p2m_set_entry() for more info.
+     */
+
+    BUG_ON(p2m_teardown(d, false));
+    ASSERT(page_list_empty(&p2m->pages));
+
+    while ( p2m_teardown_allocation(d) == -ERESTART )
+        continue; /* No preemption support here */
+    ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
+
+    if ( p2m->root )
+        free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
+
+    p2m->root = NULL;
+
+    p2m_free_vmid(d);
+
+    radix_tree_destroy(&p2m->mem_access_settings, NULL);
+
+    p2m->domain = NULL;
+}
+
+int p2m_init(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+    unsigned int cpu;
+
+    rwlock_init(&p2m->lock);
+    spin_lock_init(&d->arch.paging.lock);
+    INIT_PAGE_LIST_HEAD(&p2m->pages);
+    INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
+
+    p2m->vmid = INVALID_VMID;
+    p2m->max_mapped_gfn = _gfn(0);
+    p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
+
+    p2m->default_access = p2m_access_rwx;
+    p2m->mem_access_enabled = false;
+    radix_tree_init(&p2m->mem_access_settings);
+
+    /*
+     * Some IOMMUs don't support coherent PT walk. When the p2m is
+     * shared with the CPU, Xen has to make sure that the PT changes have
+     * reached the memory
+     */
+    p2m->clean_pte = is_iommu_enabled(d) &&
+        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+    /*
+     * Make sure that the type chosen is able to store a vCPU ID
+     * between 0 and the maximum number of virtual CPUs supported, as well
+     * as the INVALID_VCPU_ID.
+     */
+    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
+    BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < INVALID_VCPU_ID);
+
+    for_each_possible_cpu(cpu)
+       p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
+    /*
+     * "Trivial" initialisation is now complete.  Set the backpointer so
+     * p2m_teardown() and friends know to do something.
+     */
+    p2m->domain = d;
+
+    rc = p2m_alloc_vmid(d);
+    if ( rc )
+        return rc;
+
+    rc = p2m_alloc_table(d);
+    if ( rc )
+        return rc;
+
+    /*
+     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
+     * when the domain is created. Considering the worst case for page
+     * tables and keeping a buffer, populate 16 pages to the P2M pages pool
+     * here. For GICv3, the above-mentioned P2M mapping is not necessary, but
+     * since the allocated 16 pages here would not be lost, populate these
+     * pages unconditionally.
+     */
+    spin_lock(&d->arch.paging.lock);
+    rc = p2m_set_allocation(d, 16, NULL);
+    spin_unlock(&d->arch.paging.lock);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+/*
+ * The function will go through the p2m and remove page reference when it
+ * is required. The mapping will be removed from the p2m.
+ *
+ * XXX: See whether the mapping can be left intact in the p2m.
+ */
+int relinquish_p2m_mapping(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    unsigned long count = 0;
+    p2m_type_t t;
+    int rc = 0;
+    unsigned int order;
+    gfn_t start, end;
+
+    BUG_ON(!d->is_dying);
+    /* No mappings can be added in the P2M after the P2M lock is released. */
+    p2m_write_lock(p2m);
+
+    start = p2m->lowest_mapped_gfn;
+    end = gfn_add(p2m->max_mapped_gfn, 1);
+
+    for ( ; gfn_x(start) < gfn_x(end);
+          start = gfn_next_boundary(start, order) )
+    {
+        mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, NULL);
+
+        count++;
+        /*
+         * Arbitrarily preempt every 512 iterations.
+         */
+        if ( !(count % 512) && hypercall_preempt_check() )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        /*
+         * p2m_set_entry will take care of removing reference on page
+         * when it is necessary and removing the mapping in the p2m.
+         */
+        if ( !mfn_eq(mfn, INVALID_MFN) )
+        {
+            /*
+             * For valid mapping, the start will always be aligned as
+             * entry will be removed whilst relinquishing.
+             */
+            rc = __p2m_set_entry(p2m, start, order, INVALID_MFN,
+                                 p2m_invalid, p2m_access_rwx);
+            if ( unlikely(rc) )
+            {
+                printk(XENLOG_G_ERR "Unable to remove mapping gfn=%#"PRI_gfn" order=%u from the p2m of domain %d\n", gfn_x(start), order, d->domain_id);
+                break;
+            }
+        }
+    }
+
+    /*
+     * Update lowest_mapped_gfn so on the next call we still start where
+     * we stopped.
+     */
+    p2m->lowest_mapped_gfn = start;
+
+    p2m_write_unlock(p2m);
+
+    return rc;
+}
+
+int p2m_cache_flush_range(struct domain *d, gfn_t *pstart, gfn_t end)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    gfn_t next_block_gfn;
+    gfn_t start = *pstart;
+    mfn_t mfn = INVALID_MFN;
+    p2m_type_t t;
+    unsigned int order;
+    int rc = 0;
+    /* Counter for preemption */
+    unsigned short count = 0;
+
+    /*
+     * The operation cache flush will invalidate the RAM assigned to the
+     * guest in a given range. It will not modify the page table and
+     * flushing the cache whilst the page is used by another CPU is
+     * fine. So using read-lock is fine here.
+     */
+    p2m_read_lock(p2m);
+
+    start = gfn_max(start, p2m->lowest_mapped_gfn);
+    end = gfn_min(end, gfn_add(p2m->max_mapped_gfn, 1));
+
+    next_block_gfn = start;
+
+    while ( gfn_x(start) < gfn_x(end) )
+    {
+       /*
+         * Cleaning the cache for the P2M may take a long time. So we
+         * need to be able to preempt. We will arbitrarily preempt every
+         * time count reaches 512 or above.
+         *
+         * The count will be incremented by:
+         *  - 1 on region skipped
+         *  - 10 for each page requiring a flush
+         */
+        if ( count >= 512 )
+        {
+            if ( softirq_pending(smp_processor_id()) )
+            {
+                rc = -ERESTART;
+                break;
+            }
+            count = 0;
+        }
+
+        /*
+         * We want to flush page by page as:
+         *  - it may not be possible to map the full block (can be up to 1GB)
+         *    in Xen memory
+         *  - we may want to do fine-grained preemption as flushing multiple
+         *    pages in one go may take a long time
+         *
+         * As p2m_get_entry is able to return the size of the mapping
+         * in the p2m, it is pointless to execute it for each page.
+         *
+         * We can optimize it by tracking the gfn of the next
+         * block. So we will only call p2m_get_entry for each block (can
+         * be up to 1GB).
+         */
+        if ( gfn_eq(start, next_block_gfn) )
+        {
+            bool valid;
+
+            mfn = p2m_get_entry(p2m, start, &t, NULL, &order, &valid);
+            next_block_gfn = gfn_next_boundary(start, order);
+
+            if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_any_ram(t) || !valid )
+            {
+                count++;
+                start = next_block_gfn;
+                continue;
+            }
+        }
+
+        count += 10;
+
+        flush_page_to_ram(mfn_x(mfn), false);
+
+        start = gfn_add(start, 1);
+        mfn = mfn_add(mfn, 1);
+    }
+
+    if ( rc != -ERESTART )
+        invalidate_icache();
+
+    p2m_read_unlock(p2m);
+
+    *pstart = start;
+
+    return rc;
+}
+
+/*
+ * Clean & invalidate RAM associated to the guest vCPU.
+ *
+ * The function can only work with the current vCPU and should be called
+ * with IRQ enabled as the vCPU could get preempted.
+ */
+void p2m_flush_vm(struct vcpu *v)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
+    int rc;
+    gfn_t start = _gfn(0);
+
+    ASSERT(v == current);
+    ASSERT(local_irq_is_enabled());
+    ASSERT(v->arch.need_flush_to_ram);
+
+    do
+    {
+        rc = p2m_cache_flush_range(v->domain, &start, _gfn(ULONG_MAX));
+        if ( rc == -ERESTART )
+            do_softirq();
+    } while ( rc == -ERESTART );
+
+    if ( rc != 0 )
+        gprintk(XENLOG_WARNING,
+                "P2M has not been correctly cleaned (rc = %d)\n",
+                rc);
+
+    /*
+     * Invalidate the p2m to track which page was modified by the guest
+     * between calls of p2m_flush_vm().
+     */
+    p2m_invalidate_root(p2m);
+
+    v->arch.need_flush_to_ram = false;
+}
+
+/*
+ * See note at ARMv7 ARM B1.14.4 (DDI 0406C.c) (TL;DR: S/W ops are not
+ * easily virtualized).
+ *
+ * Main problems:
+ *  - S/W ops are local to a CPU (not broadcast)
+ *  - We have line migration behind our back (speculation)
+ *  - System caches don't support S/W at all (damn!)
+ *
+ * In the face of the above, the best we can do is to try and convert
+ * S/W ops to VA ops. Because the guest is not allowed to infer the S/W
+ * to PA mapping, it can only use S/W to nuke the whole cache, which is
+ * rather a good thing for us.
+ *
+ * Also, it is only used when turning caches on/off ("The expected
+ * usage of the cache maintenance instructions that operate by set/way
+ * is associated with the powerdown and powerup of caches, if this is
+ * required by the implementation.").
+ *
+ * We use the following policy:
+ *  - If we trap a S/W operation, we enabled VM trapping to detect
+ *  caches being turned on/off, and do a full clean.
+ *
+ *  - We flush the caches on both caches being turned on and off.
+ *
+ *  - Once the caches are enabled, we stop trapping VM ops.
+ */
+void p2m_set_way_flush(struct vcpu *v, struct cpu_user_regs *regs,
+                       const union hsr hsr)
+{
+    /* This function can only work with the current vCPU. */
+    ASSERT(v == current);
+
+    if ( iommu_use_hap_pt(current->domain) )
+    {
+        gprintk(XENLOG_ERR,
+                "The cache should be flushed by VA rather than by set/way.\n");
+        inject_undef_exception(regs, hsr);
+        return;
+    }
+
+    if ( !(v->arch.hcr_el2 & HCR_TVM) )
+    {
+        v->arch.need_flush_to_ram = true;
+        vcpu_hcr_set_flags(v, HCR_TVM);
+    }
+}
+
+void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
+{
+    bool now_enabled = vcpu_has_cache_enabled(v);
+
+    /* This function can only work with the current vCPU. */
+    ASSERT(v == current);
+
+    /*
+     * If switching the MMU+caches on, need to invalidate the caches.
+     * If switching it off, need to clean the caches.
+     * Clean + invalidate does the trick always.
+     */
+    if ( was_enabled != now_enabled )
+        v->arch.need_flush_to_ram = true;
+
+    /* Caches are now on, stop trapping VM ops (until a S/W op) */
+    if ( now_enabled )
+        vcpu_hcr_clear_flags(v, HCR_TVM);
+}
+
+mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
+{
+    return p2m_lookup(d, gfn, NULL);
+}
+
+struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va,
+                                    unsigned long flags)
+{
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct page_info *page = NULL;
+    paddr_t maddr = 0;
+    uint64_t par;
+    mfn_t mfn;
+    p2m_type_t t;
+
+    /*
+     * XXX: To support a different vCPU, we would need to load the
+     * VTTBR_EL2, TTBR0_EL1, TTBR1_EL1 and SCTLR_EL1
+     */
+    if ( v != current )
+        return NULL;
+
+    /*
+     * The lock is here to protect us against the break-before-make
+     * sequence used when updating the entry.
+     */
+    p2m_read_lock(p2m);
+    par = gvirt_to_maddr(va, &maddr, flags);
+    p2m_read_unlock(p2m);
+
+    /*
+     * gvirt_to_maddr may fail if the entry does not have the valid bit
+     * set. Fallback to the second method:
+     *  1) Translate the VA to IPA using software lookup -> Stage-1 page-table
+     *  may not be accessible because the stage-2 entries may have valid
+     *  bit unset.
+     *  2) Software lookup of the MFN
+     *
+     * Note that when memaccess is enabled, we instead call directly
+     * p2m_mem_access_check_and_get_page(...). Because the function is a
+     * variant of the methods described above, it will be able to
+     * handle entries with valid bit unset.
+     *
+     * TODO: Integrate more nicely memaccess with the rest of the
+     * function.
+     * TODO: Use the fault error in PAR_EL1 to avoid pointless
+     *  translation.
+     */
+    if ( par )
+    {
+        paddr_t ipa;
+        unsigned int s1_perms;
+
+        /*
+         * When memaccess is enabled, the translation GVA to MADDR may
+         * have failed because of a permission fault.
+         */
+        if ( p2m->mem_access_enabled )
+            return p2m_mem_access_check_and_get_page(va, flags, v);
+
+        /*
+         * The software stage-1 table walk can still fail, e.g, if the
+         * GVA is not mapped.
+         */
+        if ( !guest_walk_tables(v, va, &ipa, &s1_perms) )
+        {
+            dprintk(XENLOG_G_DEBUG,
+                    "%pv: Failed to walk page-table va %#"PRIvaddr"\n", v, va);
+            return NULL;
+        }
+
+        mfn = p2m_lookup(d, gaddr_to_gfn(ipa), &t);
+        if ( mfn_eq(INVALID_MFN, mfn) || !p2m_is_ram(t) )
+            return NULL;
+
+        /*
+         * Check permissions that are assumed by the caller. For instance
+         * in case of guestcopy, the caller assumes that the translated
+         * page can be accessed with the requested permissions. If this
+         * is not the case, we should fail.
+         *
+         * Please note that we do not check for the GV2M_EXEC
+         * permission. This is fine because the hardware-based translation
+         * instruction does not test for execute permissions.
+         */
+        if ( (flags & GV2M_WRITE) && !(s1_perms & GV2M_WRITE) )
+            return NULL;
+
+        if ( (flags & GV2M_WRITE) && t != p2m_ram_rw )
+            return NULL;
+    }
+    else
+        mfn = maddr_to_mfn(maddr);
+
+    if ( !mfn_valid(mfn) )
+    {
+        dprintk(XENLOG_G_DEBUG, "%pv: Invalid MFN %#"PRI_mfn"\n",
+                v, mfn_x(mfn));
+        return NULL;
+    }
+
+    page = mfn_to_page(mfn);
+    ASSERT(page);
+
+    if ( unlikely(!get_page(page, d)) )
+    {
+        dprintk(XENLOG_G_DEBUG, "%pv: Failing to acquire the MFN %#"PRI_mfn"\n",
+                v, mfn_x(maddr_to_mfn(maddr)));
+        return NULL;
+    }
+
+    return page;
+}
+
+/* VTCR value to be configured by all CPUs. Set only once by the boot CPU */
+static register_t __read_mostly vtcr;
+
+static void setup_virt_paging_one(void *data)
+{
+    WRITE_SYSREG(vtcr, VTCR_EL2);
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from
+     * entries related to EL1/EL0 translation regime until a guest vCPU
+     * is running. For that, we need to set-up VTTBR to point to an empty
+     * page-table and turn on stage-2 translation. The TLB entries
+     * associated with EL1/EL0 translation regime will also be flushed in case
+     * an AT instruction was speculated before hand.
+     */
+    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+    {
+        WRITE_SYSREG64(generate_vttbr(INVALID_VMID, empty_root_mfn), VTTBR_EL2);
+        WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_VM, HCR_EL2);
+        isb();
+
+        flush_all_guests_tlb_local();
+    }
+}
+
+void __init setup_virt_paging(void)
+{
+    /* Setup Stage 2 address translation */
+    register_t val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
+
+#ifdef CONFIG_ARM_32
+    if ( p2m_ipa_bits < 40 )
+        panic("P2M: Not able to support %u-bit IPA at the moment\n",
+              p2m_ipa_bits);
+
+    printk("P2M: 40-bit IPA\n");
+    p2m_ipa_bits = 40;
+    val |= VTCR_T0SZ(0x18); /* 40 bit IPA */
+    val |= VTCR_SL0(0x1); /* P2M starts at first level */
+#else /* CONFIG_ARM_64 */
+    const struct {
+        unsigned int pabits; /* Physical Address Size */
+        unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
+        unsigned int root_order; /* Page order of the root of the p2m */
+        unsigned int sl0;    /* Desired SL0, maximum in comment */
+    } pa_range_info[] = {
+        /* T0SZ minimum and SL0 maximum from ARM DDI 0487H.a Table D5-6 */
+        /*      PA size, t0sz(min), root-order, sl0(max) */
+        [0] = { 32,      32/*32*/,  0,          1 },
+        [1] = { 36,      28/*28*/,  0,          1 },
+        [2] = { 40,      24/*24*/,  1,          1 },
+        [3] = { 42,      22/*22*/,  3,          1 },
+        [4] = { 44,      20/*20*/,  0,          2 },
+        [5] = { 48,      16/*16*/,  0,          2 },
+        [6] = { 52,      12/*12*/,  4,          2 },
+        [7] = { 0 }  /* Invalid */
+    };
+
+    unsigned int i;
+    unsigned int pa_range = 0x10; /* Larger than any possible value */
+
+    /*
+     * Restrict "p2m_ipa_bits" if needed. As P2M table is always configured
+     * with IPA bits == PA bits, compare against "pabits".
+     */
+    if ( pa_range_info[system_cpuinfo.mm64.pa_range].pabits < p2m_ipa_bits )
+        p2m_ipa_bits = pa_range_info[system_cpuinfo.mm64.pa_range].pabits;
+
+    /*
+     * CPU info sanitization made sure we support 16-bit VMIDs only if all
+     * cores support them.
+     */
+    if ( system_cpuinfo.mm64.vmid_bits == MM64_VMID_16_BITS_SUPPORT )
+        max_vmid = MAX_VMID_16_BIT;
+
+    /* Choose suitable "pa_range" according to the resulted "p2m_ipa_bits". */
+    for ( i = 0; i < ARRAY_SIZE(pa_range_info); i++ )
+    {
+        if ( p2m_ipa_bits == pa_range_info[i].pabits )
+        {
+            pa_range = i;
+            break;
+        }
+    }
+
+    /* pa_range is 4 bits but we don't support all modes */
+    if ( pa_range >= ARRAY_SIZE(pa_range_info) || !pa_range_info[pa_range].pabits )
+        panic("Unknown encoding of ID_AA64MMFR0_EL1.PARange %x\n", pa_range);
+
+    val |= VTCR_PS(pa_range);
+    val |= VTCR_TG0_4K;
+
+    /* Set the VS bit only if 16 bit VMID is supported. */
+    if ( MAX_VMID == MAX_VMID_16_BIT )
+        val |= VTCR_VS;
+    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
+    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
+
+    p2m_root_order = pa_range_info[pa_range].root_order;
+    p2m_root_level = 2 - pa_range_info[pa_range].sl0;
+    p2m_ipa_bits = 64 - pa_range_info[pa_range].t0sz;
+
+    printk("P2M: %d-bit IPA with %d-bit PA and %d-bit VMID\n",
+           p2m_ipa_bits,
+           pa_range_info[pa_range].pabits,
+           ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
+#endif
+    printk("P2M: %d levels with order-%d root, VTCR 0x%"PRIregister"\n",
+           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
+
+    p2m_vmid_allocator_init();
+
+    /* It is not allowed to concatenate a level zero root */
+    BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
+    vtcr = val;
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE requires to allocate root table
+     * with all entries zeroed.
+     */
+    if ( cpus_have_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+    {
+        struct page_info *root;
+
+        root = p2m_allocate_root();
+        if ( !root )
+            panic("Unable to allocate root table for ARM64_WORKAROUND_AT_SPECULATE\n");
+
+        empty_root_mfn = page_to_mfn(root);
+    }
+
+    setup_virt_paging_one(NULL);
+    smp_call_function(setup_virt_paging_one, NULL, 1);
+}
+
+static int cpu_virt_paging_callback(struct notifier_block *nfb,
+                                    unsigned long action,
+                                    void *hcpu)
+{
+    switch ( action )
+    {
+    case CPU_STARTING:
+        ASSERT(system_state != SYS_STATE_boot);
+        setup_virt_paging_one(NULL);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_virt_paging_nfb = {
+    .notifier_call = cpu_virt_paging_callback,
+};
+
+static int __init cpu_virt_paging_init(void)
+{
+    register_cpu_notifier(&cpu_virt_paging_nfb);
+
+    return 0;
+}
+/*
+ * Initialization of the notifier has to be done at the init rather than the
+ * presmp_init phase because the registered notifier is used to set up virtual
+ * paging for non-boot CPUs after the initial virtual paging for all CPUs has
+ * already been set up, i.e. when a non-boot CPU is hotplugged after the system
+ * has booted. In other words, the notifier should be registered after the
+ * virtual paging is initially set up (setup_virt_paging() is called from
+ * start_xen()). This is required because the vtcr config value has to be set
+ * before a notifier can fire.
+ */
+__initcall(cpu_virt_paging_init);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/p2m_mpu.c b/xen/arch/arm/p2m_mpu.c
new file mode 100644
index 0000000000..0a95d58111
--- /dev/null
+++ b/xen/arch/arm/p2m_mpu.c
@@ -0,0 +1,191 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/lib.h>
+#include <xen/mm-frame.h>
+#include <xen/sched.h>
+
+#include <asm/p2m.h>
+
+/* TODO: Implement on the first usage */
+void p2m_write_unlock(struct p2m_domain *p2m)
+{
+}
+
+void p2m_dump_info(struct domain *d)
+{
+}
+
+void dump_p2m_lookup(struct domain *d, paddr_t addr)
+{
+}
+
+void p2m_save_state(struct vcpu *p)
+{
+}
+
+void p2m_restore_state(struct vcpu *n)
+{
+}
+
+mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+                    p2m_type_t *t, p2m_access_t *a,
+                    unsigned int *page_order,
+                    bool *valid)
+{
+    return INVALID_MFN;
+}
+
+mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
+{
+    return INVALID_MFN;
+}
+
+struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn,
+                                        p2m_type_t *t)
+{
+    return NULL;
+}
+
+int p2m_set_entry(struct p2m_domain *p2m,
+                  gfn_t sgfn,
+                  unsigned long nr,
+                  mfn_t smfn,
+                  p2m_type_t t,
+                  p2m_access_t a)
+{
+    return -ENOSYS;
+}
+
+void p2m_invalidate_root(struct p2m_domain *p2m)
+{
+}
+
+bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
+{
+    return false;
+}
+
+int p2m_insert_mapping(struct domain *d, gfn_t start_gfn, unsigned long nr,
+                       mfn_t mfn, p2m_type_t t)
+{
+    return -ENOSYS;
+}
+
+int map_regions_p2mt(struct domain *d,
+                     gfn_t gfn,
+                     unsigned long nr,
+                     mfn_t mfn,
+                     p2m_type_t p2mt)
+{
+    return -ENOSYS;
+}
+
+int unmap_regions_p2mt(struct domain *d,
+                       gfn_t gfn,
+                       unsigned long nr,
+                       mfn_t mfn)
+{
+    return -ENOSYS;
+}
+
+int map_mmio_regions(struct domain *d,
+                     gfn_t start_gfn,
+                     unsigned long nr,
+                     mfn_t mfn)
+{
+    return -ENOSYS;
+}
+
+int unmap_mmio_regions(struct domain *d,
+                       gfn_t start_gfn,
+                       unsigned long nr,
+                       mfn_t mfn)
+{
+    return -ENOSYS;
+}
+
+int map_dev_mmio_page(struct domain *d, gfn_t gfn, mfn_t mfn)
+{
+    return -ENOSYS;
+}
+
+int guest_physmap_add_entry(struct domain *d,
+                            gfn_t gfn,
+                            mfn_t mfn,
+                            unsigned long page_order,
+                            p2m_type_t t)
+{
+    return -ENOSYS;
+}
+
+int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
+                              unsigned int page_order)
+{
+    return -ENOSYS;
+}
+
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    return -ENOSYS;
+}
+
+int p2m_teardown(struct domain *d, bool allow_preemption)
+{
+    return -ENOSYS;
+}
+
+void p2m_final_teardown(struct domain *d)
+{
+}
+
+int p2m_init(struct domain *d)
+{
+    return -ENOSYS;
+}
+
+int relinquish_p2m_mapping(struct domain *d)
+{
+    return -ENOSYS;
+}
+
+int p2m_cache_flush_range(struct domain *d, gfn_t *pstart, gfn_t end)
+{
+    return -ENOSYS;
+}
+
+void p2m_flush_vm(struct vcpu *v)
+{
+}
+
+void p2m_set_way_flush(struct vcpu *v, struct cpu_user_regs *regs,
+                       const union hsr hsr)
+{
+}
+
+void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
+{
+}
+
+mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
+{
+    return INVALID_MFN;
+}
+
+struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va,
+                                    unsigned long flags)
+{
+    return NULL;
+}
+
+void __init setup_virt_paging(void)
+{
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 38/40] xen/mpu: implement setup_virt_paging for MPU system
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (36 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 37/40] xen/mpu: move MMU specific P2M code to p2m_mmu.c Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 39/40] xen/mpu: re-order xen_mpumap in arch_init_finialize Penny Zheng
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

For MMU systems, setup_virt_paging is used to configure stage 2 address
translation, e.g. IPA bits, VMID bits, etc. This function also performs the
VMID allocator initialization for later VM creation.

Besides IPA bits and VMID bits, the setup_virt_paging function on MPU
systems is also responsible for determining the default EL1/EL0
translation regime.
ARMv8-R AArch64 can have the following memory translation regimes:
- PMSAv8-64 at both EL1/EL0 and EL2
- PMSAv8-64 or VMSAv8-64 at EL1/EL0 and PMSAv8-64 at EL2
The default will be VMSAv8-64, unless the platform cannot support it, which
can be checked against the MSA_frac field in Memory Model Feature Register 0
(ID_AA64MMFR0_EL1).
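
In short, the selection boils down to the following minimal sketch (field
names and the MM64_MSA_*/VTCR_MSA_* constants are those introduced by the
hunks below; the actual patch code is authoritative):

    /* PMSAv8-64 must be implemented: MSA reads 0b1111 on Armv8-R AArch64. */
    if ( system_cpuinfo.mm64.msa != MM64_MSA_PMSA_SUPPORT ||
         system_cpuinfo.mm64.msa_frac == MM64_MSA_FRAC_NONE_SUPPORT )
        panic("No PMSAv8-64 support in any EL1/EL0 translation regime\n");

    if ( system_cpuinfo.mm64.msa_frac == MM64_MSA_FRAC_VMSA_SUPPORT )
        val |= VTCR_MSA_VMSA;   /* default: VMSAv8-64 at EL1/EL0 */
    else
        val &= VTCR_MSA_PMSA;   /* PMSAv8-64 only at EL1/EL0 */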

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/include/asm/arm64/sysregs.h |  6 ++
 xen/arch/arm/include/asm/cpufeature.h    |  7 ++
 xen/arch/arm/include/asm/p2m.h           | 18 +++++
 xen/arch/arm/include/asm/processor.h     | 13 ++++
 xen/arch/arm/p2m.c                       | 28 ++++++++
 xen/arch/arm/p2m_mmu.c                   | 38 ----------
 xen/arch/arm/p2m_mpu.c                   | 91 ++++++++++++++++++++++--
 7 files changed, 159 insertions(+), 42 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
index 9546e8e3d0..7d4f959dae 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -507,6 +507,12 @@
 /* MPU Protection Region Enable Register encode */
 #define PRENR_EL2 S3_4_C6_C1_1
 
+/* Virtualization Secure Translation Control Register */
+#define VSTCR_EL2  S3_4_C2_C6_2
+#define VSTCR_EL2_RES1_SHIFT 31
+#define VSTCR_EL2_SA_SHIFT   30
+#define VSTCR_EL2_SC_SHIFT   20
+
 #endif
 
 #ifdef CONFIG_ARM_SECURE_STATE
diff --git a/xen/arch/arm/include/asm/cpufeature.h b/xen/arch/arm/include/asm/cpufeature.h
index c62cf6293f..513e5b9918 100644
--- a/xen/arch/arm/include/asm/cpufeature.h
+++ b/xen/arch/arm/include/asm/cpufeature.h
@@ -244,6 +244,12 @@ struct cpuinfo_arm {
             unsigned long tgranule_16K:4;
             unsigned long tgranule_64K:4;
             unsigned long tgranule_4K:4;
+#ifdef CONFIG_ARM_V8R
+            unsigned long __res:16;
+            unsigned long msa:4;
+            unsigned long msa_frac:4;
+            unsigned long __res0:8;
+#else
             unsigned long tgranule_16k_2:4;
             unsigned long tgranule_64k_2:4;
             unsigned long tgranule_4k_2:4;
@@ -251,6 +257,7 @@ struct cpuinfo_arm {
             unsigned long __res0:8;
             unsigned long fgt:4;
             unsigned long ecv:4;
+#endif
 
             /* MMFR1 */
             unsigned long hafdbs:4;
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index a430aca232..cd28a9091a 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -14,9 +14,27 @@
 /* Holds the bit size of IPAs in p2m tables.  */
 extern unsigned int p2m_ipa_bits;
 
+#define MAX_VMID_8_BIT  (1UL << 8)
+#define MAX_VMID_16_BIT (1UL << 16)
+
+#define INVALID_VMID 0 /* VMID 0 is reserved */
+
+#ifdef CONFIG_ARM_64
+extern unsigned int max_vmid;
+/* VMID is by default 8 bit width on AArch64 */
+#define MAX_VMID       max_vmid
+#else
+/* VMID is always 8 bit width on AArch32 */
+#define MAX_VMID        MAX_VMID_8_BIT
+#endif
+
+extern spinlock_t vmid_alloc_lock;
+extern unsigned long *vmid_mask;
+
 struct domain;
 
 extern void memory_type_changed(struct domain *);
+extern void p2m_vmid_allocator_init(void);
 
 /* Per-p2m-table state */
 struct p2m_domain {
diff --git a/xen/arch/arm/include/asm/processor.h b/xen/arch/arm/include/asm/processor.h
index 1dd81d7d52..d866421d88 100644
--- a/xen/arch/arm/include/asm/processor.h
+++ b/xen/arch/arm/include/asm/processor.h
@@ -388,6 +388,12 @@
 
 #define VTCR_RES1       (_AC(1,UL)<<31)
 
+#ifdef CONFIG_ARM_V8R
+#define VTCR_MSA_VMSA   (_AC(0x1,UL)<<31)
+#define VTCR_MSA_PMSA   ~(_AC(0x1,UL)<<31)
+#define NSA_SEL2        ~(_AC(0x1,UL)<<30)
+#endif
+
 /* HCPTR Hyp. Coprocessor Trap Register */
 #define HCPTR_TAM       ((_AC(1,U)<<30))
 #define HCPTR_TTA       ((_AC(1,U)<<20))        /* Trap trace registers */
@@ -447,6 +453,13 @@
 #define MM64_VMID_16_BITS_SUPPORT   0x2
 #endif
 
+#ifdef CONFIG_ARM_V8R
+#define MM64_MSA_PMSA_SUPPORT       0xf
+#define MM64_MSA_FRAC_NONE_SUPPORT  0x0
+#define MM64_MSA_FRAC_PMSA_SUPPORT  0x1
+#define MM64_MSA_FRAC_VMSA_SUPPORT  0x2
+#endif
+
 #ifndef __ASSEMBLY__
 
 extern register_t __cpu_logical_map[];
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 42f51051e0..0d0063aa2e 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -4,6 +4,21 @@
 
 #include <asm/event.h>
 #include <asm/page.h>
+#include <asm/p2m.h>
+
+#ifdef CONFIG_ARM_64
+unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
+#endif
+
+spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
+
+/*
+ * VTTBR_EL2 VMID field is 8 or 16 bits. AArch64 may support 16-bit VMID.
+ * Using a bitmap here limits us to 256 or 65536 (for AArch64) concurrent
+ * domains. The bitmap space will be allocated dynamically based on
+ * whether 8 or 16 bit VMIDs are supported.
+ */
+unsigned long *vmid_mask;
 
 /*
  * Set to the maximum configured support for IPA bits, so the number of IPA bits can be
@@ -142,6 +157,19 @@ void __init p2m_restrict_ipa_bits(unsigned int ipa_bits)
         p2m_ipa_bits = ipa_bits;
 }
 
+void p2m_vmid_allocator_init(void)
+{
+    /*
+     * allocate space for vmid_mask based on MAX_VMID
+     */
+    vmid_mask = xzalloc_array(unsigned long, BITS_TO_LONGS(MAX_VMID));
+
+    if ( !vmid_mask )
+        panic("Could not allocate VMID bitmap space\n");
+
+    set_bit(INVALID_VMID, vmid_mask);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/p2m_mmu.c b/xen/arch/arm/p2m_mmu.c
index 88a9d8f392..7e1afd0bb3 100644
--- a/xen/arch/arm/p2m_mmu.c
+++ b/xen/arch/arm/p2m_mmu.c
@@ -14,20 +14,6 @@
 #include <asm/page.h>
 #include <asm/traps.h>
 
-#define MAX_VMID_8_BIT  (1UL << 8)
-#define MAX_VMID_16_BIT (1UL << 16)
-
-#define INVALID_VMID 0 /* VMID 0 is reserved */
-
-#ifdef CONFIG_ARM_64
-static unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
-/* VMID is by default 8 bit width on AArch64 */
-#define MAX_VMID       max_vmid
-#else
-/* VMID is always 8 bit width on AArch32 */
-#define MAX_VMID        MAX_VMID_8_BIT
-#endif
-
 #ifdef CONFIG_ARM_64
 unsigned int __read_mostly p2m_root_order;
 unsigned int __read_mostly p2m_root_level;
@@ -1516,30 +1502,6 @@ static int p2m_alloc_table(struct domain *d)
     return 0;
 }
 
-
-static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
-
-/*
- * VTTBR_EL2 VMID field is 8 or 16 bits. AArch64 may support 16-bit VMID.
- * Using a bitmap here limits us to 256 or 65536 (for AArch64) concurrent
- * domains. The bitmap space will be allocated dynamically based on
- * whether 8 or 16 bit VMIDs are supported.
- */
-static unsigned long *vmid_mask;
-
-static void p2m_vmid_allocator_init(void)
-{
-    /*
-     * allocate space for vmid_mask based on MAX_VMID
-     */
-    vmid_mask = xzalloc_array(unsigned long, BITS_TO_LONGS(MAX_VMID));
-
-    if ( !vmid_mask )
-        panic("Could not allocate VMID bitmap space\n");
-
-    set_bit(INVALID_VMID, vmid_mask);
-}
-
 static int p2m_alloc_vmid(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/arm/p2m_mpu.c b/xen/arch/arm/p2m_mpu.c
index 0a95d58111..77b4bc9221 100644
--- a/xen/arch/arm/p2m_mpu.c
+++ b/xen/arch/arm/p2m_mpu.c
@@ -2,8 +2,95 @@
 #include <xen/lib.h>
 #include <xen/mm-frame.h>
 #include <xen/sched.h>
+#include <xen/warning.h>
 
 #include <asm/p2m.h>
+#include <asm/processor.h>
+#include <asm/sysregs.h>
+
+void __init setup_virt_paging(void)
+{
+    uint64_t val = 0;
+    bool p2m_vmsa = true;
+
+    /* PA size */
+    const unsigned int pa_range_info[] = { 32, 36, 40, 42, 44, 48, 52, 0, /* Invalid */ };
+
+    /*
+     * Restrict "p2m_ipa_bits" if needed. As P2M table is always configured
+     * with IPA bits == PA bits, compare against "pabits".
+     */
+    if ( pa_range_info[system_cpuinfo.mm64.pa_range] < p2m_ipa_bits )
+        p2m_ipa_bits = pa_range_info[system_cpuinfo.mm64.pa_range];
+
+    /* In Armv8-R, the hypervisor runs in Secure EL2. */
+    val &= NSA_SEL2;
+
+    /*
+     * ARMv8-R AArch64 could have the following memory system
+     * configurations:
+     * - PMSAv8-64 at EL1 and EL2
+     * - PMSAv8-64 or VMSAv8-64 at EL1 and PMSAv8-64 at EL2
+     *
+     * In ARMv8-R, the only permitted value is
+     * 0b1111(MM64_MSA_PMSA_SUPPORT).
+     */
+    if ( system_cpuinfo.mm64.msa == MM64_MSA_PMSA_SUPPORT )
+    {
+        if ( system_cpuinfo.mm64.msa_frac == MM64_MSA_FRAC_NONE_SUPPORT )
+            goto fault;
+
+        if ( system_cpuinfo.mm64.msa_frac != MM64_MSA_FRAC_VMSA_SUPPORT )
+        {
+            p2m_vmsa = false;
+            warning_add("Be aware of that there is no support for VMSAv8-64 at EL1 on this platform.\n");
+        }
+    }
+    else
+        goto fault;
+
+    /*
+     * If the platform supports both PMSAv8-64 and VMSAv8-64 at EL1,
+     * then it is VTCR_EL2.MSA that determines the EL1 memory system
+     * architecture.
+     * Normally, we set the initial VTCR_EL2.MSA value to VMSAv8-64 support,
+     * unless this platform only supports PMSAv8-64.
+     */
+    if ( !p2m_vmsa )
+        val &= VTCR_MSA_PMSA;
+    else
+        val |= VTCR_MSA_VMSA;
+
+    /*
+     * cpuinfo sanitization makes sure we support 16-bit VMIDs only if
+     * all cores support it.
+     */
+    if ( system_cpuinfo.mm64.vmid_bits == MM64_VMID_16_BITS_SUPPORT )
+        max_vmid = MAX_VMID_16_BIT;
+
+    /* Set the VS bit only if 16 bit VMID is supported. */
+    if ( MAX_VMID == MAX_VMID_16_BIT )
+        val |= VTCR_VS;
+
+    p2m_vmid_allocator_init();
+
+    WRITE_SYSREG(val, VTCR_EL2);
+
+    /*
+     * All stage 2 translations for the Secure PA space access the
+     * Secure PA space, so we keep SA bit as 0.
+     *
+     * Stage 2 NS configuration is checked against stage 1 NS configuration
+     * in EL1&0 translation regime for the given address, and generate a
+     * fault if they are different. So we set SC bit as 1.
+     */
+    WRITE_SYSREG(1 << VSTCR_EL2_RES1_SHIFT | 1 << VSTCR_EL2_SC_SHIFT, VSTCR_EL2);
+
+    return;
+
+fault:
+    panic("Hardware with no PMSAv8-64 support in any translation regime.\n");
+}
 
 /* TODO: Implement on the first usage */
 void p2m_write_unlock(struct p2m_domain *p2m)
@@ -177,10 +264,6 @@ struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va,
     return NULL;
 }
 
-void __init setup_virt_paging(void)
-{
-}
-
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 39/40] xen/mpu: re-order xen_mpumap in arch_init_finialize
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (37 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 38/40] xen/mpu: implement setup_virt_paging for MPU system Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH v2 40/40] xen/mpu: add Kconfig option to enable Armv8-R AArch64 support Penny Zheng
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Penny Zheng

In the function init_done, we have finished booting and do the final
clean-up work, including marking the section .data.ro_after_init
read-only, freeing the init text and init data sections, etc.

On an MPU system, in addition to the above operations, we also need to
re-order the Xen MPU memory region mapping table (xen_mpumap).

In xen_mpumap, we have two types of MPU memory regions: fixed memory
regions and switching memory regions.
Fixed memory regions are regions which never change after boot, like the
Xen .text section, while switching regions (e.g. device memory) are
regions that get switched out when the idle vcpu leaves hypervisor mode,
and switched back in when the idle vcpu enters hypervisor mode. They were
added at the tail of the table during the boot stage.
To save the trouble of hunting down each switching region in the
time-sensitive context switch path, we re-order xen_mpumap to keep fixed
regions at the front, with switching ones immediately after them.

We define an MPU memory region mapping table (sw_mpumap) to store all
switching regions. After disabling them at their original positions, we
re-enable them at their re-ordered positions.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
---
 xen/arch/arm/include/asm/arm64/mpu.h |   5 ++
 xen/arch/arm/include/asm/mm_mpu.h    |   1 +
 xen/arch/arm/include/asm/setup.h     |   2 +
 xen/arch/arm/mm_mpu.c                | 110 +++++++++++++++++++++++++++
 xen/arch/arm/setup.c                 |  13 +---
 xen/arch/arm/setup_mmu.c             |  16 ++++
 xen/arch/arm/setup_mpu.c             |  20 +++++
 7 files changed, 155 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index b4e50a9a0e..e058f36435 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -155,6 +155,11 @@ typedef struct {
     (uint64_t)((_pr->prlar.reg.limit << MPU_REGION_SHIFT) | 0x3f); \
 })
 
+#define region_needs_switching_on_ctxt(pr) ({               \
+    pr_t *_pr = pr;                                         \
+    _pr->prlar.reg.sw;                                      \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM64_MPU_H__ */
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
index 5aa61c43b6..f8f54eb901 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -10,6 +10,7 @@
  * section by section based on static configuration in Device Tree.
  */
 extern void setup_static_mappings(void);
+extern int reorder_xen_mpumap(void);
 
 extern struct page_info *frame_table;
 
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index d4c1336597..39cd95553d 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -182,6 +182,8 @@ int map_range_to_domain(const struct dt_device_node *dev,
 
 extern const char __ro_after_init_start[], __ro_after_init_end[];
 
+extern void arch_init_finialize(void);
+
 struct init_info
 {
     /* Pointer to the stack, used by head.S when entering in C */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 118bb11d1a..434ed872c1 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -80,6 +80,25 @@ static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
 
 extern char __init_data_begin[], __init_end[];
 
+/*
+ * MPU memory mapping table recording the regions that need to be switched
+ * in/out during vcpu context switch
+ */
+static pr_t *sw_mpumap;
+static uint64_t nr_sw_mpumap;
+
+/*
+ * After reordering, nr_xen_mpumap records number of regions for Xen fixed
+ * memory mapping
+ */
+static uint64_t nr_xen_mpumap;
+
+/*
+ * After reordering, nr_cpu_mpumap records number of EL2 valid
+ * MPU memory regions
+ */
+static uint64_t nr_cpu_mpumap;
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
     uint64_t _sel = sel;                                                \
@@ -847,6 +866,97 @@ void unmap_page_from_xen_misc(void)
 {
 }
 
+void dump_hyp_mapping(void)
+{
+    uint64_t i = 0;
+    pr_t region;
+
+    for ( i = 0; i < nr_cpu_mpumap; i++ )
+    {
+        access_protection_region(true, &region, NULL, i);
+        printk(XENLOG_INFO
+               "MPU memory region [%lu]: 0x%"PRIpaddr" - 0x%"PRIpaddr".\n",
+               i, pr_get_base(&region), pr_get_limit(&region));
+    }
+}
+
+/* Standard entry to dynamically allocate MPU memory region mapping table. */
+static pr_t *alloc_mpumap(void)
+{
+    pr_t *map;
+
+    /*
+     * An MPU memory region structure (pr_t) takes 16 bytes. Even with the
+     * maximum number of supported MPU protection regions in EL2 (255), the
+     * MPU table takes up less than 4KB (PAGE_SIZE).
+     */
+    map = alloc_xenheap_pages(0, 0);
+    if ( map == NULL )
+        return NULL;
+
+    clear_page(map);
+    return map;
+}
+
+/*
+ * Switching regions (e.g. device memory) are regions that get switched out
+ * when the idle vcpu leaves hypervisor mode, and switched back in when the
+ * idle vcpu enters hypervisor mode. They were added at the tail of the table
+ * during the boot stage.
+ * To save the trouble of hunting down each switching region in the
+ * time-sensitive context switch path, we re-order xen_mpumap to keep fixed
+ * regions at the front, with switching ones immediately after them.
+ */
+int reorder_xen_mpumap(void)
+{
+    uint64_t i;
+
+    sw_mpumap = alloc_mpumap();
+    if ( !sw_mpumap )
+        return -ENOMEM;
+
+    /* Record the switching regions in sw_mpumap. */
+    for ( i = next_transient_region_idx - 1; i < max_xen_mpumap; i++ )
+    {
+        pr_t *region;
+
+        region = &xen_mpumap[i];
+        if ( region_is_valid(region) && region_needs_switching_on_ctxt(region) )
+        {
+            sw_mpumap[nr_sw_mpumap++] = xen_mpumap[i];
+
+            /*
+             * Disable it temporarily; it will be re-enabled later at its
+             * new, re-ordered position.
+             * WARNING: since the device memory section, as a switching
+             * region, gets disabled temporarily, the console will be
+             * inaccessible for a short time.
+             */
+            control_mpu_region_from_index(i, false);
+            memset(&xen_mpumap[i], 0, sizeof(pr_t));
+        }
+    }
+
+    /* Put switching regions after fixed regions */
+    i = 0;
+    nr_cpu_mpumap = nr_xen_mpumap = next_fixed_region_idx;
+    do
+    {
+        access_protection_region(false, NULL,
+                                 (const pr_t*)(&sw_mpumap[i]),
+                                 nr_cpu_mpumap);
+        nr_cpu_mpumap++;
+    } while ( ++i < nr_sw_mpumap );
+
+    /*
+     * Now, xen_mpumap becomes a tight mapping, with fixed region at front and
+     * switching ones after fixed ones.
+     */
+    printk(XENLOG_INFO "Xen EL2 MPU memory region mapping after re-order.\n");
+    dump_hyp_mapping();
+
+    return 0;
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 49ba998f68..b21fc4b8e2 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -61,23 +61,12 @@ domid_t __read_mostly max_init_domid;
 
 static __used void init_done(void)
 {
-    int rc;
-
     /* Must be done past setting system_state. */
     unregister_init_virtual_region();
 
     free_init_memory();
 
-    /*
-     * We have finished booting. Mark the section .data.ro_after_init
-     * read-only.
-     */
-    rc = modify_xen_mappings((unsigned long)&__ro_after_init_start,
-                             (unsigned long)&__ro_after_init_end,
-                             PAGE_HYPERVISOR_RO);
-    if ( rc )
-        panic("Unable to mark the .data.ro_after_init section read-only (rc = %d)\n",
-              rc);
+    arch_init_finialize();
 
     startup_cpu_idle_loop();
 }
diff --git a/xen/arch/arm/setup_mmu.c b/xen/arch/arm/setup_mmu.c
index 611a60633e..5b7a5de086 100644
--- a/xen/arch/arm/setup_mmu.c
+++ b/xen/arch/arm/setup_mmu.c
@@ -365,6 +365,22 @@ void __init discard_initial_modules(void)
     remove_early_mappings();
 }
 
+void arch_init_finialize(void)
+{
+    int rc;
+
+    /*
+     * We have finished booting. Mark the section .data.ro_after_init
+     * read-only.
+     */
+    rc = modify_xen_mappings((unsigned long)&__ro_after_init_start,
+                             (unsigned long)&__ro_after_init_end,
+                             PAGE_HYPERVISOR_RO);
+    if ( rc )
+        panic("Unable to mark the .data.ro_after_init section read-only (rc = %d)\n",
+              rc);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index f47f1f39ee..b510780cde 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -178,6 +178,26 @@ void __init discard_initial_modules(void)
     remove_early_mappings();
 }
 
+void arch_init_finialize(void)
+{
+    int rc;
+
+    /*
+     * We have finished booting. Mark the section .data.ro_after_init
+     * read-only.
+     */
+    rc = modify_xen_mappings((unsigned long)&__ro_after_init_start,
+                             (unsigned long)&__ro_after_init_end,
+                             REGION_HYPERVISOR_RO);
+    if ( rc )
+        panic("mpu: Unable to mark the .data.ro_after_init section read-only (rc = %d)\n",
+              rc);
+
+    rc = reorder_xen_mpumap();
+    if ( rc )
+        panic("mpu: Failed to reorder Xen MPU memory mapping (rc = %d)\n", rc);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH v2 40/40] xen/mpu: add Kconfig option to enable Armv8-R AArch64 support
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (38 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 39/40] xen/mpu: re-order xen_mpumap in arch_init_finialize Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  5:29 ` [PATCH] xen/mpu: make Xen boot to idle on MPU systems(DNM) Penny Zheng
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk

Introduce a Kconfig option to enable Armv8-R64 architecture
support. STATIC_MEMORY and HAS_MPU will be selected by
ARM_V8R by default, because Armv8-R64 only has PMSAv8-64 at Secure EL2
and only supports a statically configured system.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ee942a33bc..dc93b805a6 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -9,6 +9,15 @@ config ARM_64
 	select 64BIT
 	select HAS_FAST_MULTIPLY
 
+config ARM_V8R
+       bool "ARMv8-R AArch64 architecture support (UNSUPPORTED)" if UNSUPPORTED
+       default n
+       select STATIC_MEMORY
+       depends on ARM_64
+       help
+         This option enables Armv8-R profile for Arm64. Enabling this option
+         results in selecting MPU.
+
 config ARM
 	def_bool y
 	select HAS_ALTERNATIVE if !ARM_V8R
@@ -68,6 +77,10 @@ config HAS_ITS
         bool "GICv3 ITS MSI controller support (UNSUPPORTED)" if UNSUPPORTED
         depends on GICV3 && !NEW_VGIC && !ARM_32
 
+config HAS_MPU
+	bool "Protected Memory System Architecture"
+	depends on ARM_V8R
+
 config HVM
         def_bool y
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH] xen/mpu: make Xen boot to idle on MPU systems(DNM)
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (39 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH v2 40/40] xen/mpu: add Kconfig option to enable Armv8-R AArch64 support Penny Zheng
@ 2023-01-13  5:29 ` Penny Zheng
  2023-01-13  8:54 ` [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Jan Beulich
  2023-01-24 19:31 ` Ayan Kumar Halder
  42 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13  5:29 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk

From: Wei Chen <wei.chen@arm.com>

As we have not implemented guest support in the part#1 series of MPU
support, Xen cannot create any guest at boot time. So in this
patch we make Xen boot to the idle thread on MPU systems for reviewers
to test the part#1 series.

THIS PATCH IS ONLY FOR TESTING, NOT FOR REVIEWING.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/mm_mpu.c |  3 +++
 xen/arch/arm/setup.c  | 21 ++++++++++++---------
 xen/arch/arm/traps.c  |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 434ed872c1..73d5779ab4 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -32,6 +32,9 @@
 #include <asm/page.h>
 #include <asm/setup.h>
 
+/* Non-boot CPUs use this to find the correct pagetables. */
+uint64_t init_ttbr;
+
 #ifdef NDEBUG
 static inline void
 __attribute__ ((__format__ (__printf__, 1, 2)))
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index b21fc4b8e2..d04ad8f838 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -803,16 +803,19 @@ void __init start_xen(unsigned long boot_phys_offset,
 #endif
     enable_cpu_features();
 
-    /* Create initial domain 0. */
-    if ( !is_dom0less_mode() )
-        create_dom0();
-    else
-        printk(XENLOG_INFO "Xen dom0less mode detected\n");
-
-    if ( acpi_disabled )
+    if ( !IS_ENABLED(CONFIG_ARM_V8R) )
     {
-        create_domUs();
-        alloc_static_evtchn();
+        /* Create initial domain 0. */
+        if ( !is_dom0less_mode() )
+            create_dom0();
+        else
+            printk(XENLOG_INFO "Xen dom0less mode detected\n");
+
+        if ( acpi_disabled )
+        {
+            create_domUs();
+            alloc_static_evtchn();
+        }
     }
 
     /*
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 061c92acbd..2444f7f6d8 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -963,7 +963,9 @@ void vcpu_show_registers(const struct vcpu *v)
     ctxt.ifsr32_el2 = v->arch.ifsr;
 #endif
 
+#ifndef CONFIG_HAS_MPU
     ctxt.vttbr_el2 = v->domain->arch.p2m.vttbr;
+#endif
 
     _show_registers(&v->arch.cpu_info->guest_cpu_user_regs, &ctxt, 1, v);
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (40 preceding siblings ...)
  2023-01-13  5:29 ` [PATCH] xen/mpu: make Xen boot to idle on MPU systems(DNM) Penny Zheng
@ 2023-01-13  8:54 ` Jan Beulich
  2023-01-13  9:16   ` Julien Grall
  2023-01-24 19:31 ` Ayan Kumar Halder
  42 siblings, 1 reply; 122+ messages in thread
From: Jan Beulich @ 2023-01-13  8:54 UTC (permalink / raw)
  To: Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné,
	xen-devel

On 13.01.2023 06:28, Penny Zheng wrote:
>  xen/arch/x86/Kconfig                      |    1 +
>  xen/common/Kconfig                        |    6 +
>  xen/common/Makefile                       |    2 +-
>  xen/include/xen/vmap.h                    |   93 +-

I would like to take a look at these non-Arm changes, but I view it as not
very reasonable to wade through 40 patches just to find those changes. The
titles don't look to help in that either. For such a pretty large series,
could you please help non-Arm folks by pointing out in some way where the
non-Arm changes actually are?

Thanks, Jan


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1
  2023-01-13  8:54 ` [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Jan Beulich
@ 2023-01-13  9:16   ` Julien Grall
  2023-01-13  9:28     ` Jan Beulich
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-13  9:16 UTC (permalink / raw)
  To: Jan Beulich, Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné,
	xen-devel

Hi,

On 13/01/2023 08:54, Jan Beulich wrote:
> On 13.01.2023 06:28, Penny Zheng wrote:
>>   xen/arch/x86/Kconfig                      |    1 +
>>   xen/common/Kconfig                        |    6 +
>>   xen/common/Makefile                       |    2 +-
>>   xen/include/xen/vmap.h                    |   93 +-
> 
> I would like to take a look at these non-Arm changes, but I view it as not
> very reasonable to wade through 40 patches just to find those changes.

Right, but that's the purpose of the different CC list on each patch. 
AFAICT, Penny respected that and you should have been CCed on the three 
patches (#30, #31, #32) touching common/x86 code.

> The
> titles don't look to help in that either. For such pretty large series,
> could you please help non-Arm folks by pointing out in some way where the
> non-Arm changes actually are?

See above. I am not entirely sure what else you are requesting here. Do 
you want Penny to be explicit and list the patches modified in the cover 
letter?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1
  2023-01-13  9:16   ` Julien Grall
@ 2023-01-13  9:28     ` Jan Beulich
  0 siblings, 0 replies; 122+ messages in thread
From: Jan Beulich @ 2023-01-13  9:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné,
	xen-devel, Penny Zheng

On 13.01.2023 10:16, Julien Grall wrote:
> On 13/01/2023 08:54, Jan Beulich wrote:
>> On 13.01.2023 06:28, Penny Zheng wrote:
>>>   xen/arch/x86/Kconfig                      |    1 +
>>>   xen/common/Kconfig                        |    6 +
>>>   xen/common/Makefile                       |    2 +-
>>>   xen/include/xen/vmap.h                    |   93 +-
>>
>> I would like to take a look at these non-Arm changes, but I view it as not
>> very reasonable to wade through 40 patches just to find those changes.
> 
> Right, but that's the purpose of the different CC list on each patch. 
> AFAICT, Penny respected that and you should have been CC to the three 
> patches (#30, #31, #32) touching common/x86 code.

Right, but I have no way to immediately see which patches I have been
Cc-ed on. Unlike you (iiuc) I'm subscribed to the list, and hence mails
all look the same whether or not I'm CC-ed. Then again I only now
realize that there are ways to filter what I've got - I'm sorry for
not having thought of this earlier.

>> The
>> titles don't look to help in that either. For such pretty large series,
>> could you please help non-Arm folks by pointing out in some way where the
>> non-Arm changes actually are?
> 
> See above. I am not entirely sure what else you are requested here. Do 
> you want Penny to be explicit and list the patch modified in the cover 
> letter?

For a large series mostly touching Arm code, calling out the
"outliers" (when patch titles don't make this clear) could certainly
help. It's not like I'm asking to do such everywhere.

Jan


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems
  2023-01-13  5:29 ` [PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems Penny Zheng
@ 2023-01-13  9:39   ` Jan Beulich
  0 siblings, 0 replies; 122+ messages in thread
From: Jan Beulich @ 2023-01-13  9:39 UTC (permalink / raw)
  To: Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné,
	xen-devel

On 13.01.2023 06:29, Penny Zheng wrote:
> VMAP in MMU system, is used to remap a range of normal memory
> or device memory to another virtual address with new attributes
> for specific purpose, like ALTERNATIVE feature. Since there is
> no virtual address translation support in MPU system, we can
> not support VMAP in MPU system.
> 
> So in this patch, we disable VMAP for MPU systems, and some
> features depending on VMAP also need to be disabled at the same
> time, Like ALTERNATIVE, CPU ERRATA.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/Kconfig                   |  3 +-
>  xen/arch/arm/Makefile                  |  2 +-
>  xen/arch/arm/include/asm/alternative.h | 15 +++++
>  xen/arch/arm/include/asm/cpuerrata.h   | 12 ++++
>  xen/arch/arm/setup.c                   |  7 +++
>  xen/arch/x86/Kconfig                   |  1 +
>  xen/common/Kconfig                     |  3 +
>  xen/common/Makefile                    |  2 +-
>  xen/include/xen/vmap.h                 | 81 ++++++++++++++++++++++++--
>  9 files changed, 119 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index c6b6b612d1..9230c8b885 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -11,12 +11,13 @@ config ARM_64
>  
>  config ARM
>  	def_bool y
> -	select HAS_ALTERNATIVE
> +	select HAS_ALTERNATIVE if !ARM_V8R

Judging from the connection you make in the description, I think this
wants to be "if HAS_VMAP".

>  	select HAS_DEVICE_TREE
>  	select HAS_PASSTHROUGH
>  	select HAS_PDX
>  	select HAS_PMAP
>  	select IOMMU_FORCE_PT_SHARE
> +	select HAS_VMAP if !ARM_V8R

I think entries here are intended to be sorted alphabetically.

> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -28,6 +28,7 @@ config X86
>  	select HAS_UBSAN
>  	select HAS_VPCI if HVM
>  	select NEEDS_LIBELF
> +	select HAS_VMAP

Here they are certainly meant to be.

> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -1,15 +1,17 @@
> -#if !defined(__XEN_VMAP_H__) && defined(VMAP_VIRT_START)
> +#if !defined(__XEN_VMAP_H__) && (defined(VMAP_VIRT_START) || !defined(CONFIG_HAS_VMAP))
>  #define __XEN_VMAP_H__
>  
> -#include <xen/mm-frame.h>
> -#include <xen/page-size.h>
> -
>  enum vmap_region {
>      VMAP_DEFAULT,
>      VMAP_XEN,
>      VMAP_REGION_NR,
>  };
>  
> +#ifdef CONFIG_HAS_VMAP
> +
> +#include <xen/mm-frame.h>
> +#include <xen/page-size.h>
> +
>  void vm_init_type(enum vmap_region type, void *start, void *end);
>  
>  void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
> @@ -38,4 +40,75 @@ static inline void vm_init(void)
>      vm_init_type(VMAP_DEFAULT, (void *)VMAP_VIRT_START, arch_vmap_virt_end());
>  }
>  
> +#else /* !CONFIG_HAS_VMAP */
> +
> +static inline void vm_init_type(enum vmap_region type, void *start, void *end)
> +{
> +    ASSERT_UNREACHABLE();
> +}

Do you really need this and all other inline stubs? Imo the goal ought
to be to have as few of them as possible: The one above won't be
referenced if you further make LIVEPATCH depend on HAS_VMAP (which I
think you need to do anyway), and the only other call to the function
is visible in context above (i.e. won't be used either when !HAS_VMAP).
In other cases merely having a declaration (but no definition) may be
sufficient, as the compiler may be able to eliminate calls.
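
To illustrate the dependency being suggested (sketch only - the existing
attributes of LIVEPATCH are omitted here, so this is not a drop-in hunk):

config LIVEPATCH
	bool "Live patching support"
	depends on HAS_VMAP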

Jan


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system
  2023-01-13  5:29 ` [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system Penny Zheng
@ 2023-01-13  9:42   ` Jan Beulich
  2023-01-13 10:10   ` Jan Beulich
  1 sibling, 0 replies; 122+ messages in thread
From: Jan Beulich @ 2023-01-13  9:42 UTC (permalink / raw)
  To: Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	xen-devel

On 13.01.2023 06:29, Penny Zheng wrote:
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -43,6 +43,9 @@ config HAS_EX_TABLE
>  config HAS_FAST_MULTIPLY
>  	bool
>  
> +config HAS_FIXMAP
> +	bool

I think it'll end up misleading if this option is not selected by x86
as well. So imo you either add that, or you move the option to an Arm-
specific Kconfig.

Jan


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx
  2023-01-13  5:29 ` [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx Penny Zheng
@ 2023-01-13  9:49   ` Jan Beulich
  2023-02-09 11:14   ` Julien Grall
  1 sibling, 0 replies; 122+ messages in thread
From: Jan Beulich @ 2023-01-13  9:49 UTC (permalink / raw)
  To: Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	xen-devel

On 13.01.2023 06:29, Penny Zheng wrote:
> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -89,15 +89,27 @@ static inline void vfree(void *va)
>      ASSERT_UNREACHABLE();
>  }
>  
> +#ifdef CONFIG_HAS_MPU
> +void __iomem *ioremap(paddr_t, size_t);
> +#else
>  void __iomem *ioremap(paddr_t, size_t)
>  {
>      ASSERT_UNREACHABLE();
>      return NULL;
>  }
> +#endif

If, as per the comment on the earlier patch, a mere declaration isn't
sufficient, the earlier patch will need to make the stub static inline.
I'm actually surprised you didn't see a build failure from it not being
so. At the point here I then actually question why the stub function
isn't being dropped here again (assuming it needs putting in place at
all earlier on).

Furthermore, once you want a declaration for the function here as well,
I think it would be better to consolidate both declarations: It's
awkward to have to remember to update two instances, in case any
changes are necessary.
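
Something along these lines (purely illustrative, reusing the config
names introduced by this series) would keep both variants next to each
other in a single place:

#if defined(CONFIG_HAS_VMAP) || defined(CONFIG_HAS_MPU)
/* A real implementation is provided by the vmap (MMU) or MPU code. */
void __iomem *ioremap(paddr_t pa, size_t len);
#else
/* No implementation is built in, so any call is a bug. */
static inline void __iomem *ioremap(paddr_t pa, size_t len)
{
    ASSERT_UNREACHABLE();
    return NULL;
}
#endif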

Jan


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h
  2023-01-13  5:28 ` [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h Penny Zheng
@ 2023-01-13 10:06   ` Julien Grall
  2023-01-13 10:39     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-13 10:06 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Julien Grall

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> These two variables are stale variables, they only have declarations
> in config.h, they don't have any definition and no any code is using
> these two variables. So in this patch, we remove them from config.h.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Acked-by: Julien Grall <jgrall@amazon.com>

I was going to commit this patch, however this technically needs your 
signed-off-by as the sender of this new version.

If you confirm your signed-off-by, then I can commit it without resending.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system
  2023-01-13  5:29 ` [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system Penny Zheng
  2023-01-13  9:42   ` Jan Beulich
@ 2023-01-13 10:10   ` Jan Beulich
  2023-02-09 11:01     ` Julien Grall
  1 sibling, 1 reply; 122+ messages in thread
From: Jan Beulich @ 2023-01-13 10:10 UTC (permalink / raw)
  To: Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	xen-devel

On 13.01.2023 06:29, Penny Zheng wrote:
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -13,9 +13,10 @@ config ARM
>  	def_bool y
>  	select HAS_ALTERNATIVE if !ARM_V8R
>  	select HAS_DEVICE_TREE
> +	select HAS_FIXMAP if !ARM_V8R
>  	select HAS_PASSTHROUGH
>  	select HAS_PDX
> -	select HAS_PMAP
> +	select HAS_PMAP if !ARM_V8R
>  	select IOMMU_FORCE_PT_SHARE
>  	select HAS_VMAP if !ARM_V8R

Thinking about it - wouldn't it make sense to fold HAS_VMAP and HAS_FIXMAP
into a single HAS_MMU?
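
As a purely hypothetical sketch of that folding (option name and
selection sites illustrative only):

config HAS_MMU
	bool
	help
	  The architecture provides an MMU, making the vmap and fixmap
	  machinery available.

# with "select HAS_MMU if !ARM_V8R" under config ARM and "select HAS_MMU"
# under config X86, replacing the separate HAS_VMAP/HAS_FIXMAP selections.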

Jan


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h
  2023-01-13 10:06   ` Julien Grall
@ 2023-01-13 10:39     ` Penny Zheng
  0 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-13 10:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Julien Grall

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Friday, January 13, 2023 6:07 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Julien Grall
> <jgrall@amazon.com>
> Subject: Re: [PATCH v2 01/40] xen/arm: remove xen_phys_start and
> xenheap_phys_end from config.h
> 
> Hi Penny,

Hi Julien

> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > These two variables are stale variables, they only have declarations
> > in config.h, they don't have any definition and no any code is using
> > these two variables. So in this patch, we remove them from config.h.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Acked-by: Julien Grall <jgrall@amazon.com>
> 
> I was going to commit this patch, however this technically needs your signed-
> off-by as the sender of this new version.
> 
> If you confirm your signed-off-by, then I can commit without a resending.
> 

Yes, I confirm, thx

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64
  2023-01-13  5:28 ` [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64 Penny Zheng
@ 2023-01-17 23:09   ` Julien Grall
  2023-01-18  2:19     ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-17 23:09 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> Currently, ARM_EFI will mandatorily selected by Arm64.
> Even if the user knows for sure that their images will not
> start in the EFI environment, they can't disable the EFI
> support for Arm64. This means there will be about 3K lines
> unused code in their images.
> 
> So in this patch, we make ARM_EFI selectable for Arm64, and
> based on that, we can use CONFIG_ARM_EFI to gate the EFI
> specific code in head.S for those images that will not be
> booted in EFI environment.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Your signed-off-by is missing.

> ---
> v1 -> v2:
> 1. New patch
> ---
>   xen/arch/arm/Kconfig      | 10 ++++++++--
>   xen/arch/arm/arm64/head.S | 15 +++++++++++++--
>   2 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 239d3aed3c..ace7178c9a 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -7,7 +7,6 @@ config ARM_64
>   	def_bool y
>   	depends on !ARM_32
>   	select 64BIT
> -	select ARM_EFI
>   	select HAS_FAST_MULTIPLY
>   
>   config ARM
> @@ -37,7 +36,14 @@ config ACPI
>   	  an alternative to device tree on ARM64.
>   
>   config ARM_EFI
> -	bool
> +	bool "UEFI boot service support"
> +	depends on ARM_64
> +	default y
> +	help
> +	  This option provides support for boot services through
> +	  UEFI firmware. A UEFI stub is provided to allow Xen to
> +	  be booted as an EFI application. This is only useful for
> +	  Xen that may run on systems that have UEFI firmware.

I would drop the last sentence as this is implied by the rest of the 
paragraph.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA
  2023-01-13  5:28 ` [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA Penny Zheng
@ 2023-01-17 23:16   ` Julien Grall
  2023-01-18  2:32     ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-17 23:16 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
>  From Arm ARM Supplement of Armv8-R AArch64 (DDI 0600A) [1],
> section D1.6.2 TLB maintenance instructions, we know that
> Armv8-R AArch64 permits an implementation to cache stage 1
> VMSAv8-64 and stage 2 PMSAv8-64 attributes as a common entry
> for the Secure EL1&0 translation regime. But for Xen itself,
> it's running with stage 1 PMSAv8-64 on Armv8-R AArch64. The
> EL2 MPU updates for stage 1 PMSAv8-64 will not be cached in
> TLB entries. So we don't need any TLB invalidation for Xen
> itself in EL2.

So I understand the theory here. But I would expect that none of the 
common code will call any of those helpers. Therefore the #ifdef should 
be unnecessary.

Can you clarify if my understanding is correct?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-13  5:28 ` [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R Penny Zheng
@ 2023-01-17 23:24   ` Julien Grall
  2023-01-18  3:00     ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-17 23:24 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei . Xie

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> On Armv8-A, Xen has a fixed virtual start address (link address
> too) for all Armv8-A platforms. In an MMU based system, Xen can
> map its loaded address to this virtual start address. So, on
> Armv8-A platforms, the Xen start address does not need to be
> configurable. But on Armv8-R platforms, there is no MMU to map
> loaded address to a fixed virtual address and different platforms
> will have very different address space layout. So Xen cannot use
> a fixed physical address on MPU based system and need to have it
> configurable.
> 
> In this patch we introduce one Kconfig option for users to define
> the default Xen start address for Armv8-R. Users can enter the
> address in config time, or select the tailored platform config
> file from arch/arm/configs.
> 
> And as we introduced Armv8-R platforms to Xen, that means the
> existed Arm64 platforms should not be listed in Armv8-R platform
> list, so we add !ARM_V8R dependency for these platforms.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Jiamei.Xie <jiamei.xie@arm.com>

Your signed-off-by is missing.

> ---
> v1 -> v2:
> 1. Remove the platform header fvp_baser.h.
> 2. Remove the default start address for fvp_baser64.
> 3. Remove the description of default address from commit log.
> 4. Change HAS_MPU to ARM_V8R for Xen start address dependency.
>     No matter Arm-v8r board has MPU or not, it always need to
>     specify the start address.

I don't quite understand the last sentence. Are you saying that it is 
possible to have an ARMv8-R system with neither an MPU nor a page-table?

> ---
>   xen/arch/arm/Kconfig           |  8 ++++++++
>   xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
>   2 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index ace7178c9a..c6b6b612d1 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -145,6 +145,14 @@ config TEE
>   	  This option enables generic TEE mediators support. It allows guests
>   	  to access real TEE via one of TEE mediators implemented in XEN.
>   
> +config XEN_START_ADDRESS
> +	hex "Xen start address: keep default to use platform defined address"
> +	default 0
> +	depends on ARM_V8R

It is still pretty unclear to me what would be the difference between 
HAS_MPU and ARM_V8R.

> +	help
> +	  This option allows to set the customized address at which Xen will be
> +	  linked on MPU systems. This address must be aligned to a page size.
> +
>   source "arch/arm/tee/Kconfig"
>   
>   config STATIC_SHM
> diff --git a/xen/arch/arm/platforms/Kconfig b/xen/arch/arm/platforms/Kconfig
> index c93a6b2756..0904793a0b 100644
> --- a/xen/arch/arm/platforms/Kconfig
> +++ b/xen/arch/arm/platforms/Kconfig
> @@ -1,6 +1,7 @@
>   choice
>   	prompt "Platform Support"
>   	default ALL_PLAT
> +	default FVP_BASER if ARM_V8R
>   	---help---
>   	Choose which hardware platform to enable in Xen.
>   
> @@ -8,13 +9,14 @@ choice
>   
>   config ALL_PLAT
>   	bool "All Platforms"
> +	depends on !ARM_V8R
>   	---help---
>   	Enable support for all available hardware platforms. It doesn't
>   	automatically select any of the related drivers.
>   
>   config QEMU
>   	bool "QEMU aarch virt machine support"
> -	depends on ARM_64
> +	depends on ARM_64 && !ARM_V8R
>   	select GICV3
>   	select HAS_PL011
>   	---help---
> @@ -23,7 +25,7 @@ config QEMU
>   
>   config RCAR3
>   	bool "Renesas RCar3 support"
> -	depends on ARM_64
> +	depends on ARM_64 && !ARM_V8R
>   	select HAS_SCIF
>   	select IPMMU_VMSA
>   	---help---
> @@ -31,14 +33,22 @@ config RCAR3
>   
>   config MPSOC
>   	bool "Xilinx Ultrascale+ MPSoC support"
> -	depends on ARM_64
> +	depends on ARM_64 && !ARM_V8R
>   	select HAS_CADENCE_UART
>   	select ARM_SMMU
>   	---help---
>   	Enable all the required drivers for Xilinx Ultrascale+ MPSoC
>   
> +config FVP_BASER
> +	bool "Fixed Virtual Platform BaseR support"
> +	depends on ARM_V8R
> +	help
> +	  Enable platform specific configurations for Fixed Virtual
> +	  Platform BaseR

This seems unrelated to this patch.

> +
>   config NO_PLAT
>   	bool "No Platforms"
> +	depends on !ARM_V8R
>   	---help---
>   	Do not enable specific support for any platform.
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S
  2023-01-13  5:28 ` [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S Penny Zheng
@ 2023-01-17 23:37   ` Julien Grall
  2023-01-18  3:09     ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-17 23:37 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> We want to reuse head.S for MPU systems, but there are some
> code implemented for MMU systems only. We will move such
> code to another MMU specific file. But before that, we will
> do some preparations in this patch to make them easier
> for reviewing:

Well, I agree that...

> 1. Fix the indentations of code comments.

... changing the indentation is better here. But...

> 2. Export some symbols that will be accessed out of file
>     scope.

... I have no idea which functions are going to be used in a separate 
file. So I think they should belong to the patch moving the code.

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Your signed-off-by is missing.

> ---
> v1 -> v2:
> 1. New patch.
> ---
>   xen/arch/arm/arm64/head.S | 40 +++++++++++++++++++--------------------
>   1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 93f9b0b9d5..b2214bc5e3 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -136,22 +136,22 @@
>           add \xb, \xb, x20
>   .endm
>   
> -        .section .text.header, "ax", %progbits
> -        /*.aarch64*/
> +.section .text.header, "ax", %progbits
> +/*.aarch64*/

This change is not mentioned.

>   
> -        /*
> -         * Kernel startup entry point.
> -         * ---------------------------
> -         *
> -         * The requirements are:
> -         *   MMU = off, D-cache = off, I-cache = on or off,
> -         *   x0 = physical address to the FDT blob.
> -         *
> -         * This must be the very first address in the loaded image.
> -         * It should be linked at XEN_VIRT_START, and loaded at any
> -         * 4K-aligned address.  All of text+data+bss must fit in 2MB,
> -         * or the initial pagetable code below will need adjustment.
> -         */
> +/*
> + * Kernel startup entry point.
> + * ---------------------------
> + *
> + * The requirements are:
> + *   MMU = off, D-cache = off, I-cache = on or off,
> + *   x0 = physical address to the FDT blob.
> + *
> + * This must be the very first address in the loaded image.
> + * It should be linked at XEN_VIRT_START, and loaded at any
> + * 4K-aligned address.  All of text+data+bss must fit in 2MB,
> + * or the initial pagetable code below will need adjustment.
> + */
>   
>   GLOBAL(start)
>           /*
> @@ -586,7 +586,7 @@ ENDPROC(cpu_init)
>    *
>    * Clobbers x0 - x4
>    */
> -create_page_tables:
> +ENTRY(create_page_tables)

I am not sure about keeping this name. Now we have create_page_tables() 
and arch_setup_page_tables().

I would consider naming it create_boot_page_tables().

>           /* Prepare the page-tables for mapping Xen */
>           ldr   x0, =XEN_VIRT_START
>           create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
> @@ -680,7 +680,7 @@ ENDPROC(create_page_tables)
>    *
>    * Clobbers x0 - x3
>    */
> -enable_mmu:
> +ENTRY(enable_mmu)
>           PRINT("- Turning on paging -\r\n")
>   
>           /*
> @@ -714,7 +714,7 @@ ENDPROC(enable_mmu)
>    *
>    * Clobbers x0 - x1
>    */
> -remove_identity_mapping:
> +ENTRY(remove_identity_mapping)

Patch #14 should be before this patch. So you don't have to export 
remove_identity_mapping temporarily.

This will also avoid (transient) naming confusion with my work (see [1]).

>           /*
>            * Find the zeroeth slot used. Remove the entry from zeroeth
>            * table if the slot is not XEN_ZEROETH_SLOT.
> @@ -775,7 +775,7 @@ ENDPROC(remove_identity_mapping)
>    *
>    * Clobbers x0 - x3
>    */
> -setup_fixmap:
> +ENTRY(setup_fixmap)
>   #ifdef CONFIG_EARLY_PRINTK
>           /* Add UART to the fixmap table */
>           ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
> @@ -871,7 +871,7 @@ ENDPROC(init_uart)
>    * x0: Nul-terminated string to print.
>    * x23: Early UART base address
>    * Clobbers x0-x1 */
> -puts:
> +ENTRY(puts)

This name is a bit too generic to be globally exported. It is also now 
quite confusing because we have "early_puts" and "puts".

I would consider naming it asm_puts(). It is still not great but 
hopefully it would give a hint that it should be called from assembly code.

>           early_uart_ready x23, 1
>           ldrb  w1, [x0], #1           /* Load next char */
>           cbz   w1, 1f                 /* Exit on nul */

Cheers,

[1] https://lore.kernel.org/all/20230113101136.479-13-julien@xen.org/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections
  2023-01-13  5:28 ` [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections Penny Zheng
@ 2023-01-17 23:46   ` Julien Grall
  2023-01-18  2:18     ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-17 23:46 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> Only the first 4KB of Xen image will be mapped as identity
> (PA == VA). At the moment, Xen guarantees this by having
> everything that needs to be used in the identity mapping
> in head.S before _end_boot and checking at link time if this
> fits in 4KB.
> 
> In previous patch, we have moved the MMU code outside of
> head.S. Although we have added .text.header to the new file
> to guarantee all identity map code still in the first 4KB.
> However, the order of these two files on this 4KB depends
> on the build tools. Currently, we use the build tools to
> process the order of objs in the Makefile to ensure that
> head.S must be at the top. But if you change to another build
> tools, it may not be the same result.

Right, so this is fixing a bug you introduced in the previous patch. We 
should really avoid introducing (latent) regression in a series. So 
please re-order the patches.

> 
> In this patch we introduce .text.idmap to head_mmu.S, and
> add this section after .text.header. to ensure code of
> head_mmu.S after the code of header.S.
> 
> After this, we will still include some code that does not
> belong to identity map before _end_boot. Because we have
> moved _end_boot to head_mmu.S. 

I dislike this approach because you are expecting that only head_mmu.S 
will be part of .text.idmap. If it is not, everything could blow up again.

That said, if you look at staging, you will notice that now _end_boot is 
defined in the linker script to avoid any issue.
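
Roughly along these lines in xen.lds.S (a sketch of the idea, not the
exact staging layout):

    .text : {
        _stext = .;
        *(.text.header)
        *(.text.idmap)
        /*
         * Everything contributing to the identity map ends here, so the
         * link-time "fits in 4KB" check no longer depends on the order
         * in which objects are linked.
         */
        _end_boot = .;
        *(.text.cold)
    }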

> That means all code in head.S
> will be included before _end_boot. In this patch, we also
> added .text flag in the place of original _end_boot in head.S.
> All the code after .text in head.S will not be included in
> identity map section.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
> v1 -> v2:
> 1. New patch.
> ---
>   xen/arch/arm/arm64/head.S     | 6 ++++++
>   xen/arch/arm/arm64/head_mmu.S | 2 +-
>   xen/arch/arm/xen.lds.S        | 1 +
>   3 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 5cfa47279b..782bd1f94c 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -466,6 +466,12 @@ fail:   PRINT("- Boot failed -\r\n")
>           b     1b
>   ENDPROC(fail)
>   
> +/*
> + * For the code that do not need in indentity map section,
> + * we put them back to normal .text section
> + */
> +.section .text, "ax", %progbits
> +

I would argue that puts wants to be part of the idmap.

>   #ifdef CONFIG_EARLY_PRINTK
>   /*
>    * Initialize the UART. Should only be called on the boot CPU.
> diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
> index e2c8f07140..6ff13c751c 100644
> --- a/xen/arch/arm/arm64/head_mmu.S
> +++ b/xen/arch/arm/arm64/head_mmu.S
> @@ -105,7 +105,7 @@
>           str   \tmp2, [\tmp3, \tmp1, lsl #3]
>   .endm
>   
> -.section .text.header, "ax", %progbits
> +.section .text.idmap, "ax", %progbits
>   /*.aarch64*/
>   
>   /*
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 92c2984052..bc45ea2c65 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -33,6 +33,7 @@ SECTIONS
>     .text : {
>           _stext = .;            /* Text section */
>          *(.text.header)
> +       *(.text.idmap)
>   
>          *(.text.cold)
>          *(.text.unlikely .text.*_unlikely .text.unlikely.*)

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv-8R
  2023-01-13  5:28 ` [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv-8R Penny Zheng
@ 2023-01-17 23:49   ` Julien Grall
  2023-01-18  1:43     ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-17 23:49 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> There is no VMSA support on Armv8-R AArch64, so we can not map early
> UART to FIXMAP_CONSOLE. Instead, we use PA == VA to define
> EARLY_UART_VIRTUAL_ADDRESS on Armv8-R AArch64.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Your signed-off-by is missing.

> ---
> 1. New patch
> ---
>   xen/arch/arm/include/asm/early_printk.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
> index c5149b2976..44a230853f 100644
> --- a/xen/arch/arm/include/asm/early_printk.h
> +++ b/xen/arch/arm/include/asm/early_printk.h
> @@ -15,10 +15,22 @@
>   
>   #ifdef CONFIG_EARLY_PRINTK
>   
> +#ifdef CONFIG_ARM_V8R

Shouldn't this be CONFIG_HAS_MPU?

> +
> +/*
> + * For Armv-8r, there is not VMSA support in EL2, so we use VA == PA

s/not/no/

> + * for EARLY_UART_VIRTUAL_ADDRESS. > + */
> +#define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
> +
> +#else
> +
>   /* need to add the uart address offset in page to the fixmap address */
>   #define EARLY_UART_VIRTUAL_ADDRESS \
>       (FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
>   
> +#endif /* CONFIG_ARM_V8R */
> +
>   #endif /* !CONFIG_EARLY_PRINTK */
>   
>   #endif

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv-8R
  2023-01-17 23:49   ` Julien Grall
@ 2023-01-18  1:43     ` Wei Chen
  0 siblings, 0 replies; 122+ messages in thread
From: Wei Chen @ 2023-01-18  1:43 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 7:49
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 08/40] xen/arm: use PA == VA for
> EARLY_UART_VIRTUAL_ADDRESS on Armv-8R
> 
> Hi Penny,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > There is no VMSA support on Armv8-R AArch64, so we can not map early
> > UART to FIXMAP_CONSOLE. Instead, we use PA == VA to define
> > EARLY_UART_VIRTUAL_ADDRESS on Armv8-R AArch64.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> 
> Your signed-off-by is missing.
> 
> > ---
> > 1. New patch
> > ---
> >   xen/arch/arm/include/asm/early_printk.h | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/xen/arch/arm/include/asm/early_printk.h
> b/xen/arch/arm/include/asm/early_printk.h
> > index c5149b2976..44a230853f 100644
> > --- a/xen/arch/arm/include/asm/early_printk.h
> > +++ b/xen/arch/arm/include/asm/early_printk.h
> > @@ -15,10 +15,22 @@
> >
> >   #ifdef CONFIG_EARLY_PRINTK
> >
> > +#ifdef CONFIG_ARM_V8R
> 
> Shouldn't this be CONFIG_HAS_MPU?
> 

We had considered that there may be an implementation of Armv8-R without
an MPU, so we used CONFIG_ARM_V8R here. But you're right, we do not
support the non-MPU scenario in this series, so using CONFIG_HAS_MPU here
would better indicate that this is a feature-based code section.
We will change it to CONFIG_HAS_MPU in the next version.

> > +
> > +/*
> > + * For Armv-8r, there is not VMSA support in EL2, so we use VA == PA
> 
> s/not/no/
> 

Ok.

Cheers,
Wei Chen

> > + * for EARLY_UART_VIRTUAL_ADDRESS. > + */
> > +#define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
> > +
> > +#else
> > +
> >   /* need to add the uart address offset in page to the fixmap address
> */
> >   #define EARLY_UART_VIRTUAL_ADDRESS \
> >       (FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS &
> ~PAGE_MASK))
> >
> > +#endif /* CONFIG_ARM_V8R */
> > +
> >   #endif /* !CONFIG_EARLY_PRINTK */
> >
> >   #endif
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections
  2023-01-17 23:46   ` Julien Grall
@ 2023-01-18  2:18     ` Wei Chen
  2023-01-18 10:55       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Wei Chen @ 2023-01-18  2:18 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 7:46
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity
> map sections
> 
> Hi,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > Only the first 4KB of Xen image will be mapped as identity
> > (PA == VA). At the moment, Xen guarantees this by having
> > everything that needs to be used in the identity mapping
> > in head.S before _end_boot and checking at link time if this
> > fits in 4KB.
> >
> > In previous patch, we have moved the MMU code outside of
> > head.S. Although we have added .text.header to the new file
> > to guarantee all identity map code still in the first 4KB.
> > However, the order of these two files on this 4KB depends
> > on the build tools. Currently, we use the build tools to
> > process the order of objs in the Makefile to ensure that
> > head.S must be at the top. But if you change to another build
> > tools, it may not be the same result.
> 
> Right, so this is fixing a bug you introduced in the previous patch. We
> should really avoid introducing (latent) regression in a series. So
> please re-order the patches.
> 

Ok.

> >
> > In this patch we introduce .text.idmap to head_mmu.S, and
> > add this section after .text.header. to ensure code of
> > head_mmu.S after the code of header.S.
> >
> > After this, we will still include some code that does not
> > belong to identity map before _end_boot. Because we have
> > moved _end_boot to head_mmu.S.
> 
> I dislike this approach because you are expecting that only head_mmu.S
> will be part of .text.idmap. If it is not, everything could blow up again.
> 

I agree.

> That said, if you look at staging, you will notice that now _end_boot is
> defined in the linker script to avoid any issue.
> 

Sorry, I am not quite clear about this comment. The _end_boot of the original
staging branch is defined in head.S, and I am not quite sure how this
_end_boot solves the problem of multiple files containing idmap code.

Cheers,
Wei Chen

> > That means all code in head.S
> > will be included before _end_boot. In this patch, we also
> > added .text flag in the place of original _end_boot in head.S.
> > All the code after .text in head.S will not be included in
> > identity map section.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> > v1 -> v2:
> > 1. New patch.
> > ---
> >   xen/arch/arm/arm64/head.S     | 6 ++++++
> >   xen/arch/arm/arm64/head_mmu.S | 2 +-
> >   xen/arch/arm/xen.lds.S        | 1 +
> >   3 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 5cfa47279b..782bd1f94c 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -466,6 +466,12 @@ fail:   PRINT("- Boot failed -\r\n")
> >           b     1b
> >   ENDPROC(fail)
> >
> > +/*
> > + * For the code that do not need in indentity map section,
> > + * we put them back to normal .text section
> > + */
> > +.section .text, "ax", %progbits
> > +
> 
> I would argue that puts wants to be part of the idmap.
> 

I am ok to move puts to the idmap. But in the original head.S, puts is
placed after _end_boot, and from xen.lds.S we can see the idmap area is
the "_end_boot - start" region. Is the reason for moving puts
to the idmap that we're using it in the idmap?

Cheers,
Wei Chen

> >   #ifdef CONFIG_EARLY_PRINTK
> >   /*
> >    * Initialize the UART. Should only be called on the boot CPU.
> > diff --git a/xen/arch/arm/arm64/head_mmu.S
> b/xen/arch/arm/arm64/head_mmu.S
> > index e2c8f07140..6ff13c751c 100644
> > --- a/xen/arch/arm/arm64/head_mmu.S
> > +++ b/xen/arch/arm/arm64/head_mmu.S
> > @@ -105,7 +105,7 @@
> >           str   \tmp2, [\tmp3, \tmp1, lsl #3]
> >   .endm
> >
> > -.section .text.header, "ax", %progbits
> > +.section .text.idmap, "ax", %progbits
> >   /*.aarch64*/
> >
> >   /*
> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> > index 92c2984052..bc45ea2c65 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/xen/arch/arm/xen.lds.S
> > @@ -33,6 +33,7 @@ SECTIONS
> >     .text : {
> >           _stext = .;            /* Text section */
> >          *(.text.header)
> > +       *(.text.idmap)
> >
> >          *(.text.cold)
> >          *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64
  2023-01-17 23:09   ` Julien Grall
@ 2023-01-18  2:19     ` Wei Chen
  0 siblings, 0 replies; 122+ messages in thread
From: Wei Chen @ 2023-01-18  2:19 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 7:09
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64
> 
> Hi Penny,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > Currently, ARM_EFI will mandatorily selected by Arm64.
> > Even if the user knows for sure that their images will not
> > start in the EFI environment, they can't disable the EFI
> > support for Arm64. This means there will be about 3K lines
> > unused code in their images.
> >
> > So in this patch, we make ARM_EFI selectable for Arm64, and
> > based on that, we can use CONFIG_ARM_EFI to gate the EFI
> > specific code in head.S for those images that will not be
> > booted in EFI environment.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> 
> Your signed-off-by is missing.
> 
> > ---
> > v1 -> v2:
> > 1. New patch
> > ---
> >   xen/arch/arm/Kconfig      | 10 ++++++++--
> >   xen/arch/arm/arm64/head.S | 15 +++++++++++++--
> >   2 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> > index 239d3aed3c..ace7178c9a 100644
> > --- a/xen/arch/arm/Kconfig
> > +++ b/xen/arch/arm/Kconfig
> > @@ -7,7 +7,6 @@ config ARM_64
> >   	def_bool y
> >   	depends on !ARM_32
> >   	select 64BIT
> > -	select ARM_EFI
> >   	select HAS_FAST_MULTIPLY
> >
> >   config ARM
> > @@ -37,7 +36,14 @@ config ACPI
> >   	  an alternative to device tree on ARM64.
> >
> >   config ARM_EFI
> > -	bool
> > +	bool "UEFI boot service support"
> > +	depends on ARM_64
> > +	default y
> > +	help
> > +	  This option provides support for boot services through
> > +	  UEFI firmware. A UEFI stub is provided to allow Xen to
> > +	  be booted as an EFI application. This is only useful for
> > +	  Xen that may run on systems that have UEFI firmware.
> 
> I would drop the last sentence as this is implied with the rest of the
> paragraph.
> 

Ok.
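
For clarity, the entry would then read roughly as follows (a sketch of the
hunk quoted above with only the last help sentence dropped):

    config ARM_EFI
    	bool "UEFI boot service support"
    	depends on ARM_64
    	default y
    	help
    	  This option provides support for boot services through
    	  UEFI firmware. A UEFI stub is provided to allow Xen to
    	  be booted as an EFI application.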

Cheers,
Wei Chen

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA
  2023-01-17 23:16   ` Julien Grall
@ 2023-01-18  2:32     ` Wei Chen
  0 siblings, 0 replies; 122+ messages in thread
From: Wei Chen @ 2023-01-18  2:32 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 7:17
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-
> R64 PMSA
> 
> Hi,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> >  From Arm ARM Supplement of Armv8-R AArch64 (DDI 0600A) [1],
> > section D1.6.2 TLB maintenance instructions, we know that
> > Armv8-R AArch64 permits an implementation to cache stage 1
> > VMSAv8-64 and stage 2 PMSAv8-64 attributes as a common entry
> > for the Secure EL1&0 translation regime. But for Xen itself,
> > it's running with stage 1 PMSAv8-64 on Armv8-R AArch64. The
> > EL2 MPU updates for stage 1 PMSAv8-64 will not be cached in
> > TLB entries. So we don't need any TLB invalidation for Xen
> > itself in EL2.
> 
> So I understand the theory here. But I would expect that none of the
> common code will call any of those helpers. Therefore the #ifdef should
> be unnecessary.
> 
> Can you clarify if my understanding is correct?
> 

Yes, you're right: after we separate the common code and the MMU code, these
helpers will be called from MMU-specific code only. We will drop this
patch in the next version.

Cheers,
Wei Chen

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-17 23:24   ` Julien Grall
@ 2023-01-18  3:00     ` Wei Chen
  2023-01-18  9:44       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Wei Chen @ 2023-01-18  3:00 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Jiamei Xie

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 7:24
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Jiamei Xie
> <Jiamei.Xie@arm.com>
> Subject: Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start
> address for Armv8-R
> 
> Hi Penny,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > On Armv8-A, Xen has a fixed virtual start address (link address
> > too) for all Armv8-A platforms. In an MMU based system, Xen can
> > map its loaded address to this virtual start address. So, on
> > Armv8-A platforms, the Xen start address does not need to be
> > configurable. But on Armv8-R platforms, there is no MMU to map
> > loaded address to a fixed virtual address and different platforms
> > will have very different address space layout. So Xen cannot use
> > a fixed physical address on MPU based system and need to have it
> > configurable.
> >
> > In this patch we introduce one Kconfig option for users to define
> > the default Xen start address for Armv8-R. Users can enter the
> > address in config time, or select the tailored platform config
> > file from arch/arm/configs.
> >
> > And as we introduced Armv8-R platforms to Xen, that means the
> > existed Arm64 platforms should not be listed in Armv8-R platform
> > list, so we add !ARM_V8R dependency for these platforms.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Signed-off-by: Jiamei.Xie <jiamei.xie@arm.com>
> 
> Your signed-off-by is missing.
> 
> > ---
> > v1 -> v2:
> > 1. Remove the platform header fvp_baser.h.
> > 2. Remove the default start address for fvp_baser64.
> > 3. Remove the description of default address from commit log.
> > 4. Change HAS_MPU to ARM_V8R for Xen start address dependency.
> >     No matter Arm-v8r board has MPU or not, it always need to
> >     specify the start address.
> 
> I don't quite understand the last sentence. Are you saying that it is
> possible to have an ARMv8-R system with an MPU nor a page-table?
> 

Yes, from the Cortex-R82 page [1], you can see the MPU is optional in EL1
and EL2:
"Two optional and programmable MPUs controlled from EL1 and EL2 respectively."

Although it is unlikely that vendors using the Armv8-R IP will omit the MPU,
it is indeed an option. There are also related bits in the ID register,
ID_AA64MMFR0_EL1 (MSA_frac), to indicate this.

> > ---
> >   xen/arch/arm/Kconfig           |  8 ++++++++
> >   xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
> >   2 files changed, 21 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> > index ace7178c9a..c6b6b612d1 100644
> > --- a/xen/arch/arm/Kconfig
> > +++ b/xen/arch/arm/Kconfig
> > @@ -145,6 +145,14 @@ config TEE
> >   	  This option enables generic TEE mediators support. It allows
> guests
> >   	  to access real TEE via one of TEE mediators implemented in XEN.
> >
> > +config XEN_START_ADDRESS
> > +	hex "Xen start address: keep default to use platform defined
> address"
> > +	default 0
> > +	depends on ARM_V8R
> 
> It is still pretty unclear to me what would be the difference between
> HAS_MPU and ARM_V8R.
> 

If we don't want to support Armv8-R without an MPU, I think they are the
same. IMO, an Armv8-R system without an MPU is meaningless to Xen.

> > +	help
> > +	  This option allows to set the customized address at which Xen will
> be
> > +	  linked on MPU systems. This address must be aligned to a page size.
> > +
> >   source "arch/arm/tee/Kconfig"
> >
> >   config STATIC_SHM
> > diff --git a/xen/arch/arm/platforms/Kconfig
> b/xen/arch/arm/platforms/Kconfig
> > index c93a6b2756..0904793a0b 100644
> > --- a/xen/arch/arm/platforms/Kconfig
> > +++ b/xen/arch/arm/platforms/Kconfig
> > @@ -1,6 +1,7 @@
> >   choice
> >   	prompt "Platform Support"
> >   	default ALL_PLAT
> > +	default FVP_BASER if ARM_V8R
> >   	---help---
> >   	Choose which hardware platform to enable in Xen.
> >
> > @@ -8,13 +9,14 @@ choice
> >
> >   config ALL_PLAT
> >   	bool "All Platforms"
> > +	depends on !ARM_V8R
> >   	---help---
> >   	Enable support for all available hardware platforms. It doesn't
> >   	automatically select any of the related drivers.
> >
> >   config QEMU
> >   	bool "QEMU aarch virt machine support"
> > -	depends on ARM_64
> > +	depends on ARM_64 && !ARM_V8R
> >   	select GICV3
> >   	select HAS_PL011
> >   	---help---
> > @@ -23,7 +25,7 @@ config QEMU
> >
> >   config RCAR3
> >   	bool "Renesas RCar3 support"
> > -	depends on ARM_64
> > +	depends on ARM_64 && !ARM_V8R
> >   	select HAS_SCIF
> >   	select IPMMU_VMSA
> >   	---help---
> > @@ -31,14 +33,22 @@ config RCAR3
> >
> >   config MPSOC
> >   	bool "Xilinx Ultrascale+ MPSoC support"
> > -	depends on ARM_64
> > +	depends on ARM_64 && !ARM_V8R
> >   	select HAS_CADENCE_UART
> >   	select ARM_SMMU
> >   	---help---
> >   	Enable all the required drivers for Xilinx Ultrascale+ MPSoC
> >
> > +config FVP_BASER
> > +	bool "Fixed Virtual Platform BaseR support"
> > +	depends on ARM_V8R
> > +	help
> > +	  Enable platform specific configurations for Fixed Virtual
> > +	  Platform BaseR
> 
> This seems unrelated to this patch.
> 

Can we add some description in the commit log for this change, or should we
move it to a new patch? We had preferred to use separate
patches for this kind of change, but we found the number of patches
would keep growing. This problem has been bothering us when
organizing patches.

[1] https://developer.arm.com/Processors/Cortex-R82

Cheers,
Wei Chen

> > +
> >   config NO_PLAT
> >   	bool "No Platforms"
> > +	depends on !ARM_V8R
> >   	---help---
> >   	Do not enable specific support for any platform.
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S
  2023-01-17 23:37   ` Julien Grall
@ 2023-01-18  3:09     ` Wei Chen
  2023-01-18  9:50       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Wei Chen @ 2023-01-18  3:09 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 7:37
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related
> code from head.S
> 
> Hi Penny,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > We want to reuse head.S for MPU systems, but there are some
> > code implemented for MMU systems only. We will move such
> > code to another MMU specific file. But before that, we will
> > do some preparations in this patch to make them easier
> > for reviewing:
> 
> Well, I agree that...
> 
> > 1. Fix the indentations of code comments.
> 
> ... changing the indentation is better here. But...
> 
> > 2. Export some symbols that will be accessed out of file
> >     scope.
> 
> ... I have no idea which functions are going to be used in a separate
> file. So I think they should belong to the patch moving the code.
> 

Ok, I will move these changes to the patches that move the code.

> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> 
> Your signed-off-by is missing.
> 
> > ---
> > v1 -> v2:
> > 1. New patch.
> > ---
> >   xen/arch/arm/arm64/head.S | 40 +++++++++++++++++++--------------------
> >   1 file changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 93f9b0b9d5..b2214bc5e3 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -136,22 +136,22 @@
> >           add \xb, \xb, x20
> >   .endm
> >
> > -        .section .text.header, "ax", %progbits
> > -        /*.aarch64*/
> > +.section .text.header, "ax", %progbits
> > +/*.aarch64*/
> 
> This change is not mentioned.
> 

I will add the description to the commit message.

> >
> > -        /*
> > -         * Kernel startup entry point.
> > -         * ---------------------------
> > -         *
> > -         * The requirements are:
> > -         *   MMU = off, D-cache = off, I-cache = on or off,
> > -         *   x0 = physical address to the FDT blob.
> > -         *
> > -         * This must be the very first address in the loaded image.
> > -         * It should be linked at XEN_VIRT_START, and loaded at any
> > -         * 4K-aligned address.  All of text+data+bss must fit in 2MB,
> > -         * or the initial pagetable code below will need adjustment.
> > -         */
> > +/*
> > + * Kernel startup entry point.
> > + * ---------------------------
> > + *
> > + * The requirements are:
> > + *   MMU = off, D-cache = off, I-cache = on or off,
> > + *   x0 = physical address to the FDT blob.
> > + *
> > + * This must be the very first address in the loaded image.
> > + * It should be linked at XEN_VIRT_START, and loaded at any
> > + * 4K-aligned address.  All of text+data+bss must fit in 2MB,
> > + * or the initial pagetable code below will need adjustment.
> > + */
> >
> >   GLOBAL(start)
> >           /*
> > @@ -586,7 +586,7 @@ ENDPROC(cpu_init)
> >    *
> >    * Clobbers x0 - x4
> >    */
> > -create_page_tables:
> > +ENTRY(create_page_tables)
> 
> I am not sure about keeping this name. Now we have create_page_tables()
> and arch_setup_page_tables().
> 
> I would conside to name it create_boot_page_tables().
> 

Do you want me to rename it in this patch?

> >           /* Prepare the page-tables for mapping Xen */
> >           ldr   x0, =XEN_VIRT_START
> >           create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
> > @@ -680,7 +680,7 @@ ENDPROC(create_page_tables)
> >    *
> >    * Clobbers x0 - x3
> >    */
> > -enable_mmu:
> > +ENTRY(enable_mmu)
> >           PRINT("- Turning on paging -\r\n")
> >
> >           /*
> > @@ -714,7 +714,7 @@ ENDPROC(enable_mmu)
> >    *
> >    * Clobbers x0 - x1
> >    */
> > -remove_identity_mapping:
> > +ENTRY(remove_identity_mapping)
> 
> Patch #14 should be before this patch. So you don't have to export
> remove_identity_mapping temporarily.
> 
> This will also avoid (transient) naming confusing with my work (see [1]).
> 

Ok, we will do it.

> >           /*
> >            * Find the zeroeth slot used. Remove the entry from zeroeth
> >            * table if the slot is not XEN_ZEROETH_SLOT.
> > @@ -775,7 +775,7 @@ ENDPROC(remove_identity_mapping)
> >    *
> >    * Clobbers x0 - x3
> >    */
> > -setup_fixmap:
> > +ENTRY(setup_fixmap)
> >   #ifdef CONFIG_EARLY_PRINTK
> >           /* Add UART to the fixmap table */
> >           ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
> > @@ -871,7 +871,7 @@ ENDPROC(init_uart)
> >    * x0: Nul-terminated string to print.
> >    * x23: Early UART base address
> >    * Clobbers x0-x1 */
> > -puts:
> > +ENTRY(puts)
> 
> This name is a bit too generic to be globally exported. It is also now
> quite confusing because we have "early_puts" and "puts".
> 
> I would consider to name it asm_puts(). It is still not great but
> hopefully it would give a hint that should be call from assembly code.
> 

Yes, I had the same concern. I will rename it in the next version.

Cheers,
Wei Chen

> >           early_uart_ready x23, 1
> >           ldrb  w1, [x0], #1           /* Load next char */
> >           cbz   w1, 1f                 /* Exit on nul */
> 
> Cheers,
> 
> [1] https://lore.kernel.org/all/20230113101136.479-13-julien@xen.org/
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-18  3:00     ` Wei Chen
@ 2023-01-18  9:44       ` Julien Grall
  2023-01-18 10:22         ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-18  9:44 UTC (permalink / raw)
  To: Wei Chen, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Jiamei Xie



On 18/01/2023 03:00, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 18 January 2023 7:24
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Jiamei Xie
>> <Jiamei.Xie@arm.com>
>> Subject: Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start
>> address for Armv8-R
>>
>> Hi Penny,
>>
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> From: Wei Chen <wei.chen@arm.com>
>>>
>>> On Armv8-A, Xen has a fixed virtual start address (link address
>>> too) for all Armv8-A platforms. In an MMU based system, Xen can
>>> map its loaded address to this virtual start address. So, on
>>> Armv8-A platforms, the Xen start address does not need to be
>>> configurable. But on Armv8-R platforms, there is no MMU to map
>>> loaded address to a fixed virtual address and different platforms
>>> will have very different address space layout. So Xen cannot use
>>> a fixed physical address on MPU based system and need to have it
>>> configurable.
>>>
>>> In this patch we introduce one Kconfig option for users to define
>>> the default Xen start address for Armv8-R. Users can enter the
>>> address in config time, or select the tailored platform config
>>> file from arch/arm/configs.
>>>
>>> And as we introduced Armv8-R platforms to Xen, that means the
>>> existed Arm64 platforms should not be listed in Armv8-R platform
>>> list, so we add !ARM_V8R dependency for these platforms.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> Signed-off-by: Jiamei.Xie <jiamei.xie@arm.com>
>>
>> Your signed-off-by is missing.
>>
>>> ---
>>> v1 -> v2:
>>> 1. Remove the platform header fvp_baser.h.
>>> 2. Remove the default start address for fvp_baser64.
>>> 3. Remove the description of default address from commit log.
>>> 4. Change HAS_MPU to ARM_V8R for Xen start address dependency.
>>>      No matter Arm-v8r board has MPU or not, it always need to
>>>      specify the start address.
>>
>> I don't quite understand the last sentence. Are you saying that it is
>> possible to have an ARMv8-R system with an MPU nor a page-table?
>>
> 
> Yes, from the Cortex-R82 page [1], you can see the MPU is optional in EL1
> and EL2:
> "Two optional and programmable MPUs controlled from EL1 and EL2 respectively."
Would this mean a vendor may provide their custom solution to protect 
the memory?

> 
> Although it is unlikely that vendors using the Armv8-R IP will do so, it
> is indeed an option. In the ID register, there are also related bits in
> ID_AA64MMFR0_EL1 (MSA_frac) to indicate this.
> 
>>> ---
>>>    xen/arch/arm/Kconfig           |  8 ++++++++
>>>    xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
>>>    2 files changed, 21 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>>> index ace7178c9a..c6b6b612d1 100644
>>> --- a/xen/arch/arm/Kconfig
>>> +++ b/xen/arch/arm/Kconfig
>>> @@ -145,6 +145,14 @@ config TEE
>>>    	  This option enables generic TEE mediators support. It allows
>> guests
>>>    	  to access real TEE via one of TEE mediators implemented in XEN.
>>>
>>> +config XEN_START_ADDRESS
>>> +	hex "Xen start address: keep default to use platform defined
>> address"
>>> +	default 0
>>> +	depends on ARM_V8R
>>
>> It is still pretty unclear to me what would be the difference between
>> HAS_MPU and ARM_V8R.
>>
> 
> If we don't want to support non-MPU supported Armv8-R, I think they are the
> same. IMO, non-MPU supported Armv8-R is meaningless to Xen.
OOI, why do you think this is meaningless?

> 
>>> +	help
>>> +	  This option allows to set the customized address at which Xen will
>> be
>>> +	  linked on MPU systems. This address must be aligned to a page size.
>>> +
>>>    source "arch/arm/tee/Kconfig"
>>>
>>>    config STATIC_SHM
>>> diff --git a/xen/arch/arm/platforms/Kconfig
>> b/xen/arch/arm/platforms/Kconfig
>>> index c93a6b2756..0904793a0b 100644
>>> --- a/xen/arch/arm/platforms/Kconfig
>>> +++ b/xen/arch/arm/platforms/Kconfig
>>> @@ -1,6 +1,7 @@
>>>    choice
>>>    	prompt "Platform Support"
>>>    	default ALL_PLAT
>>> +	default FVP_BASER if ARM_V8R
>>>    	---help---
>>>    	Choose which hardware platform to enable in Xen.
>>>
>>> @@ -8,13 +9,14 @@ choice
>>>
>>>    config ALL_PLAT
>>>    	bool "All Platforms"
>>> +	depends on !ARM_V8R
>>>    	---help---
>>>    	Enable support for all available hardware platforms. It doesn't
>>>    	automatically select any of the related drivers.
>>>
>>>    config QEMU
>>>    	bool "QEMU aarch virt machine support"
>>> -	depends on ARM_64
>>> +	depends on ARM_64 && !ARM_V8R
>>>    	select GICV3
>>>    	select HAS_PL011
>>>    	---help---
>>> @@ -23,7 +25,7 @@ config QEMU
>>>
>>>    config RCAR3
>>>    	bool "Renesas RCar3 support"
>>> -	depends on ARM_64
>>> +	depends on ARM_64 && !ARM_V8R
>>>    	select HAS_SCIF
>>>    	select IPMMU_VMSA
>>>    	---help---
>>> @@ -31,14 +33,22 @@ config RCAR3
>>>
>>>    config MPSOC
>>>    	bool "Xilinx Ultrascale+ MPSoC support"
>>> -	depends on ARM_64
>>> +	depends on ARM_64 && !ARM_V8R
>>>    	select HAS_CADENCE_UART
>>>    	select ARM_SMMU
>>>    	---help---
>>>    	Enable all the required drivers for Xilinx Ultrascale+ MPSoC
>>>
>>> +config FVP_BASER
>>> +	bool "Fixed Virtual Platform BaseR support"
>>> +	depends on ARM_V8R
>>> +	help
>>> +	  Enable platform specific configurations for Fixed Virtual
>>> +	  Platform BaseR
>>
>> This seems unrelated to this patch.
>>
> 
> Can we add some descriptions in commit log for this change, or we
> Should move it to a new patch? 

New patch please or introduce it in the patch where you need it.

> We had preferred to use separate
> patches for this kind of changes, but we found the number of patches
> would become more and more. This problem has been bothering us for
> organizing patches.

I understand the concern about increasing the number of patches. However,
this also needs to be weighed against the ease of review.

In this case, it is very difficult for me to understand why we need to 
introduce FVP_BASER.

In fact, on the previous version, we discussed to not introduce any new 
platform specific config. So I am a bit surprised this is actually needed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S
  2023-01-18  3:09     ` Wei Chen
@ 2023-01-18  9:50       ` Julien Grall
  2023-01-18 10:24         ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-18  9:50 UTC (permalink / raw)
  To: Wei Chen, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk



On 18/01/2023 03:09, Wei Chen wrote:
> Hi Julien,
> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 18 January 2023 7:37
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related
>> code from head.S
>>
>> Hi Penny,
>>
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> From: Wei Chen <wei.chen@arm.com>
>>>
>>> We want to reuse head.S for MPU systems, but there are some
>>> code implemented for MMU systems only. We will move such
>>> code to another MMU specific file. But before that, we will
>>> do some preparations in this patch to make them easier
>>> for reviewing:
>>
>> Well, I agree that...
>>
>>> 1. Fix the indentations of code comments.
>>
>> ... changing the indentation is better here. But...
>>
>>> 2. Export some symbols that will be accessed out of file
>>>      scope.
>>
>> ... I have no idea which functions are going to be used in a separate
>> file. So I think they should belong to the patch moving the code.
>>
> 
> Ok, I will move these changes to the moving code patches.
> 
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>
>> Your signed-off-by is missing.
>>
>>> ---
>>> v1 -> v2:
>>> 1. New patch.
>>> ---
>>>    xen/arch/arm/arm64/head.S | 40 +++++++++++++++++++--------------------
>>>    1 file changed, 20 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
>>> index 93f9b0b9d5..b2214bc5e3 100644
>>> --- a/xen/arch/arm/arm64/head.S
>>> +++ b/xen/arch/arm/arm64/head.S
>>> @@ -136,22 +136,22 @@
>>>            add \xb, \xb, x20
>>>    .endm
>>>
>>> -        .section .text.header, "ax", %progbits
>>> -        /*.aarch64*/
>>> +.section .text.header, "ax", %progbits
>>> +/*.aarch64*/
>>
>> This change is not mentioned.
>>
> 
> I will add the description in commit message.
> 
>>>
>>> -        /*
>>> -         * Kernel startup entry point.
>>> -         * ---------------------------
>>> -         *
>>> -         * The requirements are:
>>> -         *   MMU = off, D-cache = off, I-cache = on or off,
>>> -         *   x0 = physical address to the FDT blob.
>>> -         *
>>> -         * This must be the very first address in the loaded image.
>>> -         * It should be linked at XEN_VIRT_START, and loaded at any
>>> -         * 4K-aligned address.  All of text+data+bss must fit in 2MB,
>>> -         * or the initial pagetable code below will need adjustment.
>>> -         */
>>> +/*
>>> + * Kernel startup entry point.
>>> + * ---------------------------
>>> + *
>>> + * The requirements are:
>>> + *   MMU = off, D-cache = off, I-cache = on or off,
>>> + *   x0 = physical address to the FDT blob.
>>> + *
>>> + * This must be the very first address in the loaded image.
>>> + * It should be linked at XEN_VIRT_START, and loaded at any
>>> + * 4K-aligned address.  All of text+data+bss must fit in 2MB,
>>> + * or the initial pagetable code below will need adjustment.
>>> + */
>>>
>>>    GLOBAL(start)
>>>            /*
>>> @@ -586,7 +586,7 @@ ENDPROC(cpu_init)
>>>     *
>>>     * Clobbers x0 - x4
>>>     */
>>> -create_page_tables:
>>> +ENTRY(create_page_tables)
>>
>> I am not sure about keeping this name. Now we have create_page_tables()
>> and arch_setup_page_tables().
>>
>> I would conside to name it create_boot_page_tables().
>>
> 
> Do you need me to rename it in this patch?

So looking at the rest of the series, I see you are already renaming the 
helper in patch #11. I think it would be better if the naming is done 
earlier.

That said, I am not convinced that create_page_tables() should actually 
be called externally.

In fact, you have something like:

    bl create_page_tables
    bl enable_mmu

Both will need a MMU/MPU specific implementation. So it would be better 
if we provide a wrapper to limit the number of external functions.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-18  9:44       ` Julien Grall
@ 2023-01-18 10:22         ` Wei Chen
  2023-01-18 10:59           ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Wei Chen @ 2023-01-18 10:22 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Jiamei Xie

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 18 January 2023 17:44
> To: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>;
> Jiamei Xie <Jiamei.Xie@arm.com>
> Subject: Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start
> address for Armv8-R
> 
> 
> 
> On 18/01/2023 03:00, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 18 January 2023 7:24
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> >> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Jiamei Xie
> >> <Jiamei.Xie@arm.com>
> >> Subject: Re: [PATCH v2 04/40] xen/arm: add an option to define Xen
> start
> >> address for Armv8-R
> >>
> >> Hi Penny,
> >>
> >> On 13/01/2023 05:28, Penny Zheng wrote:
> >>> From: Wei Chen <wei.chen@arm.com>
> >>>
> >>> On Armv8-A, Xen has a fixed virtual start address (link address
> >>> too) for all Armv8-A platforms. In an MMU based system, Xen can
> >>> map its loaded address to this virtual start address. So, on
> >>> Armv8-A platforms, the Xen start address does not need to be
> >>> configurable. But on Armv8-R platforms, there is no MMU to map
> >>> loaded address to a fixed virtual address and different platforms
> >>> will have very different address space layout. So Xen cannot use
> >>> a fixed physical address on MPU based system and need to have it
> >>> configurable.
> >>>
> >>> In this patch we introduce one Kconfig option for users to define
> >>> the default Xen start address for Armv8-R. Users can enter the
> >>> address in config time, or select the tailored platform config
> >>> file from arch/arm/configs.
> >>>
> >>> And as we introduced Armv8-R platforms to Xen, that means the
> >>> existed Arm64 platforms should not be listed in Armv8-R platform
> >>> list, so we add !ARM_V8R dependency for these platforms.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> Signed-off-by: Jiamei.Xie <jiamei.xie@arm.com>
> >>
> >> Your signed-off-by is missing.
> >>
> >>> ---
> >>> v1 -> v2:
> >>> 1. Remove the platform header fvp_baser.h.
> >>> 2. Remove the default start address for fvp_baser64.
> >>> 3. Remove the description of default address from commit log.
> >>> 4. Change HAS_MPU to ARM_V8R for Xen start address dependency.
> >>>      No matter Arm-v8r board has MPU or not, it always need to
> >>>      specify the start address.
> >>
> >> I don't quite understand the last sentence. Are you saying that it is
> >> possible to have an ARMv8-R system with an MPU nor a page-table?
> >>
> >
> > Yes, from the Cortex-R82 page [1], you can see the MPU is optional in
> EL1
> > and EL2:
> > "Two optional and programmable MPUs controlled from EL1 and EL2
> respectively."
> Would this mean a vendor may provide their custom solution to protect
> the memory?
> 

Ah, you gave me a new idea. Yes, in "ARM DDI 0600A.c G1.3.7", MSA_frac
of ID_AA64MMFR0_EL1 says:
0b0000 - PMSAv8-64 not supported in any translation regime.
0b0000 is not a permitted value.

So maybe you're right: on Armv8-R64, we always have an MPU in EL1 & EL2;
the optional part is the MPU customization.
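
Just to illustrate the point, a boot-time sanity check along these lines
would be possible. This is only a rough sketch (the helper name is made up,
and the field offsets - MSA at bits [51:48], MSA_frac at bits [55:52] - are
my reading of DDI 0600, so they should be double-checked):

    /* Sketch only: panic early if PMSAv8-64 is not advertised. */
    static void __init check_pmsa_support(void)
    {
        register_t mmfr0 = READ_SYSREG(ID_AA64MMFR0_EL1);
        unsigned int msa = (mmfr0 >> 48) & 0xf;       /* assumed offset */
        unsigned int msa_frac = (mmfr0 >> 52) & 0xf;  /* assumed offset */

        /* MSA == 0b1111 advertises PMSAv8-64; MSA_frac == 0b0000 is not permitted. */
        if ( msa != 0xf || msa_frac == 0 )
            panic("PMSAv8-64 not supported by this CPU\n");
    }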

> >
> > Although it is unlikely that vendors using the Armv8-R IP will do so, it
> > is indeed an option. In the ID register, there are also related bits in
> > ID_AA64MMFR0_EL1 (MSA_frac) to indicate this.
> >
> >>> ---
> >>>    xen/arch/arm/Kconfig           |  8 ++++++++
> >>>    xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
> >>>    2 files changed, 21 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> >>> index ace7178c9a..c6b6b612d1 100644
> >>> --- a/xen/arch/arm/Kconfig
> >>> +++ b/xen/arch/arm/Kconfig
> >>> @@ -145,6 +145,14 @@ config TEE
> >>>    	  This option enables generic TEE mediators support. It allows
> >> guests
> >>>    	  to access real TEE via one of TEE mediators implemented in
> XEN.
> >>>
> >>> +config XEN_START_ADDRESS
> >>> +	hex "Xen start address: keep default to use platform defined
> >> address"
> >>> +	default 0
> >>> +	depends on ARM_V8R
> >>
> >> It is still pretty unclear to me what would be the difference between
> >> HAS_MPU and ARM_V8R.
> >>
> >
> > If we don't want to support non-MPU supported Armv8-R, I think they are
> the
> > same. IMO, non-MPU supported Armv8-R is meaningless to Xen.
> OOI, why do you think this is meaningless?

If there is an Armv8-R board without an EL2 MPU, how can we protect Xen? Of
course, if users don't care about security, Xen can still support it.

> 
> >
> >>> +	help
> >>> +	  This option allows to set the customized address at which Xen will
> >> be
> >>> +	  linked on MPU systems. This address must be aligned to a page size.
> >>> +
> >>>    source "arch/arm/tee/Kconfig"
> >>>
> >>>    config STATIC_SHM
> >>> diff --git a/xen/arch/arm/platforms/Kconfig
> >> b/xen/arch/arm/platforms/Kconfig
> >>> index c93a6b2756..0904793a0b 100644
> >>> --- a/xen/arch/arm/platforms/Kconfig
> >>> +++ b/xen/arch/arm/platforms/Kconfig
> >>> @@ -1,6 +1,7 @@
> >>>    choice
> >>>    	prompt "Platform Support"
> >>>    	default ALL_PLAT
> >>> +	default FVP_BASER if ARM_V8R
> >>>    	---help---
> >>>    	Choose which hardware platform to enable in Xen.
> >>>
> >>> @@ -8,13 +9,14 @@ choice
> >>>
> >>>    config ALL_PLAT
> >>>    	bool "All Platforms"
> >>> +	depends on !ARM_V8R
> >>>    	---help---
> >>>    	Enable support for all available hardware platforms. It
> doesn't
> >>>    	automatically select any of the related drivers.
> >>>
> >>>    config QEMU
> >>>    	bool "QEMU aarch virt machine support"
> >>> -	depends on ARM_64
> >>> +	depends on ARM_64 && !ARM_V8R
> >>>    	select GICV3
> >>>    	select HAS_PL011
> >>>    	---help---
> >>> @@ -23,7 +25,7 @@ config QEMU
> >>>
> >>>    config RCAR3
> >>>    	bool "Renesas RCar3 support"
> >>> -	depends on ARM_64
> >>> +	depends on ARM_64 && !ARM_V8R
> >>>    	select HAS_SCIF
> >>>    	select IPMMU_VMSA
> >>>    	---help---
> >>> @@ -31,14 +33,22 @@ config RCAR3
> >>>
> >>>    config MPSOC
> >>>    	bool "Xilinx Ultrascale+ MPSoC support"
> >>> -	depends on ARM_64
> >>> +	depends on ARM_64 && !ARM_V8R
> >>>    	select HAS_CADENCE_UART
> >>>    	select ARM_SMMU
> >>>    	---help---
> >>>    	Enable all the required drivers for Xilinx Ultrascale+ MPSoC
> >>>
> >>> +config FVP_BASER
> >>> +	bool "Fixed Virtual Platform BaseR support"
> >>> +	depends on ARM_V8R
> >>> +	help
> >>> +	  Enable platform specific configurations for Fixed Virtual
> >>> +	  Platform BaseR
> >>
> >> This seems unrelated to this patch.
> >>
> >
> > Can we add some descriptions in commit log for this change, or we
> > Should move it to a new patch?
> 
> New patch please or introduce it in the patch where you need it.
> 
> We had preferred to use separate
> > patches for this kind of changes, but we found the number of patches
> > would become more and more. This problem has been bothering us for
> > organizing patches.
> 
> I understand the concern of increasing the number of patches. However,
> this also needs to weight against the review.
> 

Understood.

> In this case, it is very difficult for me to understand why we need to
> introduce FVP_BASER.
> 
> In fact, on the previous version, we discussed to not introduce any new
> platform specific config. So I am a bit surprised this is actually needed.
> 

No, this is not true; it's my mistake, I forgot to remove FVP_BASER from
this Kconfig. Actually, we do not need this one. We also don't need a
new patch for it.

Cheers,
Wei Chen

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S
  2023-01-18  9:50       ` Julien Grall
@ 2023-01-18 10:24         ` Wei Chen
  0 siblings, 0 replies; 122+ messages in thread
From: Wei Chen @ 2023-01-18 10:24 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2023年1月18日 17:50
> To: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related
> code from head.S
> 
> 
> 
> On 18/01/2023 03:09, Wei Chen wrote:
> > Hi Julien,
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 18 January 2023 7:37
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> >> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 05/40] xen/arm64: prepare for moving MMU related
> >> code from head.S
> >>
> >> Hi Penny,
> >>
> >> On 13/01/2023 05:28, Penny Zheng wrote:
> >>> From: Wei Chen <wei.chen@arm.com>
> >>>
> >>> We want to reuse head.S for MPU systems, but there are some
> >>> code implemented for MMU systems only. We will move such
> >>> code to another MMU specific file. But before that, we will
> >>> do some preparations in this patch to make them easier
> >>> for reviewing:
> >>
> >> Well, I agree that...
> >>
> >>> 1. Fix the indentations of code comments.
> >>
> >> ... changing the indentation is better here. But...
> >>
> >>> 2. Export some symbols that will be accessed out of file
> >>>      scope.
> >>
> >> ... I have no idea which functions are going to be used in a separate
> >> file. So I think they should belong to the patch moving the code.
> >>
> >
> > Ok, I will move these changes to the moving code patches.
> >
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>
> >> Your signed-off-by is missing.
> >>
> >>> ---
> >>> v1 -> v2:
> >>> 1. New patch.
> >>> ---
> >>>    xen/arch/arm/arm64/head.S | 40 +++++++++++++++++++-----------------
> ---
> >>>    1 file changed, 20 insertions(+), 20 deletions(-)
> >>>
> >>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> >>> index 93f9b0b9d5..b2214bc5e3 100644
> >>> --- a/xen/arch/arm/arm64/head.S
> >>> +++ b/xen/arch/arm/arm64/head.S
> >>> @@ -136,22 +136,22 @@
> >>>            add \xb, \xb, x20
> >>>    .endm
> >>>
> >>> -        .section .text.header, "ax", %progbits
> >>> -        /*.aarch64*/
> >>> +.section .text.header, "ax", %progbits
> >>> +/*.aarch64*/
> >>
> >> This change is not mentioned.
> >>
> >
> > I will add the description in commit message.
> >
> >>>
> >>> -        /*
> >>> -         * Kernel startup entry point.
> >>> -         * ---------------------------
> >>> -         *
> >>> -         * The requirements are:
> >>> -         *   MMU = off, D-cache = off, I-cache = on or off,
> >>> -         *   x0 = physical address to the FDT blob.
> >>> -         *
> >>> -         * This must be the very first address in the loaded image.
> >>> -         * It should be linked at XEN_VIRT_START, and loaded at any
> >>> -         * 4K-aligned address.  All of text+data+bss must fit in 2MB,
> >>> -         * or the initial pagetable code below will need adjustment.
> >>> -         */
> >>> +/*
> >>> + * Kernel startup entry point.
> >>> + * ---------------------------
> >>> + *
> >>> + * The requirements are:
> >>> + *   MMU = off, D-cache = off, I-cache = on or off,
> >>> + *   x0 = physical address to the FDT blob.
> >>> + *
> >>> + * This must be the very first address in the loaded image.
> >>> + * It should be linked at XEN_VIRT_START, and loaded at any
> >>> + * 4K-aligned address.  All of text+data+bss must fit in 2MB,
> >>> + * or the initial pagetable code below will need adjustment.
> >>> + */
> >>>
> >>>    GLOBAL(start)
> >>>            /*
> >>> @@ -586,7 +586,7 @@ ENDPROC(cpu_init)
> >>>     *
> >>>     * Clobbers x0 - x4
> >>>     */
> >>> -create_page_tables:
> >>> +ENTRY(create_page_tables)
> >>
> >> I am not sure about keeping this name. Now we have create_page_tables()
> >> and arch_setup_page_tables().
> >>
> >> I would conside to name it create_boot_page_tables().
> >>
> >
> > Do you need me to rename it in this patch?
> 
> So looking at the rest of the series, I see you are already renaming the
> helper in patch #11. I think it would be better if the naming is done
> earlier.
> 
> That said, I am not convinced that create_page_tables() should actually
> be called externally.
> 
> In fact, you have something like:
> 
>     bl create_page_tables
>     bl enable_mmu
> 
> Both will need a MMU/MPU specific implementation. So it would be better
> if we provide a wrapper to limit the number of external functions.
>

I agree with you, we will try to wrap some functions instead of
exporting them.
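
For example, something along these lines in head_mmu.S (purely illustrative;
the name enable_boot_mm and the use of x5 are placeholders - x5 is free since
create_page_tables clobbers x0 - x4 and enable_mmu clobbers x0 - x3):

    ENTRY(enable_boot_mm)
            mov   x5, lr                /* preserve the return address */
            bl    create_page_tables
            bl    enable_mmu
            mov   lr, x5
            ret                         /* the caller still branches to the
                                           runtime mapping afterwards, as today */
    ENDPROC(enable_boot_mm)

That way head.S would only need a single MMU/MPU-specific entry point.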

Cheers,
Wei Chen
 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections
  2023-01-18  2:18     ` Wei Chen
@ 2023-01-18 10:55       ` Julien Grall
  2023-01-18 11:40         ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-18 10:55 UTC (permalink / raw)
  To: Wei Chen, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

On 18/01/2023 02:18, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2023年1月18日 7:46
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity
>> map sections
>>
>> Hi,
>>
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> From: Wei Chen <wei.chen@arm.com>
>>>
>>> Only the first 4KB of Xen image will be mapped as identity
>>> (PA == VA). At the moment, Xen guarantees this by having
>>> everything that needs to be used in the identity mapping
>>> in head.S before _end_boot and checking at link time if this
>>> fits in 4KB.
>>>
>>> In previous patch, we have moved the MMU code outside of
>>> head.S. Although we have added .text.header to the new file
>>> to guarantee all identity map code still in the first 4KB.
>>> However, the order of these two files on this 4KB depends
>>> on the build tools. Currently, we use the build tools to
>>> process the order of objs in the Makefile to ensure that
>>> head.S must be at the top. But if you change to another build
>>> tools, it may not be the same result.
>>
>> Right, so this is fixing a bug you introduced in the previous patch. We
>> should really avoid introducing (latent) regression in a series. So
>> please re-order the patches.
>>
> 
> Ok.
> 
>>>
>>> In this patch we introduce .text.idmap to head_mmu.S, and
>>> add this section after .text.header. to ensure code of
>>> head_mmu.S after the code of header.S.
>>>
>>> After this, we will still include some code that does not
>>> belong to identity map before _end_boot. Because we have
>>> moved _end_boot to head_mmu.S.
>>
>> I dislike this approach because you are expecting that only head_mmu.S
>> will be part of .text.idmap. If it is not, everything could blow up again.
>>
> 
> I agree.
> 
>> That said, if you look at staging, you will notice that now _end_boot is
>> defined in the linker script to avoid any issue.
>>
> 
> Sorry, I am not quite clear about this comment. The _end_boot of original
> staging branch is defined in head.S. And I am not quite sure how this
> _end_boot solve multiple files contain idmap code.

If you look at the latest staging, there is a commit (229ebd517b9d) that
now defines _end_boot in the linker script.

The .text.idmap section can be added before the definition of _end_boot.
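
Roughly like this (a sketch only; the exact layout around _end_boot in
xen.lds.S after 229ebd517b9d may differ):

    .text : {
        _stext = .;            /* Text section */
        *(.text.header)
        *(.text.idmap)
        _end_boot = .;         /* everything the identity map needs sits above */

        *(.text.cold)
        ...
    }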

> 
> Cheers,
> Wei Chen
> 
>>> That means all code in head.S
>>> will be included before _end_boot. In this patch, we also
>>> added .text flag in the place of original _end_boot in head.S.
>>> All the code after .text in head.S will not be included in
>>> identity map section.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>> v1 -> v2:
>>> 1. New patch.
>>> ---
>>>    xen/arch/arm/arm64/head.S     | 6 ++++++
>>>    xen/arch/arm/arm64/head_mmu.S | 2 +-
>>>    xen/arch/arm/xen.lds.S        | 1 +
>>>    3 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
>>> index 5cfa47279b..782bd1f94c 100644
>>> --- a/xen/arch/arm/arm64/head.S
>>> +++ b/xen/arch/arm/arm64/head.S
>>> @@ -466,6 +466,12 @@ fail:   PRINT("- Boot failed -\r\n")
>>>            b     1b
>>>    ENDPROC(fail)
>>>
>>> +/*
>>> + * For the code that do not need in indentity map section,
>>> + * we put them back to normal .text section
>>> + */
>>> +.section .text, "ax", %progbits
>>> +
>>
>> I would argue that puts wants to be part of the idmap.
>>
> 
> I am ok to move puts to idmap. But from the original head.S, puts is
> placed after _end_boot, and from the xen.ld.S, we can see idmap is
> area is the section of "_end_boot - start". 

The original position of _end_boot is wrong. It didn't take into account
the literal pools (they are at the end of the unit), so they would end up
past _end_boot.

> The reason of moving puts
> to idmap is because we're using it in idmap?

I guess it depends on what idmap really means here. If you only interpret it
as the MMU being on and VA == PA, then not yet (I was thinking of introducing
a few calls).

If you also include the MMU being off, then yes.

Also, in the context of cache coloring, we will need a trampoline. So it
would be better to keep everything close together, as that makes it easier
to copy.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-18 10:22         ` Wei Chen
@ 2023-01-18 10:59           ` Julien Grall
  2023-01-18 11:27             ` Wei Chen
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-18 10:59 UTC (permalink / raw)
  To: Wei Chen, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Jiamei Xie

Hi,

On 18/01/2023 10:22, Wei Chen wrote:
>>> Although it is unlikely that vendors using the Armv8-R IP will do so, it
>>> is indeed an option. In the ID register, there are also related bits in
>>> ID_AA64MMFR0_EL1 (MSA_frac) to indicate this.
>>>
>>>>> ---
>>>>>     xen/arch/arm/Kconfig           |  8 ++++++++
>>>>>     xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
>>>>>     2 files changed, 21 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>>>>> index ace7178c9a..c6b6b612d1 100644
>>>>> --- a/xen/arch/arm/Kconfig
>>>>> +++ b/xen/arch/arm/Kconfig
>>>>> @@ -145,6 +145,14 @@ config TEE
>>>>>     	  This option enables generic TEE mediators support. It allows
>>>> guests
>>>>>     	  to access real TEE via one of TEE mediators implemented in
>> XEN.
>>>>>
>>>>> +config XEN_START_ADDRESS
>>>>> +	hex "Xen start address: keep default to use platform defined
>>>> address"
>>>>> +	default 0
>>>>> +	depends on ARM_V8R
>>>>
>>>> It is still pretty unclear to me what would be the difference between
>>>> HAS_MPU and ARM_V8R.
>>>>
>>>
>>> If we don't want to support non-MPU supported Armv8-R, I think they are
>> the
>>> same. IMO, non-MPU supported Armv8-R is meaningless to Xen.
>> OOI, why do you think this is meaningless?
> 
> If there is Armv8-R board without EL2 MPU, how can we protect Xen?

So what you call EL2 MPU is an MPU that is following the Arm 
specification. In theory, you could have a proprietary mechanism for that.

So the question is whether a system not following the Arm specification 
is allowed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R
  2023-01-18 10:59           ` Julien Grall
@ 2023-01-18 11:27             ` Wei Chen
  0 siblings, 0 replies; 122+ messages in thread
From: Wei Chen @ 2023-01-18 11:27 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Jiamei Xie

Hi Julien,

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Julien Grall
> Sent: 2023年1月18日 19:00
> To: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>;
> Jiamei Xie <Jiamei.Xie@arm.com>
> Subject: Re: [PATCH v2 04/40] xen/arm: add an option to define Xen start
> address for Armv8-R
> 
> Hi,
> 
> On 18/01/2023 10:22, Wei Chen wrote:
> >>> Although it is unlikely that vendors using the Armv8-R IP will do so,
> it
> >>> is indeed an option. In the ID register, there are also related bits
> in
> >>> ID_AA64MMFR0_EL1 (MSA_frac) to indicate this.
> >>>
> >>>>> ---
> >>>>>     xen/arch/arm/Kconfig           |  8 ++++++++
> >>>>>     xen/arch/arm/platforms/Kconfig | 16 +++++++++++++---
> >>>>>     2 files changed, 21 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> >>>>> index ace7178c9a..c6b6b612d1 100644
> >>>>> --- a/xen/arch/arm/Kconfig
> >>>>> +++ b/xen/arch/arm/Kconfig
> >>>>> @@ -145,6 +145,14 @@ config TEE
> >>>>>     	  This option enables generic TEE mediators support. It allows
> >>>> guests
> >>>>>     	  to access real TEE via one of TEE mediators implemented in
> >> XEN.
> >>>>>
> >>>>> +config XEN_START_ADDRESS
> >>>>> +	hex "Xen start address: keep default to use platform defined
> >>>> address"
> >>>>> +	default 0
> >>>>> +	depends on ARM_V8R
> >>>>
> >>>> It is still pretty unclear to me what would be the difference between
> >>>> HAS_MPU and ARM_V8R.
> >>>>
> >>>
> >>> If we don't want to support non-MPU supported Armv8-R, I think they
> are
> >> the
> >>> same. IMO, non-MPU supported Armv8-R is meaningless to Xen.
> >> OOI, why do you think this is meaningless?
> >
> > If there is Armv8-R board without EL2 MPU, how can we protect Xen?
> 
> So what you call EL2 MPU is an MPU that is following the Arm
> specification. In theory, you could have a proprietary mechanism for that.
> 
> So the question is whether a system not following the Arm specification
> is allowed.
> 

I think not: the PMSA is an architectural feature, and the spec defines the CPU
and MPU interfaces. Vendors can have their own hardware implementation, but they
need to follow the Arm spec.

But I agree that here we could change to "depends on HAS_MPU", which will make
it easier to reuse for other Arm architectures, or other architectures, in the
future.

Cheers,
Wei Chen

> Cheers,
> 
> --
> Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections
  2023-01-18 10:55       ` Julien Grall
@ 2023-01-18 11:40         ` Wei Chen
  0 siblings, 0 replies; 122+ messages in thread
From: Wei Chen @ 2023-01-18 11:40 UTC (permalink / raw)
  To: Julien Grall, Penny Zheng, xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> >>>
> >>> In this patch we introduce .text.idmap to head_mmu.S, and
> >>> add this section after .text.header. to ensure code of
> >>> head_mmu.S after the code of header.S.
> >>>
> >>> After this, we will still include some code that does not
> >>> belong to identity map before _end_boot. Because we have
> >>> moved _end_boot to head_mmu.S.
> >>
> >> I dislike this approach because you are expecting that only head_mmu.S
> >> will be part of .text.idmap. If it is not, everything could blow up
> again.
> >>
> >
> > I agree.
> >
> >> That said, if you look at staging, you will notice that now _end_boot is
> >> defined in the linker script to avoid any issue.
> >>
> >
> > Sorry, I am not quite clear about this comment. The _end_boot of original
> > staging branch is defined in head.S. And I am not quite sure how this
> > _end_boot solve multiple files contain idmap code.
> 
> If you look at the latest staging, there is a commit (229ebd517b9d) that
> now define _end_boot in the linker script.
> 
> The .text.idmap section can be added before the definition of _end_boot.
> 

Oh, my branch was a little old. I have seen this new definition in xen.lds.S
after I updated the branch. I understand now.

> >
> > Cheers,
> > Wei Chen
> >
> >>> That means all code in head.S
> >>> will be included before _end_boot. In this patch, we also
> >>> added .text flag in the place of original _end_boot in head.S.
> >>> All the code after .text in head.S will not be included in
> >>> identity map section.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>> v1 -> v2:
> >>> 1. New patch.
> >>> ---
> >>>    xen/arch/arm/arm64/head.S     | 6 ++++++
> >>>    xen/arch/arm/arm64/head_mmu.S | 2 +-
> >>>    xen/arch/arm/xen.lds.S        | 1 +
> >>>    3 files changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> >>> index 5cfa47279b..782bd1f94c 100644
> >>> --- a/xen/arch/arm/arm64/head.S
> >>> +++ b/xen/arch/arm/arm64/head.S
> >>> @@ -466,6 +466,12 @@ fail:   PRINT("- Boot failed -\r\n")
> >>>            b     1b
> >>>    ENDPROC(fail)
> >>>
> >>> +/*
> >>> + * For the code that do not need in indentity map section,
> >>> + * we put them back to normal .text section
> >>> + */
> >>> +.section .text, "ax", %progbits
> >>> +
> >>
> >> I would argue that puts wants to be part of the idmap.
> >>
> >
> > I am ok to move puts to idmap. But from the original head.S, puts is
> > placed after _end_boot, and from the xen.ld.S, we can see idmap is
> > area is the section of "_end_boot - start".
> 
> The original position of _end_boot is wrong. It didn't take into account
> the literal pool (there are at the end of the unit). So they would be
> past _end_boot.
> 

Ok.

> > The reason of moving puts
> > to idmap is because we're using it in idmap?
> 
> I guess it depends of what idmap really mean here. If you only interpret
> as the MMU is on and VA == PA. Then not yet (I was thinking to introduce
> a few calls).
> 
> If you also include the MMU off. Then yes.
> 
> Also, in the context of cache coloring, we will need to have a
> trampoline for cache coloring. So it would be better to keep everything
> close together as it is easier to copy.
> 

Understand, thanks!

Cheers,
Wei Chen

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-13  5:28 ` [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map Penny Zheng
@ 2023-01-19 10:18   ` Ayan Kumar Halder
  2023-01-29  6:47     ` Penny Zheng
  2023-01-19 15:04   ` Julien Grall
  1 sibling, 1 reply; 122+ messages in thread
From: Ayan Kumar Halder @ 2023-01-19 10:18 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Chen, Penny Zheng, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr_Babchuk


On 13/01/2023 05:28, Penny Zheng wrote:
>
> From: Penny Zheng <penny.zheng@arm.com>
>
> The start-of-day Xen MPU memory region layout shall be like as follows:
>
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data
> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> ......
> xen_mpumap[max_xen_mpumap - 2]: Xen init data
> xen_mpumap[max_xen_mpumap - 1]: Xen init text
>
> max_xen_mpumap refers to the number of regions supported by the EL2 MPU.
> The layout shall be compliant with what we describe in xen.lds.S, or the
> codes need adjustment.
>
> As MMU system and MPU system have different functions to create
> the boot MMU/MPU memory management data, instead of introducing
> extra #ifdef in main code flow, we introduce a neutral name
> prepare_early_mappings for both, and also to replace create_page_tables for MMU.
>
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/arm64/Makefile              |   2 +
>   xen/arch/arm/arm64/head.S                |  17 +-
>   xen/arch/arm/arm64/head_mmu.S            |   4 +-
>   xen/arch/arm/arm64/head_mpu.S            | 323 +++++++++++++++++++++++
>   xen/arch/arm/include/asm/arm64/mpu.h     |  63 +++++
>   xen/arch/arm/include/asm/arm64/sysregs.h |  49 ++++
>   xen/arch/arm/mm_mpu.c                    |  48 ++++
>   xen/arch/arm/xen.lds.S                   |   4 +
>   8 files changed, 502 insertions(+), 8 deletions(-)
>   create mode 100644 xen/arch/arm/arm64/head_mpu.S
>   create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
>   create mode 100644 xen/arch/arm/mm_mpu.c
>
> diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
> index 22da2f54b5..438c9737ad 100644
> --- a/xen/arch/arm/arm64/Makefile
> +++ b/xen/arch/arm/arm64/Makefile
> @@ -10,6 +10,8 @@ obj-y += entry.o
>   obj-y += head.o
>   ifneq ($(CONFIG_HAS_MPU),y)
>   obj-y += head_mmu.o
> +else
> +obj-y += head_mpu.o
>   endif
>   obj-y += insn.o
>   obj-$(CONFIG_LIVEPATCH) += livepatch.o
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 782bd1f94c..145e3d53dc 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -68,9 +68,9 @@
>    *  x24 -
>    *  x25 -
>    *  x26 - skip_zero_bss (boot cpu only)
> - *  x27 -
> - *  x28 -
> - *  x29 -
> + *  x27 - region selector (mpu only)
> + *  x28 - prbar (mpu only)
> + *  x29 - prlar (mpu only)
>    *  x30 - lr
>    */
>
> @@ -82,7 +82,7 @@
>    * ---------------------------
>    *
>    * The requirements are:
> - *   MMU = off, D-cache = off, I-cache = on or off,
> + *   MMU/MPU = off, D-cache = off, I-cache = on or off,
>    *   x0 = physical address to the FDT blob.
>    *
>    * This must be the very first address in the loaded image.
> @@ -252,7 +252,12 @@ real_start_efi:
>
>           bl    check_cpu_mode
>           bl    cpu_init
> -        bl    create_page_tables
> +
> +        /*
> +         * Create boot memory management data, pagetable for MMU systems
> +         * and memory regions for MPU systems.
> +         */
> +        bl    prepare_early_mappings
>           bl    enable_mmu
>
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
> @@ -310,7 +315,7 @@ GLOBAL(init_secondary)
>   #endif
>           bl    check_cpu_mode
>           bl    cpu_init
> -        bl    create_page_tables
> +        bl    prepare_early_mappings
>           bl    enable_mmu
>
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
> diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
> index 6ff13c751c..2346f755df 100644
> --- a/xen/arch/arm/arm64/head_mmu.S
> +++ b/xen/arch/arm/arm64/head_mmu.S
> @@ -123,7 +123,7 @@
>    *
>    * Clobbers x0 - x4
>    */
> -ENTRY(create_page_tables)
> +ENTRY(prepare_early_mappings)
>           /* Prepare the page-tables for mapping Xen */
>           ldr   x0, =XEN_VIRT_START
>           create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
> @@ -208,7 +208,7 @@ virtphys_clash:
>           /* Identity map clashes with boot_third, which we cannot handle yet */
>           PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
>           b     fail
> -ENDPROC(create_page_tables)
> +ENDPROC(prepare_early_mappings)

NIT:- Can this renaming be done in a separate patch of its own (before
this patch)?

So that this patch can be only about the new functionality introduced.

>
>   /*
>    * Turn on the Data Cache and the MMU. The function will return on the 1:1
> diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
> new file mode 100644
> index 0000000000..0b97ce4646
> --- /dev/null
> +++ b/xen/arch/arm/arm64/head_mpu.S
> @@ -0,0 +1,323 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Start-of-day code for an Armv8-R AArch64 MPU system.
> + */
> +
> +#include <asm/arm64/mpu.h>
> +#include <asm/early_printk.h>
> +#include <asm/page.h>
> +
> +/*
> + * One entry in Xen MPU memory region mapping table(xen_mpumap) is a structure
> + * of pr_t, which is 16-bytes size, so the entry offset is the order of 4.
> + */
NIT:- It would be good to quote the Arm ARM section that can be referred to
for these definitions.
> +#define MPU_ENTRY_SHIFT         0x4
> +
> +#define REGION_SEL_MASK         0xf
> +
> +#define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
> +#define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
> +#define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> +
> +#define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
> +
> +/*
> + * Macro to round up the section address to be PAGE_SIZE aligned
> + * Each section(e.g. .text, .data, etc) in xen.lds.S is page-aligned,
> + * which is usually guarded with ". = ALIGN(PAGE_SIZE)" in the head,
> + * or in the end
> + */
> +.macro roundup_section, xb
> +        add   \xb, \xb, #(PAGE_SIZE-1)
> +        and   \xb, \xb, #PAGE_MASK
> +.endm
> +
> +/*
> + * Macro to create a new MPU memory region entry, which is a structure
> + * of pr_t,  in \prmap.
> + *
> + * Inputs:
> + * prmap:   mpu memory region map table symbol
> + * sel:     region selector
> + * prbar:   preserve value for PRBAR_EL2
> + * prlar    preserve value for PRLAR_EL2
> + *
> + * Clobbers \tmp1, \tmp2
> + *
> + */
> +.macro create_mpu_entry prmap, sel, prbar, prlar, tmp1, tmp2
> +    mov   \tmp2, \sel
> +    lsl   \tmp2, \tmp2, #MPU_ENTRY_SHIFT
> +    adr_l \tmp1, \prmap
> +    /* Write the first 8 bytes(prbar_t) of pr_t */
> +    str   \prbar, [\tmp1, \tmp2]
> +
> +    add   \tmp2, \tmp2, #8
> +    /* Write the last 8 bytes(prlar_t) of pr_t */
> +    str   \prlar, [\tmp1, \tmp2]
> +.endm
> +
> +/*
> + * Macro to store the maximum number of regions supported by the EL2 MPU
> + * in max_xen_mpumap, which is identified by MPUIR_EL2.
> + *
> + * Outputs:
> + * nr_regions: preserve the maximum number of regions supported by the EL2 MPU
> + *
> + * Clobbers \tmp1
> + *
> + */
> +.macro read_max_el2_regions, nr_regions, tmp1
> +    load_paddr \tmp1, max_xen_mpumap
> +    mrs   \nr_regions, MPUIR_EL2
> +    isb
> +    str   \nr_regions, [\tmp1]
> +.endm
> +
> +/*
> + * Macro to prepare and set a MPU memory region
> + *
> + * Inputs:
> + * base:        base address symbol (should be page-aligned)
> + * limit:       limit address symbol
> + * sel:         region selector
> + * prbar:       store computed PRBAR_EL2 value
> + * prlar:       store computed PRLAR_EL2 value
> + * attr_prbar:  PRBAR_EL2-related memory attributes. If not specified it will be REGION_DATA_PRBAR
> + * attr_prlar:  PRLAR_EL2-related memory attributes. If not specified it will be REGION_NORMAL_PRLAR
> + *
> + * Clobber \tmp1
> + *
> + */
> +.macro prepare_xen_region, base, limit, sel, prbar, prlar, tmp1, attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR
> +    /* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
> +    load_paddr \prbar, \base
> +    and   \prbar, \prbar, #MPU_REGION_MASK
> +    mov   \tmp1, #\attr_prbar
> +    orr   \prbar, \prbar, \tmp1
> +
> +    /* Prepare value for PRLAR_EL2 reg and preserve it in \prlar.*/
> +    load_paddr \prlar, \limit
> +    /* Round up limit address to be PAGE_SIZE aligned */
> +    roundup_section \prlar
> +    /* Limit address should be inclusive */
> +    sub   \prlar, \prlar, #1
> +    and   \prlar, \prlar, #MPU_REGION_MASK
> +    mov   \tmp1, #\attr_prlar
> +    orr   \prlar, \prlar, \tmp1
> +
> +    mov   x27, \sel
> +    mov   x28, \prbar
> +    mov   x29, \prlar

Any reason for using x27, x28, x29 to pass function parameters?

https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst 
states x0..x7 should be used (Table 2, General-purpose registers and 
AAPCS64 usage).
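
For illustration only, the call could follow AAPCS64 and pass the arguments in
x0-x2 instead, e.g. (untested, and write_pr would need reworking to match):

    mov   x0, \sel      /* region selector */
    mov   x1, \prbar    /* value for PRBAR<n>_EL2 */
    mov   x2, \prlar    /* value for PRLAR<n>_EL2 */
    bl    write_pr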

> +    /*
> +     * x27, x28, x29 are special registers designed as
> +     * inputs for function write_pr
> +     */
> +    bl    write_pr
> +.endm
> +
> +.section .text.idmap, "ax", %progbits
> +
> +/*
> + * ENTRY to configure a EL2 MPU memory region
> + * ARMv8-R AArch64 at most supports 255 MPU protection regions.
> + * See section G1.3.18 of the reference manual for ARMv8-R AArch64,
> + * PRBAR<n>_EL2 and PRLAR<n>_EL2 provides access to the EL2 MPU region
> + * determined by the value of 'n' and PRSELR_EL2.REGION as
> + * PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
> + * For example to access regions from 16 to 31 (0b10000 to 0b11111):
> + * - Set PRSELR_EL2 to 0b1xxxx
> + * - Region 16 configuration is accessible through PRBAR0_EL2 and PRLAR0_EL2
> + * - Region 17 configuration is accessible through PRBAR1_EL2 and PRLAR1_EL2
> + * - Region 18 configuration is accessible through PRBAR2_EL2 and PRLAR2_EL2
> + * - ...
> + * - Region 31 configuration is accessible through PRBAR15_EL2 and PRLAR15_EL2
> + *
> + * Inputs:
> + * x27: region selector
> + * x28: preserve value for PRBAR_EL2
> + * x29: preserve value for PRLAR_EL2
> + *
> + */
> +ENTRY(write_pr)
> +    msr   PRSELR_EL2, x27
> +    dsb   sy
> +    and   x27, x27, #REGION_SEL_MASK
> +    cmp   x27, #0
> +    bne   1f
> +    msr   PRBAR0_EL2, x28
> +    msr   PRLAR0_EL2, x29
> +    b     out
> +1:
> +    cmp   x27, #1
> +    bne   2f
> +    msr   PRBAR1_EL2, x28
> +    msr   PRLAR1_EL2, x29
> +    b     out
> +2:
> +    cmp   x27, #2
> +    bne   3f
> +    msr   PRBAR2_EL2, x28
> +    msr   PRLAR2_EL2, x29
> +    b     out
> +3:
> +    cmp   x27, #3
> +    bne   4f
> +    msr   PRBAR3_EL2, x28
> +    msr   PRLAR3_EL2, x29
> +    b     out
> +4:
> +    cmp   x27, #4
> +    bne   5f
> +    msr   PRBAR4_EL2, x28
> +    msr   PRLAR4_EL2, x29
> +    b     out
> +5:
> +    cmp   x27, #5
> +    bne   6f
> +    msr   PRBAR5_EL2, x28
> +    msr   PRLAR5_EL2, x29
> +    b     out
> +6:
> +    cmp   x27, #6
> +    bne   7f
> +    msr   PRBAR6_EL2, x28
> +    msr   PRLAR6_EL2, x29
> +    b     out
> +7:
> +    cmp   x27, #7
> +    bne   8f
> +    msr   PRBAR7_EL2, x28
> +    msr   PRLAR7_EL2, x29
> +    b     out
> +8:
> +    cmp   x27, #8
> +    bne   9f
> +    msr   PRBAR8_EL2, x28
> +    msr   PRLAR8_EL2, x29
> +    b     out
> +9:
> +    cmp   x27, #9
> +    bne   10f
> +    msr   PRBAR9_EL2, x28
> +    msr   PRLAR9_EL2, x29
> +    b     out
> +10:
> +    cmp   x27, #10
> +    bne   11f
> +    msr   PRBAR10_EL2, x28
> +    msr   PRLAR10_EL2, x29
> +    b     out
> +11:
> +    cmp   x27, #11
> +    bne   12f
> +    msr   PRBAR11_EL2, x28
> +    msr   PRLAR11_EL2, x29
> +    b     out
> +12:
> +    cmp   x27, #12
> +    bne   13f
> +    msr   PRBAR12_EL2, x28
> +    msr   PRLAR12_EL2, x29
> +    b     out
> +13:
> +    cmp   x27, #13
> +    bne   14f
> +    msr   PRBAR13_EL2, x28
> +    msr   PRLAR13_EL2, x29
> +    b     out
> +14:
> +    cmp   x27, #14
> +    bne   15f
> +    msr   PRBAR14_EL2, x28
> +    msr   PRLAR14_EL2, x29
> +    b     out
> +15:
> +    msr   PRBAR15_EL2, x28
> +    msr   PRLAR15_EL2, x29
> +out:
> +    isb
> +    ret
> +ENDPROC(write_pr)
> +
> +/*
> + * Static start-of-day Xen EL2 MPU memory region layout.
> + *
> + *     xen_mpumap[0] : Xen text
> + *     xen_mpumap[1] : Xen read-only data
> + *     xen_mpumap[2] : Xen read-only after init data
> + *     xen_mpumap[3] : Xen read-write data
> + *     xen_mpumap[4] : Xen BSS
> + *     ......
> + *     xen_mpumap[max_xen_mpumap - 2]: Xen init data
> + *     xen_mpumap[max_xen_mpumap - 1]: Xen init text
> + *
> + * Clobbers x0 - x6
> + *
> + * It shall be compliant with what describes in xen.lds.S, or the below
> + * codes need adjustment.
> + * It shall also follow the rules of putting fixed MPU memory region in
> + * the front, and the others in the rear, which, here, mainly refers to
> + * boot-only region, like Xen init text region.
> + */
> +ENTRY(prepare_early_mappings)
> +    /* stack LR as write_pr will be called later like nested function */
> +    mov   x6, lr
> +
> +    /* x0: region sel */
> +    mov   x0, xzr
> +    /* Xen text section. */
> +    prepare_xen_region _stext, _etext, x0, x1, x2, x3, attr_prbar=REGION_TEXT_PRBAR
> +    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
> +
> +    add   x0, x0, #1
> +    /* Xen read-only data section. */
> +    prepare_xen_region _srodata, _erodata, x0, x1, x2, x3, attr_prbar=REGION_RO_PRBAR
> +    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
> +
> +    add   x0, x0, #1
> +    /* Xen read-only after init data section. */
> +    prepare_xen_region __ro_after_init_start, __ro_after_init_end, x0, x1, x2, x3
> +    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
> +
> +    add   x0, x0, #1
> +    /* Xen read-write data section. */
> +    prepare_xen_region __data_begin, __init_begin, x0, x1, x2, x3
> +    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
> +
> +    read_max_el2_regions x5, x3 /* x5: max_mpumap */
> +    sub   x5, x5, #1
> +    /* Xen init text section. */
> +    prepare_xen_region _sinittext, _einittext, x5, x1, x2, x3, attr_prbar=REGION_TEXT_PRBAR
> +    create_mpu_entry xen_mpumap, x5, x1, x2, x3, x4
> +
> +    sub   x5, x5, #1
> +    /* Xen init data section. */
> +    prepare_xen_region __init_data_begin, __init_end, x5, x1, x2, x3
> +    create_mpu_entry xen_mpumap, x5, x1, x2, x3, x4
> +
> +    add   x0, x0, #1
> +    /* Xen BSS section. */
> +    prepare_xen_region __bss_start, __bss_end, x0, x1, x2, x3
> +    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4
> +
> +    /* Update next_fixed_region_idx and next_transient_region_idx */
> +    load_paddr x3, next_fixed_region_idx
> +    add   x0, x0, #1
> +    str   x0, [x3]
> +    load_paddr x4, next_transient_region_idx
> +    sub   x5, x5, #1
> +    str   x5, [x4]
> +
> +    mov   lr, x6
> +    ret
> +ENDPROC(prepare_early_mappings)
> +
> +GLOBAL(_end_boot)
> +
> +/*
> + * Local variables:
> + * mode: ASM
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> new file mode 100644
> index 0000000000..c945dd53db
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * mpu.h: Arm Memory Protection Region definitions.
> + */
> +
> +#ifndef __ARM64_MPU_H__
> +#define __ARM64_MPU_H__
> +
> +#define MPU_REGION_SHIFT  6
> +#define MPU_REGION_ALIGN  (_AC(1, UL) << MPU_REGION_SHIFT)
> +#define MPU_REGION_MASK   (~(MPU_REGION_ALIGN - 1))
> +
> +/*
> + * MPUIR_EL2.Region identifies the number of regions supported by the EL2 MPU.
> + * It is a 8-bit field, so 255 MPU memory regions at most.
> + */
> +#define ARM_MAX_MPU_MEMORY_REGIONS 255
> +
> +#ifndef __ASSEMBLY__
> +
> +/* Protection Region Base Address Register */
> +typedef union {
> +    struct __packed {
> +        unsigned long xn:2;       /* Execute-Never */
> +        unsigned long ap:2;       /* Acess Permission */
> +        unsigned long sh:2;       /* Sharebility */
> +        unsigned long base:42;    /* Base Address */
> +        unsigned long pad:16;
> +    } reg;
> +    uint64_t bits;
> +} prbar_t;
> +
> +/* Protection Region Limit Address Register */
> +typedef union {
> +    struct __packed {
> +        unsigned long en:1;     /* Region enable */
> +        unsigned long ai:3;     /* Memory Attribute Index */
> +        unsigned long ns:1;     /* Not-Secure */
> +        unsigned long res:1;    /* Reserved 0 by hardware */
> +        unsigned long limit:42; /* Limit Address */
> +        unsigned long pad:16;
> +    } reg;
> +    uint64_t bits;
> +} prlar_t;
> +
> +/* MPU Protection Region */
> +typedef struct {
> +    prbar_t prbar;
> +    prlar_t prlar;
> +} pr_t;
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* __ARM64_MPU_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
> index 4638999514..aca9bca5b1 100644
> --- a/xen/arch/arm/include/asm/arm64/sysregs.h
> +++ b/xen/arch/arm/include/asm/arm64/sysregs.h
> @@ -458,6 +458,55 @@
>   #define ZCR_ELx_LEN_SIZE             9
>   #define ZCR_ELx_LEN_MASK             0x1ff
>
> +/* System registers for Armv8-R AArch64 */
> +#ifdef CONFIG_HAS_MPU
> +
> +/* EL2 MPU Protection Region Base Address Register encode */
> +#define PRBAR_EL2   S3_4_C6_C8_0
> +#define PRBAR0_EL2  S3_4_C6_C8_0
> +#define PRBAR1_EL2  S3_4_C6_C8_4
> +#define PRBAR2_EL2  S3_4_C6_C9_0
> +#define PRBAR3_EL2  S3_4_C6_C9_4
> +#define PRBAR4_EL2  S3_4_C6_C10_0
> +#define PRBAR5_EL2  S3_4_C6_C10_4
> +#define PRBAR6_EL2  S3_4_C6_C11_0
> +#define PRBAR7_EL2  S3_4_C6_C11_4
> +#define PRBAR8_EL2  S3_4_C6_C12_0
> +#define PRBAR9_EL2  S3_4_C6_C12_4
> +#define PRBAR10_EL2 S3_4_C6_C13_0
> +#define PRBAR11_EL2 S3_4_C6_C13_4
> +#define PRBAR12_EL2 S3_4_C6_C14_0
> +#define PRBAR13_EL2 S3_4_C6_C14_4
> +#define PRBAR14_EL2 S3_4_C6_C15_0
> +#define PRBAR15_EL2 S3_4_C6_C15_4
> +
> +/* EL2 MPU Protection Region Limit Address Register encode */
> +#define PRLAR_EL2   S3_4_C6_C8_1
> +#define PRLAR0_EL2  S3_4_C6_C8_1
> +#define PRLAR1_EL2  S3_4_C6_C8_5
> +#define PRLAR2_EL2  S3_4_C6_C9_1
> +#define PRLAR3_EL2  S3_4_C6_C9_5
> +#define PRLAR4_EL2  S3_4_C6_C10_1
> +#define PRLAR5_EL2  S3_4_C6_C10_5
> +#define PRLAR6_EL2  S3_4_C6_C11_1
> +#define PRLAR7_EL2  S3_4_C6_C11_5
> +#define PRLAR8_EL2  S3_4_C6_C12_1
> +#define PRLAR9_EL2  S3_4_C6_C12_5
> +#define PRLAR10_EL2 S3_4_C6_C13_1
> +#define PRLAR11_EL2 S3_4_C6_C13_5
> +#define PRLAR12_EL2 S3_4_C6_C14_1
> +#define PRLAR13_EL2 S3_4_C6_C14_5
> +#define PRLAR14_EL2 S3_4_C6_C15_1
> +#define PRLAR15_EL2 S3_4_C6_C15_5
> +
> +/* MPU Protection Region Selection Register encode */
> +#define PRSELR_EL2 S3_4_C6_C2_1
> +
> +/* MPU Type registers encode */
> +#define MPUIR_EL2 S3_4_C0_C0_4
> +
> +#endif
> +
>   /* Access to system registers */
>
>   #define WRITE_SYSREG64(v, name) do {                    \
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> new file mode 100644
> index 0000000000..43e9a1be4d
> --- /dev/null
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * xen/arch/arm/mm_mpu.c
> + *
> + * MPU based memory managment code for Armv8-R AArch64.
> + *
> + * Copyright (C) 2022 Arm Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/init.h>
> +#include <xen/page-size.h>
> +#include <asm/arm64/mpu.h>
> +
> +/* Xen MPU memory region mapping table. */
> +pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
> +     xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
> +
> +/* Index into MPU memory region map for fixed regions, ascending from zero. */
> +uint64_t __ro_after_init next_fixed_region_idx;
> +/*
> + * Index into MPU memory region map for transient regions, like boot-only
> + * region, which descends from max_xen_mpumap.
> + */
> +uint64_t __ro_after_init next_transient_region_idx;
> +
> +/* Maximum number of supported MPU memory regions by the EL2 MPU. */
> +uint64_t __ro_after_init max_xen_mpumap;
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index bc45ea2c65..79965a3c17 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -91,6 +91,8 @@ SECTIONS
>         __ro_after_init_end = .;
>     } : text
>
> +  . = ALIGN(PAGE_SIZE);
> +  __data_begin = .;
>     .data.read_mostly : {
>          /* Exception table */
>          __start___ex_table = .;
> @@ -157,7 +159,9 @@ SECTIONS
>          *(.altinstr_replacement)
>     } :text
>     . = ALIGN(PAGE_SIZE);
> +
>     .init.data : {
> +       __init_data_begin = .;            /* Init data */
>          *(.init.rodata)
>          *(.init.rodata.*)
>
> --
> 2.25.1
>
NIT:- Would you consider splitting this patch, something like this :-

1. Renaming of the mmu function

2. Define sysregs, prlar_t, prbar_t and other hardware-specific
macros.

3. Define write_pr

4. The rest of the changes (ie prepare_early_mappings(), xen.lds.S, etc)

- Ayan



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h
  2023-01-13  5:28 ` [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h Penny Zheng
@ 2023-01-19 14:20   ` Julien Grall
  2023-06-05  5:20     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-19 14:20 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> Xen defines some global configuration macros for Arm in
> config.h. We still want to use it for Armv8-R systems, but
> there are some address related macros that are defined for
> MMU systems. These macros will not be used by MPU systems,
> Adding ifdefery with CONFIG_HAS_MPU to gate these macros
> will result in a messy and hard-to-read/maintain code.
> 
> So we keep some common definitions still in config.h, but
> move virtual address related definitions to a new file -
> config_mmu.h. And use a new file config_mpu.h to store
> definitions for MPU systems. To avoid spreading #ifdef
> everywhere, we keep the same definition names for MPU
> systems, like XEN_VIRT_START and HYPERVISOR_VIRT_START,
> but the definition contents are MPU specific.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
> v1 -> v2:
> 1. Remove duplicated FIXMAP definitions from config_mmu.h
> ---
>   xen/arch/arm/include/asm/config.h     | 103 +++--------------------
>   xen/arch/arm/include/asm/config_mmu.h | 112 ++++++++++++++++++++++++++
>   xen/arch/arm/include/asm/config_mpu.h |  25 ++++++

I think this patch wants to be split in two, so we keep code movement
separate from the introduction of a new feature (e.g. the MPU).

Furthermore, I think it would be better to name the new header layout_* 
(or similar).

Lastly, you are going to introduce several files with _mmu or _mpu. I
would rather prefer if we created a directory instead.


>   3 files changed, 147 insertions(+), 93 deletions(-)
>   create mode 100644 xen/arch/arm/include/asm/config_mmu.h
>   create mode 100644 xen/arch/arm/include/asm/config_mpu.h
> 
> diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
> index 25a625ff08..86d8142959 100644
> --- a/xen/arch/arm/include/asm/config.h
> +++ b/xen/arch/arm/include/asm/config.h
> @@ -48,6 +48,12 @@
>   
>   #define INVALID_VCPU_ID MAX_VIRT_CPUS
>   
> +/* Used for calculating PDX */

I am not entirely sure I understand the purpose of this comment.

> +#ifdef CONFIG_ARM_64
> +#define FRAMETABLE_SIZE        GB(32)
> +#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
> +#endif
> +

Why do you only keep the 64-bit version in config.h?

However... the frametable size is limited by the space we reserve in the 
virtual address space. This would not be the case for the MPU.

So having the limit in common seems a bit odd. In fact, I think we 
should look at getting rid of the limit for the MPU.

[...]

> diff --git a/xen/arch/arm/include/asm/config_mmu.h b/xen/arch/arm/include/asm/config_mmu.h
> new file mode 100644
> index 0000000000..c12ff25cf4
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/config_mmu.h
> @@ -0,0 +1,112 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/******************************************************************************
> + * config_mmu.h
> + *
> + * A Linux-style configuration list, only can be included by config.h

Why do you need to restrict where this is included? And if you really 
need it, then you should check it.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-13  5:28 ` [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map Penny Zheng
  2023-01-19 10:18   ` Ayan Kumar Halder
@ 2023-01-19 15:04   ` Julien Grall
  2023-01-29  5:39     ` Penny Zheng
  1 sibling, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-19 15:04 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Penny Zheng <penny.zheng@arm.com>
> 
> The start-of-day Xen MPU memory region layout shall be like as follows:
> 
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data
> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> ......
> xen_mpumap[max_xen_mpumap - 2]: Xen init data
> xen_mpumap[max_xen_mpumap - 1]: Xen init text

Can you explain why the init region should be at the end of the MPU?

> 
> max_xen_mpumap refers to the number of regions supported by the EL2 MPU.
> The layout shall be compliant with what we describe in xen.lds.S, or the
> codes need adjustment.
> 
> As MMU system and MPU system have different functions to create
> the boot MMU/MPU memory management data, instead of introducing
> extra #ifdef in main code flow, we introduce a neutral name
> prepare_early_mappings for both, and also to replace create_page_tables for MMU.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/arm64/Makefile              |   2 +
>   xen/arch/arm/arm64/head.S                |  17 +-
>   xen/arch/arm/arm64/head_mmu.S            |   4 +-
>   xen/arch/arm/arm64/head_mpu.S            | 323 +++++++++++++++++++++++
>   xen/arch/arm/include/asm/arm64/mpu.h     |  63 +++++
>   xen/arch/arm/include/asm/arm64/sysregs.h |  49 ++++
>   xen/arch/arm/mm_mpu.c                    |  48 ++++
>   xen/arch/arm/xen.lds.S                   |   4 +
>   8 files changed, 502 insertions(+), 8 deletions(-)
>   create mode 100644 xen/arch/arm/arm64/head_mpu.S
>   create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
>   create mode 100644 xen/arch/arm/mm_mpu.c
> 
> diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
> index 22da2f54b5..438c9737ad 100644
> --- a/xen/arch/arm/arm64/Makefile
> +++ b/xen/arch/arm/arm64/Makefile
> @@ -10,6 +10,8 @@ obj-y += entry.o
>   obj-y += head.o
>   ifneq ($(CONFIG_HAS_MPU),y)
>   obj-y += head_mmu.o
> +else
> +obj-y += head_mpu.o
>   endif
>   obj-y += insn.o
>   obj-$(CONFIG_LIVEPATCH) += livepatch.o
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 782bd1f94c..145e3d53dc 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -68,9 +68,9 @@
>    *  x24 -
>    *  x25 -
>    *  x26 - skip_zero_bss (boot cpu only)
> - *  x27 -
> - *  x28 -
> - *  x29 -
> + *  x27 - region selector (mpu only)
> + *  x28 - prbar (mpu only)
> + *  x29 - prlar (mpu only)
>    *  x30 - lr
>    */
>   
> @@ -82,7 +82,7 @@
>    * ---------------------------
>    *
>    * The requirements are:
> - *   MMU = off, D-cache = off, I-cache = on or off,
> + *   MMU/MPU = off, D-cache = off, I-cache = on or off,
>    *   x0 = physical address to the FDT blob.
>    *
>    * This must be the very first address in the loaded image.
> @@ -252,7 +252,12 @@ real_start_efi:
>   
>           bl    check_cpu_mode
>           bl    cpu_init
> -        bl    create_page_tables
> +
> +        /*
> +         * Create boot memory management data, pagetable for MMU systems
> +         * and memory regions for MPU systems.
> +         */
> +        bl    prepare_early_mappings
>           bl    enable_mmu
>   
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
> @@ -310,7 +315,7 @@ GLOBAL(init_secondary)
>   #endif
>           bl    check_cpu_mode
>           bl    cpu_init
> -        bl    create_page_tables
> +        bl    prepare_early_mappings
>           bl    enable_mmu
>   
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
> diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
> index 6ff13c751c..2346f755df 100644
> --- a/xen/arch/arm/arm64/head_mmu.S
> +++ b/xen/arch/arm/arm64/head_mmu.S
> @@ -123,7 +123,7 @@
>    *
>    * Clobbers x0 - x4
>    */
> -ENTRY(create_page_tables)
> +ENTRY(prepare_early_mappings)
>           /* Prepare the page-tables for mapping Xen */
>           ldr   x0, =XEN_VIRT_START
>           create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
> @@ -208,7 +208,7 @@ virtphys_clash:
>           /* Identity map clashes with boot_third, which we cannot handle yet */
>           PRINT("- Unable to build boot page tables - virt and phys addresses clash. -\r\n")
>           b     fail
> -ENDPROC(create_page_tables)
> +ENDPROC(prepare_early_mappings)
>   
>   /*
>    * Turn on the Data Cache and the MMU. The function will return on the 1:1
> diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
> new file mode 100644
> index 0000000000..0b97ce4646
> --- /dev/null
> +++ b/xen/arch/arm/arm64/head_mpu.S
> @@ -0,0 +1,323 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Start-of-day code for an Armv8-R AArch64 MPU system.
> + */
> +
> +#include <asm/arm64/mpu.h>
> +#include <asm/early_printk.h>
> +#include <asm/page.h>
> +
> +/*
> + * One entry in Xen MPU memory region mapping table(xen_mpumap) is a structure
> + * of pr_t, which is 16-bytes size, so the entry offset is the order of 4.
> + */
> +#define MPU_ENTRY_SHIFT         0x4
> +
> +#define REGION_SEL_MASK         0xf
> +
> +#define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
> +#define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
> +#define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> +
> +#define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
> +
> +/*
> + * Macro to round up the section address to be PAGE_SIZE aligned
> + * Each section(e.g. .text, .data, etc) in xen.lds.S is page-aligned,
> + * which is usually guarded with ". = ALIGN(PAGE_SIZE)" in the head,
> + * or in the end
> + */
> +.macro roundup_section, xb
> +        add   \xb, \xb, #(PAGE_SIZE-1)
> +        and   \xb, \xb, #PAGE_MASK
> +.endm
> +
> +/*
> + * Macro to create a new MPU memory region entry, which is a structure
> + * of pr_t,  in \prmap.
> + *
> + * Inputs:
> + * prmap:   mpu memory region map table symbol
> + * sel:     region selector
> + * prbar:   preserve value for PRBAR_EL2
> + * prlar    preserve value for PRLAR_EL2
> + *
> + * Clobbers \tmp1, \tmp2
> + *
> + */
> +.macro create_mpu_entry prmap, sel, prbar, prlar, tmp1, tmp2
> +    mov   \tmp2, \sel
> +    lsl   \tmp2, \tmp2, #MPU_ENTRY_SHIFT
> +    adr_l \tmp1, \prmap
> +    /* Write the first 8 bytes(prbar_t) of pr_t */
> +    str   \prbar, [\tmp1, \tmp2]
> +
> +    add   \tmp2, \tmp2, #8
> +    /* Write the last 8 bytes(prlar_t) of pr_t */
> +    str   \prlar, [\tmp1, \tmp2]

Any particular reason to not use 'stp'?
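
Something like the following (untested) should work here, given that prbar_t
and prlar_t sit back to back in pr_t:

    add   \tmp1, \tmp1, \tmp2
    stp   \prbar, \prlar, [\tmp1]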

Also, AFAICT, this runs with the data cache disabled. But at least on Armv8-A,
the cache is never really off. So don't we need some cache maintenance?

FAOD, I know the existing MMU code has the same issue. But I would
rather prefer if the new code introduced is compliant with the Arm Arm.

> +.endm
> +
> +/*
> + * Macro to store the maximum number of regions supported by the EL2 MPU
> + * in max_xen_mpumap, which is identified by MPUIR_EL2.
> + *
> + * Outputs:
> + * nr_regions: preserve the maximum number of regions supported by the EL2 MPU
> + *
> + * Clobbers \tmp1
> + *
> + */

Are you going to have multiple users? If not, then I would prefer if
this were folded into the only caller.

> +.macro read_max_el2_regions, nr_regions, tmp1
> +    load_paddr \tmp1, max_xen_mpumap

I would rather prefer if we restricted the use of globals while the MMU is
off (see why above).

> +    mrs   \nr_regions, MPUIR_EL2
> +    isb

What's that isb for?

> +    str   \nr_regions, [\tmp1]
> +.endm
> +
> +/*
> + * Macro to prepare and set a MPU memory region
> + *
> + * Inputs:
> + * base:        base address symbol (should be page-aligned)
> + * limit:       limit address symbol
> + * sel:         region selector
> + * prbar:       store computed PRBAR_EL2 value
> + * prlar:       store computed PRLAR_EL2 value
> + * attr_prbar:  PRBAR_EL2-related memory attributes. If not specified it will be REGION_DATA_PRBAR
> + * attr_prlar:  PRLAR_EL2-related memory attributes. If not specified it will be REGION_NORMAL_PRLAR
> + *
> + * Clobber \tmp1

This macro will also clobber x27, x28, x29.

> + *
> + */
> +.macro prepare_xen_region, base, limit, sel, prbar, prlar, tmp1, attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR
> +    /* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
> +    load_paddr \prbar, \base
> +    and   \prbar, \prbar, #MPU_REGION_MASK
> +    mov   \tmp1, #\attr_prbar
> +    orr   \prbar, \prbar, \tmp1
> +
> +    /* Prepare value for PRLAR_EL2 reg and preserve it in \prlar.*/
> +    load_paddr \prlar, \limit
> +    /* Round up limit address to be PAGE_SIZE aligned */
> +    roundup_section \prlar
> +    /* Limit address should be inclusive */
> +    sub   \prlar, \prlar, #1
> +    and   \prlar, \prlar, #MPU_REGION_MASK
> +    mov   \tmp1, #\attr_prlar
> +    orr   \prlar, \prlar, \tmp1
> +
> +    mov   x27, \sel
> +    mov   x28, \prbar
> +    mov   x29, \prlar
> +    /*
> +     * x27, x28, x29 are special registers designed as
> +     * inputs for function write_pr
> +     */
> +    bl    write_pr
> +.endm
> +
> +.section .text.idmap, "ax", %progbits
> +
> +/*
> + * ENTRY to configure a EL2 MPU memory region
> + * ARMv8-R AArch64 at most supports 255 MPU protection regions.
> + * See section G1.3.18 of the reference manual for ARMv8-R AArch64,
> + * PRBAR<n>_EL2 and PRLAR<n>_EL2 provides access to the EL2 MPU region
> + * determined by the value of 'n' and PRSELR_EL2.REGION as
> + * PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
> + * For example to access regions from 16 to 31 (0b10000 to 0b11111):
> + * - Set PRSELR_EL2 to 0b1xxxx
> + * - Region 16 configuration is accessible through PRBAR0_EL2 and PRLAR0_EL2
> + * - Region 17 configuration is accessible through PRBAR1_EL2 and PRLAR1_EL2
> + * - Region 18 configuration is accessible through PRBAR2_EL2 and PRLAR2_EL2
> + * - ...
> + * - Region 31 configuration is accessible through PRBAR15_EL2 and PRLAR15_EL2
> + *
> + * Inputs:
> + * x27: region selector
> + * x28: preserve value for PRBAR_EL2
> + * x29: preserve value for PRLAR_EL2
> + *
> + */
> +ENTRY(write_pr)

AFAICT, this function would not be necessary if the indices for the init
sections were hardcoded.

So I would like to understand why the indices cannot be hardcoded.
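
For example, if the init data section always used a fixed slot (the slot number
below is made up), the registers could be programmed directly without the
dispatch in write_pr:

    msr   PRSELR_EL2, xzr        /* select the first group of 16 regions */
    isb
    msr   PRBAR6_EL2, x28        /* hypothetical fixed slot for init data */
    msr   PRLAR6_EL2, x29
    isb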

> +    msr   PRSELR_EL2, x27
> +    dsb   sy

What is this 'dsb' for? Also why 'sy'?

> +    and   x27, x27, #REGION_SEL_MASK
> +    cmp   x27, #0
> +    bne   1f
> +    msr   PRBAR0_EL2, x28
> +    msr   PRLAR0_EL2, x29
> +    b     out
> +1:
> +    cmp   x27, #1
> +    bne   2f
> +    msr   PRBAR1_EL2, x28
> +    msr   PRLAR1_EL2, x29
> +    b     out
> +2:
> +    cmp   x27, #2
> +    bne   3f
> +    msr   PRBAR2_EL2, x28
> +    msr   PRLAR2_EL2, x29
> +    b     out
> +3:
> +    cmp   x27, #3
> +    bne   4f
> +    msr   PRBAR3_EL2, x28
> +    msr   PRLAR3_EL2, x29
> +    b     out
> +4:
> +    cmp   x27, #4
> +    bne   5f
> +    msr   PRBAR4_EL2, x28
> +    msr   PRLAR4_EL2, x29
> +    b     out
> +5:
> +    cmp   x27, #5
> +    bne   6f
> +    msr   PRBAR5_EL2, x28
> +    msr   PRLAR5_EL2, x29
> +    b     out
> +6:
> +    cmp   x27, #6
> +    bne   7f
> +    msr   PRBAR6_EL2, x28
> +    msr   PRLAR6_EL2, x29
> +    b     out
> +7:
> +    cmp   x27, #7
> +    bne   8f
> +    msr   PRBAR7_EL2, x28
> +    msr   PRLAR7_EL2, x29
> +    b     out
> +8:
> +    cmp   x27, #8
> +    bne   9f
> +    msr   PRBAR8_EL2, x28
> +    msr   PRLAR8_EL2, x29
> +    b     out
> +9:
> +    cmp   x27, #9
> +    bne   10f
> +    msr   PRBAR9_EL2, x28
> +    msr   PRLAR9_EL2, x29
> +    b     out
> +10:
> +    cmp   x27, #10
> +    bne   11f
> +    msr   PRBAR10_EL2, x28
> +    msr   PRLAR10_EL2, x29
> +    b     out
> +11:
> +    cmp   x27, #11
> +    bne   12f
> +    msr   PRBAR11_EL2, x28
> +    msr   PRLAR11_EL2, x29
> +    b     out
> +12:
> +    cmp   x27, #12
> +    bne   13f
> +    msr   PRBAR12_EL2, x28
> +    msr   PRLAR12_EL2, x29
> +    b     out
> +13:
> +    cmp   x27, #13
> +    bne   14f
> +    msr   PRBAR13_EL2, x28
> +    msr   PRLAR13_EL2, x29
> +    b     out
> +14:
> +    cmp   x27, #14
> +    bne   15f
> +    msr   PRBAR14_EL2, x28
> +    msr   PRLAR14_EL2, x29
> +    b     out
> +15:
> +    msr   PRBAR15_EL2, x28
> +    msr   PRLAR15_EL2, x29
> +out:
> +    isb

What is this 'isb' for?

> +    ret
> +ENDPROC(write_pr)
> +
> +/*
> + * Static start-of-day Xen EL2 MPU memory region layout.
> + *
> + *     xen_mpumap[0] : Xen text
> + *     xen_mpumap[1] : Xen read-only data
> + *     xen_mpumap[2] : Xen read-only after init data
> + *     xen_mpumap[3] : Xen read-write data
> + *     xen_mpumap[4] : Xen BSS
> + *     ......
> + *     xen_mpumap[max_xen_mpumap - 2]: Xen init data
> + *     xen_mpumap[max_xen_mpumap - 1]: Xen init text
> + *
> + * Clobbers x0 - x6
> + *
> + * It shall be compliant with what describes in xen.lds.S, or the below
> + * codes need adjustment.
> + * It shall also follow the rules of putting fixed MPU memory region in
> + * the front, and the others in the rear, which, here, mainly refers to
> + * boot-only region, like Xen init text region.
> + */
> +ENTRY(prepare_early_mappings)
> +    /* stack LR as write_pr will be called later like nested function */
> +    mov   x6, lr
> +
> +    /* x0: region sel */
> +    mov   x0, xzr
> +    /* Xen text section. */
> +    prepare_xen_region _stext, _etext, x0, x1, x2, x3, attr_prbar=REGION_TEXT_PRBAR
> +    create_mpu_entry xen_mpumap, x0, x1, x2, x3, x4

You always seem to call prepare_xen_region and create_mpu_entry. Can 
they be combined?

Also, will the first parameter of create_mpu_entry always be xen_mpumap?
If so, I would remove it from the parameter list.
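
As a rough sketch (the name is made up), the two could be wrapped into a single
macro so each region takes one step in prepare_early_mappings:

.macro setup_xen_region base, limit, sel, prbar, prlar, tmp1, tmp2, attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR
    prepare_xen_region \base, \limit, \sel, \prbar, \prlar, \tmp1, \attr_prbar, \attr_prlar
    create_mpu_entry   xen_mpumap, \sel, \prbar, \prlar, \tmp1, \tmp2
.endm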


[...]

> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index bc45ea2c65..79965a3c17 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -91,6 +91,8 @@ SECTIONS
>         __ro_after_init_end = .;
>     } : text
>   
> +  . = ALIGN(PAGE_SIZE);

Why do you need this ALIGN?

> +  __data_begin = .;
>     .data.read_mostly : {
>          /* Exception table */
>          __start___ex_table = .;
> @@ -157,7 +159,9 @@ SECTIONS
>          *(.altinstr_replacement)

I know you are not using alternative instructions yet. But you should
make sure they are included. So I think, rather than introducing
__init_data_begin, you want to use "_einitext" for the start of the
"Init data" section.

>     } :text
>     . = ALIGN(PAGE_SIZE);
> +

Spurious?

>     .init.data : {
> +       __init_data_begin = .;            /* Init data */
>          *(.init.rodata)
>          *(.init.rodata.*)
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement
  2023-01-13  5:28 ` [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement Penny Zheng
@ 2023-01-23 17:07   ` Ayan Kumar Halder
  2023-01-24 18:54   ` Julien Grall
  1 sibling, 0 replies; 122+ messages in thread
From: Ayan Kumar Halder @ 2023-01-23 17:07 UTC (permalink / raw)
  To: xen-devel

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
>
> We need a new helper for Xen to enable MPU in boot-time.
> The new helper is semantically consistent with the original enable_mmu.
>
> If the Background region is enabled, then the MPU uses the default memory
> map as the Background region for generating the memory
> attributes when MPU is disabled.
> Since the default memory map of the Armv8-R AArch64 architecture is
> IMPLEMENTATION DEFINED, we always turn off the Background region.
>
> In this patch, we also introduce a neutral name enable_mm for
> Xen to enable MMU/MPU. This can help us to keep one code flow
> in head.S
>
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/arm64/head.S     |  5 +++--
>   xen/arch/arm/arm64/head_mmu.S |  4 ++--
>   xen/arch/arm/arm64/head_mpu.S | 19 +++++++++++++++++++
>   3 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 145e3d53dc..7f3f973468 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -258,7 +258,8 @@ real_start_efi:
>            * and memory regions for MPU systems.
>            */
>           bl    prepare_early_mappings
> -        bl    enable_mmu
> +        /* Turn on MMU or MPU */
> +        bl    enable_mm
>
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
>           ldr   x0, =primary_switched
> @@ -316,7 +317,7 @@ GLOBAL(init_secondary)
>           bl    check_cpu_mode
>           bl    cpu_init
>           bl    prepare_early_mappings
> -        bl    enable_mmu
> +        bl    enable_mm
>
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
>           ldr   x0, =secondary_switched
> diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
> index 2346f755df..b59c40495f 100644
> --- a/xen/arch/arm/arm64/head_mmu.S
> +++ b/xen/arch/arm/arm64/head_mmu.S
> @@ -217,7 +217,7 @@ ENDPROC(prepare_early_mappings)
>    *
>    * Clobbers x0 - x3
>    */
> -ENTRY(enable_mmu)
> +ENTRY(enable_mm)
>           PRINT("- Turning on paging -\r\n")
>
>           /*
> @@ -239,7 +239,7 @@ ENTRY(enable_mmu)
>           msr   SCTLR_EL2, x0          /* now paging is enabled */
>           isb                          /* Now, flush the icache */
>           ret
> -ENDPROC(enable_mmu)
> +ENDPROC(enable_mm)
>
>   /*
>    * Remove the 1:1 map from the page-tables. It is not easy to keep track
> diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
> index 0b97ce4646..e2ac69b0cc 100644
> --- a/xen/arch/arm/arm64/head_mpu.S
> +++ b/xen/arch/arm/arm64/head_mpu.S
> @@ -315,6 +315,25 @@ ENDPROC(prepare_early_mappings)
>
>   GLOBAL(_end_boot)
>
> +/*
> + * Enable EL2 MPU and data cache
> + * If the Background region is enabled, then the MPU uses the default memory
> + * map as the Background region for generating the memory
> + * attributes when MPU is disabled.
> + * Since the default memory map of the Armv8-R AArch64 architecture is
> + * IMPLEMENTATION DEFINED, we intend to turn off the Background region here.
> + */
> +ENTRY(enable_mm)
> +    mrs   x0, SCTLR_EL2
> +    orr   x0, x0, #SCTLR_Axx_ELx_M    /* Enable MPU */
> +    orr   x0, x0, #SCTLR_Axx_ELx_C    /* Enable D-cache */
> +    orr   x0, x0, #SCTLR_Axx_ELx_WXN  /* Enable WXN */
> +    dsb   sy
> +    msr   SCTLR_EL2, x0
> +    isb
> +    ret
> +ENDPROC(enable_mm)

Can this be renamed to enable_mpu or enable_mpu_and_cache()?

Can we also have the corresponding disable function in this patch?
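
A minimal sketch of what such a disable helper could look like (illustrative
only, mirroring enable_mm but clearing the M bit again):

ENTRY(disable_mpu)
    mrs   x0, SCTLR_EL2
    bic   x0, x0, #SCTLR_Axx_ELx_M    /* Disable MPU */
    msr   SCTLR_EL2, x0
    isb
    ret
ENDPROC(disable_mpu)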

Also (compared with "[PATCH v6 10/11] xen/arm64: introduce helpers for 
MPU enable/disable"), I see that you have added #SCTLR_Axx_ELx_WXN. What 
is the reason for this?

- Ayan

> +
>   /*
>    * Local variables:
>    * mode: ASM
> --
> 2.25.1
>
>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement
  2023-01-13  5:28 ` [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement Penny Zheng
  2023-01-23 17:07   ` Ayan Kumar Halder
@ 2023-01-24 18:54   ` Julien Grall
  1 sibling, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-01-24 18:54 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> We need a new helper for Xen to enable MPU in boot-time.
> The new helper is semantically consistent with the original enable_mmu.
> 
> If the Background region is enabled, then the MPU uses the default memory
> map as the Background region for generating the memory
> attributes when MPU is disabled.
> Since the default memory map of the Armv8-R AArch64 architecture is
> IMPLEMENTATION DEFINED, we always turn off the Background region.

You are saying this. But I don't see any code below clearing 
SCTLR_EL2.BR. Can you clarify?
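
If the Background region really is meant to be disabled, I would have expected
something along these lines (assuming a dedicated SCTLR_EL2 bit definition for
the Background region, which this series does not introduce):

    mrs   x0, SCTLR_EL2
    bic   x0, x0, #SCTLR_EL2_BR       /* hypothetical define for the BR bit */
    msr   SCTLR_EL2, x0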

> 
> In this patch, we also introduce a neutral name enable_mm for
> Xen to enable MMU/MPU. This can help us to keep one code flow
> in head.S

NIT: Missing full stop.

> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/arm64/head.S     |  5 +++--
>   xen/arch/arm/arm64/head_mmu.S |  4 ++--
>   xen/arch/arm/arm64/head_mpu.S | 19 +++++++++++++++++++
>   3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 145e3d53dc..7f3f973468 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -258,7 +258,8 @@ real_start_efi:
>            * and memory regions for MPU systems.
>            */
>           bl    prepare_early_mappings
> -        bl    enable_mmu
> +        /* Turn on MMU or MPU */
> +        bl    enable_mm
>   
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
>           ldr   x0, =primary_switched
> @@ -316,7 +317,7 @@ GLOBAL(init_secondary)
>           bl    check_cpu_mode
>           bl    cpu_init
>           bl    prepare_early_mappings
> -        bl    enable_mmu
> +        bl    enable_mm
>   
>           /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
>           ldr   x0, =secondary_switched
> diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
> index 2346f755df..b59c40495f 100644
> --- a/xen/arch/arm/arm64/head_mmu.S
> +++ b/xen/arch/arm/arm64/head_mmu.S
> @@ -217,7 +217,7 @@ ENDPROC(prepare_early_mappings)
>    *
>    * Clobbers x0 - x3
>    */
> -ENTRY(enable_mmu)
> +ENTRY(enable_mm)
>           PRINT("- Turning on paging -\r\n")
>   
>           /*
> @@ -239,7 +239,7 @@ ENTRY(enable_mmu)
>           msr   SCTLR_EL2, x0          /* now paging is enabled */
>           isb                          /* Now, flush the icache */
>           ret
> -ENDPROC(enable_mmu)
> +ENDPROC(enable_mm)
>   
>   /*
>    * Remove the 1:1 map from the page-tables. It is not easy to keep track
> diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
> index 0b97ce4646..e2ac69b0cc 100644
> --- a/xen/arch/arm/arm64/head_mpu.S
> +++ b/xen/arch/arm/arm64/head_mpu.S
> @@ -315,6 +315,25 @@ ENDPROC(prepare_early_mappings)
>   
>   GLOBAL(_end_boot)
>   
> +/*
> + * Enable EL2 MPU and data cache
> + * If the Background region is enabled, then the MPU uses the default memory
> + * map as the Background region for generating the memory
> + * attributes when MPU is disabled.
> + * Since the default memory map of the Armv8-R AArch64 architecture is
> + * IMPLEMENTATION DEFINED, we intend to turn off the Background region here.

Please document which registers you are clobbering. See the MMU code for 
examples of how to do it.

> + */
> +ENTRY(enable_mm)
> +    mrs   x0, SCTLR_EL2
> +    orr   x0, x0, #SCTLR_Axx_ELx_M    /* Enable MPU */
> +    orr   x0, x0, #SCTLR_Axx_ELx_C    /* Enable D-cache */
> +    orr   x0, x0, #SCTLR_Axx_ELx_WXN  /* Enable WXN */
> +    dsb   sy

Please document the reason for each dsb. In this case, it is not entirely 
clear what this one is for.

> +    msr   SCTLR_EL2, x0
> +    isb

Likewise for the isb.

> +    ret
> +ENDPROC(enable_mm)
> +
Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-13  5:28 ` [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART Penny Zheng
@ 2023-01-24 19:09   ` Julien Grall
  2023-01-29  6:17     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-24 19:09 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> In MMU system, we map the UART in the fixmap (when earlyprintk is used).
> However in MPU system, we map the UART with a transient MPU memory
> region.
> 
> So we introduce a new unified function setup_early_uart to replace
> the previous setup_fixmap.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/arm64/head.S               |  2 +-
>   xen/arch/arm/arm64/head_mmu.S           |  4 +-
>   xen/arch/arm/arm64/head_mpu.S           | 52 +++++++++++++++++++++++++
>   xen/arch/arm/include/asm/early_printk.h |  1 +
>   4 files changed, 56 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 7f3f973468..a92883319d 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -272,7 +272,7 @@ primary_switched:
>            * afterwards.
>            */
>           bl    remove_identity_mapping
> -        bl    setup_fixmap
> +        bl    setup_early_uart
>   #ifdef CONFIG_EARLY_PRINTK
>           /* Use a virtual address to access the UART. */
>           ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
> diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
> index b59c40495f..a19b7c873d 100644
> --- a/xen/arch/arm/arm64/head_mmu.S
> +++ b/xen/arch/arm/arm64/head_mmu.S
> @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
>    *
>    * Clobbers x0 - x3
>    */
> -ENTRY(setup_fixmap)
> +ENTRY(setup_early_uart)

This function is doing more than enabling the early UART. It also sets up 
the fixmap even when earlyprintk is not configured.

I am not entirely sure what the name should be. Maybe this needs to be 
split further.

>   #ifdef CONFIG_EARLY_PRINTK
>           /* Add UART to the fixmap table */
>           ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
> @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
>           dsb   nshst
>   
>           ret
> -ENDPROC(setup_fixmap)
> +ENDPROC(setup_early_uart)
>   
>   /* Fail-stop */
>   fail:   PRINT("- Boot failed -\r\n")
> diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
> index e2ac69b0cc..72d1e0863d 100644
> --- a/xen/arch/arm/arm64/head_mpu.S
> +++ b/xen/arch/arm/arm64/head_mpu.S
> @@ -18,8 +18,10 @@
>   #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
>   #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
>   #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
>   
>   #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
> +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
>   
>   /*
>    * Macro to round up the section address to be PAGE_SIZE aligned
> @@ -334,6 +336,56 @@ ENTRY(enable_mm)
>       ret
>   ENDPROC(enable_mm)
>   
> +/*
> + * Map the early UART with a new transient MPU memory region.
> + *

Missing "Inputs: "

> + * x27: region selector
> + * x28: prbar
> + * x29: prlar
> + *
> + * Clobbers x0 - x4
> + *
> + */
> +ENTRY(setup_early_uart)
> +#ifdef CONFIG_EARLY_PRINTK
> +    /* stack LR as write_pr will be called later like nested function */
> +    mov   x3, lr
> +
> +    /*
> +     * MPU region for early UART is a transient region, since it will be
> +     * replaced by specific device memory layout when FDT gets parsed.

I would rather not mention "FDT" here because this code is independent 
of the firmware table used.

However, any reason to use a transient region rather than the one that 
will be used for the UART driver?

> +     */
> +    load_paddr x0, next_transient_region_idx
> +    ldr   x4, [x0]
> +
> +    ldr   x28, =CONFIG_EARLY_UART_BASE_ADDRESS
> +    and   x28, x28, #MPU_REGION_MASK
> +    mov   x1, #REGION_DEVICE_PRBAR
> +    orr   x28, x28, x1

This needs some documentation to explain the logic. Maybe even a macro.

> +
> +    ldr x29, =(CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE)
> +    roundup_section x29

Does this mean we could give access to more than necessary? Shouldn't we 
instead prevent compilation if the size doesn't align with the section size?

> +    /* Limit address is inclusive */
> +    sub   x29, x29, #1
> +    and   x29, x29, #MPU_REGION_MASK
> +    mov   x2, #REGION_DEVICE_PRLAR
> +    orr   x29, x29, x2
> +
> +    mov   x27, x4

This needs some documentation like:

x27: region selector

See how we documented the existing helpers.

> +    bl    write_pr
> +
> +    /* Create a new entry in xen_mpumap for early UART */
> +    create_mpu_entry xen_mpumap, x4, x28, x29, x1, x2
> +
> +    /* Update next_transient_region_idx */
> +    sub   x4, x4, #1
> +    str   x4, [x0]
> +
> +    mov   lr, x3
> +    ret
> +#endif
> +ENDPROC(setup_early_uart)
> +
>   /*
>    * Local variables:
>    * mode: ASM
> diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
> index 44a230853f..d87623e6d5 100644
> --- a/xen/arch/arm/include/asm/early_printk.h
> +++ b/xen/arch/arm/include/asm/early_printk.h
> @@ -22,6 +22,7 @@
>    * for EARLY_UART_VIRTUAL_ADDRESS.
>    */
>   #define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
> +#define EARLY_UART_SIZE            0x1000

Shouldn't this be PAGE_SIZE? If not, how did you come up with the number?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 18/40] xen/mpu: introduce helper access_protection_region
  2023-01-13  5:28 ` [PATCH v2 18/40] xen/mpu: introduce helper access_protection_region Penny Zheng
@ 2023-01-24 19:20   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-01-24 19:20 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> Each EL2 MPU protection region could be configured using PRBAR<n>_EL2 and
> PRLAR<n>_EL2.
> 
> This commit introduces a new helper access_protection_region() to access
> EL2 MPU protection region, including both read/write operations.
> 
> As explained in section G1.3.18 of the reference manual for AArch64v8R,
> a set of system register PRBAR<n>_EL2 and PRLAR<n>_EL2 provide access to
> the EL2 MPU region which is determined by the value of 'n' and
> PRSELR_EL2.REGION as PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
> For example to access regions from 16 to 31:
> - Set PRSELR_EL2 to 0b1xxxx
> - Region 16 configuration is accessible through PRBAR0_EL2 and PRLAR0_EL2
> - Region 17 configuration is accessible through PRBAR1_EL2 and PRLAR1_EL2
> - Region 18 configuration is accessible through PRBAR2_EL2 and PRLAR2_EL2
> - ...
> - Region 31 configuration is accessible through PRBAR15_EL2 and PRLAR15_EL2
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/mm_mpu.c | 151 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 151 insertions(+)
> 
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index c9e17ab6da..f2b494449c 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -46,6 +46,157 @@ uint64_t __ro_after_init next_transient_region_idx;
>   /* Maximum number of supported MPU memory regions by the EL2 MPU. */
>   uint64_t __ro_after_init max_xen_mpumap;
>   
> +/* Write a MPU protection region */
> +#define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
> +    uint64_t _sel = sel;                                                \
> +    const pr_t *_pr = pr;                                               \
> +    asm volatile(                                                       \
> +        "msr "__stringify(PRSELR_EL2)", %0;" /* Selects the region */   \

This is an open-coded version of WRITE_SYSREG(). Can we use it instead?

> +        "dsb sy;"                                                       \

What is this dsb for? Also is the 'sy' really necessary?

> +        "msr "__stringify(prbar_el2)", %1;" /* Write PRBAR<n>_EL2 */    \

WRITE_SYSREG()?

> +        "msr "__stringify(prlar_el2)", %2;" /* Write PRLAR<n>_EL2 */    \

WRITE_SYSREG()?

> +        "dsb sy;"                                                       \

Same remark about the dsb. But I would consider moving the dsb and the 
selection part outside of the macro, so they can sit outside of the switch 
and reduce the generated code.

> +        : : "r" (_sel), "r" (_pr->prbar.bits), "r" (_pr->prlar.bits));  \
> +})
> +
> +/* Read a MPU protection region */
> +#define READ_PROTECTION_REGION(sel, prbar_el2, prlar_el2) ({            \

My comment on WRITE_PROTECTION_REGION also applies here. But you would want to 
use READ_SYSREG() for 'mrs'.

> +    uint64_t _sel = sel;                                                \
> +    pr_t _pr;                                                           \
> +    asm volatile(                                                       \
> +        "msr "__stringify(PRSELR_EL2)", %2;" /* Selects the region */   \
> +        "dsb sy;"                                                       \
> +        "mrs %0, "__stringify(prbar_el2)";" /* Read PRBAR<n>_EL2 */     \
> +        "mrs %1, "__stringify(prlar_el2)";" /* Read PRLAR<n>_EL2 */     \
> +        "dsb sy;"                                                       \
> +        : "=r" (_pr.prbar.bits), "=r" (_pr.prlar.bits) : "r" (_sel));   \
> +    _pr;                                                                \
> +})
> +
> +/*
> + * Access MPU protection region, including both read/write operations.
> + * Armv8-R AArch64 at most supports 255 MPU protection regions.
> + * See section G1.3.18 of the reference manual for Armv8-R AArch64,
> + * PRBAR<n>_EL2 and PRLAR<n>_EL2 provide access to the EL2 MPU region
> + * determined by the value of 'n' and PRSELR_EL2.REGION as
> + * PRSELR_EL2.REGION<7:4>:n(n = 0, 1, 2, ... , 15)
> + * For example to access regions from 16 to 31 (0b10000 to 0b11111):
> + * - Set PRSELR_EL2 to 0b1xxxx
> + * - Region 16 configuration is accessible through PRBAR0_ELx and PRLAR0_ELx
> + * - Region 17 configuration is accessible through PRBAR1_ELx and PRLAR1_ELx
> + * - Region 18 configuration is accessible through PRBAR2_ELx and PRLAR2_ELx
> + * - ...
> + * - Region 31 configuration is accessible through PRBAR15_ELx and PRLAR15_ELx
> + *
> + * @read: if it is read operation.
> + * @pr_read: mpu protection region returned by read op.
> + * @pr_write: const mpu protection region passed through write op.
> + * @sel: mpu protection region selector
> + */
> +static void access_protection_region(bool read, pr_t *pr_read,
> +                                     const pr_t *pr_write, uint64_t sel)

I would rather prefer if we introduce two helpers (one for the read 
operation, the other for the write operation). This would make the code 
a bit easier to read.
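
To illustrate what I have in mind (a rough, untested sketch only; it assumes the 
PRBAR<n>_EL2/PRLAR<n>_EL2 definitions this series adds to sysregs.h work with 
WRITE_SYSREG()), the write path could then look like:

```c
/* Program one PRBAR<n>_EL2/PRLAR<n>_EL2 pair through WRITE_SYSREG(). */
#define WRITE_PR_PAIR(pr, prbar_elx, prlar_elx) do {                     \
    WRITE_SYSREG((pr)->prbar.bits, prbar_elx);                           \
    WRITE_SYSREG((pr)->prlar.bits, prlar_elx);                           \
} while ( 0 )

static void write_protection_region(const pr_t *pr, uint64_t sel)
{
    /* Select the region once, outside of the switch. */
    WRITE_SYSREG(sel, PRSELR_EL2);
    /* Synchronise the selection before touching PRBAR/PRLAR. */
    isb();

    switch ( sel & 0xf )
    {
    case 0:
        WRITE_PR_PAIR(pr, PRBAR0_EL2, PRLAR0_EL2);
        break;
    case 1:
        WRITE_PR_PAIR(pr, PRBAR1_EL2, PRLAR1_EL2);
        break;
    /* ... and so on up to case 15 ... */
    }
}
```

with a read_protection_region() counterpart built the same way around READ_SYSREG().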

> +{
> +    switch ( sel & 0xf )
> +    {
> +    case 0:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR0_EL2, PRLAR0_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR0_EL2, PRLAR0_EL2);
> +        break;
> +    case 1:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR1_EL2, PRLAR1_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR1_EL2, PRLAR1_EL2);
> +        break;
> +    case 2:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR2_EL2, PRLAR2_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR2_EL2, PRLAR2_EL2);
> +        break;
> +    case 3:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR3_EL2, PRLAR3_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR3_EL2, PRLAR3_EL2);
> +        break;
> +    case 4:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR4_EL2, PRLAR4_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR4_EL2, PRLAR4_EL2);
> +        break;
> +    case 5:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR5_EL2, PRLAR5_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR5_EL2, PRLAR5_EL2);
> +        break;
> +    case 6:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR6_EL2, PRLAR6_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR6_EL2, PRLAR6_EL2);
> +        break;
> +    case 7:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR7_EL2, PRLAR7_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR7_EL2, PRLAR7_EL2);
> +        break;
> +    case 8:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR8_EL2, PRLAR8_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR8_EL2, PRLAR8_EL2);
> +        break;
> +    case 9:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR9_EL2, PRLAR9_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR9_EL2, PRLAR9_EL2);
> +        break;
> +    case 10:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR10_EL2, PRLAR10_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR10_EL2, PRLAR10_EL2);
> +        break;
> +    case 11:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR11_EL2, PRLAR11_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR11_EL2, PRLAR11_EL2);
> +        break;
> +    case 12:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR12_EL2, PRLAR12_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR12_EL2, PRLAR12_EL2);
> +        break;
> +    case 13:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR13_EL2, PRLAR13_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR13_EL2, PRLAR13_EL2);
> +        break;
> +    case 14:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR14_EL2, PRLAR14_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR14_EL2, PRLAR14_EL2);
> +        break;
> +    case 15:
> +        if ( read )
> +            *pr_read = READ_PROTECTION_REGION(sel, PRBAR15_EL2, PRLAR15_EL2);
> +        else
> +            WRITE_PROTECTION_REGION(sel, pr_write, PRBAR15_EL2, PRLAR15_EL2);
> +        break;

What if the caller passes a number higher than 15?
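
If that can legitimately happen, one option (just a sketch; max_xen_mpumap is the 
global introduced earlier in this series) would be to bail out before selecting 
anything:

```c
    /* Reject selectors beyond the number of regions the EL2 MPU implements. */
    if ( sel >= max_xen_mpumap )
    {
        ASSERT_UNREACHABLE();
        return;
    }
```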

> +    }
> +}
> +
>   /* TODO: Implementation on the first usage */
>   void dump_hyp_walk(vaddr_t addr)
>   {

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1
  2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
                   ` (41 preceding siblings ...)
  2023-01-13  8:54 ` [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Jan Beulich
@ 2023-01-24 19:31 ` Ayan Kumar Halder
  42 siblings, 0 replies; 122+ messages in thread
From: Ayan Kumar Halder @ 2023-01-24 19:31 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Roger Pau Monné

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
>
>
> The Armv8-R architecture profile was designed to support use cases
> that have a high sensitivity to deterministic execution. (e.g.
> Fuel Injection, Brake control, Drive trains, Motor control etc)
>
> Arm announced Armv8-R in 2013, it is the latest generation Arm
> architecture targeted at the Real-time profile. It introduces
> virtualization at the highest security level while retaining the
> Protected Memory System Architecture (PMSA) based on a Memory
> Protection Unit (MPU). In 2020, Arm announced Cortex-R82,
> which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
> The latest Armv8-R64 document can be found [1]. And the features of
> Armv8-R64 architecture:
>    - An exception model that is compatible with the Armv8-A model
>    - Virtualization with support for guest operating systems
>    - PMSA virtualization using MPUs In EL2.
>    - Adds support for the 64-bit A64 instruction set.
>    - Supports up to 48-bit physical addressing.
>    - Supports three Exception Levels (ELs)
>          - Secure EL2 - The Highest Privilege
>          - Secure EL1 - RichOS (MMU) or RTOS (MPU)
>          - Secure EL0 - Application Workloads
>   - Supports only a single Security state - Secure.
>   - MPU in EL1 & EL2 is configurable, MMU in EL1 is configurable.
>
> These patch series are implementing the Armv8-R64 MPU support
> for Xen, which are based on the discussion of
> "Proposal for Porting Xen to Armv8-R64 - DraftC" [2].
>
> We will implement the Armv8-R64 and MPU support in three stages:
> 1. Boot Xen itself to idle thread, do not create any guests on it.
> 2. Support to boot MPU and MMU domains on Armv8-R64 Xen.
> 3. SMP and other advanced features of Xen support on Armv8-R64.
>
> As we have not implemented guest support in part#1 series of MPU
> support, Xen can not create any guest in boot time. So in this
> patch serie, we provide an extra DNM-commit in the last for users
> to test Xen boot to idle on MPU system.
>
> We will split these patches to several parts, this series is the
> part#1, v1 is in [3], the full PoC can be found in [4]. More software for
> Armv8-R64 can be found in [5];
>
> [1] https://developer.arm.com/documentation/ddi0600/latest
> [2] https://lists.xenproject.org/archives/html/xen-devel/2022-05/msg00643.html
> [3] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg00289.html
> [4] https://gitlab.com/xen-project/people/weic/xen/-/tree/integration/mpu_v2
> [5] https://armv8r64-refstack.docs.arm.com/en/v5.0/
>
> Penny Zheng (28):
>    xen/mpu: build up start-of-day Xen MPU memory region map
>    xen/mpu: introduce helpers for MPU enablement
>    xen/mpu: introduce unified function setup_early_uart to map early UART
>    xen/arm64: head: Jump to the runtime mapping in enable_mm()
>    xen/arm: introduce setup_mm_mappings
>    xen/mpu: plump virt/maddr/mfn convertion in MPU system
>    xen/mpu: introduce helper access_protection_region
>    xen/mpu: populate a new region in Xen MPU mapping table
>    xen/mpu: plump early_fdt_map in MPU systems
>    xen/arm: move MMU-specific setup_mm to setup_mmu.c
>    xen/mpu: implement MPU version of setup_mm in setup_mpu.c
>    xen/mpu: initialize frametable in MPU system
>    xen/mpu: introduce "mpu,xxx-memory-section"
>    xen/mpu: map MPU guest memory section before static memory
>      initialization
>    xen/mpu: destroy an existing entry in Xen MPU memory mapping table
>    xen/mpu: map device memory resource in MPU system
>    xen/mpu: map boot module section in MPU system
>    xen/mpu: introduce mpu_memory_section_contains for address range check
>    xen/mpu: disable VMAP sub-system for MPU systems
>    xen/mpu: disable FIXMAP in MPU system
>    xen/mpu: implement MPU version of ioremap_xxx
>    xen/mpu: free init memory in MPU system
>    xen/mpu: destroy boot modules and early FDT mapping in MPU system
>    xen/mpu: Use secure hypervisor timer for AArch64v8R
>    xen/mpu: move MMU specific P2M code to p2m_mmu.c
>    xen/mpu: implement setup_virt_paging for MPU system
>    xen/mpu: re-order xen_mpumap in arch_init_finialize
>    xen/mpu: add Kconfig option to enable Armv8-R AArch64 support
>
> Wei Chen (13):
>    xen/arm: remove xen_phys_start and xenheap_phys_end from config.h
>    xen/arm: make ARM_EFI selectable for Arm64
>    xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA
>    xen/arm: add an option to define Xen start address for Armv8-R
>    xen/arm64: prepare for moving MMU related code from head.S
>    xen/arm64: move MMU related code from head.S to head_mmu.S
>    xen/arm64: add .text.idmap for Xen identity map sections
>    xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv-8R
>    xen/arm: decouple copy_from_paddr with FIXMAP
>    xen/arm: split MMU and MPU config files from config.h
>    xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
>    xen/arm: check mapping status and attributes for MPU copy_from_paddr
>    xen/mpu: make Xen boot to idle on MPU systems(DNM)
>
>   xen/arch/arm/Kconfig                      |   44 +-
>   xen/arch/arm/Makefile                     |   17 +-
>   xen/arch/arm/arm64/Makefile               |    5 +
>   xen/arch/arm/arm64/head.S                 |  466 +----
>   xen/arch/arm/arm64/head_mmu.S             |  399 ++++
>   xen/arch/arm/arm64/head_mpu.S             |  394 ++++
>   xen/arch/arm/bootfdt.c                    |   13 +-
>   xen/arch/arm/domain_build.c               |    4 +
>   xen/arch/arm/include/asm/alternative.h    |   15 +
>   xen/arch/arm/include/asm/arm64/flushtlb.h |   25 +
>   xen/arch/arm/include/asm/arm64/macros.h   |   51 +
>   xen/arch/arm/include/asm/arm64/mpu.h      |  174 ++
>   xen/arch/arm/include/asm/arm64/sysregs.h  |   77 +
>   xen/arch/arm/include/asm/config.h         |  105 +-
>   xen/arch/arm/include/asm/config_mmu.h     |  112 +
>   xen/arch/arm/include/asm/config_mpu.h     |   25 +
>   xen/arch/arm/include/asm/cpregs.h         |    4 +-
>   xen/arch/arm/include/asm/cpuerrata.h      |   12 +
>   xen/arch/arm/include/asm/cpufeature.h     |    7 +
>   xen/arch/arm/include/asm/early_printk.h   |   13 +
>   xen/arch/arm/include/asm/fixmap.h         |   28 +-
>   xen/arch/arm/include/asm/flushtlb.h       |   22 +
>   xen/arch/arm/include/asm/mm.h             |   78 +-
>   xen/arch/arm/include/asm/mm_mmu.h         |   77 +
>   xen/arch/arm/include/asm/mm_mpu.h         |   54 +
>   xen/arch/arm/include/asm/p2m.h            |   27 +-
>   xen/arch/arm/include/asm/p2m_mmu.h        |   28 +
>   xen/arch/arm/include/asm/processor.h      |   13 +
>   xen/arch/arm/include/asm/setup.h          |   39 +
>   xen/arch/arm/kernel.c                     |   31 +-
>   xen/arch/arm/mm.c                         | 1340 +-----------
>   xen/arch/arm/mm_mmu.c                     | 1376 +++++++++++++
>   xen/arch/arm/mm_mpu.c                     | 1056 ++++++++++
>   xen/arch/arm/p2m.c                        | 2282 +--------------------
>   xen/arch/arm/p2m_mmu.c                    | 2257 ++++++++++++++++++++
>   xen/arch/arm/p2m_mpu.c                    |  274 +++
>   xen/arch/arm/platforms/Kconfig            |   16 +-
>   xen/arch/arm/setup.c                      |  394 +---
>   xen/arch/arm/setup_mmu.c                  |  391 ++++
>   xen/arch/arm/setup_mpu.c                  |  208 ++
>   xen/arch/arm/time.c                       |   14 +-
>   xen/arch/arm/traps.c                      |    2 +
>   xen/arch/arm/xen.lds.S                    |   10 +-
>   xen/arch/x86/Kconfig                      |    1 +
>   xen/common/Kconfig                        |    6 +
>   xen/common/Makefile                       |    2 +-
>   xen/include/xen/vmap.h                    |   93 +-
>   47 files changed, 7500 insertions(+), 4581 deletions(-)
>   create mode 100644 xen/arch/arm/arm64/head_mmu.S
>   create mode 100644 xen/arch/arm/arm64/head_mpu.S
>   create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
>   create mode 100644 xen/arch/arm/include/asm/config_mmu.h
>   create mode 100644 xen/arch/arm/include/asm/config_mpu.h
>   create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
>   create mode 100644 xen/arch/arm/include/asm/mm_mpu.h
>   create mode 100644 xen/arch/arm/include/asm/p2m_mmu.h
>   create mode 100644 xen/arch/arm/mm_mmu.c
>   create mode 100644 xen/arch/arm/mm_mpu.c
>   create mode 100644 xen/arch/arm/p2m_mmu.c
>   create mode 100644 xen/arch/arm/p2m_mpu.c
>   create mode 100644 xen/arch/arm/setup_mmu.c
>   create mode 100644 xen/arch/arm/setup_mpu.c
>
> --
> 2.25.1
>
>
I applied this series and there were some compilation issues :-

1. drivers/passthrough/arm/smmu.c:1240:29: error: ‘P2M_ROOT_LEVEL’ 
undeclared (first use in this function)
  1240 |                 reg |= (2 - P2M_ROOT_LEVEL) << TTBCR_SL0_SHIFT;

2. drivers/passthrough/arm/smmu-v3.c:1211:24: error: ‘P2M_ROOT_LEVEL’ 
undeclared (first use in this function)
  1211 |         vtcr->sl = 2 - P2M_ROOT_LEVEL;

For the above two issues, I have disabled SMMU.

3. /scratch/ayankuma/xen_v8r_64/xen/arch/arm/arm64/head.S:470: undefined 
reference to `init_ttbr'
You might need to wrap this with some #ifdef.

Can you provide me with the dts and the config file with which you tested?

I see that the console got stuck at this line.

"(XEN) Command line: console=dtuart dtuart=serial0"

Looking into setup_static_mappings(),

    for ( uint8_t i = MSINFO_GUEST; i < MSINFO_MAX; i++ )
    {
#ifdef CONFIG_EARLY_PRINTK
        if ( i == MSINFO_DEVICE )
            /*
             * Destroy early UART mapping before mapping device memory section.
             * WARNING: console will be inaccessible temporarily.
             */
            destroy_xen_mappings(CONFIG_EARLY_UART_BASE_ADDRESS,
                                 CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE);
#endif
        map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]); <<<<----- Is this expected to map "mpu,device-memory-section" ?
    }

- Ayan



^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-19 15:04   ` Julien Grall
@ 2023-01-29  5:39     ` Penny Zheng
  2023-01-29  7:37       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-29  5:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Thursday, January 19, 2023 11:04 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> Hi Penny,
>

Hi Julien

Sorry for the late response, I just came back from the Chinese Spring Festival holiday~
 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Penny Zheng <penny.zheng@arm.com>
> >
> > The start-of-day Xen MPU memory region layout shall be like as follows:
> >
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen
> > read-write data xen_mpumap[4] : Xen BSS ......
> > xen_mpumap[max_xen_mpumap - 2]: Xen init data
> > xen_mpumap[max_xen_mpumap - 1]: Xen init text
> 
> Can you explain why the init region should be at the end of the MPU?
> 

As discussed in the v1 series, I'd like to put all transient MPU regions, like boot-only
regions, at the end of the MPU map.
Since they will get removed at the end of boot, I am trying not to leave holes in the MPU
map by putting all transient MPU regions at the rear.

> >
> > max_xen_mpumap refers to the number of regions supported by the EL2
> MPU.
> > The layout shall be compliant with what we describe in xen.lds.S, or
> > the codes need adjustment.
> >
> > As MMU system and MPU system have different functions to create the
> > boot MMU/MPU memory management data, instead of introducing extra
> > #ifdef in main code flow, we introduce a neutral name
> > prepare_early_mappings for both, and also to replace create_page_tables
> for MMU.
> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/arm64/Makefile              |   2 +
> >   xen/arch/arm/arm64/head.S                |  17 +-
> >   xen/arch/arm/arm64/head_mmu.S            |   4 +-
> >   xen/arch/arm/arm64/head_mpu.S            | 323
> +++++++++++++++++++++++
> >   xen/arch/arm/include/asm/arm64/mpu.h     |  63 +++++
> >   xen/arch/arm/include/asm/arm64/sysregs.h |  49 ++++
> >   xen/arch/arm/mm_mpu.c                    |  48 ++++
> >   xen/arch/arm/xen.lds.S                   |   4 +
> >   8 files changed, 502 insertions(+), 8 deletions(-)
> >   create mode 100644 xen/arch/arm/arm64/head_mpu.S
> >   create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
> >   create mode 100644 xen/arch/arm/mm_mpu.c
> >
> > +/*
> > + * Macro to create a new MPU memory region entry, which is a
> > +structure
> > + * of pr_t,  in \prmap.
> > + *
> > + * Inputs:
> > + * prmap:   mpu memory region map table symbol
> > + * sel:     region selector
> > + * prbar:   preserve value for PRBAR_EL2
> > + * prlar    preserve value for PRLAR_EL2
> > + *
> > + * Clobbers \tmp1, \tmp2
> > + *
> > + */
> > +.macro create_mpu_entry prmap, sel, prbar, prlar, tmp1, tmp2
> > +    mov   \tmp2, \sel
> > +    lsl   \tmp2, \tmp2, #MPU_ENTRY_SHIFT
> > +    adr_l \tmp1, \prmap
> > +    /* Write the first 8 bytes(prbar_t) of pr_t */
> > +    str   \prbar, [\tmp1, \tmp2]
> > +
> > +    add   \tmp2, \tmp2, #8
> > +    /* Write the last 8 bytes(prlar_t) of pr_t */
> > +    str   \prlar, [\tmp1, \tmp2]
> 
> Any particular reason to not use 'stp'?
> 
> Also, AFAICT, with data cache disabled. But at least on ARMv8-A, the cache is
> never really off. So don't need some cache maintainance?
> 
> FAOD, I know the existing MMU code has the same issue. But I would rather
> prefer if the new code introduced is compliant to the Arm Arm.
> 

True, `stp` is better, and I will clean the data cache to be compliant with the Arm Arm.
I wrote the following example to check whether I've understood what you suggested:
```
add \tmp1, \tmp1, \tmp2
stp \prbar, \prlar, [\tmp1]
dc cvau, \tmp1
isb
dsb sy
```

> > +.endm
> > +
> > +/*
> > + * Macro to store the maximum number of regions supported by the EL2
> > +MPU
> > + * in max_xen_mpumap, which is identified by MPUIR_EL2.
> > + *
> > + * Outputs:
> > + * nr_regions: preserve the maximum number of regions supported by
> > +the EL2 MPU
> > + *
> > + * Clobbers \tmp1
> > + * > + */
> 
> Are you going to have multiple users? If not, then I would prefer if this is
> folded in the only caller.
> 

Ok. I will fold it in, since it is a one-time read anyway.

> > +.macro read_max_el2_regions, nr_regions, tmp1
> > +    load_paddr \tmp1, max_xen_mpumap
> 
> I would rather prefer if we restrict the use of global while the MMU if off (see
> why above).
> 

If we don't use a global here, then after the MPU is enabled, we need to re-read MPUIR_EL2
to get the maximum number of EL2 regions.

Or would it be better if I put a data cache clean after accessing the global?
```
str   \nr_regions, [\tmp1]
dc cvau, \tmp1
isb
dsb sy
```

> > +    mrs   \nr_regions, MPUIR_EL2
> > +    isb
> 
> What's that isb for?
> 
> > +    str   \nr_regions, [\tmp1]
> > +.endm
> > +
> > +/*
> > + * ENTRY to configure a EL2 MPU memory region
> > + * ARMv8-R AArch64 at most supports 255 MPU protection regions.
> > + * See section G1.3.18 of the reference manual for ARMv8-R AArch64,
> > + * PRBAR<n>_EL2 and PRLAR<n>_EL2 provides access to the EL2 MPU
> > +region
> > + * determined by the value of 'n' and PRSELR_EL2.REGION as
> > + * PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
> > + * For example to access regions from 16 to 31 (0b10000 to 0b11111):
> > + * - Set PRSELR_EL2 to 0b1xxxx
> > + * - Region 16 configuration is accessible through PRBAR0_EL2 and
> > +PRLAR0_EL2
> > + * - Region 17 configuration is accessible through PRBAR1_EL2 and
> > +PRLAR1_EL2
> > + * - Region 18 configuration is accessible through PRBAR2_EL2 and
> > +PRLAR2_EL2
> > + * - ...
> > + * - Region 31 configuration is accessible through PRBAR15_EL2 and
> > +PRLAR15_EL2
> > + *
> > + * Inputs:
> > + * x27: region selector
> > + * x28: preserve value for PRBAR_EL2
> > + * x29: preserve value for PRLAR_EL2
> > + *
> > + */
> > +ENTRY(write_pr)
> 
> AFAICT, this function would not be necessary if the index for the init sections
> were hardcoded.
> 
> So I would like to understand why the index cannot be hardcoded.
> 

The reason is that we are putting init sections at the *end* of the MPU map, and
the length of the whole MPU map is platform-specific. We read it from MPUIR_EL2.
 
> > +    msr   PRSELR_EL2, x27
> > +    dsb   sy
> 
> [...]
> 
> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index
> > bc45ea2c65..79965a3c17 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/xen/arch/arm/xen.lds.S
> > @@ -91,6 +91,8 @@ SECTIONS
> >         __ro_after_init_end = .;
> >     } : text
> >
> > +  . = ALIGN(PAGE_SIZE);
> 
> Why do you need this ALIGN?
> 

I need a symbol for the start of the data section, so I introduce
"__data_begin = .;".
If I used "__ro_after_init_end = .;" instead, I'm afraid that, in the future,
if someone introduces a new section after the ro-after-init section, this part
would need modification too.

When we define MPU regions for each section in xen.lds.S, we always treat these
sections as page-aligned.
I checked each section in xen.lds.S, and ". = ALIGN(PAGE_SIZE);" is either added
at the head of a section, or at the tail of the previous section, to make sure the
starting address symbol is page-aligned.

And if we don't add this ALIGN, then if the "__ro_after_init_end" symbol itself is
not page-aligned, the two adjacent sections will overlap in the MPU.
 
> > +  __data_begin = .;
> >     .data.read_mostly : {
> >          /* Exception table */
> >          __start___ex_table = .;
> > @@ -157,7 +159,9 @@ SECTIONS
> >          *(.altinstr_replacement)
> 
> I know you are not using alternative instructions yet. But, you should make
> sure they are included. So I think rather than introduce __init_data_begin,
> you want to use "_einitext" for the start of the "Init data" section.
> 
> >     } :text
> >     . = ALIGN(PAGE_SIZE);
> > +
> 
> Spurious?
> 
> >     .init.data : {
> > +       __init_data_begin = .;            /* Init data */
> >          *(.init.rodata)
> >          *(.init.rodata.*)
> >
> 
> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-24 19:09   ` Julien Grall
@ 2023-01-29  6:17     ` Penny Zheng
  2023-01-29  7:43       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-29  6:17 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Wednesday, January 25, 2023 3:09 AM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> setup_early_uart to map early UART
> 
> Hi Peny,

Hi Julien,

> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > In MMU system, we map the UART in the fixmap (when earlyprintk is used).
> > However in MPU system, we map the UART with a transient MPU memory
> > region.
> >
> > So we introduce a new unified function setup_early_uart to replace the
> > previous setup_fixmap.
> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/arm64/head.S               |  2 +-
> >   xen/arch/arm/arm64/head_mmu.S           |  4 +-
> >   xen/arch/arm/arm64/head_mpu.S           | 52
> +++++++++++++++++++++++++
> >   xen/arch/arm/include/asm/early_printk.h |  1 +
> >   4 files changed, 56 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 7f3f973468..a92883319d 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -272,7 +272,7 @@ primary_switched:
> >            * afterwards.
> >            */
> >           bl    remove_identity_mapping
> > -        bl    setup_fixmap
> > +        bl    setup_early_uart
> >   #ifdef CONFIG_EARLY_PRINTK
> >           /* Use a virtual address to access the UART. */
> >           ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
> > diff --git a/xen/arch/arm/arm64/head_mmu.S
> > b/xen/arch/arm/arm64/head_mmu.S index b59c40495f..a19b7c873d
> 100644
> > --- a/xen/arch/arm/arm64/head_mmu.S
> > +++ b/xen/arch/arm/arm64/head_mmu.S
> > @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
> >    *
> >    * Clobbers x0 - x3
> >    */
> > -ENTRY(setup_fixmap)
> > +ENTRY(setup_early_uart)
> 
> This function is doing more than enable the early UART. It also setups the
> fixmap even earlyprintk is not configured.

True, true.
I've thoroughly read the MMU implementation of setup_fixmap, and I'll try to split
it up.

> 
> I am not entirely sure what could be the name. Maybe this needs to be split
> further.
> 
> >   #ifdef CONFIG_EARLY_PRINTK
> >           /* Add UART to the fixmap table */
> >           ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
> > @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
> >           dsb   nshst
> >
> >           ret
> > -ENDPROC(setup_fixmap)
> > +ENDPROC(setup_early_uart)
> >
> >   /* Fail-stop */
> >   fail:   PRINT("- Boot failed -\r\n")
> > diff --git a/xen/arch/arm/arm64/head_mpu.S
> > b/xen/arch/arm/arm64/head_mpu.S index e2ac69b0cc..72d1e0863d
> 100644
> > --- a/xen/arch/arm/arm64/head_mpu.S
> > +++ b/xen/arch/arm/arm64/head_mpu.S
> > @@ -18,8 +18,10 @@
> >   #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
> >   #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
> >   #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> > +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
> >
> >   #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
> > +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
> >
> >   /*
> >    * Macro to round up the section address to be PAGE_SIZE aligned @@
> > -334,6 +336,56 @@ ENTRY(enable_mm)
> >       ret
> >   ENDPROC(enable_mm)
> >
> > +/*
> > + * Map the early UART with a new transient MPU memory region.
> > + *
> 
> Missing "Inputs: "
> 
> > + * x27: region selector
> > + * x28: prbar
> > + * x29: prlar
> > + *
> > + * Clobbers x0 - x4
> > + *
> > + */
> > +ENTRY(setup_early_uart)
> > +#ifdef CONFIG_EARLY_PRINTK
> > +    /* stack LR as write_pr will be called later like nested function */
> > +    mov   x3, lr
> > +
> > +    /*
> > +     * MPU region for early UART is a transient region, since it will be
> > +     * replaced by specific device memory layout when FDT gets parsed.
> 
> I would rather not mention "FDT" here because this code is independent to
> the firmware table used.
> 
> However, any reason to use a transient region rather than the one that will
> be used for the UART driver?
> 

We don't want to define an MPU region for each device driver; it would exhaust
MPU regions very quickly.
In commit "[PATCH v2 28/40] xen/mpu: map boot module section in MPU system",
a new FDT property `mpu,device-memory-section` is introduced for users to statically
configure the whole system device memory with the least number of memory regions in the Device Tree.
This section shall cover all devices that will be used by Xen, like the `UART`, `GIC`, etc.
For FVP_BaseR_AEMv8R, we have the following definition:
```
mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;
```

> > +     */
> > +    load_paddr x0, next_transient_region_idx
> > +    ldr   x4, [x0]
> > +
> > +    ldr   x28, =CONFIG_EARLY_UART_BASE_ADDRESS
> > +    and   x28, x28, #MPU_REGION_MASK
> > +    mov   x1, #REGION_DEVICE_PRBAR
> > +    orr   x28, x28, x1
> 
> This needs some documentation to explain the logic. Maybe even a macro.
> 

Do you suggest that I should explain how we compose the PRBAR_EL2 value?

> > +
> > +    ldr x29, =(CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE)
> > +    roundup_section x29
> 
> Does this mean we could give access to more than necessary? Shouldn't
> instead prevent compilation if the size doesn't align with the section size?
> 

True, we cannot treat the UART section like the sections defined in xen.lds.S.
CONFIG_EARLY_UART_BASE_ADDRESS and EARLY_UART_SIZE shall both be checked to make
sure they are PAGE_SIZE aligned.
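
Something like the following (just an untested sketch; the exact header and guards
may need adjusting) could enforce that at build time:

```c
#if defined(CONFIG_HAS_MPU) && defined(CONFIG_EARLY_PRINTK)
# if (CONFIG_EARLY_UART_BASE_ADDRESS & (PAGE_SIZE - 1)) || \
     (EARLY_UART_SIZE & (PAGE_SIZE - 1))
#  error "Early UART base address and size must be PAGE_SIZE aligned on MPU systems"
# endif
#endif
```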

> > +    /* Limit address is inclusive */
> > +    sub   x29, x29, #1
> > +    and   x29, x29, #MPU_REGION_MASK
> > +    mov   x2, #REGION_DEVICE_PRLAR
> > +    orr   x29, x29, x2
> > +
> > +    mov   x27, x4
> 
> This needs some documentation like:
> 
> x27: region selector
> 
> See how we documented the existing helpers.
> 
> > +    bl    write_pr
> > +
> > +    /* Create a new entry in xen_mpumap for early UART */
> > +    create_mpu_entry xen_mpumap, x4, x28, x29, x1, x2
> > +
> > +    /* Update next_transient_region_idx */
> > +    sub   x4, x4, #1
> > +    str   x4, [x0]
> > +
> > +    mov   lr, x3
> > +    ret
> > +#endif
> > +ENDPROC(setup_early_uart)
> > +
> >   /*
> >    * Local variables:
> >    * mode: ASM
> > diff --git a/xen/arch/arm/include/asm/early_printk.h
> > b/xen/arch/arm/include/asm/early_printk.h
> > index 44a230853f..d87623e6d5 100644
> > --- a/xen/arch/arm/include/asm/early_printk.h
> > +++ b/xen/arch/arm/include/asm/early_printk.h
> > @@ -22,6 +22,7 @@
> >    * for EARLY_UART_VIRTUAL_ADDRESS.
> >    */
> >   #define EARLY_UART_VIRTUAL_ADDRESS
> CONFIG_EARLY_UART_BASE_ADDRESS
> > +#define EARLY_UART_SIZE            0x1000
> 
> Shouldn't this be PAGE_SIZE? If not, how did you come up with the number?
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-19 10:18   ` Ayan Kumar Halder
@ 2023-01-29  6:47     ` Penny Zheng
  0 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-01-29  6:47 UTC (permalink / raw)
  To: Ayan Kumar Halder, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr_Babchuk

Hi Ayan

> -----Original Message-----
> From: Ayan Kumar Halder <ayankuma@amd.com>
> Sent: Thursday, January 19, 2023 6:19 PM
> To: xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien
> Grall <julien@xen.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr_Babchuk@epam.com
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> >
> >
> > From: Penny Zheng <penny.zheng@arm.com>
> >
> > The start-of-day Xen MPU memory region layout shall be like as follows:
> >
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen
> > read-write data xen_mpumap[4] : Xen BSS ......
> > xen_mpumap[max_xen_mpumap - 2]: Xen init data
> > xen_mpumap[max_xen_mpumap - 1]: Xen init text
> >
> > max_xen_mpumap refers to the number of regions supported by the EL2
> MPU.
> > The layout shall be compliant with what we describe in xen.lds.S, or
> > the codes need adjustment.
> >
> > As MMU system and MPU system have different functions to create the
> > boot MMU/MPU memory management data, instead of introducing extra
> > #ifdef in main code flow, we introduce a neutral name
> > prepare_early_mappings for both, and also to replace create_page_tables
> for MMU.
> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/arm64/Makefile              |   2 +
> >   xen/arch/arm/arm64/head.S                |  17 +-
> >   xen/arch/arm/arm64/head_mmu.S            |   4 +-
> >   xen/arch/arm/arm64/head_mpu.S            | 323
> +++++++++++++++++++++++
> >   xen/arch/arm/include/asm/arm64/mpu.h     |  63 +++++
> >   xen/arch/arm/include/asm/arm64/sysregs.h |  49 ++++
> >   xen/arch/arm/mm_mpu.c                    |  48 ++++
> >   xen/arch/arm/xen.lds.S                   |   4 +
> >   8 files changed, 502 insertions(+), 8 deletions(-)
> >   create mode 100644 xen/arch/arm/arm64/head_mpu.S
> >   create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
> >   create mode 100644 xen/arch/arm/mm_mpu.c
> >
> > diff --git a/xen/arch/arm/arm64/Makefile
> b/xen/arch/arm/arm64/Makefile
> > index 22da2f54b5..438c9737ad 100644
> > --- a/xen/arch/arm/arm64/Makefile
> > +++ b/xen/arch/arm/arm64/Makefile
> > @@ -10,6 +10,8 @@ obj-y += entry.o
> >   obj-y += head.o
> >   ifneq ($(CONFIG_HAS_MPU),y)
> >   obj-y += head_mmu.o
> > +else
> > +obj-y += head_mpu.o
> >   endif
> >   obj-y += insn.o
> >   obj-$(CONFIG_LIVEPATCH) += livepatch.o diff --git
> > a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S index
> > 782bd1f94c..145e3d53dc 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -68,9 +68,9 @@
> >    *  x24 -
> >    *  x25 -
> >    *  x26 - skip_zero_bss (boot cpu only)
> > - *  x27 -
> > - *  x28 -
> > - *  x29 -
> > + *  x27 - region selector (mpu only)
> > + *  x28 - prbar (mpu only)
> > + *  x29 - prlar (mpu only)
> >    *  x30 - lr
> >    */
> >
> > @@ -82,7 +82,7 @@
> >    * ---------------------------
> >    *
> >    * The requirements are:
> > - *   MMU = off, D-cache = off, I-cache = on or off,
> > + *   MMU/MPU = off, D-cache = off, I-cache = on or off,
> >    *   x0 = physical address to the FDT blob.
> >    *
> >    * This must be the very first address in the loaded image.
> > @@ -252,7 +252,12 @@ real_start_efi:
> >
> >           bl    check_cpu_mode
> >           bl    cpu_init
> > -        bl    create_page_tables
> > +
> > +        /*
> > +         * Create boot memory management data, pagetable for MMU
> systems
> > +         * and memory regions for MPU systems.
> > +         */
> > +        bl    prepare_early_mappings
> >           bl    enable_mmu
> >
> >           /* We are still in the 1:1 mapping. Jump to the runtime
> > Virtual Address. */ @@ -310,7 +315,7 @@ GLOBAL(init_secondary)
> >   #endif
> >           bl    check_cpu_mode
> >           bl    cpu_init
> > -        bl    create_page_tables
> > +        bl    prepare_early_mappings
> >           bl    enable_mmu
> >
> >           /* We are still in the 1:1 mapping. Jump to the runtime
> > Virtual Address. */ diff --git a/xen/arch/arm/arm64/head_mmu.S
> > b/xen/arch/arm/arm64/head_mmu.S index 6ff13c751c..2346f755df
> 100644
> > --- a/xen/arch/arm/arm64/head_mmu.S
> > +++ b/xen/arch/arm/arm64/head_mmu.S
> > @@ -123,7 +123,7 @@
> >    *
> >    * Clobbers x0 - x4
> >    */
> > -ENTRY(create_page_tables)
> > +ENTRY(prepare_early_mappings)
> >           /* Prepare the page-tables for mapping Xen */
> >           ldr   x0, =XEN_VIRT_START
> >           create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2,
> > x3 @@ -208,7 +208,7 @@ virtphys_clash:
> >           /* Identity map clashes with boot_third, which we cannot handle yet
> */
> >           PRINT("- Unable to build boot page tables - virt and phys addresses
> clash. -\r\n")
> >           b     fail
> > -ENDPROC(create_page_tables)
> > +ENDPROC(prepare_early_mappings)
> 
> NIT:- Can this renaming be done in a separate patch of its own (before this
> patch).
> 

Yay, you're right. I'll put it in a different commit.

> So that this patch can be only about the new functionality introduced.
> 
> >
> >   /*
> >    * Turn on the Data Cache and the MMU. The function will return on
> > the 1:1 diff --git a/xen/arch/arm/arm64/head_mpu.S
> > b/xen/arch/arm/arm64/head_mpu.S new file mode 100644 index
> > 0000000000..0b97ce4646
> > --- /dev/null
> > +++ b/xen/arch/arm/arm64/head_mpu.S
> > @@ -0,0 +1,323 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Start-of-day code for an Armv8-R AArch64 MPU system.
> > + */
> > +
> > +#include <asm/arm64/mpu.h>
> > +#include <asm/early_printk.h>
> > +#include <asm/page.h>
> > +
> > +/*
> > + * One entry in Xen MPU memory region mapping table(xen_mpumap) is
> a
> > +structure
> > + * of pr_t, which is 16-bytes size, so the entry offset is the order of 4.
> > + */
> NIT :- It would be good to quote Arm ARM from which can be referred for
> the definitions.
> > +#define MPU_ENTRY_SHIFT         0x4
> > +
> > +#define REGION_SEL_MASK         0xf
> > +
> > +#define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
> > +#define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
> > +#define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> > +
> > +#define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
> > +
> > +/*
> > + * Macro to round up the section address to be PAGE_SIZE aligned
> > + * Each section(e.g. .text, .data, etc) in xen.lds.S is page-aligned,
> > + * which is usually guarded with ". = ALIGN(PAGE_SIZE)" in the head,
> > + * or in the end
> > + */
> > +.macro roundup_section, xb
> > +        add   \xb, \xb, #(PAGE_SIZE-1)
> > +        and   \xb, \xb, #PAGE_MASK
> > +.endm
> > +
> > +/*
> > + * Macro to create a new MPU memory region entry, which is a
> > +structure
> > + * of pr_t,  in \prmap.
> > + *
> > + * Inputs:
> > + * prmap:   mpu memory region map table symbol
> > + * sel:     region selector
> > + * prbar:   preserve value for PRBAR_EL2
> > + * prlar    preserve value for PRLAR_EL2
> > + *
> > + * Clobbers \tmp1, \tmp2
> > + *
> > + */
> > +.macro create_mpu_entry prmap, sel, prbar, prlar, tmp1, tmp2
> > +    mov   \tmp2, \sel
> > +    lsl   \tmp2, \tmp2, #MPU_ENTRY_SHIFT
> > +    adr_l \tmp1, \prmap
> > +    /* Write the first 8 bytes(prbar_t) of pr_t */
> > +    str   \prbar, [\tmp1, \tmp2]
> > +
> > +    add   \tmp2, \tmp2, #8
> > +    /* Write the last 8 bytes(prlar_t) of pr_t */
> > +    str   \prlar, [\tmp1, \tmp2]
> > +.endm
> > +
> > +/*
> > + * Macro to store the maximum number of regions supported by the EL2
> > +MPU
> > + * in max_xen_mpumap, which is identified by MPUIR_EL2.
> > + *
> > + * Outputs:
> > + * nr_regions: preserve the maximum number of regions supported by
> > +the EL2 MPU
> > + *
> > + * Clobbers \tmp1
> > + *
> > + */
> > +.macro read_max_el2_regions, nr_regions, tmp1
> > +    load_paddr \tmp1, max_xen_mpumap
> > +    mrs   \nr_regions, MPUIR_EL2
> > +    isb
> > +    str   \nr_regions, [\tmp1]
> > +.endm
> > +
> > +/*
> > + * Macro to prepare and set a MPU memory region
> > + *
> > + * Inputs:
> > + * base:        base address symbol (should be page-aligned)
> > + * limit:       limit address symbol
> > + * sel:         region selector
> > + * prbar:       store computed PRBAR_EL2 value
> > + * prlar:       store computed PRLAR_EL2 value
> > + * attr_prbar:  PRBAR_EL2-related memory attributes. If not specified
> > +it will be REGION_DATA_PRBAR
> > + * attr_prlar:  PRLAR_EL2-related memory attributes. If not specified
> > +it will be REGION_NORMAL_PRLAR
> > + *
> > + * Clobber \tmp1
> > + *
> > + */
> > +.macro prepare_xen_region, base, limit, sel, prbar, prlar, tmp1,
> attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR
> > +    /* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
> > +    load_paddr \prbar, \base
> > +    and   \prbar, \prbar, #MPU_REGION_MASK
> > +    mov   \tmp1, #\attr_prbar
> > +    orr   \prbar, \prbar, \tmp1
> > +
> > +    /* Prepare value for PRLAR_EL2 reg and preserve it in \prlar.*/
> > +    load_paddr \prlar, \limit
> > +    /* Round up limit address to be PAGE_SIZE aligned */
> > +    roundup_section \prlar
> > +    /* Limit address should be inclusive */
> > +    sub   \prlar, \prlar, #1
> > +    and   \prlar, \prlar, #MPU_REGION_MASK
> > +    mov   \tmp1, #\attr_prlar
> > +    orr   \prlar, \prlar, \tmp1
> > +
> > +    mov   x27, \sel
> > +    mov   x28, \prbar
> > +    mov   x29, \prlar
> 
> Any reasons for using x27, x28, x29 to pass function parameters.
> 
> https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
> states x0..x7 should be used (Table 2, General-purpose registers and
> AAPCS64 usage).
> 

These registers are documented and reserved in xen/arch/arm/arm64/head.S, like
how we reserve x26 to pass a function parameter to skip_zero_bss; see:
```
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 782bd1f94c..145e3d53dc 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -68,9 +68,9 @@
  *  x24 -
  *  x25 -
  *  x26 - skip_zero_bss (boot cpu only)
- *  x27 -
- *  x28 -
- *  x29 -
+ *  x27 - region selector (mpu only)
+ *  x28 - prbar (mpu only)
+ *  x29 - prlar (mpu only)
  *  x30 - lr
  */
```
x0...x7 are already commonly used in xen/arch/arm/arm64/head.S, so it is difficult
for me to preserve them only for write_pr.

If we used x0...x7 as function parameters, I would need to add push/pop operations
in write_pr to avoid corrupting them.

> > +    /*
> > +     * x2skip_zero7, x28, x29 are special registers designed as
> > +     * inputs for function write_pr
> > +     */
> > +    bl    write_pr
> > +.endm
> > +
[...]
> > --
> > 2.25.1
> >
> NIT:- Would you consider splitting this patch, something like this :-
> 
> 1. Renaming of the mmu function
> 
> 2. Define sysregs, prlar_t, prbar_t and other other hardware specific macros.
> 
> 3. Define write_pr
> 
> 4. The rest of the changes (ie prepare_early_mappings(), xen.lds.S, etc)
> 

For 2, 3 and 4, it would break the rule of "always define and introduce at the
first usage".
However, I know that this commit is very big ;/, so as long as the maintainers are also
in favor of your splitting suggestion, I'm happy to do the split too~

> - Ayan


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-29  5:39     ` Penny Zheng
@ 2023-01-29  7:37       ` Julien Grall
  2023-01-30  5:45         ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-29  7:37 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 29/01/2023 05:39, Penny Zheng wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Thursday, January 19, 2023 11:04 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
>> memory region map
>>
>> Hi Penny,
>>
> 
> Hi Julien
> 
> Sorry for the late response, just come back from Chinese Spring Festival Holiday~
>   
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> From: Penny Zheng <penny.zheng@arm.com>
>>>
>>> The start-of-day Xen MPU memory region layout shall be like as follows:
>>>
>>> xen_mpumap[0] : Xen text
>>> xen_mpumap[1] : Xen read-only data
>>> xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen
>>> read-write data xen_mpumap[4] : Xen BSS ......
>>> xen_mpumap[max_xen_mpumap - 2]: Xen init data
>>> xen_mpumap[max_xen_mpumap - 1]: Xen init text
>>
>> Can you explain why the init region should be at the end of the MPU?
>>
> 
> As discussed in the v1 Serie, I'd like to put all transient MPU regions, like boot-only region,
> at the end of the MPU.

I vaguely recall the discussion but can't seem to find the thread. Do 
you have a link? (A summary in the patch would have been nice)

> Since they will get removed at the end of the boot, I am trying not to leave holes in the MPU
> map by putting all transient MPU regions at rear.

I understand the principle, but I am not convinced this is worth it 
because of the increased complexity in the assembly code.

What would be the problem with partially reshuffling the MPU once we have booted?

> 
>>>
>>> max_xen_mpumap refers to the number of regions supported by the EL2
>> MPU.
>>> The layout shall be compliant with what we describe in xen.lds.S, or
>>> the codes need adjustment.
>>>
>>> As MMU system and MPU system have different functions to create the
>>> boot MMU/MPU memory management data, instead of introducing extra
>>> #ifdef in main code flow, we introduce a neutral name
>>> prepare_early_mappings for both, and also to replace create_page_tables
>> for MMU.
>>>
>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/arm64/Makefile              |   2 +
>>>    xen/arch/arm/arm64/head.S                |  17 +-
>>>    xen/arch/arm/arm64/head_mmu.S            |   4 +-
>>>    xen/arch/arm/arm64/head_mpu.S            | 323
>> +++++++++++++++++++++++
>>>    xen/arch/arm/include/asm/arm64/mpu.h     |  63 +++++
>>>    xen/arch/arm/include/asm/arm64/sysregs.h |  49 ++++
>>>    xen/arch/arm/mm_mpu.c                    |  48 ++++
>>>    xen/arch/arm/xen.lds.S                   |   4 +
>>>    8 files changed, 502 insertions(+), 8 deletions(-)
>>>    create mode 100644 xen/arch/arm/arm64/head_mpu.S
>>>    create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
>>>    create mode 100644 xen/arch/arm/mm_mpu.c
>>>
>>> +/*
>>> + * Macro to create a new MPU memory region entry, which is a
>>> +structure
>>> + * of pr_t,  in \prmap.
>>> + *
>>> + * Inputs:
>>> + * prmap:   mpu memory region map table symbol
>>> + * sel:     region selector
>>> + * prbar:   preserve value for PRBAR_EL2
>>> + * prlar    preserve value for PRLAR_EL2
>>> + *
>>> + * Clobbers \tmp1, \tmp2
>>> + *
>>> + */
>>> +.macro create_mpu_entry prmap, sel, prbar, prlar, tmp1, tmp2
>>> +    mov   \tmp2, \sel
>>> +    lsl   \tmp2, \tmp2, #MPU_ENTRY_SHIFT
>>> +    adr_l \tmp1, \prmap
>>> +    /* Write the first 8 bytes(prbar_t) of pr_t */
>>> +    str   \prbar, [\tmp1, \tmp2]
>>> +
>>> +    add   \tmp2, \tmp2, #8
>>> +    /* Write the last 8 bytes(prlar_t) of pr_t */
>>> +    str   \prlar, [\tmp1, \tmp2]
>>
>> Any particular reason to not use 'stp'?
>>
>> Also, AFAICT, with data cache disabled. But at least on ARMv8-A, the cache is
>> never really off. So don't need some cache maintainance?
>>
>> FAOD, I know the existing MMU code has the same issue. But I would rather
>> prefer if the new code introduced is compliant with the Arm Arm.
>>
> 
> True, `stp` is better and I will clean the data cache to be compliant with the Arm Arm.
> I wrote the following example to see if I caught what you suggested:
> ```
> add \tmp1, \tmp1, \tmp2
> stp \prbar, \prlar, [\tmp1]
> dc cvau, \tmp1

I think this wants to be invalidate rather than clean because the cache 
is off.

> isb
> dsb sy
> ```
> 
>>> +.endm
>>> +
>>> +/*
>>> + * Macro to store the maximum number of regions supported by the EL2
>>> +MPU
>>> + * in max_xen_mpumap, which is identified by MPUIR_EL2.
>>> + *
>>> + * Outputs:
>>> + * nr_regions: preserve the maximum number of regions supported by
>>> +the EL2 MPU
>>> + *
>>> + * Clobbers \tmp1
>>> + *
>>> + */
>>
>> Are you going to have multiple users? If not, then I would prefer if this is
>> folded in the only caller.
>>
> 
> Ok. I will fold it in, since I think it is a one-time read.
> 
>>> +.macro read_max_el2_regions, nr_regions, tmp1
>>> +    load_paddr \tmp1, max_xen_mpumap
>>
>> I would rather prefer if we restrict the use of globals while the MMU is off (see
>> why above).
>>
> 
> If we don't use a global here, then after the MPU is enabled we need to re-read MPUIR_EL2
> to get the maximum number of EL2 regions.

Which, IMHO, is better than having to think about cache.
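A minimal sketch of what I have in mind, once the MPU is on and we are back in C (assuming MPUIR_EL2 ends up reachable through the READ_SYSREG plumbing this series touches; the function and variable names below are only illustrative):

```c
/*
 * Illustrative only: read the EL2 MPU region count once we are back in C,
 * rather than stashing it in a global from assembly with the cache off.
 */
static unsigned int __read_mostly max_xen_mpumap;

static void __init setup_max_mpu_regions(void)
{
    /* MPUIR_EL2.REGION (bits [7:0]) holds the number of EL2 MPU regions. */
    max_xen_mpumap = READ_SYSREG(MPUIR_EL2) & 0xff;
}
```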

> 
> Or if I put a data cache clean after accessing the global, is that better?
> ```
> str   \nr_regions, [\tmp1]
> dc cvau, \tmp1
> isb
> dsb sy
> ```
> 
>>> +    mrs   \nr_regions, MPUIR_EL2
>>> +    isb
>>
>> What's that isb for?
>>
>>> +    str   \nr_regions, [\tmp1]
>>> +.endm
>>> +
>>> +/*
>>> + * ENTRY to configure a EL2 MPU memory region
>>> + * ARMv8-R AArch64 at most supports 255 MPU protection regions.
>>> + * See section G1.3.18 of the reference manual for ARMv8-R AArch64,
>>> + * PRBAR<n>_EL2 and PRLAR<n>_EL2 provides access to the EL2 MPU
>>> +region
>>> + * determined by the value of 'n' and PRSELR_EL2.REGION as
>>> + * PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
>>> + * For example to access regions from 16 to 31 (0b10000 to 0b11111):
>>> + * - Set PRSELR_EL2 to 0b1xxxx
>>> + * - Region 16 configuration is accessible through PRBAR0_EL2 and
>>> +PRLAR0_EL2
>>> + * - Region 17 configuration is accessible through PRBAR1_EL2 and
>>> +PRLAR1_EL2
>>> + * - Region 18 configuration is accessible through PRBAR2_EL2 and
>>> +PRLAR2_EL2
>>> + * - ...
>>> + * - Region 31 configuration is accessible through PRBAR15_EL2 and
>>> +PRLAR15_EL2
>>> + *
>>> + * Inputs:
>>> + * x27: region selector
>>> + * x28: preserve value for PRBAR_EL2
>>> + * x29: preserve value for PRLAR_EL2
>>> + *
>>> + */
>>> +ENTRY(write_pr)
>>
>> AFAICT, this function would not be necessary if the index for the init sections
>> were hardcoded.
>>
>> So I would like to understand why the index cannot be hardcoded.
>>
> 
> The reason is that we are putting init sections at the *end* of the MPU map, and
> the length of the whole MPU map is platform-specific. We read it from MPUIR_EL2.

Right, I got that bit from the code. What I would like to understand is 
why all the initial addresses cannot be hardcoded?

 From a brief look, this would simplify the assembly code a lot.

>   
>>> +    msr   PRSELR_EL2, x27
>>> +    dsb   sy
>>
>> [...]
>>
>>> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index
>>> bc45ea2c65..79965a3c17 100644
>>> --- a/xen/arch/arm/xen.lds.S
>>> +++ b/xen/arch/arm/xen.lds.S
>>> @@ -91,6 +91,8 @@ SECTIONS
>>>          __ro_after_init_end = .;
>>>      } : text
>>>
>>> +  . = ALIGN(PAGE_SIZE);
>>
>> Why do you need this ALIGN?
>>
> 
> I need a symbol as the start of the data section, so I introduce
> "__data_begin = .;".
> If I use "__ro_after_init_end = .;" instead, I'm afraid in the future,
> if someone introduces a new section after ro-after-init section, this part
> also needs modification too.

I haven't suggested there is a problem with defining a new symbol. I am 
merely asking about the align.

> 
> When we define MPU regions for each section in xen.lds.S, we always treat these sections
> as page-aligned.
> I checked each section in xen.lds.S, and ". = ALIGN(PAGE_SIZE);" is either added
> at the section head, or at the tail of the previous section, to make sure the starting
> address symbol is page-aligned.
> 
> And if we don't put this ALIGN, and the "__ro_after_init_end" symbol itself is not page-aligned,
> the two adjacent sections will overlap in the MPU.

__ro_after_init_end *has* to be page aligned because the permissions are 
different than for __data_begin.

If we were going to add a new section, then either it has the same 
permission as .data.read.mostly and we will bundle them or it doesn't 
and we would need a .align.

But today, the extra .ALIGN seems unnecessary (at least in the context 
of this patch).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-29  6:17     ` Penny Zheng
@ 2023-01-29  7:43       ` Julien Grall
  2023-01-30  6:24         ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-29  7:43 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 29/01/2023 06:17, Penny Zheng wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Wednesday, January 25, 2023 3:09 AM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>> setup_early_uart to map early UART
>>
>> Hi Penny,
> 
> Hi Julien,
> 
>>
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> In MMU system, we map the UART in the fixmap (when earlyprintk is used).
>>> However in MPU system, we map the UART with a transient MPU memory
>>> region.
>>>
>>> So we introduce a new unified function setup_early_uart to replace the
>>> previous setup_fixmap.
>>>
>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/arm64/head.S               |  2 +-
>>>    xen/arch/arm/arm64/head_mmu.S           |  4 +-
>>>    xen/arch/arm/arm64/head_mpu.S           | 52
>> +++++++++++++++++++++++++
>>>    xen/arch/arm/include/asm/early_printk.h |  1 +
>>>    4 files changed, 56 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
>>> index 7f3f973468..a92883319d 100644
>>> --- a/xen/arch/arm/arm64/head.S
>>> +++ b/xen/arch/arm/arm64/head.S
>>> @@ -272,7 +272,7 @@ primary_switched:
>>>             * afterwards.
>>>             */
>>>            bl    remove_identity_mapping
>>> -        bl    setup_fixmap
>>> +        bl    setup_early_uart
>>>    #ifdef CONFIG_EARLY_PRINTK
>>>            /* Use a virtual address to access the UART. */
>>>            ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
>>> diff --git a/xen/arch/arm/arm64/head_mmu.S
>>> b/xen/arch/arm/arm64/head_mmu.S index b59c40495f..a19b7c873d
>> 100644
>>> --- a/xen/arch/arm/arm64/head_mmu.S
>>> +++ b/xen/arch/arm/arm64/head_mmu.S
>>> @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
>>>     *
>>>     * Clobbers x0 - x3
>>>     */
>>> -ENTRY(setup_fixmap)
>>> +ENTRY(setup_early_uart)
>>
>> This function is doing more than enabling the early UART. It also sets up the
>> fixmap even when earlyprintk is not configured.
> 
> True, true.
> I've thoroughly read the MMU implementation of setup_fixmap, and I'll try to split
> it up.
> 
>>
>> I am not entirely sure what could be the name. Maybe this needs to be split
>> further.
>>
>>>    #ifdef CONFIG_EARLY_PRINTK
>>>            /* Add UART to the fixmap table */
>>>            ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
>>> @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
>>>            dsb   nshst
>>>
>>>            ret
>>> -ENDPROC(setup_fixmap)
>>> +ENDPROC(setup_early_uart)
>>>
>>>    /* Fail-stop */
>>>    fail:   PRINT("- Boot failed -\r\n")
>>> diff --git a/xen/arch/arm/arm64/head_mpu.S
>>> b/xen/arch/arm/arm64/head_mpu.S index e2ac69b0cc..72d1e0863d
>> 100644
>>> --- a/xen/arch/arm/arm64/head_mpu.S
>>> +++ b/xen/arch/arm/arm64/head_mpu.S
>>> @@ -18,8 +18,10 @@
>>>    #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
>>>    #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
>>>    #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
>>> +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
>>>
>>>    #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
>>> +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
>>>
>>>    /*
>>>     * Macro to round up the section address to be PAGE_SIZE aligned @@
>>> -334,6 +336,56 @@ ENTRY(enable_mm)
>>>        ret
>>>    ENDPROC(enable_mm)
>>>
>>> +/*
>>> + * Map the early UART with a new transient MPU memory region.
>>> + *
>>
>> Missing "Inputs: "
>>
>>> + * x27: region selector
>>> + * x28: prbar
>>> + * x29: prlar
>>> + *
>>> + * Clobbers x0 - x4
>>> + *
>>> + */
>>> +ENTRY(setup_early_uart)
>>> +#ifdef CONFIG_EARLY_PRINTK
>>> +    /* stack LR as write_pr will be called later like nested function */
>>> +    mov   x3, lr
>>> +
>>> +    /*
>>> +     * MPU region for early UART is a transient region, since it will be
>>> +     * replaced by specific device memory layout when FDT gets parsed.
>>
>> I would rather not mention "FDT" here because this code is independent to
>> the firmware table used.
>>
>> However, any reason to use a transient region rather than the one that will
>> be used for the UART driver?
>>
> 
> We don’t want to define an MPU region for each device driver. It will exhaust
> the MPU regions very quickly.

What is the usual size of an MPU?

However, even if you don't want to define one for every device, it still 
seems sensible to define a fixed temporary one for the early UART 
as this would simplify the assembly code.


> In commit " [PATCH v2 28/40] xen/mpu: map boot module section in MPU system",

Did you mean patch #27?

> A new FDT property `mpu,device-memory-section` will be introduced for users to statically
> configure the whole system device memory with the least number of memory regions in Device Tree.
> This section shall cover all devices that will be used in Xen, like `UART`, `GIC`, etc.
> For FVP_BaseR_AEMv8R, we have the following definition:
> ```
> mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;
> ```

I am a bit worried this will be a recipe for mistakes. Do you have an 
example where the MPU will be exhausted if we reserve some entries for 
each device (or some of them)?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-29  7:37       ` Julien Grall
@ 2023-01-30  5:45         ` Penny Zheng
  2023-01-30  9:39           ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-30  5:45 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Sunday, January 29, 2023 3:37 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> Hi Penny,
> 

Hi Julien,

> On 29/01/2023 05:39, Penny Zheng wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Thursday, January 19, 2023 11:04 PM
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> >> memory region map
> >>
> >> Hi Penny,
> >>
> >
> > Hi Julien
> >
> > Sorry for the late response, just come back from Chinese Spring
> > Festival Holiday~
> >
> >> On 13/01/2023 05:28, Penny Zheng wrote:
> >>> From: Penny Zheng <penny.zheng@arm.com>
> >>>
> >>> The start-of-day Xen MPU memory region layout shall be like as follows:
> >>>
> >>> xen_mpumap[0] : Xen text
> >>> xen_mpumap[1] : Xen read-only data
> >>> xen_mpumap[2] : Xen read-only after init data
> >>> xen_mpumap[3] : Xen read-write data
> >>> xen_mpumap[4] : Xen BSS
> >>> ......
> >>> xen_mpumap[max_xen_mpumap - 2]: Xen init data
> >>> xen_mpumap[max_xen_mpumap - 1]: Xen init text
> >>
> >> Can you explain why the init region should be at the end of the MPU?
> >>
> >
> > As discussed in the v1 Serie, I'd like to put all transient MPU
> > regions, like boot-only region, at the end of the MPU.
> 
> I vaguely recall the discussion but can't seem to find the thread. Do you have
> a link? (A summary in the patch would have been nice)
> 
> > Since they will get removed at the end of the boot, I am trying not to
> > leave holes in the MPU map by putting all transient MPU regions at rear.
> 
> I understand the principle, but I am not convinced this is worth it because of
> the increase complexity in the assembly code.
> 
> What would be the problem with reshuffling partially the MPU once we
> booted?

There are three types of MPU regions during boot time:
1. Fixed MPU regions
Regions like the Xen text section, the Xen heap section, etc.
2. Boot-only MPU regions
Regions like the Xen init sections, etc. They will be removed at the end of booting.
3. Regions that need switching in/out during vcpu context switch
Regions like the system device memory map.
For example, for FVP_BaseR_AEMv8R, we have [0x80000000, 0xfffff000) as
the whole system device memory map for Xen (idle vcpu) in EL2; when
context switching to a guest vcpu, it is replaced with guest-specific
device mappings, like vgic, vpl011, passthrough devices, etc.

We don't have two mappings for the different translation stages in the MPU, like we had in the MMU:
the Xen stage 1 EL2 mapping and the stage 2 mapping share one MPU memory mapping (xen_mpumap).
So, to save the trouble of hunting down each switching region in the time-sensitive context switch, we
must re-order xen_mpumap to keep the fixed regions at the front, with the switching ones right on their heels.

In patch series v1, I was adding MPU regions in sequence, and I introduced a set of bitmaps to record the
location of regions of the same type. At the end of booting, I need to *disable* the MPU to do the reshuffling,
as I can't move regions like the Xen heap while the MPU is on.

And we discussed that it is too risky to disable MPU, and you suggested [1]
"
> You should not need any reorg if you map the boot-only section towards in
> the higher slot index (or just after the fixed ones).
"

Maybe in assembly we know exactly how many fixed regions and boot-only regions there are, but in C code we
parse the FDT to get the static configuration, so, for example, we don't know in advance how many fixed regions
are enough for the Xen static heap.
Over-allocating is not a good idea on an MPU system with a limited number of MPU regions; some platforms may
only have 16 MPU regions, so IMHO it is not worth wasting any of them on an approximation.

So I took the suggestion of putting regions at the higher slot indexes: fixed regions at the front, and
boot-only and switching ones at the tail. Then, at the end of booting, when we reorg the MPU mapping, we remove
all boot-only regions, and for the switching ones we disable them, relocate them (right after the fixed ones)
and re-enable them, roughly as sketched below. The specific code is in [2].

[1] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg00457.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-01/msg00795.html
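To make that reorg concrete, it is roughly the following (a sketch only; the counters and helper names are made up for illustration, not the actual patch code):

```c
/*
 * Sketch of the end-of-boot reorg (all helper names are hypothetical).
 *
 * During boot:
 *   [0, nr_fixed)                                fixed regions
 *   [max_xen_mpumap - nr_tail, max_xen_mpumap)   boot-only + switching regions
 *
 * Afterwards the switching regions sit right behind the fixed ones, so the
 * context switch only ever touches a known, contiguous window.
 */
static void __init reorg_mpu_map(void)
{
    unsigned int i, next = nr_fixed;

    for ( i = max_xen_mpumap - nr_tail; i < max_xen_mpumap; i++ )
    {
        if ( region_is_boot_only(i) )
        {
            disable_mpu_region(i);      /* boot-only: simply dropped */
            continue;
        }

        /* Switching region: disable, copy after the fixed ones, re-enable. */
        disable_mpu_region(i);
        copy_mpu_region(next, i);
        enable_mpu_region(next);
        next++;
    }
}
```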

> 
[...]
> >>> +/*
> >>> + * ENTRY to configure a EL2 MPU memory region
> >>> + * ARMv8-R AArch64 at most supports 255 MPU protection regions.
> >>> + * See section G1.3.18 of the reference manual for ARMv8-R AArch64,
> >>> + * PRBAR<n>_EL2 and PRLAR<n>_EL2 provides access to the EL2 MPU
> >>> +region
> >>> + * determined by the value of 'n' and PRSELR_EL2.REGION as
> >>> + * PRSELR_EL2.REGION<7:4>:n.(n = 0, 1, 2, ... , 15)
> >>> + * For example to access regions from 16 to 31 (0b10000 to 0b11111):
> >>> + * - Set PRSELR_EL2 to 0b1xxxx
> >>> + * - Region 16 configuration is accessible through PRBAR0_EL2 and
> >>> +PRLAR0_EL2
> >>> + * - Region 17 configuration is accessible through PRBAR1_EL2 and
> >>> +PRLAR1_EL2
> >>> + * - Region 18 configuration is accessible through PRBAR2_EL2 and
> >>> +PRLAR2_EL2
> >>> + * - ...
> >>> + * - Region 31 configuration is accessible through PRBAR15_EL2 and
> >>> +PRLAR15_EL2
> >>> + *
> >>> + * Inputs:
> >>> + * x27: region selector
> >>> + * x28: preserve value for PRBAR_EL2
> >>> + * x29: preserve value for PRLAR_EL2
> >>> + *
> >>> + */
> >>> +ENTRY(write_pr)
> >>
> >> AFAICT, this function would not be necessary if the index for the
> >> init sections were hardcoded.
> >>
> >> So I would like to understand why the index cannot be hardcoded.
> >>
> >
> > The reason is that we are putting init sections at the *end* of the
> > MPU map, and the length of the whole MPU map is platform-specific. We
> read it from MPUIR_EL2.
> 
> Right, I got that bit from the code. What I would like to understand is why all
> the initial address cannot be hardocoded?
> 
>  From a brief look, this would simplify a lot the assembly code.
> 

Like I said before, "map towards the higher slots": if it is not the tail, it is hard to pick another
index that works across different platforms and various FDT static configurations.

If, in assembly, we put the fixed regions in front and the boot-only regions right after, then when we
enter the C world we immediately have to do a simple reshuffle, which means relocating these init sections
to the tail. That is only workable when the MPU is disabled, unless we're sure that the
"reshuffling part" is not using any init code or data.
  
> >
> >>> +    msr   PRSELR_EL2, x27
> >>> +    dsb   sy
> >>
> >> [...]
> >>
> >>> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index
> >>> bc45ea2c65..79965a3c17 100644
> >>> --- a/xen/arch/arm/xen.lds.S
> >>> +++ b/xen/arch/arm/xen.lds.S
> >>> @@ -91,6 +91,8 @@ SECTIONS
> >>>          __ro_after_init_end = .;
> >>>      } : text
> >>>
> >>> +  . = ALIGN(PAGE_SIZE);
> >>
> >> Why do you need this ALIGN?
> >>
> >
> > I need a symbol as the start of the data section, so I introduce
> > "__data_begin = .;".
> > If I use "__ro_after_init_end = .;" instead, I'm afraid in the future,
> > if someone introduces a new section after ro-after-init section, this
> > part also needs modification too.
> 
> I haven't suggested there is a problem to define a new symbol. I am merely
> asking about the align.
> 
> >
> > When we define MPU regions for each section in xen.lds.S, we always
> > treat these sections page-aligned.
> > I checked each section in xen.lds.S, and ". = ALIGN(PAGE_SIZE);" is
> > either added at section head, or at the tail of the previous section,
> > to make sure starting address symbol page-aligned.
> >
> > And if we don't put this ALIGN, if "__ro_after_init_end " symbol
> > itself is not page-aligned, the two adjacent sections will overlap in MPU.
> 
> __ro_after_init_end *has* to be page aligned because the permissions are
> different than for __data_begin.
> 
> If we were going to add a new section, then either it has the same permission
> as .data.read.mostly and we will bundle them or it doesn't and we would
> need a .align.
> 
> But today, the extra .ALIGN seems unnecessary (at least in the context of this
> patch).
> 

Understood, I'll remove it.

> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-29  7:43       ` Julien Grall
@ 2023-01-30  6:24         ` Penny Zheng
  2023-01-30 10:00           ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-30  6:24 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi, Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Sunday, January 29, 2023 3:43 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> setup_early_uart to map early UART
> 
> Hi Penny,
> 
> On 29/01/2023 06:17, Penny Zheng wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Wednesday, January 25, 2023 3:09 AM
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> >> setup_early_uart to map early UART
> >>
> >> Hi Peny,
> >
> > Hi Julien,
> >
> >>
> >> On 13/01/2023 05:28, Penny Zheng wrote:
> >>> In MMU system, we map the UART in the fixmap (when earlyprintk is
> used).
> >>> However in MPU system, we map the UART with a transient MPU
> memory
> >>> region.
> >>>
> >>> So we introduce a new unified function setup_early_uart to replace
> >>> the previous setup_fixmap.
> >>>
> >>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>    xen/arch/arm/arm64/head.S               |  2 +-
> >>>    xen/arch/arm/arm64/head_mmu.S           |  4 +-
> >>>    xen/arch/arm/arm64/head_mpu.S           | 52
> >> +++++++++++++++++++++++++
> >>>    xen/arch/arm/include/asm/early_printk.h |  1 +
> >>>    4 files changed, 56 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> >>> index 7f3f973468..a92883319d 100644
> >>> --- a/xen/arch/arm/arm64/head.S
> >>> +++ b/xen/arch/arm/arm64/head.S
> >>> @@ -272,7 +272,7 @@ primary_switched:
> >>>             * afterwards.
> >>>             */
> >>>            bl    remove_identity_mapping
> >>> -        bl    setup_fixmap
> >>> +        bl    setup_early_uart
> >>>    #ifdef CONFIG_EARLY_PRINTK
> >>>            /* Use a virtual address to access the UART. */
> >>>            ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
> >>> diff --git a/xen/arch/arm/arm64/head_mmu.S
> >>> b/xen/arch/arm/arm64/head_mmu.S index b59c40495f..a19b7c873d
> >> 100644
> >>> --- a/xen/arch/arm/arm64/head_mmu.S
> >>> +++ b/xen/arch/arm/arm64/head_mmu.S
> >>> @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
> >>>     *
> >>>     * Clobbers x0 - x3
> >>>     */
> >>> -ENTRY(setup_fixmap)
> >>> +ENTRY(setup_early_uart)
> >>
> >> This function is doing more than enable the early UART. It also
> >> setups the fixmap even earlyprintk is not configured.
> >
> > True, true.
> > I've thoroughly read the MMU implementation of setup_fixmap, and I'll
> > try to split it up.
> >
> >>
> >> I am not entirely sure what could be the name. Maybe this needs to be
> >> split further.
> >>
> >>>    #ifdef CONFIG_EARLY_PRINTK
> >>>            /* Add UART to the fixmap table */
> >>>            ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
> >>> @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
> >>>            dsb   nshst
> >>>
> >>>            ret
> >>> -ENDPROC(setup_fixmap)
> >>> +ENDPROC(setup_early_uart)
> >>>
> >>>    /* Fail-stop */
> >>>    fail:   PRINT("- Boot failed -\r\n")
> >>> diff --git a/xen/arch/arm/arm64/head_mpu.S
> >>> b/xen/arch/arm/arm64/head_mpu.S index e2ac69b0cc..72d1e0863d
> >> 100644
> >>> --- a/xen/arch/arm/arm64/head_mpu.S
> >>> +++ b/xen/arch/arm/arm64/head_mpu.S
> >>> @@ -18,8 +18,10 @@
> >>>    #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
> >>>    #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
> >>>    #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> >>> +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
> >>>
> >>>    #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
> >>> +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
> >>>
> >>>    /*
> >>>     * Macro to round up the section address to be PAGE_SIZE aligned
> >>> @@
> >>> -334,6 +336,56 @@ ENTRY(enable_mm)
> >>>        ret
> >>>    ENDPROC(enable_mm)
> >>>
> >>> +/*
> >>> + * Map the early UART with a new transient MPU memory region.
> >>> + *
> >>
> >> Missing "Inputs: "
> >>
> >>> + * x27: region selector
> >>> + * x28: prbar
> >>> + * x29: prlar
> >>> + *
> >>> + * Clobbers x0 - x4
> >>> + *
> >>> + */
> >>> +ENTRY(setup_early_uart)
> >>> +#ifdef CONFIG_EARLY_PRINTK
> >>> +    /* stack LR as write_pr will be called later like nested function */
> >>> +    mov   x3, lr
> >>> +
> >>> +    /*
> >>> +     * MPU region for early UART is a transient region, since it will be
> >>> +     * replaced by specific device memory layout when FDT gets parsed.
> >>
> >> I would rather not mention "FDT" here because this code is
> >> independent to the firmware table used.
> >>
> >> However, any reason to use a transient region rather than the one
> >> that will be used for the UART driver?
> >>
> >
> > We don’t want to define a MPU region for each device driver. It will
> > exhaust MPU regions very quickly.
> What the usual size of an MPU?
> 
> However, even if you don't want to define one for every device, it still seem
> to be sensible to define a fixed temporary one for the early UART as this
> would simplify the assembly code.
> 

We will add fixed MPU regions for the Xen static heap in the function setup_mm.
If we put the early UART region at the front (the fixed-region area), it will leave holes
later after removing it.

> 
> > In commit " [PATCH v2 28/40] xen/mpu: map boot module section in MPU
> > system",
> 
> Did you mean patch #27?
> 
> > A new FDT property `mpu,device-memory-section` will be introduced for
> > users to statically configure the whole system device memory with the
> least number of memory regions in Device Tree.
> > This section shall cover all devices that will be used in Xen, like `UART`,
> `GIC`, etc.
> > For FVP_BaseR_AEMv8R, we have the following definition:
> > ```
> > mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>; ```
> 
> I am a bit worry this will be a recipe for mistake. Do you have an example
> where the MPU will be exhausted if we reserve some entries for each device
> (or some)?
> 

Yes, we have an internal platform with only 16 MPU regions. The current implementation will
almost eat up all of its MPU regions when launching two guests on the platform.

Let's calculate the simplest scenario. The following is the MPU-related static configuration
in the device tree:
```
        mpu,boot-module-section = <0x0 0x10000000 0x0 0x10000000>;
        mpu,guest-memory-section = <0x0 0x20000000 0x0 0x40000000>;
        mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;
        mpu,shared-memory-section = <0x0 0x7a000000 0x0 0x02000000>;

        xen,static-heap = <0x0 0x60000000 0x0 0x1a000000>;
```
At the end of boot, before reshuffling, the MPU region usage will be as follows:
7 (defined in assembly) + 1 for the FDT (early_fdt_map) + 5 (at least one region for each
"mpu,xxx-section" plus the static heap).

That is already at least 13 MPU regions ;\.
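Spelling the arithmetic out for the 16-region case (the figures are the ones above; the macro names are only for illustration):

```c
/* Boot-time MPU entry budget for the 16-region example, per the figures above. */
#define NR_EL2_MPU_REGIONS    16   /* platform limit                          */
#define NR_ASM_REGIONS         7   /* set up in assembly (incl. init regions) */
#define NR_FDT_REGIONS         1   /* early_fdt_map                           */
#define NR_SECTION_REGIONS     5   /* one per mpu,*-section + static heap     */
#define NR_BOOT_REGIONS  (NR_ASM_REGIONS + NR_FDT_REGIONS + NR_SECTION_REGIONS)
/* NR_BOOT_REGIONS == 13, i.e. only 3 of the 16 entries are still free before
 * reshuffling, and before any per-guest runtime mappings are considered. */
```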

> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-30  5:45         ` Penny Zheng
@ 2023-01-30  9:39           ` Julien Grall
  2023-01-31  4:11             ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-30  9:39 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 30/01/2023 05:45, Penny Zheng wrote:
>   There are three types of MPU regions during boot-time:
> 1. Fixed MPU region
> Regions like Xen text section, Xen heap section, etc.
> 2. Boot-only MPU region
> Regions like Xen init sections, etc. It will be removed at the end of booting.
> 3.   Regions need switching in/out during vcpu context switch
> Region like system device memory map.
> For example, for FVP_BaseR_AEMv8R, we have [0x80000000, 0xfffff000) as
> the whole system device memory map for Xen(idle vcpu) in EL2,  when
> context switching to guest vcpu, it shall be replaced with guest-specific
> device mapping, like vgic, vpl011, passthrough device, etc.
> 
> We don't have two mappings for different stage translations in MPU, like we had in MMU.
> Xen stage 1 EL2 mapping and stage 2 mapping are both sharing one MPU memory mapping(xen_mpumap)
> So to save the trouble of hunting down each switching regions in time-sensitive context switch, we
> must re-order xen_mpumap to keep fixed regions in the front, and switching ones in the heels of them.

 From my understanding, hunting down each switching region would be a 
matter of looping over a bitmap. There will be a slight increase in the 
number of instructions executed, but I don't believe it will be noticeable.
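To be concrete, what I have in mind is roughly the following (all the names below are made up for illustration, not a proposal for the actual interface):

```c
/*
 * Illustrative only: mark the guest-switchable entries in a bitmap and walk
 * it at context switch, instead of requiring them to be contiguous.
 * MAX_MPU_REGIONS and the two helpers are hypothetical.
 */
static unsigned long switching_regions[BITS_TO_LONGS(MAX_MPU_REGIONS)];

static void mpu_switch_regions(const struct vcpu *next)
{
    unsigned int i;

    /* Disable every region marked as switchable for the outgoing vCPU... */
    for ( i = 0; i < MAX_MPU_REGIONS; i++ )
        if ( test_bit(i, switching_regions) )
            disable_mpu_region(i);

    /* ...then install the incoming vCPU's regions in the freed slots. */
    install_vcpu_mpu_regions(next);
}
```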

> 
> In Patch Serie v1, I was adding MPU regions in sequence,  and I introduced a set of bitmaps to record the location of
> same type regions. At the end of booting, I need to *disable* MPU to do the reshuffling, as I can't
> move regions like xen heap while MPU on.
> 
> And we discussed that it is too risky to disable MPU, and you suggested [1]
> "
>> You should not need any reorg if you map the boot-only section towards in
>> the higher slot index (or just after the fixed ones).
> "

Right, looking at the new code, I realize that this was probably a bad 
idea because we are adding complexity in the assembly code.

> 
> Maybe in assembly, we know exactly how many fixed regions are, boot-only regions are, but in C codes, we parse FDT
> to get static configuration, like we don't know how many fixed regions for xen static heap is enough.
> Approximation is not suggested in MPU system with limited MPU regions, some platform may only have 16 MPU regions,
> IMHO, it is not worthy wasting in approximation.

I haven't suggested using approximation anywhere here. I will answer 
about the limited number of entries in the other thread.

> 
> So I take the suggestion of putting regions in the higher slot index. Putting fixed regions in the front, and putting
> boot-only and switching ones at tail. Then, at the end of booting, when we reorg the mpu mapping, we remove
> all boot-only regions, and for switching ones, we disable-relocate(after fixed ones)-enable them. Specific codes in [2].

 From this discussion, it feels to me that you are trying to make the 
code more complicated just to keep the split and save a few cycles (see 
above).

I would suggest investigating the cost of "hunting down each section". 
Depending on the result, we can discuss what the best approach is.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-30  6:24         ` Penny Zheng
@ 2023-01-30 10:00           ` Julien Grall
  2023-01-31  5:38             ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-30 10:00 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk



On 30/01/2023 06:24, Penny Zheng wrote:
> Hi, Julien

Hi Penny,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Sunday, January 29, 2023 3:43 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>> setup_early_uart to map early UART
>>
>> Hi Penny,
>>
>> On 29/01/2023 06:17, Penny Zheng wrote:
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: Wednesday, January 25, 2023 3:09 AM
>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
>> devel@lists.xenproject.org
>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>> <sstabellini@kernel.org>; Bertrand Marquis
>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
>>>> <Volodymyr_Babchuk@epam.com>
>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>>>> setup_early_uart to map early UART
>>>>
>>>> Hi Peny,
>>>
>>> Hi Julien,
>>>
>>>>
>>>> On 13/01/2023 05:28, Penny Zheng wrote:
>>>>> In MMU system, we map the UART in the fixmap (when earlyprintk is
>> used).
>>>>> However in MPU system, we map the UART with a transient MPU
>> memory
>>>>> region.
>>>>>
>>>>> So we introduce a new unified function setup_early_uart to replace
>>>>> the previous setup_fixmap.
>>>>>
>>>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>>>> ---
>>>>>     xen/arch/arm/arm64/head.S               |  2 +-
>>>>>     xen/arch/arm/arm64/head_mmu.S           |  4 +-
>>>>>     xen/arch/arm/arm64/head_mpu.S           | 52
>>>> +++++++++++++++++++++++++
>>>>>     xen/arch/arm/include/asm/early_printk.h |  1 +
>>>>>     4 files changed, 56 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
>>>>> index 7f3f973468..a92883319d 100644
>>>>> --- a/xen/arch/arm/arm64/head.S
>>>>> +++ b/xen/arch/arm/arm64/head.S
>>>>> @@ -272,7 +272,7 @@ primary_switched:
>>>>>              * afterwards.
>>>>>              */
>>>>>             bl    remove_identity_mapping
>>>>> -        bl    setup_fixmap
>>>>> +        bl    setup_early_uart
>>>>>     #ifdef CONFIG_EARLY_PRINTK
>>>>>             /* Use a virtual address to access the UART. */
>>>>>             ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
>>>>> diff --git a/xen/arch/arm/arm64/head_mmu.S
>>>>> b/xen/arch/arm/arm64/head_mmu.S index b59c40495f..a19b7c873d
>>>> 100644
>>>>> --- a/xen/arch/arm/arm64/head_mmu.S
>>>>> +++ b/xen/arch/arm/arm64/head_mmu.S
>>>>> @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
>>>>>      *
>>>>>      * Clobbers x0 - x3
>>>>>      */
>>>>> -ENTRY(setup_fixmap)
>>>>> +ENTRY(setup_early_uart)
>>>>
>>>> This function is doing more than enable the early UART. It also
>>>> setups the fixmap even earlyprintk is not configured.
>>>
>>> True, true.
>>> I've thoroughly read the MMU implementation of setup_fixmap, and I'll
>>> try to split it up.
>>>
>>>>
>>>> I am not entirely sure what could be the name. Maybe this needs to be
>>>> split further.
>>>>
>>>>>     #ifdef CONFIG_EARLY_PRINTK
>>>>>             /* Add UART to the fixmap table */
>>>>>             ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
>>>>> @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
>>>>>             dsb   nshst
>>>>>
>>>>>             ret
>>>>> -ENDPROC(setup_fixmap)
>>>>> +ENDPROC(setup_early_uart)
>>>>>
>>>>>     /* Fail-stop */
>>>>>     fail:   PRINT("- Boot failed -\r\n")
>>>>> diff --git a/xen/arch/arm/arm64/head_mpu.S
>>>>> b/xen/arch/arm/arm64/head_mpu.S index e2ac69b0cc..72d1e0863d
>>>> 100644
>>>>> --- a/xen/arch/arm/arm64/head_mpu.S
>>>>> +++ b/xen/arch/arm/arm64/head_mpu.S
>>>>> @@ -18,8 +18,10 @@
>>>>>     #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
>>>>>     #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
>>>>>     #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
>>>>> +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
>>>>>
>>>>>     #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1 */
>>>>> +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
>>>>>
>>>>>     /*
>>>>>      * Macro to round up the section address to be PAGE_SIZE aligned
>>>>> @@
>>>>> -334,6 +336,56 @@ ENTRY(enable_mm)
>>>>>         ret
>>>>>     ENDPROC(enable_mm)
>>>>>
>>>>> +/*
>>>>> + * Map the early UART with a new transient MPU memory region.
>>>>> + *
>>>>
>>>> Missing "Inputs: "
>>>>
>>>>> + * x27: region selector
>>>>> + * x28: prbar
>>>>> + * x29: prlar
>>>>> + *
>>>>> + * Clobbers x0 - x4
>>>>> + *
>>>>> + */
>>>>> +ENTRY(setup_early_uart)
>>>>> +#ifdef CONFIG_EARLY_PRINTK
>>>>> +    /* stack LR as write_pr will be called later like nested function */
>>>>> +    mov   x3, lr
>>>>> +
>>>>> +    /*
>>>>> +     * MPU region for early UART is a transient region, since it will be
>>>>> +     * replaced by specific device memory layout when FDT gets parsed.
>>>>
>>>> I would rather not mention "FDT" here because this code is
>>>> independent to the firmware table used.
>>>>
>>>> However, any reason to use a transient region rather than the one
>>>> that will be used for the UART driver?
>>>>
>>>
>>> We don’t want to define a MPU region for each device driver. It will
>>> exhaust MPU regions very quickly.
>> What the usual size of an MPU?
>>
>> However, even if you don't want to define one for every device, it still seem
>> to be sensible to define a fixed temporary one for the early UART as this
>> would simplify the assembly code.
>>
> 
> We will add fixed MPU regions for Xen static heap in function setup_mm.
> If we put early uart region in front(fixed region place), it will leave holes
> later after removing it.

Why? The entry could be re-used to map the device entries.

> 
>>
>>> In commit " [PATCH v2 28/40] xen/mpu: map boot module section in MPU
>>> system",
>>
>> Did you mean patch #27?
>>
>>> A new FDT property `mpu,device-memory-section` will be introduced for
>>> users to statically configure the whole system device memory with the
>> least number of memory regions in Device Tree.
>>> This section shall cover all devices that will be used in Xen, like `UART`,
>> `GIC`, etc.
>>> For FVP_BaseR_AEMv8R, we have the following definition:
>>> ```
>>> mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>; ```
>>
>> I am a bit worry this will be a recipe for mistake. Do you have an example
>> where the MPU will be exhausted if we reserve some entries for each device
>> (or some)?
>>
> 
> Yes, we have internal platform where MPU regions are only 16.

Is the internal platform in silicon (i.e. real) or a virtual platform?

>  It will almost eat up
> all MPU regions based on current implementation, when launching two guests in platform.
> 
> Let's calculate the most simple scenario:
> The following is MPU-related static configuration in device tree:
> ```
>          mpu,boot-module-section = <0x0 0x10000000 0x0 0x10000000>;
>          mpu,guest-memory-section = <0x0 0x20000000 0x0 0x40000000>;
>          mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;
>          mpu,shared-memory-section = <0x0 0x7a000000 0x0 0x02000000>;
> 
>          xen,static-heap = <0x0 0x60000000 0x0 0x1a000000>;
> ```
> At the end of the boot, before reshuffling, the MPU region usage will be as follows:
> 7 (defined in assembly) + FDT(early_fdt_map) + 5 (at least one region for each "mpu,xxx-memory-section").

Can you list the 7 sections? Do they include the init sections?

> 
> That will be already at least 13 MPU regions ;\.

The section I am most concerned about is mpu,device-memory-section 
because it would likely mean that all the devices will be mapped in Xen. 
Is there any risk that the guest may use different memory attributes?

On the platform you are describing, what are the devices you expect 
to be used by Xen?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-30  9:39           ` Julien Grall
@ 2023-01-31  4:11             ` Penny Zheng
  2023-01-31  9:27               ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-31  4:11 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk

Hi Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Monday, January 30, 2023 5:40 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> Hi Penny,
> 
> On 30/01/2023 05:45, Penny Zheng wrote:
> >   There are three types of MPU regions during boot-time:
> > 1. Fixed MPU region
> > Regions like Xen text section, Xen heap section, etc.
> > 2. Boot-only MPU region
> > Regions like Xen init sections, etc. It will be removed at the end of booting.
> > 3.   Regions need switching in/out during vcpu context switch
> > Region like system device memory map.
> > For example, for FVP_BaseR_AEMv8R, we have [0x80000000, 0xfffff000) as
> > the whole system device memory map for Xen(idle vcpu) in EL2,  when
> > context switching to guest vcpu, it shall be replaced with
> > guest-specific device mapping, like vgic, vpl011, passthrough device, etc.
> >
> > We don't have two mappings for different stage translations in MPU, like
> we had in MMU.
> > Xen stage 1 EL2 mapping and stage 2 mapping are both sharing one MPU
> > memory mapping(xen_mpumap) So to save the trouble of hunting down
> each
> > switching regions in time-sensitive context switch, we must re-order
> xen_mpumap to keep fixed regions in the front, and switching ones in the
> heels of them.
> 
>  From my understanding, hunting down each switching regions would be a
> matter to loop over a bitmap. There will be a slight increase in the number
> of instructions executed, but I don't believe it will be noticeable.
> 
> >
> > In Patch Serie v1, I was adding MPU regions in sequence,  and I
> > introduced a set of bitmaps to record the location of same type
> > regions. At the end of booting, I need to *disable* MPU to do the
> reshuffling, as I can't move regions like xen heap while MPU on.
> >
> > And we discussed that it is too risky to disable MPU, and you
> > suggested [1] "
> >> You should not need any reorg if you map the boot-only section
> >> towards in the higher slot index (or just after the fixed ones).
> > "
> 
> Right, looking at the new code. I realize that this was probably a bad idea
> because we are adding complexity in the assembly code.
> 
> >
> > Maybe in assembly, we know exactly how many fixed regions are,
> > boot-only regions are, but in C codes, we parse FDT to get static
> configuration, like we don't know how many fixed regions for xen static
> heap is enough.
> > Approximation is not suggested in MPU system with limited MPU regions,
> > some platform may only have 16 MPU regions, IMHO, it is not worthy
> wasting in approximation.
> 
> I haven't suggested to use approximation anywhere here. I will answer
> about the limited number of entries in the other thread.
> 
> >
> > So I take the suggestion of putting regions in the higher slot index.
> > Putting fixed regions in the front, and putting boot-only and
> > switching ones at tail. Then, at the end of booting, when we reorg the
> mpu mapping, we remove all boot-only regions, and for switching ones, we
> disable-relocate(after fixed ones)-enable them. Specific codes in [2].
> 
>  From this discussion, it feels to me that you are trying to make the code
> more complicated just to keep the split and save a few cycles (see above).
> 
> I would suggest to investigate the cost of "hunting down each section".
> Depending on the result, we can discuss what the best approach.
> 

Correct me if I'm wrong: the complicated thing in assembly you are worried about is that we can't
define the index for the initial sections, i.e. there is no hardcoding to keep it simple.
And the function write_pr, I know, is really a big chunk of code; however, the logic there is simple,
just a bunch of "switch-cases".

Suppose we add MPU regions in sequence as you suggested, while using a bitmap at the
same time to record the used entries.
TBH, this is how I designed it at the very beginning internally. We found that if we don't
do the reorg at late boot to keep the fixed regions in front and the switching ones after, then
on each vcpu context switch we not only need to hunt down the switching ones to disable, but also,
when we add the new switch-in regions, using a bitmap to find a free entry makes the
process unpredictable. Uncertainty is what we want to avoid on the Armv8-R architecture.

Hmmm, TBH, I really like your suggestion to put the boot-only/switching regions into the
higher slots. It saves a lot of trouble in the late-init reorg and also avoids disabling the MPU.
The split is also a simpler and easier-to-understand construction than the bitmap.

IMHO, the reorg is really worth doing. We put all the complicated things in boot time to
make the runtime context switch simple and fast, even if it only saves a few cycles, as the
Armv8-R architecture profile was designed from the beginning to support use cases that have
a high sensitivity to deterministic execution (e.g. fuel injection, brake control, drive trains,
motor control, etc.).
However, when talking about architecture matters, I need more professional opinions:
@Wei Chen @Bertrand Marquis
Also, will the R52 implementation encounter the same issue? @ayan.kumar.halder@xilinx.com
@Stefano Stabellini

> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-30 10:00           ` Julien Grall
@ 2023-01-31  5:38             ` Penny Zheng
  2023-01-31  9:41               ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-01-31  5:38 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Monday, January 30, 2023 6:00 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> setup_early_uart to map early UART
> 
> 
> 
> On 30/01/2023 06:24, Penny Zheng wrote:
> > Hi, Julien
> 
> Hi Penny,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Sunday, January 29, 2023 3:43 PM
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> >> setup_early_uart to map early UART
> >>
> >> Hi Penny,
> >>
> >> On 29/01/2023 06:17, Penny Zheng wrote:
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: Wednesday, January 25, 2023 3:09 AM
> >>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> >> devel@lists.xenproject.org
> >>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >>>> <sstabellini@kernel.org>; Bertrand Marquis
> >>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >>>> <Volodymyr_Babchuk@epam.com>
> >>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> >>>> setup_early_uart to map early UART
> >>>>
> >>>> Hi Peny,
> >>>
> >>> Hi Julien,
> >>>
> >>>>
> >>>> On 13/01/2023 05:28, Penny Zheng wrote:
> >>>>> In MMU system, we map the UART in the fixmap (when earlyprintk is
> >> used).
> >>>>> However in MPU system, we map the UART with a transient MPU
> >> memory
> >>>>> region.
> >>>>>
> >>>>> So we introduce a new unified function setup_early_uart to replace
> >>>>> the previous setup_fixmap.
> >>>>>
> >>>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> >>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>>>> ---
> >>>>>     xen/arch/arm/arm64/head.S               |  2 +-
> >>>>>     xen/arch/arm/arm64/head_mmu.S           |  4 +-
> >>>>>     xen/arch/arm/arm64/head_mpu.S           | 52
> >>>> +++++++++++++++++++++++++
> >>>>>     xen/arch/arm/include/asm/early_printk.h |  1 +
> >>>>>     4 files changed, 56 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/xen/arch/arm/arm64/head.S
> b/xen/arch/arm/arm64/head.S
> >>>>> index 7f3f973468..a92883319d 100644
> >>>>> --- a/xen/arch/arm/arm64/head.S
> >>>>> +++ b/xen/arch/arm/arm64/head.S
> >>>>> @@ -272,7 +272,7 @@ primary_switched:
> >>>>>              * afterwards.
> >>>>>              */
> >>>>>             bl    remove_identity_mapping
> >>>>> -        bl    setup_fixmap
> >>>>> +        bl    setup_early_uart
> >>>>>     #ifdef CONFIG_EARLY_PRINTK
> >>>>>             /* Use a virtual address to access the UART. */
> >>>>>             ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
> >>>>> diff --git a/xen/arch/arm/arm64/head_mmu.S
> >>>>> b/xen/arch/arm/arm64/head_mmu.S index b59c40495f..a19b7c873d
> >>>> 100644
> >>>>> --- a/xen/arch/arm/arm64/head_mmu.S
> >>>>> +++ b/xen/arch/arm/arm64/head_mmu.S
> >>>>> @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
> >>>>>      *
> >>>>>      * Clobbers x0 - x3
> >>>>>      */
> >>>>> -ENTRY(setup_fixmap)
> >>>>> +ENTRY(setup_early_uart)
> >>>>
> >>>> This function is doing more than enable the early UART. It also
> >>>> setups the fixmap even earlyprintk is not configured.
> >>>
> >>> True, true.
> >>> I've thoroughly read the MMU implementation of setup_fixmap, and
> >>> I'll try to split it up.
> >>>
> >>>>
> >>>> I am not entirely sure what could be the name. Maybe this needs to
> >>>> be split further.
> >>>>
> >>>>>     #ifdef CONFIG_EARLY_PRINTK
> >>>>>             /* Add UART to the fixmap table */
> >>>>>             ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
> >>>>> @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
> >>>>>             dsb   nshst
> >>>>>
> >>>>>             ret
> >>>>> -ENDPROC(setup_fixmap)
> >>>>> +ENDPROC(setup_early_uart)
> >>>>>
> >>>>>     /* Fail-stop */
> >>>>>     fail:   PRINT("- Boot failed -\r\n")
> >>>>> diff --git a/xen/arch/arm/arm64/head_mpu.S
> >>>>> b/xen/arch/arm/arm64/head_mpu.S index e2ac69b0cc..72d1e0863d
> >>>> 100644
> >>>>> --- a/xen/arch/arm/arm64/head_mpu.S
> >>>>> +++ b/xen/arch/arm/arm64/head_mpu.S
> >>>>> @@ -18,8 +18,10 @@
> >>>>>     #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
> >>>>>     #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
> >>>>>     #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
> >>>>> +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
> >>>>>
> >>>>>     #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1
> */
> >>>>> +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
> >>>>>
> >>>>>     /*
> >>>>>      * Macro to round up the section address to be PAGE_SIZE
> >>>>> aligned @@
> >>>>> -334,6 +336,56 @@ ENTRY(enable_mm)
> >>>>>         ret
> >>>>>     ENDPROC(enable_mm)
> >>>>>
> >>>>> +/*
> >>>>> + * Map the early UART with a new transient MPU memory region.
> >>>>> + *
> >>>>
> >>>> Missing "Inputs: "
> >>>>
> >>>>> + * x27: region selector
> >>>>> + * x28: prbar
> >>>>> + * x29: prlar
> >>>>> + *
> >>>>> + * Clobbers x0 - x4
> >>>>> + *
> >>>>> + */
> >>>>> +ENTRY(setup_early_uart)
> >>>>> +#ifdef CONFIG_EARLY_PRINTK
> >>>>> +    /* stack LR as write_pr will be called later like nested function */
> >>>>> +    mov   x3, lr
> >>>>> +
> >>>>> +    /*
> >>>>> +     * MPU region for early UART is a transient region, since it will be
> >>>>> +     * replaced by specific device memory layout when FDT gets
> parsed.
> >>>>
> >>>> I would rather not mention "FDT" here because this code is
> >>>> independent to the firmware table used.
> >>>>
> >>>> However, any reason to use a transient region rather than the one
> >>>> that will be used for the UART driver?
> >>>>
> >>>
> >>> We don’t want to define a MPU region for each device driver. It will
> >>> exhaust MPU regions very quickly.
> >> What the usual size of an MPU?
> >>
> >> However, even if you don't want to define one for every device, it
> >> still seem to be sensible to define a fixed temporary one for the
> >> early UART as this would simplify the assembly code.
> >>
> >
> > We will add fixed MPU regions for Xen static heap in function setup_mm.
> > If we put early uart region in front(fixed region place), it will
> > leave holes later after removing it.
> 
> Why? The entry could be re-used to map the devices entry.
> 
> >
> >>
> >>> In commit " [PATCH v2 28/40] xen/mpu: map boot module section in
> MPU
> >>> system",
> >>
> >> Did you mean patch #27?
> >>
> >>> A new FDT property `mpu,device-memory-section` will be introduced
> >>> for users to statically configure the whole system device memory
> >>> with the
> >> least number of memory regions in Device Tree.
> >>> This section shall cover all devices that will be used in Xen, like
> >>> `UART`,
> >> `GIC`, etc.
> >>> For FVP_BaseR_AEMv8R, we have the following definition:
> >>> ```
> >>> mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>; ```
> >>
> >> I am a bit worry this will be a recipe for mistake. Do you have an
> >> example where the MPU will be exhausted if we reserve some entries
> >> for each device (or some)?
> >>
> >
> > Yes, we have internal platform where MPU regions are only 16.
> 
> Internal is in silicon (e.g. real) or virtual platform?
> 

Sorry, that we have met this kind of platform is all I'm allowed to say.
Due to NDA, I can't tell more.

> >  It will almost eat up
> > all MPU regions based on current implementation, when launching two
> guests in platform.
> >
> > Let's calculate the most simple scenario:
> > The following is MPU-related static configuration in device tree:
> > ```
> >          mpu,boot-module-section = <0x0 0x10000000 0x0 0x10000000>;
> >          mpu,guest-memory-section = <0x0 0x20000000 0x0 0x40000000>;
> >          mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;
> >          mpu,shared-memory-section = <0x0 0x7a000000 0x0 0x02000000>;
> >
> >          xen,static-heap = <0x0 0x60000000 0x0 0x1a000000>;
> > ```
> > At the end of the boot, before reshuffling, the MPU region usage will be as follows:
> > 7 (defined in assembly) + FDT (early_fdt_map) + 5 (at least one region for each "mpu,xxx-memory-section").
> 
> Can you list the 7 sections? Is it including the init section?
> 

Yes, I'll draw the layout for you:
 Xen MPU Map before reorg:

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
xen_mpumap[5] : Xen static heap
......
xen_mpumap[max_xen_mpumap - 7]: Static shared memory section
xen_mpumap[max_xen_mpumap - 6]: Boot Module memory section(kernel, initramfs, etc)
xen_mpumap[max_xen_mpumap - 5]: Device memory section
xen_mpumap[max_xen_mpumap - 4]: Guest memory section
xen_mpumap[max_xen_mpumap - 3]: Early FDT
xen_mpumap[max_xen_mpumap - 2]: Xen init data
xen_mpumap[max_xen_mpumap - 1]: Xen init text

At the end of boot, the function init_done will do the reorg and the boot-only region clean-up:

Xen MPU Map after reorg (idle vcpu):

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
xen_mpumap[5] : Xen static heap
xen_mpumap[6] : Guest memory section
xen_mpumap[7] : Device memory section
xen_mpumap[8] : Static shared memory section

Xen MPU Map at runtime (guest vcpu):

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
xen_mpumap[5] : Xen static heap
xen_mpumap[6] : Guest memory
xen_mpumap[7] : vGIC map
xen_mpumap[8] : vPL011 map
xen_mpumap[9] : Passthrough device map(UART, etc)
xen_mpumap[10] : Static shared memory section

> >
> > That will be already at least 13 MPU regions ;\.
> 
> The section I am the most concern of is mpu,device-memory-section
> because it would likely mean that all the devices will be mapped in Xen.
> Is there any risk that the guest may use different memory attribute?
> 

Yes. In the current implementation, the per-domain vGIC, vPL011, and passthrough device maps
will be individually added to the per-domain P2M mapping; then, when switching from the Xen
idle vCPU into a guest vCPU, the device memory section will be replaced by the vGIC, vPL011,
and passthrough device maps.

> On the platform you are describing, what are the devices you are expected
> to be used by Xen?
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-31  4:11             ` Penny Zheng
@ 2023-01-31  9:27               ` Julien Grall
  2023-02-01  5:39                 ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-31  9:27 UTC (permalink / raw)
  To: Penny Zheng, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk



On 31/01/2023 04:11, Penny Zheng wrote:
> Hi Julien
> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Monday, January 30, 2023 5:40 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
>> memory region map
>>
>> Hi Penny,
>>
>> On 30/01/2023 05:45, Penny Zheng wrote:
>>>    There are three types of MPU regions during boot-time:
>>> 1. Fixed MPU region
>>> Regions like Xen text section, Xen heap section, etc.
>>> 2. Boot-only MPU region
>>> Regions like Xen init sections, etc. It will be removed at the end of booting.
>>> 3.   Regions need switching in/out during vcpu context switch
>>> Region like system device memory map.
>>> For example, for FVP_BaseR_AEMv8R, we have [0x80000000, 0xfffff000) as
>>> the whole system device memory map for Xen(idle vcpu) in EL2,  when
>>> context switching to guest vcpu, it shall be replaced with
>>> guest-specific device mapping, like vgic, vpl011, passthrough device, etc.
>>>
>>> We don't have two mappings for different stage translations in MPU, like
>> we had in MMU.
>>> Xen stage 1 EL2 mapping and stage 2 mapping are both sharing one MPU
>>> memory mapping(xen_mpumap) So to save the trouble of hunting down
>> each
>>> switching regions in time-sensitive context switch, we must re-order
>> xen_mpumap to keep fixed regions in the front, and switching ones in the
>> heels of them.
>>
>>   From my understanding, hunting down each switching regions would be a
>> matter to loop over a bitmap. There will be a slight increase in the number
>> of instructions executed, but I don't believe it will be noticeable.
>>
>>>
>>> In Patch Serie v1, I was adding MPU regions in sequence,  and I
>>> introduced a set of bitmaps to record the location of same type
>>> regions. At the end of booting, I need to *disable* MPU to do the
>> reshuffling, as I can't move regions like xen heap while MPU on.
>>>
>>> And we discussed that it is too risky to disable MPU, and you
>>> suggested [1] "
>>>> You should not need any reorg if you map the boot-only section
>>>> towards in the higher slot index (or just after the fixed ones).
>>> "
>>
>> Right, looking at the new code. I realize that this was probably a bad idea
>> because we are adding complexity in the assembly code.
>>
>>>
>>> Maybe in assembly, we know exactly how many fixed regions are,
>>> boot-only regions are, but in C codes, we parse FDT to get static
>> configuration, like we don't know how many fixed regions for xen static
>> heap is enough.
>>> Approximation is not suggested in MPU system with limited MPU regions,
>>> some platform may only have 16 MPU regions, IMHO, it is not worthy
>> wasting in approximation.
>>
>> I haven't suggested to use approximation anywhere here. I will answer
>> about the limited number of entries in the other thread.
>>
>>>
>>> So I take the suggestion of putting regions in the higher slot index.
>>> Putting fixed regions in the front, and putting boot-only and
>>> switching ones at tail. Then, at the end of booting, when we reorg the
>> mpu mapping, we remove all boot-only regions, and for switching ones, we
>> disable-relocate(after fixed ones)-enable them. Specific codes in [2].
>>
>>   From this discussion, it feels to me that you are trying to make the code
>> more complicated just to keep the split and save a few cycles (see above).
>>
>> I would suggest to investigate the cost of "hunting down each section".
>> Depending on the result, we can discuss what the best approach.
>>
> 
> Correct me if I'm wrong, the complicated things in assembly you are worried about
> is that we couldn't define the index for initial sections, no hardcoded to keep simple.

Correct.

> And function write_pr, ik, is really a big chunk of codes, however the logic is simple there,
> just a bunch of "switch-cases".

I agree that write_pr() is a bunch of switch-cases. But there is a lot 
of duplication in it and the interface to use it is, IMHO, not intuitive.

> 
> If we are adding MPU regions in sequence as you suggested, while using bitmap at the
> same time to record used entry.
> TBH, this is how I designed at the very beginning internally. We found that if we don't
> do reorg late-boot to keep fixed in front and switching ones after, each time when we
> do vcpu context switch, not only we need to hunt down switching ones to disable,
> while we add new switch-in regions, using bitmap to find free entry is saying that the
> process is unpredictable. Uncertainty is what we want to avoid in Armv8-R architecture.

I don't understand why it would be unpredictable. For a given 
combination of platform/device-tree, the bitmap will always look the 
same. So the number of cycles/instructions will always be the same.

This is not very different from the case where you split the MPU in two 
because

> 
> Hmmm, TBH, I really really like your suggestion to put boot-only/switching regions into
> higher slot. It really saved a lot trouble in late-init reorg and also avoids disabling MPU
> at the same time. The split is a simple and easy-to-understand construction compared
> with bitmap too.

I would like to propose another split. I will reply to that in the 
thread where you provided the MPU layout.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-31  5:38             ` Penny Zheng
@ 2023-01-31  9:41               ` Julien Grall
  2023-02-01  5:36                 ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-01-31  9:41 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 31/01/2023 05:38, Penny Zheng wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Monday, January 30, 2023 6:00 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>> setup_early_uart to map early UART
>>
>>
>>
>> On 30/01/2023 06:24, Penny Zheng wrote:
>>> Hi, Julien
>>
>> Hi Penny,
>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: Sunday, January 29, 2023 3:43 PM
>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
>> devel@lists.xenproject.org
>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>> <sstabellini@kernel.org>; Bertrand Marquis
>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
>>>> <Volodymyr_Babchuk@epam.com>
>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>>>> setup_early_uart to map early UART
>>>>
>>>> Hi Penny,
>>>>
>>>> On 29/01/2023 06:17, Penny Zheng wrote:
>>>>>> -----Original Message-----
>>>>>> From: Julien Grall <julien@xen.org>
>>>>>> Sent: Wednesday, January 25, 2023 3:09 AM
>>>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
>>>> devel@lists.xenproject.org
>>>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>>>> <sstabellini@kernel.org>; Bertrand Marquis
>>>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
>>>>>> <Volodymyr_Babchuk@epam.com>
>>>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>>>>>> setup_early_uart to map early UART
>>>>>>
>>>>>> Hi Peny,
>>>>>
>>>>> Hi Julien,
>>>>>
>>>>>>
>>>>>> On 13/01/2023 05:28, Penny Zheng wrote:
>>>>>>> In MMU system, we map the UART in the fixmap (when earlyprintk is
>>>> used).
>>>>>>> However in MPU system, we map the UART with a transient MPU
>>>> memory
>>>>>>> region.
>>>>>>>
>>>>>>> So we introduce a new unified function setup_early_uart to replace
>>>>>>> the previous setup_fixmap.
>>>>>>>
>>>>>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>>>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>>>>>> ---
>>>>>>>      xen/arch/arm/arm64/head.S               |  2 +-
>>>>>>>      xen/arch/arm/arm64/head_mmu.S           |  4 +-
>>>>>>>      xen/arch/arm/arm64/head_mpu.S           | 52
>>>>>> +++++++++++++++++++++++++
>>>>>>>      xen/arch/arm/include/asm/early_printk.h |  1 +
>>>>>>>      4 files changed, 56 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/xen/arch/arm/arm64/head.S
>> b/xen/arch/arm/arm64/head.S
>>>>>>> index 7f3f973468..a92883319d 100644
>>>>>>> --- a/xen/arch/arm/arm64/head.S
>>>>>>> +++ b/xen/arch/arm/arm64/head.S
>>>>>>> @@ -272,7 +272,7 @@ primary_switched:
>>>>>>>               * afterwards.
>>>>>>>               */
>>>>>>>              bl    remove_identity_mapping
>>>>>>> -        bl    setup_fixmap
>>>>>>> +        bl    setup_early_uart
>>>>>>>      #ifdef CONFIG_EARLY_PRINTK
>>>>>>>              /* Use a virtual address to access the UART. */
>>>>>>>              ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
>>>>>>> diff --git a/xen/arch/arm/arm64/head_mmu.S
>>>>>>> b/xen/arch/arm/arm64/head_mmu.S index b59c40495f..a19b7c873d
>>>>>> 100644
>>>>>>> --- a/xen/arch/arm/arm64/head_mmu.S
>>>>>>> +++ b/xen/arch/arm/arm64/head_mmu.S
>>>>>>> @@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
>>>>>>>       *
>>>>>>>       * Clobbers x0 - x3
>>>>>>>       */
>>>>>>> -ENTRY(setup_fixmap)
>>>>>>> +ENTRY(setup_early_uart)
>>>>>>
>>>>>> This function is doing more than enable the early UART. It also
>>>>>> setups the fixmap even earlyprintk is not configured.
>>>>>
>>>>> True, true.
>>>>> I've thoroughly read the MMU implementation of setup_fixmap, and
>>>>> I'll try to split it up.
>>>>>
>>>>>>
>>>>>> I am not entirely sure what could be the name. Maybe this needs to
>>>>>> be split further.
>>>>>>
>>>>>>>      #ifdef CONFIG_EARLY_PRINTK
>>>>>>>              /* Add UART to the fixmap table */
>>>>>>>              ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
>>>>>>> @@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
>>>>>>>              dsb   nshst
>>>>>>>
>>>>>>>              ret
>>>>>>> -ENDPROC(setup_fixmap)
>>>>>>> +ENDPROC(setup_early_uart)
>>>>>>>
>>>>>>>      /* Fail-stop */
>>>>>>>      fail:   PRINT("- Boot failed -\r\n")
>>>>>>> diff --git a/xen/arch/arm/arm64/head_mpu.S
>>>>>>> b/xen/arch/arm/arm64/head_mpu.S index e2ac69b0cc..72d1e0863d
>>>>>> 100644
>>>>>>> --- a/xen/arch/arm/arm64/head_mpu.S
>>>>>>> +++ b/xen/arch/arm/arm64/head_mpu.S
>>>>>>> @@ -18,8 +18,10 @@
>>>>>>>      #define REGION_TEXT_PRBAR       0x38    /* SH=11 AP=10 XN=00 */
>>>>>>>      #define REGION_RO_PRBAR         0x3A    /* SH=11 AP=10 XN=10 */
>>>>>>>      #define REGION_DATA_PRBAR       0x32    /* SH=11 AP=00 XN=10 */
>>>>>>> +#define REGION_DEVICE_PRBAR     0x22    /* SH=10 AP=00 XN=10 */
>>>>>>>
>>>>>>>      #define REGION_NORMAL_PRLAR     0x0f    /* NS=0 ATTR=111 EN=1
>> */
>>>>>>> +#define REGION_DEVICE_PRLAR     0x09    /* NS=0 ATTR=100 EN=1 */
>>>>>>>
>>>>>>>      /*
>>>>>>>       * Macro to round up the section address to be PAGE_SIZE
>>>>>>> aligned @@
>>>>>>> -334,6 +336,56 @@ ENTRY(enable_mm)
>>>>>>>          ret
>>>>>>>      ENDPROC(enable_mm)
>>>>>>>
>>>>>>> +/*
>>>>>>> + * Map the early UART with a new transient MPU memory region.
>>>>>>> + *
>>>>>>
>>>>>> Missing "Inputs: "
>>>>>>
>>>>>>> + * x27: region selector
>>>>>>> + * x28: prbar
>>>>>>> + * x29: prlar
>>>>>>> + *
>>>>>>> + * Clobbers x0 - x4
>>>>>>> + *
>>>>>>> + */
>>>>>>> +ENTRY(setup_early_uart)
>>>>>>> +#ifdef CONFIG_EARLY_PRINTK
>>>>>>> +    /* stack LR as write_pr will be called later like nested function */
>>>>>>> +    mov   x3, lr
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * MPU region for early UART is a transient region, since it will be
>>>>>>> +     * replaced by specific device memory layout when FDT gets
>> parsed.
>>>>>>
>>>>>> I would rather not mention "FDT" here because this code is
>>>>>> independent to the firmware table used.
>>>>>>
>>>>>> However, any reason to use a transient region rather than the one
>>>>>> that will be used for the UART driver?
>>>>>>
>>>>>
>>>>> We don’t want to define a MPU region for each device driver. It will
>>>>> exhaust MPU regions very quickly.
>>>> What the usual size of an MPU?
>>>>
>>>> However, even if you don't want to define one for every device, it
>>>> still seem to be sensible to define a fixed temporary one for the
>>>> early UART as this would simplify the assembly code.
>>>>
>>>
>>> We will add fixed MPU regions for Xen static heap in function setup_mm.
>>> If we put early uart region in front(fixed region place), it will
>>> leave holes later after removing it.
>>
>> Why? The entry could be re-used to map the devices entry.
>>
>>>
>>>>
>>>>> In commit " [PATCH v2 28/40] xen/mpu: map boot module section in
>> MPU
>>>>> system",
>>>>
>>>> Did you mean patch #27?
>>>>
>>>>> A new FDT property `mpu,device-memory-section` will be introduced
>>>>> for users to statically configure the whole system device memory
>>>>> with the
>>>> least number of memory regions in Device Tree.
>>>>> This section shall cover all devices that will be used in Xen, like
>>>>> `UART`,
>>>> `GIC`, etc.
>>>>> For FVP_BaseR_AEMv8R, we have the following definition:
>>>>> ```
>>>>> mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>; ```
>>>>
>>>> I am a bit worry this will be a recipe for mistake. Do you have an
>>>> example where the MPU will be exhausted if we reserve some entries
>>>> for each device (or some)?
>>>>
>>>
>>> Yes, we have internal platform where MPU regions are only 16.
>>
>> Internal is in silicon (e.g. real) or virtual platform?
>>
> 
> Sorry, we met this kind of type platform is all I'm allowed to say.
> Due to NDA, I couldn’t tell more.
> 
>>>   It will almost eat up
>>> all MPU regions based on current implementation, when launching two
>> guests in platform.
>>>
>>> Let's calculate the most simple scenario:
>>> The following is MPU-related static configuration in device tree:
>>> ```
>>>           mpu,boot-module-section = <0x0 0x10000000 0x0 0x10000000>;
>>>           mpu,guest-memory-section = <0x0 0x20000000 0x0 0x40000000>;
>>>           mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;
>>>           mpu,shared-memory-section = <0x0 0x7a000000 0x0 0x02000000>;
>>>
>>>           xen,static-heap = <0x0 0x60000000 0x0 0x1a000000>; ``` At the
>>> end of the boot, before reshuffling, the MPU region usage will be as
>> follows:
>>> 7 (defined in assembly) + FDT(early_fdt_map) + 5 (at least one region for
>> each "mpu,xxx-memory-section").
>>
>> Can you list the 7 sections? Is it including the init section?
>>
> 
> Yes, I'll draw the layout for you:

Thanks!

> '''
>   Xen MPU Map before reorg:
> 
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data
> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> xen_mpumap[5] : Xen static heap
> ......
> xen_mpumap[max_xen_mpumap - 7]: Static shared memory section
> xen_mpumap[max_xen_mpumap - 6]: Boot Module memory section(kernel, initramfs, etc)
> xen_mpumap[max_xen_mpumap - 5]: Device memory section
> xen_mpumap[max_xen_mpumap - 4]: Guest memory section
> xen_mpumap[max_xen_mpumap - 3]: Early FDT
> xen_mpumap[max_xen_mpumap - 2]: Xen init data
> xen_mpumap[max_xen_mpumap - 1]: Xen init text
> 
> In the end of boot, function init_done will do the reorg and boot-only region clean-up:
> 
> Xen MPU Map after reorg(idle vcpu):
> 
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data

In theory 1 and 2 could be merged after boot. But I guess it might be 
complicated?

> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> xen_mpumap[5] : Xen static heap
> xen_mpumap[6] : Guest memory section

Why do you need to map the "Guest memory section" for the idle vCPU?

> xen_mpumap[7] : Device memory section

I might be missing some context here. But why this section is not also 
mapped in the context of the guest vCPU?

For instance, how would you write to the serial console when the context 
is the guest vCPU?

> xen_mpumap[6] : Static shared memory section
> 
> Xen MPU Map on runtime(guest vcpu):
> 
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data
> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> xen_mpumap[5] : Xen static heap
> xen_mpumap[6] : Guest memory
> xen_mpumap[7] : vGIC map
> xen_mpumap[8] : vPL011 map

I was expecting the PL011 to be fully emulated. So why is this necessary?

> xen_mpumap[9] : Passthrough device map(UART, etc)
> xen_mpumap[10] : Static shared memory section
> 
>>>
>>> That will be already at least 13 MPU regions ;\.
>>
>> The section I am the most concern of is mpu,device-memory-section
>> because it would likely mean that all the devices will be mapped in Xen.
>> Is there any risk that the guest may use different memory attribute?
>>
> 
> Yes, on current implementation, per-domain vgic, vpl011, and passthrough device map
> will be individually added into per-domain P2M mapping, then when switching into guest
> vcpu from xen idle vcpu, device memory section will be replaced by vgic, vpl011, passthrough
> device map.

Per above, I am not entirely sure how you could remove the device memory 
section when using the guest vCPU.

Now about the layout between init and runtime. From the previous discussion, 
you said you didn't want the init sections to be fixed because of the 
"Xen static heap" section.

Furthermore, you also mentioned that you didn't want a bitmap. So how 
about the following for the assembly part:

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
xen_mpumap[5]: Early FDT
xen_mpumap[6]: Xen init data
xen_mpumap[7]: Xen init text
xen_mpumap[8]: Early UART (optional)

Then when you switch to C, you could have:

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
xen_mpumap[5]: Early FDT
xen_mpumap[6]: Xen init data
xen_mpumap[7]: Xen init text

xen_mpumap[max_xen_mpumap - 4]: Device memory section
xen_mpumap[max_xen_mpumap - 3]: Guest memory section
xen_mpumap[max_xen_mpumap - 2]: Static shared memory section
xen_mpumap[max_xen_mpumap - 1] : Xen static heap

And at runtime, you would keep the "Xen static heap" right at the end of 
the MPU and keep the middle entries as the switchable one.

There would be no bitmap with this solution and all the entries for the 
assembly code would be fixed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-01-31  9:41               ` Julien Grall
@ 2023-02-01  5:36                 ` Penny Zheng
  2023-02-01 19:26                   ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-02-01  5:36 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Tuesday, January 31, 2023 5:42 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> setup_early_uart to map early UART
> 
> Hi Penny,
> 
> On 31/01/2023 05:38, Penny Zheng wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Monday, January 30, 2023 6:00 PM
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> >> setup_early_uart to map early UART
> >>
> >>
> >>
> >> On 30/01/2023 06:24, Penny Zheng wrote:
> >>> Hi, Julien
> >>
> >> Hi Penny,
> >>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: Sunday, January 29, 2023 3:43 PM
> >>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> >> devel@lists.xenproject.org
> >>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >>>> <sstabellini@kernel.org>; Bertrand Marquis
> >>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >>>> <Volodymyr_Babchuk@epam.com>
> >>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> >>>> setup_early_uart to map early UART
> >>>>
> >>>> Hi Penny,
> >>>>
> >>>> On 29/01/2023 06:17, Penny Zheng wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Julien Grall <julien@xen.org>
> >>>>>> Sent: Wednesday, January 25, 2023 3:09 AM
> >>>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> >>>> devel@lists.xenproject.org
> >>>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >>>>>> <sstabellini@kernel.org>; Bertrand Marquis
> >>>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >>>>>> <Volodymyr_Babchuk@epam.com>
> >>>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> >>>>>> setup_early_uart to map early UART
> >>>>>>
> >>>>>> Hi Peny,
> >>>>>
> >>>>> Hi Julien,
> >>>>>
> >>>>>>
> >>>>>> On 13/01/2023 05:28, Penny Zheng wrote:
> >>>>>>> In MMU system, we map the UART in the fixmap (when earlyprintk
> >>>>>>> is
> >>>> used).
> >>>>>>> However in MPU system, we map the UART with a transient MPU
> >>>> memory
> >>>>>>> region.
> >>>>>>>
> >>>>>>> So we introduce a new unified function setup_early_uart to
> >>>>>>> replace the previous setup_fixmap.
> >>>>>>>
> >>>>>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> >>>>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>>>>>> ---
> >>>>>>>      xen/arch/arm/arm64/head.S               |  2 +-
> >>>>>>>      xen/arch/arm/arm64/head_mmu.S           |  4 +-
> >>>>>>>      xen/arch/arm/arm64/head_mpu.S           | 52
> >>>>>> +++++++++++++++++++++++++
> >>>>>>>      xen/arch/arm/include/asm/early_printk.h |  1 +
> >>>>>>>      4 files changed, 56 insertions(+), 3 deletions(-)
> >>>>>>>
> > Yes, I'll draw the layout for you:
> 
> Thanks!
> 
> > '''
> >   Xen MPU Map before reorg:
> >
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen
> > read-write data xen_mpumap[4] : Xen BSS xen_mpumap[5] : Xen static
> > heap ......
> > xen_mpumap[max_xen_mpumap - 7]: Static shared memory section
> > xen_mpumap[max_xen_mpumap - 6]: Boot Module memory
> section(kernel,
> > initramfs, etc) xen_mpumap[max_xen_mpumap - 5]: Device memory
> section
> > xen_mpumap[max_xen_mpumap - 4]: Guest memory section
> > xen_mpumap[max_xen_mpumap - 3]: Early FDT
> xen_mpumap[max_xen_mpumap -
> > 2]: Xen init data xen_mpumap[max_xen_mpumap - 1]: Xen init text
> >
> > In the end of boot, function init_done will do the reorg and boot-only
> region clean-up:
> >
> > Xen MPU Map after reorg(idle vcpu):
> >
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data
> 
> In theory 1 and 2 could be merged after boot. But I guess it might be
> complicated?
> 

In theory, as long as the merging code in C does not touch any read-only data or
read-only-after-init data, then, I guess, it will be OK.
On an MPU system, the merging code needs to disable regions 1 and 2 first and only enable the
merged region afterwards, because two MPU regions with overlapping addresses are not allowed
while the MPU is on.
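For what it's worth, a minimal sketch of what that merge could look like is below. The helpers
disable_mpu_region()/set_mpu_region(), the xen_mpumap[] entry layout (base/limit fields) and the
REGION_RO_ATTR constant are placeholders for illustration, not the actual API of this series:

```
/*
 * Sketch only: merge xen_mpumap[1] (read-only data) and xen_mpumap[2]
 * (read-only-after-init data) into one region after boot.
 * Helper names and the entry layout are assumptions.
 * Note: this function must not itself touch read-only(-after-init) data.
 */
static void __init merge_ro_regions(void)
{
    paddr_t base  = xen_mpumap[1].base;   /* start of read-only data */
    paddr_t limit = xen_mpumap[2].limit;  /* end of read-only-after-init data */

    /*
     * Overlapping regions are forbidden while the MPU is on, so both
     * source regions must be disabled before the merged one is enabled.
     */
    disable_mpu_region(1);
    disable_mpu_region(2);

    set_mpu_region(1, base, limit, REGION_RO_ATTR); /* hypothetical attribute */
}
```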
 
> > xen_mpumap[3] : Xen read-write data
> > xen_mpumap[4] : Xen BSS
> > xen_mpumap[5] : Xen static heap
> > xen_mpumap[6] : Guest memory section
> 
> Why do you need to map the "Guest memory section" for the idle vCPU?
> 

Hmmm, "Guest memory section" here refers to *ALL* guest RAM address range with only EL2 read/write access.

For guest vcpu, this section will be replaced by guest itself own RAM with both EL1/EL2 access.


> > xen_mpumap[7] : Device memory section
> 
> I might be missing some context here. But why this section is not also
> mapped in the context of the guest vCPU?
> 
> For instance, how would you write to the serial console when the context is
> the guest vCPU?
> 

I think Xen itself shall have access to all system device memory at EL2.
I know this is not exactly what the current MMU implementation does, where only devices with a
supported driver get ioremapped.

But as we discussed before, if we followed the same strategy as the MMU code, then with limited
MPU regions we could not afford to map an MPU region for each device.
For example, on the FVP_BaseR_AEMv8R model we have four UARTs and a GICv3. At most, we may provide
four MPU regions for the UARTs, and two MPU regions for the Distributor and one Redistributor region.
So I came up with this new device tree property "mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;"
to roughly map all system device memory for Xen itself.

A guest shall only see the vGIC, vPL011, and its own passthrough devices. Here, to maintain safety and
isolation, we will map one MPU region per device for the guest vCPU.
For example, the vGIC and vPL011 are emulated and direct-mapped in the MPU. The relevant device
mappings (GFN == MFN, with EL2-only access) will be added to the domain's *P2M mapping table*, in vgic_v3_domain_init [1].

Later, on vCPU context switch, when switching away from the idle vCPU the device memory section gets disabled
and switched out in ctxt_switch_from [2]; then, when switching into a guest vCPU, the vGIC and vPL011 device
mappings will be switched in along with the whole P2M mapping table [3].

Words might be ambiguous, but all the related code is in MPU patch series part II (guest initialization), so you may
want to check the GitLab links (a rough sketch of the idea follows them):
[1] https://gitlab.com/xen-project/people/weic/xen/-/commit/a51d5b25eb17a50a36b27987a2f48e14793ac585 
[2] https://gitlab.com/xen-project/people/weic/xen/-/commit/c6a069d777d9407aeda42b7e5b08a086a1c15976 
[3] https://gitlab.com/xen-project/people/weic/xen/-/commit/d8c6408b6eef1190d75c9bd4e58557d34fc8b4df 
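
Roughly, the idea is something like the sketch below. The helper names disable_mpu_region(),
enable_mpu_region() and p2m_load_mpu_regions(), and the DEVICE_MEMORY_SECTION_IDX index, are
placeholders for illustration, not the actual code in the commits above:

```
/*
 * Sketch only: swap the Xen-wide device memory section for the
 * per-domain device mappings on context switch. All helpers used
 * here are hypothetical.
 */
static void ctxt_switch_from(struct vcpu *prev)
{
    if ( is_idle_vcpu(prev) )
        /* Xen's global device memory section is only live for the idle vCPU. */
        disable_mpu_region(DEVICE_MEMORY_SECTION_IDX);
}

static void ctxt_switch_to(struct vcpu *next)
{
    if ( is_idle_vcpu(next) )
        enable_mpu_region(DEVICE_MEMORY_SECTION_IDX);
    else
        /*
         * Load the guest's P2M regions: its RAM, vGIC, vPL011 and
         * passthrough device mappings.
         */
        p2m_load_mpu_regions(next->domain);
}
```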

> > xen_mpumap[6] : Static shared memory section
> >
> > Xen MPU Map on runtime(guest vcpu):
> >
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen
> > read-write data xen_mpumap[4] : Xen BSS xen_mpumap[5] : Xen static
> > heap xen_mpumap[6] : Guest memory xen_mpumap[7] : vGIC map
> > xen_mpumap[8] : vPL011 map
> 
> I was expected the PL011 to be fully emulated. So why is this necessary?
> 
> > xen_mpumap[9] : Passthrough device map(UART, etc) xen_mpumap[10] :
> > Static shared memory section
> >
> >>>
> >>> That will be already at least 13 MPU regions ;\.
> >>
> >> The section I am the most concern of is mpu,device-memory-section
> >> because it would likely mean that all the devices will be mapped in Xen.
> >> Is there any risk that the guest may use different memory attribute?
> >>
> >
> > Yes, on current implementation, per-domain vgic, vpl011, and
> > passthrough device map will be individually added into per-domain P2M
> > mapping, then when switching into guest vcpu from xen idle vcpu,
> > device memory section will be replaced by vgic, vpl011, passthrough
> device map.
> 
> Per above, I am not entirely sure how you could remove the device memory
> section when using the guest vCPU.
> 
> Now about the layout between init and runtime. From previous discussion,
> you said you didn't want to have init section to be fixed because of the
> section "Xen static heap".
> 
> Furthermore, you also mention that you didn't want a bitmap. So how
> about the following for the assembly part:
> 
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen read-
> write data xen_mpumap[4] : Xen BSS
> xen_mpumap[5]: Early FDT
> xen_mpumap[6]: Xen init data
> xen_mpumap[7]: Xen init text
> xen_mpumap[8]: Early UART (optional)
> 
> Then when you switch to C, you could have:
> 
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen read-
> write data xen_mpumap[4] : Xen BSS
> xen_mpumap[5]: Early FDT
> xen_mpumap[6]: Xen init data
> xen_mpumap[7]: Xen init text
> 
> xen_mpumap[max_xen_mpumap - 4]: Device memory section
> xen_mpumap[max_xen_mpumap - 3]: Guest memory section
> xen_mpumap[max_xen_mpumap - 2]: Static shared memory section
> xen_mpumap[max_xen_mpumap - 1] : Xen static heap
> 
> And at runtime, you would keep the "Xen static heap" right at the end of
> the MPU and keep the middle entries as the switchable one.
> 
> There would be not bitmap with this solution and all the entries for the
> assembly code would be fixed.
> 

That's really smart! Thanks!

I've made a small twist on your design: I've put all the switchable regions in the middle,
right after the regions defined in assembly. If they sat next to the Xen heap region, I'm afraid that if
someone introduced another fixed region in C one day, we would leave holes there.

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
( Fixed MPU region defined in assembly )
--------------------------------------------------------------------------
xen_mpumap[5]: Xen init data
xen_mpumap[6]: Xen init text
xen_mpumap[7]: Early FDT
xen_mpumap[8]: Guest memory section
xen_mpumap[9]: Device memory section
xen_mpumap[10]: Static shared memory section
( boot-only and switching regions defined in C )
--------------------------------------------------------------------------
...
xen_mpumap[max_xen_mpumap - 1] : Xen static heap
( Fixed MPU region defined in C )
--------------------------------------------------------------------------

After re-org:
xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
( Fixed MPU region defined in assembly )
--------------------------------------------------------------------------
xen_mpumap[8]: Guest memory section
xen_mpumap[9]: Device memory section
xen_mpumap[10]: Static shared memory section
( Switching region )
--------------------------------------------------------------------------
...
xen_mpumap[max_xen_mpumap - 1] : Xen static heap
( Fixed MPU region defined in C )

If you're fine with it, then in the next series I'll use this layout, to keep both
the assembly and the re-org process simple.

> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-01-31  9:27               ` Julien Grall
@ 2023-02-01  5:39                 ` Penny Zheng
  2023-02-01 18:56                   ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-02-01  5:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Tuesday, January 31, 2023 5:28 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
> Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> ayan.kumar.halder@xilinx.com
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> 
> 
> On 31/01/2023 04:11, Penny Zheng wrote:
> > Hi Julien
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Monday, January 30, 2023 5:40 PM
> >> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
> devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> >> memory region map
> >>
> >> Hi Penny,
> >>
[...]
> >>
> >> I would suggest to investigate the cost of "hunting down each section".
> >> Depending on the result, we can discuss what the best approach.
> >>
> >
> > Correct me if I'm wrong, the complicated things in assembly you are
> > worried about is that we couldn't define the index for initial sections, no
> hardcoded to keep simple.
> 
> Correct.
> 
> > And function write_pr, ik, is really a big chunk of codes, however the
> > logic is simple there, just a bunch of "switch-cases".
> 
> I agree that write_pr() is a bunch of switch-cases. But there are a lot of
> duplication in it and the interface to use it is, IMHO, not intuitive.
> 
> >
> > If we are adding MPU regions in sequence as you suggested, while using
> > bitmap at the same time to record used entry.
> > TBH, this is how I designed at the very beginning internally. We found
> > that if we don't do reorg late-boot to keep fixed in front and
> > switching ones after, each time when we do vcpu context switch, not
> > only we need to hunt down switching ones to disable, while we add new
> > switch-in regions, using bitmap to find free entry is saying that the
> process is unpredictable. Uncertainty is what we want to avoid in Armv8-R
> architecture.
> 
> I don't understand why it would be unpredictable. For a given combination
> of platform/device-tree, the bitmap will always look the same. So the
> number of cycles/instructions will always be the same.
> 

At boot time it will always be the same. But if we still use a bitmap to find a free
entry (for the switching MPU regions) at runtime, hmmm, I thought that part would
be unpredictable.

> This is not very different from the case where you split the MPU in two
> because
> 
> >
> > Hmmm, TBH, I really really like your suggestion to put
> > boot-only/switching regions into higher slot. It really saved a lot
> > trouble in late-init reorg and also avoids disabling MPU at the same
> > time. The split is a simple and easy-to-understand construction compared
> with bitmap too.
> 
> I would like to propose another split. I will reply to that in the thread where
> you provided the MPU layout.
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-02-01  5:39                 ` Penny Zheng
@ 2023-02-01 18:56                   ` Julien Grall
  2023-02-02 10:53                     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-01 18:56 UTC (permalink / raw)
  To: Penny Zheng, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk

Hi Penny,

On 01/02/2023 05:39, Penny Zheng wrote:
>>> If we are adding MPU regions in sequence as you suggested, while using
>>> bitmap at the same time to record used entry.
>>> TBH, this is how I designed at the very beginning internally. We found
>>> that if we don't do reorg late-boot to keep fixed in front and
>>> switching ones after, each time when we do vcpu context switch, not
>>> only we need to hunt down switching ones to disable, while we add new
>>> switch-in regions, using bitmap to find free entry is saying that the
>> process is unpredictable. Uncertainty is what we want to avoid in Armv8-R
>> architecture.
>>
>> I don't understand why it would be unpredictable. For a given combination
>> of platform/device-tree, the bitmap will always look the same. So the
>> number of cycles/instructions will always be the same.
>>
> 
> In boot-time, it will be always the same. But if we still use bitmap to find free
> entry(for switching MPU regions) on runtime, hmmm, I thought this part will
> be unpredictable.

I know this point is now moot as we agreed on not using a bitmap but I 
wanted to answer on the unpredictability part.

It depends on whether you decide to allocate more entries at runtime. My 
assumption is you won't, and therefore the time to walk the bitmap 
will always be consistent.
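
Just for illustration, the kind of bitmap walk I have in mind is bounded like the sketch below.
Only find_first_zero_bit()/set_bit() are existing helpers; xen_mpumap_mask and MAX_MPU_REGIONS
are placeholders:

```
/*
 * Sketch only: pick a free MPU entry from a usage bitmap. For a fixed
 * platform/device-tree the bitmap contents, and hence the walk, are
 * the same on every call, so the cost is deterministic.
 */
static int alloc_mpu_entry(unsigned long *xen_mpumap_mask)
{
    unsigned int idx = find_first_zero_bit(xen_mpumap_mask, MAX_MPU_REGIONS);

    if ( idx >= MAX_MPU_REGIONS )
        return -ENOBUFS;

    set_bit(idx, xen_mpumap_mask);

    return idx;
}
```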

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-02-01  5:36                 ` Penny Zheng
@ 2023-02-01 19:26                   ` Julien Grall
  2023-02-02  8:05                     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-01 19:26 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk



On 01/02/2023 05:36, Penny Zheng wrote:
> Hi Julien

Hi Penny,

> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Tuesday, January 31, 2023 5:42 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>> setup_early_uart to map early UART
>>
>> Hi Penny,
>>
>> On 31/01/2023 05:38, Penny Zheng wrote:
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: Monday, January 30, 2023 6:00 PM
>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
>> devel@lists.xenproject.org
>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>> <sstabellini@kernel.org>; Bertrand Marquis
>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
>>>> <Volodymyr_Babchuk@epam.com>
>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>>>> setup_early_uart to map early UART
>>>>
>>>>
>>>>
>>>> On 30/01/2023 06:24, Penny Zheng wrote:
>>>>> Hi, Julien
>>>>
>>>> Hi Penny,
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Julien Grall <julien@xen.org>
>>>>>> Sent: Sunday, January 29, 2023 3:43 PM
>>>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
>>>> devel@lists.xenproject.org
>>>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>>>> <sstabellini@kernel.org>; Bertrand Marquis
>>>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
>>>>>> <Volodymyr_Babchuk@epam.com>
>>>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>>>>>> setup_early_uart to map early UART
>>>>>>
>>>>>> Hi Penny,
>>>>>>
>>>>>> On 29/01/2023 06:17, Penny Zheng wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Julien Grall <julien@xen.org>
>>>>>>>> Sent: Wednesday, January 25, 2023 3:09 AM
>>>>>>>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-
>>>>>> devel@lists.xenproject.org
>>>>>>>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>>>>>> <sstabellini@kernel.org>; Bertrand Marquis
>>>>>>>> <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
>>>>>>>> <Volodymyr_Babchuk@epam.com>
>>>>>>>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>>>>>>>> setup_early_uart to map early UART
>>>>>>>>
>>>>>>>> Hi Peny,
>>>>>>>
>>>>>>> Hi Julien,
>>>>>>>
>>>>>>>>
>>>>>>>> On 13/01/2023 05:28, Penny Zheng wrote:
>>>>>>>>> In MMU system, we map the UART in the fixmap (when earlyprintk
>>>>>>>>> is
>>>>>> used).
>>>>>>>>> However in MPU system, we map the UART with a transient MPU
>>>>>> memory
>>>>>>>>> region.
>>>>>>>>>
>>>>>>>>> So we introduce a new unified function setup_early_uart to
>>>>>>>>> replace the previous setup_fixmap.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>>>>>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>>>>>>>> ---
>>>>>>>>>       xen/arch/arm/arm64/head.S               |  2 +-
>>>>>>>>>       xen/arch/arm/arm64/head_mmu.S           |  4 +-
>>>>>>>>>       xen/arch/arm/arm64/head_mpu.S           | 52
>>>>>>>> +++++++++++++++++++++++++
>>>>>>>>>       xen/arch/arm/include/asm/early_printk.h |  1 +
>>>>>>>>>       4 files changed, 56 insertions(+), 3 deletions(-)
>>>>>>>>>
>>> Yes, I'll draw the layout for you:
>>
>> Thanks!
>>
>>> '''
>>>    Xen MPU Map before reorg:
>>>
>>> xen_mpumap[0] : Xen text
>>> xen_mpumap[1] : Xen read-only data
>>> xen_mpumap[2] : Xen read-only after init data xen_mpumap[3] : Xen
>>> read-write data xen_mpumap[4] : Xen BSS xen_mpumap[5] : Xen static
>>> heap ......
>>> xen_mpumap[max_xen_mpumap - 7]: Static shared memory section
>>> xen_mpumap[max_xen_mpumap - 6]: Boot Module memory
>> section(kernel,
>>> initramfs, etc) xen_mpumap[max_xen_mpumap - 5]: Device memory
>> section
>>> xen_mpumap[max_xen_mpumap - 4]: Guest memory section
>>> xen_mpumap[max_xen_mpumap - 3]: Early FDT
>> xen_mpumap[max_xen_mpumap -
>>> 2]: Xen init data xen_mpumap[max_xen_mpumap - 1]: Xen init text
>>>
>>> In the end of boot, function init_done will do the reorg and boot-only
>> region clean-up:
>>>
>>> Xen MPU Map after reorg(idle vcpu):
>>>
>>> xen_mpumap[0] : Xen text
>>> xen_mpumap[1] : Xen read-only data
>>> xen_mpumap[2] : Xen read-only after init data
>>
>> In theory 1 and 2 could be merged after boot. But I guess it might be
>> complicated?
>>
> 
> In theory, if in C merging codes, we do not use any read-only data or read-only-after-init
> data, then, ig, it will be ok.
> Since, In MPU system, when we implement C merging codes, we need to disable region 1 and 2
> firstly, and enable the merged region after. The reason is that two MPU regions with address overlapping
> is not allowed when MPU on.

Good to know! I think it should be feasible to avoid accessing read-only 
variables while doing the merge.

Anyway, this looks more like a potential optimization for the future.

>   
>>> xen_mpumap[3] : Xen read-write data
>>> xen_mpumap[4] : Xen BSS
>>> xen_mpumap[5] : Xen static heap
>>> xen_mpumap[6] : Guest memory section
>>
>> Why do you need to map the "Guest memory section" for the idle vCPU?
>>
> 
> Hmmm, "Guest memory section" here refers to *ALL* guest RAM address range with only EL2 read/write access.

For what purpose? Earlier, you said you had a setup with a limited 
number of MPU entries. So it may not be possible to map all the guests' RAM.

Xen should only need to access the guest memory in hypercalls and 
scrubbing. In both cases you could map/unmap on demand.

> 
> For guest vcpu, this section will be replaced by guest itself own RAM with both EL1/EL2 access.
> 
> 
>>> xen_mpumap[7] : Device memory section
>>
>> I might be missing some context here. But why this section is not also
>> mapped in the context of the guest vCPU?
>>
>> For instance, how would you write to the serial console when the context is
>> the guest vCPU?
>>
> 
> I think, as Xen itself, it shall have access to all system device memory on EL2.
> Ik, it is not accurate in current MMU implementation, only devices with supported driver
> will get ioremap.

So in the MMU case, we are not mapping all the devices in Xen because we 
don't exactly know which memory attributes will be used by the guest.

If we are using different attributes, then we risk breaking 
coherency. Could the same issue happen with the MPU?

If so, then you should not mapped those regions in Xen.

> 
> But like we discussed before, if following the same strategy as MMU does, with limited
> MPU regions, we could not afford mapping a MPU region for each device.
> For example, On FVPv8R model, we have four uarts, and a GICv3. At most, we may provide
> four MPU regions for uarts, and two MPU regions for Distributor and one Redistributor region.
> So, I thought up this new device tree property “mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;“
> to roughly map all system device memory for Xen itself.

Why do you say "roughly"? Is it possible that you have non-device region 
in the range?

> 
> For guest, it shall only see vgic, vpl011, and its own passthrough device. And here, to maintain safe and
> isolation, we will be mapping a MPU region for each device for guest vcpu.
> For example, vgic and vpl011 are emulated and direct-map in MPU. Relevant device

I am confused. If the vGIC/vPL011 is emulated then why do you need to 
map it in the MPU? IOW, wouldn't you receive a fault in the hypervisor 
if the guest is trying to access a region not present in the MPU?

> mapping(GFN == MFN with only EL2 access)will be added to its *P2M mapping table*, in vgic_v3_domain_init [1].
> 
> Later, on vcpu context switching, when switching from idle vcpu, device memory section gets disabled
> and switched out in ctxt_switch_from [2], later when switching into guest vcpu, vgic and vpl011 device mapping
> will be switched in along with the whole P2M mapping table [3].
> 
> Words might be ambiguous, but all related code implementation is on MPU patch serie part II - guest initialization, you may
> have to check the gitlab link:
> [1] https://gitlab.com/xen-project/people/weic/xen/-/commit/a51d5b25eb17a50a36b27987a2f48e14793ac585
> [2] https://gitlab.com/xen-project/people/weic/xen/-/commit/c6a069d777d9407aeda42b7e5b08a086a1c15976
> [3] https://gitlab.com/xen-project/people/weic/xen/-/commit/d8c6408b6eef1190d75c9bd4e58557d34fc8b4df

I have looked at the code and this doesn't entirely answer my question. 
So let me provide an example.

Xen can print to the serial console at any time. So Xen should be able 
to access the physical UART even when it has context switched to the 
guest vCPU.

But above you said that the physical device would not be accessible and 
instead you map the virtual UART. So how is Xen supposed to access the 
physical UART?

Or by vpl011 did you actually mean the physical UART? If so, and you 
map the devices one by one in the guest MPU context, then you would likely 
also have the space to map them one by one in the idle context.

> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data
> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> ( Fixed MPU region defined in assembly )
> --------------------------------------------------------------------------
> xen_mpumap[5]: Xen init data
> xen_mpumap[6]: Xen init text
> xen_mpumap[7]: Early FDT
> xen_mpumap[8]: Guest memory section
> xen_mpumap[9]: Device memory section
> xen_mpumap[10]: Static shared memory section
> ( boot-only and switching regions defined in C )
> --------------------------------------------------------------------------
> ...
> xen_mpumap[max_xen_mpumap - 1] : Xen static heap
> ( Fixed MPU region defined in C )
> --------------------------------------------------------------------------
> 
> After re-org:
> xen_mpumap[0] : Xen text
> xen_mpumap[1] : Xen read-only data
> xen_mpumap[2] : Xen read-only after init data
> xen_mpumap[3] : Xen read-write data
> xen_mpumap[4] : Xen BSS
> ( Fixed MPU region defined in assembly )
> --------------------------------------------------------------------------
> xen_mpumap[8]: Guest memory section
> xen_mpumap[9]: Device memory section
> xen_mpumap[10]: Static shared memory section
> ( Switching region )
> --------------------------------------------------------------------------
> ...
> xen_mpumap[max_xen_mpumap - 1] : Xen static heap
> ( Fixed MPU region defined in C )
> 
> If you're fine with it, then next serie, I'll use this layout, to keep both
> simple assembly and re-org process.

I am ok in principle with the layout you propose. My main requirement is 
that the regions used in assembly are fixed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-02-01 19:26                   ` Julien Grall
@ 2023-02-02  8:05                     ` Penny Zheng
  2023-02-02 11:11                       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-02-02  8:05 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Thursday, February 2, 2023 3:27 AM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
> setup_early_uart to map early UART
> 
> 
> 
> On 01/02/2023 05:36, Penny Zheng wrote:
> > Hi Julien
> 
> Hi Penny,
>

Hi Julien,
 
> >
[...]
> >>> xen_mpumap[3] : Xen read-write data
> >>> xen_mpumap[4] : Xen BSS
> >>> xen_mpumap[5] : Xen static heap
> >>> xen_mpumap[6] : Guest memory section
> >>
> >> Why do you need to map the "Guest memory section" for the idle vCPU?
> >>
> >
> > Hmmm, "Guest memory section" here refers to *ALL* guest RAM address
> range with only EL2 read/write access.
> 
> For what purpose? Earlier, you said you had a setup with a limited number
> of MPU entries. So it may not be possible to map all the guests RAM.
> 

The "Guest memory section" I referred here is the memory section defined in my new
introducing device tree property, "mpu,guest-memory-section = <...>".  It will include
"ALL" guest memory.

Let me give an example to illustrate why I introduced it and how it is meant to work:
In an MPU system, all guest RAM *MUST* be statically configured through "xen,static-mem" under the domain node.
We found that, with more and more guests, the scattering of "xen,static-mem" ranges may
exhaust MPU regions very quickly. TBH, at that time I hadn't realised that I could map/unmap on demand.
In an MMU system we never hit this issue, since setup_directmap_mappings maps the whole system
RAM into the directmap area for Xen to access at EL2.
So instead, "mpu,guest-memory-section" was introduced to limit the scattering and map in advance; we require
users to ensure that all guest RAM (configured through "xen,static-mem") falls within "mpu,guest-memory-section";
see the example and the check sketch below.
e.g.
mpu,guest-memory-section = <0x0 0x20000000 0x0 0x40000000>;
DomU1:
xen,static-mem = <0x0 0x40000000 0x0 0x20000000>;
DomU2:
xen,static-mem = <0x0 0x20000000 0x0 0x20000000>;
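
To illustrate the "must fall within" rule, a minimal check could look like the sketch below.
The struct meminfo layout (nr_banks, bank[].start, bank[].size) mirrors Xen/arm's usual boot
memory info, but treat it as an assumption here, not code from this series:

```
/*
 * Sketch only: check that every "xen,static-mem" bank of a domain sits
 * inside the "mpu,guest-memory-section" range parsed from the device tree.
 */
static bool __init static_mem_in_guest_section(const struct meminfo *mem,
                                               paddr_t sec_start, paddr_t sec_size)
{
    unsigned int i;

    for ( i = 0; i < mem->nr_banks; i++ )
    {
        paddr_t start = mem->bank[i].start;
        paddr_t end   = start + mem->bank[i].size;

        if ( start < sec_start || end > sec_start + sec_size )
            return false; /* guest RAM falls outside the guest memory section */
    }

    return true;
}
```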

> Xen should only need to access the guest memory in hypercalls and
> scrubbing. In both cases you could map/unmap on demand.
> 

Thanks for the explanation.
In my understanding, during boot there are two spots where Xen may touch guest memory:
one is the synchronous scrubbing in the function unprepare_staticmem_pages,
the other is copying the kernel image into guest memory.
In both cases we could map/unmap on demand, e.g. as in the sketch below.
And if you think map/unmap on demand is better than "mpu,guest-memory-section", I'll try to fix it in the next series.
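
```
/*
 * Sketch only: touch guest RAM through a transient MPU region instead
 * of keeping a global "guest memory section" mapped.
 * mpu_map_temporary()/mpu_unmap_temporary() are hypothetical helpers
 * that program and clear a spare MPU entry; they are not part of this
 * series.
 */
static void __init scrub_static_mem(paddr_t base, paddr_t size)
{
    void *va = mpu_map_temporary(base, size);  /* add a transient region */

    memset(va, 0, size);                       /* synchronous scrubbing */

    mpu_unmap_temporary(va, size);             /* release the MPU entry */
}
```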
 
> >
> > For guest vcpu, this section will be replaced by guest itself own RAM with
> both EL1/EL2 access.
> >
> >
> >>> xen_mpumap[7] : Device memory section
> >>
> >> I might be missing some context here. But why this section is not
> >> also mapped in the context of the guest vCPU?
> >>
> >> For instance, how would you write to the serial console when the
> >> context is the guest vCPU?
> >>
> >
> > I think, as Xen itself, it shall have access to all system device memory on
> EL2.
> > Ik, it is not accurate in current MMU implementation, only devices
> > with supported driver will get ioremap.
> 
> So in the MMU case, we are not mapping all the devices in Xen because we
> don't exactly know which memory attributes will be used by the guest.
> 
> If we are using different attributes, then we are risking to break coherency.
> Could the same issue happen with the MPU?
> 
> If so, then you should not mapped those regions in Xen.
> 
> >
> > But like we discussed before, if following the same strategy as MMU
> > does, with limited MPU regions, we could not afford mapping a MPU
> region for each device.
> > For example, On FVPv8R model, we have four uarts, and a GICv3. At
> > most, we may provide four MPU regions for uarts, and two MPU regions
> for Distributor and one Redistributor region.
> > So, I thought up this new device tree property
> > “mpu,device-memory-section = <0x0 0x80000000 0x0 0x7ffff000>;“ to
> roughly map all system device memory for Xen itself.
> 
> Why do you say "roughly"? Is it possible that you have non-device region in
> the range?
> 
> >
> > For guest, it shall only see vgic, vpl011, and its own passthrough
> > device. And here, to maintain safe and isolation, we will be mapping a
> MPU region for each device for guest vcpu.
> > For example, vgic and vpl011 are emulated and direct-map in MPU.
> > Relevant device
> 
> I am confused. If the vGIC/vPL011 is emulated then why do you need to
> map it in the MPU? IOW, wouldn't you receive a fault in the hypervisor if
> the guest is trying to access a region not present in the MPU?
> 
> > mapping(GFN == MFN with only EL2 access)will be added to its *P2M
> mapping table*, in vgic_v3_domain_init [1].
> >
> > Later, on vcpu context switching, when switching from idle vcpu,
> > device memory section gets disabled and switched out in
> > ctxt_switch_from [2], later when switching into guest vcpu, vgic and
> vpl011 device mapping will be switched in along with the whole P2M
> mapping table [3].
> >
> > Words might be ambiguous, but all related code implementation is on
> > MPU patch serie part II - guest initialization, you may have to check the
> gitlab link:
> > [1]
> > https://gitlab.com/xen-project/people/weic/xen/-
> /commit/a51d5b25eb17a5
> > 0a36b27987a2f48e14793ac585 [2]
> > https://gitlab.com/xen-project/people/weic/xen/-
> /commit/c6a069d777d940
> > 7aeda42b7e5b08a086a1c15976 [3]
> > https://gitlab.com/xen-project/people/weic/xen/-
> /commit/d8c6408b6eef11
> > 90d75c9bd4e58557d34fc8b4df
> 
> I have looked at the code and this doesn't entirely answer my question.
> So let me provide an example.
> 
> Xen can print to the serial console at any time. So Xen should be able to
> access the physical UART even when it has context switched to the guest
> vCPU.
> 

I understand your concern about the "device memory section" from your
example here. True, the current implementation is buggy.

Yes, if the vPL011 is not enabled in the guest and we instead pass a UART through to
the guest, then in the current design Xen is not able to access the physical UART while in guest mode.

All guests on MPU are direct-mapped, so, as you said, the mapping for the
vPL011 in guest mode is the same as the physical UART. This hides the problem and
lets Xen still access the physical UART.

I'll drop the "device memory section" design, and let device drivers map
on demand at boot time.
  
> But above you said that the physical device would not be accessible and
> instead you map the virtual UART. So how Xen is supported to access the
> physical UART?
> 
> Or by vpl011 did you actually mean the physical UART? If so, then if you
> map the device one by one in the MPU context, then it would likely mean
> to have space to map them one by one in the idle context.
> 
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data
> > xen_mpumap[3] : Xen read-write data
> > xen_mpumap[4] : Xen BSS ( Fixed MPU region defined in assembly )
> > --------------------------------------------------------------------------
> > xen_mpumap[5]: Xen init data
> > xen_mpumap[6]: Xen init text
> > xen_mpumap[7]: Early FDT
> > xen_mpumap[8]: Guest memory section
> > xen_mpumap[9]: Device memory section
> > xen_mpumap[10]: Static shared memory section ( boot-only and switching regions defined in C )
> > --------------------------------------------------------------------------
> > ...
> > xen_mpumap[max_xen_mpumap - 1] : Xen static heap ( Fixed MPU region defined in C )
> > --------------------------------------------------------------------------
> >
> > After re-org:
> > xen_mpumap[0] : Xen text
> > xen_mpumap[1] : Xen read-only data
> > xen_mpumap[2] : Xen read-only after init data
> > xen_mpumap[3] : Xen read-write data
> > xen_mpumap[4] : Xen BSS ( Fixed MPU region defined in assembly )
> > --------------------------------------------------------------------------
> > xen_mpumap[8]: Guest memory section
> > xen_mpumap[9]: Device memory section
> > xen_mpumap[10]: Static shared memory section ( Switching region )
> > --------------------------------------------------------------------------
> > ...
> > xen_mpumap[max_xen_mpumap - 1] : Xen static heap ( Fixed MPU region defined in C )
> >
> > If you're fine with it, then in the next series I'll use this layout, keeping
> > the assembly simple while retaining the re-org process.
> 
> I am ok in principle with the layout you propose. My main requirement is
> that the regions used in assembly are fixed.
> 
> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-02-01 18:56                   ` Julien Grall
@ 2023-02-02 10:53                     ` Penny Zheng
  2023-02-02 10:58                       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-02-02 10:53 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Thursday, February 2, 2023 2:57 AM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
> Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> ayan.kumar.halder@xilinx.com
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> Hi Penny,
> 
> On 01/02/2023 05:39, Penny Zheng wrote:
> >>> If we are adding MPU regions in sequence as you suggested, while
> >>> using bitmap at the same time to record used entry.
> >>> TBH, this is how I designed at the very beginning internally. We
> >>> found that if we don't do reorg late-boot to keep fixed in front and
> >>> switching ones after, each time when we do vcpu context switch, not
> >>> only we need to hunt down switching ones to disable, while we add
> >>> new switch-in regions, using bitmap to find free entry is saying
> >>> that the
> >> process is unpredictable. Uncertainty is what we want to avoid in
> >> Armv8-R architecture.
> >>
> >> I don't understand why it would be unpredictable. For a given
> >> combination of platform/device-tree, the bitmap will always look the
> >> same. So the number of cycles/instructions will always be the same.
> >>
> >
> > In boot-time, it will be always the same. But if we still use bitmap
> > to find free entry(for switching MPU regions) on runtime, hmmm, I
> > thought this part will be unpredictable.
> 
> I know this point is now moot as we agreed on not using a bitmap but I
> wanted to answer on the unpredictability part.
> 
> It depends on whether you decide to allocate more entries at runtime. My
> assumption is you won't, and therefore the time to walk the bitmap will
> always be consistent.
> 

In MPU, we don't have something like vttbr_el2 in MMU to hold a separate
stage 2 EL1/EL0 translation table: Xen's stage 1 EL2 mappings and the stage 2
EL1/EL0 mappings share one table.
So when context switching to a different guest, the current design is to first
disable DOM1's guest RAM mapping and then enable DOM2's guest RAM mapping,
to ensure isolation and safety.
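
To make that concrete, the shape of the swap on vcpu context switch is roughly
the following (a sketch only; both helpers are placeholder names, the real code
is in the part II branch):

  /* Sketch: swap the per-domain guest RAM entries on vcpu context switch,
   * since the stage 1 EL2 and stage 2 EL1/EL0 mappings share one table. */
  static void mpu_switch_guest_ram(struct vcpu *prev, struct vcpu *next)
  {
      if ( !is_idle_vcpu(prev) )
          disable_guest_mpu_regions(prev->domain);   /* placeholder */

      if ( !is_idle_vcpu(next) )
          enable_guest_mpu_regions(next->domain);    /* placeholder */
  }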

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-02-02 10:53                     ` Penny Zheng
@ 2023-02-02 10:58                       ` Julien Grall
  2023-02-02 11:30                         ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-02 10:58 UTC (permalink / raw)
  To: Penny Zheng, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk



On 02/02/2023 10:53, Penny Zheng wrote:
> Hi Julien,

Hi,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Thursday, February 2, 2023 2:57 AM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
>> Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> ayan.kumar.halder@xilinx.com
>> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
>> memory region map
>>
>> Hi Penny,
>>
>> On 01/02/2023 05:39, Penny Zheng wrote:
>>>>> If we are adding MPU regions in sequence as you suggested, while
>>>>> using bitmap at the same time to record used entry.
>>>>> TBH, this is how I designed at the very beginning internally. We
>>>>> found that if we don't do reorg late-boot to keep fixed in front and
>>>>> switching ones after, each time when we do vcpu context switch, not
>>>>> only we need to hunt down switching ones to disable, while we add
>>>>> new switch-in regions, using bitmap to find free entry is saying
>>>>> that the
>>>> process is unpredictable. Uncertainty is what we want to avoid in
>>>> Armv8-R architecture.
>>>>
>>>> I don't understand why it would be unpredictable. For a given
>>>> combination of platform/device-tree, the bitmap will always look the
>>>> same. So the number of cycles/instructions will always be the same.
>>>>
>>>
>>> In boot-time, it will be always the same. But if we still use bitmap
>>> to find free entry(for switching MPU regions) on runtime, hmmm, I
>>> thought this part will be unpredictable.
>>
>> I know this point is now moot as we agreed on not using a bitmap but I
>> wanted to answer on the unpredictability part.
>>
>> It depends on whether you decide to allocate more entry at runtime. My
>> assumption is you won't and therefore the the time to walk the bitmap will
>> always be consistent.
>>
> 
> In MPU, we don't have something like vttbr_el2 in MMU, to store stage 2
> EL1/EL0 translation table. Xen stage 1 EL2 mapping and stage 2 EL1/EL0
> mapping are both sharing one table.
> So when context switching into different guest, the current design is to disable
> DOM1's guest RAM mapping firstly, then enable DOM2's guest RAM mapping,
> to ensure isolation and safety.

I understood that, but I don't understand how this is related to my point
here. The entries you are replacing are always going to be the same
after boot.

So if you have a bitmap indicating the fixed entries and you don't add
more fixed ones at runtime, then it will always take the same time to
walk it.
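
To be concrete, I mean something along those lines (the names are made up):

  /* Sketch: pick a free entry among the non-fixed slots. As long as no
   * fixed entry is added after boot, the walk length never changes for a
   * given platform/device-tree. */
  static unsigned int mpu_find_free_entry(void)
  {
      unsigned int idx = find_first_zero_bit(xen_mpumap_mask, max_xen_mpumap);

      ASSERT(idx < max_xen_mpumap);
      return idx;
  }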

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART
  2023-02-02  8:05                     ` Penny Zheng
@ 2023-02-02 11:11                       ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-02 11:11 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 02/02/2023 08:05, Penny Zheng wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Thursday, February 2, 2023 3:27 AM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 13/40] xen/mpu: introduce unified function
>> setup_early_uart to map early UART
>>
>>
>>
>> On 01/02/2023 05:36, Penny Zheng wrote:
>>> Hi Julien
>>
>> Hi Penny,
>>
> 
> Hi Julien,
>   
>>>
> [...]
>>>>> xen_mpumap[3] : Xen read-write data
>>>>> xen_mpumap[4] : Xen BSS
>>>>> xen_mpumap[5] : Xen static heap
>>>>> xen_mpumap[6] : Guest memory section
>>>>
>>>> Why do you need to map the "Guest memory section" for the idle vCPU?
>>>>
>>>
>>> Hmmm, "Guest memory section" here refers to *ALL* guest RAM address
>> range with only EL2 read/write access.
>>
>> For what purpose? Earlier, you said you had a setup with a limited number
>> of MPU entries. So it may not be possible to map all the guests RAM.
>>
> 
> The "Guest memory section" I referred here is the memory section defined in my new
> introducing device tree property, "mpu,guest-memory-section = <...>".  It will include
> "ALL" guest memory.
> 
> Let me find an example to illustrate why I introduced it and how it shall work:
> In MPU system, all guest RAM *MUST* be statically configured through "xen,static-mem" under domain node.
> We found that with more and more guests in,  the scattering of  "xen,static-mem" may
> exhaust MPU regions very quickly. TBH, at that time, I didn't figure out that I could map/unmap on demand.
> And in MMU system, We will never encounter this issue, setup_directmap_mappings will map the whole system
> RAM to the directmap area for Xen to access in EL2.
> So instead, "mpu,guest-memory-section" is introduced to limit the scattering and map in advance, we enforce
> users to ensure all guest RAM(through "xen,static-mem") must be included within "mpu,guest-memory-section".
> e.g.
> mpu,guest-memory-section = <0x0 0x20000000 0x0 0x40000000>;
> DomU1:
> xen,static-mem = <0x0 0x40000000 0x0 0x20000000>;
> DomU2:
> xen,static-mem = <0x0 0x20000000 0x0 0x20000000>;
> 
>> Xen should only need to access the guest memory in hypercalls and
>> scrubbing. In both cases you could map/unmap on demand.
>>
> 
> Thanks for the explanation.
> In my understanding, during boot-up, there are two spots where Xen may touch guest memory:
> one is the synchronous scrubbing in unprepare_staticmem_pages();
> the other is copying the kernel image into guest memory.
> In both cases, we could map/unmap on demand.
> And if you think map/unmap on demand is better than "mpu,guest-memory-section", I'll try to switch to it in the next series.

I think it would be better for a few reasons:
  1) You are making the assumption that all the RAM for the guests is
contiguous. This may not be true for various reasons (i.e. split banks...).
  2) It reduces the amount of work for the integrator.
  3) It increases the defense in the hypervisor, as it is more difficult
to access the guest memory if there is a breakage.

I don't expect major rework because you could plug the update to the MPU 
in map_domain_page().
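
By that I mean something with the following shape (only a sketch, assuming the
MPU build keeps Xen's identity map; the two mpu_*_transient_region() helpers
are invented names):

  /* Sketch: on the MPU flavour, map_domain_page() only needs to open a
   * transient EL2 read/write window over the page, since VA == PA. */
  void *map_domain_page(mfn_t mfn)
  {
      paddr_t pa = mfn_to_maddr(mfn);

      mpu_open_transient_region(pa, PAGE_SIZE);

      return maddr_to_virt(pa);
  }

  void unmap_domain_page(const void *ptr)
  {
      mpu_close_transient_region(virt_to_maddr(ptr), PAGE_SIZE);
  }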

>> I have looked at the code and this doesn't entirely answer my question.
>> So let me provide an example.
>>
>> Xen can print to the serial console at any time. So Xen should be able to
>> access the physical UART even when it has context switched to the guest
>> vCPU.
>>
> 
> I understand your concern about the "device memory section" with your
> example here. True, the current implementation is buggy.
> 
> Yes, if vpl011 is not enabled in the guest and we instead pass through a UART to
> the guest, then in the current design Xen is not able to access the physical UART in guest mode.

So all the guests are using the same UART?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map
  2023-02-02 10:58                       ` Julien Grall
@ 2023-02-02 11:30                         ` Penny Zheng
  0 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-02-02 11:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Wei Chen, Stefano Stabellini,
	Bertrand Marquis, ayan.kumar.halder
  Cc: Volodymyr Babchuk

Hi, Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Thursday, February 2, 2023 6:58 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
> Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> ayan.kumar.halder@xilinx.com
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> memory region map
> 
> 
> 
> On 02/02/2023 10:53, Penny Zheng wrote:
> > Hi Julien,
> 
> Hi,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Thursday, February 2, 2023 2:57 AM
> >> To: Penny Zheng <Penny.Zheng@arm.com>;
> >> xen-devel@lists.xenproject.org; Wei Chen <Wei.Chen@arm.com>;
> Stefano
> >> Stabellini <sstabellini@kernel.org>; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; ayan.kumar.halder@xilinx.com
> >> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> >> Subject: Re: [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU
> >> memory region map
> >>
> >> Hi Penny,
> >>
> >> On 01/02/2023 05:39, Penny Zheng wrote:
> >>>>> If we are adding MPU regions in sequence as you suggested, while
> >>>>> using bitmap at the same time to record used entry.
> >>>>> TBH, this is how I designed at the very beginning internally. We
> >>>>> found that if we don't do reorg late-boot to keep fixed in front
> >>>>> and switching ones after, each time when we do vcpu context
> >>>>> switch, not only we need to hunt down switching ones to disable,
> >>>>> while we add new switch-in regions, using bitmap to find free
> >>>>> entry is saying that the
> >>>> process is unpredictable. Uncertainty is what we want to avoid in
> >>>> Armv8-R architecture.
> >>>>
> >>>> I don't understand why it would be unpredictable. For a given
> >>>> combination of platform/device-tree, the bitmap will always look
> >>>> the same. So the number of cycles/instructions will always be the
> same.
> >>>>
> >>>
> >>> In boot-time, it will be always the same. But if we still use bitmap
> >>> to find free entry(for switching MPU regions) on runtime, hmmm, I
> >>> thought this part will be unpredictable.
> >>
> >> I know this point is now moot as we agreed on not using a bitmap but
> >> I wanted to answer on the unpredictability part.
> >>
> >> It depends on whether you decide to allocate more entry at runtime.
> >> My assumption is you won't and therefore the the time to walk the
> >> bitmap will always be consistent.
> >>
> >
> > In MPU, we don't have something like vttbr_el2 in MMU, to store stage
> > 2
> > EL1/EL0 translation table. Xen stage 1 EL2 mapping and stage 2 EL1/EL0
> > mapping are both sharing one table.
> > So when context switching into different guest, the current design is
> > to disable DOM1's guest RAM mapping firstly, then enable DOM2's guest
> > RAM mapping, to ensure isolation and safety.
> 
> I understood that but I don't understand how this is related to my point
> here. The entries you are replacing are always going to be the same after
> boot.
> 
> So if you have a bitmap indicate the fixed entries and you don't add more
> fixed one at runtime, then it will always take the same time to walk it.
> 

Ah, sorry for taking so long to understand ;/. True, the fixed entries will never
change after boot time; each time we switch to a guest vcpu, we always choose
the same entries.

> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny Zheng

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm()
  2023-01-13  5:28 ` [PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm() Penny Zheng
@ 2023-02-05 21:13   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-05 21:13 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> At the moment, on MMU systems, enable_mm() returns to an address in
> the 1:1 mapping, and each path is then responsible for switching to the virtual
> runtime mapping. Then remove_identity_mapping() is called to remove all 1:1 mappings.
> 
> Since remove_identity_mapping() is not necessary on MPU systems, and we also
> want to avoid creating an empty function for MPU systems while keeping a single
> code flow in arm64/head.S, we move the path switch and remove_identity_mapping()
> into enable_mm() on MMU systems.

AFAICT, remove_identity_mapping() is still using ENTRY(). But you could
avoid introducing ENTRY() if you re-order your series so this patch
happens before the MMU-specific code is moved into a separate helper.

> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/arm64/head.S     | 28 +++++++++++++---------------
>   xen/arch/arm/arm64/head_mmu.S | 33 ++++++++++++++++++++++++++++++---
>   2 files changed, 43 insertions(+), 18 deletions(-)

This will need to be rebased on top of [1] (which will be merged pretty 
soon). There are two main differences:

  1) enable_mmu() has an extra parameter to take the root page-tables
  2) remove_identity_mapping() should only be called for the boot CPU.

So I think we want to introduce two functions:
  1) enable_boot_mmu
  2) enable_runtime_mmu

You might need the same for the MPU as I would expect it would be per-CPU.

Cheers,

[1] 20230127195508.2786-1-julien@xen.org

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
  2023-01-13  5:28 ` [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h Penny Zheng
@ 2023-02-05 21:30   ` Julien Grall
  2023-02-07  3:59     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-05 21:30 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> To make the code readable and maintainable, we move MMU-specific
> memory management code from mm.c to mm_mmu.c and move MMU-specific
> definitions from mm.h to mm_mmu.h.
> Later we will create mm_mpu.h and mm_mpu.c for MPU-specific memory
> management code.

This sentence implies there is no mm_mpu.{c,h} yet and that it is not 
touched within this patch. However...


> This will avoid lots of #ifdef in memory management code and header files.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> ---
>   xen/arch/arm/Makefile             |    5 +
>   xen/arch/arm/include/asm/mm.h     |   19 +-
>   xen/arch/arm/include/asm/mm_mmu.h |   35 +
>   xen/arch/arm/mm.c                 | 1352 +---------------------------
>   xen/arch/arm/mm_mmu.c             | 1376 +++++++++++++++++++++++++++++
>   xen/arch/arm/mm_mpu.c             |   67 ++

... It looks like they already exist and you are modifying them. That 
said, it would be better if this patch only contained code movement (IOW 
no MPU changes).

>   6 files changed, 1488 insertions(+), 1366 deletions(-)
>   create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
>   create mode 100644 xen/arch/arm/mm_mmu.c

I don't particularly like the naming. I think it would make more sense to 
introduce two directories, "mmu" and "mpu", each containing the code 
specific to that flavour of Xen.
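
IOW something along these lines in the Makefile (only to illustrate the
layout, not a request for this exact hunk):

  ifeq ($(CONFIG_HAS_MPU),y)
  obj-y += mpu/
  else
  obj-y += mmu/
  endif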

> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 4d076b278b..21188b207f 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -37,6 +37,11 @@ obj-y += kernel.init.o
>   obj-$(CONFIG_LIVEPATCH) += livepatch.o
>   obj-y += mem_access.o
>   obj-y += mm.o
> +ifneq ($(CONFIG_HAS_MPU), y)
> +obj-y += mm_mmu.o
> +else
> +obj-y += mm_mpu.o
> +endif
Shouldn't mm_mpu.o have been included where it was first introduced?

>   obj-y += monitor.o
>   obj-y += p2m.o
>   obj-y += percpu.o
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 68adcac9fa..1b9fdb6ff5 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -154,13 +154,6 @@ struct page_info
>   #define _PGC_need_scrub   _PGC_allocated
>   #define PGC_need_scrub    PGC_allocated
>   
> -extern mfn_t directmap_mfn_start, directmap_mfn_end;
> -extern vaddr_t directmap_virt_end;
> -#ifdef CONFIG_ARM_64
> -extern vaddr_t directmap_virt_start;
> -extern unsigned long directmap_base_pdx;
> -#endif
> -
>   #ifdef CONFIG_ARM_32
>   #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
>   #define is_xen_heap_mfn(mfn) ({                                 \
> @@ -192,8 +185,6 @@ extern unsigned long total_pages;
>   
>   #define PDX_GROUP_SHIFT SECOND_SHIFT
>   
> -/* Boot-time pagetable setup */
> -extern void setup_pagetables(unsigned long boot_phys_offset);
>   /* Map FDT in boot pagetable */
>   extern void *early_fdt_map(paddr_t fdt_paddr);
>   /* Remove early mappings */
> @@ -203,12 +194,6 @@ extern void remove_early_mappings(void);
>   extern int init_secondary_pagetables(int cpu);
>   /* Switch secondary CPUS to its own pagetables and finalise MMU setup */
>   extern void mmu_init_secondary_cpu(void);
> -/*
> - * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
> - * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
> - * For Arm64, map the region in the directmap area.
> - */
> -extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long nr_mfns);
>   /* Map a frame table to cover physical addresses ps through pe */
>   extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
>   /* map a physical range in virtual memory */
> @@ -256,6 +241,10 @@ static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
>   #define vmap_to_mfn(va)     maddr_to_mfn(virt_to_maddr((vaddr_t)va))
>   #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
>   
> +#ifndef CONFIG_HAS_MPU
> +#include <asm/mm_mmu.h>
> +#endif
> +
>   /* Page-align address and convert to frame number format */
>   #define paddr_to_pfn_aligned(paddr)    paddr_to_pfn(PAGE_ALIGN(paddr))
>   
> diff --git a/xen/arch/arm/include/asm/mm_mmu.h b/xen/arch/arm/include/asm/mm_mmu.h
> new file mode 100644
> index 0000000000..a5e63d8af8
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/mm_mmu.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef __ARCH_ARM_MM_MMU__
> +#define __ARCH_ARM_MM_MMU__
> +
> +extern mfn_t directmap_mfn_start, directmap_mfn_end;
> +extern vaddr_t directmap_virt_end;
> +#ifdef CONFIG_ARM_64
> +extern vaddr_t directmap_virt_start;
> +extern unsigned long directmap_base_pdx;
> +#endif
> +
> +/* Boot-time pagetable setup */
> +extern void setup_pagetables(unsigned long boot_phys_offset);
> +#define setup_mm_mappings(boot_phys_offset) setup_pagetables(boot_phys_offset)
> +
> +/* Non-boot CPUs use this to find the correct pagetables. */
> +extern uint64_t init_ttbr;

Newline here please.

> +/*
> + * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
> + * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
> + * For Arm64, map the region in the directmap area.
> + */
> +extern void setup_directmap_mappings(unsigned long base_mfn,
> +                                     unsigned long nr_mfns);
> +
> +#endif /* __ARCH_ARM_MM_MMU__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 8f15814c5e..e1ce2a62dc 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -2,371 +2,24 @@
>   /*
>    * xen/arch/arm/mm.c
>    *
> - * MMU code for an ARMv7-A with virt extensions.
> + * Memory management common code for MMU and MPU system.
>    *
>    * Tim Deegan <tim@xen.org>
>    * Copyright (c) 2011 Citrix Systems.
>    */
>   
>   #include <xen/domain_page.h>
> -#include <xen/errno.h>
>   #include <xen/grant_table.h>
> -#include <xen/guest_access.h>
> -#include <xen/init.h>
> -#include <xen/libfdt/libfdt.h>
> -#include <xen/mm.h>
> -#include <xen/pfn.h>
> -#include <xen/pmap.h>
>   #include <xen/sched.h>
> -#include <xen/sizes.h>
>   #include <xen/types.h>
> -#include <xen/vmap.h>
>   
>   #include <xsm/xsm.h>
>   
> -#include <asm/fixmap.h>
> -#include <asm/setup.h>
> -
> -#include <public/memory.h>
> -
> -/* Override macros from asm/page.h to make them work with mfn_t */
> -#undef virt_to_mfn
> -#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> -#undef mfn_to_virt
> -#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
> -
> -#ifdef NDEBUG
> -static inline void
> -__attribute__ ((__format__ (__printf__, 1, 2)))
> -mm_printk(const char *fmt, ...) {}
> -#else
> -#define mm_printk(fmt, args...)             \
> -    do                                      \
> -    {                                       \
> -        dprintk(XENLOG_ERR, fmt, ## args);  \
> -        WARN();                             \
> -    } while (0)
> -#endif
> -
> -/* Static start-of-day pagetables that we use before the allocators
> - * are up. These are used by all CPUs during bringup before switching
> - * to the CPUs own pagetables.
> - *
> - * These pagetables have a very simple structure. They include:
> - *  - 2MB worth of 4K mappings of xen at XEN_VIRT_START, boot_first and
> - *    boot_second are used to populate the tables down to boot_third
> - *    which contains the actual mapping.
> - *  - a 1:1 mapping of xen at its current physical address. This uses a
> - *    section mapping at whichever of boot_{pgtable,first,second}
> - *    covers that physical address.
> - *
> - * For the boot CPU these mappings point to the address where Xen was
> - * loaded by the bootloader. For secondary CPUs they point to the
> - * relocated copy of Xen for the benefit of secondary CPUs.
> - *
> - * In addition to the above for the boot CPU the device-tree is
> - * initially mapped in the boot misc slot. This mapping is not present
> - * for secondary CPUs.
> - *
> - * Finally, if EARLY_PRINTK is enabled then xen_fixmap will be mapped
> - * by the CPU once it has moved off the 1:1 mapping.
> - */
> -DEFINE_BOOT_PAGE_TABLE(boot_pgtable);
> -#ifdef CONFIG_ARM_64
> -DEFINE_BOOT_PAGE_TABLE(boot_first);
> -DEFINE_BOOT_PAGE_TABLE(boot_first_id);
> -#endif
> -DEFINE_BOOT_PAGE_TABLE(boot_second_id);
> -DEFINE_BOOT_PAGE_TABLE(boot_third_id);
> -DEFINE_BOOT_PAGE_TABLE(boot_second);
> -DEFINE_BOOT_PAGE_TABLE(boot_third);
> -
> -/* Main runtime page tables */
> -
> -/*
> - * For arm32 xen_pgtable are per-PCPU and are allocated before
> - * bringing up each CPU. For arm64 xen_pgtable is common to all PCPUs.
> - *
> - * xen_second, xen_fixmap and xen_xenmap are always shared between all
> - * PCPUs.
> - */
> -
> -#ifdef CONFIG_ARM_64
> -#define HYP_PT_ROOT_LEVEL 0
> -static DEFINE_PAGE_TABLE(xen_pgtable);
> -static DEFINE_PAGE_TABLE(xen_first);
> -#define THIS_CPU_PGTABLE xen_pgtable
> -#else
> -#define HYP_PT_ROOT_LEVEL 1
> -/* Per-CPU pagetable pages */
> -/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> -DEFINE_PER_CPU(lpae_t *, xen_pgtable);
> -#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
> -/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
> -static DEFINE_PAGE_TABLE(cpu0_pgtable);
> -#endif
> -
> -/* Common pagetable leaves */
> -/* Second level page table used to cover Xen virtual address space */
> -static DEFINE_PAGE_TABLE(xen_second);
> -/* Third level page table used for fixmap */
> -DEFINE_BOOT_PAGE_TABLE(xen_fixmap);
> -/*
> - * Third level page table used to map Xen itself with the XN bit set
> - * as appropriate.
> - */
> -static DEFINE_PAGE_TABLE(xen_xenmap);
> -
> -/* Non-boot CPUs use this to find the correct pagetables. */
> -uint64_t init_ttbr;
> -
> -static paddr_t phys_offset;
> -
> -/* Limits of the Xen heap */
> -mfn_t directmap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;
> -mfn_t directmap_mfn_end __read_mostly;
> -vaddr_t directmap_virt_end __read_mostly;
> -#ifdef CONFIG_ARM_64
> -vaddr_t directmap_virt_start __read_mostly;
> -unsigned long directmap_base_pdx __read_mostly;
> -#endif
> -
>   unsigned long frametable_base_pdx __read_mostly;
> -unsigned long frametable_virt_end __read_mostly;
>   
>   unsigned long max_page;
>   unsigned long total_pages;
>   
> -extern char __init_begin[], __init_end[];
> -
> -/* Checking VA memory layout alignment. */
> -static void __init __maybe_unused build_assertions(void)
> -{
> -    /* 2MB aligned regions */
> -    BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
> -    BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
> -    /* 1GB aligned regions */
> -#ifdef CONFIG_ARM_32
> -    BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
> -#else
> -    BUILD_BUG_ON(DIRECTMAP_VIRT_START & ~FIRST_MASK);
> -#endif
> -    /* Page table structure constraints */
> -#ifdef CONFIG_ARM_64
> -    BUILD_BUG_ON(zeroeth_table_offset(XEN_VIRT_START));
> -#endif
> -    BUILD_BUG_ON(first_table_offset(XEN_VIRT_START));
> -#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
> -    BUILD_BUG_ON(DOMHEAP_VIRT_START & ~FIRST_MASK);
> -#endif
> -    /*
> -     * The boot code expects the regions XEN_VIRT_START, FIXMAP_ADDR(0),
> -     * BOOT_FDT_VIRT_START to use the same 0th (arm64 only) and 1st
> -     * slot in the page tables.
> -     */
> -#define CHECK_SAME_SLOT(level, virt1, virt2) \
> -    BUILD_BUG_ON(level##_table_offset(virt1) != level##_table_offset(virt2))
> -
> -#ifdef CONFIG_ARM_64
> -    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, FIXMAP_ADDR(0));
> -    CHECK_SAME_SLOT(zeroeth, XEN_VIRT_START, BOOT_FDT_VIRT_START);
> -#endif
> -    CHECK_SAME_SLOT(first, XEN_VIRT_START, FIXMAP_ADDR(0));
> -    CHECK_SAME_SLOT(first, XEN_VIRT_START, BOOT_FDT_VIRT_START);
> -
> -#undef CHECK_SAME_SLOT
> -}
> -
> -static lpae_t *xen_map_table(mfn_t mfn)
> -{
> -    /*
> -     * During early boot, map_domain_page() may be unusable. Use the
> -     * PMAP to map temporarily a page-table.
> -     */
> -    if ( system_state == SYS_STATE_early_boot )
> -        return pmap_map(mfn);
> -
> -    return map_domain_page(mfn);
> -}
> -
> -static void xen_unmap_table(const lpae_t *table)
> -{
> -    /*
> -     * During early boot, xen_map_table() will not use map_domain_page()
> -     * but the PMAP.
> -     */
> -    if ( system_state == SYS_STATE_early_boot )
> -        pmap_unmap(table);
> -    else
> -        unmap_domain_page(table);
> -}
> -
> -void dump_pt_walk(paddr_t ttbr, paddr_t addr,
> -                  unsigned int root_level,
> -                  unsigned int nr_root_tables)
> -{
> -    static const char *level_strs[4] = { "0TH", "1ST", "2ND", "3RD" };
> -    const mfn_t root_mfn = maddr_to_mfn(ttbr);
> -    const unsigned int offsets[4] = {
> -        zeroeth_table_offset(addr),
> -        first_table_offset(addr),
> -        second_table_offset(addr),
> -        third_table_offset(addr)
> -    };
> -    lpae_t pte, *mapping;
> -    unsigned int level, root_table;
> -
> -#ifdef CONFIG_ARM_32
> -    BUG_ON(root_level < 1);
> -#endif
> -    BUG_ON(root_level > 3);
> -
> -    if ( nr_root_tables > 1 )
> -    {
> -        /*
> -         * Concatenated root-level tables. The table number will be
> -         * the offset at the previous level. It is not possible to
> -         * concatenate a level-0 root.
> -         */
> -        BUG_ON(root_level == 0);
> -        root_table = offsets[root_level - 1];
> -        printk("Using concatenated root table %u\n", root_table);
> -        if ( root_table >= nr_root_tables )
> -        {
> -            printk("Invalid root table offset\n");
> -            return;
> -        }
> -    }
> -    else
> -        root_table = 0;
> -
> -    mapping = xen_map_table(mfn_add(root_mfn, root_table));
> -
> -    for ( level = root_level; ; level++ )
> -    {
> -        if ( offsets[level] > XEN_PT_LPAE_ENTRIES )
> -            break;
> -
> -        pte = mapping[offsets[level]];
> -
> -        printk("%s[0x%03x] = 0x%"PRIpaddr"\n",
> -               level_strs[level], offsets[level], pte.bits);
> -
> -        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
> -            break;
> -
> -        /* For next iteration */
> -        xen_unmap_table(mapping);
> -        mapping = xen_map_table(lpae_get_mfn(pte));
> -    }
> -
> -    xen_unmap_table(mapping);
> -}
> -
> -void dump_hyp_walk(vaddr_t addr)
> -{
> -    uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
> -
> -    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
> -           "on CPU%d via TTBR 0x%016"PRIx64"\n",
> -           addr, smp_processor_id(), ttbr);
> -
> -    dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
> -}
> -
> -lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr)
> -{
> -    lpae_t e = (lpae_t) {
> -        .pt = {
> -            .valid = 1,           /* Mappings are present */
> -            .table = 0,           /* Set to 1 for links and 4k maps */
> -            .ai = attr,
> -            .ns = 1,              /* Hyp mode is in the non-secure world */
> -            .up = 1,              /* See below */
> -            .ro = 0,              /* Assume read-write */
> -            .af = 1,              /* No need for access tracking */
> -            .ng = 1,              /* Makes TLB flushes easier */
> -            .contig = 0,          /* Assume non-contiguous */
> -            .xn = 1,              /* No need to execute outside .text */
> -            .avail = 0,           /* Reference count for domheap mapping */
> -        }};
> -    /*
> -     * For EL2 stage-1 page table, up (aka AP[1]) is RES1 as the translation
> -     * regime applies to only one exception level (see D4.4.4 and G4.6.1
> -     * in ARM DDI 0487B.a). If this changes, remember to update the
> -     * hard-coded values in head.S too.
> -     */
> -
> -    switch ( attr )
> -    {
> -    case MT_NORMAL_NC:
> -        /*
> -         * ARM ARM: Overlaying the shareability attribute (DDI
> -         * 0406C.b B3-1376 to 1377)
> -         *
> -         * A memory region with a resultant memory type attribute of Normal,
> -         * and a resultant cacheability attribute of Inner Non-cacheable,
> -         * Outer Non-cacheable, must have a resultant shareability attribute
> -         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
> -         *
> -         * On ARMv8 sharability is ignored and explicitly treated as Outer
> -         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
> -         */
> -        e.pt.sh = LPAE_SH_OUTER;
> -        break;
> -    case MT_DEVICE_nGnRnE:
> -    case MT_DEVICE_nGnRE:
> -        /*
> -         * Shareability is ignored for non-Normal memory, Outer is as
> -         * good as anything.
> -         *
> -         * On ARMv8 sharability is ignored and explicitly treated as Outer
> -         * Shareable for any device memory type.
> -         */
> -        e.pt.sh = LPAE_SH_OUTER;
> -        break;
> -    default:
> -        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
> -        break;
> -    }
> -
> -    ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK));
> -
> -    lpae_set_mfn(e, mfn);
> -
> -    return e;
> -}
> -
> -/* Map a 4k page in a fixmap entry */
> -void set_fixmap(unsigned int map, mfn_t mfn, unsigned int flags)
> -{
> -    int res;
> -
> -    res = map_pages_to_xen(FIXMAP_ADDR(map), mfn, 1, flags);
> -    BUG_ON(res != 0);
> -}
> -
> -/* Remove a mapping from a fixmap entry */
> -void clear_fixmap(unsigned int map)
> -{
> -    int res;
> -
> -    res = destroy_xen_mappings(FIXMAP_ADDR(map), FIXMAP_ADDR(map) + PAGE_SIZE);
> -    BUG_ON(res != 0);
> -}
> -
> -void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
> -{
> -    set_fixmap(FIXMAP_MISC, mfn, attributes);
> -
> -    return fix_to_virt(FIXMAP_MISC);
> -}
> -
> -void unmap_page_from_xen_misc(void)
> -{
> -    clear_fixmap(FIXMAP_MISC);
> -}
> -
>   void flush_page_to_ram(unsigned long mfn, bool sync_icache)
>   {
>       void *v = map_domain_page(_mfn(mfn));
> @@ -386,878 +39,6 @@ void flush_page_to_ram(unsigned long mfn, bool sync_icache)
>           invalidate_icache();
>   }
>   
> -static inline lpae_t pte_of_xenaddr(vaddr_t va)
> -{
> -    paddr_t ma = va + phys_offset;
> -
> -    return mfn_to_xen_entry(maddr_to_mfn(ma), MT_NORMAL);
> -}
> -
> -void * __init early_fdt_map(paddr_t fdt_paddr)
> -{
> -    /* We are using 2MB superpage for mapping the FDT */
> -    paddr_t base_paddr = fdt_paddr & SECOND_MASK;
> -    paddr_t offset;
> -    void *fdt_virt;
> -    uint32_t size;
> -    int rc;
> -
> -    /*
> -     * Check whether the physical FDT address is set and meets the minimum
> -     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
> -     * least 8 bytes so that we always access the magic and size fields
> -     * of the FDT header after mapping the first chunk, double check if
> -     * that is indeed the case.
> -     */
> -    BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
> -    if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
> -        return NULL;
> -
> -    /* The FDT is mapped using 2MB superpage */
> -    BUILD_BUG_ON(BOOT_FDT_VIRT_START % SZ_2M);
> -
> -    rc = map_pages_to_xen(BOOT_FDT_VIRT_START, maddr_to_mfn(base_paddr),
> -                          SZ_2M >> PAGE_SHIFT,
> -                          PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
> -    if ( rc )
> -        panic("Unable to map the device-tree.\n");
> -
> -
> -    offset = fdt_paddr % SECOND_SIZE;
> -    fdt_virt = (void *)BOOT_FDT_VIRT_START + offset;
> -
> -    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
> -        return NULL;
> -
> -    size = fdt_totalsize(fdt_virt);
> -    if ( size > MAX_FDT_SIZE )
> -        return NULL;
> -
> -    if ( (offset + size) > SZ_2M )
> -    {
> -        rc = map_pages_to_xen(BOOT_FDT_VIRT_START + SZ_2M,
> -                              maddr_to_mfn(base_paddr + SZ_2M),
> -                              SZ_2M >> PAGE_SHIFT,
> -                              PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
> -        if ( rc )
> -            panic("Unable to map the device-tree\n");
> -    }
> -
> -    return fdt_virt;
> -}
> -
> -void __init remove_early_mappings(void)
> -{
> -    int rc;
> -
> -    /* destroy the _PAGE_BLOCK mapping */
> -    rc = modify_xen_mappings(BOOT_FDT_VIRT_START,
> -                             BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE,
> -                             _PAGE_BLOCK);
> -    BUG_ON(rc);
> -}
> -
> -/*
> - * After boot, Xen page-tables should not contain mapping that are both
> - * Writable and eXecutables.
> - *
> - * This should be called on each CPU to enforce the policy.
> - */
> -static void xen_pt_enforce_wnx(void)
> -{
> -    WRITE_SYSREG(READ_SYSREG(SCTLR_EL2) | SCTLR_Axx_ELx_WXN, SCTLR_EL2);
> -    /*
> -     * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
> -     * before flushing the TLBs.
> -     */
> -    isb();
> -    flush_xen_tlb_local();
> -}
> -
> -extern void switch_ttbr(uint64_t ttbr);
> -
> -/* Clear a translation table and clean & invalidate the cache */
> -static void clear_table(void *table)
> -{
> -    clear_page(table);
> -    clean_and_invalidate_dcache_va_range(table, PAGE_SIZE);
> -}
> -
> -/* Boot-time pagetable setup.
> - * Changes here may need matching changes in head.S */
> -void __init setup_pagetables(unsigned long boot_phys_offset)
> -{
> -    uint64_t ttbr;
> -    lpae_t pte, *p;
> -    int i;
> -
> -    phys_offset = boot_phys_offset;
> -
> -#ifdef CONFIG_ARM_64
> -    p = (void *) xen_pgtable;
> -    p[0] = pte_of_xenaddr((uintptr_t)xen_first);
> -    p[0].pt.table = 1;
> -    p[0].pt.xn = 0;
> -    p = (void *) xen_first;
> -#else
> -    p = (void *) cpu0_pgtable;
> -#endif
> -
> -    /* Map xen second level page-table */
> -    p[0] = pte_of_xenaddr((uintptr_t)(xen_second));
> -    p[0].pt.table = 1;
> -    p[0].pt.xn = 0;
> -
> -    /* Break up the Xen mapping into 4k pages and protect them separately. */
> -    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
> -    {
> -        vaddr_t va = XEN_VIRT_START + (i << PAGE_SHIFT);
> -
> -        if ( !is_kernel(va) )
> -            break;
> -        pte = pte_of_xenaddr(va);
> -        pte.pt.table = 1; /* 4k mappings always have this bit set */
> -        if ( is_kernel_text(va) || is_kernel_inittext(va) )
> -        {
> -            pte.pt.xn = 0;
> -            pte.pt.ro = 1;
> -        }
> -        if ( is_kernel_rodata(va) )
> -            pte.pt.ro = 1;
> -        xen_xenmap[i] = pte;
> -    }
> -
> -    /* Initialise xen second level entries ... */
> -    /* ... Xen's text etc */
> -
> -    pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
> -    pte.pt.table = 1;
> -    xen_second[second_table_offset(XEN_VIRT_START)] = pte;
> -
> -    /* ... Fixmap */
> -    pte = pte_of_xenaddr((vaddr_t)xen_fixmap);
> -    pte.pt.table = 1;
> -    xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
> -
> -#ifdef CONFIG_ARM_64
> -    ttbr = (uintptr_t) xen_pgtable + phys_offset;
> -#else
> -    ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
> -#endif
> -
> -    switch_ttbr(ttbr);
> -
> -    xen_pt_enforce_wnx();
> -
> -#ifdef CONFIG_ARM_32
> -    per_cpu(xen_pgtable, 0) = cpu0_pgtable;
> -#endif
> -}
> -
> -static void clear_boot_pagetables(void)
> -{
> -    /*
> -     * Clear the copy of the boot pagetables. Each secondary CPU
> -     * rebuilds these itself (see head.S).
> -     */
> -    clear_table(boot_pgtable);
> -#ifdef CONFIG_ARM_64
> -    clear_table(boot_first);
> -    clear_table(boot_first_id);
> -#endif
> -    clear_table(boot_second);
> -    clear_table(boot_third);
> -}
> -
> -#ifdef CONFIG_ARM_64
> -int init_secondary_pagetables(int cpu)
> -{
> -    clear_boot_pagetables();
> -
> -    /* Set init_ttbr for this CPU coming up. All CPus share a single setof
> -     * pagetables, but rewrite it each time for consistency with 32 bit. */
> -    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
> -    clean_dcache(init_ttbr);
> -    return 0;
> -}
> -#else
> -int init_secondary_pagetables(int cpu)
> -{
> -    lpae_t *first;
> -
> -    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
> -
> -    if ( !first )
> -    {
> -        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
> -        return -ENOMEM;
> -    }
> -
> -    /* Initialise root pagetable from root of boot tables */
> -    memcpy(first, cpu0_pgtable, PAGE_SIZE);
> -    per_cpu(xen_pgtable, cpu) = first;
> -
> -    if ( !init_domheap_mappings(cpu) )
> -    {
> -        printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
> -        per_cpu(xen_pgtable, cpu) = NULL;
> -        free_xenheap_page(first);
> -        return -ENOMEM;
> -    }
> -
> -    clear_boot_pagetables();
> -
> -    /* Set init_ttbr for this CPU coming up */
> -    init_ttbr = __pa(first);
> -    clean_dcache(init_ttbr);
> -
> -    return 0;
> -}
> -#endif
> -
> -/* MMU setup for secondary CPUS (which already have paging enabled) */
> -void mmu_init_secondary_cpu(void)
> -{
> -    xen_pt_enforce_wnx();
> -}
> -
> -#ifdef CONFIG_ARM_32
> -/*
> - * Set up the direct-mapped xenheap:
> - * up to 1GB of contiguous, always-mapped memory.
> - */
> -void __init setup_directmap_mappings(unsigned long base_mfn,
> -                                     unsigned long nr_mfns)
> -{
> -    int rc;
> -
> -    rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
> -                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> -    if ( rc )
> -        panic("Unable to setup the directmap mappings.\n");
> -
> -    /* Record where the directmap is, for translation routines. */
> -    directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
> -}
> -#else /* CONFIG_ARM_64 */
> -/* Map the region in the directmap area. */
> -void __init setup_directmap_mappings(unsigned long base_mfn,
> -                                     unsigned long nr_mfns)
> -{
> -    int rc;
> -
> -    /* First call sets the directmap physical and virtual offset. */
> -    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> -    {
> -        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
> -
> -        directmap_mfn_start = _mfn(base_mfn);
> -        directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
> -        /*
> -         * The base address may not be aligned to the first level
> -         * size (e.g. 1GB when using 4KB pages). This would prevent
> -         * superpage mappings for all the regions because the virtual
> -         * address and machine address should both be suitably aligned.
> -         *
> -         * Prevent that by offsetting the start of the directmap virtual
> -         * address.
> -         */
> -        directmap_virt_start = DIRECTMAP_VIRT_START +
> -            (base_mfn - mfn_gb) * PAGE_SIZE;
> -    }
> -
> -    if ( base_mfn < mfn_x(directmap_mfn_start) )
> -        panic("cannot add directmap mapping at %lx below heap start %lx\n",
> -              base_mfn, mfn_x(directmap_mfn_start));
> -
> -    rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
> -                          _mfn(base_mfn), nr_mfns,
> -                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> -    if ( rc )
> -        panic("Unable to setup the directmap mappings.\n");
> -}
> -#endif
> -
> -/* Map a frame table to cover physical addresses ps through pe */
> -void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> -{
> -    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
> -                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
> -    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
> -    mfn_t base_mfn;
> -    const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32);
> -    int rc;
> -
> -    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
> -    /* Round up to 2M or 32M boundary, as appropriate. */
> -    frametable_size = ROUNDUP(frametable_size, mapping_size);
> -    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
> -
> -    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> -                          frametable_size >> PAGE_SHIFT,
> -                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> -    if ( rc )
> -        panic("Unable to setup the frametable mappings.\n");
> -
> -    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
> -    memset(&frame_table[nr_pdxs], -1,
> -           frametable_size - (nr_pdxs * sizeof(struct page_info)));
> -
> -    frametable_virt_end = FRAMETABLE_VIRT_START + (nr_pdxs * sizeof(struct page_info));
> -}
> -
> -void *__init arch_vmap_virt_end(void)
> -{
> -    return (void *)(VMAP_VIRT_START + VMAP_VIRT_SIZE);
> -}
> -
> -/*
> - * This function should only be used to remap device address ranges
> - * TODO: add a check to verify this assumption
> - */
> -void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
> -{
> -    mfn_t mfn = _mfn(PFN_DOWN(pa));
> -    unsigned int offs = pa & (PAGE_SIZE - 1);
> -    unsigned int nr = PFN_UP(offs + len);
> -    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
> -
> -    if ( ptr == NULL )
> -        return NULL;
> -
> -    return ptr + offs;
> -}
> -
> -void *ioremap(paddr_t pa, size_t len)
> -{
> -    return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> -}
> -
> -static int create_xen_table(lpae_t *entry)
> -{
> -    mfn_t mfn;
> -    void *p;
> -    lpae_t pte;
> -
> -    if ( system_state != SYS_STATE_early_boot )
> -    {
> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
> -
> -        if ( pg == NULL )
> -            return -ENOMEM;
> -
> -        mfn = page_to_mfn(pg);
> -    }
> -    else
> -        mfn = alloc_boot_pages(1, 1);
> -
> -    p = xen_map_table(mfn);
> -    clear_page(p);
> -    xen_unmap_table(p);
> -
> -    pte = mfn_to_xen_entry(mfn, MT_NORMAL);
> -    pte.pt.table = 1;
> -    write_pte(entry, pte);
> -
> -    return 0;
> -}
> -
> -#define XEN_TABLE_MAP_FAILED 0
> -#define XEN_TABLE_SUPER_PAGE 1
> -#define XEN_TABLE_NORMAL_PAGE 2
> -
> -/*
> - * Take the currently mapped table, find the corresponding entry,
> - * and map the next table, if available.
> - *
> - * The read_only parameters indicates whether intermediate tables should
> - * be allocated when not present.
> - *
> - * Return values:
> - *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
> - *  was empty, or allocating a new page failed.
> - *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
> - *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
> - */
> -static int xen_pt_next_level(bool read_only, unsigned int level,
> -                             lpae_t **table, unsigned int offset)
> -{
> -    lpae_t *entry;
> -    int ret;
> -    mfn_t mfn;
> -
> -    entry = *table + offset;
> -
> -    if ( !lpae_is_valid(*entry) )
> -    {
> -        if ( read_only )
> -            return XEN_TABLE_MAP_FAILED;
> -
> -        ret = create_xen_table(entry);
> -        if ( ret )
> -            return XEN_TABLE_MAP_FAILED;
> -    }
> -
> -    /* The function xen_pt_next_level is never called at the 3rd level */
> -    if ( lpae_is_mapping(*entry, level) )
> -        return XEN_TABLE_SUPER_PAGE;
> -
> -    mfn = lpae_get_mfn(*entry);
> -
> -    xen_unmap_table(*table);
> -    *table = xen_map_table(mfn);
> -
> -    return XEN_TABLE_NORMAL_PAGE;
> -}
> -
> -/* Sanity check of the entry */
> -static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
> -                               unsigned int flags)
> -{
> -    /* Sanity check when modifying an entry. */
> -    if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
> -    {
> -        /* We don't allow modifying an invalid entry. */
> -        if ( !lpae_is_valid(entry) )
> -        {
> -            mm_printk("Modifying invalid entry is not allowed.\n");
> -            return false;
> -        }
> -
> -        /* We don't allow modifying a table entry */
> -        if ( !lpae_is_mapping(entry, level) )
> -        {
> -            mm_printk("Modifying a table entry is not allowed.\n");
> -            return false;
> -        }
> -
> -        /* We don't allow changing memory attributes. */
> -        if ( entry.pt.ai != PAGE_AI_MASK(flags) )
> -        {
> -            mm_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
> -                      entry.pt.ai, PAGE_AI_MASK(flags));
> -            return false;
> -        }
> -
> -        /* We don't allow modifying entry with contiguous bit set. */
> -        if ( entry.pt.contig )
> -        {
> -            mm_printk("Modifying entry with contiguous bit set is not allowed.\n");
> -            return false;
> -        }
> -    }
> -    /* Sanity check when inserting a mapping */
> -    else if ( flags & _PAGE_PRESENT )
> -    {
> -        /* We should be here with a valid MFN. */
> -        ASSERT(!mfn_eq(mfn, INVALID_MFN));
> -
> -        /*
> -         * We don't allow replacing any valid entry.
> -         *
> -         * Note that the function xen_pt_update() relies on this
> -         * assumption and will skip the TLB flush. The function will need
> -         * to be updated if the check is relaxed.
> -         */
> -        if ( lpae_is_valid(entry) )
> -        {
> -            if ( lpae_is_mapping(entry, level) )
> -                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
> -                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
> -            else
> -                mm_printk("Trying to replace a table with a mapping.\n");
> -            return false;
> -        }
> -    }
> -    /* Sanity check when removing a mapping. */
> -    else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
> -    {
> -        /* We should be here with an invalid MFN. */
> -        ASSERT(mfn_eq(mfn, INVALID_MFN));
> -
> -        /* We don't allow removing a table */
> -        if ( lpae_is_table(entry, level) )
> -        {
> -            mm_printk("Removing a table is not allowed.\n");
> -            return false;
> -        }
> -
> -        /* We don't allow removing a mapping with contiguous bit set. */
> -        if ( entry.pt.contig )
> -        {
> -            mm_printk("Removing entry with contiguous bit set is not allowed.\n");
> -            return false;
> -        }
> -    }
> -    /* Sanity check when populating the page-table. No check so far. */
> -    else
> -    {
> -        ASSERT(flags & _PAGE_POPULATE);
> -        /* We should be here with an invalid MFN */
> -        ASSERT(mfn_eq(mfn, INVALID_MFN));
> -    }
> -
> -    return true;
> -}
> -
> -/* Update an entry at the level @target. */
> -static int xen_pt_update_entry(mfn_t root, unsigned long virt,
> -                               mfn_t mfn, unsigned int target,
> -                               unsigned int flags)
> -{
> -    int rc;
> -    unsigned int level;
> -    lpae_t *table;
> -    /*
> -     * The intermediate page tables are read-only when the MFN is not valid
> -     * and we are not populating page table.
> -     * This means we either modify permissions or remove an entry.
> -     */
> -    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
> -    lpae_t pte, *entry;
> -
> -    /* convenience aliases */
> -    DECLARE_OFFSETS(offsets, (paddr_t)virt);
> -
> -    /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
> -    ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
> -
> -    table = xen_map_table(root);
> -    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
> -    {
> -        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
> -        if ( rc == XEN_TABLE_MAP_FAILED )
> -        {
> -            /*
> -             * We are here because xen_pt_next_level has failed to map
> -             * the intermediate page table (e.g the table does not exist
> -             * and the pt is read-only). It is a valid case when
> -             * removing a mapping as it may not exist in the page table.
> -             * In this case, just ignore it.
> -             */
> -            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
> -            {
> -                mm_printk("%s: Unable to map level %u\n", __func__, level);
> -                rc = -ENOENT;
> -                goto out;
> -            }
> -            else
> -            {
> -                rc = 0;
> -                goto out;
> -            }
> -        }
> -        else if ( rc != XEN_TABLE_NORMAL_PAGE )
> -            break;
> -    }
> -
> -    if ( level != target )
> -    {
> -        mm_printk("%s: Shattering superpage is not supported\n", __func__);
> -        rc = -EOPNOTSUPP;
> -        goto out;
> -    }
> -
> -    entry = table + offsets[level];
> -
> -    rc = -EINVAL;
> -    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
> -        goto out;
> -
> -    /* If we are only populating page-table, then we are done. */
> -    rc = 0;
> -    if ( flags & _PAGE_POPULATE )
> -        goto out;
> -
> -    /* We are removing the page */
> -    if ( !(flags & _PAGE_PRESENT) )
> -        memset(&pte, 0x00, sizeof(pte));
> -    else
> -    {
> -        /* We are inserting a mapping => Create new pte. */
> -        if ( !mfn_eq(mfn, INVALID_MFN) )
> -        {
> -            pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
> -
> -            /*
> -             * First and second level pages set pte.pt.table = 0, but
> -             * third level entries set pte.pt.table = 1.
> -             */
> -            pte.pt.table = (level == 3);
> -        }
> -        else /* We are updating the permission => Copy the current pte. */
> -            pte = *entry;
> -
> -        /* Set permission */
> -        pte.pt.ro = PAGE_RO_MASK(flags);
> -        pte.pt.xn = PAGE_XN_MASK(flags);
> -        /* Set contiguous bit */
> -        pte.pt.contig = !!(flags & _PAGE_CONTIG);
> -    }
> -
> -    write_pte(entry, pte);
> -
> -    rc = 0;
> -
> -out:
> -    xen_unmap_table(table);
> -
> -    return rc;
> -}
> -
> -/* Return the level where mapping should be done */
> -static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
> -                                unsigned int flags)
> -{
> -    unsigned int level;
> -    unsigned long mask;
> -
> -    /*
> -      * Don't take into account the MFN when removing mapping (i.e
> -      * MFN_INVALID) to calculate the correct target order.
> -      *
> -      * Per the Arm Arm, `vfn` and `mfn` must be both superpage aligned.
> -      * They are or-ed together and then checked against the size of
> -      * each level.
> -      *
> -      * `left` is not included and checked separately to allow
> -      * superpage mapping even if it is not properly aligned (the
> -      * user may have asked to map 2MB + 4k).
> -      */
> -     mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
> -     mask |= vfn;
> -
> -     /*
> -      * Always use level 3 mapping unless the caller request block
> -      * mapping.
> -      */
> -     if ( likely(!(flags & _PAGE_BLOCK)) )
> -         level = 3;
> -     else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) &&
> -               (nr >= BIT(FIRST_ORDER, UL)) )
> -         level = 1;
> -     else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) &&
> -               (nr >= BIT(SECOND_ORDER, UL)) )
> -         level = 2;
> -     else
> -         level = 3;
> -
> -     return level;
> -}
> -
> -#define XEN_PT_4K_NR_CONTIG 16
> -
> -/*
> - * Check whether the contiguous bit can be set. Return the number of
> - * contiguous entry allowed. If not allowed, return 1.
> - */
> -static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn,
> -                                        unsigned int level, unsigned long left,
> -                                        unsigned int flags)
> -{
> -    unsigned long nr_contig;
> -
> -    /*
> -     * Allow the contiguous bit to set when the caller requests block
> -     * mapping.
> -     */
> -    if ( !(flags & _PAGE_BLOCK) )
> -        return 1;
> -
> -    /*
> -     * We don't allow to remove mapping with the contiguous bit set.
> -     * So shortcut the logic and directly return 1.
> -     */
> -    if ( mfn_eq(mfn, INVALID_MFN) )
> -        return 1;
> -
> -    /*
> -     * The number of contiguous entries varies depending on the page
> -     * granularity used. The logic below assumes 4KB.
> -     */
> -    BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
> -
> -    /*
> -     * In order to enable the contiguous bit, we should have enough entries
> -     * to map left and both the virtual and physical address should be
> -     * aligned to the size of 16 translation tables entries.
> -     */
> -    nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG;
> -
> -    if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) )
> -        return 1;
> -
> -    return XEN_PT_4K_NR_CONTIG;
> -}
> -
> -static DEFINE_SPINLOCK(xen_pt_lock);
> -
> -static int xen_pt_update(unsigned long virt,
> -                         mfn_t mfn,
> -                         /* const on purpose as it is used for TLB flush */
> -                         const unsigned long nr_mfns,
> -                         unsigned int flags)
> -{
> -    int rc = 0;
> -    unsigned long vfn = virt >> PAGE_SHIFT;
> -    unsigned long left = nr_mfns;
> -
> -    /*
> -     * For arm32, page-tables are different on each CPUs. Yet, they share
> -     * some common mappings. It is assumed that only common mappings
> -     * will be modified with this function.
> -     *
> -     * XXX: Add a check.
> -     */
> -    const mfn_t root = maddr_to_mfn(READ_SYSREG64(TTBR0_EL2));
> -
> -    /*
> -     * The hardware was configured to forbid mapping both writeable and
> -     * executable.
> -     * When modifying/creating mapping (i.e _PAGE_PRESENT is set),
> -     * prevent any update if this happen.
> -     */
> -    if ( (flags & _PAGE_PRESENT) && !PAGE_RO_MASK(flags) &&
> -         !PAGE_XN_MASK(flags) )
> -    {
> -        mm_printk("Mappings should not be both Writeable and Executable.\n");
> -        return -EINVAL;
> -    }
> -
> -    if ( flags & _PAGE_CONTIG )
> -    {
> -        mm_printk("_PAGE_CONTIG is an internal only flag.\n");
> -        return -EINVAL;
> -    }
> -
> -    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
> -    {
> -        mm_printk("The virtual address is not aligned to the page-size.\n");
> -        return -EINVAL;
> -    }
> -
> -    spin_lock(&xen_pt_lock);
> -
> -    while ( left )
> -    {
> -        unsigned int order, level, nr_contig, new_flags;
> -
> -        level = xen_pt_mapping_level(vfn, mfn, left, flags);
> -        order = XEN_PT_LEVEL_ORDER(level);
> -
> -        ASSERT(left >= BIT(order, UL));
> -
> -        /*
> -         * Check if we can set the contiguous mapping and update the
> -         * flags accordingly.
> -         */
> -        nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
> -        new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
> -
> -        for ( ; nr_contig > 0; nr_contig-- )
> -        {
> -            rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level,
> -                                     new_flags);
> -            if ( rc )
> -                break;
> -
> -            vfn += 1U << order;
> -            if ( !mfn_eq(mfn, INVALID_MFN) )
> -                mfn = mfn_add(mfn, 1U << order);
> -
> -            left -= (1U << order);
> -        }
> -
> -        if ( rc )
> -            break;
> -    }
> -
> -    /*
> -     * The TLBs flush can be safely skipped when a mapping is inserted
> -     * as we don't allow mapping replacement (see xen_pt_check_entry()).
> -     *
> -     * For all the other cases, the TLBs will be flushed unconditionally
> -     * even if the mapping has failed. This is because we may have
> -     * partially modified the PT. This will prevent any unexpected
> -     * behavior afterwards.
> -     */
> -    if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) )
> -        flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
> -
> -    spin_unlock(&xen_pt_lock);
> -
> -    return rc;
> -}
> -
> -int map_pages_to_xen(unsigned long virt,
> -                     mfn_t mfn,
> -                     unsigned long nr_mfns,
> -                     unsigned int flags)
> -{
> -    return xen_pt_update(virt, mfn, nr_mfns, flags);
> -}
> -
> -int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
> -{
> -    return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
> -}
> -
> -int destroy_xen_mappings(unsigned long s, unsigned long e)
> -{
> -    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
> -    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
> -    ASSERT(s <= e);
> -    return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, 0);
> -}
> -
> -int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
> -{
> -    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
> -    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
> -    ASSERT(s <= e);
> -    return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, flags);
> -}
> -
> -/* Release all __init and __initdata ranges to be reused */
> -void free_init_memory(void)

This function doesn't look specific to the MMU.

> -{
> -    paddr_t pa = virt_to_maddr(__init_begin);
> -    unsigned long len = __init_end - __init_begin;
> -    uint32_t insn;
> -    unsigned int i, nr = len / sizeof(insn);
> -    uint32_t *p;
> -    int rc;
> -
> -    rc = modify_xen_mappings((unsigned long)__init_begin,
> -                             (unsigned long)__init_end, PAGE_HYPERVISOR_RW);
> -    if ( rc )
> -        panic("Unable to map RW the init section (rc = %d)\n", rc);
> -
> -    /*
> -     * From now on, init will not be used for execution anymore,
> -     * so nuke the instruction cache to remove entries related to init.
> -     */
> -    invalidate_icache_local();
> -
> -#ifdef CONFIG_ARM_32
> -    /* udf instruction i.e (see A8.8.247 in ARM DDI 0406C.c) */
> -    insn = 0xe7f000f0;
> -#else
> -    insn = AARCH64_BREAK_FAULT;
> -#endif
> -    p = (uint32_t *)__init_begin;
> -    for ( i = 0; i < nr; i++ )
> -        *(p + i) = insn;
> -
> -    rc = destroy_xen_mappings((unsigned long)__init_begin,
> -                              (unsigned long)__init_end);
> -    if ( rc )
> -        panic("Unable to remove the init section (rc = %d)\n", rc);
> -
> -    init_domheap_pages(pa, pa + len);
> -    printk("Freed %ldkB init memory.\n", (long)(__init_end-__init_begin)>>10);
> -}
> -


[...]

> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index 43e9a1be4d..87a12042cc 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -20,8 +20,10 @@
>    */
>   
>   #include <xen/init.h>
> +#include <xen/mm.h>
>   #include <xen/page-size.h>
>   #include <asm/arm64/mpu.h>
> +#include <asm/page.h>

Regardless of what I wrote above, none of the code you add seems to 
require <asm/page.h>.

>   
>   /* Xen MPU memory region mapping table. */
>   pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
> @@ -38,6 +40,71 @@ uint64_t __ro_after_init next_transient_region_idx;
>   /* Maximum number of supported MPU memory regions by the EL2 MPU. */
>   uint64_t __ro_after_init max_xen_mpumap;
>   
> +/* TODO: Implementation on the first usage */

It is not clear what you mean given there are some callers.

> +void dump_hyp_walk(vaddr_t addr)
> +{

Please add ASSERT_UNREACHABLE() for any dummy helper you have introduced 
and plan to implement later. This will be helpful to track down any 
function you haven't implemented.
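For instance (untested sketch), for dump_hyp_walk():

void dump_hyp_walk(vaddr_t addr)
{
    /* Not implemented for MPU systems yet; catch any unexpected caller. */
    ASSERT_UNREACHABLE();
}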


> +}
> +
> +void * __init early_fdt_map(paddr_t fdt_paddr)
> +{
> +    return NULL;
> +}
> +
> +void __init remove_early_mappings(void)
> +{

Ditto

> +}
> +
> +int init_secondary_pagetables(int cpu)
> +{
> +    return -ENOSYS;
> +}
> +
> +void mmu_init_secondary_cpu(void)
> +{

Ditto. The name of the function is also a bit odd given this is an 
MPU-specific file. It likely wants to be renamed to mm_init_secondary_cpu().

> +}
> +
> +void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
> +{
> +    return NULL;
> +}
> +
> +void *ioremap(paddr_t pa, size_t len)
> +{
> +    return NULL;
> +}
> +
> +int map_pages_to_xen(unsigned long virt,
> +                     mfn_t mfn,
> +                     unsigned long nr_mfns,
> +                     unsigned int flags)
> +{
> +    return -ENOSYS;
> +}
> +
> +int destroy_xen_mappings(unsigned long s, unsigned long e)
> +{
> +    return -ENOSYS;
> +}
> +
> +int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
> +{
> +    return -ENOSYS;
> +}
> +
> +void free_init_memory(void)
> +{

Ditto.

> +}
> +
> +int xenmem_add_to_physmap_one(
> +    struct domain *d,
> +    unsigned int space,
> +    union add_to_physmap_extra extra,
> +    unsigned long idx,
> +    gfn_t gfn)
> +{
> +    return -ENOSYS;
> +}
> +
>   /*
>    * Local variables:
>    * mode: C

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings
  2023-01-13  5:28 ` [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings Penny Zheng
@ 2023-02-05 21:32   ` Julien Grall
  2023-02-07  4:40     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-05 21:32 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> Function setup_pagetables is responsible for boot-time pagetable setup
> in MMU system.
> But in MPU system, we have already built up start-of-day Xen MPU memory region
> mapping at the very beginning in assembly.
> 
> So in order to keep only one codeflow in arm/setup.c, setup_mm_mappings
> , with a more generic name, is introduced and act as an empty stub in
> MPU system.

Is the empty stub temporary?

> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/mm.h     |  2 ++
>   xen/arch/arm/include/asm/mm_mpu.h | 16 ++++++++++++++++
>   xen/arch/arm/setup.c              |  2 +-
>   3 files changed, 19 insertions(+), 1 deletion(-)
>   create mode 100644 xen/arch/arm/include/asm/mm_mpu.h
> 
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 1b9fdb6ff5..9b4c07d965 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -243,6 +243,8 @@ static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
>   
>   #ifndef CONFIG_HAS_MPU
>   #include <asm/mm_mmu.h>
> +#else
> +#include <asm/mm_mpu.h>
>   #endif
>   
>   /* Page-align address and convert to frame number format */
> diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
> new file mode 100644
> index 0000000000..1f3cff7743
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/mm_mpu.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef __ARCH_ARM_MM_MPU__
> +#define __ARCH_ARM_MM_MPU__
> +
> +#define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
> +
> +#endif /* __ARCH_ARM_MM_MPU__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 1f26f67b90..d7d200179c 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -1003,7 +1003,7 @@ void __init start_xen(unsigned long boot_phys_offset,
>       /* Initialize traps early allow us to get backtrace when an error occurred */
>       init_traps();
>   
> -    setup_pagetables(boot_phys_offset);
> +    setup_mm_mappings(boot_phys_offset);

You are renaming the caller but not the function. Why?

>   
>       smp_clear_cpu_maps();
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 17/40] xen/mpu: plump virt/maddr/mfn convertion in MPU system
  2023-01-13  5:28 ` [PATCH v2 17/40] xen/mpu: plump virt/maddr/mfn convertion in MPU system Penny Zheng
@ 2023-02-05 21:36   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-05 21:36 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

title: typo: s/convertion/conversion/

On 13/01/2023 05:28, Penny Zheng wrote:
> virt_to_maddr and maddr_to_virt are used widely in Xen code. So
> even there is no VMSA in MPU system, we keep the interface name to
> stay the same code flow.
> 
> We move the existing virt/maddr convertion from mm.h to mm_mmu.h.
> And the MPU version of virt/maddr convertion is simple, returning

ditto.

> the input address as the output.
> 
> We should overide virt_to_mfn/mfn_to_virt in source file mm_mpu.c the

ditto: s/overide/override/

> same way in mm_mmu.c.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/mm.h     | 26 --------------------------
>   xen/arch/arm/include/asm/mm_mmu.h | 26 ++++++++++++++++++++++++++
>   xen/arch/arm/include/asm/mm_mpu.h | 13 +++++++++++++
>   xen/arch/arm/mm_mpu.c             |  6 ++++++
>   4 files changed, 45 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 9b4c07d965..e29158028a 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -250,32 +250,6 @@ static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
>   /* Page-align address and convert to frame number format */
>   #define paddr_to_pfn_aligned(paddr)    paddr_to_pfn(PAGE_ALIGN(paddr))
>   
> -static inline paddr_t __virt_to_maddr(vaddr_t va)
> -{
> -    uint64_t par = va_to_par(va);
> -    return (par & PADDR_MASK & PAGE_MASK) | (va & ~PAGE_MASK);
> -}
> -#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
> -
> -#ifdef CONFIG_ARM_32
> -static inline void *maddr_to_virt(paddr_t ma)
> -{
> -    ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
> -    ma -= mfn_to_maddr(directmap_mfn_start);
> -    return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
> -}
> -#else
> -static inline void *maddr_to_virt(paddr_t ma)
> -{
> -    ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
> -           (DIRECTMAP_SIZE >> PAGE_SHIFT));
> -    return (void *)(XENHEAP_VIRT_START -
> -                    (directmap_base_pdx << PAGE_SHIFT) +
> -                    ((ma & ma_va_bottom_mask) |
> -                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
> -}
> -#endif
> -
>   /*
>    * Translate a guest virtual address to a machine address.
>    * Return the fault information if the translation has failed else 0.
> diff --git a/xen/arch/arm/include/asm/mm_mmu.h b/xen/arch/arm/include/asm/mm_mmu.h
> index a5e63d8af8..6d7e5ddde7 100644
> --- a/xen/arch/arm/include/asm/mm_mmu.h
> +++ b/xen/arch/arm/include/asm/mm_mmu.h
> @@ -23,6 +23,32 @@ extern uint64_t init_ttbr;
>   extern void setup_directmap_mappings(unsigned long base_mfn,
>                                        unsigned long nr_mfns);
>   
> +static inline paddr_t __virt_to_maddr(vaddr_t va)
> +{
> +    uint64_t par = va_to_par(va);
> +    return (par & PADDR_MASK & PAGE_MASK) | (va & ~PAGE_MASK);
> +}
> +#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
> +
> +#ifdef CONFIG_ARM_32
> +static inline void *maddr_to_virt(paddr_t ma)
> +{
> +    ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
> +    ma -= mfn_to_maddr(directmap_mfn_start);
> +    return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
> +}
> +#else
> +static inline void *maddr_to_virt(paddr_t ma)
> +{
> +    ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
> +           (DIRECTMAP_SIZE >> PAGE_SHIFT));
> +    return (void *)(XENHEAP_VIRT_START -
> +                    (directmap_base_pdx << PAGE_SHIFT) +
> +                    ((ma & ma_va_bottom_mask) |
> +                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
> +}
> +#endif
> +
>   #endif /* __ARCH_ARM_MM_MMU__ */
>   
>   /*
> diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
> index 1f3cff7743..3a4b07f187 100644
> --- a/xen/arch/arm/include/asm/mm_mpu.h
> +++ b/xen/arch/arm/include/asm/mm_mpu.h
> @@ -4,6 +4,19 @@
>   
>   #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
>   
> +static inline paddr_t __virt_to_maddr(vaddr_t va)
> +{
> +    /* In MPU system, VA == PA. */
> +    return (paddr_t)va;
> +}
> +#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))

This define is exactly the same as the MMU version. Can it be 
implemented in mm.h?
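I.e. something like (sketch only): mm_mmu.h and mm_mpu.h each keep their 
own __virt_to_maddr() implementation, while mm.h provides the common 
wrapper:

/* In asm/mm.h, after the MMU/MPU specific header has been included: */
#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))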

> +
> +static inline void *maddr_to_virt(paddr_t ma)
> +{
> +    /* In MPU system, VA == PA. */
> +    return (void *)ma;
> +}
> +
>   #endif /* __ARCH_ARM_MM_MPU__ */
>   
>   /*
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index 87a12042cc..c9e17ab6da 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -29,6 +29,12 @@
>   pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
>        xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
>   
> +/* Override macros from asm/page.h to make them work with mfn_t */
> +#undef virt_to_mfn
> +#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> +#undef mfn_to_virt
> +#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))

They should be implemented when you need them.

> +
>   /* Index into MPU memory region map for fixed regions, ascending from zero. */
>   uint64_t __ro_after_init next_fixed_region_idx;
>   /*

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table
  2023-01-13  5:28 ` [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table Penny Zheng
@ 2023-02-05 21:45   ` Julien Grall
  2023-02-07  5:07     ` Penny Zheng
  0 siblings, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-05 21:45 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> The new helper xen_mpumap_update() is responsible for updating an entry
> in Xen MPU memory mapping table, including creating a new entry, updating
> or destroying an existing one.
> 
> This commit only talks about populating a new entry in Xen MPU mapping table(
> xen_mpumap). Others will be introduced in the following commits.
> 
> In xen_mpumap_update_entry(), firstly, we shall check if requested address
> range [base, limit) is not mapped. Then we use pr_of_xenaddr() to build up the
> structure of MPU memory region(pr_t).
> In the last, we set memory attribute and permission based on variable @flags.
> 
> To summarize all region attributes in one variable @flags, layout of the
> flags is elaborated as follows:
> [0:2] Memory attribute Index
> [3:4] Execute Never
> [5:6] Access Permission
> [7]   Region Present
> Also, we provide a set of definitions(REGION_HYPERVISOR_RW, etc) that combine
> the memory attribute and permission for common combinations.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/arm64/mpu.h |  72 +++++++
>   xen/arch/arm/mm_mpu.c                | 276 ++++++++++++++++++++++++++-
>   2 files changed, 340 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> index c945dd53db..fcde6ad0db 100644
> --- a/xen/arch/arm/include/asm/arm64/mpu.h
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -16,6 +16,61 @@
>    */
>   #define ARM_MAX_MPU_MEMORY_REGIONS 255
>   
> +/* Access permission attributes. */
> +/* Read/Write at EL2, No Access at EL1/EL0. */
> +#define AP_RW_EL2 0x0
> +/* Read/Write at EL2/EL1/EL0 all levels. */
> +#define AP_RW_ALL 0x1
> +/* Read-only at EL2, No Access at EL1/EL0. */
> +#define AP_RO_EL2 0x2
> +/* Read-only at EL2/EL1/EL0 all levels. */
> +#define AP_RO_ALL 0x3
> +
> +/*
> + * Excute never.
> + * Stage 1 EL2 translation regime.
> + * XN[1] determines whether execution of the instruction fetched from the MPU
> + * memory region is permitted.
> + * Stage 2 EL1/EL0 translation regime.
> + * XN[0] determines whether execution of the instruction fetched from the MPU
> + * memory region is permitted.
> + */
> +#define XN_DISABLED    0x0
> +#define XN_P2M_ENABLED 0x1
> +#define XN_ENABLED     0x2
> +
> +/*
> + * Layout of the flags used for updating Xen MPU region attributes
> + * [0:2] Memory attribute Index
> + * [3:4] Execute Never
> + * [5:6] Access Permission
> + * [7]   Region Present
> + */
> +#define _REGION_AI_BIT            0
> +#define _REGION_XN_BIT            3
> +#define _REGION_AP_BIT            5
> +#define _REGION_PRESENT_BIT       7
> +#define _REGION_XN                (2U << _REGION_XN_BIT)
> +#define _REGION_RO                (2U << _REGION_AP_BIT)
> +#define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
> +#define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
> +#define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
> +#define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
> +#define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
> +
> +/*
> + * _REGION_NORMAL is convenience define. It is not meant to be used
> + * outside of this header.
> + */
> +#define _REGION_NORMAL            (MT_NORMAL|_REGION_PRESENT)
> +
> +#define REGION_HYPERVISOR_RW      (_REGION_NORMAL|_REGION_XN)
> +#define REGION_HYPERVISOR_RO      (_REGION_NORMAL|_REGION_XN|_REGION_RO)
> +
> +#define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
> +
> +#define INVALID_REGION            (~0UL)
> +
>   #ifndef __ASSEMBLY__
>   
>   /* Protection Region Base Address Register */
> @@ -49,6 +104,23 @@ typedef struct {
>       prlar_t prlar;
>   } pr_t;
>   
> +/* Access to set base address of MPU protection region(pr_t). */
> +#define pr_set_base(pr, paddr) ({                           \
> +    pr_t *_pr = pr;                                         \
> +    _pr->prbar.reg.base = (paddr >> MPU_REGION_SHIFT);      \
> +})
> +
> +/* Access to set limit address of MPU protection region(pr_t). */
> +#define pr_set_limit(pr, paddr) ({                          \
> +    pr_t *_pr = pr;                                         \
> +    _pr->prlar.reg.limit = (paddr >> MPU_REGION_SHIFT);     \
> +})
> +
> +#define region_is_valid(pr) ({                              \
> +    pr_t *_pr = pr;                                         \
> +    _pr->prlar.reg.en;                                      \
> +})

Can they all be implemented using static inline?
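E.g. (untested sketch, using the same fields as the macros above):

static inline void pr_set_base(pr_t *pr, paddr_t paddr)
{
    pr->prbar.reg.base = (paddr >> MPU_REGION_SHIFT);
}

static inline void pr_set_limit(pr_t *pr, paddr_t paddr)
{
    pr->prlar.reg.limit = (paddr >> MPU_REGION_SHIFT);
}

static inline bool region_is_valid(const pr_t *pr)
{
    return pr->prlar.reg.en;
}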

> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif /* __ARM64_MPU_H__ */
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index f2b494449c..08720a7c19 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -22,9 +22,23 @@
>   #include <xen/init.h>
>   #include <xen/mm.h>
>   #include <xen/page-size.h>
> +#include <xen/spinlock.h>
>   #include <asm/arm64/mpu.h>
>   #include <asm/page.h>
>   
> +#ifdef NDEBUG
> +static inline void
> +__attribute__ ((__format__ (__printf__, 1, 2)))
> +region_printk(const char *fmt, ...) {}
> +#else
> +#define region_printk(fmt, args...)         \
> +    do                                      \
> +    {                                       \
> +        dprintk(XENLOG_ERR, fmt, ## args);  \
> +        WARN();                             \
> +    } while (0)
> +#endif
> +
>   /* Xen MPU memory region mapping table. */
>   pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
>        xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
> @@ -46,6 +60,8 @@ uint64_t __ro_after_init next_transient_region_idx;
>   /* Maximum number of supported MPU memory regions by the EL2 MPU. */
>   uint64_t __ro_after_init max_xen_mpumap;
>   
> +static DEFINE_SPINLOCK(xen_mpumap_lock);
> +
>   /* Write a MPU protection region */
>   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>       uint64_t _sel = sel;                                                \
> @@ -73,6 +89,28 @@ uint64_t __ro_after_init max_xen_mpumap;
>       _pr;                                                                \
>   })
>   
> +/*
> + * In boot-time, fixed MPU regions(e.g. Xen text section) are added
> + * at the front, indexed by next_fixed_region_idx, the others like
> + * boot-only regions(e.g. early FDT) should be added at the rear,
> + * indexed by next_transient_region_idx.
> + * With more and more MPU regions added-in, when the two indexes
> + * meet and pass with each other, we would run out of the whole
> + * EL2 MPU memory regions.
> + */
> +static bool __init xen_boot_mpu_regions_is_full(void)
> +{
> +    return next_transient_region_idx < next_fixed_region_idx;
> +}
> +
> +static void __init update_boot_xen_mpumap_idx(uint64_t idx)
> +{
> +    if ( idx == next_transient_region_idx )
> +        next_transient_region_idx--;
> +    else
> +        next_fixed_region_idx++;
> +}
> +
>   /*
>    * Access MPU protection region, including both read/write operations.
>    * Armv8-R AArch64 at most supports 255 MPU protection regions.
> @@ -197,6 +235,236 @@ static void access_protection_region(bool read, pr_t *pr_read,
>       }
>   }
>   
> +/*
> + * Standard entry for building up the structure of MPU memory region(pr_t).
> + * It is equivalent to mfn_to_xen_entry in MMU system.
> + * base and limit both refer to inclusive address.
> + */
> +static inline pr_t pr_of_xenaddr(paddr_t base, paddr_t limit, unsigned attr)
> +{
> +    prbar_t prbar;
> +    prlar_t prlar;
> +    pr_t region;
> +
> +    /* Build up value for PRBAR_EL2. */
> +    prbar = (prbar_t) {
> +        .reg = {
> +            .ap = AP_RW_EL2,  /* Read/Write at EL2, no access at EL1/EL0. */
> +            .xn = XN_ENABLED, /* No need to execute outside .text */
> +        }};
> +
> +    switch ( attr )
> +    {
> +    case MT_NORMAL_NC:
> +        /*
> +         * ARM ARM: Overlaying the shareability attribute (DDI
> +         * 0406C.b B3-1376 to 1377)
> +         *
> +         * A memory region with a resultant memory type attribute of normal,
> +         * and a resultant cacheability attribute of Inner non-cacheable,
> +         * outer non-cacheable, must have a resultant shareability attribute
> +         * of outer shareable, otherwise shareability is UNPREDICTABLE.
> +         *
> +         * On ARMv8 sharability is ignored and explicitly treated as outer
> +         * shareable for normal inner non-cacheable, outer non-cacheable.
> +         */
> +        prbar.reg.sh = LPAE_SH_OUTER;
> +        break;
> +    case MT_DEVICE_nGnRnE:
> +    case MT_DEVICE_nGnRE:
> +        /*
> +         * Shareability is ignored for non-normal memory, Outer is as
> +         * good as anything.
> +         *
> +         * On ARMv8 sharability is ignored and explicitly treated as outer
> +         * shareable for any device memory type.
> +         */
> +        prbar.reg.sh = LPAE_SH_OUTER;
> +        break;
> +    default:
> +        /* Xen mappings are SMP coherent */
> +        prbar.reg.sh = LPAE_SH_INNER;
> +        break;
> +    }
> +
> +    /* Build up value for PRLAR_EL2. */
> +    prlar = (prlar_t) {
> +        .reg = {
> +            .ns = 0,        /* Hyp mode is in secure world */
> +            .ai = attr,
> +            .en = 1,        /* Region enabled */
> +        }};
> +
> +    /* Build up MPU memory region. */
> +    region = (pr_t) {
> +        .prbar = prbar,
> +        .prlar = prlar,
> +    };
> +
> +    /* Set base address and limit address. */
> +    pr_set_base(&region, base);
> +    pr_set_limit(&region, limit);
> +
> +    return region;
> +}
> +
> +#define MPUMAP_REGION_FAILED    0
> +#define MPUMAP_REGION_FOUND     1
> +#define MPUMAP_REGION_INCLUSIVE 2
> +#define MPUMAP_REGION_OVERLAP   3
> +
> +/*
> + * Check whether memory range [base, limit] is mapped in MPU memory
> + * region table \mpu. Only address range is considered, memory attributes
> + * and permission are not considered here.
> + * If we find the match, the associated index will be filled up.
> + * If the entry is not present, INVALID_REGION will be set in \index
> + *
> + * Make sure that parameter \base and \limit are both referring
> + * inclusive addresss
> + *
> + * Return values:
> + *  MPUMAP_REGION_FAILED: no mapping and no overmapping
> + *  MPUMAP_REGION_FOUND: find an exact match in address
> + *  MPUMAP_REGION_INCLUSIVE: find an inclusive match in address
> + *  MPUMAP_REGION_OVERLAP: overlap with the existing mapping
> + */
> +static int mpumap_contain_region(pr_t *mpu, uint64_t nr_regions,
> +                                 paddr_t base, paddr_t limit, uint64_t *index)

Is it really possible to support 2^64 - 1 regions? If so, is that the 
case on arm32 as well?

> +{
> +    uint64_t i = 0;
> +    uint64_t _index = INVALID_REGION;
> +
> +    /* Allow index to be NULL */
> +    index = index ?: &_index;
> +
> +    for ( ; i < nr_regions; i++ )
> +    {
> +        paddr_t iter_base = pr_get_base(&mpu[i]);
> +        paddr_t iter_limit = pr_get_limit(&mpu[i]);
> +
> +        /* Found an exact valid match */
> +        if ( (iter_base == base) && (iter_limit == limit) &&
> +             region_is_valid(&mpu[i]) )
> +        {
> +            *index = i;
> +            return MPUMAP_REGION_FOUND;
> +        }
> +
> +        /* No overlapping */
> +        if ( (iter_limit < base) || (iter_base > limit) )
> +            continue;
> +        /* Inclusive and valid */
> +        else if ( (base >= iter_base) && (limit <= iter_limit) &&
> +                  region_is_valid(&mpu[i]) )
> +        {
> +            *index = i;
> +            return MPUMAP_REGION_INCLUSIVE;
> +        }
> +        else
> +        {
> +            region_printk("Range 0x%"PRIpaddr" - 0x%"PRIpaddr" overlaps with the existing region 0x%"PRIpaddr" - 0x%"PRIpaddr"\n",
> +                          base, limit, iter_base, iter_limit);
> +            return MPUMAP_REGION_OVERLAP;
> +        }
> +    }
> +
> +    return MPUMAP_REGION_FAILED;
> +}
> +
> +/*
> + * Update an entry at the index @idx.
> + * @base:  base address
> + * @limit: limit address(exclusive)
> + * @flags: region attributes, should be the combination of REGION_HYPERVISOR_xx
> + */
> +static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
> +                                   unsigned int flags)
> +{
> +    uint64_t idx;
> +    int rc;
> +
> +    rc = mpumap_contain_region(xen_mpumap, max_xen_mpumap, base, limit - 1,
> +                               &idx);
> +    if ( rc == MPUMAP_REGION_OVERLAP )
> +        return -EINVAL;
> +
> +    /* We are inserting a mapping => Create new region. */
> +    if ( flags & _REGION_PRESENT )
> +    {
> +        if ( rc != MPUMAP_REGION_FAILED )
> +            return -EINVAL;
> +
> +        if ( xen_boot_mpu_regions_is_full() )
> +        {
> +            region_printk("There is no room left in EL2 MPU memory region mapping\n");
> +            return -ENOMEM;
> +        }
> +
> +        /* During boot time, the default index is next_fixed_region_idx. */
> +        if ( system_state <= SYS_STATE_active )
> +            idx = next_fixed_region_idx;
> +
> +        xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1, REGION_AI_MASK(flags));
> +        /* Set permission */
> +        xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
> +        xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
> +
> +        /* Update and enable the region */
> +        access_protection_region(false, NULL, (const pr_t*)(&xen_mpumap[idx]),
> +                                 idx);
> +
> +        if ( system_state <= SYS_STATE_active )
> +            update_boot_xen_mpumap_idx(idx);
> +    }
> +
> +    return 0;
> +}
> +
> +static int xen_mpumap_update(paddr_t base, paddr_t limit, unsigned int flags)
> +{
> +    int rc;
> +
> +    /*
> +     * The hardware was configured to forbid mapping both writeable and
> +     * executable.
> +     * When modifying/creating mapping (i.e _REGION_PRESENT is set),
> +     * prevent any update if this happen.
> +     */
> +    if ( (flags & _REGION_PRESENT) && !REGION_RO_MASK(flags) &&
> +         !REGION_XN_MASK(flags) )
> +    {
> +        region_printk("Mappings should not be both Writeable and Executable.\n");
> +        return -EINVAL;
> +    }
> +
> +    if ( !IS_ALIGNED(base, PAGE_SIZE) || !IS_ALIGNED(limit, PAGE_SIZE) )
> +    {
> +        region_printk("base address 0x%"PRIpaddr", or limit address 0x%"PRIpaddr" is not page aligned.\n",
> +                      base, limit);
> +        return -EINVAL;
> +    }
> +
> +    spin_lock(&xen_mpumap_lock);
> +
> +    rc = xen_mpumap_update_entry(base, limit, flags);
> +
> +    spin_unlock(&xen_mpumap_lock);
> +
> +    return rc;
> +}
> +
> +int map_pages_to_xen(unsigned long virt,
> +                     mfn_t mfn,
> +                     unsigned long nr_mfns,
> +                     unsigned int flags)
> +{
> +    ASSERT(virt == mfn_to_maddr(mfn));
> +
> +    return xen_mpumap_update(mfn_to_maddr(mfn),
> +                             mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
> +}
> +
>   /* TODO: Implementation on the first usage */
>   void dump_hyp_walk(vaddr_t addr)
>   {
> @@ -230,14 +498,6 @@ void *ioremap(paddr_t pa, size_t len)
>       return NULL;
>   }
>   
> -int map_pages_to_xen(unsigned long virt,
> -                     mfn_t mfn,
> -                     unsigned long nr_mfns,
> -                     unsigned int flags)
> -{
> -    return -ENOSYS;
> -}
> -

Why do you implement map_pages_to_xen() at a different place?


>   int destroy_xen_mappings(unsigned long s, unsigned long e)
>   {
>       return -ENOSYS;

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems
  2023-01-13  5:28 ` [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems Penny Zheng
@ 2023-02-05 21:52   ` Julien Grall
  2023-02-06 10:11   ` Julien Grall
  1 sibling, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-05 21:52 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> In MPU system, device tree binary can be packed with Xen
> image through CONFIG_DTB_FILE, or provided by bootloader through x0.
> 
> In MPU system, each section in xen.lds.S is PAGE_SIZE aligned.
> So in order to not overlap with the previous BSS section, dtb section
> should be made page-aligned too.
> We add . = ALIGN(PAGE_SIZE); in the head of dtb section to make it happen.
> 
> In this commit, we map early FDT with a transient MPU memory region at
> rear with REGION_HYPERVISOR_BOOT.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/arm64/mpu.h |  5 +++
>   xen/arch/arm/mm_mpu.c                | 63 +++++++++++++++++++++++++---
>   xen/arch/arm/xen.lds.S               |  5 ++-
>   3 files changed, 67 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> index fcde6ad0db..b85e420a90 100644
> --- a/xen/arch/arm/include/asm/arm64/mpu.h
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -45,18 +45,22 @@
>    * [3:4] Execute Never
>    * [5:6] Access Permission
>    * [7]   Region Present
> + * [8]   Boot-only Region
>    */
>   #define _REGION_AI_BIT            0
>   #define _REGION_XN_BIT            3
>   #define _REGION_AP_BIT            5
>   #define _REGION_PRESENT_BIT       7
> +#define _REGION_BOOTONLY_BIT      8

In a follow-up patch, you are replacing BOOTONLY with TRANSIENT. Please 
avoid renaming new functions and instead introduce them with the correct 
name from the start.

>   #define _REGION_XN                (2U << _REGION_XN_BIT)
>   #define _REGION_RO                (2U << _REGION_AP_BIT)
>   #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
> +#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
>   #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
>   #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
>   #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
>   #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
> +#define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
>   
>   /*
>    * _REGION_NORMAL is convenience define. It is not meant to be used
> @@ -68,6 +72,7 @@
>   #define REGION_HYPERVISOR_RO      (_REGION_NORMAL|_REGION_XN|_REGION_RO)
>   
>   #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
> +#define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
>   
>   #define INVALID_REGION            (~0UL)
>   
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index 08720a7c19..b34dbf4515 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -20,11 +20,16 @@
>    */
>   
>   #include <xen/init.h>
> +#include <xen/libfdt/libfdt.h>
>   #include <xen/mm.h>
>   #include <xen/page-size.h>
> +#include <xen/pfn.h>
> +#include <xen/sizes.h>
>   #include <xen/spinlock.h>
>   #include <asm/arm64/mpu.h>
> +#include <asm/early_printk.h>
>   #include <asm/page.h>
> +#include <asm/setup.h>
>   
>   #ifdef NDEBUG
>   static inline void
> @@ -62,6 +67,8 @@ uint64_t __ro_after_init max_xen_mpumap;
>   
>   static DEFINE_SPINLOCK(xen_mpumap_lock);
>   
> +static paddr_t dtb_paddr;
> +
>   /* Write a MPU protection region */
>   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>       uint64_t _sel = sel;                                                \
> @@ -403,7 +410,16 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
>   
>           /* During boot time, the default index is next_fixed_region_idx. */
>           if ( system_state <= SYS_STATE_active )
> -            idx = next_fixed_region_idx;
> +        {
> +            /*
> +             * If it is a boot-only region (i.e. region for early FDT),
> +             * it shall be added from the tail for late init re-organizing
> +             */
> +            if ( REGION_BOOTONLY_MASK(flags) )
> +                idx = next_transient_region_idx;
> +            else
> +                idx = next_fixed_region_idx;
> +        }
>   
>           xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1, REGION_AI_MASK(flags));
>           /* Set permission */
> @@ -465,14 +481,51 @@ int map_pages_to_xen(unsigned long virt,
>                                mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
>   }
>   
> -/* TODO: Implementation on the first usage */
> -void dump_hyp_walk(vaddr_t addr)
> +void * __init early_fdt_map(paddr_t fdt_paddr)

A fair amount of this code is the same as the MMU version. Can we share 
some code?
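For instance, the FDT header checks could be carved out into a common 
helper called by both the MMU and the MPU version once the DTB has been 
mapped (name and placement below are only for illustration):

/* Possible shared helper for the MMU/MPU early_fdt_map() variants. */
static void * __init early_fdt_check(void *fdt_virt)
{
    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
        return NULL;

    if ( fdt_totalsize(fdt_virt) > MAX_FDT_SIZE )
        return NULL;

    return fdt_virt;
}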

>   {
> +    void *fdt_virt;
> +    uint32_t size;
> +
> +    /*
> +     * Check whether the physical FDT address is set and meets the minimum
> +     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
> +     * least 8 bytes so that we always access the magic and size fields
> +     * of the FDT header after mapping the first chunk, double check if
> +     * that is indeed the case.
> +     */
> +     BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
> +     if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
> +         return NULL;
> +
> +    dtb_paddr = fdt_paddr;
> +    /*
> +     * In MPU system, device tree binary can be packed with Xen image
> +     * through CONFIG_DTB_FILE, or provided by bootloader through x0.
> +     * Map FDT with a transient MPU memory region of MAX_FDT_SIZE.
> +     * After that, we can do some magic check.
> +     */
> +    if ( map_pages_to_xen(round_pgdown(fdt_paddr),
> +                          maddr_to_mfn(round_pgdown(fdt_paddr)),
> +                          round_pgup(MAX_FDT_SIZE) >> PAGE_SHIFT,
> +                          REGION_HYPERVISOR_BOOT) )
> +        panic("Unable to map the device-tree.\n");
> +
> +    /* VA == PA */
> +    fdt_virt = maddr_to_virt(fdt_paddr);
> +
> +    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
> +        return NULL;
> +
> +    size = fdt_totalsize(fdt_virt);
> +    if ( size > MAX_FDT_SIZE )
> +        return NULL;
> +
> +    return fdt_virt;
>   }
>   
> -void * __init early_fdt_map(paddr_t fdt_paddr)

Same as the previous patch: why do you implement early_fdt_map() at a 
different place in the file?

> +/* TODO: Implementation on the first usage */
> +void dump_hyp_walk(vaddr_t addr)
>   {
> -    return NULL;
>   }
>   
>   void __init remove_early_mappings(void)
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 79965a3c17..0565e22a1f 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -218,7 +218,10 @@ SECTIONS
>     _end = . ;
>   
>     /* Section for the device tree blob (if any). */
> -  .dtb : { *(.dtb) } :text
> +  .dtb : {
> +      . = ALIGN(PAGE_SIZE);
> +      *(.dtb)
> +  } :text
>   
>     DWARF2_DEBUG_SECTIONS
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 23/40] xen/mpu: initialize frametable in MPU system
  2023-01-13  5:28 ` [PATCH v2 23/40] xen/mpu: initialize frametable in MPU system Penny Zheng
@ 2023-02-05 22:07   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-05 22:07 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> Xen is using page as the smallest granularity for memory managment.
> And we want to follow the same concept in MPU system.
> That is, structure page_info and the frametable which is used for storing
> and managing page_info is also required in MPU system.
> 
> In MPU system, since there is no virtual address translation (VA == PA),
> we can not use a fixed VA address(FRAMETABLE_VIRT_START) to map frametable
> like MMU system does.
> Instead, we define a variable "struct page_info *frame_table" as frametable
> pointer, and ask boot allocator to allocate memory for frametable.
> 
> As frametable is successfully initialized, the convertion between machine frame
> number/machine address/"virtual address" and page-info structure is
> ready too, like mfn_to_page/maddr_to_page/virt_to_page, etc
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/mm.h     | 15 ---------------
>   xen/arch/arm/include/asm/mm_mmu.h | 16 ++++++++++++++++

You are already moving some bits related to the frametable in an earlier 
patch. So I am a bit surprised to see some changes in mm_mmu.h here.

I would also rather prefer if the code changes were separated from the 
addition of the MPU code. They could then be moved to the beginning of 
the series and hopefully be merged before the rest (reducing the size 
of this series).

>   xen/arch/arm/include/asm/mm_mpu.h | 17 +++++++++++++++++
>   xen/arch/arm/mm_mpu.c             | 25 +++++++++++++++++++++++++
>   4 files changed, 58 insertions(+), 15 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index e29158028a..7969ec9f98 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -176,7 +176,6 @@ struct page_info
>   
>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>   
> -#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>   /* PDX of the first page in the frame table. */
>   extern unsigned long frametable_base_pdx;
>   
> @@ -280,20 +279,6 @@ static inline uint64_t gvirt_to_maddr(vaddr_t va, paddr_t *pa,
>   #define virt_to_mfn(va)     __virt_to_mfn(va)
>   #define mfn_to_virt(mfn)    __mfn_to_virt(mfn)
>   
> -/* Convert between Xen-heap virtual addresses and page-info structures. */
> -static inline struct page_info *virt_to_page(const void *v)
> -{
> -    unsigned long va = (unsigned long)v;
> -    unsigned long pdx;
> -
> -    ASSERT(va >= XENHEAP_VIRT_START);
> -    ASSERT(va < directmap_virt_end);
> -
> -    pdx = (va - XENHEAP_VIRT_START) >> PAGE_SHIFT;
> -    pdx += mfn_to_pdx(directmap_mfn_start);
> -    return frame_table + pdx - frametable_base_pdx;
> -}
> -
>   static inline void *page_to_virt(const struct page_info *pg)
>   {
>       return mfn_to_virt(mfn_x(page_to_mfn(pg)));
> diff --git a/xen/arch/arm/include/asm/mm_mmu.h b/xen/arch/arm/include/asm/mm_mmu.h
> index 6d7e5ddde7..bc1b04c4c7 100644
> --- a/xen/arch/arm/include/asm/mm_mmu.h
> +++ b/xen/arch/arm/include/asm/mm_mmu.h
> @@ -23,6 +23,8 @@ extern uint64_t init_ttbr;
>   extern void setup_directmap_mappings(unsigned long base_mfn,
>                                        unsigned long nr_mfns);
>   
> +#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
> +
>   static inline paddr_t __virt_to_maddr(vaddr_t va)
>   {
>       uint64_t par = va_to_par(va);
> @@ -49,6 +51,20 @@ static inline void *maddr_to_virt(paddr_t ma)
>   }
>   #endif
>   
> +/* Convert between Xen-heap virtual addresses and page-info structures. */
> +static inline struct page_info *virt_to_page(const void *v)
> +{
> +    unsigned long va = (unsigned long)v;
> +    unsigned long pdx;
> +
> +    ASSERT(va >= XENHEAP_VIRT_START);
> +    ASSERT(va < directmap_virt_end);
> +
> +    pdx = (va - XENHEAP_VIRT_START) >> PAGE_SHIFT;
> +    pdx += mfn_to_pdx(directmap_mfn_start);
> +    return frame_table + pdx - frametable_base_pdx;
> +}
> +
>   #endif /* __ARCH_ARM_MM_MMU__ */
>   
>   /*
> diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
> index fe6a828a50..eebd5b5d35 100644
> --- a/xen/arch/arm/include/asm/mm_mpu.h
> +++ b/xen/arch/arm/include/asm/mm_mpu.h
> @@ -9,6 +9,8 @@
>    */
>   extern void setup_static_mappings(void);
>   
> +extern struct page_info *frame_table;
> +
>   static inline paddr_t __virt_to_maddr(vaddr_t va)
>   {
>       /* In MPU system, VA == PA. */
> @@ -22,6 +24,21 @@ static inline void *maddr_to_virt(paddr_t ma)
>       return (void *)ma;
>   }
>   
> +/* Convert between virtual address to page-info structure. */
> +static inline struct page_info *virt_to_page(const void *v)
> +{
> +    unsigned long va = (unsigned long)v;
> +    unsigned long pdx;
> +
> +    /*
> +     * In MPU system, VA == PA, virt_to_maddr() outputs the
> +     * exact input address.
> +     */
You are describing an implementation detail of virt_to_maddr() which 
doesn't matter here.

> +    pdx = mfn_to_pdx(maddr_to_mfn(virt_to_maddr(va)));

Why not use virt_to_mfn()?

Also, I would consider adding ASSERT(mfn_is_valid(...)) to confirm the 
MFN you pass is covered by the frametable. (This would be a sort of 
equivalent check to the MMU one.)
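I.e. something like (untested, assuming the typesafe virt_to_mfn() and 
mfn_valid() are usable at this point):

static inline struct page_info *virt_to_page(const void *v)
{
    mfn_t mfn = virt_to_mfn(v);

    /* The MFN is expected to be covered by the frametable. */
    ASSERT(mfn_valid(mfn));

    return frame_table + mfn_to_pdx(mfn) - frametable_base_pdx;
}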

> +
> +    return frame_table + pdx - frametable_base_pdx;
> +}
> +
>   #endif /* __ARCH_ARM_MM_MPU__ */
>   
>   /*
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index f057ee26df..7b282be4fb 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -69,6 +69,8 @@ static DEFINE_SPINLOCK(xen_mpumap_lock);
>   
>   static paddr_t dtb_paddr;
>   
> +struct page_info *frame_table;
> +
>   /* Write a MPU protection region */
>   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>       uint64_t _sel = sel;                                                \
> @@ -564,6 +566,29 @@ void __init setup_static_mappings(void)
>       /* TODO: guest memory section, device memory section, boot-module section, etc */
>   }
>   
> +/* Map a frame table to cover physical addresses ps through pe */
> +void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)

I have to admit, I am a bit puzzled why you added stubs in earlier 
patches for some functions but not for others. How did you decide 
which ones to stub?

> +{
> +    mfn_t base_mfn;
> +    unsigned long nr_pdxs = mfn_to_pdx(mfn_add(maddr_to_mfn(pe), -1)) -
> +                            mfn_to_pdx(maddr_to_mfn(ps)) + 1;
> +    unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
> +
> +    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
> +    frametable_size = ROUNDUP(frametable_size, PAGE_SIZE);

Maybe assert()/panic() the function is not called twice?
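E.g. at the top of the function (sketch):

    /* The frametable is only expected to be set up once. */
    if ( frame_table )
        panic("setup_frametable_mappings() called twice\n");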

> +    /*
> +     * Since VA == PA in MPU and we've already setup Xenheap mapping
> +     * in setup_staticheap_mappings(), we could easily deduce the
> +     * "virtual address" of frame table.
> +     */
> +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 1);
> +    frame_table = (struct page_info*)mfn_to_virt(base_mfn);

Coding style: "struct page_info *".

> +
> +    memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
> +    memset(&frame_table[nr_pdxs], -1,
> +           frametable_size - (nr_pdxs * sizeof(struct page_info)));
> +}
> +
>   /* TODO: Implementation on the first usage */
>   void dump_hyp_walk(vaddr_t addr)
>   {

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for AArch64v8R
  2023-01-13  5:29 ` [PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for AArch64v8R Penny Zheng
@ 2023-02-05 22:26   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-05 22:26 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:29, Penny Zheng wrote:
> As AArch64v8R only has one secure state, we have to use secure EL2 hypervisor
> timer for Xen in secure EL2.
> 
> In this patch, we introduce a Kconfig option ARM_SECURE_STATE.
> With this new Kconfig option, we can re-define the timer's
> system register name in different secure state, but keep the
> timer code flow unchanged.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/Kconfig                     |  7 +++++++
>   xen/arch/arm/include/asm/arm64/sysregs.h | 21 ++++++++++++++++++++-
>   xen/arch/arm/include/asm/cpregs.h        |  4 ++--
>   xen/arch/arm/time.c                      | 14 +++++++-------
>   4 files changed, 36 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 91491341c4..ee942a33bc 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -47,6 +47,13 @@ config ARM_EFI
>   	  be booted as an EFI application. This is only useful for
>   	  Xen that may run on systems that have UEFI firmware.
>   
> +config ARM_SECURE_STATE
> +	bool "Xen will run in Arm Secure State"
> +	depends on ARM_V8R
> +	help
> +	  In this state, a Processing Element (PE) can access the secure
> +	  physical address space, and the secure copy of banked registers.

Above, you suggest that Armv8-R will always use the secure EL2 timer. 
But here you allow the integrator to disable it. Why?

> +
>   config GICV3
>   	bool "GICv3 driver"
>   	depends on !NEW_VGIC
> diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
> index c46daf6f69..9546e8e3d0 100644
> --- a/xen/arch/arm/include/asm/arm64/sysregs.h
> +++ b/xen/arch/arm/include/asm/arm64/sysregs.h
> @@ -458,7 +458,6 @@
>   #define ZCR_ELx_LEN_SIZE             9
>   #define ZCR_ELx_LEN_MASK             0x1ff
>   
> -/* System registers for Armv8-R AArch64 */

Why is this removed?

>   #ifdef CONFIG_HAS_MPU
>   
>   /* EL2 MPU Protection Region Base Address Register encode */
> @@ -510,6 +509,26 @@
>   
>   #endif
>   
> +#ifdef CONFIG_ARM_SECURE_STATE
> +/*
> + * The Armv8-R AArch64 architecture always executes code in Secure
> + * state with EL2 as the highest Exception.
> + *
> + * Hypervisor timer registers for Secure EL2.
> + */
> +#define CNTHPS_TVAL_EL2  S3_4_C14_C5_0
> +#define CNTHPS_CTL_EL2   S3_4_C14_C5_1
> +#define CNTHPS_CVAL_EL2  S3_4_C14_C5_2
> +#define CNTHPx_TVAL_EL2  CNTHPS_TVAL_EL2
> +#define CNTHPx_CTL_EL2   CNTHPS_CTL_EL2
> +#define CNTHPx_CVAL_EL2  CNTHPS_CVAL_EL2
> +#else
> +/* Hypervisor timer registers for Non-Secure EL2. */
> +#define CNTHPx_TVAL_EL2  CNTHP_TVAL_EL2
> +#define CNTHPx_CTL_EL2   CNTHP_CTL_EL2
> +#define CNTHPx_CVAL_EL2  CNTHP_CVAL_EL2
> +#endif /* CONFIG_ARM_SECURE_STATE */

Given there is only one state, I would actually prefer if we alias 
CNTHP_*_EL2 to CNTHPS_*_EL2. So there is no renaming.
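I.e. something along these lines in sysregs.h (sketch, arm64 side only):

#ifdef CONFIG_ARM_SECURE_STATE
/*
 * On Armv8-R AArch64, Xen runs at Secure EL2, so the hypervisor timer is
 * accessed via CNTHPS_*_EL2. Alias the usual register names so the timer
 * code can stay unchanged.
 */
#define CNTHP_TVAL_EL2   CNTHPS_TVAL_EL2
#define CNTHP_CTL_EL2    CNTHPS_CTL_EL2
#define CNTHP_CVAL_EL2   CNTHPS_CVAL_EL2
#endif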

> +
>   /* Access to system registers */
>   
>   #define WRITE_SYSREG64(v, name) do {                    \
> diff --git a/xen/arch/arm/include/asm/cpregs.h b/xen/arch/arm/include/asm/cpregs.h
> index 6b083de204..a704677fbc 100644
> --- a/xen/arch/arm/include/asm/cpregs.h
> +++ b/xen/arch/arm/include/asm/cpregs.h
> @@ -374,8 +374,8 @@
>   #define CLIDR_EL1               CLIDR
>   #define CNTFRQ_EL0              CNTFRQ
>   #define CNTHCTL_EL2             CNTHCTL
> -#define CNTHP_CTL_EL2           CNTHP_CTL
> -#define CNTHP_CVAL_EL2          CNTHP_CVAL
> +#define CNTHPx_CTL_EL2          CNTHP_CTL
> +#define CNTHPx_CVAL_EL2         CNTHP_CVAL
>   #define CNTKCTL_EL1             CNTKCTL
>   #define CNTPCT_EL0              CNTPCT
>   #define CNTP_CTL_EL0            CNTP_CTL
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index 433d7be909..3bba733b83 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -196,13 +196,13 @@ int reprogram_timer(s_time_t timeout)
>   
>       if ( timeout == 0 )
>       {
> -        WRITE_SYSREG(0, CNTHP_CTL_EL2);
> +        WRITE_SYSREG(0, CNTHPx_CTL_EL2);
>           return 1;
>       }
>   
>       deadline = ns_to_ticks(timeout) + boot_count;
> -    WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
> -    WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
> +    WRITE_SYSREG64(deadline, CNTHPx_CVAL_EL2);
> +    WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHPx_CTL_EL2);
>       isb();
>   
>       /* No need to check for timers in the past; the Generic Timer fires
> @@ -213,7 +213,7 @@ int reprogram_timer(s_time_t timeout)
>   /* Handle the firing timer */
>   static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>   {
> -    if ( unlikely(!(READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING)) )
> +    if ( unlikely(!(READ_SYSREG(CNTHPx_CTL_EL2) & CNTx_CTL_PENDING)) )
>           return;
>   
>       perfc_incr(hyp_timer_irqs);
> @@ -222,7 +222,7 @@ static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>       raise_softirq(TIMER_SOFTIRQ);
>   
>       /* Disable the timer to avoid more interrupts */
> -    WRITE_SYSREG(0, CNTHP_CTL_EL2);
> +    WRITE_SYSREG(0, CNTHPx_CTL_EL2);
>   }
>   
>   static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
> @@ -281,7 +281,7 @@ void init_timer_interrupt(void)
>       /* Do not let the VMs program the physical timer, only read the physical counter */
>       WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
>       WRITE_SYSREG(0, CNTP_CTL_EL0);    /* Physical timer disabled */
> -    WRITE_SYSREG(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
> +    WRITE_SYSREG(0, CNTHPx_CTL_EL2);   /* Hypervisor's timer disabled */
>       isb();
>   
>       request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> @@ -301,7 +301,7 @@ void init_timer_interrupt(void)
>   static void deinit_timer_interrupt(void)
>   {
>       WRITE_SYSREG(0, CNTP_CTL_EL0);    /* Disable physical timer */
> -    WRITE_SYSREG(0, CNTHP_CTL_EL2);   /* Disable hypervisor's timer */
> +    WRITE_SYSREG(0, CNTHPx_CTL_EL2);   /* Disable hypervisor's timer */
>       isb();
>   
>       release_irq(timer_irq[TIMER_HYP_PPI], NULL);

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems
  2023-01-13  5:28 ` [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems Penny Zheng
  2023-02-05 21:52   ` Julien Grall
@ 2023-02-06 10:11   ` Julien Grall
  2023-02-07  6:30     ` Penny Zheng
  1 sibling, 1 reply; 122+ messages in thread
From: Julien Grall @ 2023-02-06 10:11 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

A few more remarks.

On 13/01/2023 05:28, Penny Zheng wrote:
> In MPU system, device tree binary can be packed with Xen
> image through CONFIG_DTB_FILE, or provided by bootloader through x0.
> 
> In MPU system, each section in xen.lds.S is PAGE_SIZE aligned.
> So in order to not overlap with the previous BSS section, dtb section
> should be made page-aligned too.
> We add . = ALIGN(PAGE_SIZE); in the head of dtb section to make it happen.
> 
> In this commit, we map early FDT with a transient MPU memory region at
> rear with REGION_HYPERVISOR_BOOT.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/arm64/mpu.h |  5 +++
>   xen/arch/arm/mm_mpu.c                | 63 +++++++++++++++++++++++++---
>   xen/arch/arm/xen.lds.S               |  5 ++-
>   3 files changed, 67 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> index fcde6ad0db..b85e420a90 100644
> --- a/xen/arch/arm/include/asm/arm64/mpu.h
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -45,18 +45,22 @@
>    * [3:4] Execute Never
>    * [5:6] Access Permission
>    * [7]   Region Present
> + * [8]   Boot-only Region
>    */
>   #define _REGION_AI_BIT            0
>   #define _REGION_XN_BIT            3
>   #define _REGION_AP_BIT            5
>   #define _REGION_PRESENT_BIT       7
> +#define _REGION_BOOTONLY_BIT      8
>   #define _REGION_XN                (2U << _REGION_XN_BIT)
>   #define _REGION_RO                (2U << _REGION_AP_BIT)
>   #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
> +#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
>   #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
>   #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
>   #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
>   #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
> +#define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
>   
>   /*
>    * _REGION_NORMAL is convenience define. It is not meant to be used
> @@ -68,6 +72,7 @@
>   #define REGION_HYPERVISOR_RO      (_REGION_NORMAL|_REGION_XN|_REGION_RO)
>   
>   #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
> +#define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
>   
>   #define INVALID_REGION            (~0UL)
>   
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index 08720a7c19..b34dbf4515 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -20,11 +20,16 @@
>    */
>   
>   #include <xen/init.h>
> +#include <xen/libfdt/libfdt.h>
>   #include <xen/mm.h>
>   #include <xen/page-size.h>
> +#include <xen/pfn.h>
> +#include <xen/sizes.h>
>   #include <xen/spinlock.h>
>   #include <asm/arm64/mpu.h>
> +#include <asm/early_printk.h>
>   #include <asm/page.h>
> +#include <asm/setup.h>
>   
>   #ifdef NDEBUG
>   static inline void
> @@ -62,6 +67,8 @@ uint64_t __ro_after_init max_xen_mpumap;
>   
>   static DEFINE_SPINLOCK(xen_mpumap_lock);
>   
> +static paddr_t dtb_paddr;
> +
>   /* Write a MPU protection region */
>   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>       uint64_t _sel = sel;                                                \
> @@ -403,7 +410,16 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
>   
>           /* During boot time, the default index is next_fixed_region_idx. */
>           if ( system_state <= SYS_STATE_active )
> -            idx = next_fixed_region_idx;
> +        {
> +            /*
> +             * If it is a boot-only region (i.e. region for early FDT),
> +             * it shall be added from the tail for late init re-organizing
> +             */
> +            if ( REGION_BOOTONLY_MASK(flags) )
> +                idx = next_transient_region_idx;
> +            else
> +                idx = next_fixed_region_idx;
> +        }
>   
>           xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1, REGION_AI_MASK(flags));
>           /* Set permission */
> @@ -465,14 +481,51 @@ int map_pages_to_xen(unsigned long virt,
>                                mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
>   }
>   
> -/* TODO: Implementation on the first usage */
> -void dump_hyp_walk(vaddr_t addr)
> +void * __init early_fdt_map(paddr_t fdt_paddr)
>   {
> +    void *fdt_virt;
> +    uint32_t size;
> +
> +    /*
> +     * Check whether the physical FDT address is set and meets the minimum
> +     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
> +     * least 8 bytes so that we always access the magic and size fields
> +     * of the FDT header after mapping the first chunk, double check if
> +     * that is indeed the case.
> +     */
> +     BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
> +     if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
> +         return NULL;
> +
> +    dtb_paddr = fdt_paddr;
> +    /*
> +     * In MPU system, device tree binary can be packed with Xen image
> +     * through CONFIG_DTB_FILE, or provided by bootloader through x0.

The behavior you describe is not specific to the MPU system. I also 
don't quite understand how describing the method to pass the DT actually 
matters here.

> +     * Map FDT with a transient MPU memory region of MAX_FDT_SIZE.
> +     * After that, we can do some magic check.
> +     */
> +    if ( map_pages_to_xen(round_pgdown(fdt_paddr),

I haven't looked at the rest of the series. But from here, it seems a 
bit strange to use map_pages_to_xen() because the virt and the phys 
should be the same.

Do you plan to share some code where map_pages_to_xen() will be used?

> +                          maddr_to_mfn(round_pgdown(fdt_paddr)),
> +                          round_pgup(MAX_FDT_SIZE) >> PAGE_SHIFT,

This will not work properly if the Device-Tree is MAX_FDT_SIZE (which could
already be page-aligned) but the start address is not page-aligned.

But I think trying to map the maximum size from the start could
potentially result in some issues. Below is an excerpt from the Image
documentation:

"The device tree blob (dtb) must be placed on an 8-byte boundary and 
must not exceed 2 megabytes in size. Since the dtb will be mapped 
cacheable using blocks of up to 2 megabytes in size, it must not be 
placed within any 2M region which must be mapped with any specific 
attributes."

So it would be better to map the first 2MB. Check the size and then 
re-map with an extra 2MB if needed.
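
Roughly (untested sketch, re-using the helpers already present in this patch;
the exact rounding you want for the MPU granularity is up to you):

    paddr_t base = round_pgdown(fdt_paddr);

    /* Map the first 2MB only. */
    if ( map_pages_to_xen(base, maddr_to_mfn(base), SZ_2M >> PAGE_SHIFT,
                          REGION_HYPERVISOR_BOOT) )
        panic("Unable to map the device-tree.\n");

    fdt_virt = maddr_to_virt(fdt_paddr);
    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
        return NULL;

    size = fdt_totalsize(fdt_virt);
    if ( size > MAX_FDT_SIZE )
        return NULL;

    /* Re-map with an extra 2MB if the DT crosses the first chunk. */
    if ( (fdt_paddr - base) + size > SZ_2M )
    {
        if ( map_pages_to_xen(base + SZ_2M, maddr_to_mfn(base + SZ_2M),
                              SZ_2M >> PAGE_SHIFT, REGION_HYPERVISOR_BOOT) )
            panic("Unable to map the device-tree.\n");
    }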

> +                          REGION_HYPERVISOR_BOOT) )
> +        panic("Unable to map the device-tree.\n");
> +
> +    /* VA == PA */

I have seen a few places where you add a similar comment. But I am
not sure I understand how this helps to describe the implementation of
maddr_to_virt().
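
If the MPU flavour of maddr_to_virt() really is a plain identity conversion,
one comment on the helper itself would cover all the call sites, e.g. (sketch,
assuming that is indeed how it is implemented):

    /* On MPU systems Xen runs with an identity map, so VA == PA. */
    static inline void *maddr_to_virt(paddr_t ma)
    {
        return (void *)(uintptr_t)ma;
    }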

> +    fdt_virt = maddr_to_virt(fdt_paddr);
> +
> +    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
> +        return NULL;
> +
> +    size = fdt_totalsize(fdt_virt);
> +    if ( size > MAX_FDT_SIZE )
> +        return NULL;
> +
> +    return fdt_virt;
>   }
>   
> -void * __init early_fdt_map(paddr_t fdt_paddr)
> +/* TODO: Implementation on the first usage */
> +void dump_hyp_walk(vaddr_t addr)
>   {
> -    return NULL;
>   }
>   
>   void __init remove_early_mappings(void)
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 79965a3c17..0565e22a1f 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -218,7 +218,10 @@ SECTIONS
>     _end = . ;
>   
>     /* Section for the device tree blob (if any). */
> -  .dtb : { *(.dtb) } :text
> +  .dtb : {
> +      . = ALIGN(PAGE_SIZE);
> +      *(.dtb)
> +  } :text
>   
>     DWARF2_DEBUG_SECTIONS
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
  2023-02-05 21:30   ` Julien Grall
@ 2023-02-07  3:59     ` Penny Zheng
  2023-02-07  8:41       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-02-07  3:59 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Monday, February 6, 2023 5:30 AM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 15/40] xen/arm: move MMU-specific memory
> management code to mm_mmu.c/mm_mmu.h
> 
> Hi,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > From: Wei Chen <wei.chen@arm.com>
> >
> > To make the code readable and maintainable, we move MMU-specific
> > memory management code from mm.c to mm_mmu.c and move MMU-
> specific
> > definitions from mm.h to mm_mmu.h.
> > Later we will create mm_mpu.h and mm_mpu.c for MPU-specific memory
> > management code.
> 
> This sentence implies there is no mm_mpu.{c, h} yet and this is not touched
> within this patch. However...
> 
> 
> > This will avoid lots of #ifdef in memory management code and header files.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > ---
> >   xen/arch/arm/Makefile             |    5 +
> >   xen/arch/arm/include/asm/mm.h     |   19 +-
> >   xen/arch/arm/include/asm/mm_mmu.h |   35 +
> >   xen/arch/arm/mm.c                 | 1352 +---------------------------
> >   xen/arch/arm/mm_mmu.c             | 1376
> +++++++++++++++++++++++++++++
> >   xen/arch/arm/mm_mpu.c             |   67 ++
> 
> ... It looks like they already exists and you are modifying them. That
> said, it would be better if this patch only contains code movement (IOW
> no MPU changes).
> 
> >   6 files changed, 1488 insertions(+), 1366 deletions(-)
> >   create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
> >   create mode 100644 xen/arch/arm/mm_mmu.c
> 
> I don't particular like the naming. I think it would make more sense to
> introduce two directories: "mmu" and "mpu" which includes code specific
> to each flavor of Xen.
> 
[...]
> >
> > -
> > -/* Release all __init and __initdata ranges to be reused */
> > -void free_init_memory(void)
> 
> This function doesn't look specific to the MMU.
> 

Functions like early_fdt_map[1], setup_frametable_mappings[2] and free_init_memory[3]
share mostly the same logic between the MMU and the MPU system; the only difference is
the address translation regime. Still, in order to avoid putting too many #ifdef here
and there, I implemented separate MMU and MPU versions of them.

Alternatively, I could keep them in the generic file here, and then, in the future
commits that implement the MPU versions (related commits listed below), move the
MMU implementations to the MMU file.

Wdyt?

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-01/msg00774.html 
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-01/msg00787.html 	
[3] https://lists.xenproject.org/archives/html/xen-devel/2023-01/msg00786.html 

> > -{
> > -    paddr_t pa = virt_to_maddr(__init_begin);
> > -    unsigned long len = __init_end - __init_begin;
> > -    uint32_t insn;
> > -    unsigned int i, nr = len / sizeof(insn);
> > -    uint32_t *p;
> > -    int rc;
> > -
> > -    rc = modify_xen_mappings((unsigned long)__init_begin,
> > -                             (unsigned long)__init_end, PAGE_HYPERVISOR_RW);
> > -    if ( rc )
> > -        panic("Unable to map RW the init section (rc = %d)\n", rc);
> > -
> > -    /*
> > -     * From now on, init will not be used for execution anymore,
> > -     * so nuke the instruction cache to remove entries related to init.
> > -     */
> > -    invalidate_icache_local();
> > -
> > -#ifdef CONFIG_ARM_32
> > -    /* udf instruction i.e (see A8.8.247 in ARM DDI 0406C.c) */
> > -    insn = 0xe7f000f0;
> > -#else
> > -    insn = AARCH64_BREAK_FAULT;
> > -#endif
> > -    p = (uint32_t *)__init_begin;
> > -    for ( i = 0; i < nr; i++ )
> > -        *(p + i) = insn;
> > -
> > -    rc = destroy_xen_mappings((unsigned long)__init_begin,
> > -                              (unsigned long)__init_end);
> > -    if ( rc )
> > -        panic("Unable to remove the init section (rc = %d)\n", rc);
> > -
> > -    init_domheap_pages(pa, pa + len);
> > -    printk("Freed %ldkB init memory.\n", (long)(__init_end-
> __init_begin)>>10);
> > -}
> > -
> 
> 
> [...]
> 
> > diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> > index 43e9a1be4d..87a12042cc 100644
> > --- a/xen/arch/arm/mm_mpu.c
> > +++ b/xen/arch/arm/mm_mpu.c
> > @@ -20,8 +20,10 @@
> >    */
> >
> >   #include <xen/init.h>
> > +#include <xen/mm.h>
> >   #include <xen/page-size.h>
> >   #include <asm/arm64/mpu.h>
> > +#include <asm/page.h>
> 
> Regardless of what I wrote above, none of the code you add seems to
> require <asm/page.h>
> 
> >
> >   /* Xen MPU memory region mapping table. */
> >   pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
> > @@ -38,6 +40,71 @@ uint64_t __ro_after_init next_transient_region_idx;
> >   /* Maximum number of supported MPU memory regions by the EL2 MPU.
> */
> >   uint64_t __ro_after_init max_xen_mpumap;
> >
> > +/* TODO: Implementation on the first usage */
> 
> It is not clear what you mean given there are some callers.
> 
> > +void dump_hyp_walk(vaddr_t addr)
> > +{
> 
> Please add ASSERT_UNREACHABLE() for any dummy helper you have
> introduced and plan to implement later. This will be helpful to track
> down any function you haven't implemented.
> 
> 
> > +}
> > +
> > +void * __init early_fdt_map(paddr_t fdt_paddr)
> > +{
> > +    return NULL;
> > +}
> > +
> > +void __init remove_early_mappings(void)
> > +{
> 
> Ditto
> 
> > +}
> > +
> > +int init_secondary_pagetables(int cpu)
> > +{
> > +    return -ENOSYS;
> > +}
> > +
> > +void mmu_init_secondary_cpu(void)
> > +{
> 
> Ditto. The name of the function is also a bit odd given this is an MPU
> specific file. This likely want to be renamed to mm_init_secondary_cpu().
> 
> > +}
> > +
> > +void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
> > +{
> > +    return NULL;
> > +}
> > +
> > +void *ioremap(paddr_t pa, size_t len)
> > +{
> > +    return NULL;
> > +}
> > +
> > +int map_pages_to_xen(unsigned long virt,
> > +                     mfn_t mfn,
> > +                     unsigned long nr_mfns,
> > +                     unsigned int flags)
> > +{
> > +    return -ENOSYS;
> > +}
> > +
> > +int destroy_xen_mappings(unsigned long s, unsigned long e)
> > +{
> > +    return -ENOSYS;
> > +}
> > +
> > +int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int
> flags)
> > +{
> > +    return -ENOSYS;
> > +}
> > +
> > +void free_init_memory(void)
> > +{
> 
> Ditto.
> 
> > +}
> > +
> > +int xenmem_add_to_physmap_one(
> > +    struct domain *d,
> > +    unsigned int space,
> > +    union add_to_physmap_extra extra,
> > +    unsigned long idx,
> > +    gfn_t gfn)
> > +{
> > +    return -ENOSYS;
> > +}
> > +
> >   /*
> >    * Local variables:
> >    * mode: C
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings
  2023-02-05 21:32   ` Julien Grall
@ 2023-02-07  4:40     ` Penny Zheng
  0 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-02-07  4:40 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Monday, February 6, 2023 5:32 AM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings
> 
> Hi,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > Function setup_pagetables is responsible for boot-time pagetable setup
> > in MMU system.
> > But in MPU system, we have already built up start-of-day Xen MPU
> > memory region mapping at the very beginning in assembly.
> >
> > So in order to keep only one codeflow in arm/setup.c,
> > setup_mm_mappings , with a more generic name, is introduced and act as
> > an empty stub in MPU system.
> 
> is the empty stub temporarily?
> 
> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/include/asm/mm.h     |  2 ++
> >   xen/arch/arm/include/asm/mm_mpu.h | 16 ++++++++++++++++
> >   xen/arch/arm/setup.c              |  2 +-
> >   3 files changed, 19 insertions(+), 1 deletion(-)
> >   create mode 100644 xen/arch/arm/include/asm/mm_mpu.h
> >
> > diff --git a/xen/arch/arm/include/asm/mm.h
> > b/xen/arch/arm/include/asm/mm.h index 1b9fdb6ff5..9b4c07d965 100644
> > --- a/xen/arch/arm/include/asm/mm.h
> > +++ b/xen/arch/arm/include/asm/mm.h
> > @@ -243,6 +243,8 @@ static inline void __iomem *ioremap_wc(paddr_t
> > start, size_t len)
> >
> >   #ifndef CONFIG_HAS_MPU
> >   #include <asm/mm_mmu.h>
> > +#else
> > +#include <asm/mm_mpu.h>
> >   #endif
> >
> >   /* Page-align address and convert to frame number format */ diff
> > --git a/xen/arch/arm/include/asm/mm_mpu.h
> > b/xen/arch/arm/include/asm/mm_mpu.h
> > new file mode 100644
> > index 0000000000..1f3cff7743
> > --- /dev/null
> > +++ b/xen/arch/arm/include/asm/mm_mpu.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */ #ifndef
> > +__ARCH_ARM_MM_MPU__ #define __ARCH_ARM_MM_MPU__
> > +
> > +#define setup_mm_mappings(boot_phys_offset)
> > +((void)(boot_phys_offset))
> > +
> > +#endif /* __ARCH_ARM_MM_MPU__ */
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c index
> > 1f26f67b90..d7d200179c 100644
> > --- a/xen/arch/arm/setup.c
> > +++ b/xen/arch/arm/setup.c
> > @@ -1003,7 +1003,7 @@ void __init start_xen(unsigned long
> boot_phys_offset,
> >       /* Initialize traps early allow us to get backtrace when an error occurred
> */
> >       init_traps();
> >
> > -    setup_pagetables(boot_phys_offset);
> > +    setup_mm_mappings(boot_phys_offset);
> 
> You are renaming the caller but not the function. Why?
> 

It is a reorg mistake. The MMU-related implementation was mistakenly
put in the previous commit "[PATCH v2 15/40] xen/arm: move MMU-specific
memory management code to mm_mmu.c/mm_mmu.h" (https://lists.xenproject.org/archives/html/xen-devel/2023-01/msg00776.html).
Sorry for that.
I'll extract the relevant code from the previous commit:
'''
+/* Boot-time pagetable setup */
+#define setup_mm_mappings(boot_phys_offset) setup_pagetables(boot_phys_offset)
'''

> >
> >       smp_clear_cpu_maps();
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table
  2023-02-05 21:45   ` Julien Grall
@ 2023-02-07  5:07     ` Penny Zheng
  0 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-02-07  5:07 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Monday, February 6, 2023 5:46 AM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU
> mapping table
> 
> Hi,
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > The new helper xen_mpumap_update() is responsible for updating an
> > entry in Xen MPU memory mapping table, including creating a new entry,
> > updating or destroying an existing one.
> >
> > This commit only talks about populating a new entry in Xen MPU mapping
> > table( xen_mpumap). Others will be introduced in the following commits.
> >
> > In xen_mpumap_update_entry(), firstly, we shall check if requested
> > address range [base, limit) is not mapped. Then we use pr_of_xenaddr()
> > to build up the structure of MPU memory region(pr_t).
> > In the last, we set memory attribute and permission based on variable
> @flags.
> >
> > To summarize all region attributes in one variable @flags, layout of
> > the flags is elaborated as follows:
> > [0:2] Memory attribute Index
> > [3:4] Execute Never
> > [5:6] Access Permission
> > [7]   Region Present
> > Also, we provide a set of definitions(REGION_HYPERVISOR_RW, etc) that
> > combine the memory attribute and permission for common combinations.
> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/include/asm/arm64/mpu.h |  72 +++++++
> >   xen/arch/arm/mm_mpu.c                | 276 ++++++++++++++++++++++++++-
> >   2 files changed, 340 insertions(+), 8 deletions(-)
> >
[...]
> > +
> > +#define MPUMAP_REGION_FAILED    0
> > +#define MPUMAP_REGION_FOUND     1
> > +#define MPUMAP_REGION_INCLUSIVE 2
> > +#define MPUMAP_REGION_OVERLAP   3
> > +
> > +/*
> > + * Check whether memory range [base, limit] is mapped in MPU memory
> > + * region table \mpu. Only address range is considered, memory
> > +attributes
> > + * and permission are not considered here.
> > + * If we find the match, the associated index will be filled up.
> > + * If the entry is not present, INVALID_REGION will be set in \index
> > + *
> > + * Make sure that parameter \base and \limit are both referring
> > + * inclusive addresss
> > + *
> > + * Return values:
> > + *  MPUMAP_REGION_FAILED: no mapping and no overmapping
> > + *  MPUMAP_REGION_FOUND: find an exact match in address
> > + *  MPUMAP_REGION_INCLUSIVE: find an inclusive match in address
> > + *  MPUMAP_REGION_OVERLAP: overlap with the existing mapping  */
> > +static int mpumap_contain_region(pr_t *mpu, uint64_t nr_regions,
> > +                                 paddr_t base, paddr_t limit,
> > +uint64_t *index)
> 
> Is it really possible to support 2^64 - 1 region? If so, is that the case on arm32
> as well?
> 

No, the allowable bitwidth is 8 bits. I'll change it to uint8_t instead.
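
i.e. something like (just the prototype, as a sketch; INVALID_REGION may need
adjusting if the index type is shrunk as well):

    static int mpumap_contain_region(pr_t *mpu, uint8_t nr_regions,
                                     paddr_t base, paddr_t limit,
                                     uint64_t *index);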

> > +{
> > +    uint64_t i = 0;
> > +    uint64_t _index = INVALID_REGION;
> > +
> > +    /* Allow index to be NULL */
> > +    index = index ?: &_index;
> > +
> > +    for ( ; i < nr_regions; i++ )
> > +    {
> > +        paddr_t iter_base = pr_get_base(&mpu[i]);
> > +        paddr_t iter_limit = pr_get_limit(&mpu[i]);
> > +
> > +        /* Found an exact valid match */
> > +        if ( (iter_base == base) && (iter_limit == limit) &&
> > +             region_is_valid(&mpu[i]) )
> > +        {
> > +            *index = i;
> > +            return MPUMAP_REGION_FOUND;
> > +        }
> > +
> > +        /* No overlapping */
> > +        if ( (iter_limit < base) || (iter_base > limit) )
> > +            continue;
> > +        /* Inclusive and valid */
> > +        else if ( (base >= iter_base) && (limit <= iter_limit) &&
> > +                  region_is_valid(&mpu[i]) )
> > +        {
> > +            *index = i;
> > +            return MPUMAP_REGION_INCLUSIVE;
> > +        }
> > +        else
> > +        {
> > +            region_printk("Range 0x%"PRIpaddr" - 0x%"PRIpaddr" overlaps
> with the existing region 0x%"PRIpaddr" - 0x%"PRIpaddr"\n",
> > +                          base, limit, iter_base, iter_limit);
> > +            return MPUMAP_REGION_OVERLAP;
> > +        }
> > +    }
> > +
> > +    return MPUMAP_REGION_FAILED;
> > +}
> > +
> > +/*
> > + * Update an entry at the index @idx.
> > + * @base:  base address
> > + * @limit: limit address(exclusive)
> > + * @flags: region attributes, should be the combination of
> > +REGION_HYPERVISOR_xx  */ static int
> xen_mpumap_update_entry(paddr_t
> > +base, paddr_t limit,
> > +                                   unsigned int flags) {
> > +    uint64_t idx;
> > +    int rc;
> > +
> > +    rc = mpumap_contain_region(xen_mpumap, max_xen_mpumap, base,
> limit - 1,
> > +                               &idx);
> > +    if ( rc == MPUMAP_REGION_OVERLAP )
> > +        return -EINVAL;
> > +
> > +    /* We are inserting a mapping => Create new region. */
> > +    if ( flags & _REGION_PRESENT )
> > +    {
> > +        if ( rc != MPUMAP_REGION_FAILED )
> > +            return -EINVAL;
> > +
> > +        if ( xen_boot_mpu_regions_is_full() )
> > +        {
> > +            region_printk("There is no room left in EL2 MPU memory region
> mapping\n");
> > +            return -ENOMEM;
> > +        }
> > +
> > +        /* During boot time, the default index is next_fixed_region_idx. */
> > +        if ( system_state <= SYS_STATE_active )
> > +            idx = next_fixed_region_idx;
> > +
> > +        xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1,
> REGION_AI_MASK(flags));
> > +        /* Set permission */
> > +        xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
> > +        xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
> > +
> > +        /* Update and enable the region */
> > +        access_protection_region(false, NULL, (const
> pr_t*)(&xen_mpumap[idx]),
> > +                                 idx);
> > +
> > +        if ( system_state <= SYS_STATE_active )
> > +            update_boot_xen_mpumap_idx(idx);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int xen_mpumap_update(paddr_t base, paddr_t limit, unsigned
> > +int flags) {
> > +    int rc;
> > +
> > +    /*
> > +     * The hardware was configured to forbid mapping both writeable and
> > +     * executable.
> > +     * When modifying/creating mapping (i.e _REGION_PRESENT is set),
> > +     * prevent any update if this happen.
> > +     */
> > +    if ( (flags & _REGION_PRESENT) && !REGION_RO_MASK(flags) &&
> > +         !REGION_XN_MASK(flags) )
> > +    {
> > +        region_printk("Mappings should not be both Writeable and
> Executable.\n");
> > +        return -EINVAL;
> > +    }
> > +
> > +    if ( !IS_ALIGNED(base, PAGE_SIZE) || !IS_ALIGNED(limit, PAGE_SIZE) )
> > +    {
> > +        region_printk("base address 0x%"PRIpaddr", or limit address
> 0x%"PRIpaddr" is not page aligned.\n",
> > +                      base, limit);
> > +        return -EINVAL;
> > +    }
> > +
> > +    spin_lock(&xen_mpumap_lock);
> > +
> > +    rc = xen_mpumap_update_entry(base, limit, flags);
> > +
> > +    spin_unlock(&xen_mpumap_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +int map_pages_to_xen(unsigned long virt,
> > +                     mfn_t mfn,
> > +                     unsigned long nr_mfns,
> > +                     unsigned int flags) {
> > +    ASSERT(virt == mfn_to_maddr(mfn));
> > +
> > +    return xen_mpumap_update(mfn_to_maddr(mfn),
> > +                             mfn_to_maddr(mfn_add(mfn, nr_mfns)),
> > +flags); }
> > +
> >   /* TODO: Implementation on the first usage */
> >   void dump_hyp_walk(vaddr_t addr)
> >   {
> > @@ -230,14 +498,6 @@ void *ioremap(paddr_t pa, size_t len)
> >       return NULL;
> >   }
> >
> > -int map_pages_to_xen(unsigned long virt,
> > -                     mfn_t mfn,
> > -                     unsigned long nr_mfns,
> > -                     unsigned int flags)
> > -{
> > -    return -ENOSYS;
> > -}
> > -
> 
> Why do you implement map_pages_to_xen() at a different place?
> 
> 
> >   int destroy_xen_mappings(unsigned long s, unsigned long e)
> >   {
> >       return -ENOSYS;
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems
  2023-02-06 10:11   ` Julien Grall
@ 2023-02-07  6:30     ` Penny Zheng
  2023-02-07  8:47       ` Julien Grall
  0 siblings, 1 reply; 122+ messages in thread
From: Penny Zheng @ 2023-02-07  6:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Monday, February 6, 2023 6:11 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: Re: [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU
> systems
> 
> Hi,
> 
> A few more remarks.
> 
> On 13/01/2023 05:28, Penny Zheng wrote:
> > In MPU system, device tree binary can be packed with Xen image through
> > CONFIG_DTB_FILE, or provided by bootloader through x0.
> >
> > In MPU system, each section in xen.lds.S is PAGE_SIZE aligned.
> > So in order to not overlap with the previous BSS section, dtb section
> > should be made page-aligned too.
> > We add . = ALIGN(PAGE_SIZE); in the head of dtb section to make it happen.
> >
> > In this commit, we map early FDT with a transient MPU memory region at
> > rear with REGION_HYPERVISOR_BOOT.
> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/include/asm/arm64/mpu.h |  5 +++
> >   xen/arch/arm/mm_mpu.c                | 63 +++++++++++++++++++++++++---
> >   xen/arch/arm/xen.lds.S               |  5 ++-
> >   3 files changed, 67 insertions(+), 6 deletions(-)
> >
> > diff --git a/xen/arch/arm/include/asm/arm64/mpu.h
> > b/xen/arch/arm/include/asm/arm64/mpu.h
> > index fcde6ad0db..b85e420a90 100644
> > --- a/xen/arch/arm/include/asm/arm64/mpu.h
> > +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> > @@ -45,18 +45,22 @@
> >    * [3:4] Execute Never
> >    * [5:6] Access Permission
> >    * [7]   Region Present
> > + * [8]   Boot-only Region
> >    */
> >   #define _REGION_AI_BIT            0
> >   #define _REGION_XN_BIT            3
> >   #define _REGION_AP_BIT            5
> >   #define _REGION_PRESENT_BIT       7
> > +#define _REGION_BOOTONLY_BIT      8
> >   #define _REGION_XN                (2U << _REGION_XN_BIT)
> >   #define _REGION_RO                (2U << _REGION_AP_BIT)
> >   #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
> > +#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
> >   #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
> >   #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
> >   #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
> >   #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
> > +#define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT)
> & 0x1U)
> >
> >   /*
> >    * _REGION_NORMAL is convenience define. It is not meant to be used
> > @@ -68,6 +72,7 @@
> >   #define REGION_HYPERVISOR_RO
> (_REGION_NORMAL|_REGION_XN|_REGION_RO)
> >
> >   #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
> > +#define REGION_HYPERVISOR_BOOT
> (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
> >
> >   #define INVALID_REGION            (~0UL)
> >
> > diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c index
> > 08720a7c19..b34dbf4515 100644
> > --- a/xen/arch/arm/mm_mpu.c
> > +++ b/xen/arch/arm/mm_mpu.c
> > @@ -20,11 +20,16 @@
> >    */
> >
> >   #include <xen/init.h>
> > +#include <xen/libfdt/libfdt.h>
> >   #include <xen/mm.h>
> >   #include <xen/page-size.h>
> > +#include <xen/pfn.h>
> > +#include <xen/sizes.h>
> >   #include <xen/spinlock.h>
> >   #include <asm/arm64/mpu.h>
> > +#include <asm/early_printk.h>
> >   #include <asm/page.h>
> > +#include <asm/setup.h>
> >
> >   #ifdef NDEBUG
> >   static inline void
> > @@ -62,6 +67,8 @@ uint64_t __ro_after_init max_xen_mpumap;
> >
> >   static DEFINE_SPINLOCK(xen_mpumap_lock);
> >
> > +static paddr_t dtb_paddr;
> > +
> >   /* Write a MPU protection region */
> >   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
> >       uint64_t _sel = sel;                                                \
> > @@ -403,7 +410,16 @@ static int xen_mpumap_update_entry(paddr_t
> base,
> > paddr_t limit,
> >
> >           /* During boot time, the default index is next_fixed_region_idx. */
> >           if ( system_state <= SYS_STATE_active )
> > -            idx = next_fixed_region_idx;
> > +        {
> > +            /*
> > +             * If it is a boot-only region (i.e. region for early FDT),
> > +             * it shall be added from the tail for late init re-organizing
> > +             */
> > +            if ( REGION_BOOTONLY_MASK(flags) )
> > +                idx = next_transient_region_idx;
> > +            else
> > +                idx = next_fixed_region_idx;
> > +        }
> >
> >           xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1,
> REGION_AI_MASK(flags));
> >           /* Set permission */
> > @@ -465,14 +481,51 @@ int map_pages_to_xen(unsigned long virt,
> >                                mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
> >   }
> >
> > -/* TODO: Implementation on the first usage */ -void
> > dump_hyp_walk(vaddr_t addr)
> > +void * __init early_fdt_map(paddr_t fdt_paddr)
> >   {
> > +    void *fdt_virt;
> > +    uint32_t size;
> > +
> > +    /*
> > +     * Check whether the physical FDT address is set and meets the
> minimum
> > +     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to
> be at
> > +     * least 8 bytes so that we always access the magic and size fields
> > +     * of the FDT header after mapping the first chunk, double check if
> > +     * that is indeed the case.
> > +     */
> > +     BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
> > +     if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
> > +         return NULL;
> > +
> > +    dtb_paddr = fdt_paddr;
> > +    /*
> > +     * In MPU system, device tree binary can be packed with Xen image
> > +     * through CONFIG_DTB_FILE, or provided by bootloader through x0.
> 
> The behavior you describe is not specific to the MPU system. I also don't
> quite understand how describing the method to pass the DT actually matters
> here.
> 
> > +     * Map FDT with a transient MPU memory region of MAX_FDT_SIZE.
> > +     * After that, we can do some magic check.
> > +     */
> > +    if ( map_pages_to_xen(round_pgdown(fdt_paddr),
> 
> I haven't looked at the rest of the series. But from here, it seems a bit strange
> to use map_pages_to_xen() because the virt and the phys should be the
> same.
> 

Hmm, I thought map_pages_to_xen is meant to set up a memory mapping for access.
In the MPU system, we also need to set up an MPU memory region for the FDT, even
without a virt-to-phys conversion.

> Do you plan to share some code where map_pages_to_xen() will be used?
> 

Each time we build up a new MPU memory region for the stage 1 EL2 memory
mapping during C boot-time, we use map_pages_to_xen to do it.
I think it has the same effect as it has with the MMU, except that the MMU
sets up a virt-to-phys memory mapping while the MPU always sets up an
identity mapping.

> > +                          maddr_to_mfn(round_pgdown(fdt_paddr)),
> > +                          round_pgup(MAX_FDT_SIZE) >> PAGE_SHIFT,
> 
> This will not work properly if the Device-Tree is MAX_FDT_SIZE (which could
> already be page-aligned) but the start address is not page-aligned.
> 
> But I think trying to map the maximum size from the start could potentially
> result in some issues. Below is an excerpt from the Image
> documentation:
> 
> "The device tree blob (dtb) must be placed on an 8-byte boundary and must
> not exceed 2 megabytes in size. Since the dtb will be mapped cacheable using
> blocks of up to 2 megabytes in size, it must not be placed within any 2M
> region which must be mapped with any specific attributes."
> 
> So it would be better to map the first 2MB. Check the size and then re-map
> with an extra 2MB if needed.
> 

Oh, under certain circumstances the current implementation will map more than 2MB.
Thanks for the explanation!
I will map it as you suggested.

> > +                          REGION_HYPERVISOR_BOOT) )
> > +        panic("Unable to map the device-tree.\n");
> > +
> > +    /* VA == PA */
> 
> I have seen a few places where you add a similar comment. But I am not
> sure I understand how this helps to describe the implementation of
> maddr_to_virt().
> 
> > +    fdt_virt = maddr_to_virt(fdt_paddr);
> > +
> > +    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
> > +        return NULL;
> > +
> > +    size = fdt_totalsize(fdt_virt);
> > +    if ( size > MAX_FDT_SIZE )
> > +        return NULL;
> > +
> > +    return fdt_virt;
> >   }
> >
> > -void * __init early_fdt_map(paddr_t fdt_paddr)
> > +/* TODO: Implementation on the first usage */ void
> > +dump_hyp_walk(vaddr_t addr)
> >   {
> > -    return NULL;
> >   }
> >
> >   void __init remove_early_mappings(void) diff --git
> > a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index
> > 79965a3c17..0565e22a1f 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/xen/arch/arm/xen.lds.S
> > @@ -218,7 +218,10 @@ SECTIONS
> >     _end = . ;
> >
> >     /* Section for the device tree blob (if any). */
> > -  .dtb : { *(.dtb) } :text
> > +  .dtb : {
> > +      . = ALIGN(PAGE_SIZE);
> > +      *(.dtb)
> > +  } :text
> >
> >     DWARF2_DEBUG_SECTIONS
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
  2023-02-07  3:59     ` Penny Zheng
@ 2023-02-07  8:41       ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-07  8:41 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 07/02/2023 03:59, Penny Zheng wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Monday, February 6, 2023 5:30 AM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 15/40] xen/arm: move MMU-specific memory
>> management code to mm_mmu.c/mm_mmu.h
>>
>> Hi,
>>
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> From: Wei Chen <wei.chen@arm.com>
>>>
>>> To make the code readable and maintainable, we move MMU-specific
>>> memory management code from mm.c to mm_mmu.c and move MMU-
>> specific
>>> definitions from mm.h to mm_mmu.h.
>>> Later we will create mm_mpu.h and mm_mpu.c for MPU-specific memory
>>> management code.
>>
>> This sentence implies there is no mm_mpu.{c, h} yet and this is not touched
>> within this patch. However...
>>
>>
>>> This will avoid lots of #ifdef in memory management code and header files.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>> ---
>>>    xen/arch/arm/Makefile             |    5 +
>>>    xen/arch/arm/include/asm/mm.h     |   19 +-
>>>    xen/arch/arm/include/asm/mm_mmu.h |   35 +
>>>    xen/arch/arm/mm.c                 | 1352 +---------------------------
>>>    xen/arch/arm/mm_mmu.c             | 1376
>> +++++++++++++++++++++++++++++
>>>    xen/arch/arm/mm_mpu.c             |   67 ++
>>
>> ... It looks like they already exists and you are modifying them. That
>> said, it would be better if this patch only contains code movement (IOW
>> no MPU changes).
>>
>>>    6 files changed, 1488 insertions(+), 1366 deletions(-)
>>>    create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
>>>    create mode 100644 xen/arch/arm/mm_mmu.c
>>
>> I don't particular like the naming. I think it would make more sense to
>> introduce two directories: "mmu" and "mpu" which includes code specific
>> to each flavor of Xen.
>>
> [...]
>>>
>>> -
>>> -/* Release all __init and __initdata ranges to be reused */
>>> -void free_init_memory(void)
>>
>> This function doesn't look specific to the MMU.
>>
> 
> Functions like, early_fdt_map[1] / setup_frametable_mappings[2] / free_init_memory [3] ...

I looked at setup_frametable_mappings() and didn't think it was possible 
to share much code. But I agree for early_fdt_map and free_init_memory().

> they both share quite the same logic as MMU does in MPU system, the difference could only
> be address translation regime. Still, in order to avoid putting too much #ifdef here and there,
> I implement different MMU and MPU version of them.

I am not sure why you would need to put #ifdef in the code. Looking at
it, there is usually only a small chunk that differs for the mapping. So
you could provide a helper that will be implemented in the MMU/MPU code.
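
For instance (hypothetical helper names, only to illustrate the shape), the
common free_init_memory() could keep the icache maintenance and the
instruction poisoning, and hide only the mapping updates behind helpers:

    /* Implemented once for the MMU and once for the MPU. */
    int arch_init_section_make_rw(void);
    int arch_init_section_unmap(void);

    void free_init_memory(void)
    {
        ...
        if ( arch_init_section_make_rw() )
            panic("Unable to map RW the init section\n");
        ...
        if ( arch_init_section_unmap() )
            panic("Unable to remove the init section\n");
        ...
    }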

>   
> Or I keep them in generic file here, then in future commits when we implement MPU version
> of them(I list related commits below), I transfer them to MMU file there.

I am not entirely sure. In one way it helps to figure out the commonality,
but on the other hand we are mixing code movement and new code.

In this case, the code movement would probably be small, so that might be
better for the review.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems
  2023-02-07  6:30     ` Penny Zheng
@ 2023-02-07  8:47       ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-07  8:47 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: Wei Chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk



On 07/02/2023 06:30, Penny Zheng wrote:
> Hi Julien

Hi Penny,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Monday, February 6, 2023 6:11 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> Subject: Re: [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU
>> systems
>>
>> Hi,
>>
>> A few more remarks.
>>
>> On 13/01/2023 05:28, Penny Zheng wrote:
>>> In MPU system, device tree binary can be packed with Xen image through
>>> CONFIG_DTB_FILE, or provided by bootloader through x0.
>>>
>>> In MPU system, each section in xen.lds.S is PAGE_SIZE aligned.
>>> So in order to not overlap with the previous BSS section, dtb section
>>> should be made page-aligned too.
>>> We add . = ALIGN(PAGE_SIZE); in the head of dtb section to make it happen.
>>>
>>> In this commit, we map early FDT with a transient MPU memory region at
>>> rear with REGION_HYPERVISOR_BOOT.
>>>
>>> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/include/asm/arm64/mpu.h |  5 +++
>>>    xen/arch/arm/mm_mpu.c                | 63 +++++++++++++++++++++++++---
>>>    xen/arch/arm/xen.lds.S               |  5 ++-
>>>    3 files changed, 67 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h
>>> b/xen/arch/arm/include/asm/arm64/mpu.h
>>> index fcde6ad0db..b85e420a90 100644
>>> --- a/xen/arch/arm/include/asm/arm64/mpu.h
>>> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
>>> @@ -45,18 +45,22 @@
>>>     * [3:4] Execute Never
>>>     * [5:6] Access Permission
>>>     * [7]   Region Present
>>> + * [8]   Boot-only Region
>>>     */
>>>    #define _REGION_AI_BIT            0
>>>    #define _REGION_XN_BIT            3
>>>    #define _REGION_AP_BIT            5
>>>    #define _REGION_PRESENT_BIT       7
>>> +#define _REGION_BOOTONLY_BIT      8
>>>    #define _REGION_XN                (2U << _REGION_XN_BIT)
>>>    #define _REGION_RO                (2U << _REGION_AP_BIT)
>>>    #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
>>> +#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
>>>    #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
>>>    #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
>>>    #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
>>>    #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
>>> +#define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT)
>> & 0x1U)
>>>
>>>    /*
>>>     * _REGION_NORMAL is convenience define. It is not meant to be used
>>> @@ -68,6 +72,7 @@
>>>    #define REGION_HYPERVISOR_RO
>> (_REGION_NORMAL|_REGION_XN|_REGION_RO)
>>>
>>>    #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
>>> +#define REGION_HYPERVISOR_BOOT
>> (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
>>>
>>>    #define INVALID_REGION            (~0UL)
>>>
>>> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c index
>>> 08720a7c19..b34dbf4515 100644
>>> --- a/xen/arch/arm/mm_mpu.c
>>> +++ b/xen/arch/arm/mm_mpu.c
>>> @@ -20,11 +20,16 @@
>>>     */
>>>
>>>    #include <xen/init.h>
>>> +#include <xen/libfdt/libfdt.h>
>>>    #include <xen/mm.h>
>>>    #include <xen/page-size.h>
>>> +#include <xen/pfn.h>
>>> +#include <xen/sizes.h>
>>>    #include <xen/spinlock.h>
>>>    #include <asm/arm64/mpu.h>
>>> +#include <asm/early_printk.h>
>>>    #include <asm/page.h>
>>> +#include <asm/setup.h>
>>>
>>>    #ifdef NDEBUG
>>>    static inline void
>>> @@ -62,6 +67,8 @@ uint64_t __ro_after_init max_xen_mpumap;
>>>
>>>    static DEFINE_SPINLOCK(xen_mpumap_lock);
>>>
>>> +static paddr_t dtb_paddr;
>>> +
>>>    /* Write a MPU protection region */
>>>    #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>>>        uint64_t _sel = sel;                                                \
>>> @@ -403,7 +410,16 @@ static int xen_mpumap_update_entry(paddr_t
>> base,
>>> paddr_t limit,
>>>
>>>            /* During boot time, the default index is next_fixed_region_idx. */
>>>            if ( system_state <= SYS_STATE_active )
>>> -            idx = next_fixed_region_idx;
>>> +        {
>>> +            /*
>>> +             * If it is a boot-only region (i.e. region for early FDT),
>>> +             * it shall be added from the tail for late init re-organizing
>>> +             */
>>> +            if ( REGION_BOOTONLY_MASK(flags) )
>>> +                idx = next_transient_region_idx;
>>> +            else
>>> +                idx = next_fixed_region_idx;
>>> +        }
>>>
>>>            xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1,
>> REGION_AI_MASK(flags));
>>>            /* Set permission */
>>> @@ -465,14 +481,51 @@ int map_pages_to_xen(unsigned long virt,
>>>                                 mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
>>>    }
>>>
>>> -/* TODO: Implementation on the first usage */ -void
>>> dump_hyp_walk(vaddr_t addr)
>>> +void * __init early_fdt_map(paddr_t fdt_paddr)
>>>    {
>>> +    void *fdt_virt;
>>> +    uint32_t size;
>>> +
>>> +    /*
>>> +     * Check whether the physical FDT address is set and meets the
>> minimum
>>> +     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to
>> be at
>>> +     * least 8 bytes so that we always access the magic and size fields
>>> +     * of the FDT header after mapping the first chunk, double check if
>>> +     * that is indeed the case.
>>> +     */
>>> +     BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
>>> +     if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
>>> +         return NULL;
>>> +
>>> +    dtb_paddr = fdt_paddr;
>>> +    /*
>>> +     * In MPU system, device tree binary can be packed with Xen image
>>> +     * through CONFIG_DTB_FILE, or provided by bootloader through x0.
>>
>> The behavior you describe is not specific to the MPU system. I also don't
>> quite understand how describing the method to pass the DT actually matters
>> here.
>>
>>> +     * Map FDT with a transient MPU memory region of MAX_FDT_SIZE.
>>> +     * After that, we can do some magic check.
>>> +     */
>>> +    if ( map_pages_to_xen(round_pgdown(fdt_paddr),
>>
>> I haven't looked at the rest of the series. But from here, it seems a bit strange
>> to use map_pages_to_xen() because the virt and the phys should be the
>> same.
>>
> 
> Hmm, t thought map_pages_to_xen, is to set up memory mapping for access.
> In MPU, we also need to set up a MPU memory region for the FDT, even without
> virt-to-phys conversion

I think my point was misunderstood. I agree that we need a function to
update the MPU. Instead I was asking whether creating a new MPU-specific
helper, rather than using map_pages_to_xen(), would not be better, so we
don't have to pass a pointless parameter (virt). That's why...

> 
>> Do you plan to share some code where map_pages_to_xen() will be used?

... I was asking if you were going to share code with the MMU that may
end up using this function.

If yes, then I agree that in common code it would be best to use
map_pages_to_xen(). For MPU-specific code, I would consider providing
a helper that doesn't need the virt parameter, to reduce the amount of
unnecessary code.
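
i.e. something like (hypothetical name, sketch only):

    /* MPU-only helper: VA == PA, so no virtual address parameter. */
    int mpu_update_mapping(paddr_t base, unsigned long nr_pages,
                           unsigned int flags);

and keep map_pages_to_xen() (with its existing
ASSERT(virt == mfn_to_maddr(mfn))) as a thin wrapper for the callers shared
with the MMU code.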

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization
  2023-01-13  5:28 ` [PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization Penny Zheng
@ 2023-02-09 10:51   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-09 10:51 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,

On 13/01/2023 05:28, Penny Zheng wrote:
> Previous commit introduces a new device tree property
> "mpu,guest-memory-section" to define MPU guest memory section, which
> will mitigate the scattering of statically-configured guest RAM.
> 
> We only need to set up MPU memory region mapping for MPU guest memory section
> to have access to all guest RAM.
> And this should happen before static memory initialization(init_staticmem_pages())
> 
> MPU memory region for MPU guest memory secction gets switched out when
> idle vcpu leaving, to avoid region overlapping if the vcpu enters into guest
> mode later. On the contrary, it gets switched in when idle vcpu entering.

As I pointed out in a separate thread, I don't quite understand why you
are making a distinction between the idle vCPU and guest vCPUs.

> We introduce a bit in region "region.prlar.sw"(struct pr_t region) to
> indicate this kind of feature.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/arm64/mpu.h | 14 ++++++---
>   xen/arch/arm/mm_mpu.c                | 47 +++++++++++++++++++++++++---
>   2 files changed, 53 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> index b85e420a90..0044bbf05d 100644
> --- a/xen/arch/arm/include/asm/arm64/mpu.h
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -45,22 +45,26 @@
>    * [3:4] Execute Never
>    * [5:6] Access Permission
>    * [7]   Region Present
> - * [8]   Boot-only Region
> + * [8:9] 0b00: Fixed Region; 0b01: Boot-only Region;
> + *       0b10: Region needs switching out/in during vcpu context switch;
>    */
>   #define _REGION_AI_BIT            0
>   #define _REGION_XN_BIT            3
>   #define _REGION_AP_BIT            5
>   #define _REGION_PRESENT_BIT       7
> -#define _REGION_BOOTONLY_BIT      8
> +#define _REGION_TRANSIENT_BIT     8
>   #define _REGION_XN                (2U << _REGION_XN_BIT)
>   #define _REGION_RO                (2U << _REGION_AP_BIT)
>   #define _REGION_PRESENT           (1U << _REGION_PRESENT_BIT)
> -#define _REGION_BOOTONLY          (1U << _REGION_BOOTONLY_BIT)
> +#define _REGION_BOOTONLY          (1U << _REGION_TRANSIENT_BIT)
> +#define _REGION_SWITCH            (2U << _REGION_TRANSIENT_BIT)
>   #define REGION_AI_MASK(x)         (((x) >> _REGION_AI_BIT) & 0x7U)
>   #define REGION_XN_MASK(x)         (((x) >> _REGION_XN_BIT) & 0x3U)
>   #define REGION_AP_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x3U)
>   #define REGION_RO_MASK(x)         (((x) >> _REGION_AP_BIT) & 0x2U)
>   #define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
> +#define REGION_SWITCH_MASK(x)     (((x) >> _REGION_TRANSIENT_BIT) & 0x2U)
> +#define REGION_TRANSIENT_MASK(x)  (((x) >> _REGION_TRANSIENT_BIT) & 0x3U)
>   
>   /*
>    * _REGION_NORMAL is convenience define. It is not meant to be used
> @@ -73,6 +77,7 @@
>   
>   #define REGION_HYPERVISOR         REGION_HYPERVISOR_RW
>   #define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
> +#define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
>   
>   #define INVALID_REGION            (~0UL)
>   
> @@ -98,7 +103,8 @@ typedef union {
>           unsigned long ns:1;     /* Not-Secure */
>           unsigned long res:1;    /* Reserved 0 by hardware */
>           unsigned long limit:42; /* Limit Address */
> -        unsigned long pad:16;
> +        unsigned long pad:15;
> +        unsigned long sw:1;     /* Region gets switched out/in during vcpu context switch? */
>       } reg;
>       uint64_t bits;
>   } prlar_t;
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index 7b282be4fb..d2e19e836c 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -71,6 +71,10 @@ static paddr_t dtb_paddr;
>   
>   struct page_info *frame_table;
>   
> +static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
> +    REGION_HYPERVISOR_SWITCH,
> +};
> +
>   /* Write a MPU protection region */
>   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>       uint64_t _sel = sel;                                                \
> @@ -414,10 +418,13 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
>           if ( system_state <= SYS_STATE_active )
>           {
>               /*
> -             * If it is a boot-only region (i.e. region for early FDT),
> -             * it shall be added from the tail for late init re-organizing
> +             * If it is a transient region, including boot-only region
> +             * (i.e. region for early FDT), and region which needs switching
> +             * in/out during vcpu context switch(i.e. region for guest memory
> +             * section), it shall be added from the tail for late init
> +             * re-organizing
>                */
> -            if ( REGION_BOOTONLY_MASK(flags) )
> +            if ( REGION_TRANSIENT_MASK(flags) )

Please introduce REGION_TRANSIENT_MASK() from the start.

>                   idx = next_transient_region_idx;
>               else
>                   idx = next_fixed_region_idx;
> @@ -427,6 +434,13 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
>           /* Set permission */
>           xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
>           xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
> +        /*
> +         * Bit sw indicates that region gets switched out when idle vcpu
> +         * leaving hypervisor mode, and region gets switched in when idle vcpu
> +         * entering hypervisor mode.
> +         */


The idle vCPU will never exit the hypervisor mode. In fact, that vCPU
only exists for scheduling purposes. So I don't quite understand this
comment.

> +        if ( REGION_SWITCH_MASK(flags) )
> +            xen_mpumap[idx].prlar.reg.sw = 1;
>   
>           /* Update and enable the region */
>           access_protection_region(false, NULL, (const pr_t*)(&xen_mpumap[idx]),
> @@ -552,6 +566,29 @@ static void __init setup_staticheap_mappings(void)
>       }
>   }
>   
> +static void __init map_mpu_memory_section_on_boot(enum mpu_section_info type,
> +                                                  unsigned int flags)
> +{
> +    unsigned int i = 0;
> +
> +    for ( ; i < mpuinfo.sections[type].nr_banks; i++ )
> +    {
> +        paddr_t start = round_pgup(
> +                        mpuinfo.sections[type].bank[i].start);
> +        paddr_t size = round_pgdown(mpuinfo.sections[type].bank[i].size);

I think it would be better if we force the address to be aligned in the
Device-Tree. This will avoid the user having to chase down why part of
the region is not mapped.
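
For instance, a rough sketch of enforcing that at mapping time (names taken
from the patch; the exact check and message are only an illustration):

    if ( !IS_ALIGNED(mpuinfo.sections[type].bank[i].start, PAGE_SIZE) ||
         !IS_ALIGNED(mpuinfo.sections[type].bank[i].size, PAGE_SIZE) )
        panic("mpu: %s section bank[%u] must be page-aligned in the Device Tree\n",
              mpu_section_info_str[type], i);

That way a misaligned node fails loudly at boot instead of being silently
rounded.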

> +
> +        /*
> +         * Map MPU memory section with transient MPU memory region,
> +         * as they are either boot-only, or will be switched out/in
> +         * during vcpu context switch(i.e. guest memory section).
> +         */
> +        if ( map_pages_to_xen(start, maddr_to_mfn(start), size >> PAGE_SHIFT,
> +                              flags) )
> +            panic("mpu: failed to map MPU memory section %s\n",
> +                  mpu_section_info_str[type]);
> +    }
> +}
> +
>   /*
>    * System RAM is statically partitioned into different functionality
>    * section in Device Tree, including static xenheap, guest memory
> @@ -563,7 +600,9 @@ void __init setup_static_mappings(void)
>   {
>       setup_staticheap_mappings();
>   
> -    /* TODO: guest memory section, device memory section, boot-module section, etc */
> +    for ( uint8_t i = MSINFO_GUEST; i < MSINFO_MAX; i++ )
> +        map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]);
> +    /* TODO: device memory section, boot-module section, etc */
>   }
>   
>   /* Map a frame table to cover physical addresses ps through pe */

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table
  2023-01-13  5:28 ` [PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table Penny Zheng
@ 2023-02-09 10:57   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-09 10:57 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:
> This commit expands xen_mpumap_update/xen_mpumap_update_entry to include
> destroying an existing entry.
> 
> We define a new helper "control_xen_mpumap_region_from_index" to enable/disable
> the MPU region based on index. If region is within [0, 31], we could quickly
> disable the MPU region through PRENR_EL2 which provides direct access to the
> PRLAR_EL2.EN bits of EL2 MPU regions.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/arm64/mpu.h     | 20 ++++++
>   xen/arch/arm/include/asm/arm64/sysregs.h |  3 +
>   xen/arch/arm/mm_mpu.c                    | 77 ++++++++++++++++++++++--
>   3 files changed, 95 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> index 0044bbf05d..c1dea1c8e9 100644
> --- a/xen/arch/arm/include/asm/arm64/mpu.h
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -16,6 +16,8 @@
>    */
>   #define ARM_MAX_MPU_MEMORY_REGIONS 255
>   
> +#define MPU_PRENR_BITS    32
> +
>   /* Access permission attributes. */
>   /* Read/Write at EL2, No Access at EL1/EL0. */
>   #define AP_RW_EL2 0x0
> @@ -132,6 +134,24 @@ typedef struct {
>       _pr->prlar.reg.en;                                      \
>   })
>   
> +/*
> + * Access to get base address of MPU protection region(pr_t).
> + * The base address shall be zero extended.
> + */
> +#define pr_get_base(pr) ({                                  \
> +    pr_t *_pr = pr;                                         \
> +    (uint64_t)_pr->prbar.reg.base << MPU_REGION_SHIFT;      \
> +})

Can this be a static inline?

> +
> +/*
> + * Access to get limit address of MPU protection region(pr_t).
> + * The limit address shall be concatenated with 0x3f.
> + */
> +#define pr_get_limit(pr) ({                                        \
> +    pr_t *_pr = pr;                                                \
> +    (uint64_t)((_pr->prlar.reg.limit << MPU_REGION_SHIFT) | 0x3f); \
> +})

Same.
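
Something along these lines, as an untested sketch (types and field names
taken from the patch):

    static inline uint64_t pr_get_base(const pr_t *pr)
    {
        return (uint64_t)pr->prbar.reg.base << MPU_REGION_SHIFT;
    }

    static inline uint64_t pr_get_limit(const pr_t *pr)
    {
        return ((uint64_t)pr->prlar.reg.limit << MPU_REGION_SHIFT) | 0x3f;
    }

That also gives you type checking on the argument for free.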

> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif /* __ARM64_MPU_H__ */
> diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
> index aca9bca5b1..c46daf6f69 100644
> --- a/xen/arch/arm/include/asm/arm64/sysregs.h
> +++ b/xen/arch/arm/include/asm/arm64/sysregs.h
> @@ -505,6 +505,9 @@
>   /* MPU Type registers encode */
>   #define MPUIR_EL2 S3_4_C0_C0_4
>   
> +/* MPU Protection Region Enable Register encode */
> +#define PRENR_EL2 S3_4_C6_C1_1
> +
>   #endif
>   
>   /* Access to system registers */
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index d2e19e836c..3a0d110b13 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -385,6 +385,45 @@ static int mpumap_contain_region(pr_t *mpu, uint64_t nr_regions,
>       return MPUMAP_REGION_FAILED;
>   }
>   
> +/* Disable or enable EL2 MPU memory region at index #index */
> +static void control_mpu_region_from_index(uint64_t index, bool enable)
> +{
> +    pr_t region;
> +
> +    access_protection_region(true, &region, NULL, index);
> +    if ( (region_is_valid(&region) && enable) ||
> +         (!region_is_valid(&region) && !enable) )

You could write:

!(region_is_valid(&region) ^ enable)

> +    {
> +        printk(XENLOG_WARNING
> +               "mpu: MPU memory region[%lu] is already %s\n", index,
> +               enable ? "enabled" : "disabled");
> +        return;
> +    }
> +
> +    /*
> +     * ARM64v8R provides PRENR_EL2 to have direct access to the
> +     * PRLAR_EL2.EN bits of EL2 MPU regions from 0 to 31.
> +     */
> +    if ( index < MPU_PRENR_BITS )
> +    {
> +        uint64_t orig, after;
> +
> +        orig = READ_SYSREG(PRENR_EL2);
> +        if ( enable )
> +            /* Set respective bit */
> +            after = orig | (1UL << index);
> +        else
> +            /* Clear respective bit */
> +            after = orig & (~(1UL << index));
> +        WRITE_SYSREG(after, PRENR_EL2);

Don't you need an isb (or similar) to ensure this is visible before...

> +    }
> +    else
> +    {
> +        region.prlar.reg.en = enable ? 1 : 0;
> +        access_protection_region(false, NULL, (const pr_t*)&region, index);
> +    }
> +}
> +
>   /*
>    * Update an entry at the index @idx.
>    * @base:  base address
> @@ -449,6 +488,30 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
>           if ( system_state <= SYS_STATE_active )
>               update_boot_xen_mpumap_idx(idx);
>       }
> +    else
> +    {
> +        /*
> +         * Currently, we only support destroying a *WHOLE* MPU memory region,
> +         * part-region removing is not supported, as in worst case, it will
> +         * lead to two fragments in result after destroying.
> +         * part-region removing will be introduced only when actual usage
> +         * comes.
> +         */
> +        if ( rc == MPUMAP_REGION_INCLUSIVE )
> +        {
> +            region_printk("mpu: part-region removing is not supported\n");
> +            return -EINVAL;
> +        }
> +
> +        /* We are removing the region */
> +        if ( rc != MPUMAP_REGION_FOUND )
> +            return -EINVAL;
> +
> +        control_mpu_region_from_index(idx, false);
> +
> +        /* Clear the according MPU memory region entry.*/
> +        memset(&xen_mpumap[idx], 0, sizeof(pr_t));

... zeroing the entry? Also, you could use memzero() here.
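
Roughly, as an untested sketch (only reordering what the patch already does
and swapping in the helpers suggested above):

    WRITE_SYSREG(after, PRENR_EL2);
    isb();    /* make the PRENR_EL2 update take effect before continuing */

and later in xen_mpumap_update_entry():

    control_mpu_region_from_index(idx, false);

    /* Clear the corresponding MPU memory region entry. */
    memzero(&xen_mpumap[idx], sizeof(pr_t));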

> +    }
>   
>       return 0;
>   }
> @@ -589,6 +652,15 @@ static void __init map_mpu_memory_section_on_boot(enum mpu_section_info type,
>       }
>   }
>   
> +int destroy_xen_mappings(unsigned long s, unsigned long e)
> +{
> +    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
> +    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
> +    ASSERT(s <= e);
> +
> +    return xen_mpumap_update(s, e, 0);
> +}
> +
>   /*
>    * System RAM is statically partitioned into different functionality
>    * section in Device Tree, including static xenheap, guest memory
> @@ -656,11 +728,6 @@ void *ioremap(paddr_t pa, size_t len)
>       return NULL;
>   }
>   
> -int destroy_xen_mappings(unsigned long s, unsigned long e)
> -{
> -    return -ENOSYS;
> -}
> -
>   int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
>   {
>       return -ENOSYS;

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system
  2023-01-13 10:10   ` Jan Beulich
@ 2023-02-09 11:01     ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-09 11:01 UTC (permalink / raw)
  To: Jan Beulich, Penny Zheng
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	xen-devel

Hi,

On 13/01/2023 10:10, Jan Beulich wrote:
> On 13.01.2023 06:29, Penny Zheng wrote:
>> --- a/xen/arch/arm/Kconfig
>> +++ b/xen/arch/arm/Kconfig
>> @@ -13,9 +13,10 @@ config ARM
>>   	def_bool y
>>   	select HAS_ALTERNATIVE if !ARM_V8R
>>   	select HAS_DEVICE_TREE
>> +	select HAS_FIXMAP if !ARM_V8R
>>   	select HAS_PASSTHROUGH
>>   	select HAS_PDX
>> -	select HAS_PMAP
>> +	select HAS_PMAP if !ARM_V8R
>>   	select IOMMU_FORCE_PT_SHARE
>>   	select HAS_VMAP if !ARM_V8R
> 
> Thinking about it - wouldn't it make sense to fold HAS_VMAP and HAS_FIXMAP
> into a single HAS_MMU?

I think it would make sense.

Furthermore, this patch would be better towards the start of the series.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx
  2023-01-13  5:29 ` [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx Penny Zheng
  2023-01-13  9:49   ` Jan Beulich
@ 2023-02-09 11:14   ` Julien Grall
  1 sibling, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-09 11:14 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu

Hi,

On 13/01/2023 05:29, Penny Zheng wrote:
> Function ioremap_xxx is normally used to remap device address ranges
> in MMU systems during device driver initialization.
> 
> However, in MPU systems virtual address translation is not supported;
> the device memory layout is statically configured in the Device Tree and
> is mapped at a very early stage.
> So here we only add a check to verify this assumption.
> 
> But to tolerate a few cases where the function is called to map memory for
> temporary copy-and-paste, like ioremap_wc in kernel image loading, a
> region attribute mismatch is treated as a warning rather than an error.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/include/asm/arm64/mpu.h |  1 +
>   xen/arch/arm/include/asm/mm.h        | 16 ++++-
>   xen/arch/arm/include/asm/mm_mpu.h    |  2 +
>   xen/arch/arm/mm_mpu.c                | 88 ++++++++++++++++++++++++----
>   xen/include/xen/vmap.h               | 12 ++++
>   5 files changed, 106 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
> index 8e8679bc82..b4e50a9a0e 100644
> --- a/xen/arch/arm/include/asm/arm64/mpu.h
> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
> @@ -82,6 +82,7 @@
>   #define REGION_HYPERVISOR_BOOT    (REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
>   #define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
>   #define REGION_HYPERVISOR_NOCACHE (_REGION_DEVICE|MT_DEVICE_nGnRE|_REGION_SWITCH)
> +#define REGION_HYPERVISOR_WC      (_REGION_DEVICE|MT_NORMAL_NC)
>   
>   #define INVALID_REGION            (~0UL)
>   
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 7969ec9f98..fa44cfc50d 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -14,6 +14,10 @@
>   # error "unknown ARM variant"
>   #endif
>   
> +#if defined(CONFIG_HAS_MPU)
> +# include <asm/arm64/mpu.h>
> +#endif
> +
>   /* Align Xen to a 2 MiB boundary. */
>   #define XEN_PADDR_ALIGN (1 << 21)
>   
> @@ -198,19 +202,25 @@ extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
>   /* map a physical range in virtual memory */
>   void __iomem *ioremap_attr(paddr_t start, size_t len, unsigned int attributes);
>   
> +#ifndef CONFIG_HAS_MPU
> +#define DEFINE_ATTRIBUTE(var)   (PAGE_##var)
> +#else
> +#define DEFINE_ATTRIBUTE(var)   (REGION_##var)
> +#endif
The macro implies that part of the naming would be common between the
MPU and MMU code. So I think it would be better if the full name were
shared.

My preference would be to go with PAGE_* as this is used for both x86 
and MMU arm. I think the naming would still be OK on the MPU because, 
AFAICT, you still map at a page granularity (or aligned to).
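
As a rough illustration (the exact alias list below is an assumption, not
meant to be complete), the MPU header could simply provide the PAGE_* names:

    /* xen/arch/arm/include/asm/arm64/mpu.h (sketch) */
    #define PAGE_HYPERVISOR          REGION_HYPERVISOR
    #define PAGE_HYPERVISOR_NOCACHE  REGION_HYPERVISOR_NOCACHE
    #define PAGE_HYPERVISOR_WC       REGION_HYPERVISOR_WC

so callers can use PAGE_* unconditionally and DEFINE_ATTRIBUTE() is not
needed at all.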

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 34/40] xen/mpu: free init memory in MPU system
  2023-01-13  5:29 ` [PATCH v2 34/40] xen/mpu: free init memory in MPU system Penny Zheng
@ 2023-02-09 11:27   ` Julien Grall
  0 siblings, 0 replies; 122+ messages in thread
From: Julien Grall @ 2023-02-09 11:27 UTC (permalink / raw)
  To: Penny Zheng, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Penny,

On 13/01/2023 05:29, Penny Zheng wrote:
> This commit implements free_init_memory in the MPU system, trying to keep
> the same strategy as the MMU system.
> 
> In order to insert BRK instructions into the init code section, which
> aims to provoke a fault on purpose, we first change the init code section
> permission to RW.
> Function modify_xen_mappings is introduced to modify the permission of an
> existing valid MPU memory region.
> 
> Then we nuke the instruction cache to remove entries related to the init
> text.
> Finally, we destroy the two MPU memory regions covering init text and
> init data using destroy_xen_mappings.
> 
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/mm_mpu.c | 85 ++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 83 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
> index 0b720004ee..de0c7d919a 100644
> --- a/xen/arch/arm/mm_mpu.c
> +++ b/xen/arch/arm/mm_mpu.c
> @@ -20,6 +20,7 @@
>    */
>   
>   #include <xen/init.h>
> +#include <xen/kernel.h>
>   #include <xen/libfdt/libfdt.h>
>   #include <xen/mm.h>
>   #include <xen/page-size.h>
> @@ -77,6 +78,8 @@ static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
>       REGION_HYPERVISOR_BOOT,
>   };
>   
> +extern char __init_data_begin[], __init_end[];

Now we have two places that define __init_end as extern. Can this instead
be defined in setup.h?
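
E.g. (sketch; the exact header placement is just a suggestion):

    /* In asm/setup.h, shared by both users: */
    extern char __init_data_begin[], __init_end[];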

> +
>   /* Write a MPU protection region */
>   #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({       \
>       uint64_t _sel = sel;                                                \
> @@ -443,8 +446,41 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
>       if ( rc == MPUMAP_REGION_OVERLAP )
>           return -EINVAL;
>   
> +    /* We are updating the permission. */
> +    if ( (flags & _REGION_PRESENT) && (rc == MPUMAP_REGION_FOUND ||
> +                                       rc == MPUMAP_REGION_INCLUSIVE) )
> +    {
> +
> +        /*
> +         * Currently, we only support modifying a *WHOLE* MPU memory region,
> +         * part-region modification is not supported, as in worst case, it will
> +         * lead to three fragments in result after modification.
> +         * part-region modification will be introduced only when actual usage
> +         * come
> +         */
> +        if ( rc == MPUMAP_REGION_INCLUSIVE )
> +        {
> +            region_printk("mpu: part-region modification is not supported\n");
> +            return -EINVAL;
> +        }
> +
> +        /* We don't allow changing memory attributes. */
> +        if (xen_mpumap[idx].prlar.reg.ai != REGION_AI_MASK(flags) )
> +        {
> +            region_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
> +                          xen_mpumap[idx].prlar.reg.ai, REGION_AI_MASK(flags));
> +            return -EINVAL;
> +        }
> +
> +        /* Set new permission */
> +        xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
> +        xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
> +
> +        access_protection_region(false, NULL, (const pr_t*)(&xen_mpumap[idx]),
> +                                 idx);
> +    }
>       /* We are inserting a mapping => Create new region. */
> -    if ( flags & _REGION_PRESENT )
> +    else if ( flags & _REGION_PRESENT )
>       {
>           if ( rc != MPUMAP_REGION_FAILED )
>               return -EINVAL;
> @@ -831,11 +867,56 @@ void mmu_init_secondary_cpu(void)
>   
>   int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
>   {
> -    return -ENOSYS;
> +    ASSERT(IS_ALIGNED(s, PAGE_SIZE));
> +    ASSERT(IS_ALIGNED(e, PAGE_SIZE));
> +    ASSERT(s <= e);
> +    return xen_mpumap_update(s, e, flags);
>   }
>   
>   void free_init_memory(void)
>   {
> +    /* Kernel init text section. */
> +    paddr_t init_text = virt_to_maddr(_sinittext);
> +    paddr_t init_text_end = round_pgup(virt_to_maddr(_einittext));
> +    /* Kernel init data. */
> +    paddr_t init_data = virt_to_maddr(__init_data_begin);
> +    paddr_t init_data_end = round_pgup(virt_to_maddr(__init_end));
> +    unsigned long init_section[4] = {(unsigned long)init_text,
> +                                     (unsigned long)init_text_end,
> +                                     (unsigned long)init_data,
> +                                     (unsigned long)init_data_end};
> +    unsigned int nr_init = 2;

At first, it wasn't obvious what the 2 meant here. It also seems you
expect the number to stay in sync with the one above.

I don't think the genericity is necessary here. But if you want it, then
it would be better to use an array of structures (begin/end) so you can
use ARRAY_SIZE() afterwards and avoid magic like "i * 2".

> +    uint32_t insn = AARCH64_BREAK_FAULT;

AMD is also working on 32-bit Armv8-R support. When it is easy (like
here) it would be best to avoid making 64-bit-only assumptions.

That said, to me it feels like a big part of this code could be shared 
with the MMU version.

> +    unsigned int i = 0, j = 0;
> +
> +    /* Change kernel init text section to RW. */
> +    modify_xen_mappings((unsigned long)init_text,
> +                        (unsigned long)init_text_end, REGION_HYPERVISOR_RW);
> +
> +    /*
> +     * From now on, init will not be used for execution anymore,
> +     * so nuke the instruction cache to remove entries related to init.
> +     */
> +    invalidate_icache_local();
> +
> +    /* Destroy two MPU memory regions referring init text and init data. */
> +    for ( ; i < nr_init; i++ )
> +    {
> +        uint32_t *p;
> +        unsigned int nr;
> +        int rc;
> +
> +        i = 2 * i;

... avoid such magic.
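
Something like the following untested sketch, reusing the names already in
the patch:

    struct init_region {
        unsigned long begin;
        unsigned long end;
    };

    const struct init_region init_regions[] = {
        { (unsigned long)init_text, (unsigned long)init_text_end },
        { (unsigned long)init_data, (unsigned long)init_data_end },
    };

    for ( i = 0; i < ARRAY_SIZE(init_regions); i++ )
    {
        uint32_t *p = (uint32_t *)init_regions[i].begin;
        unsigned int nr = (init_regions[i].end - init_regions[i].begin) /
                          sizeof(uint32_t);

        for ( j = 0; j < nr; j++ )
            p[j] = insn;

        if ( destroy_xen_mappings(init_regions[i].begin,
                                  init_regions[i].end) )
            panic("Unable to remove the init section\n");
    }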

> +        p = (uint32_t *)init_section[i];
> +        nr = (init_section[i + 1] - init_section[i]) / sizeof(uint32_t);
> +
> +        for ( ; j < nr ; j++ )
> +            *(p + j) = insn;
> +
> +        rc = destroy_xen_mappings(init_section[i], init_section[i + 1]);
> +        if ( rc < 0 )
> +            panic("Unable to remove the init section (rc = %d)\n", rc);
> +    }
>   }
>   
>   int xenmem_add_to_physmap_one(

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h
  2023-01-19 14:20   ` Julien Grall
@ 2023-06-05  5:20     ` Penny Zheng
  0 siblings, 0 replies; 122+ messages in thread
From: Penny Zheng @ 2023-06-05  5:20 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: wei.chen, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi,


Sorry for the late reply. Got sidetracked by other tasks...


On 2023/1/19 22:20, Julien Grall wrote:
> Hi,
>
> On 13/01/2023 05:28, Penny Zheng wrote:
>> From: Wei Chen <wei.chen@arm.com>
>>
>> Xen defines some global configuration macros for Arm in
>> config.h. We still want to use it for Armv8-R systems, but
>> there are some address related macros that are defined for
>> MMU systems. These macros will not be used by MPU systems.
>> Adding ifdefery with CONFIG_HAS_MPU to gate these macros
>> would result in messy and hard-to-read/maintain code.
>>
>> So we keep some common definitions still in config.h, but
>> move virtual address related definitions to a new file -
>> config_mmu.h. And use a new file config_mpu.h to store
>> definitions for MPU systems. To avoid spreading #ifdef
>> everywhere, we keep the same definition names for MPU
>> systems, like XEN_VIRT_START and HYPERVISOR_VIRT_START,
>> but the definition contents are MPU specific.
>>
>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>> ---
>> v1 -> v2:
>> 1. Remove duplicated FIXMAP definitions from config_mmu.h
>> ---
>>   xen/arch/arm/include/asm/config.h     | 103 +++--------------------
>>   xen/arch/arm/include/asm/config_mmu.h | 112 ++++++++++++++++++++++++++
>>   xen/arch/arm/include/asm/config_mpu.h |  25 ++++++
>
> I think this patch wants to be split in two, so we keep code movement
> separate from the introduction of a new feature (e.g. MPU).
>
> Furthermore, I think it would be better to name the new header 
> layout_* (or similar).
>
> Lastly, you are going to introduce several files with _mmu or _mpu. I
> would rather prefer if we create a directory instead.
>
>
>>   3 files changed, 147 insertions(+), 93 deletions(-)
>>   create mode 100644 xen/arch/arm/include/asm/config_mmu.h
>>   create mode 100644 xen/arch/arm/include/asm/config_mpu.h
>>
>> diff --git a/xen/arch/arm/include/asm/config.h 
>> b/xen/arch/arm/include/asm/config.h
>> index 25a625ff08..86d8142959 100644
>> --- a/xen/arch/arm/include/asm/config.h
>> +++ b/xen/arch/arm/include/asm/config.h
>> @@ -48,6 +48,12 @@
>>     #define INVALID_VCPU_ID MAX_VIRT_CPUS
>>   +/* Used for calculating PDX */
>
> I am not entirely sure I understand the purpose of this comment.
>
>> +#ifdef CONFIG_ARM_64
>> +#define FRAMETABLE_SIZE        GB(32)
>> +#define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
>> +#endif
>> +
>
> Why do you only keep the 64-bit version in config.h?
>
> However... the frametable size is limited by the space we reserve in 
> the virtual address space. This would not be the case for the MPU.
>

Yes, but when calculating the variable pdx_group_valid, which is defined as

    unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
        (FRAMETABLE_NR + PDX_GROUP_COUNT - 1) / PDX_GROUP_COUNT)] = { [0] = 1 };

it relies on FRAMETABLE_NR to limit the array length. If we try to get rid
of the limit for the MPU, it may bring a lot of changes to the common PDX
code; for example, pdx_group_valid may need to be allocated at runtime,
according to the actual frametable size, at least for the MPU case.

So, here, I intend to keep the same limit for the MPU as the MMU has, or do
you have any suggestion?


> So having the limit in common seems a bit odd. In fact, I think we 
> should look at getting rid of the limit for the MPU.
>
> [...]

[...]

> Cheers,
>
Cheers,

Penny Zheng


^ permalink raw reply	[flat|nested] 122+ messages in thread

end of thread

Thread overview: 122+ messages
2023-01-13  5:28 [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Penny Zheng
2023-01-13  5:28 ` [PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h Penny Zheng
2023-01-13 10:06   ` Julien Grall
2023-01-13 10:39     ` Penny Zheng
2023-01-13  5:28 ` [PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64 Penny Zheng
2023-01-17 23:09   ` Julien Grall
2023-01-18  2:19     ` Wei Chen
2023-01-13  5:28 ` [PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA Penny Zheng
2023-01-17 23:16   ` Julien Grall
2023-01-18  2:32     ` Wei Chen
2023-01-13  5:28 ` [PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R Penny Zheng
2023-01-17 23:24   ` Julien Grall
2023-01-18  3:00     ` Wei Chen
2023-01-18  9:44       ` Julien Grall
2023-01-18 10:22         ` Wei Chen
2023-01-18 10:59           ` Julien Grall
2023-01-18 11:27             ` Wei Chen
2023-01-13  5:28 ` [PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S Penny Zheng
2023-01-17 23:37   ` Julien Grall
2023-01-18  3:09     ` Wei Chen
2023-01-18  9:50       ` Julien Grall
2023-01-18 10:24         ` Wei Chen
2023-01-13  5:28 ` [PATCH v2 06/40] xen/arm64: move MMU related code from head.S to head_mmu.S Penny Zheng
2023-01-13  5:28 ` [PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections Penny Zheng
2023-01-17 23:46   ` Julien Grall
2023-01-18  2:18     ` Wei Chen
2023-01-18 10:55       ` Julien Grall
2023-01-18 11:40         ` Wei Chen
2023-01-13  5:28 ` [PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv-8R Penny Zheng
2023-01-17 23:49   ` Julien Grall
2023-01-18  1:43     ` Wei Chen
2023-01-13  5:28 ` [PATCH v2 09/40] xen/arm: decouple copy_from_paddr with FIXMAP Penny Zheng
2023-01-13  5:28 ` [PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h Penny Zheng
2023-01-19 14:20   ` Julien Grall
2023-06-05  5:20     ` Penny Zheng
2023-01-13  5:28 ` [PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map Penny Zheng
2023-01-19 10:18   ` Ayan Kumar Halder
2023-01-29  6:47     ` Penny Zheng
2023-01-19 15:04   ` Julien Grall
2023-01-29  5:39     ` Penny Zheng
2023-01-29  7:37       ` Julien Grall
2023-01-30  5:45         ` Penny Zheng
2023-01-30  9:39           ` Julien Grall
2023-01-31  4:11             ` Penny Zheng
2023-01-31  9:27               ` Julien Grall
2023-02-01  5:39                 ` Penny Zheng
2023-02-01 18:56                   ` Julien Grall
2023-02-02 10:53                     ` Penny Zheng
2023-02-02 10:58                       ` Julien Grall
2023-02-02 11:30                         ` Penny Zheng
2023-01-13  5:28 ` [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement Penny Zheng
2023-01-23 17:07   ` Ayan Kumar Halder
2023-01-24 18:54   ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART Penny Zheng
2023-01-24 19:09   ` Julien Grall
2023-01-29  6:17     ` Penny Zheng
2023-01-29  7:43       ` Julien Grall
2023-01-30  6:24         ` Penny Zheng
2023-01-30 10:00           ` Julien Grall
2023-01-31  5:38             ` Penny Zheng
2023-01-31  9:41               ` Julien Grall
2023-02-01  5:36                 ` Penny Zheng
2023-02-01 19:26                   ` Julien Grall
2023-02-02  8:05                     ` Penny Zheng
2023-02-02 11:11                       ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm() Penny Zheng
2023-02-05 21:13   ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h Penny Zheng
2023-02-05 21:30   ` Julien Grall
2023-02-07  3:59     ` Penny Zheng
2023-02-07  8:41       ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 16/40] xen/arm: introduce setup_mm_mappings Penny Zheng
2023-02-05 21:32   ` Julien Grall
2023-02-07  4:40     ` Penny Zheng
2023-01-13  5:28 ` [PATCH v2 17/40] xen/mpu: plump virt/maddr/mfn convertion in MPU system Penny Zheng
2023-02-05 21:36   ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 18/40] xen/mpu: introduce helper access_protection_region Penny Zheng
2023-01-24 19:20   ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table Penny Zheng
2023-02-05 21:45   ` Julien Grall
2023-02-07  5:07     ` Penny Zheng
2023-01-13  5:28 ` [PATCH v2 20/40] xen/mpu: plump early_fdt_map in MPU systems Penny Zheng
2023-02-05 21:52   ` Julien Grall
2023-02-06 10:11   ` Julien Grall
2023-02-07  6:30     ` Penny Zheng
2023-02-07  8:47       ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 21/40] xen/arm: move MMU-specific setup_mm to setup_mmu.c Penny Zheng
2023-01-13  5:28 ` [PATCH v2 22/40] xen/mpu: implement MPU version of setup_mm in setup_mpu.c Penny Zheng
2023-01-13  5:28 ` [PATCH v2 23/40] xen/mpu: initialize frametable in MPU system Penny Zheng
2023-02-05 22:07   ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 24/40] xen/mpu: introduce "mpu,xxx-memory-section" Penny Zheng
2023-01-13  5:28 ` [PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization Penny Zheng
2023-02-09 10:51   ` Julien Grall
2023-01-13  5:28 ` [PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table Penny Zheng
2023-02-09 10:57   ` Julien Grall
2023-01-13  5:29 ` [PATCH v2 27/40] xen/mpu: map device memory resource in MPU system Penny Zheng
2023-01-13  5:29 ` [PATCH v2 28/40] xen/mpu: map boot module section " Penny Zheng
2023-01-13  5:29 ` [PATCH v2 29/40] xen/mpu: introduce mpu_memory_section_contains for address range check Penny Zheng
2023-01-13  5:29 ` [PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems Penny Zheng
2023-01-13  9:39   ` Jan Beulich
2023-01-13  5:29 ` [PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system Penny Zheng
2023-01-13  9:42   ` Jan Beulich
2023-01-13 10:10   ` Jan Beulich
2023-02-09 11:01     ` Julien Grall
2023-01-13  5:29 ` [PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx Penny Zheng
2023-01-13  9:49   ` Jan Beulich
2023-02-09 11:14   ` Julien Grall
2023-01-13  5:29 ` [PATCH v2 33/40] xen/arm: check mapping status and attributes for MPU copy_from_paddr Penny Zheng
2023-01-13  5:29 ` [PATCH v2 34/40] xen/mpu: free init memory in MPU system Penny Zheng
2023-02-09 11:27   ` Julien Grall
2023-01-13  5:29 ` [PATCH v2 35/40] xen/mpu: destroy boot modules and early FDT mapping " Penny Zheng
2023-01-13  5:29 ` [PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for AArch64v8R Penny Zheng
2023-02-05 22:26   ` Julien Grall
2023-01-13  5:29 ` [PATCH v2 37/40] xen/mpu: move MMU specific P2M code to p2m_mmu.c Penny Zheng
2023-01-13  5:29 ` [PATCH v2 38/40] xen/mpu: implement setup_virt_paging for MPU system Penny Zheng
2023-01-13  5:29 ` [PATCH v2 39/40] xen/mpu: re-order xen_mpumap in arch_init_finialize Penny Zheng
2023-01-13  5:29 ` [PATCH v2 40/40] xen/mpu: add Kconfig option to enable Armv8-R AArch64 support Penny Zheng
2023-01-13  5:29 ` [PATCH] xen/mpu: make Xen boot to idle on MPU systems(DNM) Penny Zheng
2023-01-13  8:54 ` [PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1 Jan Beulich
2023-01-13  9:16   ` Julien Grall
2023-01-13  9:28     ` Jan Beulich
2023-01-24 19:31 ` Ayan Kumar Halder
