* Proposal for Porting Xen to Armv8-R64 - DraftA
@ 2022-02-24  6:01 Wei Chen
  2022-02-24 11:52 ` Ayan Kumar Halder
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Wei Chen @ 2022-02-24  6:01 UTC (permalink / raw)
  To: xen-devel, julien, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

# Proposal for Porting Xen to Armv8-R64

This proposal will introduce the PoC work of porting Xen to Armv8-R64,
which includes:
- The changes of current Xen capability, like Xen build system, memory
  management, domain management, vCPU context switch.
- The expanded Xen capability, like static-allocation and direct-map.

***Notes:***
1. ***This proposal only covers the work of porting Xen to Armv8-R64***
   ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
   ***Trusted-Firmware (TF-R). This is an external dependency,***
   ***so we think the discussion of Xen SMP support on Armv8-R64***
   ***should be started when single-CPU support is complete.***
2. ***This proposal will not touch xen-tools. At the current stage,***
   ***Xen on Armv8-R64 only supports dom0less; all guests should***
   ***be booted from device tree.***

## 1. Essential Background

### 1.1. Armv8-R64 Profile
The Arm R architecture profile was designed to support use cases that
have a high sensitivity to deterministic execution (e.g. fuel injection,
brake control, drive trains, motor control, etc.).

Arm announced Armv8-R in 2013; it is the latest-generation Arm architecture
targeted at the real-time profile. It introduces virtualization at the highest
security level while retaining the Protected Memory System Architecture (PMSA)
based on a Memory Protection Unit (MPU). In 2020, Arm announced Cortex-R82,
which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.

- The latest Armv8-R64 document can be found here:
  [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).

- Arm R architecture progression:
  Armv7-R -> Armv8-R AArch32 -> Armv8-R AArch64
  The following figure is a simple comparison of "R" processors based on
  the different Arm R architectures.
  ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)

- The Armv8-R architecture adds the following features on top of Armv7-R:
    - An exception model that is compatible with the Armv8-A model
    - Virtualization with support for guest operating systems
        - PMSA virtualization using MPUs in EL2.
- The new features of the Armv8-R64 architecture:
    - Adds support for the 64-bit A64 instruction set; previously Armv8-R
      only supported A32.
    - Supports up to 48-bit physical addressing; previously up to 32-bit
      addressing was supported.
    - Optional Arm Neon technology and Advanced SIMD
    - Supports three Exception Levels (ELs)
        - Secure EL2 - The Highest Privilege, MPU only, for firmware, hypervisor
        - Secure EL1 - RichOS (MMU) or RTOS (MPU)
        - Secure EL0 - Application Workloads
    - Optionally supports Virtual Memory System Architecture at S-EL1/S-EL0.
      This means it's possible to run rich OS kernels - like Linux - either
      bare-metal or as a guest.
- Differences from the Armv8-A AArch64 architecture
    - Supports only a single Security state - Secure. There is no Non-secure
      execution state supported.
    - EL3 is not supported, EL2 is mandatory. This means Secure EL2 is the
      highest EL.
    - Supports the A64 ISA instruction set
        - With a small set of well-defined differences
    - Provides a PMSA (Protected Memory System Architecture) based
      virtualization model.
        - As opposed to Armv8-A AArch64's VMSA based Virtualization
        - Can support address bits up to 52 if FEAT_LPA is enabled,
          otherwise 48 bits.
        - Determines the access permissions and memory attributes of
          the target PA.
        - Can implement PMSAv8-64 at EL1 and EL2
            - Address translation flat-maps the VA to the PA for EL2 Stage 1.
            - Address translation flat-maps the VA to the PA for EL1 Stage 1.
            - Address translation flat-maps the IPA to the PA for EL1 Stage 2.
    - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.

### 1.2. Xen Challenges with PMSA Virtualization
Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications to run
with an MPU and host multiple guest OSes.

- No MMU at EL2:
    - No EL2 Stage 1 address translation
        - Xen provides a fixed Arm64 virtual memory layout as the basis of EL2
          stage 1 address translation, which is not applicable on an MPU
          system, where there is no virtual addressing. As a result, any
          operation involving a transition from PA to VA, like ioremap, needs
          modification on an MPU system.
    - Xen's run-time addresses are the same as the link time addresses.
        - Enabling PIC (position-independent code) on a real-time target
          processor is probably very rare.
    - Xen will need to use the EL2 MPU memory region descriptors to manage
      access permissions and attributes for accesses made by VMs at EL1/0.
        - Xen currently relies on the MMU EL1 stage 2 table to manage these
          accesses.
- No MMU Stage 2 translation at EL1:
    - A guest doesn't have an independent guest physical address space
    - A guest can not reuse the current Intermediate Physical Address
      memory layout
    - A guest uses physical addresses to access memory and devices
    - The MPU at EL2 manages EL1 stage 2 access permissions and attributes
- There are a limited number of MPU protection regions at both EL2 and EL1:
    - Architecturally, the maximum number of protection regions is 256;
      typical implementations have 32.
    - By contrast, Xen does not, in theory, need to consider the number of
      page table entries when using the MMU.
- The MPU protection regions at EL2 need to be shared between the hypervisor
  and the guest stage 2.
    - Requires careful consideration - may impact feature 'fullness' of both
      the hypervisor and the guest
    - By contrast, when using MMU, Xen has standalone P2M table for guest
      stage 2 accesses.

## 2. Proposed changes of Xen
### **2.1. Changes of build system:**

- ***Introduce new Kconfig options for Armv8-R64***:
  Unlike on Armv8-A, because of the lack of MMU support on Armv8-R64, we
  cannot expect one Xen binary to run on all machines. Xen images are not
  common across Armv8-R64 platforms; Xen must be re-built for each
  Armv8-R64 platform, because these platforms may have different memory
  layouts and link addresses.
    - `ARM64_V8R`:
      This option enables the Armv8-R profile for Arm64. Enabling this option
      results in selecting MPU. This Kconfig option is used to gate
      Armv8-R64 specific code other than MPU code, like code for access to
      Armv8-R64-only system ID registers.

    - `ARM_MPU`
      This option enables the MPU on the Armv8-R architecture. Enabling this
      option results in disabling the MMU. This Kconfig option is used to
      gate ARM_MPU specific code. Once this Kconfig option is enabled, the
      MMU-related code will not be built for Armv8-R64. The reason we do not
      rely on runtime detection to select MMU or MPU is that we don't think
      we can use one image for both Armv8-R64 and Armv8-A64. Another reason
      we separate MPU and V8R is to allow supporting MPU on 32-bit Arm one
      day.

    - `XEN_START_ADDRESS`
      This option allows setting the custom address at which Xen will be
      linked. This address must be aligned to a page size. Xen's run-time
      addresses are the same as the link time addresses. Different platforms
      may have different memory layouts. This Kconfig option gives users
      the ability to select proper link addresses for their boards.
      ***Notes: Fixed link address means the Xen binary could not be***
      ***relocated by EFI loader. So in current stage, Xen could not***
      ***be launched as an EFI application on Armv8-R64.***

    - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
      `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
      These Kconfig options allow setting the memory regions for Xen code,
      data and device memory. Before parsing memory information from the
      device tree, Xen will use the values stored in these options to set up
      the boot-time MPU configuration. Why do we need a boot-time MPU
      configuration?
      1. More deterministic: the Arm MPU supports background regions; if we
         don't configure any MPU regions, we can still enable the MPU with
         only background regions. But that means all RAM is RWX. Random
         values in RAM or maliciously embedded data can be exploited. Using
         these Kconfig options allows users to have a deterministic RAM area
         in which to execute code.
      2. More compatible: on some Armv8-R64 platforms, if the MPU is
         disabled, the `dc zva` instruction will make the system halt.
         And this instruction will be embedded in some built-in functions,
         like `memset`. If we use `-ddont_use_dc` to rebuild GCC, the
         built-in functions will not contain `dc zva`. However, it is
         obviously unrealistic to rebuild GCC for everyone using Armv8-R64.
      3. One optional idea:
         We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` or
         `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end` as MPU
         normal memory. That is enough for Xen to run at boot time.

- ***Define new system registers for compilers***:
  Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
  specific system registers. As Armv8-R64 only has the Secure state,
  at least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. The
  first GCC version that supports Armv8.4 is GCC 8.1. In addition to
  these, the PMSA of Armv8-R64 introduces lots of MPU related system
  registers: `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`,
  `PRENR_ELx` and `MPUIR_ELx`. But the first GCC version to support
  these system registers is GCC 11. So we have two ways to make
  compilers work properly with these system registers.
  1. Bump the GCC version to GCC 11.
     The pro of this method is that we don't need to encode these
     system registers in macros by ourselves. But the con is that
     we have to update the Makefiles to support GCC 11 for Armv8-R64.
     1.1. Check for GCC version 11 for Armv8-R64.
     1.2. Add march=armv8r to CFLAGS for Armv8-R64.
     1.3. Resolve the conflict between march=armv8r and mcpu=generic.
    These changes will affect common Makefiles, not only the Arm Makefiles.
    And GCC 11 is new; lots of toolchains and distros don't support it yet.

  2. Encode new system registers in macros ***(preferred)***
        ```
        /* Virtualization Secure Translation Control Register */
        #define VSTCR_EL2  S3_4_C2_C6_2
        /* Virtualization System Control Register */
        #define VSCTLR_EL2 S3_4_C2_C0_0
        /* EL1 MPU Protection Region Base Address Register encode */
        #define PRBAR_EL1  S3_0_C6_C8_0
        ...
        /* EL2 MPU Protection Region Base Address Register encode */
        #define PRBAR_EL2  S3_4_C6_C8_0
        ...
        ```
     If we encode all of the above system registers, we don't need to bump
     the GCC version, and the common CFLAGS Xen is using can still be
     applied to Armv8-R64. We don't need to modify Makefiles to add specific
     CFLAGS.
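As an illustration of what these macro names stand for, the `S<op0>_<op1>_C<CRn>_C<CRm>_<op2>` form simply spells out the five fields of an `mrs`/`msr` system register access. The sketch below is our own illustrative packing of those fields into one value for comparison purposes; the shift amounts are an assumption of this sketch, not the real A64 instruction encoding and not Xen code:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: pack the five sysreg fields into a single value so
 * that two S<op0>_<op1>_C<CRn>_C<CRm>_<op2> names can be compared. The
 * shift amounts are our own convention for this sketch.
 */
uint16_t sysreg_encode(unsigned op0, unsigned op1, unsigned crn,
                       unsigned crm, unsigned op2)
{
    return (uint16_t)((op0 << 14) | (op1 << 11) | (crn << 7) |
                      (crm << 3) | op2);
}

void print_encodings(void)
{
    /* VSTCR_EL2 is S3_4_C2_C6_2, VSCTLR_EL2 is S3_4_C2_C0_0. */
    printf("VSTCR_EL2:  0x%04x\n", sysreg_encode(3, 4, 2, 6, 2));
    printf("VSCTLR_EL2: 0x%04x\n", sysreg_encode(3, 4, 2, 0, 0));
}
```

With this packing, two registers collide only if all five fields match, which is why encoding the names as macros is sufficient for the assembler without any compiler-version bump.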

### **2.2. Changes of the initialization process**
In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
initialization process. Apart from some architectural differences, which we
will distinguish through CONFIG_ARM_MPU or CONFIG_ARM64_V8R, we want most
of the initialization code to be reusable between Armv8-R64 and Armv8-A64.

- We will reuse Arm's original head.S and setup.c, but replace the
  MMU and page table operations in these files with configuration operations
  for the MPU and MPU regions.

- We provide a boot-time MPU configuration. This MPU configuration will
  allow Xen to finish its initialization. And this boot-time MPU
  configuration will record the memory regions that are parsed from the
  device tree.

  At the end of Xen initialization, we will replace the boot-time MPU
  configuration with a runtime MPU configuration. The runtime MPU
  configuration will merge and reorder memory regions to save more MPU
  regions for guests.
  ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)

- Defer unpausing domains.
  When Xen initialization is about to end, Xen unpauses the guests created
  during initialization. But this will cause some issues. The unpause
  action occurs before free_init_memory, while the runtime MPU configuration
  is built after free_init_memory.

  So if an unpaused guest starts executing a context switch at this
  point, its MPU context will be based on the boot-time MPU configuration,
  which will probably be inconsistent with the runtime MPU configuration
  and cause unexpected problems. (This may not happen on a single-core
  system, but on SMP systems this problem is foreseeable, so we hope to
  solve it from the beginning.)
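The merge-and-reorder step mentioned above can be sketched as a plain range merge. This is a minimal host-runnable illustration, not Xen code; the type and function names are our own assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/* A physical memory range; 'end' is exclusive. Our own type for this sketch. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Merge overlapping or adjacent ranges in place, so that the runtime MPU
 * configuration consumes as few protection regions as possible.
 * 'ranges' must be sorted by start address. Returns the new count.
 */
size_t merge_ranges(struct mem_range *ranges, size_t n)
{
    size_t out = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( out > 0 && ranges[i].start <= ranges[out - 1].end )
        {
            /* Overlaps or touches the previous range: extend it. */
            if ( ranges[i].end > ranges[out - 1].end )
                ranges[out - 1].end = ranges[i].end;
        }
        else
            ranges[out++] = ranges[i];
    }

    return out;
}
```

For example, two adjacent RAM banks `[0x40000000, 0x40100000)` and `[0x40100000, 0x40200000)` collapse into a single region, leaving one more MPU protection region free for guests.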

### **2.3. Changes to reduce memory fragmentation**

In general, memory in a Xen system can be classified into 4 classes:
`image sections`, `heap sections`, `guest RAM`, and `boot modules (guest
kernel, initrd and dtb)`.

Currently, Xen doesn't place any restriction on how users allocate
memory for the different classes. That means users can place boot modules
anywhere, reserve Xen heap memory anywhere and allocate guest
memory anywhere.

In a VMSA system, this would not be too much of a problem, since the
MMU can manage memory at a granularity of 4KB after all. But in a
PMSA system, this will be a big problem. On Armv8-R64, the maximum
number of MPU protection regions is limited to 256, and typical
processor implementations design in no more than 32 MPU protection
regions. Add in the fact that Xen shares MPU protection regions with
the guests' EL1 stage 2, and it becomes even more important to properly
plan the use of MPU protection regions.

- An ideal memory usage layout restriction:
![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
1. Reserve proper MPU regions for Xen image (code, rodata and data + bss).
2. Reserve one MPU region for boot modules.
   That means the placement of all boot modules, including guest kernel,
   initrd and dtb, will be limited to the area protected by this MPU region.
3. Reserve one or more MPU regions for the Xen heap.
   On Armv8-R64, guest memory is predefined in the device tree; it will
   not be allocated from the heap. Unlike on Armv8-A64, we will not move all
   free memory to the heap. We want the Xen heap to be deterministic too, so
   Xen on Armv8-R64 will also rely on the Xen static heap feature. The
   memory for the Xen heap will be defined in the device tree too.
   Considering that physical memory can also be discontinuous, one or more
   MPU protection regions need to be reserved for the Xen heap.
4. If we name the MPU protection regions used above PART_A, and the
   remaining MPU protection regions PART_B:
   4.1. In hypervisor context, Xen will map the remaining RAM and devices
        to PART_B. This will give Xen the ability to access the whole
        memory.
   4.2. In guest context, Xen will create the EL1 stage 2 mapping in PART_B.
        In this case, Xen just needs to update PART_B on context switch,
        while keeping PART_A fixed.

***Notes: Static allocation will be mandatory on MPU based systems***

**A sample device tree of memory layout restriction**:
```
chosen {
    ...
    /*
     * Define a section to place boot modules,
     * all boot modules must be placed in this section.
     */
    mpu,boot-module-section = <0x10000000 0x10000000>;
    /*
     * Define a section to cover all guest RAM. All guest RAM must be located
     * within this section. The advantage is that, in the best case, we need
     * only one MPU protection region to map all guest RAM for Xen.
     */
    mpu,guest-memory-section = <0x20000000 0x30000000>;
    /*
     * Define a memory section that can cover all device memory that
     * will be used in Xen.
     */
    mpu,device-memory-section = <0x80000000 0x7ffff000>;
    /* Define a section for Xen heap */
    xen,static-mem = <0x50000000 0x20000000>;

    domU1 {
        ...
        #xen,static-mem-address-cells = <0x01>;
        #xen,static-mem-size-cells = <0x01>;
        /* Statically allocated guest memory, within mpu,guest-memory-section */
        xen,static-mem = <0x30000000 0x1f000000>;

        module@11000000 {
            compatible = "multiboot,kernel\0multiboot,module";
            /* Boot module address, within mpu,boot-module-section */
            reg = <0x11000000 0x3000000>;
            ...
        };

        module@10FF0000 {
                compatible = "multiboot,device-tree\0multiboot,module";
                /* Boot module address, within mpu,boot-module-section */
                reg = <0x10ff0000 0x10000>;
                ...
        };
    };
};
```
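A natural sanity check Xen could apply to such a device tree is simple range containment: every `xen,static-mem` bank and every boot module must fall entirely inside the section declared for its class. A minimal sketch, using the sample values above; the helper name is our own, not Xen's:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative helper, not Xen code: check that a <base size> range lies
 * entirely within a <base size> section.
 */
bool range_within(uint64_t base, uint64_t size,
                  uint64_t sec_base, uint64_t sec_size)
{
    return base >= sec_base && (base + size) <= (sec_base + sec_size);
}
```

With the sample tree above, domU1's `xen,static-mem = <0x30000000 0x1f000000>` ends at 0x4f000000, inside `mpu,guest-memory-section = <0x20000000 0x30000000>`, which ends at 0x50000000; likewise the kernel module at 0x11000000 fits in `mpu,boot-module-section`.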

### **2.4. Changes of memory management**
Xen is coupled with VMSA; in order to port Xen to Armv8-R64, we have to
decouple Xen from VMSA and give Xen the ability to manage memory with PMSA.

1. ***Use the buddy allocator to manage physical pages for PMSA***
   From the point of view of physical pages, PMSA and VMSA don't have any
   difference, so we can reuse the buddy allocator on Armv8-R64 to manage
   physical pages. The difference is that, in VMSA, Xen will map allocated
   pages to virtual addresses, while in PMSA, Xen just converts the pages
   to physical addresses.

2. ***Cannot use virtual addresses for memory management***
   As Armv8-R64 only has PMSA in EL2, Xen loses the ability to use virtual
   addresses to manage memory. This brings some problems: some virtual
   address based features could not work well on Armv8-R64, like `FIXMAP`,
   `vmap/vunmap`, `ioremap` and `alternative`.

   But the functions and macros of these features are used in lots of common
   code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to gate the related
   code everywhere. In this case, we propose to use stub helpers to make the
   changes transparent to common code.
   1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
      operations. This will make it return the physical address of the
      fixmapped item directly.
   2. For `vmap/vunmap`, we will use some empty inline stub helpers:
        ```
        static inline void vm_init_type(...) {}
        static inline void *__vmap(...)
        {
            return NULL;
        }
        static inline void vunmap(const void *va) {}
        static inline void *vmalloc(size_t size)
        {
            return NULL;
        }
        static inline void *vmalloc_xen(size_t size)
        {
            return NULL;
        }
        static inline void vfree(void *va) {}
        ```

   3. For `ioremap`, it depends on `vmap`. As we have made `vmap` always
      return `NULL`, it could not work well on Armv8-R64 without changes.
      `ioremap` will return the input address directly.
        ```
        static inline void *ioremap_attr(...)
        {
            /* We don't have the ability to change input PA cache attributes */
            if ( CACHE_ATTR_need_change )
                return NULL;
            return (void *)pa;
        }
        static inline void __iomem *ioremap_nocache(...)
        {
            return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
        }
        static inline void __iomem *ioremap_cache(...)
        {
            return ioremap_attr(start, len, PAGE_HYPERVISOR);
        }
        static inline void __iomem *ioremap_wc(...)
        {
            return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
        }
        void *ioremap(...)
        {
            return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
        }

        ```
    4. For `alternative`, it depends on `vmap` too. We will simply disable
       it on Armv8-R64 at the current stage. How to implement `alternative`
       on Armv8-R64 is better discussed after the basic functions of Xen
       on Armv8-R64 work well.
       But simply disabling `alternative` will make `cpus_have_const_cap`
       always return false.
        ```
        /* System capability check for constant cap */
        #define cpus_have_const_cap(num) ({                \
               register_t __ret;                           \
                                                           \
               asm volatile (ALTERNATIVE("mov %0, #0",     \
                                         "mov %0, #1",     \
                                         num)              \
                             : "=r" (__ret));              \
                                                           \
                unlikely(__ret);                           \
                })
        ```
        So, before we have a PMSA `alternative` implementation, we have to
        implement a separate `cpus_have_const_cap` for Armv8-R64:
        ```
        #define cpus_have_const_cap(num) cpus_have_cap(num)
        ```

### **2.5. Changes of guest management**
Armv8-R64 only supports PMSA in EL2, but it supports configurable
VMSA or PMSA in EL1. This means Xen will have a new guest type on
Armv8-R64 - the MPU based guest.

1. **Add a new domain type - MPU_DOMAIN**
   When a user wants to create a guest that will use the MPU in EL1, they
   should add an `mpu` property to the device tree `domU` node, as in the
   following example:
    ```
    domU2 {
        compatible = "xen,domain";
        direct-map;
        mpu; --> Indicates this domain will use PMSA in EL1.
        ...
    };
    ```
    Corresponding to the `mpu` property in the device tree, we also need to
    introduce a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for a domain to mark
    itself as an MPU domain. This flag will be used in domain creation and
    in vCPU context switch.
    1. Domain creation needs this flag to decide whether to enable PMSA or
       VMSA in EL1.
    2. vCPU context switch needs this flag to decide whether to save/restore
       the MMU or MPU related registers.

2. **Add MPU registers to the vCPU to save the EL1 MPU context**
   Current Xen only supports MMU based guests, so it has not considered
   saving/restoring MPU context. In this case, we need to add MPU registers
   to `arch_vcpu`:
    ```
    struct arch_vcpu
    {
    #ifdef CONFIG_ARM_MPU
        /* Virtualization Translation Control Register */
        register_t vtcr_el2;

        /* EL1 MPU regions' registers */
        pr_t mpu_regions[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
    #endif
    };
    ```
    Armv8-R64 can support up to 256 MPU regions, but that's just
    theoretical. So we don't want to define `pr_t mpu_regions[256]`; that
    would waste memory most of the time. Instead, we decided to let the user
    specify the number through a Kconfig option.
    `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` can default to `32`, a typical
    implementation on Armv8-R64. Users will recompile Xen when their
    platform changes, so when the MPU changes, respecifying the number of
    MPU protection regions will not cause additional problems.

3. **MPU based P2M table management**
   Armv8-R64 EL2 doesn't have EL1 stage 2 address translation, but through
   PMSA it still has the ability to control the permissions and attributes
   of EL1 stage 2 accesses. In this case, we still hope to keep the
   interface as consistent with the MMU based P2M as possible.

   p2m->root will point to allocated memory. On Armv8-A64, this memory
   is used to store the EL1 stage 2 translation table. But on Armv8-R64,
   this memory will be used to store the EL2 MPU protection regions that
   are used by the guest. During domain creation, Xen will prepare the data
   in this memory so that the guest can access the proper RAM and devices.
   When the guest's vCPU is scheduled in, this data will be written to the
   MPU protection region registers.
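The schedule-in path described above can be modeled as a plain copy of prepared region descriptors into the region registers. This is a host-runnable sketch of the data flow only: the `pr_t` layout, array size and function names are our assumptions, not Xen's, and on real hardware the copy would be a PRSELR_EL2/PRBAR_EL2/PRLAR_EL2 write loop:

```c
#include <stdint.h>
#include <string.h>

/* Our model of one MPU region descriptor pair (an assumption, not Xen's pr_t). */
typedef struct {
    uint64_t prbar;   /* base address, shareability, access permissions */
    uint64_t prlar;   /* limit address, attribute index, enable bit */
} pr_t;

#define EL1_GUEST_REGIONS 32

/* Stand-in for the EL2 MPU region registers assigned to guest stage 2. */
pr_t mpu_region_regs[EL1_GUEST_REGIONS];

/*
 * On schedule-in, load the guest's prepared region descriptors from the
 * memory pointed to by p2m->root into the "hardware" regions. On real
 * hardware this would select each region via PRSELR_EL2 and then program
 * PRBAR_EL2/PRLAR_EL2.
 */
void load_guest_mpu_regions(const pr_t *p2m_root, unsigned int nr)
{
    memcpy(mpu_region_regs, p2m_root, nr * sizeof(pr_t));
}
```

Keeping the guest's regions pre-cooked in p2m->root means the context switch cost is a bounded copy of at most `EL1_GUEST_REGIONS` descriptors, which fits the determinism goal of the R profile.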

### **2.6. Changes of exception trap**
As Armv8-R64 has an exception model compatible with Armv8-A64, we can reuse
most of Armv8-A64's exception trap & handler code, except the trap based on
EL1 stage 2 translation aborts.

In Armv8-A64, we use `FSC_FLT_TRANS`
```
    case FSC_FLT_TRANS:
        ...
        if ( is_data )
        {
            enum io_state state = try_handle_mmio(regs, hsr, gpa);
            ...
        }
```
But for Armv8-R64, we have to use `FSC_FLT_PERM`
```
    case FSC_FLT_PERM:
        ...
        if ( is_data )
        {
            enum io_state state = try_handle_mmio(regs, hsr, gpa);
            ...
        }
```

### **2.7. Changes of device driver**
1. Because Armv8-R64 only has a single Security state, this will affect some
devices that have two Security states, like the GIC. But fortunately, most
vendors will not connect a two-Security-state GIC to Armv8-R64 processors.
The current GIC driver can work well with a single-Security-state GIC on
Armv8-R64.
2. Xen should use the secure hypervisor timer in Secure EL2. We will
introduce CONFIG_ARM_SECURE_STATE to make Xen use the secure registers for
the timer.

### **2.8. Changes of virtual device**
Currently, we only support pass-through devices in guests, because event
channels, xen-bus, xen-storage and other advanced Xen features haven't been
enabled on Armv8-R64.

--
Cheers,
Wei Chen




* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-24  6:01 Proposal for Porting Xen to Armv8-R64 - DraftA Wei Chen
@ 2022-02-24 11:52 ` Ayan Kumar Halder
  2022-02-25  6:33   ` Wei Chen
  2022-02-25  0:55 ` Stefano Stabellini
  2022-02-25 20:55 ` Julien Grall
  2 siblings, 1 reply; 34+ messages in thread
From: Ayan Kumar Halder @ 2022-02-24 11:52 UTC (permalink / raw)
  To: Wei Chen, xen-devel, julien, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Wei,

This is a nice writeup. I have a few initial queries.

On 24/02/2022 06:01, Wei Chen wrote:
> # Proposal for Porting Xen to Armv8-R64
>
> This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> which includes:
> - The changes of current Xen capability, like Xen build system, memory
>    management, domain management, vCPU context switch.
> - The expanded Xen capability, like static-allocation and direct-map.
>
> ***Notes:***
> 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
>     ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
>     ***Trusted-Frimware (TF-R). This is an external dependency,***
>     ***so we think the discussion of Xen SMP support on Armv8-R64***
>     ***should be started when single-CPU support is complete.***
> 2. ***This proposal will not touch xen-tools. In current stage,***
>     ***Xen on Armv8-R64 only support dom0less, all guests should***
>     ***be booted from device tree.***
>
> ## 1. Essential Background
>
> ### 1.1. Armv8-R64 Profile
> The Armv-R architecture profile was designed to support use cases that
> have a high sensitivity to deterministic execution. (e.g. Fuel Injection,
> Brake control, Drive trains, Motor control etc)
>
> Arm announced Armv8-R in 2013, it is the latest generation Arm architecture
> targeted at the Real-time profile. It introduces virtualization at the highest
> security level while retaining the Protected Memory System Architecture (PMSA)
> based on a Memory Protection Unit (MPU). In 2020, Arm announced Cortex-R82,
> which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.

Is there some good document explaining the difference between an MPU and
an MMU? And when do we need one vs. the other?

>
> - The latest Armv8-R64 document can be found here:
>    [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
>
> - Armv-R Architecture progression:
>    Armv7-R -> Armv8-R AArch32 -> Armv8 AArch64
>    The following figure is a simple comparison of "R" processors based on
>    different Armv-R Architectures.
>    ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
>
> - The Armv8-R architecture evolved additional features on top of Armv7-R:
>      - An exception model that is compatible with the Armv8-A model
>      - Virtualization with support for guest operating systems
>          - PMSA virtualization using MPUs In EL2.
> - The new features of Armv8-R64 architecture
>      - Adds support for the 64-bit A64 instruction set, previously Armv8-R
>        only supported A32.
>      - Supports up to 48-bit physical addressing, previously up to 32-bit
>        addressing was supported.
>      - Optional Arm Neon technology and Advanced SIMD
>      - Supports three Exception Levels (ELs)
>          - Secure EL2 - The Highest Privilege, MPU only, for firmware, hypervisor
>          - Secure EL1 - RichOS (MMU) or RTOS (MPU)
>          - Secure EL0 - Application Workloads
>      - Optionally supports Virtual Memory System Architecture at S-EL1/S-EL0.
>        This means it's possible to run rich OS kernels - like Linux - either
>        bare-metal or as a guest.
> - Differences with the Armv8-A AArch64 architecture
>      - Supports only a single Security state - Secure. There is not Non-Secure
>        execution state supported.

If so, then I guess there is no TrustZone kind of protection available.
I mean where an application in the normal world can request data to be
processed in the secure world (by switching the NS bit on AXI).

Also, does Armv8-R support the TrustZone controller TZC-400, which helps to
partition memory into different protected enclaves based on NSAID?

(Apologies if my queries are irrelevant, I am asking this purely out of 
my own interest :) )

>      - EL3 is not supported, EL2 is mandatory. This means secure EL2 is the
>        highest EL.
>      - Supports the A64 ISA instruction
>          - With a small set of well-defined differences
>      - Provides a PMSA (Protected Memory System Architecture) based
>        virtualization model.
>          - As opposed to Armv8-A AArch64's VMSA based Virtualization
>          - Can support address bits up to 52 if FEAT_LPA is enabled,
>            otherwise 48 bits.
>          - Determines the access permissions and memory attributes of
>            the target PA.
>          - Can implement PMSAv8-64 at EL1 and EL2
>              - Address translation flat-maps the VA to the PA for EL2 Stage 1.
>              - Address translation flat-maps the VA to the PA for EL1 Stage 1.
>              - Address translation flat-maps the IPA to the PA for EL1 Stage 2.
>      - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.
>
> ### 1.2. Xen Challenges with PMSA Virtualization
> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to run
> with an MPU and host multiple guest OSes.
>
> - No MMU at EL2:
>      - No EL2 Stage 1 address translation
>          - Xen provides fixed ARM64 virtual memory layout as basis of EL2
>            stage 1 address translation, which is not applicable on MPU system,
>            where there is no virtual addressing. As a result, any operation
>            involving transition from PA to VA, like ioremap, needs modification
>            on MPU system.
>      - Xen's run-time addresses are the same as the link time addresses.
>          - Enable PIC (position-independent code) on a real-time target
>            processor probably very rare.
>      - Xen will need to use the EL2 MPU memory region descriptors to manage
>        access permissions and attributes for accesses made by VMs at EL1/0.
>          - Xen currently relies on MMU EL1 stage 2 table to manage these
>            accesses.
> - No MMU Stage 2 translation at EL1:
>      - A guest doesn't have an independent guest physical address space
>      - A guest can not reuse the current Intermediate Physical Address
>        memory layout
>      - A guest uses physical addresses to access memory and devices
>      - The MPU at EL2 manages EL1 stage 2 access permissions and attributes
> - There are a limited number of MPU protection regions at both EL2 and EL1:
>      - Architecturally, the maximum number of protection regions is 256,
>        typical implementations have 32.
>      - By contrast, when using the MMU, Xen in theory does not need to
>        consider the number of page-table entries.
> - The MPU protection regions at EL2 need to be shared between the hypervisor
>    and the guest stage 2.
>      - Requires careful consideration - may impact feature 'fullness' of both
>        the hypervisor and the guest
>      - By contrast, when using MMU, Xen has standalone P2M table for guest
>        stage 2 accesses.
So, can it support running both RTOS and Linux as guests ? My 
understanding is no as we can't enable MPU (for RTOS) and MMU (for 
Linux) at the same time. There needs to be two separate images of Xen. 
Please confirm.
>
> ## 2. Proposed changes of Xen
> ### **2.1. Changes of build system:**
>
> - ***Introduce new Kconfig options for Armv8-R64***:
>    Unlike Armv8-A, because of the lack of MMU support on Armv8-R64,
But Armv8-R64 supports VMSA (refer to
Arm DDI 0600A.d ID120821, B1.2.2,
Virtual Memory System Architecture, VMSAv8-64). So it should support the
MMU, shouldn't it?

- Ayan
> we cannot
>    expect one Xen binary to run on all machines. Xen images are not common
>    across Armv8-R64 platforms; Xen must be re-built for each Armv8-R64
>    platform, because these platforms may have different memory layouts and
>    link addresses.
>      - `ARM64_V8R`:
>        This option enables Armv8-R profile for Arm64. Enabling this option
>        results in selecting MPU. This Kconfig option is used to gate some
>        Armv8-R64 specific code except MPU code, like some code for Armv8-R64
>        only system ID registers access.
>
>      - `ARM_MPU`
>        This option enables the MPU on the Armv8-R architecture. Enabling this
>        option results in disabling the MMU. This Kconfig option is used to
>        gate ARM_MPU specific code. Once this Kconfig option has been enabled,
>        the MMU-related code will not be built for Armv8-R64. The reason we do
>        not rely on runtime detection to select MMU or MPU is that we don't
>        think one image can be used for both Armv8-R64 and Armv8-A64. Another
>        reason to separate ARM_MPU from ARM64_V8R is to allow MPU support
>        on 32-bit Arm one day.
>
>      - `XEN_START_ADDRESS`
>        This option allows setting the custom address at which Xen will be
>        linked. This address must be aligned to a page size. Xen's run-time
>        addresses are the same as the link-time addresses. Different platforms
>        may have different memory layouts. This Kconfig option provides users
>        the ability to select proper link addresses for their boards.
>        ***Notes: A fixed link address means the Xen binary cannot be***
>        ***relocated by the EFI loader. So at the current stage, Xen cannot***
>        ***be launched as an EFI application on Armv8-R64.***
>
>      - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
>        `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
>        These Kconfig options allow setting memory regions for Xen code, data
>        and device memory. Before parsing memory information from the device
>        tree, Xen will use the values stored in these options to set up the
>        boot-time MPU configuration. Why do we need a boot-time MPU
>        configuration?
>        1. More deterministic: The Arm MPU supports background regions. If we
>           don't configure any MPU regions, we can still enable the MPU and
>           rely on the background regions. But that means all RAM is RWX;
>           random values in RAM or maliciously embedded data can be exploited.
>           Using these Kconfig options allows users to have a deterministic
>           RAM area in which to execute code.
>        2. More compatible: On some Armv8-R64 platforms, if the MPU is
>           disabled, the `dc zva` instruction will make the system halt.
>           And this instruction is embedded in some built-in functions,
>           like `memset`. If we use `-ddont_use_dc` to rebuild GCC,
>           the built-in functions will not contain `dc zva`. However, it is
>           obviously unlikely that we can recompile GCC for every
>           ARMv8-R64 user.
>        3. One optional idea:
>            We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` or
>            `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end` for
>            MPU normal memory. It's enough to support Xen run in boot time.
>
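Taken together, the options above could be sketched as a Kconfig fragment along the following lines (the symbol names come from the proposal, but the prompts and the default value are illustrative, not final):

```
# Hypothetical Kconfig sketch for the options described above
config ARM64_V8R
	bool "Armv8-R AArch64 support"
	select ARM_MPU

config ARM_MPU
	bool

config XEN_START_ADDRESS
	hex "Xen link (and run-time) start address"
	default 0x10000000

config ARM_MPU_NORMAL_MEMORY_START
	hex "Start of boot-time MPU normal-memory region"

config ARM_MPU_NORMAL_MEMORY_END
	hex "End of boot-time MPU normal-memory region"
```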
> - ***Define new system registers for compilers***:
>    Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
>    specific system registers. As Armv8-R64 only has the Secure state,
>    at least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. The
>    first GCC version that supports Armv8.4 is GCC 8.1. In addition to
>    these, the PMSA of Armv8-R64 introduces lots of MPU-related system
>    registers: `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`,
>    `PRENR_ELx` and `MPUIR_ELx`. But the first GCC version to support these
>    system registers is GCC 11. So we have two ways to make compilers work
>    properly with these system registers.
>    1. Bump GCC version to GCC 11.
>       The pro of this method is that we don't need to encode these
>       system registers in macros ourselves. The cons are that
>       we have to update Makefiles to support GCC 11 for Armv8-R64:
>       1.1. Check for GCC version 11 for Armv8-R64.
>       1.2. Add march=armv8r to CFLAGS for Armv8-R64.
>       1.3. Resolve the conflict between march=armv8r and mcpu=generic.
>      These changes would affect common Makefiles, not only Arm Makefiles.
>      And GCC 11 is new; many toolchains and distros don't support it yet.
>
>    2. Encode new system registers in macros ***(preferred)***
>          ```
>          /* Virtualization Secure Translation Control Register */
>          #define VSTCR_EL2  S3_4_C2_C6_2
>          /* Virtualization System Control Register */
>          #define VSCTLR_EL2 S3_4_C2_C0_0
>          /* EL1 MPU Protection Region Base Address Register encode */
>          #define PRBAR_EL1  S3_0_C6_C8_0
>          ...
>          /* EL2 MPU Protection Region Base Address Register encode */
>          #define PRBAR_EL2  S3_4_C6_C8_0
>          ...
>          ```
>       If we encode all of the above system registers, we don't need to bump
>       the GCC version, and the common CFLAGS Xen uses can still be applied to
>       Armv8-R64. We don't need to modify Makefiles to add specific CFLAGS.
>
> ### **2.2. Changes of the initialization process**
> In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
> initialization process. Apart from some architecture differences, which we
> will distinguish through CONFIG_ARM_MPU or CONFIG_ARM64_V8R, we want most
> of the initialization code to be reusable between Armv8-R64 and Armv8-A64.
>
> - We will reuse Arm's original head.S and setup.c, but replace the
>    MMU and page-table operations in these files with configuration operations
>    for the MPU and MPU regions.
>
> - We provide a boot-time MPU configuration. This MPU configuration will
>    allow Xen to finish its initialization. And this boot-time MPU
>    configuration will record the memory regions that are parsed from the
>    device tree.
>
>    At the end of Xen initialization, we will use a runtime MPU configuration
>    to replace the boot-time MPU configuration. The runtime MPU configuration
>    will merge and reorder memory regions to save more MPU regions for guests.
>    ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
>
> - Defer unpausing domains.
>    When Xen initialization is about to end, Xen unpauses the guests created
>    during initialization. But this would cause some issues: the unpause
>    action occurs before free_init_memory, while the runtime MPU configuration
>    is built after free_init_memory.
>
>    So if an unpaused guest starts executing a context switch at this
>    point, its MPU context will be based on the boot-time MPU configuration,
>    which will probably be inconsistent with the runtime MPU configuration
>    and cause unexpected problems. (This may not happen on a single-core
>    system, but on SMP systems this problem is foreseeable, so we hope to
>    solve it from the beginning.)
>
> ### **2.3. Changes to reduce memory fragmentation**
>
> In general, memory in a Xen system can be classified into 4 classes:
> `image sections`, `heap sections`, `guest RAM`, and `boot modules (guest
> kernel, initrd and dtb)`.
>
> Currently, Xen doesn't place any restrictions on how users allocate
> memory for the different classes. That means users can place boot modules
> anywhere, can reserve Xen heap memory anywhere and can allocate guest
> memory anywhere.
>
> In a VMSA system, this is not too much of a problem, since the
> MMU can manage memory at a granularity of 4KB after all. But in a
> PMSA system, this is a big problem. On Armv8-R64, the maximum number
> of MPU protection regions is limited to 256, and in typical
> processor implementations, few designs provide more than 32
> MPU protection regions. Add in the fact that Xen shares MPU protection
> regions with the guest's EL1 stage 2, and it becomes even more important
> to properly plan the use of MPU protection regions.
>
> - An ideal memory usage layout restriction:
> ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> 1. Reserve proper MPU regions for Xen image (code, rodata and data + bss).
> 2. Reserve one MPU region for boot modules.
>     That means the placement of all boot modules, including guest kernel,
>     initrd and dtb, will be limited to this MPU-region-protected area.
> 3. Reserve one or more MPU regions for the Xen heap.
>     On Armv8-R64, guest memory is predefined in the device tree; it will
>     not be allocated from the heap. Unlike on Armv8-A64, we will not move
>     all free memory to the heap. We want the Xen heap to be deterministic
>     too, so Xen on Armv8-R64 also relies on the Xen static heap feature.
>     The memory for the Xen heap will be defined in the device tree too.
>     Considering that physical memory can also be discontiguous, one or more
>     MPU protection regions need to be reserved for the Xen heap.
> 4. If we name the above used MPU protection regions PART_A, and the
>     remaining MPU protection regions PART_B:
>     4.1. In hypervisor context, Xen will map the remaining RAM and devices
>          to PART_B. This gives Xen the ability to access the whole memory.
>     4.2. In guest context, Xen will create the EL1 stage 2 mapping in PART_B.
>          In this case, Xen just needs to update PART_B on context switch,
>          while keeping PART_A fixed.
>
> ***Notes: Static allocation will be mandatory on MPU based systems***
>
> **A sample device tree of memory layout restriction**:
> ```
> chosen {
>      ...
>      /*
>       * Define a section to place boot modules,
>       * all boot modules must be placed in this section.
>       */
>      mpu,boot-module-section = <0x10000000 0x10000000>;
>      /*
>       * Define a section to cover all guest RAM. All guest RAM must be located
>       * within this section. The pros is that, in best case, we can only have
>       * one MPU protection region to map all guest RAM for Xen.
>       */
>      mpu,guest-memory-section = <0x20000000 0x30000000>;
>      /*
>       * Define a memory section that can cover all device memory that
>       * will be used in Xen.
>       */
>      mpu,device-memory-section = <0x80000000 0x7ffff000>;
>      /* Define a section for Xen heap */
>      xen,static-mem = <0x50000000 0x20000000>;
>
>      domU1 {
>          ...
>          #xen,static-mem-address-cells = <0x01>;
>          #xen,static-mem-size-cells = <0x01>;
>          /* Statically allocated guest memory, within mpu,guest-memory-section */
>          xen,static-mem = <0x30000000 0x1f000000>;
>
>          module@11000000 {
>              compatible = "multiboot,kernel\0multiboot,module";
>              /* Boot module address, within mpu,boot-module-section */
>              reg = <0x11000000 0x3000000>;
>              ...
>          };
>
>          module@10FF0000 {
>                  compatible = "multiboot,device-tree\0multiboot,module";
>                  /* Boot module address, within mpu,boot-module-section */
>                  reg = <0x10ff0000 0x10000>;
>                  ...
>          };
>      };
> };
> ```
>
> ### **2.4. Changes of memory management**
> Xen is coupled with the VMSA; in order to port Xen to Armv8-R64, we have to
> decouple Xen from the VMSA and give Xen the ability to manage memory in PMSA.
>
> 1. ***Use buddy allocator to manage physical pages for PMSA***
>     From the point of view of physical pages, PMSA and VMSA don't have any
>     difference, so we can reuse the buddy allocator on Armv8-R64 to manage
>     physical pages. The difference is that, with VMSA, Xen maps allocated
>     pages to virtual addresses, while with PMSA, Xen just converts the pages
>     to physical addresses.
>
> 2. ***Can not use virtual address for memory management***
>     As Armv8-R64 only has PMSA in EL2, Xen loses the ability to use virtual
>     addresses to manage memory. This brings some problems: features based on
>     virtual addresses cannot work well on Armv8-R64, like `FIXMAP`,
>     `vmap/vunmap`, `ioremap` and `alternative`.
>
>     But the functions or macros of these features are used in lots of common
>     code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to gate the related
>     code everywhere. Instead, we propose to use stub helpers to make the
>     changes transparent to common code.
>     1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
>        operations. This will directly return the physical address of the
>        fixmapped item.
>     2. For `vmap/vunmap`, we will use some empty inline stub helpers:
>          ```
>          static inline void vm_init_type(...) {}
>          static inline void *__vmap(...)
>          {
>              return NULL;
>          }
>          static inline void vunmap(const void *va) {}
>          static inline void *vmalloc(size_t size)
>          {
>              return NULL;
>          }
>          static inline void *vmalloc_xen(size_t size)
>          {
>              return NULL;
>          }
>          static inline void vfree(void *va) {}
>          ```
>
>     3. `ioremap` depends on `vmap`. As we have made `vmap` always return
>        `NULL`, `ioremap` cannot work well on Armv8-R64 without changes;
>        it will return the input address directly.
>          ```
>          static inline void *ioremap_attr(...)
>          {
>              /* We don't have the ability to change input PA cache attributes */
>              if ( CACHE_ATTR_need_change )
>                  return NULL;
>              return (void *)pa;
>          }
>          static inline void __iomem *ioremap_nocache(...)
>          {
>              return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
>          }
>          static inline void __iomem *ioremap_cache(...)
>          {
>              return ioremap_attr(start, len, PAGE_HYPERVISOR);
>          }
>          static inline void __iomem *ioremap_wc(...)
>          {
>              return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
>          }
>          void *ioremap(...)
>          {
>              return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>          }
>
>          ```
>      4. `alternative` depends on `vmap` too. We will simply disable
>         it on Armv8-R64 at the current stage. How to implement `alternative`
>         on Armv8-R64 is better discussed after the basic functions of Xen
>         on Armv8-R64 work well. But simply disabling `alternative` will make
>         `cpus_have_const_cap` always return false:
>          ```
>          /* System capability check for constant cap */
>          #define cpus_have_const_cap(num) ({                \
>                 register_t __ret;                           \
>                                                             \
>                 asm volatile (ALTERNATIVE("mov %0, #0",     \
>                                           "mov %0, #1",     \
>                                           num)              \
>                               : "=r" (__ret));              \
>                                                             \
>                  unlikely(__ret);                           \
>                  })
>          ```
>          So, before we have a PMSA `alternative` implementation, we have to
>          implement a separate `cpus_have_const_cap` for Armv8-R64:
>          ```
>          #define cpus_have_const_cap(num) cpus_have_cap(num)
>          ```
>
> ### **2.5. Changes of guest management**
> Armv8-R64 only supports PMSA in EL2, but it supports configurable
> VMSA or PMSA in EL1. This means Xen will have a new type of guest on
> Armv8-R64 - the MPU-based guest.
>
> 1. **Add a new domain type - MPU_DOMAIN**
>     When a user wants to create a guest that will use the MPU in EL1, the
>     user should add an `mpu` property in the device tree `domU` node, like
>     the following example:
>      ```
>      domU2 {
>          compatible = "xen,domain";
>          direct-map;
>          mpu; --> Indicates this domain will use PMSA in EL1.
>          ...
>      };
>      ```
>      Corresponding to the `mpu` property in the device tree, we also need to
>      introduce a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for a domain to mark
>      itself as an MPU domain. This flag will be used in domain creation and
>      in the vCPU context switch:
>      1. Domain creation needs this flag to decide whether to enable PMSA or
>         VMSA in EL1.
>      2. The vCPU context switch needs this flag to decide whether to
>         save/restore MMU or MPU related registers.
>
> 2. **Add MPU registers to vCPU to save the EL1 MPU context**
>     Current Xen only supports MMU-based guests, so it has not needed to
>     save/restore MPU context. In this case, we need to add MPU registers
>     to `arch_vcpu`:
>      ```
>      struct arch_vcpu
>      {
>      #ifdef CONFIG_ARM_MPU
>          /* Virtualization Translation Control Register */
>          register_t vtcr_el2;
>
>          /* EL1 MPU regions' registers */
>          pr_t mpu_regions[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
>      #endif
>      };
>      ```
>      Armv8-R64 can support up to 256 MPU regions, but that's just
>      theoretical, so we don't want to define `pr_t mpu_regions[256]`; that
>      would waste memory most of the time. Instead, we decided to let the user
>      specify the number through a Kconfig option.
>      `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` can default to `32`, a typical
>      implementation choice on Armv8-R64. Users will recompile Xen when their
>      platform changes, so when the MPU changes, re-specifying the number of
>      MPU protection regions will not cause additional problems.
>
> 3. **MPU based P2M table management**
>     Armv8-R64 EL2 doesn't have EL1 stage 2 address translation, but through
>     PMSA it still has the ability to control the permissions and attributes
>     of EL1 stage 2. In this case, we still hope to keep the interface
>     consistent with the MMU-based P2M as far as possible.
>
>     p2m->root will point to allocated memory. On Armv8-A64, this memory
>     is used to save the EL1 stage 2 translation table. But on Armv8-R64,
>     this memory will be used to store the EL2 MPU protection regions that
>     are used by the guest. During domain creation, Xen will prepare the data
>     in this memory so that the guest can access the proper RAM and devices.
>     When the guest's vCPU is scheduled in, this data will be written to the
>     MPU protection region registers.
>
> ### **2.6. Changes of exception trap**
> As Armv8-R64 has an exception model compatible with Armv8-A64, we can reuse
> most of Armv8-A64's exception trap & handler code, except for the trap based
> on EL1 stage 2 translation aborts.
>
> In Armv8-A64, we use `FSC_FLT_TRANS`:
> ```
>      case FSC_FLT_TRANS:
>          ...
>          if ( is_data )
>          {
>              enum io_state state = try_handle_mmio(regs, hsr, gpa);
>              ...
>          }
> ```
> But for Armv8-R64, since the MPU reports EL1 stage 2 faults as permission
> faults, we have to use `FSC_FLT_PERM`:
> ```
>      case FSC_FLT_PERM:
>          ...
>          if ( is_data )
>          {
>              enum io_state state = try_handle_mmio(regs, hsr, gpa);
>              ...
>          }
> ```
>
> ### **2.7. Changes of device drivers**
> 1. Because Armv8-R64 only has a single security state, this will affect some
> devices that have two security states, like the GIC. But fortunately, most
> vendors will not connect a two-security-state GIC to Armv8-R64 processors.
> The current GIC driver can work well with a single-security-state GIC on
> Armv8-R64.
> 2. Xen should use the secure hypervisor timer in Secure EL2. We will
> introduce a CONFIG_ARM_SECURE_STATE option to make Xen use the secure
> registers for the timer.
>
> ### **2.8. Changes of virtual devices**
> Currently, we only support pass-through devices for guests, because event
> channels, xen-bus, xen-storage and other advanced Xen features haven't been
> enabled on Armv8-R64.
>
> --
> Cheers,
> Wei Chen
>
>



* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-24  6:01 Proposal for Porting Xen to Armv8-R64 - DraftA Wei Chen
  2022-02-24 11:52 ` Ayan Kumar Halder
@ 2022-02-25  0:55 ` Stefano Stabellini
  2022-02-25 10:48   ` Wei Chen
  2022-02-25 20:55 ` Julien Grall
  2 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2022-02-25  0:55 UTC (permalink / raw)
  To: Wei Chen
  Cc: xen-devel, julien, Stefano Stabellini, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd

Hi Wei,

This is extremely exciting, thanks for the very nice summary!


On Thu, 24 Feb 2022, Wei Chen wrote:
> # Proposal for Porting Xen to Armv8-R64
> 
> This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> which includes:
> - The changes of current Xen capability, like Xen build system, memory
>   management, domain management, vCPU context switch.
> - The expanded Xen capability, like static-allocation and direct-map.
> 
> ***Notes:***
> 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
>    ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
>    ***Trusted-Frimware (TF-R). This is an external dependency,***
>    ***so we think the discussion of Xen SMP support on Armv8-R64***
>    ***should be started when single-CPU support is complete.***
> 2. ***This proposal will not touch xen-tools. In current stage,***
>    ***Xen on Armv8-R64 only support dom0less, all guests should***
>    ***be booted from device tree.***
> 
> ## 1. Essential Background
> 
> ### 1.1. Armv8-R64 Profile
> The Armv-R architecture profile was designed to support use cases that
> have a high sensitivity to deterministic execution. (e.g. Fuel Injection,
> Brake control, Drive trains, Motor control etc)
> 
> Arm announced Armv8-R in 2013, it is the latest generation Arm architecture
> targeted at the Real-time profile. It introduces virtualization at the highest
> security level while retaining the Protected Memory System Architecture (PMSA)
> based on a Memory Protection Unit (MPU). In 2020, Arm announced Cortex-R82,
> which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
> 
> - The latest Armv8-R64 document can be found here:
>   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
> 
> - Armv-R Architecture progression:
>   Armv7-R -> Armv8-R AArch32 -> Armv8 AArch64
>   The following figure is a simple comparison of "R" processors based on
>   different Armv-R Architectures.
>   ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
> 
> - The Armv8-R architecture evolved additional features on top of Armv7-R:
>     - An exception model that is compatible with the Armv8-A model
>     - Virtualization with support for guest operating systems
>         - PMSA virtualization using MPUs In EL2.
> - The new features of Armv8-R64 architecture
>     - Adds support for the 64-bit A64 instruction set, previously Armv8-R
>       only supported A32.
>     - Supports up to 48-bit physical addressing, previously up to 32-bit
>       addressing was supported.
>     - Optional Arm Neon technology and Advanced SIMD
>     - Supports three Exception Levels (ELs)
>         - Secure EL2 - The Highest Privilege, MPU only, for firmware, hypervisor
>         - Secure EL1 - RichOS (MMU) or RTOS (MPU)
>         - Secure EL0 - Application Workloads
>     - Optionally supports Virtual Memory System Architecture at S-EL1/S-EL0.
>       This means it's possible to run rich OS kernels - like Linux - either
>       bare-metal or as a guest.
> - Differences with the Armv8-A AArch64 architecture
>     - Supports only a single Security state - Secure. There is not Non-Secure
>       execution state supported.
>     - EL3 is not supported, EL2 is mandatory. This means secure EL2 is the
>       highest EL.
>     - Supports the A64 ISA instruction
>         - With a small set of well-defined differences
>     - Provides a PMSA (Protected Memory System Architecture) based
>       virtualization model.
>         - As opposed to Armv8-A AArch64's VMSA based Virtualization
>         - Can support address bits up to 52 if FEAT_LPA is enabled,
>           otherwise 48 bits.
>         - Determines the access permissions and memory attributes of
>           the target PA.
>         - Can implement PMSAv8-64 at EL1 and EL2
>             - Address translation flat-maps the VA to the PA for EL2 Stage 1.
>             - Address translation flat-maps the VA to the PA for EL1 Stage 1.
>             - Address translation flat-maps the IPA to the PA for EL1 Stage 2.
>     - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.
> 
> ### 1.2. Xen Challenges with PMSA Virtualization
> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to run
> with an MPU and host multiple guest OSes.
> 
> - No MMU at EL2:
>     - No EL2 Stage 1 address translation
>         - Xen provides fixed ARM64 virtual memory layout as basis of EL2
>           stage 1 address translation, which is not applicable on MPU system,
>           where there is no virtual addressing. As a result, any operation
>           involving transition from PA to VA, like ioremap, needs modification
>           on MPU system.
>     - Xen's run-time addresses are the same as the link time addresses.
>         - Enable PIC (position-independent code) on a real-time target
>           processor probably very rare.
>     - Xen will need to use the EL2 MPU memory region descriptors to manage
>       access permissions and attributes for accesses made by VMs at EL1/0.
>         - Xen currently relies on MMU EL1 stage 2 table to manage these
>           accesses.
> - No MMU Stage 2 translation at EL1:
>     - A guest doesn't have an independent guest physical address space
>     - A guest can not reuse the current Intermediate Physical Address
>       memory layout
>     - A guest uses physical addresses to access memory and devices
>     - The MPU at EL2 manages EL1 stage 2 access permissions and attributes
> - There are a limited number of MPU protection regions at both EL2 and EL1:
>     - Architecturally, the maximum number of protection regions is 256,
>       typical implementations have 32.
>     - By contrast, Xen does not need to consider the number of page table
>       entries in theory when using MMU.
> - The MPU protection regions at EL2 need to be shared between the hypervisor
>   and the guest stage 2.
>     - Requires careful consideration - may impact feature 'fullness' of both
>       the hypervisor and the guest
>     - By contrast, when using MMU, Xen has standalone P2M table for guest
>       stage 2 accesses.
> 
> ## 2. Proposed changes of Xen
> ### **2.1. Changes of build system:**
> 
> - ***Introduce new Kconfig options for Armv8-R64***:
>   Unlike Armv8-A, because lack of MMU support on Armv8-R64, we may not
>   expect one Xen binary to run on all machines. Xen images are not common
>   across Armv8-R64 platforms. Xen must be re-built for different Armv8-R64
>   platforms. Because these platforms may have different memory layout and
>   link address.
>     - `ARM64_V8R`:
>       This option enables Armv8-R profile for Arm64. Enabling this option
>       results in selecting MPU. This Kconfig option is used to gate some
>       Armv8-R64 specific code except MPU code, like some code for Armv8-R64
>       only system ID registers access.
> 
>     - `ARM_MPU`
>       This option enables MPU on ARMv8-R architecture. Enabling this option
>       results in disabling MMU. This Kconfig option is used to gate some
>       ARM_MPU specific code. Once when this Kconfig option has been enabled,
>       the MMU relate code will not be built for Armv8-R64. The reason why
>       not depends on runtime detection to select MMU or MPU is that, we don't
>       think we can use one image for both Armv8-R64 and Armv8-A64. Another
>       reason that we separate MPU and V8R in provision to allow to support MPU
>       on 32bit Arm one day.
> 
>     - `XEN_START_ADDRESS`
>       This option allows to set the custom address at which Xen will be
>       linked. This address must be aligned to a page size. Xen's run-time
>       addresses are the same as the link time addresses. Different platforms
>       may have differnt memory layout. This Kconfig option provides users
>       the ability to select proper link addresses for their boards.
>       ***Notes: Fixed link address means the Xen binary could not be***
>       ***relocated by EFI loader. So in current stage, Xen could not***
>       ***be launched as an EFI application on Armv8-R64.***
> 
>     - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
>       `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
>       These Kconfig options allow to set memory regions for Xen code, data
>       and device memory. Before parsing memory information from device tree,
>       Xen will use the values that stored in these options to setup boot-time
>       MPU configuration. Why we need a boot-time MPU configuration?
>       1. More deterministic: Arm MPU supports background regions,
>          if we don't configure the MPU regions and don't enable MPU.
>          We can enable MPU background regions. But that means all RAM
>          is RWX. Random values in RAM or maliciously embedded data can
>          be exploited. Using these Kconfig options allow users to have
>          a deterministic RAM area to execute code.
>       2. More compatible: On some Armv8-R64 platforms, if the MPU is
>          disabled, the `dc zva` instruction will make the system halt.
>          And this instruction will be embedded in some built-in functions,
>          like `memory set`. If we use `-ddont_use_dc` to rebuild GCC,
>          the built-in functions will not contain `dc zva`. However, it is
>          obviously unlikely that we will be able to recompile all GCC
>          for ARMv8-R64.
>       3. One optional idea:
>           We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` or
>           `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end` for
>           MPU normal memory. It's enough to support Xen run in boot time.

I can imagine that we need to have a different Xen build for each
ARMv8-R platform. Do you envision that XEN_START_ADDRESS and
ARM_MPU_*_MEMORY_START/END are preconfigured based on the platform
choice at build time? I don't think we want a user to provide all of
those addresses by hand, right?

The next question is whether we could automatically generate
XEN_START_ADDRESS and ARM_MPU_*_MEMORY_START/END based on the platform
device tree at build time (at build time, not runtime). That would
make things a lot easier and it is also aligned with the way Zephyr and
other RTOSes and baremetal apps work.

The device tree can be given as input to the build system, and the
Makefiles would take care of generating XEN_START_ADDRESS and
ARM_MPU_*_MEMORY_START/END based on /memory and other interesting nodes.


> - ***Define new system registers for compilers***:
>   Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
>   specific system registers. As Armv8-R64 only has a Secure state,
>   at least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. The
>   first GCC version that supports Armv8.4 is GCC 8.1. In addition to
>   these, the PMSA of Armv8-R64 introduces many MPU related system registers:
>   `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and
>   `MPUIR_ELx`. But the first GCC version to support these system registers
>   is GCC 11. So we have two ways to make compilers work properly with
>   these system registers.
>   1. Bump the GCC version to GCC 11.
>      The pro of this method is that we don't need to encode these
>      system registers in macros ourselves. The cons are that
>      we have to update Makefiles to support GCC 11 for Armv8-R64:
>      1.1. Check for GCC 11 for Armv8-R64.
>      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
>      1.3. Solve the conflict between march=armv8r and mcpu=generic.
>     These changes would affect common Makefiles, not only Arm Makefiles.
>     And GCC 11 is new; many toolchains and distros don't support it yet.
> 
>   2. Encode new system registers in macros ***(preferred)***
>         ```
>         /* Virtualization Secure Translation Control Register */
>         #define VSTCR_EL2  S3_4_C2_C6_2
>         /* Virtualization System Control Register */
>         #define VSCTLR_EL2 S3_4_C2_C0_0
>         /* EL1 MPU Protection Region Base Address Register encode */
>         #define PRBAR_EL1  S3_0_C6_C8_0
>         ...
>         /* EL2 MPU Protection Region Base Address Register encode */
>         #define PRBAR_EL2  S3_4_C6_C8_0
>         ...
>         ```
>      If we encode all of the above system registers, we don't need to bump
>      the GCC version, the common CFLAGS Xen uses can still be applied to
>      Armv8-R64, and we don't need to modify Makefiles to add specific CFLAGS.

I think that's fine and we did something similar with the original ARMv7-A
port if I remember correctly.


> ### **2.2. Changes of the initialization process**
> In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
> initialization process. Apart from some architectural differences, there
> is little non-reusable code, and we will gate the differences through
> CONFIG_ARM_MPU or CONFIG_ARM64_V8R. We want most of the initialization
> code to be reusable between Armv8-R64 and Armv8-A64.

+1


> - We will reuse the original head.S and setup.c of Arm, but replace the
>   MMU and page table operations in these files with configuration operations
>   for the MPU and MPU regions.
> 
> - We provide a boot-time MPU configuration. This MPU configuration will
>   allow Xen to finish its initialization. The boot-time MPU
>   configuration will record the memory regions that are parsed from the
>   device tree.
> 
>   At the end of Xen initialization, we will replace the boot-time MPU
>   configuration with a runtime MPU configuration. The runtime MPU
>   configuration will merge and reorder memory regions to leave more MPU
>   regions free for guests.
>   ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> 
> - Defer unpausing domains.
>   When Xen initialization is about to end, Xen unpauses the guests created
>   during initialization. But this can cause problems: the unpause
>   action occurs before free_init_memory, while the runtime MPU configuration
>   is built after free_init_memory.
> 
>   So if an unpaused guest starts a context switch at this
>   point, its MPU context will be based on the boot-time MPU configuration,
>   which may be inconsistent with the runtime MPU configuration and
>   cause unexpected problems. (This may not happen on a single-core
>   system, but on SMP systems this problem is foreseeable, so we want to
>   solve it from the beginning.)
> 
> ### **2.3. Changes to reduce memory fragmentation**
> 
> In general, memory in a Xen system can be classified into 4 classes:
> `image sections`, `heap sections`, `guest RAM`, and `boot modules (guest
> kernel, initrd and dtb)`.
> 
> Currently, Xen doesn't place any restrictions on how users allocate
> memory for the different classes. That means users can place boot modules
> anywhere, reserve Xen heap memory anywhere and allocate guest
> memory anywhere.
> 
> In a VMSA system, this is not much of a problem, since the
> MMU can manage memory at a granularity of 4KB. But in a
> PMSA system, this is a big problem. On Armv8-R64, the maximum number
> of MPU protection regions is 256, but typical
> processor implementations provide no more than 32
> MPU protection regions. Add in the fact that Xen shares MPU protection
> regions with the guest's EL1 stage 2, and it becomes even more important
> to plan the use of MPU protection regions properly.
> 
> - An ideal memory usage layout restriction:
> ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> 1. Reserve proper MPU regions for the Xen image (code, rodata and data + bss).
> 2. Reserve one MPU region for boot modules.
>    That means the placement of all boot modules, including guest kernel,
>    initrd and dtb, will be limited to the area protected by this MPU region.
> 3. Reserve one or more MPU regions for the Xen heap.
>    On Armv8-R64, guest memory is predefined in the device tree; it will
>    not be allocated from the heap. Unlike on Armv8-A64, we will not move all
>    free memory to the heap. We want the Xen heap to be deterministic too,
>    so Xen on Armv8-R64 also relies on the Xen static heap feature. The
>    memory for the Xen heap will be defined in the device tree too.
>    Considering that physical memory can also be discontinuous, one or more
>    MPU protection regions need to be reserved for the Xen heap.
> 4. If we call the MPU protection regions used above PART_A, and the
>    remaining MPU protection regions PART_B:
>    4.1. In hypervisor context, Xen will map the remaining RAM and devices
>         to PART_B. This gives Xen the ability to access the whole memory.
>    4.2. In guest context, Xen will create the EL1 stage 2 mapping in PART_B.
>         In this case, Xen only needs to update PART_B on context switch,
>         while keeping PART_A fixed.

I think that the memory layout and restrictions that you wrote above
make sense. I have some comments on the way they are represented in
device tree, but that's different.


> ***Notes: Static allocation will be mandatory on MPU based systems***
> 
> **A sample device tree of memory layout restriction**:
> ```
> chosen {
>     ...
>     /*
>      * Define a section to place boot modules,
>      * all boot modules must be placed in this section.
>      */
>     mpu,boot-module-section = <0x10000000 0x10000000>;
>     /*
>      * Define a section to cover all guest RAM. All guest RAM must be located
>      * within this section. The advantage is that, in the best case, Xen
>      * needs only one MPU protection region to map all guest RAM.
>      */
>     mpu,guest-memory-section = <0x20000000 0x30000000>;
>     /*
>      * Define a memory section that can cover all device memory that
>      * will be used in Xen.
>      */
>     mpu,device-memory-section = <0x80000000 0x7ffff000>;
>     /* Define a section for Xen heap */
>     xen,static-mem = <0x50000000 0x20000000>;

As mentioned above, I understand the need for these sections, but why do
we need to describe them in device tree at all? Could Xen select them by
itself during boot?

If not, and considering that we have to generate
ARM_MPU_*_MEMORY_START/END anyway at build time, would it make sense to
also generate mpu,guest-memory-section, xen,static-mem, etc. at build
time rather than passing it via device tree to Xen at runtime?

What's the value of doing ARM_MPU_*_MEMORY_START/END at build time and
everything else at runtime?

It looks like we are forced to have the sections definitions at build
time because we need them before we can parse device tree. In that case,
we might as well define all the sections at build time.

But I think it would be even better if Xen could automatically choose
xen,static-mem, mpu,guest-memory-section, etc. on its own based on the
regular device tree information (/memory, /amba, etc.), without any need
for explicitly describing each range with these new properties.

 
>     domU1 {
>         ...
>         #xen,static-mem-address-cells = <0x01>;
>         #xen,static-mem-size-cells = <0x01>;
>         /* Statically allocated guest memory, within mpu,guest-memory-section */
>         xen,static-mem = <0x30000000 0x1f000000>;
> 
>         module@11000000 {
>             compatible = "multiboot,kernel\0multiboot,module";
>             /* Boot module address, within mpu,boot-module-section */
>             reg = <0x11000000 0x3000000>;
>             ...
>         };
> 
>         module@10FF0000 {
>                 compatible = "multiboot,device-tree\0multiboot,module";
>                 /* Boot module address, within mpu,boot-module-section */
>                 reg = <0x10ff0000 0x10000>;
>                 ...
>         };
>     };
> };
> ```
> 
> ### **2.4. Changes of memory management**
> Xen is coupled with VMSA. In order to port Xen to Armv8-R64, we have to
> decouple Xen from VMSA and give Xen the ability to manage memory with PMSA.
> 
> 1. ***Use the buddy allocator to manage physical pages for PMSA***
>    From the point of view of physical pages, PMSA and VMSA don't differ.
>    So we can reuse the buddy allocator on Armv8-R64 to manage physical pages.
>    The difference is that, in VMSA, Xen maps allocated pages to virtual
>    addresses, while in PMSA, Xen just converts pages to physical addresses.
> 
> 2. ***Cannot use virtual addresses for memory management***
>    As Armv8-R64 only has PMSA in EL2, Xen loses the ability to use virtual
>    addresses to manage memory. This brings some problems: some virtual
>    address based features cannot work well on Armv8-R64, like `FIXMAP`,
>    `vmap/vunmap`, `ioremap` and `alternative`.
> 
>    But the functions and macros of these features are used in lots of common
>    code, so it's not good to use `#ifdef CONFIG_ARM_MPU` to gate related
>    code everywhere. Instead, we propose to use stub helpers to make the
>    changes transparent to common code.
>    1. For `FIXMAP`, we will use `0` as the base in `FIXMAP_ADDR` for all
>       fixmap operations. This makes fixmap operations return the physical
>       address of the fixmapped item directly.
>    2. For `vmap/vunmap`, we will use some empty inline stub helpers:
>         ```
>         static inline void vm_init_type(...) {}
>         static inline void *__vmap(...)
>         {
>             return NULL;
>         }
>         static inline void vunmap(const void *va) {}
>         static inline void *vmalloc(size_t size)
>         {
>             return NULL;
>         }
>         static inline void *vmalloc_xen(size_t size)
>         {
>             return NULL;
>         }
>         static inline void vfree(void *va) {}
>         ```
> 
>    3. For `ioremap`: it depends on `vmap`. As we have made `vmap` always
>       return `NULL`, `ioremap` cannot work on Armv8-R64 without changes.
>       Instead, `ioremap` will return the input address directly:
>         ```
>         static inline void *ioremap_attr(...)
>         {
>             /* We don't have the ability to change input PA cache attributes */
>             if ( CACHE_ATTR_need_change )
>                 return NULL;
>             return (void *)pa;
>         }
>         static inline void __iomem *ioremap_nocache(...)
>         {
>             return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
>         }
>         static inline void __iomem *ioremap_cache(...)
>         {
>             return ioremap_attr(start, len, PAGE_HYPERVISOR);
>         }
>         static inline void __iomem *ioremap_wc(...)
>         {
>             return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
>         }
>         void *ioremap(...)
>         {
>             return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>         }
> 
>         ```
>     4. For `alternative`: it depends on `vmap` too. We will simply disable
>        it on Armv8-R64 at the current stage. How to implement `alternative`
>        on Armv8-R64 is better discussed after the basic functions of Xen
>        on Armv8-R64 work well.
>        But simply disabling `alternative` will make `cpus_have_const_cap`
>        always return false:
>         ```
>         /* System capability check for constant cap */
>         #define cpus_have_const_cap(num) ({                \
>                register_t __ret;                           \
>                                                            \
>                asm volatile (ALTERNATIVE("mov %0, #0",     \
>                                          "mov %0, #1",     \
>                                          num)              \
>                              : "=r" (__ret));              \
>                                                            \
>                 unlikely(__ret);                           \
>                 })
>         ```
>         So, before we have a PMSA `alternative` implementation, we have to
>         implement a separate `cpus_have_const_cap` for Armv8-R64:
>         ```
>         #define cpus_have_const_cap(num) cpus_have_cap(num)
>         ```

I think it is OK to disable alternative


> ### **2.5. Changes of guest management**
> Armv8-R64 only supports PMSA in EL2, but it supports configurable
> VMSA or PMSA in EL1. This means Xen will have a new type of guest on
> Armv8-R64 - the MPU based guest.
> 
> 1. **Add a new domain type - MPU_DOMAIN**
>    When a user wants to create a guest that will use the MPU in EL1, the
>    user should add an `mpu` property to the device tree `domU` node, as in
>    the following example:
>     ```
>     domU2 {
>         compatible = "xen,domain";
>         direct-map;
>         mpu; --> Indicates this domain will use PMSA in EL1.
>         ...
>     };
>     ```
>     Corresponding to the `mpu` property in the device tree, we also introduce
>     a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for a domain to mark itself as
>     an MPU domain. This flag will be used during domain creation and during
>     vCPU context switch.
>     1. Domain creation needs this flag to decide whether to enable PMSA or
>        VMSA in EL1.
>     2. vCPU context switch needs this flag to decide whether to save/restore
>        MMU or MPU related registers.
> 
> 2. **Add MPU registers to the vCPU to save the EL1 MPU context**
>    Current Xen only supports MMU based guests, so it has not considered
>    saving/restoring an MPU context. We therefore need to add the MPU
>    registers to `arch_vcpu`:
>     ```
>     struct arch_vcpu
>     {
>     #ifdef CONFIG_ARM_MPU
>         /* Virtualization Translation Control Register */
>         register_t vtcr_el2;
> 
>         /* EL1 MPU regions' registers */
>         pr_t mpu_regions[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
>     #endif
>     }
>     ```
>     Armv8-R64 can support up to 256 MPU regions, but that's just theoretical.
>     We don't want to define `pr_t mpu_regions[256]`, as this would waste
>     memory most of the time. So we decided to let users specify the number
>     through a Kconfig option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` can
>     default to `32`, a typical implementation on Armv8-R64. Users will
>     recompile Xen when their platform changes, so when the MPU changes,
>     respecifying the number of MPU protection regions will not cause
>     additional problems.

I wonder if we could probe the number of MPU regions at runtime and
dynamically allocate the memory needed to store them in arch_vcpu.


>
> 3. **MPU based P2M table management**
>    Armv8-R64 EL2 doesn't have EL1 stage 2 address translation, but through
>    PMSA it still has the ability to control the permissions and attributes
>    of EL1 stage 2 accesses. In this case, we still hope to keep the interface
>    as consistent with the MMU based P2M as possible.
> 
>    p2m->root will point to allocated memory. On Armv8-A64, this memory
>    is used to store the EL1 stage 2 translation table. But on Armv8-R64,
>    this memory will be used to store the EL2 MPU protection regions that
>    are used by the guest. During domain creation, Xen will prepare the data
>    in this memory so the guest can access the proper RAM and devices. When
>    the guest's vCPU is scheduled in, this data will be written to the MPU
>    protection region registers.
> 
> ### **2.6. Changes of exception trap**
> As Armv8-R64 has an exception model compatible with Armv8-A64, we can reuse
> most of Armv8-A64's exception trap & handler code, except for the trap based
> on EL1 stage 2 translation aborts.
> 
> In Armv8-A64, we use `FSC_FLT_TRANS`
> ```
>     case FSC_FLT_TRANS:
>         ...
>         if ( is_data )
>         {
>             enum io_state state = try_handle_mmio(regs, hsr, gpa);
>             ...
>         }
> ```
> But for Armv8-R64, we have to use `FSC_FLT_PERM`
> ```
>     case FSC_FLT_PERM:
>         ...
>         if ( is_data )
>         {
>             enum io_state state = try_handle_mmio(regs, hsr, gpa);
>             ...
>         }
> ```
> 
> ### **2.7. Changes of device driver**
> 1. Because Armv8-R64 only has a single Security state, this will affect some
> devices that have two Security states, like the GIC. But fortunately, most
> vendors will not attach a dual-Security-state GIC to Armv8-R64 processors.
> The current GIC driver can work well with a single Security state GIC on
> Armv8-R64.
> 2. Xen should use the secure hypervisor timer in Secure EL2. We will
> introduce CONFIG_ARM_SECURE_STATE to make Xen use the secure registers for
> the timer.
> 
> ### **2.8. Changes of virtual device**
> Currently, we only support pass-through devices for guests, because event
> channels, xen-bus, xen-storage and other advanced Xen features haven't been
> enabled on Armv8-R64.

That's fine -- it is a great start! Looking forward to it!


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-24 11:52 ` Ayan Kumar Halder
@ 2022-02-25  6:33   ` Wei Chen
  0 siblings, 0 replies; 34+ messages in thread
From: Wei Chen @ 2022-02-25  6:33 UTC (permalink / raw)
  To: Ayan Kumar Halder, xen-devel, julien, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Ayan,

> -----Original Message-----
> From: Ayan Kumar Halder <ayan.kumar.halder@xilinx.com>
> Sent: 2022年2月24日 19:52
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> julien@xen.org; Stefano Stabellini <sstabellini@kernel.org>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> This is a nice writeup. I have a few initial queries.
> 
> On 24/02/2022 06:01, Wei Chen wrote:
> > # Proposal for Porting Xen to Armv8-R64
> >
> > This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> > which includes:
> > - The changes of current Xen capability, like Xen build system, memory
> >    management, domain management, vCPU context switch.
> > - The expanded Xen capability, like static-allocation and direct-map.
> >
> > ***Notes:***
> > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> >     ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> >     ***Trusted-Frimware (TF-R). This is an external dependency,***
> >     ***so we think the discussion of Xen SMP support on Armv8-R64***
> >     ***should be started when single-CPU support is complete.***
> > 2. ***This proposal will not touch xen-tools. In current stage,***
> >     ***Xen on Armv8-R64 only support dom0less, all guests should***
> >     ***be booted from device tree.***
> >
> > ## 1. Essential Background
> >
> > ### 1.1. Armv8-R64 Profile
> > The Armv-R architecture profile was designed to support use cases that
> > have a high sensitivity to deterministic execution. (e.g. Fuel Injection,
> > Brake control, Drive trains, Motor control etc)
> >
> > Arm announced Armv8-R in 2013, it is the latest generation Arm
> architecture
> > targeted at the Real-time profile. It introduces virtualization at the
> highest
> > security level while retaining the Protected Memory System Architecture
> (PMSA)
> > based on a Memory Protection Unit (MPU). In 2020, Arm announced Cortex-
> R82,
> > which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
> 
> Is there a good document explaining the difference between an MPU and an
> MMU, and when we need one vs the other?
> 

The Arm Architecture Reference Manual Supplement:
https://developer.arm.com/documentation/ddi0600/latest/
introduces the PMSA and VMSA for Armv8-R.

> >
> > - The latest Armv8-R64 document can be found here:
> >    [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R
> AArch64 architecture
> profile](https://developer.arm.com/documentation/ddi0600/latest/).
> >
> > - Armv-R Architecture progression:
> >    Armv7-R -> Armv8-R AArch32 -> Armv8 AArch64
> >    The following figure is a simple comparison of "R" processors based
> on
> >    different Armv-R Architectures.
> >    ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8
> imBpbvIr2eqBguEB)
> >
> > - The Armv8-R architecture evolved additional features on top of Armv7-R:
> >      - An exception model that is compatible with the Armv8-A model
> >      - Virtualization with support for guest operating systems
> >          - PMSA virtualization using MPUs In EL2.
> > - The new features of Armv8-R64 architecture
> >      - Adds support for the 64-bit A64 instruction set, previously
> Armv8-R
> >        only supported A32.
> >      - Supports up to 48-bit physical addressing, previously up to 32-
> bit
> >        addressing was supported.
> >      - Optional Arm Neon technology and Advanced SIMD
> >      - Supports three Exception Levels (ELs)
> >          - Secure EL2 - The Highest Privilege, MPU only, for firmware,
> hypervisor
> >          - Secure EL1 - RichOS (MMU) or RTOS (MPU)
> >          - Secure EL0 - Application Workloads
> >      - Optionally supports Virtual Memory System Architecture at S-
> EL1/S-EL0.
> >        This means it's possible to run rich OS kernels - like Linux -
> either
> >        bare-metal or as a guest.
> > - Differences with the Armv8-A AArch64 architecture
> >      - Supports only a single Security state - Secure. There is not Non-
> Secure
> >        execution state supported.
> 
> If so, then I guess there is no TrustZone kind of protection available.
> I mean, where an application in the normal world can request data to be
> processed in the secure world (by switching the NS bit on AXI).
> 

On Armv8-R, there are no non-secure applications; all workloads on Armv8-R
are secure. But in a heterogeneous system (e.g. Cortex-A + Cortex-R), we can
treat the entire Armv8-R side as a TrustZone. The traditional secure
applications on the Cortex-A TrustZone can be deployed to Armv8-R, and the
NS applications on Cortex-A can use IPC to send requests to the Armv8-R
"TrustZone".

> Also, does Armv8-R support Trustzone controller 400 which helps to
> partition memory into different protected enclaves based on NSAID ?
> 
> (Apologies if my queries are irrelevant, I am asking this purely out of
> my own interest :) )

Yes, if you have downloaded the FVP_BaseR_AEMv8R, you can find that this IP
is integrated in the model. Though I don't know what its use-case on V8R is,
from the architecture's point of view you can use the TZC-400 with Armv8-R.

> 
> >      - EL3 is not supported, EL2 is mandatory. This means secure EL2 is
> the
> >        highest EL.
> >      - Supports the A64 ISA instruction
> >          - With a small set of well-defined differences
> >      - Provides a PMSA (Protected Memory System Architecture) based
> >        virtualization model.
> >          - As opposed to Armv8-A AArch64's VMSA based Virtualization
> >          - Can support address bits up to 52 if FEAT_LPA is enabled,
> >            otherwise 48 bits.
> >          - Determines the access permissions and memory attributes of
> >            the target PA.
> >          - Can implement PMSAv8-64 at EL1 and EL2
> >              - Address translation flat-maps the VA to the PA for EL2
> Stage 1.
> >              - Address translation flat-maps the VA to the PA for EL1
> Stage 1.
> >              - Address translation flat-maps the IPA to the PA for EL1
> Stage 2.
> >      - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.
> >
> > ### 1.2. Xen Challenges with PMSA Virtualization
> > Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to run
> > with an MPU and host multiple guest OSes.
> >
> > - No MMU at EL2:
> >      - No EL2 Stage 1 address translation
> >          - Xen provides fixed ARM64 virtual memory layout as basis of
> EL2
> >            stage 1 address translation, which is not applicable on MPU
> system,
> >            where there is no virtual addressing. As a result, any
> operation
> >            involving transition from PA to VA, like ioremap, needs
> modification
> >            on MPU system.
> >      - Xen's run-time addresses are the same as the link time addresses.
> >          - Enable PIC (position-independent code) on a real-time target
> >            processor probably very rare.
> >      - Xen will need to use the EL2 MPU memory region descriptors to
> manage
> >        access permissions and attributes for accesses made by VMs at
> EL1/0.
> >          - Xen currently relies on MMU EL1 stage 2 table to manage these
> >            accesses.
> > - No MMU Stage 2 translation at EL1:
> >      - A guest doesn't have an independent guest physical address space
> >      - A guest can not reuse the current Intermediate Physical Address
> >        memory layout
> >      - A guest uses physical addresses to access memory and devices
> >      - The MPU at EL2 manages EL1 stage 2 access permissions and
> attributes
> > - There are a limited number of MPU protection regions at both EL2 and
> EL1:
> >      - Architecturally, the maximum number of protection regions is 256,
> >        typical implementations have 32.
> >      - By contrast, Xen does not need to consider the number of page
> table
> >        entries in theory when using MMU.
> > - The MPU protection regions at EL2 need to be shared between the
> hypervisor
> >    and the guest stage 2.
> >      - Requires careful consideration - may impact feature 'fullness' of
> both
> >        the hypervisor and the guest
> >      - By contrast, when using MMU, Xen has standalone P2M table for
> guest
> >        stage 2 accesses.
> So, can it support running both RTOS and Linux as guests ? My
> understanding is no as we can't enable MPU (for RTOS) and MMU (for
> Linux) at the same time. There needs to be two separate images of Xen.
> Please confirm.
> >
> > ## 2. Proposed changes of Xen
> > ### **2.1. Changes of build system:**
> >
> > - ***Introduce new Kconfig options for Armv8-R64***:
> >    Unlike Armv8-A, because lack of MMU support on Armv8-R64,
> But Armv8-R64 supports VMSA (Refer
> ARM DDI 0600A.d ID120821, B1.2.2,
> Virtual Memory System Architecture, VMSAv8-64). So it should support
> MMU, isn't it ?
> 

Sorry, my description was not accurate enough. It should be
"lack of MMU support in Armv8-R64 EL2". Only EL1 can be configured for VMSA.
Even so, VMSA support in EL1 is not mandatory. If you want to enable
VMSA in EL1 for a guest, you have to check the ID registers to confirm that
your platform supports VMSA in EL1.

> - Ayan
> > we may not
> >    expect one Xen binary to run on all machines. Xen images are not
> common
> >    across Armv8-R64 platforms. Xen must be re-built for different Armv8-
> R64
> >    platforms. Because these platforms may have different memory layout
> and
> >    link address.
> >      - `ARM64_V8R`:
> >        This option enables Armv8-R profile for Arm64. Enabling this
> option
> >        results in selecting MPU. This Kconfig option is used to gate
> some
> >        Armv8-R64 specific code except MPU code, like some code for
> Armv8-R64
> >        only system ID registers access.
> >
> >      - `ARM_MPU`
> >        This option enables MPU on ARMv8-R architecture. Enabling this
> option
> >        results in disabling MMU. This Kconfig option is used to gate
> some
> >        ARM_MPU specific code. Once when this Kconfig option has been
> enabled,
> >        the MMU relate code will not be built for Armv8-R64. The reason
> why
> >        not depends on runtime detection to select MMU or MPU is that, we
> don't
> >        think we can use one image for both Armv8-R64 and Armv8-A64.
> Another
> >        reason that we separate MPU and V8R in provision to allow to
> support MPU
> >        on 32bit Arm one day.
> >
> >      - `XEN_START_ADDRESS`
> >        This option allows to set the custom address at which Xen will be
> >        linked. This address must be aligned to a page size. Xen's run-
> time
> >        addresses are the same as the link time addresses. Different
> platforms
> >        may have differnt memory layout. This Kconfig option provides
> users
> >        the ability to select proper link addresses for their boards.
> >        ***Notes: Fixed link address means the Xen binary could not be***
> >        ***relocated by EFI loader. So in current stage, Xen could not***
> >        ***be launched as an EFI application on Armv8-R64.***
> >
> >      - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
> >        `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
> >        These Kconfig options allow to set memory regions for Xen code,
> data
> >        and device memory. Before parsing memory information from device
> tree,
> >        Xen will use the values that stored in these options to setup
> boot-time
> >        MPU configuration. Why we need a boot-time MPU configuration?
> >        1. More deterministic: Arm MPU supports background regions,
> >           if we don't configure the MPU regions and don't enable MPU.
> >           We can enable MPU background regions. But that means all RAM
> >           is RWX. Random values in RAM or maliciously embedded data can
> >           be exploited. Using these Kconfig options allow users to have
> >           a deterministic RAM area to execute code.
> >        2. More compatible: On some Armv8-R64 platforms, if the MPU is
> >           disabled, the `dc zva` instruction will make the system halt.
> >           And this instruction will be embedded in some built-in
> functions,
> >           like `memory set`. If we use `-ddont_use_dc` to rebuild GCC,
> >           the built-in functions will not contain `dc zva`. However, it
> is
> >           obviously unlikely that we will be able to recompile all GCC
> >           for ARMv8-R64.
> >        3. One optional idea:
> >            We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB`
> >            or `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end` for
> >            MPU normal memory. It's enough to support Xen running at
> >            boot time.
> >
> > - ***Define new system registers for compilers***:
> >    Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> >    specific system registers. As Armv8-R64 only has the Secure state,
> >    at least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. And the
> >    first GCC version that supports Armv8.4 is GCC 8.1. In addition to
> >    these, the PMSA of Armv8-R64 introduced lots of MPU related system
> >    registers: `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`,
> >    `PRENR_ELx` and `MPUIR_ELx`. But the first GCC version to support
> >    these system registers is GCC 11. So we have two ways to make
> >    compilers work properly with these system registers.
> >    1. Bump GCC version to GCC 11.
> >       The pro of this method is that we don't need to encode these
> >       system registers in macros by ourselves. But the cons are that
> >       we have to update Makefiles to support GCC 11 for Armv8-R64:
> >       1.1. Check for GCC version 11 for Armv8-R64.
> >       1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> >       1.3. Resolve the conflict between march=armv8r and mcpu=generic.
> >      These changes will affect common Makefiles, not only Arm Makefiles.
> >      And GCC 11 is new; lots of toolchains and distros haven't
> >      supported it yet.
> >
> >    2. Encode new system registers in macros ***(preferred)***
> >          ```
> >          /* Virtualization Secure Translation Control Register */
> >          #define VSTCR_EL2  S3_4_C2_C6_2
> >          /* Virtualization System Control Register */
> >          #define VSCTLR_EL2 S3_4_C2_C0_0
> >          /* EL1 MPU Protection Region Base Address Register encode */
> >          #define PRBAR_EL1  S3_0_C6_C8_0
> >          ...
> >          /* EL2 MPU Protection Region Base Address Register encode */
> >          #define PRBAR_EL2  S3_4_C6_C8_0
> >          ...
> >          ```
> >       If we encode all the above system registers, we don't need to
> >       bump the GCC version. And the common CFLAGS Xen is using can
> >       still be applied to Armv8-R64. We don't need to modify Makefiles
> >       to add specific CFLAGS.
> >
> > ### **2.2. Changes of the initialization process**
> > In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
> > initialization process. Apart from some architecture differences, there
> > is little non-reusable code, which we will gate through CONFIG_ARM_MPU
> > or CONFIG_ARM64_V8R. We want most of the initialization code to be
> > reusable between Armv8-R64 and Armv8-A64.
> >
> > - We will reuse the original head.S and setup.c of Arm, but replace the
> >    MMU and page table operations in these files with configuration
> >    operations for the MPU and MPU regions.
> >
> > - We provide a boot-time MPU configuration. This MPU configuration will
> >    support Xen to finish its initialization. And this boot-time MPU
> >    configuration will record the memory regions that will be parsed
> >    from device tree.
> >
> >    At the end of Xen initialization, we will use a runtime MPU
> >    configuration to replace the boot-time MPU configuration. The
> >    runtime MPU configuration will merge and reorder memory regions to
> >    save more MPU regions for guests.
> >    ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> >
> > - Defer unpausing domains.
> >    When Xen initialization is about to end, Xen unpauses the guests
> >    created during initialization. But this will cause some issues. The
> >    unpause action occurs before free_init_memory, however the runtime
> >    MPU configuration is built after free_init_memory.
> >
> >    So if an unpaused guest starts executing a context switch at this
> >    point, its MPU context will be based on the boot-time MPU
> >    configuration. That will probably be inconsistent with the runtime
> >    MPU configuration, and this will cause unexpected problems. (This
> >    may not happen on a single core system, but on SMP systems this
> >    problem is foreseeable, so we hope to solve it at the beginning.)
> >
> > ### **2.3. Changes to reduce memory fragmentation**
> >
> > In general, memory in a Xen system can be classified into 4 classes:
> > `image sections`, `heap sections`, `guest RAM`, and `boot modules
> > (guest kernel, initrd and dtb)`.
> >
> > Currently, Xen doesn't place any restriction on how users allocate
> > memory for the different classes. That means users can place boot
> > modules anywhere, can reserve Xen heap memory anywhere and can
> > allocate guest memory anywhere.
> >
> > In a VMSA system, this would not be too much of a problem, since the
> > MMU can manage memory at a granularity of 4KB after all. But in a
> > PMSA system, this will be a big problem. On Armv8-R64, the maximum
> > number of MPU protection regions is limited to 256, and in typical
> > processor implementations, few processors will implement more than 32
> > MPU protection regions. Add in the fact that Xen shares MPU protection
> > regions with the guests' EL1 stage 2, and it becomes even more
> > important to properly plan the use of MPU protection regions.
> >
> > - An ideal memory usage layout restriction:
> > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> > 1. Reserve proper MPU regions for the Xen image (code, rodata and
> >    data + bss).
> > 2. Reserve one MPU region for boot modules.
> >     That means the placement of all boot modules, including guest
> >     kernel, initrd and dtb, will be limited to this MPU region
> >     protected area.
> > 3. Reserve one or more MPU regions for the Xen heap.
> >     On Armv8-R64, guest memory is predefined in device tree; it will
> >     not be allocated from the heap. Unlike on Armv8-A64, we will not
> >     move all free memory to the heap. We want the Xen heap to be
> >     deterministic too, so Xen on Armv8-R64 also relies on the Xen
> >     static heap feature. The memory for the Xen heap will be defined
> >     in device tree too. Considering that physical memory can also be
> >     discontinuous, one or more MPU protection regions need to be
> >     reserved for the Xen heap.
> > 4. If we name the MPU protection regions used above PART_A, and the
> >     remaining MPU protection regions PART_B:
> >     4.1. In hypervisor context, Xen will map the remaining RAM and
> >          devices to PART_B. This will give Xen the ability to access
> >          the whole memory.
> >     4.2. In guest context, Xen will create EL1 stage 2 mappings in
> >          PART_B. In this case, Xen just needs to update PART_B on
> >          context switch, but keeps PART_A fixed.
> >
> > ***Notes: Static allocation will be mandatory on MPU based systems***
> >
> > **A sample device tree of memory layout restriction**:
> > ```
> > chosen {
> >      ...
> >      /*
> >       * Define a section to place boot modules,
> >       * all boot modules must be placed in this section.
> >       */
> >      mpu,boot-module-section = <0x10000000 0x10000000>;
> >      /*
> >       * Define a section to cover all guest RAM. All guest RAM must be
> >       * located within this section. The benefit is that, in the best
> >       * case, we can have only one MPU protection region to map all
> >       * guest RAM for Xen.
> >       */
> >      mpu,guest-memory-section = <0x20000000 0x30000000>;
> >      /*
> >       * Define a memory section that can cover all device memory that
> >       * will be used in Xen.
> >       */
> >      mpu,device-memory-section = <0x80000000 0x7ffff000>;
> >      /* Define a section for Xen heap */
> >      xen,static-mem = <0x50000000 0x20000000>;
> >
> >      domU1 {
> >          ...
> >          #xen,static-mem-address-cells = <0x01>;
> >          #xen,static-mem-size-cells = <0x01>;
> >          /* Statically allocated guest memory, within mpu,guest-memory-section */
> >          xen,static-mem = <0x30000000 0x1f000000>;
> >
> >          module@11000000 {
> >              compatible = "multiboot,kernel\0multiboot,module";
> >              /* Boot module address, within mpu,boot-module-section */
> >              reg = <0x11000000 0x3000000>;
> >              ...
> >          };
> >
> >          module@10FF0000 {
> >                  compatible = "multiboot,device-tree\0multiboot,module";
> >                  /* Boot module address, within mpu,boot-module-section */
> >                  reg = <0x10ff0000 0x10000>;
> >                  ...
> >          };
> >      };
> > };
> > ```
> >
> > ### **2.4. Changes of memory management**
> > Xen is coupled with the VMSA; in order to port Xen to Armv8-R64, we
> > have to decouple Xen from the VMSA and give Xen the ability to manage
> > memory with the PMSA.
> >
> > 1. ***Use the buddy allocator to manage physical pages for PMSA***
> >     From the view of physical pages, PMSA and VMSA don't have any
> >     difference. So we can reuse the buddy allocator on Armv8-R64 to
> >     manage physical pages. The difference is that, with VMSA, Xen will
> >     map allocated pages to virtual addresses, but with PMSA, Xen just
> >     converts the pages to physical addresses.
> >
> > 2. ***Cannot use virtual addresses for memory management***
> >     As Armv8-R64 only has PMSA in EL2, Xen loses the ability to use
> >     virtual addresses to manage memory. This brings some problems:
> >     some virtual address based features could not work well on
> >     Armv8-R64, like `FIXMAP`, `vmap/vunmap`, `ioremap` and
> >     `alternative`.
> >
> >     But the functions or macros of these features are used in lots of
> >     common code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to
> >     gate related code everywhere. In this case, we propose to use stub
> >     helpers to make the changes transparent to common code.
> >     1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
> >        operations. This will make fixmap return the physical address
> >        of the fixmapped item directly.
> >     2. For `vmap/vunmap`, we will use some empty inline stub helpers:
> >          ```
> >          static inline void vm_init_type(...) {}
> >          static inline void *__vmap(...)
> >          {
> >              return NULL;
> >          }
> >          static inline void vunmap(const void *va) {}
> >          static inline void *vmalloc(size_t size)
> >          {
> >              return NULL;
> >          }
> >          static inline void *vmalloc_xen(size_t size)
> >          {
> >              return NULL;
> >          }
> >          static inline void vfree(void *va) {}
> >          ```
> >
> >     3. For `ioremap`, it depends on `vmap`. As we have made `vmap`
> >        always return `NULL`, it could not work on Armv8-R64 without
> >        changes. `ioremap` will return the input address directly.
> >          ```
> >          static inline void *ioremap_attr(...)
> >          {
> >              /* We don't have the ability to change input PA cache attributes */
> >              if ( CACHE_ATTR_need_change )
> >                  return NULL;
> >              return (void *)pa;
> >          }
> >          static inline void __iomem *ioremap_nocache(...)
> >          {
> >              return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
> >          }
> >          static inline void __iomem *ioremap_cache(...)
> >          {
> >              return ioremap_attr(start, len, PAGE_HYPERVISOR);
> >          }
> >          static inline void __iomem *ioremap_wc(...)
> >          {
> >              return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> >          }
> >          void *ioremap(...)
> >          {
> >              return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> >          }
> >
> >          ```
> >      4. For `alternative`, it depends on `vmap` too. We will simply
> >         disable it on Armv8-R64 at the current stage. How to implement
> >         `alternative` on Armv8-R64 is better discussed after the basic
> >         functions of Xen on Armv8-R64 work well. But simply disabling
> >         `alternative` will make `cpus_have_const_cap` always return
> >         false:
> >          ```
> >          /* System capability check for constant cap */
> >          #define cpus_have_const_cap(num) ({                \
> >                 register_t __ret;                           \
> >                                                             \
> >                 asm volatile (ALTERNATIVE("mov %0, #0",     \
> >                                           "mov %0, #1",     \
> >                                           num)              \
> >                               : "=r" (__ret));              \
> >                                                             \
> >                  unlikely(__ret);                           \
> >                  })
> >          ```
> >          So, before we have a PMSA `alternative` implementation, we
> >          have to implement a separate `cpus_have_const_cap` for
> >          Armv8-R64:
> >          ```
> >          #define cpus_have_const_cap(num) cpus_have_cap(num)
> >          ```
> >
> > ### **2.5. Changes of guest management**
> > Armv8-R64 only supports PMSA in EL2, but it supports configurable
> > VMSA or PMSA in EL1. This means Xen will have a new guest type on
> > Armv8-R64 - the MPU based guest.
> >
> > 1. **Add a new domain type - MPU_DOMAIN**
> >     When users want to create a guest that will be using the MPU in
> >     EL1, they should add an `mpu` property in the device tree `domU`
> >     node, like the following example:
> >      domU2 {
> >          compatible = "xen,domain";
> >          direct-map;
> >          mpu; --> Indicates this domain will use PMSA in EL1.
> >          ...
> >      };
> >      ```
> >      Corresponding to the `mpu` property in device tree, we also need
> >      to introduce a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for a
> >      domain to mark itself as an MPU domain. This flag will be used in
> >      domain creation and during vCPU context switch.
> >      1. Domain creation needs this flag to decide whether to enable
> >         PMSA or VMSA in EL1.
> >      2. vCPU context switch needs this flag to decide whether to
> >         save/restore MMU or MPU related registers.
> >
> > 2. **Add MPU registers to vCPU to save the EL1 MPU context**
> >     Current Xen only supports MMU based guests, so it has not
> >     considered saving/restoring an MPU context. In this case, we need
> >     to add MPU registers to `arch_vcpu`:
> >      ```
> >      struct arch_vcpu
> >      {
> >      #ifdef CONFIG_ARM_MPU
> >          /* Virtualization Translation Control Register */
> >          register_t vtcr_el2;
> >
> >          /* EL1 MPU regions' registers */
> >          pr_t mpu_regions[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
> >      #endif
> >      }
> >      ```
> >      Armv8-R64 can support up to 256 MPU regions, but that's just
> >      theoretical. So we don't want to define `pr_t mpu_regions[256]`;
> >      this would waste memory in most cases. So we decided to let the
> >      user specify the number through a Kconfig option. The
> >      `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default value can be
> >      `32`, which is a typical implementation on Armv8-R64. Users will
> >      recompile Xen when their platform changes, so when the MPU
> >      changes, respecifying the number of MPU protection regions will
> >      not cause additional problems.
> >
> > 3. **MPU based P2M table management**
> >     Armv8-R64 EL2 doesn't have EL1 stage 2 address translation, but
> >     through the PMSA it still has the ability to control the
> >     permissions and attributes of EL1 stage 2. In this case, we still
> >     hope to keep the interface consistent with the MMU based P2M as
> >     far as possible.
> >
> >     p2m->root will point to allocated memory. On Armv8-A64, this
> >     memory is used to save the EL1 stage 2 translation table. But on
> >     Armv8-R64, this memory will be used to store the EL2 MPU
> >     protection regions that are used by the guest. During domain
> >     creation, Xen will prepare the data in this memory so that the
> >     guest can access the proper RAM and devices. When the guest's vCPU
> >     is scheduled in, this data will be written to the MPU protection
> >     region registers.
> >
> > ### **2.6. Changes of exception trap**
> > As Armv8-R64 has an exception model compatible with Armv8-A64, we can
> > reuse most of Armv8-A64's exception trap & handler code, except the
> > trap for EL1 stage 2 translation aborts.
> >
> > In Armv8-A64, we use `FSC_FLT_TRANS`
> > ```
> >      case FSC_FLT_TRANS:
> >          ...
> >          if ( is_data )
> >          {
> >              enum io_state state = try_handle_mmio(regs, hsr, gpa);
> >              ...
> >          }
> > ```
> > But for Armv8-R64, we have to use `FSC_FLT_PERM`
> > ```
> >      case FSC_FLT_PERM:
> >          ...
> >          if ( is_data )
> >          {
> >              enum io_state state = try_handle_mmio(regs, hsr, gpa);
> >              ...
> >          }
> > ```
> >
> > ### **2.7. Changes of device drivers**
> > 1. Because Armv8-R64 only has a single Security state, this will
> > affect some devices that have two Security states, like the GIC. But
> > fortunately, most vendors will not link a two-Security-state GIC to
> > Armv8-R64 processors. The current GIC driver can work well with a
> > single-Security-state GIC for Armv8-R64.
> > 2. Xen should use the secure hypervisor timer in Secure EL2. We will
> > introduce CONFIG_ARM_SECURE_STATE to make Xen use the secure
> > registers for the timer.
> >
> > ### **2.8. Changes of virtual devices**
> > Currently, we only support pass-through devices in guests, because
> > event channel, xen-bus, xen-storage and other advanced Xen features
> > haven't been enabled on Armv8-R64.
> >
> > --
> > Cheers,
> > Wei Chen
> >
> >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-25  0:55 ` Stefano Stabellini
@ 2022-02-25 10:48   ` Wei Chen
  2022-02-25 20:12     ` Julien Grall
  2022-02-25 23:54     ` Stefano Stabellini
  0 siblings, 2 replies; 34+ messages in thread
From: Wei Chen @ 2022-02-25 10:48 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, julien, Bertrand Marquis, Penny Zheng, Henry Wang, nd

[-- Attachment #1: Type: text/plain, Size: 33692 bytes --]

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: February 25, 2022 8:56
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; julien@xen.org; Stefano Stabellini
> <sstabellini@kernel.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Penny Zheng <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd
> <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>
> Hi Wei,
>
> This is extremely exciting, thanks for the very nice summary!
>
>
> On Thu, 24 Feb 2022, Wei Chen wrote:
> > # Proposal for Porting Xen to Armv8-R64
> >
> > This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> > which includes:
> > - The changes of current Xen capability, like Xen build system, memory
> >   management, domain management, vCPU context switch.
> > - The expanded Xen capability, like static-allocation and direct-map.
> >
> > ***Notes:***
> > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> >    ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> >    ***Trusted-Firmware (TF-R). This is an external dependency,***
> >    ***so we think the discussion of Xen SMP support on Armv8-R64***
> >    ***should be started when single-CPU support is complete.***
> > 2. ***This proposal will not touch xen-tools. In the current stage,***
> >    ***Xen on Armv8-R64 only supports dom0less; all guests should***
> >    ***be booted from device tree.***
> >
> > ## 1. Essential Background
> >
> > ### 1.1. Armv8-R64 Profile
> > The Arm R architecture profile was designed to support use cases that
> > have a high sensitivity to deterministic execution (e.g. fuel
> > injection, brake control, drive trains, motor control, etc.)
> >
> > Arm announced Armv8-R in 2013; it is the latest generation Arm
> > architecture targeted at the Real-time profile. It introduces
> > virtualization at the highest security level while retaining the
> > Protected Memory System Architecture (PMSA) based on a Memory
> > Protection Unit (MPU). In 2020, Arm announced Cortex-R82, which is the
> > first Arm 64-bit Cortex-R processor based on Armv8-R64.
> >
> > - The latest Armv8-R64 document can be found here:
> >   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
> >
> > - Arm R-profile architecture progression:
> >   Armv7-R -> Armv8-R AArch32 -> Armv8-R AArch64
> >   The following figure is a simple comparison of "R" processors based
> >   on the different Arm R-profile architectures.
> >   ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
> >
> > - The Armv8-R architecture evolved additional features on top of Armv7-R:
> >     - An exception model that is compatible with the Armv8-A model
> >     - Virtualization with support for guest operating systems
> >         - PMSA virtualization using MPUs in EL2.
> > - The new features of the Armv8-R64 architecture
> >     - Adds support for the 64-bit A64 instruction set; previously
> >       Armv8-R only supported A32.
> >     - Supports up to 48-bit physical addressing; previously up to
> >       32-bit addressing was supported.
> >     - Optional Arm Neon technology and Advanced SIMD
> >     - Supports three Exception Levels (ELs)
> >         - Secure EL2 - The highest privilege, MPU only, for firmware,
> >           hypervisor
> >         - Secure EL1 - RichOS (MMU) or RTOS (MPU)
> >         - Secure EL0 - Application workloads
> >     - Optionally supports the Virtual Memory System Architecture at
> >       S-EL1/S-EL0. This means it's possible to run rich OS kernels -
> >       like Linux - either bare-metal or as a guest.
> > - Differences with the Armv8-A AArch64 architecture
> >     - Supports only a single Security state - Secure. There is no
> >       Non-Secure execution state supported.
> >     - EL3 is not supported; EL2 is mandatory. This means Secure EL2 is
> >       the highest EL.
> >     - Supports the A64 ISA instruction set
> >         - With a small set of well-defined differences
> >     - Provides a PMSA (Protected Memory System Architecture) based
> >       virtualization model.
> >         - As opposed to Armv8-A AArch64's VMSA based virtualization
> >         - Can support address bits up to 52 if FEAT_LPA is enabled,
> >           otherwise 48 bits.
> >         - Determines the access permissions and memory attributes of
> >           the target PA.
> >         - Can implement PMSAv8-64 at EL1 and EL2
> >             - Address translation flat-maps the VA to the PA for EL2
> >               Stage 1.
> >             - Address translation flat-maps the VA to the PA for EL1
> >               Stage 1.
> >             - Address translation flat-maps the IPA to the PA for EL1
> >               Stage 2.
> >     - PMSA in EL1 & EL2 is configurable; VMSA in EL1 is configurable.
> >
> > ### 1.2. Xen Challenges with PMSA Virtualization
> > Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications to
> > run with an MPU and host multiple guest OSes.
> >
> > - No MMU at EL2:
> >     - No EL2 Stage 1 address translation
> >         - Xen provides a fixed ARM64 virtual memory layout as the basis
> >           of EL2 stage 1 address translation, which is not applicable
> >           on an MPU system, where there is no virtual addressing. As a
> >           result, any operation involving a transition from PA to VA,
> >           like ioremap, needs modification on an MPU system.
> >     - Xen's run-time addresses are the same as the link time addresses.
> >         - Enabling PIC (position-independent code) on a real-time
> >           target processor is probably very rare.
> >     - Xen will need to use the EL2 MPU memory region descriptors to
> >       manage access permissions and attributes for accesses made by
> >       VMs at EL1/0.
> >         - Xen currently relies on the MMU EL1 stage 2 table to manage
> >           these accesses.
> > - No MMU Stage 2 translation at EL1:
> >     - A guest doesn't have an independent guest physical address space
> >     - A guest can not reuse the current Intermediate Physical Address
> >       memory layout
> >     - A guest uses physical addresses to access memory and devices
> >     - The MPU at EL2 manages EL1 stage 2 access permissions and
> >       attributes
> > - There are a limited number of MPU protection regions at both EL2 and
> >   EL1:
> >     - Architecturally, the maximum number of protection regions is 256;
> >       typical implementations have 32.
> >     - By contrast, Xen does not need to consider the number of page
> >       table entries in theory when using the MMU.
> > - The MPU protection regions at EL2 need to be shared between the
> >   hypervisor and the guest stage 2.
> >     - Requires careful consideration - may impact feature 'fullness'
> >       of both the hypervisor and the guest
> >     - By contrast, when using the MMU, Xen has a standalone P2M table
> >       for guest stage 2 accesses.
> >
> > ## 2. Proposed changes of Xen
> > ### **2.1. Changes of build system:**
> >
> > - ***Introduce new Kconfig options for Armv8-R64***:
> >   Unlike Armv8-A, because of the lack of MMU support on Armv8-R64, we
> >   may not expect one Xen binary to run on all machines. Xen images are
> >   not common across Armv8-R64 platforms. Xen must be re-built for
> >   different Armv8-R64 platforms, because these platforms may have
> >   different memory layouts and link addresses.
> >     - `ARM64_V8R`:
> >       This option enables the Armv8-R profile for Arm64. Enabling this
> >       option results in selecting MPU. This Kconfig option is used to
> >       gate some Armv8-R64 specific code except MPU code, like some
> >       code for Armv8-R64 only system ID register accesses.
> >
> >     - `ARM_MPU`
> >       This option enables the MPU on the Armv8-R architecture.
> >       Enabling this option results in disabling the MMU. This Kconfig
> >       option is used to gate some ARM_MPU specific code. Once this
> >       Kconfig option has been enabled, the MMU related code will not
> >       be built for Armv8-R64. The reason why we do not depend on
> >       runtime detection to select MMU or MPU is that we don't think we
> >       can use one image for both Armv8-R64 and Armv8-A64. Another
> >       reason we separate MPU and V8R is to provision for supporting
> >       MPU on 32-bit Arm one day.
> >
> >     - `XEN_START_ADDRESS`
> >       This option allows to set the custom address at which Xen will
> >       be linked. This address must be aligned to a page size. Xen's
> >       run-time addresses are the same as the link time addresses.
> >       Different platforms may have different memory layouts. This
> >       Kconfig option provides users the ability to select proper link
> >       addresses for their boards.
> >       ***Notes: Fixed link address means the Xen binary could not be***
> >       ***relocated by EFI loader. So in current stage, Xen could not***
> >       ***be launched as an EFI application on Armv8-R64.***
> >
> >     - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
> >       `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
> >       These Kconfig options allow setting memory regions for Xen code,
> >       data and device memory. Before parsing memory information from
> >       device tree, Xen will use the values stored in these options to
> >       set up the boot-time MPU configuration. Why do we need a
> >       boot-time MPU configuration?
> >       1. More deterministic: the Arm MPU supports background regions.
> >          If we don't configure and enable individual MPU regions, we
> >          can enable the MPU background regions instead. But that means
> >          all RAM is RWX, so random values in RAM or maliciously
> >          embedded data can be exploited. Using these Kconfig options
> >          allows users to have a deterministic RAM area to execute code.
> >       2. More compatible: On some Armv8-R64 platforms, if the MPU is
> >          disabled, the `dc zva` instruction will make the system halt.
> >          And this instruction will be embedded in some built-in
> >          functions, like `memset`. If we use `-ddont_use_dc` to
> >          rebuild GCC, the built-in functions will not contain
> >          `dc zva`. However, it is obviously unlikely that we will be
> >          able to recompile all GCC for Armv8-R64.
> >       3. One optional idea:
> >           We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB`
> >           or `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end`
> >           for MPU normal memory. It's enough to support Xen running at
> >           boot time.
>
> I can imagine that we need to have a different Xen build for each
> ARMv8-R platform. Do you envision that XEN_START_ADDRESS and
> ARM_MPU_*_MEMORY_START/END are preconfigured based on the platform
> choice at build time? I don't think we want a user to provide all of
> those addresses by hand, right?

Yes, this is on our TODO list. We want to reuse the current arm/platforms
and Kconfig menu for Armv8-R.
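
For illustration, the per-platform defaults could follow the existing
Kconfig pattern. The platform symbol and addresses below are hypothetical,
not an agreed binding:

```
config XEN_START_ADDRESS
	hex "Xen start address"
	default 0x00200000 if ARM_V8R_FVP    # hypothetical platform symbol
	default 0x10000000
	help
	  Fixed link (and run-time) address for the Xen image on MPU
	  systems without an EFI loader.
```

This way a user picking a platform in the menu would get sane addresses
without typing them by hand.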

>
> The next question is whether we could automatically generate
> XEN_START_ADDRESS and ARM_MPU_*_MEMORY_START/END based on the platform
> device tree at build time (at build time, not runtime). That would
> make things a lot easier and it is also aligned with the way Zephyr and
> other RTOSes and baremetal apps work.

It's a considerable option. But here we may encounter some problems that
need to be solved first:
1. Must CONFIG_DTB be selected by default on Armv8-R? Without firmware
   or a bootloader (like u-boot), we have to build the DTB into the Xen
   binary. This can guarantee the build-time DTB is the same as the
   runtime DTB. But eventually, we will have firmware and a bootloader
   before Xen launches (as Arm EBBR requires). In this case, we may not
   build the DTB into the Xen image, and we can't guarantee the
   build-time DTB is the same as the runtime DTB.
2. If the build-time DTB is the same as the runtime DTB, how can we
   determine XEN_START_ADDRESS within the memory range the DTB
   describes? Should we always limit Xen to boot from the lowest
   address? Or will we introduce some new DT property to specify the Xen
   start address? I think this DT property could also solve question #1
   above.
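
For discussion, one possible shape for such a property. The name
`xen,start-address` and its placement under /chosen are only an
assumption here, not an agreed binding:

```
chosen {
    /*
     * Hypothetical property: tell the build system (and a future
     * loader) at which address within the /memory node Xen is
     * linked and loaded.
     */
    xen,start-address = <0x0 0x10000000>;
    ...
};
```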

>
> The device tree can be given as input to the build system, and the
> Makefiles would take care of generating XEN_START_ADDRESS and
> ARM_MPU_*_MEMORY_START/END based on /memory and other interesting nodes.
>

If we can solve the above questions, yes, device tree is a good idea for
XEN_START_ADDRESS. For ARM_MPU_NORMAL_MEMORY_*, we can get them from
memory nodes, but for ARM_MPU_DEVICE_MEMORY_*, it is not easy for us to
scan all device nodes. And it's very tricky if the memory regions are
interleaved. So in our current RFC code, we chose the optional idea:
we map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` for MPU normal
memory, but we use mpu,device-memory-section in the DT for MPU device
memory.

>
> > - ***Define new system registers for compilers***:
> >   Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> >   specific system registers. As Armv8-R64 only has the Secure state,
> >   at least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. And the
> >   first GCC version that supports Armv8.4 is GCC 8.1. In addition to
> >   these, the PMSA of Armv8-R64 introduced lots of MPU related system
> >   registers: `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`,
> >   `PRENR_ELx` and `MPUIR_ELx`. But the first GCC version to support
> >   these system registers is GCC 11. So we have two ways to make
> >   compilers work properly with these system registers.
> >   1. Bump GCC version to GCC 11.
> >      The pro of this method is that we don't need to encode these
> >      system registers in macros by ourselves. But the cons are that
> >      we have to update Makefiles to support GCC 11 for Armv8-R64:
> >      1.1. Check for GCC version 11 for Armv8-R64.
> >      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> >      1.3. Resolve the conflict between march=armv8r and mcpu=generic.
> >     These changes will affect common Makefiles, not only Arm Makefiles.
> >     And GCC 11 is new; lots of toolchains and distros haven't
> >     supported it yet.
> >
> >   2. Encode new system registers in macros ***(preferred)***
> >         ```
> >         /* Virtualization Secure Translation Control Register */
> >         #define VSTCR_EL2  S3_4_C2_C6_2
> >         /* Virtualization System Control Register */
> >         #define VSCTLR_EL2 S3_4_C2_C0_0
> >         /* EL1 MPU Protection Region Base Address Register encode */
> >         #define PRBAR_EL1  S3_0_C6_C8_0
> >         ...
> >         /* EL2 MPU Protection Region Base Address Register encode */
> >         #define PRBAR_EL2  S3_4_C6_C8_0
> >         ...
> >         ```
> >      If we encode all of the above system registers, we don't need to
> >      bump the GCC version, and the common CFLAGS Xen is using can still
> >      be applied to Armv8-R64. We don't need to modify Makefiles to add
> >      specific CFLAGS.
>
> I think that's fine and we did something similar with the original ARMv7-A
> port if I remember correctly.
>
>
> > ### **2.2. Changes of the initialization process**
> > In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
> > initialization process. Apart from some architecture differences, there
> > is little non-reusable code, which we will distinguish through
> > CONFIG_ARM_MPU or CONFIG_ARM64_V8R. We want most of the initialization
> > code to be reusable between Armv8-R64 and Armv8-A64.
>
> +1
>
>
> > - We will reuse the original head.s and setup.c of Arm, but replace the
> >   MMU and page table operations in these files with configuration
> >   operations for the MPU and MPU regions.
> >
> > - We provide a boot-time MPU configuration. This MPU configuration will
> >   allow Xen to finish its initialization, and it will record the memory
> >   regions that will be parsed from the device tree.
> >
> >   At the end of Xen initialization, we will use a runtime MPU
> >   configuration to replace the boot-time MPU configuration. The runtime
> >   MPU configuration will merge and reorder memory regions to save more
> >   MPU regions for guests.
> >   ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> >
> > - Defer unpausing domains.
> >   When Xen initialization is about to end, Xen unpauses the guests
> >   created during initialization. But this will cause some issues: the
> >   unpause action occurs before free_init_memory, while the runtime MPU
> >   configuration is built after free_init_memory.
> >
> >   So if an unpaused guest starts executing a context switch at this
> >   point, its MPU context will be based on the boot-time MPU
> >   configuration, which will probably be inconsistent with the runtime
> >   MPU configuration; this will cause unexpected problems. (This may not
> >   happen on a single-core system, but on SMP systems this problem is
> >   foreseeable, so we hope to solve it from the beginning.)
> >
> > ### **2.3. Changes to reduce memory fragmentation**
> >
> > In general, memory in a Xen system can be classified into 4 classes:
> > `image sections`, `heap sections`, `guest RAM`, and `boot modules
> > (guest kernel, initrd and dtb)`.
> >
> > Currently, Xen doesn't place any restriction on how users allocate
> > memory for the different classes. That means users can place boot
> > modules anywhere, reserve Xen heap memory anywhere and allocate guest
> > memory anywhere.
> >
> > In a VMSA system, this is not too much of a problem, since the
> > MMU can manage memory at a granularity of 4KB after all. But in a
> > PMSA system, this is a big problem. On Armv8-R64, the maximum number
> > of MPU protection regions is limited to 256, and in typical
> > processor implementations, few processors will provide more than 32
> > MPU protection regions. Add in the fact that Xen shares MPU protection
> > regions with the guests' EL1 stage 2, and it becomes even more
> > important to properly plan the use of MPU protection regions.
> >
> > - An ideal memory usage layout restriction:
> > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> > 1. Reserve proper MPU regions for the Xen image (code, rodata and
> >    data + bss).
> > 2. Reserve one MPU region for boot modules.
> >    That means the placement of all boot modules, including guest kernel,
> >    initrd and dtb, will be limited to this MPU region protected area.
> > 3. Reserve one or more MPU regions for the Xen heap.
> >    On Armv8-R64, guest memory is predefined in the device tree; it will
> >    not be allocated from the heap. Unlike Armv8-A64, we will not move
> >    all free memory to the heap. We want the Xen heap to be deterministic
> >    too, so Xen on Armv8-R64 also relies on the Xen static heap feature.
> >    The memory for the Xen heap will be defined in the device tree too.
> >    Considering that physical memory can also be discontinuous, one or
> >    more MPU protection regions need to be reserved for the Xen heap.
> > 4. If we name the above used MPU protection regions PART_A, and name
> >    the remaining MPU protection regions PART_B:
> >    4.1. In hypervisor context, Xen will map the remaining RAM and
> >         devices to PART_B. This will give Xen the ability to access the
> >         whole memory.
> >    4.2. In guest context, Xen will create EL1 stage 2 mappings in
> >         PART_B. In this case, Xen just needs to update PART_B on
> >         context switch, but keeps PART_A fixed.
>
> I think that the memory layout and restrictions that you wrote above
> make sense. I have some comments on the way they are represented in
> device tree, but that's different.
>
>
> > ***Notes: Static allocation will be mandatory on MPU based systems***
> >
> > **A sample device tree of memory layout restriction**:
> > ```
> > chosen {
> >     ...
> >     /*
> >      * Define a section to place boot modules,
> >      * all boot modules must be placed in this section.
> >      */
> >     mpu,boot-module-section = <0x10000000 0x10000000>;
> >     /*
> >      * Define a section to cover all guest RAM. All guest RAM must be
> >      * located within this section. The advantage is that, in the best
> >      * case, we only need one MPU protection region to map all guest
> >      * RAM for Xen.
> >      */
> >     mpu,guest-memory-section = <0x20000000 0x30000000>;
> >     /*
> >      * Define a memory section that can cover all device memory that
> >      * will be used in Xen.
> >      */
> >     mpu,device-memory-section = <0x80000000 0x7ffff000>;
> >     /* Define a section for the Xen heap */
> >     xen,static-mem = <0x50000000 0x20000000>;
>
> As mentioned above, I understand the need for these sections, but why do
> we need to describe them in device tree at all? Could Xen select them by
> itself during boot?

I think without some input, Xen could not do this, or would have to rely
on some assumptions. For example, the boot-module-section could be
determined by the lowest and the highest addresses of all boot modules,
and the same goes for the guest-memory-section, calculated from all the
guests' allocated memory regions.


>
> If not, and considering that we have to generate
> ARM_MPU_*_MEMORY_START/END anyway at build time, would it make sense to
> also generate mpu,guest-memory-section, xen,static-mem, etc. at build
> time rather than passing it via device tree to Xen at runtime?
>

Did you mean we still add this information to the device tree, but use it
at build time only, and do not parse it at runtime?

> What's the value of doing ARM_MPU_*_MEMORY_START/END at build time and
> everything else at runtime?

ARM_MPU_*_MEMORY_START/END is defined by the platform, but the other
items are customized by users. Users can change their usage without
rebuilding the image.

>
> It looks like we are forced to have the sections definitions at build
> time because we need them before we can parse device tree. In that case,
> we might as well define all the sections at build time.
>
> But I think it would be even better if Xen could automatically choose
> xen,static-mem, mpu,guest-memory-section, etc. on its own based on the
> regular device tree information (/memory, /amba, etc.), without any need
> for explicitly describing each range with these new properties.
>

For mpu,guest-memory-section, with the limitation that there is no other
usage between different guests' memory nodes, this is OK. But for
xen,static-mem (heap), we just want everything on an MPU system to be
deterministic. Of course, Xen could select the remaining memory for the
heap without static-mem.

>
> >     domU1 {
> >         ...
> >         #xen,static-mem-address-cells = <0x01>;
> >         #xen,static-mem-size-cells = <0x01>;
> >         /* Statically allocated guest memory, within mpu,guest-memory-section */
> >         xen,static-mem = <0x30000000 0x1f000000>;
> >
> >         module@11000000 {
> >             compatible = "multiboot,kernel\0multiboot,module";
> >             /* Boot module address, within mpu,boot-module-section */
> >             reg = <0x11000000 0x3000000>;
> >             ...
> >         };
> >
> >         module@10FF0000 {
> >                 compatible = "multiboot,device-tree\0multiboot,module";
> >                 /* Boot module address, within mpu,boot-module-section */
> >                 reg = <0x10ff0000 0x10000>;
> >                 ...
> >         };
> >     };
> > };
> > ```
> >
> > ### **2.4. Changes of memory management**
> > Xen is coupled with the VMSA; in order to port Xen to Armv8-R64, we
> > have to decouple Xen from the VMSA and give Xen the ability to manage
> > memory with the PMSA.
> >
> > 1. ***Use the buddy allocator to manage physical pages for the PMSA***
> >    From the view of a physical page, PMSA and VMSA don't have any
> >    difference, so we can reuse the buddy allocator on Armv8-R64 to
> >    manage physical pages. The difference is that, in VMSA, Xen will map
> >    allocated pages to virtual addresses, but in PMSA, Xen just converts
> >    the pages to physical addresses.
> >
> > 2. ***Cannot use virtual addresses for memory management***
> >    As Armv8-R64 only has a PMSA in EL2, Xen loses the ability to use
> >    virtual addresses to manage memory. This brings some problems: some
> >    virtual address based features cannot work well on Armv8-R64, like
> >    `FIXMAP`, `vmap/vumap`, `ioremap` and `alternative`.
> >
> >    But the functions or macros of these features are used in lots of
> >    common code, so it's not good to use `#ifdef CONFIG_ARM_MPU` to gate
> >    the related code everywhere. In this case, we propose to use stub
> >    helpers to make the changes transparent to common code.
> >    1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
> >       operations. This will directly return the physical address of the
> >       fixmapped item.
> >    2. For `vmap/vumap`, we will use some empty inline stub helpers:
> >    2. For `vmap/vumap`, we will use some empty inline stub helpers:
> >         ```
> >         static inline void vm_init_type(...) {}
> >         static inline void *__vmap(...)
> >         {
> >             return NULL;
> >         }
> >         static inline void vunmap(const void *va) {}
> >         static inline void *vmalloc(size_t size)
> >         {
> >             return NULL;
> >         }
> >         static inline void *vmalloc_xen(size_t size)
> >         {
> >             return NULL;
> >         }
> >         static inline void vfree(void *va) {}
> >         ```
> >
> >    3. For `ioremap`, it depends on `vmap`. As we have made `vmap`
> >       always return `NULL`, these helpers cannot work on Armv8-R64
> >       without changes. `ioremap` will return the input address directly.
> >         ```
> >         static inline void *ioremap_attr(...)
> >         {
> >             /* We don't have the ability to change input PA cache attributes */
> >             if ( CACHE_ATTR_need_change )
> >                 return NULL;
> >             return (void *)pa;
> >         }
> >         static inline void __iomem *ioremap_nocache(...)
> >         {
> >             return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
> >         }
> >         static inline void __iomem *ioremap_cache(...)
> >         {
> >             return ioremap_attr(start, len, PAGE_HYPERVISOR);
> >         }
> >         static inline void __iomem *ioremap_wc(...)
> >         {
> >             return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> >         }
> >         void *ioremap(...)
> >         {
> >             return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> >         }
> >
> >         ```
> >     4. For `alternative`, it depends on `vmap` too. We will simply
> >        disable it on Armv8-R64 at the current stage. How to implement
> >        `alternative` on Armv8-R64 is better discussed after the basic
> >        functions of Xen on Armv8-R64 work well.
> >        But simply disabling `alternative` will make `cpus_have_const_cap`
> >        always return false.
> >         ```
> >         /* System capability check for constant cap */
> >         #define cpus_have_const_cap(num) ({                \
> >                register_t __ret;                           \
> >                                                            \
> >                asm volatile (ALTERNATIVE("mov %0, #0",     \
> >                                          "mov %0, #1",     \
> >                                          num)              \
> >                              : "=r" (__ret));              \
> >                                                            \
> >                 unlikely(__ret);                           \
> >                 })
> >         ```
> >         So, before we have a PMSA `alternative` implementation, we
> >         have to implement a separate `cpus_have_const_cap` for
> >         Armv8-R64:
> >         ```
> >         #define cpus_have_const_cap(num) cpus_have_cap(num)
> >         ```
>
> I think it is OK to disable alternative
>
>
> > ### **2.5. Changes of guest management**
> > Armv8-R64 only supports PMSA in EL2, but it supports a configurable
> > VMSA or PMSA in EL1. This means Xen will have a new type of guest on
> > Armv8-R64 - the MPU based guest.
> >
> > 1. **Add a new domain type - MPU_DOMAIN**
> >    When users want to create a guest that will be using the MPU in EL1,
> >    they should add an `mpu` property to the device tree `domU` node, as
> >    in the following example:
> >     ```
> >     domU2 {
> >         compatible = "xen,domain";
> >         direct-map;
> >         mpu; --> Indicates this domain will use PMSA in EL1.
> >         ...
> >     };
> >     ```
> >     Corresponding to the `mpu` property in the device tree, we also
> >     need to introduce a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for a
> >     domain to mark itself as an MPU domain. This flag will be used
> >     during domain creation and during vCPU context switch.
> >     1. Domain creation needs this flag to decide whether to enable PMSA
> >        or VMSA in EL1.
> >     2. vCPU context switch needs this flag to decide whether to
> >        save/restore MMU or MPU related registers.
> >
> > 2. **Add MPU registers to vCPU to save the EL1 MPU context**
> >    Current Xen only supports MMU based guests, so it hasn't considered
> >    saving/restoring an MPU context. In this case, we need to add MPU
> >    registers to `arch_vcpu`:
> >     ```
> >     struct arch_vcpu
> >     {
> >     #ifdef CONFIG_ARM_MPU
> >         /* Virtualization Translation Control Register */
> >         register_t vtcr_el2;
> >
> >         /* EL1 MPU regions' registers */
> >         pr_t mpu_regions[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
> >     #endif
> >     }
> >     ```
> >     Armv8-R64 can support up to 256 MPU regions, but that's just
> >     theoretical. So we don't want to define `pr_t mpu_regions[256]`;
> >     this is a waste of memory most of the time. So we decided to let
> >     the user specify the number through a Kconfig option.
> >     `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` can default to `32`, which
> >     is a typical implementation on Armv8-R64. Users will recompile Xen
> >     when their platform changes, so when the MPU changes, respecifying
> >     the number of MPU protection regions will not cause additional
> >     problems.
>
> I wonder if we could probe the number of MPU regions at runtime and
> dynamically allocate the memory needed to store them in arch_vcpu.
>

We have considered using a `pr_t mpu_regions[0]` flexible array in
arch_vcpu, but it seems we would encounter some problems with statically
allocated arch_vcpu, as well as sizeof issues.

>
> >
> > 3. **MPU based P2M table management**
> >    Armv8-R64 EL2 doesn't have EL1 stage 2 address translation, but
> >    through the PMSA it still has the ability to control the permissions
> >    and attributes of EL1 stage 2 accesses. In this case, we still hope
> >    to keep the interface as consistent with the MMU based P2M as
> >    possible.
> >
> >    p2m->root will point to allocated memory. On Armv8-A64, this memory
> >    is used to save the EL1 stage 2 translation table. But on Armv8-R64,
> >    this memory will be used to store the EL2 MPU protection regions
> >    that are used by the guest. During domain creation, Xen will prepare
> >    the data in this memory so that the guest can access the proper RAM
> >    and devices. When the guest's vCPU is scheduled in, this data will
> >    be written to the MPU protection region registers.
> >
> > ### **2.6. Changes of exception trap**
> > As Armv8-R64 has an exception model compatible with Armv8-A64, we can
> > reuse most of Armv8-A64's exception trap & handler code, except for the
> > trap based on an EL1 stage 2 translation abort.
> >
> > In Armv8-A64, we use `FSC_FLT_TRANS`
> > ```
> >     case FSC_FLT_TRANS:
> >         ...
> >         if ( is_data )
> >         {
> >             enum io_state state = try_handle_mmio(regs, hsr, gpa);
> >             ...
> >         }
> > ```
> > But for Armv8-R64, we have to use `FSC_FLT_PERM`
> > ```
> >     case FSC_FLT_PERM:
> >         ...
> >         if ( is_data )
> >         {
> >             enum io_state state = try_handle_mmio(regs, hsr, gpa);
> >             ...
> >         }
> > ```
> >
> > ### **2.7. Changes of device drivers**
> > 1. Because Armv8-R64 only has a single secure state, this will affect
> > some devices that have two security states, like the GIC. But
> > fortunately, most vendors will not connect a two-security-state GIC to
> > Armv8-R64 processors. The current GIC driver can work well with a
> > single-security-state GIC for Armv8-R64.
> > 2. Xen should use the secure hypervisor timer in Secure EL2. We will
> > introduce CONFIG_ARM_SECURE_STATE to make Xen use the secure registers
> > for the timer.
> >
> > ### **2.8. Changes of virtual devices**
> > Currently, we only support pass-through devices in guests, because
> > event channels, xen-bus, xen-storage and other advanced Xen features
> > haven't been enabled on Armv8-R64.
>
> That's fine -- it is a great start! Looking forward to it!

[-- Attachment #2: Type: text/html, Size: 53969 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-25 10:48   ` Wei Chen
@ 2022-02-25 20:12     ` Julien Grall
  2022-03-01  6:29       ` Wei Chen
  2022-02-25 23:54     ` Stefano Stabellini
  1 sibling, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-02-25 20:12 UTC (permalink / raw)
  To: Wei Chen, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Wei,

On 25/02/2022 10:48, Wei Chen wrote:
>> >     Armv8-R64 can support max to 256 MPU regions. But that's just
>> theoretical.
>> >     So we don't want to define `pr_t mpu_regions[256]`, this is a memory
>> waste
>> >     in most of time. So we decided to let the user specify through a
>> Kconfig
>> >     option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default value can be
>> `32`,
>> >     it's a typical implementation on Armv8-R64. Users will recompile Xen
>> when
>> >     their platform changes. So when the MPU changes, respecifying the
>> MPU
>> >     protection regions number will not cause additional problems.
>> 
>> I wonder if we could probe the number of MPU regions at runtime and
>> dynamically allocate the memory needed to store them in arch_vcpu.
>> 
> 
> We have considered to used a pr_t mpu_regions[0] in arch_vcpu. But it seems
> we will encounter some static allocated arch_vcpu problems and sizeof issue.

Does it need to be embedded in arch_vcpu? If not, then we could allocate 
memory outside and add a pointer in arch_vcpu.

Cheers,

-- 
Julien Grall



* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-24  6:01 Proposal for Porting Xen to Armv8-R64 - DraftA Wei Chen
  2022-02-24 11:52 ` Ayan Kumar Halder
  2022-02-25  0:55 ` Stefano Stabellini
@ 2022-02-25 20:55 ` Julien Grall
  2022-03-01  7:51   ` Wei Chen
  2 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-02-25 20:55 UTC (permalink / raw)
  To: Wei Chen, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Wei,

Thank you for sending the proposal. Please find some comments below.

On 24/02/2022 06:01, Wei Chen wrote:
> # Proposal for Porting Xen to Armv8-R64
> 
> This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> which includes:
> - The changes of current Xen capability, like Xen build system, memory
>    management, domain management, vCPU context switch.
> - The expanded Xen capability, like static-allocation and direct-map.
> 
> ***Notes:***
> 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
>     ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
>     ***Trusted-Firmware (TF-R). This is an external dependency,***
>     ***so we think the discussion of Xen SMP support on Armv8-R64***
>     ***should be started when single-CPU support is complete.***

I agree that we should first focus on single-CPU support.

> 2. ***This proposal will not touch xen-tools. In current stage,***
>     ***Xen on Armv8-R64 only support dom0less, all guests should***
>     ***be booted from device tree.***

Makes sense. I actually expect some issues in the way xen-tools would 
need to access the memory of the domain that is being created.

[...]

> ### 1.2. Xen Challenges with PMSA Virtualization
> Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications to run
> with an MPU and host multiple guest OSes.
> 
> - No MMU at EL2:
>      - No EL2 Stage 1 address translation
>          - Xen provides fixed ARM64 virtual memory layout as basis of EL2
>            stage 1 address translation, which is not applicable on MPU system,
>            where there is no virtual addressing. As a result, any operation
>            involving transition from PA to VA, like ioremap, needs modification
>            on MPU system.
>      - Xen's run-time addresses are the same as the link time addresses.
>          - Enabling PIC (position-independent code) on a real-time target
>            processor is probably very rare.

Aside from the assembly boot code and the UEFI stub, Xen already runs at 
the same address as it was linked.

>      - Xen will need to use the EL2 MPU memory region descriptors to manage
>        access permissions and attributes for accesses made by VMs at EL1/0.
>          - Xen currently relies on MMU EL1 stage 2 table to manage these
>            accesses.
> - No MMU Stage 2 translation at EL1:
>      - A guest doesn't have an independent guest physical address space
>      - A guest can not reuse the current Intermediate Physical Address
>        memory layout
>      - A guest uses physical addresses to access memory and devices
>      - The MPU at EL2 manages EL1 stage 2 access permissions and attributes
> - There are a limited number of MPU protection regions at both EL2 and EL1:
>      - Architecturally, the maximum number of protection regions is 256,
>        typical implementations have 32.
>      - By contrast, Xen does not need to consider the number of page table
>        entries in theory when using MMU.
> - The MPU protection regions at EL2 need to be shared between the hypervisor
>    and the guest stage 2.
>      - Requires careful consideration - may impact feature 'fullness' of both
>        the hypervisor and the guest
>      - By contrast, when using MMU, Xen has standalone P2M table for guest
>        stage 2 accesses.

[...]

> - ***Define new system registers for compilers***:
>    Armv8-R64 is based on Armv8.4, which means we will use some Armv8.4
>    specific system registers. As Armv8-R64 only has a secure state, at
>    least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. The first GCC
>    version that supports Armv8.4 is GCC 8.1. In addition to these, the
>    PMSA of Armv8-R64 introduced lots of MPU related system registers:
>    `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and
>    `MPUIR_ELx`. But the first GCC version to support these system registers
>    is GCC 11. So we have two ways to make compilers work properly with
>    these system registers.
>    1. Bump the GCC version to GCC 11.
>       The pro of this method is that we don't need to encode these
>       system registers in macros ourselves. But the cons are that
>       we have to update the Makefiles to support GCC 11 for Armv8-R64:
>       1.1. Check for GCC version 11 for Armv8-R64.
>       1.2. Add march=armv8r to CFLAGS for Armv8-R64.
>       1.3. Solve the conflict of march=armv8r and mcpu=generic.
>      These changes will affect common Makefiles, not only Arm Makefiles.
>      And GCC 11 is new; lots of toolchains and distros haven't supported it.

I agree that forcing the use of GCC 11 is not a good idea. But I am not 
sure I understand the problem with -march=... Ultimately, shouldn't we 
aim to build Xen Armv8-R with -march=armv8r?

[...]

> ### **2.2. Changes of the initialization process**
> In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
> initialization process. Apart from some architecture differences, there
> is little non-reusable code, which we will distinguish through
> CONFIG_ARM_MPU or CONFIG_ARM64_V8R. We want most of the initialization
> code to be reusable between Armv8-R64 and Armv8-A64.
> 
> - We will reuse the original head.s and setup.c of Arm. But replace the
>    MMU and page table operations in these files with configuration operations
>    for MPU and MPU regions.
> 
> - We provide a boot-time MPU configuration. This MPU configuration will
>    support Xen to finish its initialization. And this boot-time MPU
>    configuration will record the memory regions that will be parsed from
>    device tree.
> 
>    In the end of Xen initialization, we will use a runtime MPU configuration
>    to replace boot-time MPU configuration. The runtime MPU configuration will
>    merge and reorder memory regions to save more MPU regions for guests.
>    ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> 
> - Defer unpausing domains.
>    When Xen initialization is about to end, Xen unpauses the guests created
>    during initialization. But this will cause some issues: the unpause
>    action occurs before free_init_memory, while the runtime MPU configuration
>    is built after free_init_memory.

I was half expecting that free_init_memory() would not be called for Xen 
Armv8R.

> 
>    So if an unpaused guest starts executing a context switch at this
>    point, its MPU context will be based on the boot-time MPU configuration.

Can you explain why you want to switch the MPU configuration that late?

>    This will probably be inconsistent with the runtime MPU configuration,
>    which will cause unexpected problems. (This may not happen in a single
>    core system, but on SMP systems this problem is foreseeable, so we hope
>    to solve it from the beginning.)

[...]

> ### **2.4. Changes of memory management**
> Xen is coupled with the VMSA; in order to port Xen to Armv8-R64, we have to
> decouple Xen from the VMSA and give Xen the ability to manage memory with
> the PMSA.
> 
> 1. ***Use the buddy allocator to manage physical pages for the PMSA***
>     From the view of a physical page, PMSA and VMSA don't have any
>     difference, so we can reuse the buddy allocator on Armv8-R64 to manage
>     physical pages. The difference is that, in VMSA, Xen will map allocated
>     pages to virtual addresses, but in PMSA, Xen just converts the pages to
>     physical addresses.
> 
> 2. ***Cannot use virtual addresses for memory management***
>     As Armv8-R64 only has a PMSA in EL2, Xen loses the ability to use
>     virtual addresses to manage memory. This brings some problems: some
>     virtual address based features cannot work well on Armv8-R64, like
>     `FIXMAP`, `vmap/vumap`, `ioremap` and `alternative`.
> 
>     But the functions or macros of these features are used in lots of
>     common code, so it's not good to use `#ifdef CONFIG_ARM_MPU` to gate
>     the related code everywhere. In this case, we propose to use stub
>     helpers to make the changes transparent to common code.
>     1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
>        operations. This will directly return the physical address of the
>        fixmapped item.
>     2. For `vmap/vumap`, we will use some empty inline stub helpers:
>          ```
>          static inline void vm_init_type(...) {}
>          static inline void *__vmap(...)
>          {
>              return NULL;
>          }
>          static inline void vunmap(const void *va) {}
>          static inline void *vmalloc(size_t size)
>          {
>              return NULL;
>          }
>          static inline void *vmalloc_xen(size_t size)
>          {
>              return NULL;
>          }
>          static inline void vfree(void *va) {}
>          ```
> 
>     3. For `ioremap`, it depends on `vmap`. As we have made `vmap` always
>        return `NULL`, these helpers cannot work on Armv8-R64 without
>        changes. `ioremap` will return the input address directly.
>          ```
>          static inline void *ioremap_attr(...)
>          {
>              /* We don't have the ability to change input PA cache attributes */
OOI, who will set them?

>              if ( CACHE_ATTR_need_change )
>                  return NULL;
>              return (void *)pa;
>          }
>          static inline void __iomem *ioremap_nocache(...)
>          {
>              return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
>          }
>          static inline void __iomem *ioremap_cache(...)
>          {
>              return ioremap_attr(start, len, PAGE_HYPERVISOR);
>          }
>          static inline void __iomem *ioremap_wc(...)
>          {
>              return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
>          }
>          void *ioremap(...)
>          {
>              return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>          }
> 
>          ```
>      4. For `alternative`, it depends on `vmap` too.

The only reason we depend on vmap() is because we map the .text section 
read-only and we enforce WnX. For VMSA, it would be possible to 
avoid vmap() with some rework. I don't know for PMSA.

> We will simply disable
>         it on Armv8-R64 at the current stage. How to implement `alternative`
>         on Armv8-R64 is better discussed after the basic functions of Xen
>         on Armv8-R64 work well.
Alternatives are mostly helpful to handle errata or to enable features that 
are not present on all CPUs. I wouldn't expect this to be necessary at 
the beginning. In fact, on Arm, it was introduced more than 4 years after 
the initial port :).

[...]

> ### **2.5. Changes of device driver**
> 1. Because Armv8-R64 only has a single security state, this will affect
> some devices that have two security states, like the GIC. But fortunately,
> most vendors will not connect a GIC with two security states to Armv8-R64
> processors. The current GIC driver can work well with a single-security-state
> GIC on Armv8-R64.
> 2. Xen should use the secure hypervisor timer in Secure EL2. We will
> introduce CONFIG_ARM_SECURE_STATE to make Xen use the secure registers
> for the timer.
> 
> ### **2.7. Changes of virtual device**
> Currently, we only support pass-through devices in guests, because event
> channels, xen-bus, xen-storage and other advanced Xen features haven't been
> enabled on Armv8-R64.

That's fine. I expect it to require quite a bit of work to move from Xen 
sharing the pages (e.g. like for grant-tables) to the guest sharing pages.

Cheers,

-- 
Julien Grall



* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-25 10:48   ` Wei Chen
  2022-02-25 20:12     ` Julien Grall
@ 2022-02-25 23:54     ` Stefano Stabellini
  2022-03-01 12:55       ` Wei Chen
  1 sibling, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2022-02-25 23:54 UTC (permalink / raw)
  To: Wei Chen
  Cc: Stefano Stabellini, xen-devel, julien, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd


On Fri, 25 Feb 2022, Wei Chen wrote:
> > Hi Wei,
> >
> > This is extremely exciting, thanks for the very nice summary!
> >
> >
> > On Thu, 24 Feb 2022, Wei Chen wrote:
> > > # Proposal for Porting Xen to Armv8-R64
> > >
> > > This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> > > which includes:
> > > - The changes of current Xen capability, like Xen build system, memory
> > >   management, domain management, vCPU context switch.
> > > - The expanded Xen capability, like static-allocation and direct-map.
> > >
> > > ***Notes:***
> > > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> > >    ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> > >    ***Trusted-Firmware (TF-R). This is an external dependency,***
> > >    ***so we think the discussion of Xen SMP support on Armv8-R64***
> > >    ***should be started when single-CPU support is complete.***
> > > 2. ***This proposal will not touch xen-tools. At the current stage,***
> > >    ***Xen on Armv8-R64 only supports dom0less; all guests should***
> > >    ***be booted from the device tree.***
> > >
> > > ## 1. Essential Background
> > >
> > > ### 1.1. Armv8-R64 Profile
> > > The Armv-R architecture profile was designed to support use cases that
> > > have a high sensitivity to deterministic execution. (e.g. Fuel Injection,
> > > Brake control, Drive trains, Motor control etc)
> > >
> > > Arm announced Armv8-R in 2013; it is the latest generation Arm
> > > architecture targeted at the Real-time profile. It introduces
> > > virtualization at the highest security level while retaining the
> > > Protected Memory System Architecture (PMSA) based on a Memory Protection
> > > Unit (MPU). In 2020, Arm announced Cortex-R82, which is the first Arm
> > > 64-bit Cortex-R processor based on Armv8-R64.
> > >
> > > - The latest Armv8-R64 document can be found here:
> > >   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
> > >
> > > - Armv8-R architecture progression:
> > >   Armv7-R -> Armv8-R AArch32 -> Armv8-R AArch64
> > >   The following figure is a simple comparison of "R" processors based on
> > >   different Arm R-profile architectures.
> > >   ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
> > >
> > > - The Armv8-R architecture evolved additional features on top of Armv7-R:
> > >     - An exception model that is compatible with the Armv8-A model
> > >     - Virtualization with support for guest operating systems
> > >         - PMSA virtualization using MPUs in EL2.
> > > - The new features of the Armv8-R64 architecture
> > >     - Adds support for the 64-bit A64 instruction set; previously
> > >       Armv8-R only supported A32.
> > >     - Supports up to 48-bit physical addressing; previously up to 32-bit
> > >       addressing was supported.
> > >     - Optional Arm Neon technology and Advanced SIMD
> > >     - Supports three Exception Levels (ELs)
> > >         - Secure EL2 - the highest privilege, MPU only, for firmware
> > >           and the hypervisor
> > >         - Secure EL1 - Rich OS (MMU) or RTOS (MPU)
> > >         - Secure EL0 - application workloads
> > >     - Optionally supports the Virtual Memory System Architecture at
> > >       S-EL1/S-EL0. This means it's possible to run rich OS kernels -
> > >       like Linux - either bare-metal or as a guest.
> > > - Differences with the Armv8-A AArch64 architecture
> > >     - Supports only a single Security state - Secure. There is no
> > >       Non-Secure execution state supported.
> > >     - EL3 is not supported, EL2 is mandatory. This means Secure EL2 is
> > >       the highest EL.
> > >     - Supports the A64 ISA instructions
> > >         - With a small set of well-defined differences
> > >     - Provides a PMSA (Protected Memory System Architecture) based
> > >       virtualization model.
> > >         - As opposed to Armv8-A AArch64's VMSA based virtualization
> > >         - Can support address bits up to 52 if FEAT_LPA is enabled,
> > >           otherwise 48 bits.
> > >         - Determines the access permissions and memory attributes of
> > >           the target PA.
> > >         - Can implement PMSAv8-64 at EL1 and EL2
> > >             - Address translation flat-maps the VA to the PA for EL2 Stage 1.
> > >             - Address translation flat-maps the VA to the PA for EL1 Stage 1.
> > >             - Address translation flat-maps the IPA to the PA for EL1 Stage 2.
> > >     - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.
> > >
> > > ### 1.2. Xen Challenges with PMSA Virtualization
> > > Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications to
> > > run with an MPU and host multiple guest OSes.
> > >
> > > - No MMU at EL2:
> > >     - No EL2 Stage 1 address translation
> > >           Xen provides a fixed ARM64 virtual memory layout as the basis
> > >           of EL2 stage 1 address translation, which is not applicable on
> > >           an MPU system, where there is no virtual addressing. As a
> > >           result, any operation involving a transition from PA to VA,
> > >           like ioremap, needs modification on an MPU system.
> > >     - Xen's run-time addresses are the same as the link-time addresses.
> > >         - Enabling PIC (position-independent code) on a real-time target
> > >           processor is probably very rare.
> > >     - Xen will need to use the EL2 MPU memory region descriptors to
> > >       manage access permissions and attributes for accesses made by VMs
> > >       at EL1/0.
> > >         - Xen currently relies on the MMU EL1 stage 2 table to manage
> > >           these accesses.
> > > - No MMU Stage 2 translation at EL1:
> > >     - A guest doesn't have an independent guest physical address space
> > >     - A guest cannot reuse the current Intermediate Physical Address
> > >       memory layout
> > >     - A guest uses physical addresses to access memory and devices
> > >     - The MPU at EL2 manages EL1 stage 2 access permissions and attributes
> > > - There are a limited number of MPU protection regions at both EL2 and EL1:
> > >     - Architecturally, the maximum number of protection regions is 256;
> > >       typical implementations have 32.
> > >     - By contrast, Xen does not need to consider the number of page table
> > >       entries in theory when using the MMU.
> > > - The MPU protection regions at EL2 need to be shared between the
> > >   hypervisor and the guest stage 2.
> > >     - Requires careful consideration - may impact feature 'fullness' of
> > >       both the hypervisor and the guest
> > >     - By contrast, when using the MMU, Xen has a standalone P2M table for
> > >       guest stage 2 accesses.
> > >
> > > ## 2. Proposed changes of Xen
> > > ### **2.1. Changes of build system:**
> > >
> > > - ***Introduce new Kconfig options for Armv8-R64***:
> > >   Unlike Armv8-A, because of the lack of MMU support on Armv8-R64, we
> > >   cannot expect one Xen binary to run on all machines. Xen images are
> > >   not common across Armv8-R64 platforms. Xen must be re-built for
> > >   different Armv8-R64 platforms, because these platforms may have
> > >   different memory layouts and link addresses.
> > >     - `ARM64_V8R`:
> > >       This option enables the Armv8-R profile for Arm64. Enabling this
> > >       option results in selecting MPU. This Kconfig option is used to
> > >       gate some Armv8-R64 specific code except MPU code, like some code
> > >       for Armv8-R64-only system ID register accesses.
> > >
> > >     - `ARM_MPU`
> > >       This option enables the MPU on the Armv8-R architecture. Enabling
> > >       this option results in disabling the MMU. This Kconfig option is
> > >       used to gate some ARM_MPU specific code. Once this Kconfig option
> > >       has been enabled, the MMU-related code will not be built for
> > >       Armv8-R64. The reason we do not rely on runtime detection to
> > >       select MMU or MPU is that we don't think we can use one image for
> > >       both Armv8-R64 and Armv8-A64. Another reason we separate MPU and
> > >       V8R is as a provision to allow supporting the MPU on 32-bit Arm
> > >       one day.
> > >
> > >     - `XEN_START_ADDRESS`
> > >       This option allows setting the custom address at which Xen will be
> > >       linked. This address must be aligned to a page size. Xen's
> > >       run-time addresses are the same as the link-time addresses.
> > >       Different platforms may have different memory layouts. This
> > >       Kconfig option provides users the ability to select proper link
> > >       addresses for their boards.
> > >       ***Notes: A fixed link address means the Xen binary could not***
> > >       ***be relocated by an EFI loader. So at the current stage, Xen***
> > >       ***could not be launched as an EFI application on Armv8-R64.***
> > >
> > >     - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
> > >       `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
> > >       These Kconfig options allow setting memory regions for Xen code,
> > >       data and device memory. Before parsing memory information from the
> > >       device tree, Xen will use the values stored in these options to
> > >       set up the boot-time MPU configuration. Why do we need a boot-time
> > >       MPU configuration?
> > >       1. More deterministic: The Arm MPU supports background regions.
> > >          If we don't configure MPU regions and don't enable the MPU, we
> > >          can enable the MPU background regions. But that means all RAM
> > >          is RWX. Random values in RAM or maliciously embedded data can
> > >          be exploited. Using these Kconfig options allows users to have
> > >          a deterministic RAM area to execute code.
> > >       2. More compatible: On some Armv8-R64 platforms, if the MPU is
> > >          disabled, the `dc zva` instruction will make the system halt.
> > >          And this instruction will be embedded in some built-in
> > >          functions, like `memset`. If we use `-ddont_use_dc` to rebuild
> > >          GCC, the built-in functions will not contain `dc zva`. However,
> > >          it is obviously unlikely that we will be able to recompile all
> > >          GCC for Armv8-R64.
> > >       3. One optional idea:
> > >          We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` or
> > >          `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end` for MPU
> > >          normal memory. It's enough to let Xen run at boot time.
> >
> > I can imagine that we need to have a different Xen build for each
> > ARMv8-R platform. Do you envision that XEN_START_ADDRESS and
> > ARM_MPU_*_MEMORY_START/END are preconfigured based on the platform
> > choice at build time? I don't think we want a user to provide all of
> > those addresses by hand, right?
> 
> Yes, this is on our TODO list. We want to reuse the current arm/platforms
> and Kconfig menu for Armv8-R.
 
OK, good


> > The next question is whether we could automatically generate
> > XEN_START_ADDRESS and ARM_MPU_*_MEMORY_START/END based on the platform
> > device tree at build time (at build time, not runtime). That would
> > make things a lot easier and it is also aligned with the way Zephyr and
> > other RTOSes and baremetal apps work.
> 
> It's an option worth considering. But here we may encounter some problems
> that need to be solved first:
> 1. Must CONFIG_DTB be selected by default on Armv8-R? Without firmware
>    or a bootloader (like U-Boot), we have to build the DTB into the Xen binary.

CONFIG_DTB should trigger runtime support for device tree, while here we
are talking about build time support for device tree. It is very
different.

Just to make an example, the whole build-time device tree could be
scanned by Makefiles and other scripts, leading to C header files
generations, but no code in Xen to parse device tree at all.

DTB ---> Makefiles/scripts ---> .h files ---> Makefiles/scripts ---> xen


I am not saying this is the best way to do it, I am only pointing out
that build-time device tree does not imply run-time device tree. Also,
it doesn't imply a DTB built-in the Xen binary (although that is also an
option).

The way many baremetal OSes and RTOSes work is that they take a DTB as
input to the build *only*. From the DTB, the build-time make system
generates #defines and header files that are imported in C.

The resulting RTOS binary doesn't need support for DTB, because all the
right addresses have already been provided as #define by the Make
system.

I don't think we need to go to the extreme of removing DTB support from
Xen on ARMv8-R. I am only saying that if we add build-time device tree
support it would make it easier to support multiple boards without
having to have platform files in Xen for each of them, and we can do
that without any impact on runtime device tree parsing.


>    This can guarantee that the build-time DTB is the same as the runtime
>    DTB. But eventually, we will have firmware and a bootloader before Xen
>    launches (as Arm EBBR requires). In this case, we may not build the DTB
>    into the Xen image, and we can't guarantee the build-time DTB is the
>    same as the runtime DTB.

As mentioned, if we have a build-time DTB we might not need a run-time
DTB. Secondly, I think it is entirely reasonable to expect that the
build-time DTB and the run-time DTB are the same.

It is the same problem with platform files: we have to assume that the
information in the platform files matches the runtime DTB.


> 2. If the build-time DTB is the same as the runtime DTB, how can we
>    determine XEN_START_ADDRESS within the DTB-described memory range?
>    Should we always limit Xen to booting from the lowest address? Or will
>    we introduce some new DT property to specify the Xen start address?
>    I think this DT property could also solve question #1 above.
 
The loading address should be automatically chosen by the build scripts.
We can do that now with ImageBuilder [1]: it selects a 2MB-aligned
address for each binary to load, one by one starting from a 2MB offset
from start of memory.

[1] https://gitlab.com/ViryaOS/imagebuilder/-/blob/master/scripts/uboot-script-gen#L390

So the build scripts can select XEN_START_ADDRESS based on the
memory node information on the build-time device tree. And there should
be no need to add XEN_START_ADDRESS to the runtime device tree.


> > The device tree can be given as input to the build system, and the
> > Makefiles would take care of generating XEN_START_ADDRESS and
> > ARM_MPU_*_MEMORY_START/END based on /memory and other interesting nodes.
> >
> 
> If we can solve the above questions, yes, the device tree is a good idea
> for XEN_START_ADDRESS. For ARM_MPU_NORMAL_MEMORY_*, we can get them from
> memory nodes, but for ARM_MPU_DEVICE_MEMORY_*, it is not easy for us to
> scan all device nodes. And it's very tricky if the memory regions are
> interleaved. So in our current RFC code, we chose to use the optional idea:
> We map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` for MPU normal memory.
> But we use mpu,device-memory-section in the DT for MPU device memory.

Keep in mind that we are talking about build-time scripts: it doesn't
matter if they are slow. We can scan the build-time dtb as many times as
needed and generate ARM_MPU_DEVICE_MEMORY_* as appropriate. It might
make "make xen" slower but runtime will be unaffected.

So, I don't think this is a problem.


> > > - ***Define new system registers for compilers***:
> > >   Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> > >   specific system registers. As Armv8-R64 only has the Secure state, at
> > >   least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen. And the first
> > >   GCC version that supports Armv8.4 is GCC 8.1. In addition to these,
> > >   the PMSA of Armv8-R64 introduced lots of MPU-related system registers:
> > >   `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and
> > >   `MPUIR_ELx`. But the first GCC version to support these system
> > >   registers is GCC 11. So we have two ways to make compilers work
> > >   properly with these system registers.
> > >   1. Bump the GCC version to GCC 11.
> > >      The pro of this method is that we don't need to encode these
> > >      system registers in macros by ourselves. But the cons are that
> > >      we have to update Makefiles to support GCC 11 for Armv8-R64:
> > >      1.1. Check for GCC version 11 for Armv8-R64.
> > >      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> > >      1.3. Resolve the conflict between march=armv8r and mcpu=generic.
> > >     These changes will affect common Makefiles, not only Arm Makefiles.
> > >     And GCC 11 is new; lots of toolchains and distros haven't supported
> > >     it yet.
> > >
> > >   2. Encode new system registers in macros ***(preferred)***
> > >         ```
> > >         /* Virtualization Secure Translation Control Register */
> > >         #define VSTCR_EL2  S3_4_C2_C6_2
> > >         /* Virtualization System Control Register */
> > >         #define VSCTLR_EL2 S3_4_C2_C0_0
> > >         /* EL1 MPU Protection Region Base Address Register encode */
> > >         #define PRBAR_EL1  S3_0_C6_C8_0
> > >         ...
> > >         /* EL2 MPU Protection Region Base Address Register encode */
> > >         #define PRBAR_EL2  S3_4_C6_C8_0
> > >         ...
> > >         ```
> > >      If we encode all the above system registers, we don't need to bump
> > >      the GCC version. And the common CFLAGS Xen is using can still be
> > >      applied to Armv8-R64. We don't need to modify Makefiles to add
> > >      specific CFLAGS.
> >
> > I think that's fine and we did something similar with the original ARMv7-A
> > port if I remember correctly.
> >
> >
> > > ### **2.2. Changes of the initialization process**
> > > In general, we still expect Armv8-R64 and Armv8-A64 to have a
> > > consistent initialization process. Apart from some architectural
> > > differences, which we will distinguish through CONFIG_ARM_MPU or
> > > CONFIG_ARM64_V8R, the code is reusable. We want most of the
> > > initialization code to be reusable between Armv8-R64 and Armv8-A64.
> >
> > +1
> >
> >
> > > - We will reuse the original head.S and setup.c of Arm, but replace the
> > >   MMU and page table operations in these files with configuration
> > >   operations for the MPU and MPU regions.
> > >
> > > - We provide a boot-time MPU configuration. This MPU configuration will
> > >   allow Xen to finish its initialization. And this boot-time MPU
> > >   configuration will record the memory regions that will be parsed from
> > >   the device tree.
> > >
> > >   At the end of Xen initialization, we will use a runtime MPU
> > >   configuration to replace the boot-time MPU configuration. The runtime
> > >   MPU configuration will merge and reorder memory regions to save more
> > >   MPU regions for guests.
> > >   ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> > >
> > > - Defer unpausing domains.
> > >   When Xen initialization is about to end, Xen unpauses guests created
> > >   during initialization. But this will cause some issues. The unpause
> > >   action occurs before free_init_memory, however the runtime MPU
> > >   configuration is built after free_init_memory.
> > >
> > >   So if the unpaused guests start executing a context switch at this
> > >   point, their MPU context will be based on the boot-time MPU
> > >   configuration. It will probably be inconsistent with the runtime MPU
> > >   configuration, which will cause unexpected problems (this may not
> > >   happen in a single-core system, but on SMP systems this problem is
> > >   foreseeable, so we hope to solve it from the beginning).
> > >
> > > ### **2.3. Changes to reduce memory fragmentation**
> > >
> > > In general, memory in a Xen system can be classified into 4 classes:
> > > `image sections`, `heap sections`, `guest RAM`, `boot modules (guest
> > > kernel, initrd and dtb)`
> > >
> > > Currently, Xen doesn't have any restrictions on how users allocate
> > > memory for different classes. That means users can place boot modules
> > > anywhere, can reserve Xen heap memory anywhere and can allocate guest
> > > memory anywhere.
> > >
> > > In a VMSA system, this would not be too much of a problem, since the
> > > MMU can manage memory at a granularity of 4KB after all. But in a
> > > PMSA system, this will be a big problem. On Armv8-R64, the max number
> > > of MPU protection regions is limited to 256. But in typical processor
> > > implementations, few processors will have more than 32 MPU protection
> > > regions. Add in the fact that Xen shares MPU protection regions with
> > > guests' EL1 Stage 2, and it becomes even more important to properly
> > > plan the use of MPU protection regions.
> > >
> > > - An ideal memory usage layout restriction:
> > > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> > > 1. Reserve proper MPU regions for the Xen image (code, rodata and data + bss).
> > > 2. Reserve one MPU region for boot modules.
> > >    That means the placement of all boot modules, including guest
> > >    kernels, initrds and dtbs, will be limited to this MPU-region
> > >    protected area.
> > > 3. Reserve one or more MPU regions for the Xen heap.
> > >    On Armv8-R64, guest memory is predefined in the device tree; it will
> > >    not be allocated from the heap. Unlike Armv8-A64, we will not move
> > >    all free memory to the heap. We want the Xen heap to be
> > >    deterministic too, so Xen on Armv8-R64 also relies on the Xen static
> > >    heap feature. The memory for the Xen heap will be defined in the
> > >    device tree too. Considering that physical memory can also be
> > >    discontinuous, one or more MPU protection regions need to be
> > >    reserved for the Xen heap.
> > > 4. If we name the above used MPU protection regions PART_A, and name
> > >    the remaining MPU protection regions PART_B:
> > >    4.1. In hypervisor context, Xen will map the remaining RAM and
> > >         devices to PART_B. This will give Xen the ability to access the
> > >         whole memory.
> > >    4.2. In guest context, Xen will create EL1 stage 2 mappings in PART_B.
> > >         In this case, Xen just needs to update PART_B on context switch,
> > >         but keeps PART_A fixed.
> >
> > I think that the memory layout and restrictions that you wrote above
> > make sense. I have some comments on the way they are represented in
> > device tree, but that's different.
> >
> >
> > > ***Notes: Static allocation will be mandatory on MPU based systems***
> > >
> > > **A sample device tree of memory layout restriction**:
> > > ```
> > > chosen {
> > >     ...
> > >     /*
> > >      * Define a section to place boot modules,
> > >      * all boot modules must be placed in this section.
> > >      */
> > >     mpu,boot-module-section = <0x10000000 0x10000000>;
> > >     /*
> > >      * Define a section to cover all guest RAM. All guest RAM must be
> > >      * located within this section. The pro is that, in the best case,
> > >      * we need only one MPU protection region to map all guest RAM for Xen.
> > >      */
> > >     mpu,guest-memory-section = <0x20000000 0x30000000>;
> > >     /*
> > >      * Define a memory section that can cover all device memory that
> > >      * will be used in Xen.
> > >      */
> > >     mpu,device-memory-section = <0x80000000 0x7ffff000>;
> > >     /* Define a section for Xen heap */
> > >     xen,static-mem = <0x50000000 0x20000000>;
> >
> > As mentioned above, I understand the need for these sections, but why do
> > we need to describe them in device tree at all? Could Xen select them by
> > itself during boot?
> 
> I think without some inputs, Xen could not do this, or would do it based
> on some assumption. For example, assume the boot-module-section is
> determined by the lowest and highest addresses of all modules. The same
> goes for guest-memory-section, calculated from all guests' allocated
> memory regions.

Right, I think that the mpu,boot-module-section should be generated by a
set of scripts like ImageBuilder: something with a list of all the
binaries that need to be loaded and also the DTB at build time. Something
like ImageBuilder would have the ability to add "mpu,boot-module-section"
to the device tree automatically and choose a good address for it.
 
As an example, today ImageBuilder takes as input a config file like the
following:

---
MEMORY_START="0x0"
MEMORY_END="0x80000000"

DEVICE_TREE="4.16-2022.1/mpsoc.dtb"
XEN="4.16-2022.1/xen"
DOM0_KERNEL="4.16-2022.1/Image-dom0-5.16"
DOM0_RAMDISK="4.16-2022.1/xen-rootfs.cpio.gz"

NUM_DOMUS=1
DOMU_KERNEL[0]="4.16-2022.1/Image-domU"
DOMU_RAMDISK[0]="4.16-2022.1/initrd.cpio"
DOMU_PASSTHROUGH_DTB[0]="4.16-2022.1/passthrough-example-sram.dtb"
---

And generates a U-Boot boot.scr script with:
- load addresses for each binary
- commands to edit the DTB to add those addresses to device tree (e.g.
  dom0less kernels addresses)

ImageBuilder can also modify the DTB at build time instead (instead of
doing it from boot.scr). See FDTEDIT.

I am not saying we should use ImageBuilder, but it sounds like we need
something similar.


> > If not, and considering that we have to generate
> > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make sense to
> > also generate mpu,guest-memory-section, xen,static-mem, etc. at build
> > time rather than passing it via device tree to Xen at runtime?
> >
> 
> Did you mean we still add this information in the device tree, but for
> build time only, and at runtime we don't parse it?

Yes, something like that, but see below.


> > What's the value of doing ARM_MPU_*_MEMORY_START/END at build time and
> > everything else at runtime?
> 
> ARM_MPU_*_MEMORY_START/END is defined by the platform. But the other
> things are user-customized. Users can change their usage without
> rebuilding the image.
 
Good point.

We don't want to have to rebuild Xen if the user updated a guest kernel,
resulting in a larger boot-module-section.

So I think it makes sense that "mpu,boot-module-section" is generated by
the scripts (e.g. ImageBuilder) at build time, and Xen reads the
property at boot from the runtime device tree.

I think we need to divide the information into two groups:


# Group1: board info

This information is platform specific and it is not meant to change
depending on the VM configuration. Ideally, we build Xen for a platform
once, then we can use the same Xen binary together with any combination
of dom0/domU kernels and ramdisks.

This kind of information doesn't need to be exposed to the runtime
device tree. But we can still use a build-time device tree to generate
the addresses if it is convenient.

XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and ARM_MPU_NORMAL_MEMORY_*
seem to be part of this group.


# Group2: boot configuration

This information is about the specific set of binaries and VMs that we
need to boot. It is conceptually similar to the dom0less device tree
nodes that we already have. If we change one of the VM binaries, we
likely have to refresh the information here.

"mpu,boot-module-section" probably belongs to this group (unless we find
a way to define "mpu,boot-module-section" generically so that we don't
need to change it any time the set of boot modules change.)


> > It looks like we are forced to have the sections definitions at build
> > time because we need them before we can parse device tree. In that case,
> > we might as well define all the sections at build time.
> >
> > But I think it would be even better if Xen could automatically choose
> > xen,static-mem, mpu,guest-memory-section, etc. on its own based on the
> > regular device tree information (/memory, /amba, etc.), without any need
> > for explicitly describing each range with these new properties.
> >
> 
> For mpu,guest-memory-section, with the limitation that there is no other
> usage between different guests' memory nodes, this is OK. But for
> xen,static-mem (heap), we just want everything on an MPU system to be
> deterministic. But, of course, Xen can select the remaining memory for
> the heap without static-mem.

It is good that you think they can be chosen by Xen.

Differently from "boot-module-section", which has to do with the boot
modules selected by the user for a specific execution,
guest-memory-section and static-mem are Xen specific memory
policies/allocations.

A user wouldn't know how to fill them in. And I worry that even a script
like ImageBuilder wouldn't be the best place to pick these values --
they seem too "important" to leave to a script.

But it seems possible to choose the values in Xen:
- Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at build time
- Xen reads boot-module-section from device tree

It should be possible at this point for Xen to pick the best values for
guest-memory-section and static-mem based on the memory available.


> > >     domU1 {
> > >         ...
> > >         #xen,static-mem-address-cells = <0x01>;
> > >         #xen,static-mem-size-cells = <0x01>;
> > >         /* Statically allocated guest memory, within mpu,guest-memory-section */
> > >         xen,static-mem = <0x30000000 0x1f000000>;
> > >
> > >         module@11000000 {
> > >             compatible = "multiboot,kernel\0multiboot,module";
> > >             /* Boot module address, within mpu,boot-module-section */
> > >             reg = <0x11000000 0x3000000>;
> > >             ...
> > >         };
> > >
> > >         module@10FF0000 {
> > >                 compatible = "multiboot,device-tree\0multiboot,module";
> > >                 /* Boot module address, within mpu,boot-module-section */
> > >                 reg = <0x10ff0000 0x10000>;
> > >                 ...
> > >         };
> > >     };
> > > };
> > > ```


* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-25 20:12     ` Julien Grall
@ 2022-03-01  6:29       ` Wei Chen
  2022-03-01 13:17         ` Julien Grall
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-01  6:29 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 26 February 2022 4:12
> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> On 25/02/2022 10:48, Wei Chen wrote:
> >> >     Armv8-R64 can support max to 256 MPU regions. But that's just
> >> theoretical.
> >> >     So we don't want to define `pr_t mpu_regions[256]`, this is a
> memory
> >> waste
> >> >     in most of time. So we decided to let the user specify through a
> >> Kconfig
> >> >     option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default value can
> be
> >> `32`,
> >> >     it's a typical implementation on Armv8-R64. Users will recompile
> Xen
> >> when
> >> >     their platform changes. So when the MPU changes, respecifying the
> >> MPU
> >> >     protection regions number will not cause additional problems.
> >>
> >> I wonder if we could probe the number of MPU regions at runtime and
> >> dynamically allocate the memory needed to store them in arch_vcpu.
> >>
> >
> > We have considered to used a pr_t mpu_regions[0] in arch_vcpu. But it
> seems
> > we will encounter some static allocated arch_vcpu problems and sizeof
> issue.
> 
> Does it need to be embedded in arch_vcpu? If not, then we could allocate
> memory outside and add a pointer in arch_vcpu.
> 

We had thought of using a pointer in arch_vcpu instead of embedding mpu_regions
into arch_vcpu. But we noticed that arch_vcpu has a __cacheline_aligned
attribute, presumably because arch_vcpu is accessed very frequently on some
critical paths. So using a pointer for mpu_regions may cause cache misses
on these critical paths, for example in context_switch.

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-25 20:55 ` Julien Grall
@ 2022-03-01  7:51   ` Wei Chen
  2022-03-02  7:21     ` Penny Zheng
  2022-03-02 12:00     ` Julien Grall
  0 siblings, 2 replies; 34+ messages in thread
From: Wei Chen @ 2022-03-01  7:51 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2022年2月26日 4:55
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org; Stefano
> Stabellini <sstabellini@kernel.org>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> Thank you for sending the proposal. Please find some comments below.
> 
> On 24/02/2022 06:01, Wei Chen wrote:
> > # Proposal for Porting Xen to Armv8-R64
> >
> > This proposal will introduce the PoC work of porting Xen to Armv8-R64,
> > which includes:
> > - The changes of current Xen capability, like Xen build system, memory
> >    management, domain management, vCPU context switch.
> > - The expanded Xen capability, like static-allocation and direct-map.
> >
> > ***Notes:***
> > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> >     ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> >     ***Trusted-Frimware (TF-R). This is an external dependency,***
> >     ***so we think the discussion of Xen SMP support on Armv8-R64***
> >     ***should be started when single-CPU support is complete.***
> 
> I agree that we should first focus on single-CPU support.
> 

ack.

> > 2. ***This proposal will not touch xen-tools. In current stage,***
> >     ***Xen on Armv8-R64 only support dom0less, all guests should***
> >     ***be booted from device tree.***
> 
> Make sense. I actually expect some issues in the way xen-tools would
> need to access memory of the domain that is been created.
> 

Yes, we also feel that changes to xen-tools could be a big job in the
future (both the Xen common implementation and the tools need changes).

> [...]
> 
> > ### 1.2. Xen Challenges with PMSA Virtualization
> > Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to run
> > with an MPU and host multiple guest OSes.
> >
> > - No MMU at EL2:
> >      - No EL2 Stage 1 address translation
> >          - Xen provides fixed ARM64 virtual memory layout as basis of
> EL2
> >            stage 1 address translation, which is not applicable on MPU
> system,
> >            where there is no virtual addressing. As a result, any
> operation
> >            involving transition from PA to VA, like ioremap, needs
> modification
> >            on MPU system.
> >      - Xen's run-time addresses are the same as the link time addresses.
> >          - Enable PIC (position-independent code) on a real-time target
> >            processor probably very rare.
> 
> Aside the assembly boot code and UEFI stub, Xen already runs at the same
> address as it was linked.
> 

But the difference is that, with an MMU, we can use the same link address
for all platforms. On an MPU system, we can't do it the same way.

> >      - Xen will need to use the EL2 MPU memory region descriptors to
> manage
> >        access permissions and attributes for accesses made by VMs at
> EL1/0.
> >          - Xen currently relies on MMU EL1 stage 2 table to manage these
> >            accesses.
> > - No MMU Stage 2 translation at EL1:
> >      - A guest doesn't have an independent guest physical address space
> >      - A guest can not reuse the current Intermediate Physical Address
> >        memory layout
> >      - A guest uses physical addresses to access memory and devices
> >      - The MPU at EL2 manages EL1 stage 2 access permissions and
> attributes
> > - There are a limited number of MPU protection regions at both EL2 and
> EL1:
> >      - Architecturally, the maximum number of protection regions is 256,
> >        typical implementations have 32.
> >      - By contrast, Xen does not need to consider the number of page
> table
> >        entries in theory when using MMU.
> > - The MPU protection regions at EL2 need to be shared between the
> hypervisor
> >    and the guest stage 2.
> >      - Requires careful consideration - may impact feature 'fullness' of
> both
> >        the hypervisor and the guest
> >      - By contrast, when using MMU, Xen has standalone P2M table for
> guest
> >        stage 2 accesses.
> 
> [...]
> 
> > - ***Define new system registers for compilers***:
> >    Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> >    specific system registers. As Armv8-R64 only have secure state, so
> >    at least, `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen. And the
> >    first GCC version that supports Armv8.4 is GCC 8.1. In addition to
> >    these, PMSA of Armv8-R64 introduced lots of MPU related system
> registers:
> >    `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and
> >    `MPUIR_ELx`. But the first GCC version to support these system
> registers
> >    is GCC 11. So we have two ways to make compilers to work properly
> with
> >    these system registers.
> >    1. Bump GCC version to GCC 11.
> >       The pros of this method is that, we don't need to encode these
> >       system registers in macros by ourselves. But the cons are that,
> >       we have to update Makefiles to support GCC 11 for Armv8-R64.
> >       1.1. Check the GCC version 11 for Armv8-R64.
> >       1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> >       1.3. Solve the confliction of march=armv8r and mcpu=generic
> >      These changes will affect common Makefiles, not only Arm Makefiles.
> >      And GCC 11 is new, lots of toolchains and Distro haven't supported
> it.
> 
> I agree that forcing to use GCC11 is not a good idea. But I am not sure
> to understand the problem with the -march=.... Ultimately, shouldn't we
> aim to build Xen ARMv8-R with -march=armv8r?
> 

Actually, we had done that, but we reverted it from the RFC patch series.
One reason has been listed above, but that is not the major one. The main
reason is that Armv8-R AArch64 supports the A64 ISA instruction set with
some modifications: it redefines DMB and DSB, and adds a DFB. But the
encodings of DMB and DSB are still the same as in A64, and DFB is an alias
of DSB #12.

In this case, we don't think we need a new arch flag to generate new
instructions for Armv8-R. And we have discussed this with the Arm kernel
folks; they will not update the build system to build a Linux that runs
at Armv8-R64 EL1 either.


> [...]
> 
> > ### **2.2. Changes of the initialization process**
> > In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
> > initialization process. In addition to some architecture differences,
> there
> > is no more than reusable code that we will distinguish through
> CONFIG_ARM_MPU
> > or CONFIG_ARM64_V8R. We want most of the initialization code to be
> reusable
> > between Armv8-R64 and Armv8-A64.
> >
> > - We will reuse the original head.s and setup.c of Arm. But replace the
> >    MMU and page table operations in these files with configuration
> operations
> >    for MPU and MPU regions.
> >
> > - We provide a boot-time MPU configuration. This MPU configuration will
> >    support Xen to finish its initialization. And this boot-time MPU
> >    configuration will record the memory regions that will be parsed from
> >    device tree.
> >
> >    In the end of Xen initialization, we will use a runtime MPU
> configuration
> >    to replace boot-time MPU configuration. The runtime MPU configuration
> will
> >    merge and reorder memory regions to save more MPU regions for guests.
> >    ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqR
> DoacQVTwUtWIGU)
> >
> > - Defer system unpausing domain.
> >    When Xen initialization is about to end, Xen unpause guests created
> >    during initialization. But this will cause some issues. The unpause
> >    action occurs before free_init_memory, however the runtime MPU
> configuration
> >    is built after free_init_memory.
> 
> I was half expecting that free_init_memory() would not be called for Xen
> Armv8R.
>

We had called free_init_memory for Xen on Armv8-R, but it doesn't mean much
there. As we have a static heap, we don't reclaim init memory into the heap,
and the reclaimed memory could not be used for Xen data or bss either. But
from a security perspective, free_init_memory drops the Xen init code & data,
which reduces the code an attacker can exploit.

> >
> >    So if the unpaused guests start executing the context switch at this
> >    point, then its MPU context will base on the boot-time MPU
> configuration.
> 
> Can you explain why you want to switch the MPU configuration that late?
> 

In the boot stage, Xen is the only user of the MPU. It may add some memory
nodes or device memory to MPU regions for temporary usage. After freeing
init memory, we want to reclaim these MPU regions so that more of them can
be used for guests. We will also do some merging and reordering, which makes
the MPU regions easier to manage during guest context switch.

> >    Probably it will be inconsistent with runtime MPU configuration, this
> >    will cause unexpected problems (This may not happen in a single core
> >    system, but on SMP systems, this problem is foreseeable, so we hope
> to
> >    solve it at the beginning).
> 
> [...]
> 
> > ### **2.4. Changes of memory management**
> > Xen is coupled with VMSA, in order to port Xen to Armv8-R64, we have to
> > decouple Xen from VMSA. And give Xen the ability to manage memory in
> PMSA.
> >
> > 1. ***Use buddy allocator to manage physical pages for PMSA***
> >     From the view of physical page, PMSA and VMSA don't have any
> difference.
> >     So we can reuse buddy allocator on Armv8-R64 to manage physical
> pages.
> >     The difference is that, in VMSA, Xen will map allocated pages to
> virtual
> >     addresses. But in PMSA, Xen just convert the pages to physical
> address.
> >
> > 2. ***Can not use virtual address for memory management***
> >     As Armv8-R64 only has PMSA in EL2, Xen loses the ability of using
> virtual
> >     address to manage memory. This brings some problems, some virtual
> address
> >     based features could not work well on Armv8-R64, like `FIXMAP`,
> `vmap/vumap`,
> >     `ioremap` and `alternative`.
> >
> >     But the functions or macros of these features are used in lots of
> common
> >     code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to gate relate
> code
> >     everywhere. In this case, we propose to use stub helpers to make the
> changes
> >     transparently to common code.
> >     1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
> operations.
> >        This will return physical address directly of fixmapped item.
> >     2. For `vmap/vumap`, we will use some empty inline stub helpers:
> >          ```
> >          static inline void vm_init_type(...) {}
> >          static inline void *__vmap(...)
> >          {
> >              return NULL;
> >          }
> >          static inline void vunmap(const void *va) {}
> >          static inline void *vmalloc(size_t size)
> >          {
> >              return NULL;
> >          }
> >          static inline void *vmalloc_xen(size_t size)
> >          {
> >              return NULL;
> >          }
> >          static inline void vfree(void *va) {}
> >          ```
> >
> >     3. For `ioremap`, it depends on `vmap`. As we have make `vmap` to
> always
> >        return `NULL`, they could not work well on Armv8-R64 without
> changes.
> >        `ioremap` will return input address directly.
> >          ```
> >          static inline void *ioremap_attr(...)
> >          {
> >              /* We don't have the ability to change input PA cache
> attributes */
> OOI, who will set them?

Callers that want to change a memory range's attributes will set them, via
something like ioremap_nocache. I am not sure if this is what you asked : )

> 
> >              if ( CACHE_ATTR_need_change )
> >                  return NULL;
> >              return (void *)pa;
> >          }
> >          static inline void __iomem *ioremap_nocache(...)
> >          {
> >              return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
> >          }
> >          static inline void __iomem *ioremap_cache(...)
> >          {
> >              return ioremap_attr(start, len, PAGE_HYPERVISOR);
> >          }
> >          static inline void __iomem *ioremap_wc(...)
> >          {
> >              return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> >          }
> >          void *ioremap(...)
> >          {
> >              return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> >          }
> >
> >          ```
> >      4. For `alternative`, it depends on `vmap` too.
> 
> The only reason we depend on vmap() is because the map the sections
> *text read-only and we enforce WnX. For VMSA, it would be possible to
> avoid vmap() with some rework. I don't know for PMSA.
> 

For PMSA, we still enforce WnX. For your use case, I assume it's alternative.
There may still be a way to avoid vmap(), but with some security implications.
We had thought of copying VMSA alternative's behaviour with a disable MPU ->
update Xen text -> enable MPU sequence. The problem with this, however, is
that at some point all memory is RWX, which may be a security risk. But
because it happens in the init stage, it probably doesn't matter as much as
I thought.

> > We will simply disable
> >         it on Armv8-R64 in current stage. How to implement `alternative`
> >         on Armv8-R64 is better to be discussed after basic functions of
> Xen
> >         on Armv8-R64 work well.
> alternative are mostly helpful to handle errata or enable features that
> are not present on all CPUs. I wouldn't expect this to be necessary at
> the beginning. In fact, on Arm, it was introduced > 4 years after the
> initial port :).

I hope it won't take us so long, this time : )

> 
> [...]
> 
> > ### **2.5. Changes of device driver**
> > 1. Because Armv8-R64 only has single secure state, this will affect some
> > devices that have two secure state, like GIC. But fortunately, most
> > vendors will not link a two secure state GIC to Armv8-R64 processors.
> > Current GIC driver can work well with single secure state GIC for Armv8-
> R64.
> > 2. Xen should use secure hypervisor timer in Secure EL2. We will
> introduce
> > a CONFIG_ARM_SECURE_STATE to make Xen to use secure registers for timer.
> >
> > ### **2.7. Changes of virtual device**
> > Currently, we only support pass-through devices in guest. Because event
> > channel, xen-bus, xen-storage and other advanced Xen features haven't
> been
> > enabled in Armv8-R64.
> 
> That's fine. I expect to require quite a bit of work to move from Xen
> sharing the pages (e.g. like for grant-tables) to the guest sharing pages.
> 

Yes.

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-02-25 23:54     ` Stefano Stabellini
@ 2022-03-01 12:55       ` Wei Chen
  2022-03-01 23:38         ` Stefano Stabellini
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-01 12:55 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, julien, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 2022年2月26日 7:54
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org; julien@xen.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> On Fri, 25 Feb 2022, Wei Chen wrote:
> > > Hi Wei,
> > >
> > > This is extremely exciting, thanks for the very nice summary!
> > >
> > >
> > > On Thu, 24 Feb 2022, Wei Chen wrote:
> > > > # Proposal for Porting Xen to Armv8-R64
> > > >
> > > > This proposal will introduce the PoC work of porting Xen to Armv8-
> R64,
> > > > which includes:
> > > > - The changes of current Xen capability, like Xen build system,
> memory
> > > >   management, domain management, vCPU context switch.
> > > > - The expanded Xen capability, like static-allocation and direct-map.
> > > >
> > > > ***Notes:***
> > > > 1. ***This proposal only covers the work of porting Xen to Armv8-
> R64***
> > > >    ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> > > >    ***Trusted-Frimware (TF-R). This is an external dependency,***
> > > >    ***so we think the discussion of Xen SMP support on Armv8-R64***
> > > >    ***should be started when single-CPU support is complete.***
> > > > 2. ***This proposal will not touch xen-tools. In current stage,***
> > > >    ***Xen on Armv8-R64 only support dom0less, all guests should***
> > > >    ***be booted from device tree.***
> > > >
> > > > ## 1. Essential Background
> > > >
> > > > ### 1.1. Armv8-R64 Profile
> > > > The Armv-R architecture profile was designed to support use cases
> that
> > > > have a high sensitivity to deterministic execution. (e.g. Fuel
> Injection,
> > > > Brake control, Drive trains, Motor control etc)
> > > >
> > > > Arm announced Armv8-R in 2013, it is the latest generation Arm
> > > architecture
> > > > targeted at the Real-time profile. It introduces virtualization at
> the
> > > highest
> > > > security level while retaining the Protected Memory System
> Architecture
> > > (PMSA)
> > > > based on a Memory Protection Unit (MPU). In 2020, Arm announced
> Cortex-
> > > R82,
> > > > which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
> > > >
> > > > - The latest Armv8-R64 document can be found here:
> > > >   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R
> > > AArch64 architecture
> > > profile](https://developer.arm.com/documentation/ddi0600/latest/).
> > > >
> > > > - Armv-R Architecture progression:
> > > >   Armv7-R -> Armv8-R AArch32 -> Armv8 AArch64
> > > >   The following figure is a simple comparison of "R" processors
> based on
> > > >   different Armv-R Architectures.
> > > >   ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2K
> PZ8i
> > > mBpbvIr2eqBguEB)
> > > >
> > > > - The Armv8-R architecture evolved additional features on top of
> Armv7-R:
> > > >     - An exception model that is compatible with the Armv8-A model
> > > >     - Virtualization with support for guest operating systems
> > > >         - PMSA virtualization using MPUs In EL2.
> > > > - The new features of Armv8-R64 architecture
> > > >     - Adds support for the 64-bit A64 instruction set, previously
> Armv8-
> > > R
> > > >       only supported A32.
> > > >     - Supports up to 48-bit physical addressing, previously up to
> 32-bit
> > > >       addressing was supported.
> > > >     - Optional Arm Neon technology and Advanced SIMD
> > > >     - Supports three Exception Levels (ELs)
> > > >         - Secure EL2 - The Highest Privilege, MPU only, for firmware,
> > > hypervisor
> > > >         - Secure EL1 - RichOS (MMU) or RTOS (MPU)
> > > >         - Secure EL0 - Application Workloads
> > > >     - Optionally supports Virtual Memory System Architecture at S-
> EL1/S-
> > > EL0.
> > > >       This means it's possible to run rich OS kernels - like Linux -
> > > either
> > > >       bare-metal or as a guest.
> > > > - Differences with the Armv8-A AArch64 architecture
> > > >     - Supports only a single Security state - Secure. There is not
> Non-
> > > Secure
> > > >       execution state supported.
> > > >     - EL3 is not supported, EL2 is mandatory. This means secure EL2
> is
> > > the
> > > >       highest EL.
> > > >     - Supports the A64 ISA instruction
> > > >         - With a small set of well-defined differences
> > > >     - Provides a PMSA (Protected Memory System Architecture) based
> > > >       virtualization model.
> > > >         - As opposed to Armv8-A AArch64's VMSA based Virtualization
> > > >         - Can support address bits up to 52 if FEAT_LPA is enabled,
> > > >           otherwise 48 bits.
> > > >         - Determines the access permissions and memory attributes of
> > > >           the target PA.
> > > >         - Can implement PMSAv8-64 at EL1 and EL2
> > > >             - Address translation flat-maps the VA to the PA for EL2
> > > Stage 1.
> > > >             - Address translation flat-maps the VA to the PA for EL1
> > > Stage 1.
> > > >             - Address translation flat-maps the IPA to the PA for
> EL1
> > > Stage 2.
> > > >     - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.
> > > >
> > > > ### 1.2. Xen Challenges with PMSA Virtualization
> > > > Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to
> run
> > > > with an MPU and host multiple guest OSes.
> > > >
> > > > - No MMU at EL2:
> > > >     - No EL2 Stage 1 address translation
> > > >         - Xen provides fixed ARM64 virtual memory layout as basis of
> EL2
> > > >           stage 1 address translation, which is not applicable on
> MPU
> > > system,
> > > >           where there is no virtual addressing. As a result, any
> > > operation
> > > >           involving transition from PA to VA, like ioremap, needs
> > > modification
> > > >           on MPU system.
> > > >     - Xen's run-time addresses are the same as the link time
> addresses.
> > > >         - Enable PIC (position-independent code) on a real-time
> target
> > > >           processor probably very rare.
> > > >     - Xen will need to use the EL2 MPU memory region descriptors to
> > > manage
> > > >       access permissions and attributes for accesses made by VMs at
> > > EL1/0.
> > > >         - Xen currently relies on MMU EL1 stage 2 table to manage
> these
> > > >           accesses.
> > > > - No MMU Stage 2 translation at EL1:
> > > >     - A guest doesn't have an independent guest physical address
> space
> > > >     - A guest can not reuse the current Intermediate Physical
> Address
> > > >       memory layout
> > > >     - A guest uses physical addresses to access memory and devices
> > > >     - The MPU at EL2 manages EL1 stage 2 access permissions and
> > > attributes
> > > > - There are a limited number of MPU protection regions at both EL2
> and
> > > EL1:
> > > >     - Architecturally, the maximum number of protection regions is
> 256,
> > > >       typical implementations have 32.
> > > >     - By contrast, Xen does not need to consider the number of page
> > > table
> > > >       entries in theory when using MMU.
> > > > - The MPU protection regions at EL2 need to be shared between the
> > > hypervisor
> > > >   and the guest stage 2.
> > > >     - Requires careful consideration - may impact feature 'fullness'
> of
> > > both
> > > >       the hypervisor and the guest
> > > >     - By contrast, when using MMU, Xen has standalone P2M table for
> > > guest
> > > >       stage 2 accesses.
> > > >
> > > > ## 2. Proposed changes of Xen
> > > > ### **2.1. Changes of build system:**
> > > >
> > > > - ***Introduce new Kconfig options for Armv8-R64***:
> > > >   Unlike Armv8-A, because lack of MMU support on Armv8-R64, we may
> not
> > > >   expect one Xen binary to run on all machines. Xen images are not
> > > common
> > > >   across Armv8-R64 platforms. Xen must be re-built for different
> Armv8-
> > > R64
> > > >   platforms. Because these platforms may have different memory
> layout
> > > and
> > > >   link address.
> > > >     - `ARM64_V8R`:
> > > >       This option enables Armv8-R profile for Arm64. Enabling this
> > > option
> > > >       results in selecting MPU. This Kconfig option is used to gate
> some
> > > >       Armv8-R64 specific code except MPU code, like some code for
> Armv8-
> > > R64
> > > >       only system ID registers access.
> > > >
> > > >     - `ARM_MPU`
> > > >       This option enables MPU on ARMv8-R architecture. Enabling this
> > > option
> > > >       results in disabling MMU. This Kconfig option is used to gate
> some
> > > >       ARM_MPU specific code. Once when this Kconfig option has been
> > > enabled,
> > > >       the MMU relate code will not be built for Armv8-R64. The
> reason
> > > why
> > > >       not depends on runtime detection to select MMU or MPU is that,
> we
> > > don't
> > > >       think we can use one image for both Armv8-R64 and Armv8-A64.
> > > Another
> > > >       reason that we separate MPU and V8R in provision to allow to
> > > support MPU
> > > >       on 32bit Arm one day.
> > > >
> > > >     - `XEN_START_ADDRESS`
> > > >       This option allows to set the custom address at which Xen will
> be
> > > >       linked. This address must be aligned to a page size. Xen's
> run-
> > > time
> > > >       addresses are the same as the link time addresses. Different
> > > platforms
> > > >       may have differnt memory layout. This Kconfig option provides
> > > users
> > > >       the ability to select proper link addresses for their boards.
> > > >       ***Notes: Fixed link address means the Xen binary could not
> be***
> > > >       ***relocated by EFI loader. So in current stage, Xen could
> not***
> > > >       ***be launched as an EFI application on Armv8-R64.***
> > > >
> > > >     - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
> > > >       `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
> > > >       These Kconfig options allow to set memory regions for Xen code,
> > > data
> > > >       and device memory. Before parsing memory information from
> device
> > > tree,
> > > >       Xen will use the values that stored in these options to setup
> > > boot-time
> > > >       MPU configuration. Why we need a boot-time MPU configuration?
> > > >       1. More deterministic: Arm MPU supports background regions,
> > > >          if we don't configure the MPU regions and don't enable MPU.
> > > >          We can enable MPU background regions. But that means all
> RAM
> > > >          is RWX. Random values in RAM or maliciously embedded data
> can
> > > >          be exploited. Using these Kconfig options allow users to
> have
> > > >          a deterministic RAM area to execute code.
> > > >       2. More compatible: On some Armv8-R64 platforms, if the MPU is
> > > >          disabled, the `dc zva` instruction will make the system
> halt.
> > > >          And this instruction will be embedded in some built-in
> > > functions,
> > > >          like `memory set`. If we use `-ddont_use_dc` to rebuild GCC,
> > > >          the built-in functions will not contain `dc zva`. However,
> it
> > > is
> > > >          obviously unlikely that we will be able to recompile all
> GCC
> > > >          for ARMv8-R64.
> > > >       3. One optional idea:
> > > >           We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS +
> 2MB` or
> > > >           `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end` for
> > > >           MPU normal memory. It's enough to support Xen run in boot
> time.
> > >
> > > I can imagine that we need to have a different Xen build for each
> > > ARMv8-R platform. Do you envision that XEN_START_ADDRESS and
> > > ARM_MPU_*_MEMORY_START/END are preconfigured based on the platform
> > > choice at build time? I don't think we want a user to provide all of
> > > those addresses by hand, right?
> >
> > Yes, this is in our TODO list. We want to reuse current arm/platforms
> and
> > Kconfig menu for Armv8-R.
> 
> OK, good
> 
> 
> > > The next question is whether we could automatically generate
> > > XEN_START_ADDRESS and ARM_MPU_*_MEMORY_START/END based on the platform
> > > device tree at build time (at build time, not runtime). That would
> > > make things a lot easier and it is also aligned with the way Zephyr
> and
> > > other RTOSes and baremetal apps work.
> >
> > It's a considerable option. But here we may encounter some problems need
> > to be solved first:
> > 1. Does CONFIG_DTB must be selected by default on Armv8-R? Without
> firmware
> >    or bootloader (like u-boot), we have to build DTB into Xen binary.
> 
> CONFIG_DTB should trigger runtime support for device tree, while here we
> are talking about build time support for device tree. It is very
> different.
> 
> Just to make an example, the whole build-time device tree could be
> scanned by Makefiles and other scripts, leading to C header files
> generations, but no code in Xen to parse device tree at all.
> 
> DTB ---> Makefiles/scripts ---> .h files ---> Makefiles/scripts ---> xen
> 

Yes, this is feasible.

> 
> I am not saying this is the best way to do it, I am only pointing out
> that build-time device tree does not imply run-time device tree. Also,
> it doesn't imply a DTB built-in the Xen binary (although that is also an
> option).
> 

I agree.

> The way many baremetal OSes and RTOSes work is that they take a DTB as
> input to the build *only*. From the DTB, the build-time make system
> generates #defines and header files that are imported in C.
> 
> The resulting RTOS binary doesn't need support for DTB, because all the
> right addresses have already been provided as #define by the Make
> system.
> 
> I don't think we need to go to the extreme of removing DTB support from
> Xen on ARMv8-R. I am only saying that if we add build-time device tree
> support it would make it easier to support multiple boards without
> having to have platform files in Xen for each of them, and we can do
> that without any impact on runtime device tree parsing.
> 

As V8R's use cases may mainly focus on real-time/critical scenarios, this
may be a better method than platform files. We would not need to maintain
platform-specific definition header files, and Xen could skip some platform
information parsing at boot time. This would improve Xen's boot speed in
real-time/critical scenarios.

> 
> >    This
> >    can guarantee build-time DTB is the same as runtime DTB. But
> eventually,
> >    we will have firmware and bootloader before Xen launch (as Arm EBBR's
> >    requirement). In this case, we may not build DTB into Xen image. And
> >    we can't guarantee build-time DTB is the same as runtime DTB.
> 
> As mentioned, if we have a build-time DTB we might not need a run-time
> DTB. Secondly, I think it is entirely reasonable to expect that the
> build-time DTB and the run-time DTB are the same.
> 

Yes, if we implement it this way, we should describe it as a limitation
of Armv8-R Xen.

> It is the same problem with platform files: we have to assume that the
> information in the platform files matches the runtime DTB.
> 

Indeed.

> 
> > 2. If build-time DTB is the same as runtime DTB, how can we determine
> >    the XEN_START_ADDRESS in the DTB-described memory range? Should we
> >    always limit Xen to boot from the lowest address? Or will we
> >    introduce some new DT property to specify the Xen start address?
> >    I think this DT property also can solve above question#1.
> 
> The loading address should be automatically chosen by the build scripts.
> We can do that now with ImageBuilder [1]: it selects a 2MB-aligned
> address for each binary to load, one by one starting from a 2MB offset
> from start of memory.
> 
> [1] https://gitlab.com/ViryaOS/imagebuilder/-/blob/master/scripts/uboot-script-gen#L390
> 
> So the build scripts can select XEN_START_ADDRESS based on the
> memory node information on the build-time device tree. And there should
> be no need to add XEN_START_ADDRESS to the runtime device tree.
> 

This is fine if there are no explicit restrictions on the platform.
Some platforms may reserve some memory areas for something like
firmware. But I think it's OK; in the worst case, we can hide such an
area from the build-time DTB.
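As a sketch of how a build script could pick XEN_START_ADDRESS from the build-time DTB's memory node while skipping reserved areas (all values and names below are hypothetical, not from a real platform):

```python
# Sketch: pick a 2MB-aligned XEN_START_ADDRESS from the memory node,
# skipping any platform-reserved areas (e.g. a firmware carve-out).
# All addresses here are illustrative.

MB = 1 << 20

def align_up(addr: int, align: int) -> int:
    return (addr + align - 1) & ~(align - 1)

def pick_xen_start(mem_start: int, mem_size: int,
                   reserved: list[tuple[int, int]],
                   xen_size: int = 2 * MB) -> int:
    """Return the first 2MB-aligned address whose [addr, addr + xen_size)
    window fits in memory and overlaps no reserved (start, size) range."""
    addr = align_up(mem_start, 2 * MB)
    while addr + xen_size <= mem_start + mem_size:
        if all(addr + xen_size <= r or addr >= r + s for r, s in reserved):
            return addr
        addr += 2 * MB
    raise ValueError("no suitable region for Xen")

# Example: 2GB of RAM at 0x0, firmware reserved at [0x0, 0x400000)
print(hex(pick_xen_start(0x0, 0x80000000, [(0x0, 0x400000)])))
```

Since this runs at build time, the chosen value can simply be emitted into the generated header as XEN_START_ADDRESS.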

> 
> > > The device tree can be given as input to the build system, and the
> > > Makefiles would take care of generating XEN_START_ADDRESS and
> > > ARM_MPU_*_MEMORY_START/END based on /memory and other interesting
> > > nodes.
> > >
> >
> > If we can solve the above questions, yes, device tree is a good idea
> > for XEN_START_ADDRESS. For ARM_MPU_NORMAL_MEMORY_*, we can get them
> > from memory nodes, but for ARM_MPU_DEVICE_MEMORY_*, it is not easy
> > for us to scan all devices' nodes. And it's very tricky if the memory
> > regions are interleaved. So in our current RFC code, we chose the
> > optional idea:
> > we map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` for MPU
> > normal memory, but we use mpu,device-memory-section in DT for MPU
> > device memory.
> 
> Keep in mind that we are talking about build-time scripts: it doesn't
> matter if they are slow. We can scan the build-time dtb as many time as
> needed and generate ARM_MPU_DEVICE_MEMORY_* as appropriate. It might
> make "make xen" slower but runtime will be unaffected.
> 
> So, I don't think this is a problem.
> 

OK.
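For illustration, the kind of build-time pass described above could, as a sketch, collect every device node's `reg` range from the build-time DTB and coalesce adjacent or overlapping ranges into as few ARM_MPU_DEVICE_MEMORY_* spans as possible (input addresses below are made up):

```python
# Sketch: coalesce device "reg" ranges scanned from the build-time DTB
# into a minimal set of MPU device-memory spans. This runs at build
# time, so speed is irrelevant. Ranges are (start, size) tuples.

def merge_device_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return sorted, non-overlapping (start, end) spans."""
    spans = sorted((start, start + size) for start, size in ranges)
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:   # overlaps or abuts previous
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# e.g. a UART plus two adjacent GIC regions (made-up addresses)
devs = [(0x9000000, 0x1000), (0x8000000, 0x10000), (0x8010000, 0x10000)]
print(merge_device_ranges(devs))
```

Interleaved or abutting regions collapse into one span, which is exactly what keeps the generated ARM_MPU_DEVICE_MEMORY_* list small.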

> 
> > > > - ***Define new system registers for compilers***:
> > > >   Armv8-R64 is based on Armv8.4. That means we will use some
> > > >   Armv8.4-specific system registers. As Armv8-R64 only has the
> > > >   Secure state, at least `VSTCR_EL2` and `VSCTLR_EL2` will be used
> > > >   for Xen. And the first GCC version that supports Armv8.4 is
> > > >   GCC 8.1. In addition to these, the PMSA of Armv8-R64 introduced
> > > >   lots of MPU related system registers:
> > > >   `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx`
> > > >   and `MPUIR_ELx`. But the first GCC version to support these
> > > >   system registers is GCC 11. So we have two ways to make compilers
> > > >   work properly with these system registers.
> > > >   1. Bump GCC version to GCC 11.
> > > >      The pro of this method is that we don't need to encode these
> > > >      system registers in macros by ourselves. But the cons are that
> > > >      we have to update Makefiles to support GCC 11 for Armv8-R64:
> > > >      1.1. Check for GCC version 11 for Armv8-R64.
> > > >      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> > > >      1.3. Solve the conflict of march=armv8r and mcpu=generic.
> > > >     These changes will affect common Makefiles, not only Arm
> > > >     Makefiles. And GCC 11 is new; lots of toolchains and distros
> > > >     haven't supported it yet.
> > > >
> > > >   2. Encode new system registers in macros ***(preferred)***
> > > >         ```
> > > >         /* Virtualization Secure Translation Control Register */
> > > >         #define VSTCR_EL2  S3_4_C2_C6_2
> > > >         /* Virtualization System Control Register */
> > > >         #define VSCTLR_EL2 S3_4_C2_C0_0
> > > >         /* EL1 MPU Protection Region Base Address Register encode */
> > > >         #define PRBAR_EL1  S3_0_C6_C8_0
> > > >         ...
> > > >         /* EL2 MPU Protection Region Base Address Register encode */
> > > >         #define PRBAR_EL2  S3_4_C6_C8_0
> > > >         ...
> > > >         ```
> > > >      If we encode all the above system registers, we don't need to
> > > >      bump the GCC version. And the common CFLAGS Xen is using can
> > > >      still be applied to Armv8-R64. We don't need to modify
> > > >      Makefiles to add specific CFLAGS.
> > >
> > > I think that's fine and we did something similar with the original
> > > ARMv7-A port if I remember correctly.
> > >
> > >
> > > > ### **2.2. Changes of the initialization process**
> > > > In general, we still expect Armv8-R64 and Armv8-A64 to have a
> > > > consistent initialization process. Apart from some architecture
> > > > differences that we will distinguish through CONFIG_ARM_MPU or
> > > > CONFIG_ARM64_V8R, we want most of the initialization code to be
> > > > reusable between Armv8-R64 and Armv8-A64.
> > >
> > > +1
> > >
> > >
> > > > - We will reuse the original head.S and setup.c of Arm, but
> > > >   replace the MMU and page table operations in these files with
> > > >   configuration operations for MPU and MPU regions.
> > > >
> > > > - We provide a boot-time MPU configuration. This MPU configuration
> > > >   will support Xen to finish its initialization. And this boot-time
> > > >   MPU configuration will record the memory regions that will be
> > > >   parsed from device tree.
> > > >
> > > >   At the end of Xen initialization, we will use a runtime MPU
> > > >   configuration to replace the boot-time MPU configuration. The
> > > >   runtime MPU configuration will merge and reorder memory regions
> > > >   to save more MPU regions for guests.
> > > >   ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> > > >
> > > > - Defer unpausing domains.
> > > >   When Xen initialization is about to end, Xen unpauses guests
> > > >   created during initialization. But this will cause some issues.
> > > >   The unpause action occurs before free_init_memory, however the
> > > >   runtime MPU configuration is built after free_init_memory.
> > > >
> > > >   So if the unpaused guests start executing a context switch at
> > > >   this point, their MPU context will be based on the boot-time MPU
> > > >   configuration, which will probably be inconsistent with the
> > > >   runtime MPU configuration and cause unexpected problems. (This
> > > >   may not happen on a single-core system, but on SMP systems this
> > > >   problem is foreseeable, so we hope to solve it from the
> > > >   beginning.)
> > > >
> > > > ### **2.3. Changes to reduce memory fragmentation**
> > > >
> > > > In general, memory in a Xen system can be classified into 4
> > > > classes: `image sections`, `heap sections`, `guest RAM`, and
> > > > `boot modules (guest kernel, initrd and dtb)`.
> > > >
> > > > Currently, Xen doesn't have any restriction on how users allocate
> > > > memory for different classes. That means users can place boot
> > > > modules anywhere, can reserve Xen heap memory anywhere and can
> > > > allocate guest memory anywhere.
> > > >
> > > > In a VMSA system, this would not be too much of a problem, since
> > > > the MMU can manage memory at a granularity of 4KB after all. But in
> > > > a PMSA system, this will be a big problem. On Armv8-R64, the max
> > > > number of MPU protection regions is limited to 256. But in typical
> > > > processor implementations, few processors will implement more than
> > > > 32 MPU protection regions. Add in the fact that Xen shares MPU
> > > > protection regions with guests' EL1 stage 2. It becomes even more
> > > > important to properly plan the use of MPU protection regions.
> > > >
> > > > - An ideal memory usage layout restriction:
> > > > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> > > > 1. Reserve proper MPU regions for the Xen image (code, rodata and
> > > >    data + bss).
> > > > 2. Reserve one MPU region for boot modules.
> > > >    That means the placement of all boot modules, including guest
> > > >    kernel, initrd and dtb, will be limited to this MPU region
> > > >    protected area.
> > > > 3. Reserve one or more MPU regions for the Xen heap.
> > > >    On Armv8-R64, the guest memory is predefined in device tree; it
> > > >    will not be allocated from the heap. Unlike Armv8-A64, we will
> > > >    not move all free memory to the heap. We want the Xen heap to be
> > > >    deterministic too, so Xen on Armv8-R64 also relies on the Xen
> > > >    static heap feature. The memory for the Xen heap will be defined
> > > >    in device tree too. Considering that physical memory can also be
> > > >    discontiguous, one or more MPU protection regions need to be
> > > >    reserved for the Xen heap.
> > > > 4. If we name the above used MPU protection regions PART_A, and
> > > >    name the remaining MPU protection regions PART_B:
> > > >    4.1. In hypervisor context, Xen will map the remaining RAM and
> > > >         devices to PART_B. This will give Xen the ability to access
> > > >         the whole memory.
> > > >    4.2. In guest context, Xen will create EL1 stage 2 mappings in
> > > >         PART_B. In this case, Xen just needs to update PART_B on
> > > >         context switch, but keep PART_A fixed.
> > >
> > > I think that the memory layout and restrictions that you wrote above
> > > make sense. I have some comments on the way they are represented in
> > > device tree, but that's different.
> > >
> > >
> > > > ***Notes: Static allocation will be mandatory on MPU based
> > > > systems***
> > > >
> > > > **A sample device tree of memory layout restriction**:
> > > > ```
> > > > chosen {
> > > >     ...
> > > >     /*
> > > >      * Define a section to place boot modules,
> > > >      * all boot modules must be placed in this section.
> > > >      */
> > > >     mpu,boot-module-section = <0x10000000 0x10000000>;
> > > >     /*
> > > >      * Define a section to cover all guest RAM. All guest RAM must
> > > >      * be located within this section. The pro is that, in the best
> > > >      * case, we can have only one MPU protection region to map all
> > > >      * guest RAM for Xen.
> > > >      */
> > > >     mpu,guest-memory-section = <0x20000000 0x30000000>;
> > > >     /*
> > > >      * Define a memory section that can cover all device memory that
> > > >      * will be used in Xen.
> > > >      */
> > > >     mpu,device-memory-section = <0x80000000 0x7ffff000>;
> > > >     /* Define a section for Xen heap */
> > > >     xen,static-mem = <0x50000000 0x20000000>;
> > >
> > > As mentioned above, I understand the need for these sections, but
> > > why do we need to describe them in device tree at all? Could Xen
> > > select them by itself during boot?
> >
> > I think without some inputs, Xen could not do this, or will do it
> > with some assumptions. For example, assume the boot-module-section is
> > determined by the lowest and highest addresses of all modules. And
> > the same for guest-memory-section, calculated from all guests'
> > allocated memory regions.
> 
> Right, I think that the mpu,boot-module-section should be generated by a
> set of scripts like ImageBuilder. Something with a list of all the
> binaries that need to be loaded and also the DTB at build-time.
> Something like ImageBuilder would have the ability to add
> "mpu,boot-module-section" to device tree automatically and automatically
> choose a good address for it.
> 
> As an example, today ImageBuilder takes as input a config file like the
> following:
> 
> ---
> MEMORY_START="0x0"
> MEMORY_END="0x80000000"
> 
> DEVICE_TREE="4.16-2022.1/mpsoc.dtb"
> XEN="4.16-2022.1/xen"
> DOM0_KERNEL="4.16-2022.1/Image-dom0-5.16"
> DOM0_RAMDISK="4.16-2022.1/xen-rootfs.cpio.gz"
> 
> NUM_DOMUS=1
> DOMU_KERNEL[0]="4.16-2022.1/Image-domU"
> DOMU_RAMDISK[0]="4.16-2022.1/initrd.cpio"
> DOMU_PASSTHROUGH_DTB[0]="4.16-2022.1/passthrough-example-sram.dtb"
> ---
> 
> And generates a U-Boot boot.scr script with:
> - load addresses for each binary
> - commands to edit the DTB to add those addresses to device tree (e.g.
>   dom0less kernels addresses)
> 
> ImageBuilder can also modify the DTB at build time instead (instead of
> doing it from boot.scr.) See FDTEDIT.
> 
> I am not saying we should use ImageBuilder, but it sounds like we need
> something similar.
> 
> 

Yes, exactly. I commented on Henry's static heap RFC to say we need
a similar tool. Now, here it is : )
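For illustration, the core of what such a tool does can be sketched as follows; this mirrors the ImageBuilder behaviour described above (one binary after another at 2MB-aligned addresses, starting 2MB into RAM), but is not ImageBuilder's actual code, and the binary names/sizes are made up:

```python
# Sketch of ImageBuilder-style address assignment: place each binary at
# the next free 2MB-aligned address, starting 2MB into RAM. This mirrors
# the behaviour described in the thread; it is not ImageBuilder's code.

MB = 1 << 20

def assign_load_addresses(sizes: dict[str, int],
                          memory_start: int) -> dict[str, int]:
    addrs = {}
    addr = memory_start + 2 * MB          # first slot: 2MB offset
    for name, size in sizes.items():
        addrs[name] = addr
        # advance past this binary, rounded up to the next 2MB boundary
        addr += ((size + 2 * MB - 1) // (2 * MB)) * (2 * MB)
    return addrs

binaries = {"xen": 2 * MB, "dom0-kernel": 30 * MB, "dom0-ramdisk": 5 * MB}
for name, addr in assign_load_addresses(binaries, 0x0).items():
    print(f"{name}: {hex(addr)}")
```

The same computed addresses could then be written back into the build-time DTB (the FDTEDIT-style flow) or into a boot.scr, so the runtime view stays consistent with the build.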

> > > If not, and considering that we have to generate
> > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make
> > > sense to also generate mpu,guest-memory-section, xen,static-mem,
> > > etc. at build time rather than passing it via device tree to Xen at
> > > runtime?
> > >
> >
> > Did you mean we still add this information in device tree, but for
> > build time only, and at runtime we don't parse it?
> 
> Yes, something like that, but see below.
> 
> 
> > > What's the value of doing ARM_MPU_*_MEMORY_START/END at build time
> > > and everything else at runtime?
> >
> > ARM_MPU_*_MEMORY_START/END is defined by the platform. But other
> > things are user-customized. Users can change their usage without
> > rebuilding the image.
> 
> Good point.
> 
> We don't want to have to rebuild Xen if the user updated a guest kernel,
> resulting in a larger boot-module-section.
> 
> So I think it makes sense that "mpu,boot-module-section" is generated by
> the scripts (e.g. ImageBuilder) at build time, and Xen reads the
> property at boot from the runtime device tree.
> 
> I think we need to divide the information into two groups:
> 
> 
> # Group1: board info
> 
> This information is platform specific and it is not meant to change
> depending on the VM configuration. Ideally, we build Xen for a platform
> once, then we can use the same Xen binary together with any combination
> of dom0/domU kernels and ramdisks.
> 
> This kind of information doesn't need to be exposed to the runtime
> device tree. But we can still use a build-time device tree to generate
> the addresses if it is convenient.
> 
> XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and ARM_MPU_NORMAL_MEMORY_*
> seem to be part of this group.
> 

Yes.

> 
> # Group2: boot configuration
> 
> This information is about the specific set of binaries and VMs that we
> need to boot. It is conceptually similar to the dom0less device tree
> nodes that we already have. If we change one of the VM binaries, we
> likely have to refresh the information here.
> 
> "mpu,boot-module-section" probably belongs to this group (unless we find
> a way to define "mpu,boot-module-section" generically so that we don't
> need to change it any time the set of boot modules change.)
> 
> 

I agree.

> > > It looks like we are forced to have the section definitions at
> > > build time because we need them before we can parse device tree. In
> > > that case, we might as well define all the sections at build time.
> > >
> > > But I think it would be even better if Xen could automatically
> > > choose xen,static-mem, mpu,guest-memory-section, etc. on its own
> > > based on the regular device tree information (/memory, /amba,
> > > etc.), without any need for explicitly describing each range with
> > > these new properties.
> > >
> >
> > For mpu,guest-memory-section, with the limitation of no other usage
> > between different guests' memory nodes, this is OK. But for
> > xen,static-mem (heap), we just want everything on an MPU system to be
> > deterministic. But, of course, Xen can select the remaining memory
> > for the heap without static-mem.
> 
> It is good that you think they can be chosen by Xen.
> 
> Differently from "boot-module-section", which has to do with the boot
> modules selected by the user for a specific execution,
> guest-memory-section and static-mem are Xen specific memory
> policies/allocations.
> 
> A user wouldn't know how to fill them in. And I worry that even a script

But users should know them, because static-mem for a guest must be
allocated in this range. And users take the responsibility to set the
DomU's statically allocated memory ranges.

> like ImageBuilder wouldn't be the best place to pick these values --
> they seem too "important" to leave to a script.
> 
> But it seems possible to choose the values in Xen:
> - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at build time
> - Xen reads boot-module-section from device tree
> 
> It should be possible at this point for Xen to pick the best values for
> guest-memory-section and static-mem based on the memory available.
> 

How would Xen pick? Does it mean that in a static-allocation DomU DT
node, we just need a size, but don't require a start address for
static-mem?
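If the DT only gave a size, the selection hinted at above could, as a sketch, be a simple first-fit allocator over the RAM left after the fixed sections; everything here is an assumption about a possible design, not existing Xen code:

```python
# Sketch of a possible in-Xen policy (hypothetical, not existing code):
# given only a size for each DomU's static-mem, first-fit allocate from
# the RAM left over after the Xen image/heap/boot-module sections
# (PART_A in the thread's terminology) have been carved out.

def first_fit(free: list[tuple[int, int]], size: int, align: int) -> int:
    """free is a list of (start, end) spans; return a start address and
    shrink the chosen span in place."""
    for i, (start, end) in enumerate(free):
        base = (start + align - 1) & ~(align - 1)
        if base + size <= end:
            free[i] = (base + size, end)
            return base
    raise MemoryError("no span large enough")

free_ram = [(0x20000000, 0x50000000)]             # after PART_A carve-outs
domu1 = first_fit(free_ram, 0x1f000000, 1 << 21)  # ~496MB, 2MB-aligned
print(hex(domu1), free_ram)
```

The trade-off is the one raised above: the placement is then chosen by Xen rather than by the user, so the resulting layout is reproducible but no longer user-specified.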

> 
> > > >     domU1 {
> > > >         ...
> > > >         #xen,static-mem-address-cells = <0x01>;
> > > >         #xen,static-mem-size-cells = <0x01>;
> > > >         /* Statically allocated guest memory, within mpu,guest-memory-section */
> > > >         xen,static-mem = <0x30000000 0x1f000000>;
> > > >
> > > >         module@11000000 {
> > > >             compatible = "multiboot,kernel\0multiboot,module";
> > > >             /* Boot module address, within mpu,boot-module-section */
> > > >             reg = <0x11000000 0x3000000>;
> > > >             ...
> > > >         };
> > > >
> > > >         module@10FF0000 {
> > > >                 compatible = "multiboot,device-tree\0multiboot,module";
> > > >                 /* Boot module address, within mpu,boot-module-section */
> > > >                 reg = <0x10ff0000 0x10000>;
> > > >                 ...
> > > >         };
> > > >     };
> > > > };
> > > > ```


* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-01  6:29       ` Wei Chen
@ 2022-03-01 13:17         ` Julien Grall
  2022-03-02  6:43           ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-03-01 13:17 UTC (permalink / raw)
  To: Wei Chen, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

On 01/03/2022 06:29, Wei Chen wrote:
> Hi Julien,

Hi,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 26 February 2022 4:12
>> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>
>> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
>> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
>> <Henry.Wang@arm.com>; nd <nd@arm.com>
>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>>
>> Hi Wei,
>>
>> On 25/02/2022 10:48, Wei Chen wrote:
>>>>>       Armv8-R64 can support up to 256 MPU regions. But that's just
>>>>>       theoretical. So we don't want to define `pr_t mpu_regions[256]`;
>>>>>       this is a memory waste most of the time. So we decided to let
>>>>>       the user specify the number through a Kconfig option.
>>>>>       `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default value can be
>>>>>       `32`; it's a typical implementation on Armv8-R64. Users will
>>>>>       recompile Xen when their platform changes. So when the MPU
>>>>>       changes, respecifying the MPU protection regions number will
>>>>>       not cause additional problems.
>>>>
>>>> I wonder if we could probe the number of MPU regions at runtime and
>>>> dynamically allocate the memory needed to store them in arch_vcpu.
>>>>
>>>
>>> We have considered using a pr_t mpu_regions[0] in arch_vcpu. But it
>>> seems we will encounter some statically-allocated arch_vcpu problems
>>> and sizeof issues.
>>
>> Does it need to be embedded in arch_vcpu? If not, then we could allocate
>> memory outside and add a pointer in arch_vcpu.
>>
> 
We had thought to use a pointer in arch_vcpu instead of embedding
mpu_regions into arch_vcpu. But we noticed that arch_vcpu has a
__cacheline_aligned attribute; this may be because arch_vcpu will be
used very frequently in some critical paths. So if we use a pointer for
mpu_regions, it may cause some cache misses in these critical paths, for
example, in context_switch.

From my understanding, the idea behind ``cacheline_aligned`` is to
avoid the struct vcpu being shared with other data structures. Otherwise
you may end up with two pCPUs frequently writing the same cacheline,
which is not ideal.

arch_vcpu should embed anything that will be accessed often (e.g. on
entry/exit), to a certain point. For instance, not everything related to
the vGIC is embedded in the vCPU/Domain structures.

I am a bit split regarding the mpu_regions. If they are mainly used in
context_switch() then I would argue this is a premature optimization,
because the scheduling decision is probably going to take a lot more
time than the context switch itself.

Note that for the P2M we already have that indirection because it is
embedded in the struct domain.

This raises one question: why will the MPU regions be per-vCPU rather
than per domain?

Cheers,

-- 
Julien Grall



* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-01 12:55       ` Wei Chen
@ 2022-03-01 23:38         ` Stefano Stabellini
  2022-03-02  7:13           ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2022-03-01 23:38 UTC (permalink / raw)
  To: Wei Chen
  Cc: Stefano Stabellini, xen-devel, julien, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd


On Tue, 1 Mar 2022, Wei Chen wrote:
> > On Fri, 25 Feb 2022, Wei Chen wrote:
> > > > Hi Wei,
> > > >
> > > > This is extremely exciting, thanks for the very nice summary!
> > > >
> > > >
> > > > On Thu, 24 Feb 2022, Wei Chen wrote:
> > > > > # Proposal for Porting Xen to Armv8-R64
> > > > >
> > > > > This proposal will introduce the PoC work of porting Xen to
> > > > > Armv8-R64, which includes:
> > > > > - The changes of current Xen capability, like Xen build system,
> > > > >   memory management, domain management, vCPU context switch.
> > > > > - The expanded Xen capability, like static-allocation and
> > > > >   direct-map.
> > > > >
> > > > > ***Notes:***
> > > > > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> > > > >    ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> > > > >    ***Trusted-Frimware (TF-R). This is an external dependency,***
> > > > >    ***so we think the discussion of Xen SMP support on Armv8-R64***
> > > > >    ***should be started when single-CPU support is complete.***
> > > > > 2. ***This proposal will not touch xen-tools. In current stage,***
> > > > >    ***Xen on Armv8-R64 only support dom0less, all guests should***
> > > > >    ***be booted from device tree.***
> > > > >
> > > > > ## 1. Essential Background
> > > > >
> > > > > ### 1.1. Armv8-R64 Profile
> > > > > The Armv-R architecture profile was designed to support use cases
> > > > > that have a high sensitivity to deterministic execution. (e.g.
> > > > > Fuel Injection, Brake control, Drive trains, Motor control etc)
> > > > >
> > > > > Arm announced Armv8-R in 2013, it is the latest generation Arm
> > > > > architecture targeted at the Real-time profile. It introduces
> > > > > virtualization at the highest security level while retaining the
> > > > > Protected Memory System Architecture (PMSA) based on a Memory
> > > > > Protection Unit (MPU). In 2020, Arm announced Cortex-R82, which
> > > > > is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
> > > > >
> > > > > - The latest Armv8-R64 document can be found here:
> > > > >   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R
> > > > >   AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
> > > > >
> > > > > - Armv-R Architecture progression:
> > > > >   Armv7-R -> Armv8-R AArch32 -> Armv8-R AArch64
> > > > >   The following figure is a simple comparison of "R" processors
> > > > >   based on different Armv-R Architectures.
> > > > >   ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
> > > > >
> > > > > - The Armv8-R architecture evolved additional features on top of
> > > > >   Armv7-R:
> > > > >     - An exception model that is compatible with the Armv8-A model
> > > > >     - Virtualization with support for guest operating systems
> > > > >         - PMSA virtualization using MPUs in EL2.
> > > > > - The new features of the Armv8-R64 architecture
> > > > >     - Adds support for the 64-bit A64 instruction set; previously
> > > > >       Armv8-R only supported A32.
> > > > >     - Supports up to 48-bit physical addressing; previously up to
> > > > >       32-bit addressing was supported.
> > > > >     - Optional Arm Neon technology and Advanced SIMD
> > > > >     - Supports three Exception Levels (ELs)
> > > > >         - Secure EL2 - the highest privilege, MPU only, for
> > > > >           firmware, hypervisor
> > > > >         - Secure EL1 - Rich OS (MMU) or RTOS (MPU)
> > > > >         - Secure EL0 - Application Workloads
> > > > >     - Optionally supports Virtual Memory System Architecture at
> > > > >       S-EL1/S-EL0.
> > > > >       This means it's possible to run rich OS kernels - like
> > > > >       Linux - either bare-metal or as a guest.
> > > > > - Differences with the Armv8-A AArch64 architecture
> > > > >     - Supports only a single Security state - Secure. There is no
> > > > >       Non-Secure execution state supported.
> > > > >     - EL3 is not supported, EL2 is mandatory. This means Secure
> > > > >       EL2 is the highest EL.
> > > > >     - Supports the A64 ISA instructions
> > > > >         - With a small set of well-defined differences
> > > > >     - Provides a PMSA (Protected Memory System Architecture)
> > > > >       based virtualization model.
> > > > >         - As opposed to Armv8-A AArch64's VMSA based
> > > > >           virtualization
> > > > >         - Can support address bits up to 52 if FEAT_LPA is
> > > > >           enabled, otherwise 48 bits.
> > > > >         - Determines the access permissions and memory attributes
> > > > >           of the target PA.
> > > > >         - Can implement PMSAv8-64 at EL1 and EL2
> > > > >             - Address translation flat-maps the VA to the PA for
> > > > >               EL2 Stage 1.
> > > > >             - Address translation flat-maps the VA to the PA for
> > > > >               EL1 Stage 1.
> > > > >             - Address translation flat-maps the IPA to the PA for
> > > > >               EL1 Stage 2.
> > > > >     - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is
> > > > >       configurable.
> > > > >
> > > > > ### 1.2. Xen Challenges with PMSA Virtualization
> > > > > Xen is a PMSA-unaware Type-1 hypervisor; it will need
> > > > > modifications to run with an MPU and host multiple guest OSes.
> > > > >
> > > > > - No MMU at EL2:
> > > > >     - No EL2 Stage 1 address translation
> > > > >         - Xen provides a fixed ARM64 virtual memory layout as the
> > > > >           basis of EL2 stage 1 address translation, which is not
> > > > >           applicable on an MPU system, where there is no virtual
> > > > >           addressing. As a result, any operation involving a
> > > > >           transition from PA to VA, like ioremap, needs
> > > > >           modification on an MPU system.
> > > > >     - Xen's run-time addresses are the same as the link time
> > > > >       addresses.
> > > > >         - Enabling PIC (position-independent code) on a real-time
> > > > >           target processor is probably very rare.
> > > > >     - Xen will need to use the EL2 MPU memory region descriptors
> > > > >       to manage access permissions and attributes for accesses
> > > > >       made by VMs at EL1/0.
> > > > >         - Xen currently relies on the MMU EL1 stage 2 table to
> > > > >           manage these accesses.
> > > > > - No MMU Stage 2 translation at EL1:
> > > > >     - A guest doesn't have an independent guest physical address
> > > > >       space
> > > > >     - A guest can not reuse the current Intermediate Physical
> > > > >       Address memory layout
> > > > >     - A guest uses physical addresses to access memory and
> > > > >       devices
> > > > >     - The MPU at EL2 manages EL1 stage 2 access permissions and
> > > > >       attributes
> > > > > - There are a limited number of MPU protection regions at both
> > > > >   EL2 and EL1:
> > > > >     - Architecturally, the maximum number of protection regions
> > > > >       is 256; typical implementations have 32.
> > > > >     - By contrast, Xen does not need to consider the number of
> > > > >       page table entries in theory when using the MMU.
> > > > > - The MPU protection regions at EL2 need to be shared between the
> > > > >   hypervisor and the guest stage 2.
> > > > >     - Requires careful consideration - may impact feature
> > > > >       'fullness' of both the hypervisor and the guest
> > > > >     - By contrast, when using the MMU, Xen has a standalone P2M
> > > > >       table for guest stage 2 accesses.
> > > > >
> > > > > ## 2. Proposed changes of Xen
> > > > > ### **2.1. Changes of build system:**
> > > > >
> > > > > - ***Introduce new Kconfig options for Armv8-R64***:
> > > > >   Unlike Armv8-A, because lack of MMU support on Armv8-R64, we may
> > not
> > > > >   expect one Xen binary to run on all machines. Xen images are not
> > > > common
> > > > >   across Armv8-R64 platforms. Xen must be re-built for different
> > Armv8-
> > > > R64
> > > > >   platforms. Because these platforms may have different memory
> > layout
> > > > and
> > > > >   link address.
> > > > >     - `ARM64_V8R`:
> > > > >       This option enables Armv8-R profile for Arm64. Enabling this
> > > > option
> > > > >       results in selecting MPU. This Kconfig option is used to gate
> > some
> > > > >       Armv8-R64 specific code except MPU code, like some code for
> > Armv8-
> > > > R64
> > > > >       only system ID registers access.
> > > > >
> > > > >     - `ARM_MPU`
> > > > >       This option enables MPU on ARMv8-R architecture. Enabling this
> > > > option
> > > > >       results in disabling MMU. This Kconfig option is used to gate
> > some
> > > > >       ARM_MPU specific code. Once when this Kconfig option has been
> > > > enabled,
> > > > >       the MMU relate code will not be built for Armv8-R64. The
> > reason
> > > > why
> > > > >       not depends on runtime detection to select MMU or MPU is that,
> > we
> > > > don't
> > > > >       think we can use one image for both Armv8-R64 and Armv8-A64.
> > > > Another
> > > > >       reason that we separate MPU and V8R in provision to allow to
> > > > support MPU
> > > > >       on 32bit Arm one day.
> > > > >
> > > > >     - `XEN_START_ADDRESS`
> > > > >       This option allows to set the custom address at which Xen will
> > be
> > > > >       linked. This address must be aligned to a page size. Xen's
> > run-
> > > > time
> > > > >       addresses are the same as the link time addresses. Different
> > > > platforms
> > > > >       may have differnt memory layout. This Kconfig option provides
> > > > users
> > > > >       the ability to select proper link addresses for their boards.
> > > > >       ***Notes: Fixed link address means the Xen binary could not
> > be***
> > > > >       ***relocated by EFI loader. So in current stage, Xen could
> > not***
> > > > >       ***be launched as an EFI application on Armv8-R64.***
> > > > >
> > > > >     - `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
> > > > >       `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
> > > > >       These Kconfig options allow setting memory regions for Xen
> > > > >       code, data and device memory. Before parsing memory
> > > > >       information from the device tree, Xen will use the values
> > > > >       stored in these options to set up the boot-time MPU
> > > > >       configuration. Why do we need a boot-time MPU configuration?
> > > > >       1. More deterministic: the Arm MPU supports background
> > > > >          regions. If we don't configure any MPU regions, we can
> > > > >          still enable the MPU with only the background regions,
> > > > >          but that means all RAM is RWX. Random values in RAM or
> > > > >          maliciously embedded data can be exploited. Using these
> > > > >          Kconfig options allows users to have a deterministic RAM
> > > > >          area in which to execute code.
> > > > >       2. More compatible: on some Armv8-R64 platforms, if the MPU
> > > > >          is disabled, the `dc zva` instruction will make the
> > > > >          system halt. And this instruction can be emitted by some
> > > > >          built-in functions, like `memset`. If we rebuild GCC with
> > > > >          `-ddont_use_dc`, the built-in functions will not contain
> > > > >          `dc zva`. However, it is obviously unlikely that we will
> > > > >          be able to rebuild every GCC for Armv8-R64.
> > > > >       3. One optional idea:
> > > > >          We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS +
> > > > >          2MB`, or `XEN_START_ADDRESS` to `XEN_START_ADDRESS +
> > > > >          image_end`, as MPU normal memory. That is enough for Xen
> > > > >          to run at boot time.
> > > >
> > > > I can imagine that we need to have a different Xen build for each
> > > > ARMv8-R platform. Do you envision that XEN_START_ADDRESS and
> > > > ARM_MPU_*_MEMORY_START/END are preconfigured based on the platform
> > > > choice at build time? I don't think we want a user to provide all of
> > > > those addresses by hand, right?
> > >
> > > Yes, this is in our TODO list. We want to reuse the current
> > > arm/platforms and Kconfig menu for Armv8-R.
> > 
> > OK, good
> > 
> > 
> > > > The next question is whether we could automatically generate
> > > > XEN_START_ADDRESS and ARM_MPU_*_MEMORY_START/END based on the
> > > > platform device tree at build time (at build time, not runtime).
> > > > That would make things a lot easier and it is also aligned with the
> > > > way Zephyr and other RTOSes and baremetal apps work.
> > >
> > > It's an option worth considering. But here we may encounter some
> > > problems that need to be solved first:
> > > 1. Must CONFIG_DTB be selected by default on Armv8-R? Without
> > >    firmware or a bootloader (like u-boot), we have to build the DTB
> > >    into the Xen binary.
> > 
> > CONFIG_DTB should trigger runtime support for device tree, while here we
> > are talking about build time support for device tree. It is very
> > different.
> > 
> > Just to make an example, the whole build-time device tree could be
> > scanned by Makefiles and other scripts, leading to C header file
> > generation, but with no code in Xen to parse the device tree at all.
> > 
> > DTB ---> Makefiles/scripts ---> .h files ---> Makefiles/scripts ---> xen
> > 
> 
> Yes, this is feasible.
> 
> > 
> > I am not saying this is the best way to do it, I am only pointing out
> > that build-time device tree does not imply run-time device tree. Also,
> > it doesn't imply a DTB built-in the Xen binary (although that is also an
> > option).
> > 
> 
> I agree.
> 
> > The way many baremetal OSes and RTOSes work is that they take a DTB as
> > input to the build *only*. From the DTB, the build-time make system
> > generates #defines and header files that are imported in C.
> > 
> > The resulting RTOS binary doesn't need support for DTB, because all the
> > right addresses have already been provided as #define by the Make
> > system.
> > 
> > I don't think we need to go to the extreme of removing DTB support from
> > Xen on ARMv8-R. I am only saying that if we add build-time device tree
> > support it would make it easier to support multiple boards without
> > having to have platform files in Xen for each of them, and we can do
> > that without any impact on runtime device tree parsing.
> > 
> 
> As V8R's use cases may mainly focus on real-time/critical scenarios,
> this may be a better method than platform files. We don't need to
> maintain the platform-related definition header files, and Xen can also
> skip some platform-information parsing at boot time. This will improve
> Xen's boot speed in real-time/critical scenarios.

+1


> > >    This can guarantee the build-time DTB is the same as the runtime
> > >    DTB. But eventually, we will have firmware and a bootloader before
> > >    Xen launches (as Arm EBBR requires). In that case, we may not
> > >    build the DTB into the Xen image, and we can't guarantee the
> > >    build-time DTB is the same as the runtime DTB.
> > 
> > As mentioned, if we have a build-time DTB we might not need a run-time
> > DTB. Secondly, I think it is entirely reasonable to expect that the
> > build-time DTB and the run-time DTB are the same.
> > 
> 
> Yes, if we implement in this way, we should describe it in limitation
> of v8r Xen.
> 
> > It is the same problem with platform files: we have to assume that the
> > information in the platform files matches the runtime DTB.
> > 
> 
> indeed.
> 
> > 
> > > 2. If the build-time DTB is the same as the runtime DTB, how can we
> > >    determine XEN_START_ADDRESS within the memory range the DTB
> > >    describes? Should we always limit Xen to boot from the lowest
> > >    address? Or will we introduce a new DT property to specify the Xen
> > >    start address? I think such a DT property could also solve
> > >    question #1 above.
> > 
> > The loading address should be automatically chosen by the build scripts.
> > We can do that now with ImageBuilder [1]: it selects a 2MB-aligned
> > address for each binary to load, one by one starting from a 2MB offset
> > from start of memory.
> > 
> > [1] https://gitlab.com/ViryaOS/imagebuilder/-/blob/master/scripts/uboot-script-gen#L390
> > 
> > So the build scripts can select XEN_START_ADDRESS based on the
> > memory node information on the build-time device tree. And there should
> > be no need to add XEN_START_ADDRESS to the runtime device tree.
> > 
> 
> This is fine if there are no explicit restrictions on the platform.
> Some platforms may reserve some memory area for something like
> firmware. But I think it's OK; in the worst case, we can hide this area
> from the build-time DTB.
> 
> > 
> > > > The device tree can be given as input to the build system, and the
> > > > Makefiles would take care of generating XEN_START_ADDRESS and
> > > > ARM_MPU_*_MEMORY_START/END based on /memory and other interesting
> > > > nodes.
> > > >
> > >
> > > If we can solve the above questions, yes, the device tree is a good
> > > idea for XEN_START_ADDRESS. For ARM_MPU_NORMAL_MEMORY_*, we can get
> > > them from memory nodes, but for ARM_MPU_DEVICE_MEMORY_*, it is not
> > > easy for us to scan all devices' nodes. And it's very tricky if the
> > > memory regions are interleaved. So in our current RFC code, we chose
> > > the optional idea:
> > > We map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` for MPU
> > > normal memory, but we use mpu,device-memory-section in the DT for MPU
> > > device memory.
> > 
> > Keep in mind that we are talking about build-time scripts: it doesn't
> > matter if they are slow. We can scan the build-time DTB as many times
> > as needed and generate ARM_MPU_DEVICE_MEMORY_* as appropriate. It
> > might make "make xen" slower but runtime will be unaffected.
> > 
> > So, I don't think this is a problem.
> > 
> 
> OK.
> 
> > 
> > > > > - ***Define new system registers for compilers***:
> > > > >   Armv8-R64 is based on Armv8.4. That means we will use some
> > > > >   Armv8.4-specific system registers. As Armv8-R64 only has a
> > > > >   Secure state, at least `VSTCR_EL2` and `VSCTLR_EL2` will be
> > > > >   used by Xen, and the first GCC version that supports Armv8.4 is
> > > > >   GCC 8.1. In addition to these, the PMSA of Armv8-R64 introduced
> > > > >   lots of MPU-related system registers: `PRBAR_ELx`,
> > > > >   `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and
> > > > >   `MPUIR_ELx`. But the first GCC version to support these system
> > > > >   registers is GCC 11. So we have two ways to make compilers work
> > > > >   properly with these system registers.
> > > > >   1. Bump the GCC version to GCC 11.
> > > > >      The pro of this method is that we don't need to encode these
> > > > >      system registers in macros ourselves. But the cons are that
> > > > >      we have to update the Makefiles to support GCC 11 for
> > > > >      Armv8-R64:
> > > > >      1.1. Check for GCC 11 for Armv8-R64.
> > > > >      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> > > > >      1.3. Resolve the conflict between march=armv8r and
> > > > >           mcpu=generic.
> > > > >      These changes would affect common Makefiles, not only the
> > > > >      Arm Makefiles. And GCC 11 is new; lots of toolchains and
> > > > >      distros haven't supported it yet.
> > > > >
> > > > >   2. Encode new system registers in macros ***(preferred)***
> > > > >         ```
> > > > >         /* Virtualization Secure Translation Control Register */
> > > > >         #define VSTCR_EL2  S3_4_C2_C6_2
> > > > >         /* Virtualization System Control Register */
> > > > >         #define VSCTLR_EL2 S3_4_C2_C0_0
> > > > >         /* EL1 MPU Protection Region Base Address Register encode */
> > > > >         #define PRBAR_EL1  S3_0_C6_C8_0
> > > > >         ...
> > > > >         /* EL2 MPU Protection Region Base Address Register encode */
> > > > >         #define PRBAR_EL2  S3_4_C6_C8_0
> > > > >         ...
> > > > >         ```
> > > > >      If we encode all the above system registers, we don't need
> > > > >      to bump the GCC version, and the common CFLAGS Xen is using
> > > > >      can still be applied to Armv8-R64. We don't need to modify
> > > > >      Makefiles to add specific CFLAGS.
> > > >
> > > > I think that's fine and we did something similar with the original
> > > > ARMv7-A port if I remember correctly.
> > > >
> > > >
> > > > > ### **2.2. Changes to the initialization process**
> > > > > In general, we still expect Armv8-R64 and Armv8-A64 to have a
> > > > > consistent initialization process. Apart from some architectural
> > > > > differences, the code is largely reusable, and we will
> > > > > distinguish the differences through CONFIG_ARM_MPU or
> > > > > CONFIG_ARM64_V8R. We want most of the initialization code to be
> > > > > reusable between Armv8-R64 and Armv8-A64.
> > > >
> > > > +1
> > > >
> > > >
> > > > > - We will reuse the original head.S and setup.c of Arm, but
> > > > >   replace the MMU and page-table operations in these files with
> > > > >   configuration operations for the MPU and MPU regions.
> > > > >
> > > > > - We provide a boot-time MPU configuration. This MPU
> > > > >   configuration will allow Xen to finish its initialization, and
> > > > >   it will record the memory regions that will be parsed from the
> > > > >   device tree.
> > > > >
> > > > >   At the end of Xen initialization, we will replace the boot-time
> > > > >   MPU configuration with a runtime MPU configuration. The runtime
> > > > >   MPU configuration will merge and reorder memory regions to save
> > > > >   more MPU regions for guests.
> > > > >   ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> > > > >
> > > > > - Defer unpausing of domains.
> > > > >   When Xen initialization is about to end, Xen unpauses the
> > > > >   guests created during initialization. But this can cause some
> > > > >   issues: the unpause action occurs before free_init_memory,
> > > > >   while the runtime MPU configuration is built after
> > > > >   free_init_memory.
> > > > >
> > > > >   So if an unpaused guest starts executing a context switch at
> > > > >   this point, its MPU context will be based on the boot-time MPU
> > > > >   configuration. That is probably inconsistent with the runtime
> > > > >   MPU configuration, which will cause unexpected problems. (This
> > > > >   may not happen on a single-core system, but on SMP systems this
> > > > >   problem is foreseeable, so we hope to solve it from the
> > > > >   beginning.)
> > > > >
> > > > > ### **2.3. Changes to reduce memory fragmentation**
> > > > >
> > > > > In general, memory in a Xen system can be classified into 4 classes:
> > > > > `image sections`, `heap sections`, `guest RAM`, and `boot modules
> > > > > (guest kernel, initrd and dtb)`.
> > > > >
> > > > > Currently, Xen doesn't place any restriction on how users allocate
> > > > > memory for the different classes. That means users can place boot
> > > > > modules anywhere, reserve Xen heap memory anywhere and allocate
> > > > > guest memory anywhere.
> > > > >
> > > > > In a VMSA system, this would not be too much of a problem, since the
> > > > > MMU can manage memory at a granularity of 4KB after all. But in a
> > > > > PMSA system, this is a big problem. On Armv8-R64, the maximum
> > > > > number of MPU protection regions is limited to 256, but in typical
> > > > > processor implementations, few processors will implement more than
> > > > > 32 MPU protection regions. Add in the fact that Xen shares MPU
> > > > > protection regions with guests' EL1 stage 2, and it becomes even
> > > > > more important to properly plan the use of MPU protection regions.
> > > > >
> > > > > - An ideal memory-usage layout restriction:
> > > > > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> > > > > 1. Reserve proper MPU regions for the Xen image (code, rodata
> > > > >    and data + bss).
> > > > > 2. Reserve one MPU region for boot modules.
> > > > >    That means the placement of all boot modules, including guest
> > > > >    kernels, initrds and dtbs, will be limited to the area this
> > > > >    MPU region protects.
> > > > > 3. Reserve one or more MPU regions for the Xen heap.
> > > > >    On Armv8-R64, guest memory is predefined in the device tree;
> > > > >    it will not be allocated from the heap. Unlike Armv8-A64, we
> > > > >    will not move all free memory to the heap. We want the Xen
> > > > >    heap to be deterministic too, so Xen on Armv8-R64 also relies
> > > > >    on the Xen static-heap feature; the memory for the Xen heap
> > > > >    will be defined in the device tree as well. Considering that
> > > > >    physical memory can also be discontiguous, one or more MPU
> > > > >    protection regions need to be reserved for the Xen heap.
> > > > > 4. If we name the MPU protection regions used above PART_A, and
> > > > >    name the remaining MPU protection regions PART_B:
> > > > >    4.1. In hypervisor context, Xen will map the remaining RAM and
> > > > >         devices into PART_B. This gives Xen the ability to access
> > > > >         the whole memory.
> > > > >    4.2. In guest context, Xen will create the EL1 stage 2 mapping
> > > > >         in PART_B. In this case, Xen only needs to update PART_B
> > > > >         on context switch, keeping PART_A fixed.
> > > >
> > > > I think that the memory layout and restrictions that you wrote above
> > > > make sense. I have some comments on the way they are represented in
> > > > device tree, but that's different.
> > > >
> > > >
> > > > > ***Notes: Static allocation will be mandatory on MPU-based systems***
> > > > >
> > > > > **A sample device tree of memory layout restriction**:
> > > > > ```
> > > > > chosen {
> > > > >     ...
> > > > >     /*
> > > > >      * Define a section to place boot modules,
> > > > >      * all boot modules must be placed in this section.
> > > > >      */
> > > > >     mpu,boot-module-section = <0x10000000 0x10000000>;
> > > > >     /*
> > > > >      * Define a section to cover all guest RAM. All guest RAM
> > > > >      * must be located within this section. The pro is that, in
> > > > >      * the best case, we can have only one MPU protection region
> > > > >      * mapping all guest RAM for Xen.
> > > > >      */
> > > > >     mpu,guest-memory-section = <0x20000000 0x30000000>;
> > > > >     /*
> > > > >      * Define a memory section that can cover all device memory that
> > > > >      * will be used in Xen.
> > > > >      */
> > > > >     mpu,device-memory-section = <0x80000000 0x7ffff000>;
> > > > >     /* Define a section for Xen heap */
> > > > >     xen,static-mem = <0x50000000 0x20000000>;
> > > >
> > > > As mentioned above, I understand the need for these sections, but
> > > > why do we need to describe them in device tree at all? Could Xen
> > > > select them by itself during boot?
> > >
> > > I think without some input, Xen could not do this, or would have to
> > > rely on assumptions. For example, assume the boot-module-section is
> > > determined by the lowest and highest addresses of all modules, and
> > > the same for the guest-memory-section, calculated from all guests'
> > > allocated memory regions.
> > 
> > Right, I think that the mpu,boot-module-section should be generated by a
> > set of scripts like ImageBuilder. Something with a list of all the
> > binaries that need to be loaded and also the DTB at build-time.
> > Something like ImageBuilder would have the ability to add
> > "mpu,boot-module-section" to device tree automatically and automatically
> > choose a good address for it.
> > 
> > As an example, today ImageBuilder takes as input a config file like the
> > following:
> > 
> > ---
> > MEMORY_START="0x0"
> > MEMORY_END="0x80000000"
> > 
> > DEVICE_TREE="4.16-2022.1/mpsoc.dtb"
> > XEN="4.16-2022.1/xen"
> > DOM0_KERNEL="4.16-2022.1/Image-dom0-5.16"
> > DOM0_RAMDISK="4.16-2022.1/xen-rootfs.cpio.gz"
> > 
> > NUM_DOMUS=1
> > DOMU_KERNEL[0]="4.16-2022.1/Image-domU"
> > DOMU_RAMDISK[0]="4.16-2022.1/initrd.cpio"
> > DOMU_PASSTHROUGH_DTB[0]="4.16-2022.1/passthrough-example-sram.dtb"
> > ---
> > 
> > And generates a U-Boot boot.scr script with:
> > - load addresses for each binary
> > - commands to edit the DTB to add those addresses to device tree (e.g.
> >   dom0less kernels addresses)
> > 
> > ImageBuilder can also modify the DTB at build time instead (instead of
> > doing it from boot.scr.) See FDTEDIT.
> > 
> > I am not saying we should use ImageBuilder, but it sounds like we need
> > something similar.
> > 
> > 
> 
> Yes, exactly. I have comment on Henry's stack heap RFC to said we need
> a similar tool. Now, here it is : )

Ahah yes :-)

Initially I wrote ImageBuilder because people kept sending me emails to
ask me for help with dom0less and almost always it was an address
loading error.

I would be happy to turn ImageBuilder into something useful for ARMv8-R
as well and add more maintainers from ARM and other companies.


> > > > If not, and considering that we have to generate
> > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make
> > > > sense to also generate mpu,guest-memory-section, xen,static-mem,
> > > > etc. at build time rather than passing them via device tree to Xen
> > > > at runtime?
> > > >
> > >
> > > Did you mean we still add this information to the device tree, but
> > > for build time only, and at runtime we don't parse it?
> > 
> > Yes, something like that, but see below.
> > 
> > 
> > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at build time and
> > > > everything else at runtime?
> > >
> > > ARM_MPU_*_MEMORY_START/END is defined by the platform. But the other
> > > things are user-customized; users can change them without rebuilding
> > > the image.
> > 
> > Good point.
> > 
> > We don't want to have to rebuild Xen if the user updated a guest kernel,
> > resulting in a larger boot-module-section.
> > 
> > So I think it makes sense that "mpu,boot-module-section" is generated by
> > the scripts (e.g. ImageBuilder) at build time, and Xen reads the
> > property at boot from the runtime device tree.
> > 
> > I think we need to divide the information into two groups:
> > 
> > 
> > # Group1: board info
> > 
> > This information is platform specific and it is not meant to change
> > depending on the VM configuration. Ideally, we build Xen for a platform
> > once, then we can use the same Xen binary together with any combination
> > of dom0/domU kernels and ramdisks.
> > 
> > This kind of information doesn't need to be exposed to the runtime
> > device tree. But we can still use a build-time device tree to generate
> > the addresses if it is convenient.
> > 
> > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and ARM_MPU_NORMAL_MEMORY_*
> > seem to be part of this group.
> > 
> 
> Yes.
> 
> > 
> > # Group2: boot configuration
> > 
> > This information is about the specific set of binaries and VMs that we
> > need to boot. It is conceptually similar to the dom0less device tree
> > nodes that we already have. If we change one of the VM binaries, we
> > likely have to refresh the information here.
> > 
> > "mpu,boot-module-section" probably belongs to this group (unless we find
> > a way to define "mpu,boot-module-section" generically so that we don't
> > need to change it any time the set of boot modules change.)
> > 
> > 
> 
> I agree.
> 
> > > > It looks like we are forced to have the section definitions at
> > > > build time because we need them before we can parse the device
> > > > tree. In that case, we might as well define all the sections at
> > > > build time.
> > > >
> > > > But I think it would be even better if Xen could automatically
> > > > choose xen,static-mem, mpu,guest-memory-section, etc. on its own
> > > > based on the regular device tree information (/memory, /amba,
> > > > etc.), without any need for explicitly describing each range with
> > > > these new properties.
> > > >
> > >
> > > For mpu,guest-memory-section, with the limitation that there is no
> > > other usage between different guests' memory nodes, this is OK. But
> > > for xen,static-mem (heap), we just want everything on an MPU system
> > > to be deterministic. Of course, Xen can select the remaining memory
> > > for the heap without static-mem.
> > 
> > It is good that you think they can be chosen by Xen.
> > 
> > Differently from "boot-module-section", which has to do with the boot
> > modules selected by the user for a specific execution,
> > guest-memory-section and static-mem are Xen specific memory
> > policies/allocations.
> > 
> > A user wouldn't know how to fill them in. And I worry that even a script
> 
> But users should know it, because static-mem for a guest must be
> allocated in this range, and users take the responsibility of setting
> the DomU's statically allocated memory ranges.

Let me premise that my goal is to avoid having many users reporting
errors to xen-devel and xen-users when actually it is just a wrong
choice of addresses.

I think we need to make a distinction between addresses for the boot
modules, e.g. addresses where to load xen, the dom0/U kernel, dom0/U
ramdisk in memory at boot time, and VM static memory addresses.

The boot module addresses are particularly difficult to fill in because
there are many of them, and a small update to one of the modules could
invalidate all the other addresses. This is why I ended up writing
ImageBuilder. Since then, I have received several emails from users
thanking me for ImageBuilder :-)

The static VM memory addresses (xen,static-mem) should be a bit easier
to fill in correctly. They are meant to be chosen once, and it shouldn't
happen that an update on a kernel forces the user to change all the VM
static memory addresses. Also, I know that some users actually want to
be able to choose the domU addresses by hand because they have specific
needs. So it is good that we can let the user choose the addresses if
they want to.

With all of that said, I do think that many users won't have an opinion
on the VM static memory addresses and won't know how to choose them.
It would be error prone to let them try to fill them in by hand. So I
was already planning on adding support to ImageBuilder to automatically
generate xen,static-mem for dom0less domains.


Going back to this specific discussion about boot-module-section: I can
see now that, given xen,static-mem is chosen by ImageBuilder (or
similar) and not Xen, then it makes sense to have ImageBuilder (or
similar) also generate boot-module-section.



> > like ImageBuilder wouldn't be the best place to pick these values --
> > they seem too "important" to leave to a script.
> > 
> > But it seems possible to choose the values in Xen:
> > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at build time
> > - Xen reads boot-module-section from device tree
> > 
> > It should be possible at this point for Xen to pick the best values for
> > guest-memory-section and static-mem based on the memory available.
> > 
> 
> How would Xen pick? Does it mean that in a static-allocation DomU DT
> node we just need a size, but don't require a start address for
> static-mem?

Yes the idea was that the user would only provide the size (e.g.
DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
calculated. But I didn't mean to change the existing xen,static-mem
device tree bindings. So it is best if the xen,static-mem addresses
generation is done by ImageBuilder (or similar tool) instead of Xen.

Sorry for the confusion!


> > > > >     domU1 {
> > > > >         ...
> > > > >         #xen,static-mem-address-cells = <0x01>;
> > > > >         #xen,static-mem-size-cells = <0x01>;
> > > > >         /* Statically allocated guest memory, within
> > > > >            mpu,guest-memory-section */
> > > > >         xen,static-mem = <0x30000000 0x1f000000>;
> > > > >
> > > > >         module@11000000 {
> > > > >             compatible = "multiboot,kernel\0multiboot,module";
> > > > >             /* Boot module address, within mpu,boot-module-section */
> > > > >             reg = <0x11000000 0x3000000>;
> > > > >             ...
> > > > >         };
> > > > >
> > > > >         module@10FF0000 {
> > > > >                 compatible = "multiboot,device-tree\0multiboot,module";
> > > > >                 /* Boot module address, within mpu,boot-module-section */
> > > > >                 reg = <0x10ff0000 0x10000>;
> > > > >                 ...
> > > > >         };
> > > > >     };
> > > > > };
> > > > > ```

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-01 13:17         ` Julien Grall
@ 2022-03-02  6:43           ` Wei Chen
  2022-03-02 10:24             ` Julien Grall
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-02  6:43 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 1 March 2022 21:17
> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> On 01/03/2022 06:29, Wei Chen wrote:
> > Hi Julien,
> 
> Hi,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 26 February 2022 4:12
> >> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>
> >> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry
> Wang
> >> <Henry.Wang@arm.com>; nd <nd@arm.com>
> >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>
> >> Hi Wei,
> >>
> >> On 25/02/2022 10:48, Wei Chen wrote:
> >>>>>       Armv8-R64 can support up to 256 MPU regions. But that's just
> >>>>>       theoretical. So we don't want to define `pr_t
> >>>>>       mpu_regions[256]`; this is a waste of memory most of the
> >>>>>       time. So we decided to let the user specify the number
> >>>>>       through a Kconfig option. The
> >>>>>       `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default value can be
> >>>>>       `32`, a typical implementation on Armv8-R64. Users will
> >>>>>       recompile Xen when their platform changes, so when the MPU
> >>>>>       changes, respecifying the number of MPU protection regions
> >>>>>       will not cause additional problems.
> >>>>
> >>>> I wonder if we could probe the number of MPU regions at runtime and
> >>>> dynamically allocate the memory needed to store them in arch_vcpu.
> >>>>
> >>>
> >>> We have considered using a pr_t mpu_regions[0] in arch_vcpu. But it
> >>> seems we would encounter some statically-allocated arch_vcpu
> >>> problems and a sizeof issue.
> >>
> >> Does it need to be embedded in arch_vcpu? If not, then we could
> >> allocate memory outside and add a pointer in arch_vcpu.
> >>
> >
> > We had thought to use a pointer in arch_vcpu instead of embedding
> > mpu_regions into arch_vcpu. But we noticed that arch_vcpu has a
> > __cacheline_aligned attribute, which may be because arch_vcpu is used
> > very frequently on some critical paths. So if we use a pointer for
> > mpu_regions, it may cause cache misses on these critical paths, for
> > example in context_switch.
> 
>  From my understanding, the idea behind ``cacheline_aligned`` is to
> avoid the struct vcpu sharing a cacheline with other data structures.
> Otherwise you may end up with two pCPUs frequently writing to the same
> cacheline, which is not ideal.
> 
> arch_vcpu should embed anything that will be accessed often (e.g. on
> entry/exit), up to a certain point. For instance, not everything
> related to the vGIC is embedded in the vCPU/Domain structure.
> 
> I am a bit split regarding the mpu_regions. If they are mainly used in
> the context_switch() then I would argue this is a premature
> optimization, because the scheduling decision is probably going to take
> a lot more time than the context switch itself.

mpu_regions in arch_vcpu are used to save the guest's EL1 MPU context.
So, yes, they are mainly used in context_switch. In terms of the number
of registers, there is more save/restore work than on the original V8A,
and on V8R we also need to keep most of the original V8A save/restore
work, so it will take longer than the original V8A context_switch. I
think this is due to the architectural difference, so it is impossible
for us not to save/restore the EL1 MPU region registers in
context_switch. And we have done some optimization for the EL1 MPU
save/restore:
1. Assembly code for the EL1 MPU context_switch.
2. Use the real MPU region number instead of
   CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS in context_switch.
   CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS defines the maximum number of
   EL1 MPU regions supported by this Xen image. All platforms that
   implement EL1 MPU regions in this range can work well with this Xen
   image. But if the implemented EL1 MPU region number exceeds
   CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS, this Xen image cannot work on
   that platform.
  
> 
> Note that for the P2M we already have that indirection because it is
> embbed in the struct domain.

It's different from the V8A P2M case. In the V8A context_switch we just
need to save/restore VTTBR; we don't need to do a P2M table walk. But on
V8R, we need to access the valid mpu_regions for save/restore.

> 
> This raises one question, why is the MPUs regions will be per-vCPU
> rather per domain?
> 

Because there is an EL1 MPU per pCPU, and we can't assume a guest uses
the same EL1 MPU configuration for all of its vCPUs.

> Cheers,
> 
> --
> Julien Grall


* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-01 23:38         ` Stefano Stabellini
@ 2022-03-02  7:13           ` Wei Chen
  2022-03-02 22:55             ` Stefano Stabellini
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-02  7:13 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, julien, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: March 2, 2022 7:39
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org; julien@xen.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> On Tue, 1 Mar 2022, Wei Chen wrote:
> > > On Fri, 25 Feb 2022, Wei Chen wrote:
> > > > > Hi Wei,
> > > > >
> > > > > This is extremely exciting, thanks for the very nice summary!
> > > > >
> > > > >
> > > > > On Thu, 24 Feb 2022, Wei Chen wrote:
> > > > > > # Proposal for Porting Xen to Armv8-R64
> > > > > >
> > > > > > This proposal will introduce the PoC work of porting Xen to
> Armv8-
> > > R64,
> > > > > > which includes:
> > > > > > - The changes of current Xen capability, like Xen build system,
> > > memory
> > > > > >   management, domain management, vCPU context switch.
> > > > > > - The expanded Xen capability, like static-allocation and
> direct-map.
> > > > > >
> > > > > > ***Notes:***
> > > > > > 1. ***This proposal only covers the work of porting Xen to
> Armv8-
> > > R64***
> > > > > >    ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-
> R***
> > > > > >    ***Trusted-Firmware (TF-R). This is an external
> dependency,***
> > > > > >    ***so we think the discussion of Xen SMP support on Armv8-
> R64***
> > > > > >    ***should be started when single-CPU support is complete.***
> > > > > > 2. ***This proposal will not touch xen-tools. In current
> stage,***
> > > > > >    ***Xen on Armv8-R64 only support dom0less, all guests
> should***
> > > > > >    ***be booted from device tree.***
> > > > > >
> > > > > > ## 1. Essential Background
> > > > > >
> > > > > > ### 1.1. Armv8-R64 Profile
> > > > > > The Armv-R architecture profile was designed to support use
> cases
> > > that
> > > > > > have a high sensitivity to deterministic execution. (e.g. Fuel
> > > Injection,
> > > > > > Brake control, Drive trains, Motor control etc)
> > > > > >
> > > > > > Arm announced Armv8-R in 2013, it is the latest generation Arm
> > > > > architecture
> > > > > > targeted at the Real-time profile. It introduces virtualization
> at
> > > the
> > > > > highest
> > > > > > security level while retaining the Protected Memory System
> > > Architecture
> > > > > (PMSA)
> > > > > > based on a Memory Protection Unit (MPU). In 2020, Arm announced
> > > Cortex-
> > > > > R82,
> > > > > > which is the first Arm 64-bit Cortex-R processor based on Armv8-
> R64.
> > > > > >
> > > > > > - The latest Armv8-R64 document can be found here:
> > > > > >   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
> > > > > >
> > > > > > - Armv-R Architecture progression:
> > > > > >   Armv7-R -> Armv8-R AArch32 -> Armv8 AArch64
> > > > > >   The following figure is a simple comparison of "R" processors
> > > based on
> > > > > >   different Armv-R Architectures.
> > > > > >   ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
> > > > > >
> > > > > > - The Armv8-R architecture evolved additional features on top of
> > > Armv7-R:
> > > > > >     - An exception model that is compatible with the Armv8-A
> model
> > > > > >     - Virtualization with support for guest operating systems
> > > > > >         - PMSA virtualization using MPUs In EL2.
> > > > > > - The new features of Armv8-R64 architecture
> > > > > >     - Adds support for the 64-bit A64 instruction set,
> previously
> > > Armv8-
> > > > > R
> > > > > >       only supported A32.
> > > > > >     - Supports up to 48-bit physical addressing, previously up
> to
> > > 32-bit
> > > > > >       addressing was supported.
> > > > > >     - Optional Arm Neon technology and Advanced SIMD
> > > > > >     - Supports three Exception Levels (ELs)
> > > > > >         - Secure EL2 - The Highest Privilege, MPU only, for
> firmware,
> > > > > hypervisor
> > > > > >         - Secure EL1 - RichOS (MMU) or RTOS (MPU)
> > > > > >         - Secure EL0 - Application Workloads
> > > > > >     - Optionally supports Virtual Memory System Architecture at
> S-
> > > EL1/S-
> > > > > EL0.
> > > > > >       This means it's possible to run rich OS kernels - like
> Linux -
> > > > > either
> > > > > >       bare-metal or as a guest.
> > > > > > - Differences with the Armv8-A AArch64 architecture
> > > > > >     - Supports only a single Security state - Secure. There is
> not
> > > Non-
> > > > > Secure
> > > > > >       execution state supported.
> > > > > >     - EL3 is not supported, EL2 is mandatory. This means secure
> EL2
> > > is
> > > > > the
> > > > > >       highest EL.
> > > > > >     - Supports the A64 ISA instruction
> > > > > >         - With a small set of well-defined differences
> > > > > >     - Provides a PMSA (Protected Memory System Architecture)
> based
> > > > > >       virtualization model.
> > > > > >         - As opposed to Armv8-A AArch64's VMSA based
> Virtualization
> > > > > >         - Can support address bits up to 52 if FEAT_LPA is
> enabled,
> > > > > >           otherwise 48 bits.
> > > > > >         - Determines the access permissions and memory
> attributes of
> > > > > >           the target PA.
> > > > > >         - Can implement PMSAv8-64 at EL1 and EL2
> > > > > >             - Address translation flat-maps the VA to the PA for
> EL2
> > > > > Stage 1.
> > > > > >             - Address translation flat-maps the VA to the PA for
> EL1
> > > > > Stage 1.
> > > > > >             - Address translation flat-maps the IPA to the PA
> for
> > > EL1
> > > > > Stage 2.
> > > > > >     - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is
> configurable.
> > > > > >
> > > > > > ### 1.2. Xen Challenges with PMSA Virtualization
> > > > > > Xen is PMSA unaware Type-1 Hypervisor, it will need
> modifications to
> > > run
> > > > > > with an MPU and host multiple guest OSes.
> > > > > >
> > > > > > - No MMU at EL2:
> > > > > >     - No EL2 Stage 1 address translation
> > > > > >         - Xen provides fixed ARM64 virtual memory layout as
> basis of
> > > EL2
> > > > > >           stage 1 address translation, which is not applicable
> on
> > > MPU
> > > > > system,
> > > > > >           where there is no virtual addressing. As a result, any
> > > > > operation
> > > > > >           involving transition from PA to VA, like ioremap,
> needs
> > > > > modification
> > > > > >           on MPU system.
> > > > > >     - Xen's run-time addresses are the same as the link time
> > > addresses.
> > > > > >         - Enable PIC (position-independent code) on a real-time
> > > target
> > > > > >           processor probably very rare.
> > > > > >     - Xen will need to use the EL2 MPU memory region descriptors
> to
> > > > > manage
> > > > > >       access permissions and attributes for accesses made by VMs
> at
> > > > > EL1/0.
> > > > > >         - Xen currently relies on MMU EL1 stage 2 table to
> manage
> > > these
> > > > > >           accesses.
> > > > > > - No MMU Stage 2 translation at EL1:
> > > > > >     - A guest doesn't have an independent guest physical address
> > > space
> > > > > >     - A guest can not reuse the current Intermediate Physical
> > > Address
> > > > > >       memory layout
> > > > > >     - A guest uses physical addresses to access memory and
> devices
> > > > > >     - The MPU at EL2 manages EL1 stage 2 access permissions and
> > > > > attributes
> > > > > > - There are a limited number of MPU protection regions at both
> EL2
> > > and
> > > > > EL1:
> > > > > >     - Architecturally, the maximum number of protection regions
> is
> > > 256,
> > > > > >       typical implementations have 32.
> > > > > >     - By contrast, Xen does not need to consider the number of
> page
> > > > > table
> > > > > >       entries in theory when using MMU.
> > > > > > - The MPU protection regions at EL2 need to be shared between
> the
> > > > > hypervisor
> > > > > >   and the guest stage 2.
> > > > > >     - Requires careful consideration - may impact feature
> 'fullness'
> > > of
> > > > > both
> > > > > >       the hypervisor and the guest
> > > > > >     - By contrast, when using MMU, Xen has standalone P2M table
> for
> > > > > guest
> > > > > >       stage 2 accesses.
> > > > > >
> > > > > > ## 2. Proposed changes of Xen
> > > > > > ### **2.1. Changes of build system:**
> > > > > >
> > > > > > - ***Introduce new Kconfig options for Armv8-R64***:
> > > > > >   Unlike Armv8-A, because lack of MMU support on Armv8-R64, we
> may
> > > not
> > > > > >   expect one Xen binary to run on all machines. Xen images are
> not
> > > > > common
> > > > > >   across Armv8-R64 platforms. Xen must be re-built for different
> > > Armv8-
> > > > > R64
> > > > > >   platforms. Because these platforms may have different memory
> > > layout
> > > > > and
> > > > > >   link address.
> > > > > >     - `ARM64_V8R`:
> > > > > >       This option enables Armv8-R profile for Arm64. Enabling
> this
> > > > > option
> > > > > >       results in selecting MPU. This Kconfig option is used to
> gate
> > > some
> > > > > >       Armv8-R64 specific code except MPU code, like some code
> for
> > > Armv8-
> > > > > R64
> > > > > >       only system ID registers access.
> > > > > >
> > > > > >     - `ARM_MPU`
> > > > > >       This option enables MPU on ARMv8-R architecture. Enabling
> this
> > > > > option
> > > > > >       results in disabling MMU. This Kconfig option is used to
> gate
> > > some
> > > > > >       ARM_MPU specific code. Once when this Kconfig option has
> been
> > > > > enabled,
> > > > > >       the MMU relate code will not be built for Armv8-R64. The
> > > reason
> > > > > why
> > > > > >       not depends on runtime detection to select MMU or MPU is
> that,
> > > we
> > > > > don't
> > > > > >       think we can use one image for both Armv8-R64 and Armv8-
> A64.
> > > > > Another
> > > > > >       reason that we separate MPU and V8R in provision to allow
> to
> > > > > support MPU
> > > > > >       on 32bit Arm one day.
> > > > > >
> > > > > >     - `XEN_START_ADDRESS`
> > > > > >       This option allows to set the custom address at which Xen
> will
> > > be
> > > > > >       linked. This address must be aligned to a page size. Xen's
> > > run-
> > > > > time
> > > > > >       addresses are the same as the link time addresses.
> Different
> > > > > platforms
> > > > > >       may have different memory layout. This Kconfig option
> provides
> > > > > users
> > > > > >       the ability to select proper link addresses for their
> boards.
> > > > > >       ***Notes: Fixed link address means the Xen binary could
> not
> > > be***
> > > > > >       ***relocated by EFI loader. So in current stage, Xen could
> > > not***
> > > > > >       ***be launched as an EFI application on Armv8-R64.***
> > > > > >
> > > > > >     - `ARM_MPU_NORMAL_MEMORY_START` and
> `ARM_MPU_NORMAL_MEMORY_END`
> > > > > >       `ARM_MPU_DEVICE_MEMORY_START` and
> `ARM_MPU_DEVICE_MEMORY_END`
> > > > > >       These Kconfig options allow to set memory regions for Xen
> code,
> > > > > data
> > > > > >       and device memory. Before parsing memory information from
> > > device
> > > > > tree,
> > > > > >       Xen will use the values that stored in these options to
> setup
> > > > > boot-time
> > > > > >       MPU configuration. Why we need a boot-time MPU
> configuration?
> > > > > >       1. More deterministic: Arm MPU supports background regions,
> > > > > >          if we don't configure the MPU regions and don't enable
> MPU.
> > > > > >          We can enable MPU background regions. But that means
> all
> > > RAM
> > > > > >          is RWX. Random values in RAM or maliciously embedded
> data
> > > can
> > > > > >          be exploited. Using these Kconfig options allow users
> to
> > > have
> > > > > >          a deterministic RAM area to execute code.
> > > > > >       2. More compatible: On some Armv8-R64 platforms, if the
> MPU is
> > > > > >          disabled, the `dc zva` instruction will make the system
> > > halt.
> > > > > >          And this instruction will be embedded in some built-in
> > > > > functions,
> > > > > >          like `memory set`. If we use `-ddont_use_dc` to rebuild
> GCC,
> > > > > >          the built-in functions will not contain `dc zva`.
> However,
> > > it
> > > > > is
> > > > > >          obviously unlikely that we will be able to recompile
> all
> > > GCC
> > > > > >          for ARMv8-R64.
> > > > > >       3. One optional idea:
> > > > > >           We can map `XEN_START_ADDRESS` to `XEN_START_ADDRESS +
> > > 2MB` or
> > > > > >           `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_end`
> for
> > > > > >           MPU normal memory. It's enough to support Xen run in
> boot
> > > time.
> > > > >
> > > > > I can imagine that we need to have a different Xen build for each
> > > > > ARMv8-R platform. Do you envision that XEN_START_ADDRESS and
> > > > > ARM_MPU_*_MEMORY_START/END are preconfigured based on the platform
> > > > > choice at build time? I don't think we want a user to provide all
> of
> > > > > those addresses by hand, right?
> > > >
> > > > Yes, this is in our TODO list. We want to reuse current
> arm/platforms
> > > and
> > > > Kconfig menu for Armv8-R.
> > >
> > > OK, good
> > >
> > >
> > > > > The next question is whether we could automatically generate
> > > > > XEN_START_ADDRESS and ARM_MPU_*_MEMORY_START/END based on the
> platform
> > > > > device tree at build time (at build time, not runtime). That would
> > > > > make things a lot easier and it is also aligned with the way
> Zephyr
> > > and
> > > > > other RTOSes and baremetal apps work.
> > > >
> > > > It's a considerable option. But here we may encounter some problems
> need
> > > > to be solved first:
> > > > 1. Does CONFIG_DTB must be selected by default on Armv8-R? Without
> > > firmware
> > > >    or bootloader (like u-boot), we have to build DTB into Xen binary.
> > >
> > > CONFIG_DTB should trigger runtime support for device tree, while here
> we
> > > are talking about build time support for device tree. It is very
> > > different.
> > >
> > > Just to make an example, the whole build-time device tree could be
> > > scanned by Makefiles and other scripts, leading to C header files
> > > generations, but no code in Xen to parse device tree at all.
> > >
> > > DTB ---> Makefiles/scripts ---> .h files ---> Makefiles/scripts --->
> xen
> > >
> >
> > Yes, this is feasible.
> >
> > >
> > > I am not saying this is the best way to do it, I am only pointing out
> > > that build-time device tree does not imply run-time device tree. Also,
> > > it doesn't imply a DTB built-in the Xen binary (although that is also
> an
> > > option).
> > >
> >
> > I agree.
> >
> > > The way many baremetal OSes and RTOSes work is that they take a DTB as
> > > input to the build *only*. From the DTB, the build-time make system
> > > generates #defines and header files that are imported in C.
> > >
> > > The resulting RTOS binary doesn't need support for DTB, because all
> the
> > > right addresses have already been provided as #define by the Make
> > > system.
> > >
> > > I don't think we need to go to the extreme of removing DTB support
> from
> > > Xen on ARMv8-R. I am only saying that if we add build-time device tree
> > > support it would make it easier to support multiple boards without
> > > having to have platform files in Xen for each of them, and we can do
> > > that without any impact on runtime device tree parsing.
> > >
> >
> > As V8R's use cases maybe mainly focus on some real-time/critical
> scenarios,
> > this may be a better method than platform files. We don't need to
> maintain
> > the platform related definitions header files. Xen also can skip the
> some
> > platform information parsing in boot time. This will increase the boot
> speed
> > of Xen in real-time/critical scenarios.
> 
> +1
> 
> 
> > > >    This
> > > >    can guarantee build-time DTB is the same as runtime DTB. But
> > > eventually,
> > > >    we will have firmware and bootloader before Xen launch (as Arm
> EBBR's
> > > >    requirement). In this case, we may not build DTB into Xen image.
> And
> > > >    we can't guarantee build-time DTB is the same as runtime DTB.
> > >
> > > As mentioned, if we have a build-time DTB we might not need a run-time
> > > DTB. Secondly, I think it is entirely reasonable to expect that the
> > > build-time DTB and the run-time DTB are the same.
> > >
> >
> > Yes, if we implement in this way, we should describe it in limitation
> > of v8r Xen.
> >
> > > It is the same problem with platform files: we have to assume that the
> > > information in the platform files matches the runtime DTB.
> > >
> >
> > indeed.
> >
> > >
> > > > 2. If build-time DTB is the same as runtime DTB, how can we
> determine
> > > >    the XEN_START_ADDRESS in DTB describe memory range? Should we
> always
> > > >    limit Xen to boot from lowest address? Or will we introduce some
> new
> > > >    DT property to specify the Xen start address? I think this DT
> > > property
> > > >    also can solve above question#1.
> > >
> > > The loading address should be automatically chosen by the build
> scripts.
> > > We can do that now with ImageBuilder [1]: it selects a 2MB-aligned
> > > address for each binary to load, one by one starting from a 2MB offset
> > > from start of memory.
> > >
> > > [1] https://gitlab.com/ViryaOS/imagebuilder/-/blob/master/scripts/uboot-script-gen#L390
> > >
> > > So the build scripts can select XEN_START_ADDRESS based on the
> > > memory node information on the build-time device tree. And there
> should
> > > be no need to add XEN_START_ADDRESS to the runtime device tree.
> > >
> >
> > This is fine if there are no explicit restrictions on the platform.
> > Some platform may reserve some memory area for something like firmware,
> > But I think it's OK, in the worst case, we can hide this area from
> > build DTB.
> >
> > >
> > > > > The device tree can be given as input to the build system, and the
> > > > > Makefiles would take care of generating XEN_START_ADDRESS and
> > > > > ARM_MPU_*_MEMORY_START/END based on /memory and other interesting
> > > nodes.
> > > > >
> > > >
> > > > If we can solve above questions, yes, device tree is a good idea for
> > > > XEN_START_ADDRESS. For ARM_MPU_NORMAL_MEMORY_*, we can get them from
> > > > memory nodes, but for ARM_MPU_DEVICE_MEMORY_*, they are not easy for
> > > > us to scan all devices' nodes. And it's very tricky, if the memory
> > > > regions are interleaved. So in our current RFC code, we select to
> use
> > > > the optional idea:
> > > > We map `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` for MPU
> normal
> > > memory.
> > > > But we use mpu,device-memory-section in DT for MPU device memory.
> > >
> > > Keep in mind that we are talking about build-time scripts: it doesn't
> > > matter if they are slow. We can scan the build-time dtb as many time
> as
> > > needed and generate ARM_MPU_DEVICE_MEMORY_* as appropriate. It might
> > > make "make xen" slower but runtime will be unaffected.
> > >
> > > So, I don't think this is a problem.
> > >
> >
> > OK.
> >
> > >
> > > > > > - ***Define new system registers for compilers***:
> > > > > >   Armv8-R64 is based on Armv8.4. That means we will use some
> Armv8.4
> > > > > >   specific system registers. As Armv8-R64 only have secure state,
> so
> > > > > >   at least, `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen.
> And
> > > the
> > > > > >   first GCC version that supports Armv8.4 is GCC 8.1. In
> addition to
> > > > > >   these, PMSA of Armv8-R64 introduced lots of MPU related system
> > > > > registers:
> > > > > >   `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`,
> `PRENR_ELx`
> > > and
> > > > > >   `MPUIR_ELx`. But the first GCC version to support these system
> > > > > registers
> > > > > >   is GCC 11. So we have two ways to make compilers to work
> properly
> > > with
> > > > > >   these system registers.
> > > > > >   1. Bump GCC version to GCC 11.
> > > > > >      The pros of this method is that, we don't need to encode
> these
> > > > > >      system registers in macros by ourselves. But the cons are
> that,
> > > > > >      we have to update Makefiles to support GCC 11 for Armv8-R64.
> > > > > >      1.1. Check the GCC version 11 for Armv8-R64.
> > > > > >      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> > > > > >      1.3. Resolve the conflict between march=armv8r and mcpu=generic
> > > > > >     These changes will affect common Makefiles, not only Arm
> > > Makefiles.
> > > > > >     And GCC 11 is new, lots of toolchains and Distro haven't
> > > supported
> > > > > it.
> > > > > >
> > > > > >   2. Encode new system registers in macros ***(preferred)***
> > > > > >         ```
> > > > > >         /* Virtualization Secure Translation Control Register */
> > > > > >         #define VSTCR_EL2  S3_4_C2_C6_2
> > > > > >         /* Virtualization System Control Register */
> > > > > >         #define VSCTLR_EL2 S3_4_C2_C0_0
> > > > > >         /* EL1 MPU Protection Region Base Address Register
> encode */
> > > > > >         #define PRBAR_EL1  S3_0_C6_C8_0
> > > > > >         ...
> > > > > >         /* EL2 MPU Protection Region Base Address Register
> encode */
> > > > > >         #define PRBAR_EL2  S3_4_C6_C8_0
> > > > > >         ...
> > > > > >         ```
> > > > > >      If we encode all above system registers, we don't need to
> bump
> > > GCC
> > > > > >      version. And the common CFLAGS Xen is using still can be
> > > applied to
> > > > > >      Armv8-R64. We don't need to modify Makefiles to add
> specific
> > > CFLAGS.
> > > > >
> > > > > I think that's fine and we did something similar with the original
> > > ARMv7-A
> > > > > port if I remember correctly.
> > > > >
> > > > >
> > > > > > ### **2.2. Changes of the initialization process**
> > > > > > In general, we still expect Armv8-R64 and Armv8-A64 to have a
> > > consistent
> > > > > > initialization process. In addition to some architecture
> differences,
> > > > > there
> > > > > > is no more than reusable code that we will distinguish through
> > > > > CONFIG_ARM_MPU
> > > > > > or CONFIG_ARM64_V8R. We want most of the initialization code to
> be
> > > > > reusable
> > > > > > between Armv8-R64 and Armv8-A64.
> > > > >
> > > > > +1
> > > > >
> > > > >
> > > > > > - We will reuse the original head.s and setup.c of Arm. But
> replace
> > > the
> > > > > >   MMU and page table operations in these files with
> configuration
> > > > > operations
> > > > > >   for MPU and MPU regions.
> > > > > >
> > > > > > - We provide a boot-time MPU configuration. This MPU
> configuration
> > > will
> > > > > >   support Xen to finish its initialization. And this boot-time
> MPU
> > > > > >   configuration will record the memory regions that will be
> parsed
> > > from
> > > > > >   device tree.
> > > > > >
> > > > > >   In the end of Xen initialization, we will use a runtime MPU
> > > > > configuration
> > > > > >   to replace boot-time MPU configuration. The runtime MPU
> > > configuration
> > > > > will
> > > > > >   merge and reorder memory regions to save more MPU regions for
> > > guests.
> > > > > >   ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> > > > > >
> > > > > > - Defer system unpausing domain.
> > > > > >   When Xen initialization is about to end, Xen unpause guests
> > > created
> > > > > >   during initialization. But this will cause some issues. The
> > > unpause
> > > > > >   action occurs before free_init_memory, however the runtime MPU
> > > > > configuration
> > > > > >   is built after free_init_memory.
> > > > > >
> > > > > >   So if the unpaused guests start executing the context switch
> at
> > > this
> > > > > >   point, then its MPU context will base on the boot-time MPU
> > > > > configuration.
> > > > > >   Probably it will be inconsistent with runtime MPU
> configuration,
> > > this
> > > > > >   will cause unexpected problems (This may not happen in a
> single
> > > core
> > > > > >   system, but on SMP systems, this problem is foreseeable, so we
> > > hope to
> > > > > >   solve it at the beginning).
> > > > > >
> > > > > > ### **2.3. Changes to reduce memory fragmentation**
> > > > > >
> > > > > > In general, memory in Xen system can be classified to 4 classes:
> > > > > > `image sections`, `heap sections`, `guest RAM`, `boot modules
> (guest
> > > > > Kernel,
> > > > > > initrd and dtb)`
> > > > > >
> > > > > > Currently, Xen doesn't have any restriction for users how to
> > > allocate
> > > > > > memory for different classes. That means users can place boot
> > > modules
> > > > > > anywhere, can reserve Xen heap memory anywhere and can allocate
> > > guest
> > > > > > memory anywhere.
> > > > > >
> > > > > > In a VMSA system, this would not be too much of a problem, since
> the
> > > > > > MMU can manage memory at a granularity of 4KB after all. But in
> a
> > > > > > PMSA system, this will be a big problem. On Armv8-R64, the max
> MPU
> > > > > > protection regions number has been limited to 256. But in
> typical
> > > > > > processor implementations, few processors will design more than
> 32
> > > > > > MPU protection regions. Add in the fact that Xen shares MPU
> > > protection
> > > > > > regions with guest's EL1 Stage 2. It becomes even more important
> > > > > > to properly plan the use of MPU protection regions.
> > > > > >
> > > > > > - An ideal of memory usage layout restriction:
> > > > > > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
> > > > > > 1. Reserve proper MPU regions for Xen image (code, rodata and
> data +
> > > > > bss).
> > > > > > 2. Reserve one MPU region for boot modules.
> > > > > >    That means the placement of all boot modules, include guest
> > > kernel,
> > > > > >    initrd and dtb, will be limited to this MPU region protected
> area.
> > > > > > 3. Reserve one or more MPU regions for Xen heap.
> > > > > >    On Armv8-R64, the guest memory is predefined in device tree,
> it
> > > will
> > > > > >    not be allocated from heap. Unlike Armv8-A64, we will not
> move
> > > all
> > > > > >    free memory to heap. We want the Xen heap to be deterministic too, so
> Xen
> > > on
> > > > > >    Armv8-R64 also rely on Xen static heap feature. The memory
> for
> > > Xen
> > > > > >    heap will be defined in tree too. Considering that physical
> > > memory
> > > > > >    can also be discontinuous, one or more MPU protection regions
> > > needs
> > > > > >    to be reserved for Xen HEAP.
> > > > > > 4. If we name above used MPU protection regions PART_A, and name
> > > left
> > > > > >    MPU protection regions PART_B:
> > > > > >    4.1. In hypervisor context, Xen will map left RAM and devices
> to
> > > > > PART_B.
> > > > > >         This will give Xen the ability to access whole memory.
> > > > > >    4.2. In guest context, Xen will create EL1 stage 2 mapping in
> > > PART_B.
> > > > > >         In this case, Xen just need to update PART_B in context
> > > switch,
> > > > > >         but keep PART_A as fixed.
> > > > >
> > > > > I think that the memory layout and restrictions that you wrote
> above
> > > > > make sense. I have some comments on the way they are represented
> in
> > > > > device tree, but that's different.
> > > > >
> > > > >
> > > > > > ***Notes: Static allocation will be mandatory on MPU based
> > > systems***
> > > > > >
> > > > > > **A sample device tree of memory layout restriction**:
> > > > > > ```
> > > > > > chosen {
> > > > > >     ...
> > > > > >     /*
> > > > > >      * Define a section to place boot modules,
> > > > > >      * all boot modules must be placed in this section.
> > > > > >      */
> > > > > >     mpu,boot-module-section = <0x10000000 0x10000000>;
> > > > > >     /*
> > > > > >      * Define a section to cover all guest RAM. All guest RAM
> must
> > > be
> > > > > located
> > > > > >      * within this section. The pros is that, in best case, we
> can
> > > only
> > > > > have
> > > > > >      * one MPU protection region to map all guest RAM for Xen.
> > > > > >      */
> > > > > >     mpu,guest-memory-section = <0x20000000 0x30000000>;
> > > > > >     /*
> > > > > >      * Define a memory section that can cover all device memory
> that
> > > > > >      * will be used in Xen.
> > > > > >      */
> > > > > >     mpu,device-memory-section = <0x80000000 0x7ffff000>;
> > > > > >     /* Define a section for Xen heap */
> > > > > >     xen,static-mem = <0x50000000 0x20000000>;
> > > > >
> > > > > As mentioned above, I understand the need for these sections, but
> why
> > > do
> > > > > we need to describe them in device tree at all? Could Xen select
> them
> > > by
> > > > > itself during boot?
> > > >
> > > > I think without some inputs, Xen could not do this or will do it in
> some
> > > > assumption. For example, assume the first the boot-module-section
> > > determined
> > > > by lowest address and highest address of all modules. And the same
> for
> > > > guest-memory-section, calculated from all guest allocated memory
> regions.
> > >
> > > Right, I think that the mpu,boot-module-section should be generated by
> a
> > > set of scripts like ImageBuilder. Something with a list of all the
> > > binaries that need to be loaded and also the DTB at build-time.
> > > Something like ImageBuilder would have the ability to add
> > > "mpu,boot-module-section" to device tree automatically and
> automatically
> > > choose a good address for it.
> > >
> > > As an example, today ImageBuilder takes as input a config file like
> the
> > > following:
> > >
> > > ---
> > > MEMORY_START="0x0"
> > > MEMORY_END="0x80000000"
> > >
> > > DEVICE_TREE="4.16-2022.1/mpsoc.dtb"
> > > XEN="4.16-2022.1/xen"
> > > DOM0_KERNEL="4.16-2022.1/Image-dom0-5.16"
> > > DOM0_RAMDISK="4.16-2022.1/xen-rootfs.cpio.gz"
> > >
> > > NUM_DOMUS=1
> > > DOMU_KERNEL[0]="4.16-2022.1/Image-domU"
> > > DOMU_RAMDISK[0]="4.16-2022.1/initrd.cpio"
> > > DOMU_PASSTHROUGH_DTB[0]="4.16-2022.1/passthrough-example-sram.dtb"
> > > ---
> > >
> > > And generates a U-Boot boot.scr script with:
> > > - load addresses for each binary
> > > - commands to edit the DTB to add those addresses to device tree (e.g.
> > >   dom0less kernels addresses)
> > >
> > > ImageBuilder can also modify the DTB at build time instead (instead of
> > > doing it from boot.scr.) See FDTEDIT.
> > >
> > > I am not saying we should use ImageBuilder, but it sounds like we need
> > > something similar.
> > >
> > >
> >
> > Yes, exactly. I have comment on Henry's stack heap RFC to said we need
> > a similar tool. Now, here it is : )
> 
> Ahah yes :-)
> 
> Initially I wrote ImageBuilder because people kept sending me emails to
> ask me for help with dom0less and almost always it was an address
> loading error.
> 

Yes, at present it is not very convenient; many problems are caused by DTS
configuration errors.

> I would be happy to turn ImageBuilder into something useful for ARMv8-R
> as well and add more maintainers from ARM and other companies.
> 

+1 : )

> 
> > > > > If not, and considering that we have to generate
> > > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make
> sense
> > > to
> > > > > also generate mpu,guest-memory-section, xen,static-mem, etc. at
> build
> > > > > time rather than passing it via device tree to Xen at runtime?
> > > > >
> > > >
> > > > Did you mean we would still add this information in the device
> > > > tree, but for build time only, so that Xen doesn't parse it at
> > > > runtime?
> > >
> > > Yes, something like that, but see below.
> > >
> > >
> > > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at build time
> and
> > > > > everything else at runtime?
> > > >
> > > > ARM_MPU_*_MEMORY_START/END is defined by the platform. But the
> > > > other properties are user customized. Users can change their usage
> > > > without rebuilding the image.
> > >
> > > Good point.
> > >
> > > We don't want to have to rebuild Xen if the user updated a guest
> kernel,
> > > resulting in a larger boot-module-section.
> > >
> > > So I think it makes sense that "mpu,boot-module-section" is generated
> by
> > > the scripts (e.g. ImageBuilder) at build time, and Xen reads the
> > > property at boot from the runtime device tree.
> > >
> > > I think we need to divide the information into two groups:
> > >
> > >
> > > # Group1: board info
> > >
> > > This information is platform specific and it is not meant to change
> > > depending on the VM configuration. Ideally, we build Xen for a
> platform
> > > once, then we can use the same Xen binary together with any
> combination
> > > of dom0/domU kernels and ramdisks.
> > >
> > > This kind of information doesn't need to be exposed to the runtime
> > > device tree. But we can still use a build-time device tree to generate
> > > the addresses if it is convenient.
> > >
> > > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and
> ARM_MPU_NORMAL_MEMORY_*
> > > seem to be part of this group.
> > >
> >
> > Yes.
> >
> > >
> > > # Group2: boot configuration
> > >
> > > This information is about the specific set of binaries and VMs that we
> > > need to boot. It is conceptually similar to the dom0less device tree
> > > nodes that we already have. If we change one of the VM binaries, we
> > > likely have to refresh the information here.
> > >
> > > "mpu,boot-module-section" probably belongs to this group (unless we
> find
> > > a way to define "mpu,boot-module-section" generically so that we don't
> > > need to change it any time the set of boot modules change.)
> > >
> > >
> >
> > I agree.
> >
> > > > > It looks like we are forced to have the sections definitions at
> build
> > > > > time because we need them before we can parse device tree. In that
> > > case,
> > > > > we might as well define all the sections at build time.
> > > > >
> > > > > But I think it would be even better if Xen could automatically
> choose
> > > > > xen,static-mem, mpu,guest-memory-section, etc. on its own based on
> the
> > > > > regular device tree information (/memory, /amba, etc.), without
> any
> > > need
> > > > > for explicitly describing each range with these new properties.
> > > > >
> > > >
> > > > For mpu,guest-memory-section, with the limitation that there is no
> > > > other usage between different guests' memory nodes, this is OK. But
> > > > for xen,static-mem (heap), we just want everything on an MPU system
> > > > to be deterministic. Of course, Xen can still select the remaining
> > > > memory for the heap without static-mem.
> > >
> > > It is good that you think they can be chosen by Xen.
> > >
> > > Differently from "boot-module-section", which has to do with the boot
> > > modules selected by the user for a specific execution,
> > > guest-memory-section and static-mem are Xen specific memory
> > > policies/allocations.
> > >
> > > A user wouldn't know how to fill them in. And I worry that even a
> script
> >
> > But users should know it, because static-mem for a guest must be
> > allocated in this range. And users take the responsibility for setting
> > the DomU's statically allocated memory ranges.
> 
> Let me premise that my goal is to avoid having many users reporting
> errors to xen-devel and xen-users when actually it is just a wrong
> choice of addresses.
> 
> I think we need to make a distinction between addresses for the boot
> modules, e.g. addresses where to load xen, the dom0/U kernel, dom0/U
> ramdisk in memory at boot time, and VM static memory addresses.
> 
> The boot modules addresses are particularly difficult to fill in because
> they are many and a small update in one of the modules could invalidate
> all the other addresses. This is why I ended up writing ImageBuilder.
> Since then, I have received several emails from users thanking me for
> ImageBuilder :-)
> 

Thanks +999 😊


> The static VM memory addresses (xen,static-mem) should be a bit easier
> to fill in correctly. They are meant to be chosen once, and it shouldn't
> happen that an update on a kernel forces the user to change all the VM
> static memory addresses. Also, I know that some users actually want to
> be able to choose the domU addresses by hand because they have specific
> needs. So it is good that we can let the user choose the addresses if
> they want to.
> 

Yes.

> With all of that said, I do think that many users won't have an opinion
> on the VM static memory addresses and won't know how to choose them.
> It would be error prone to let them try to fill them in by hand. So I
> was already planning on adding support to ImageBuilder to automatically
> generate xen,static-mem for dom0less domains.
> 

Let me make sure I understand what you said: users give a VM memory size to
ImageBuilder, and ImageBuilder will generate xen,static-mem = <start, size>.
For a specific VM, ImageBuilder can also accept start and size as inputs?

Do I understand this correctly?
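To make the scheme above concrete, here is a hedged sketch (not ImageBuilder's actual code; the function name, the 0x30000000 base, and the 2MB alignment are all illustrative assumptions) of how a tool could turn per-domain sizes into xen,static-mem <start, size> pairs, honouring a user-supplied start address when one is given:

```c
#include <stdint.h>
#include <assert.h>

#define MB (1024ULL * 1024)

/* Assumed base of the guest-memory section; next free address to hand out. */
static uint64_t next_free = 0x30000000ULL;

/*
 * Hypothetical allocator sketch, not ImageBuilder's actual logic.
 * Returns the start address for a domain of 'size_mb' megabytes.
 * If the user supplied an explicit 'fixed_start' (non-zero), it wins,
 * matching the "fully static system" case discussed above.
 */
static uint64_t alloc_static_mem(uint64_t size_mb, uint64_t fixed_start)
{
    const uint64_t align = 2 * MB;      /* assumed placement granularity */
    uint64_t start;

    if (fixed_start)
        start = fixed_start;
    else
        start = (next_free + align - 1) & ~(align - 1);

    next_free = start + size_mb * MB;   /* bump for the next domain */
    return start;
}
```

A tool built on this would only need DOMU_STATIC_MEM[n] sizes as input, with an optional explicit start per domain.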

> 
> Going back to this specific discussion about boot-module-section: I can
> see now that, given xen,static-mem is chosen by ImageBuilder (or

By hand : )

> similar) and not Xen, then it makes sense to have ImageBuilder (or
> similar) also generate boot-module-section.
> 

If my above understanding is right, then yes.

> 
> 
> > > like ImageBuilder wouldn't be the best place to pick these values --
> > > they seem too "important" to leave to a script.
> > >
> > > But it seems possible to choose the values in Xen:
> > > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at build
> time
> > > - Xen reads boot-module-section from device tree
> > >
> > > It should be possible at this point for Xen to pick the best values
> for
> > > guest-memory-section and static-mem based on the memory available.
> > >
> >
> > How would Xen pick? Does it mean that in a static-allocation DomU DT
> > node, we just need a size, and don't require a start address for
> > static-mem?
> 
> Yes the idea was that the user would only provide the size (e.g.
> DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
> calculated. But I didn't mean to change the existing xen,static-mem
> device tree bindings. So it is best if the xen,static-mem addresses
> generation is done by ImageBuilder (or similar tool) instead of Xen.
> 

If we still keep the option for users to specify the start and size
parameters for VM memory, because they may be very important for a
deterministic (fully static) system, then I agree with you.

And in the current static-allocation, I think Xen doesn't generate the
xen,static-mem addresses; it is all done by hand...

> Sorry for the confusion!
> 

NP ; )

> 
> > > > > >     domU1 {
> > > > > >         ...
> > > > > >         #xen,static-mem-address-cells = <0x01>;
> > > > > >         #xen,static-mem-size-cells = <0x01>;
> > > > > >         /* Statically allocated guest memory, within mpu,guest-
> > > memory-
> > > > > section */
> > > > > >         xen,static-mem = <0x30000000 0x1f000000>;
> > > > > >
> > > > > >         module@11000000 {
> > > > > >             compatible = "multiboot,kernel\0multiboot,module";
> > > > > >             /* Boot module address, within mpu,boot-module-
> section
> > > */
> > > > > >             reg = <0x11000000 0x3000000>;
> > > > > >             ...
> > > > > >         };
> > > > > >
> > > > > >         module@10FF0000 {
> > > > > >                 compatible = "multiboot,device-
> > > tree\0multiboot,module";
> > > > > >                 /* Boot module address, within mpu,boot-module-
> > > section
> > > > > */
> > > > > >                 reg = <0x10ff0000 0x10000>;
> > > > > >                 ...
> > > > > >         };
> > > > > >     };
> > > > > > };
> > > > > > ```

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-01  7:51   ` Wei Chen
@ 2022-03-02  7:21     ` Penny Zheng
  2022-03-02 12:06       ` Julien Grall
  2022-03-02 12:00     ` Julien Grall
  1 sibling, 1 reply; 34+ messages in thread
From: Penny Zheng @ 2022-03-02  7:21 UTC (permalink / raw)
  To: Wei Chen, Julien Grall, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Wei Chen <Wei.Chen@arm.com>
> Sent: Tuesday, March 1, 2022 3:52 PM
> To: Julien Grall <julien@xen.org>; xen-devel@lists.xenproject.org; Stefano
> Stabellini <sstabellini@kernel.org>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd
> <nd@arm.com>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Julien,
> 
> > -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: 2022年2月26日 4:55
> > To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> > Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> > <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd
> > <nd@arm.com>
> > Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >
> > Hi Wei,
> >
> > Thank you for sending the proposal. Please find some comments below.
> >
> > On 24/02/2022 06:01, Wei Chen wrote:
> > > # Proposal for Porting Xen to Armv8-R64
> > >
> > > This proposal will introduce the PoC work of porting Xen to
> > > Armv8-R64, which includes:
> > > - The changes of current Xen capability, like Xen build system, memory
> > >    management, domain management, vCPU context switch.
> > > - The expanded Xen capability, like static-allocation and direct-map.
> > >
> > > ***Notes:***
> > > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> > >     ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> > >     ***Trusted-Frimware (TF-R). This is an external dependency,***
> > >     ***so we think the discussion of Xen SMP support on Armv8-R64***
> > >     ***should be started when single-CPU support is complete.***
> >
> > I agree that we should first focus on single-CPU support.
> >
> 
> ack.
> 
> > > 2. ***This proposal will not touch xen-tools. In current stage,***
> > >     ***Xen on Armv8-R64 only support dom0less, all guests should***
> > >     ***be booted from device tree.***
> >
> > Makes sense. I actually expect some issues in the way xen-tools would
> > need to access the memory of the domain that is being created.
> >
> 
> Yes, we also feel that changes to xen-tools could be a big job in the future
> (both xen common implementation and tools need changes).
> 
> > [...]
> >
> > > ### 1.2. Xen Challenges with PMSA Virtualization
> > > Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications
> > > to run with an MPU and host multiple guest OSes.
> > >
> > > - No MMU at EL2:
> > >      - No EL2 Stage 1 address translation
> > >          - Xen provides fixed ARM64 virtual memory layout as basis
> > > of
> > EL2
> > >            stage 1 address translation, which is not applicable on
> > > MPU
> > system,
> > >            where there is no virtual addressing. As a result, any
> > operation
> > >            involving transition from PA to VA, like ioremap, needs
> > modification
> > >            on MPU system.
> > >      - Xen's run-time addresses are the same as the link time addresses.
> > >          - Enabling PIC (position-independent code) on a real-time
> > >            target processor is probably very rare.
> >
> > Aside the assembly boot code and UEFI stub, Xen already runs at the
> > same address as it was linked.
> >
> 
> But the difference is that, based on the MMU, we can use the same link
> address for all platforms. On an MPU system, we can't do it the same way.
> 
> > >      - Xen will need to use the EL2 MPU memory region descriptors to
> > manage
> > >        access permissions and attributes for accesses made by VMs at
> > EL1/0.
> > >          - Xen currently relies on MMU EL1 stage 2 table to manage these
> > >            accesses.
> > > - No MMU Stage 2 translation at EL1:
> > >      - A guest doesn't have an independent guest physical address space
> > >      - A guest can not reuse the current Intermediate Physical Address
> > >        memory layout
> > >      - A guest uses physical addresses to access memory and devices
> > >      - The MPU at EL2 manages EL1 stage 2 access permissions and
> > attributes
> > > - There are a limited number of MPU protection regions at both EL2
> > > and
> > EL1:
> > >      - Architecturally, the maximum number of protection regions is 256,
> > >        typical implementations have 32.
> > >      - By contrast, Xen does not need to consider the number of page
> > table
> > >        entries in theory when using MMU.
> > > - The MPU protection regions at EL2 need to be shared between the
> > hypervisor
> > >    and the guest stage 2.
> > >      - Requires careful consideration - may impact feature
> > > 'fullness' of
> > both
> > >        the hypervisor and the guest
> > >      - By contrast, when using MMU, Xen has standalone P2M table for
> > guest
> > >        stage 2 accesses.
> >
> > [...]
> >
> > > - ***Define new system registers for compilers***:
> > >    Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> > >    specific system registers. As Armv8-R64 only have secure state, so
> > >    at least, `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen. And the
> > >    first GCC version that supports Armv8.4 is GCC 8.1. In addition to
> > >    these, PMSA of Armv8-R64 introduced lots of MPU related system
> > registers:
> > >    `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx`
> and
> > >    `MPUIR_ELx`. But the first GCC version to support these system
> > registers
> > >    is GCC 11. So we have two ways to make compilers work properly
> > >    with these system registers:
> > >    1. Bump GCC version to GCC 11.
> > > >       The pro of this method is that we don't need to encode these
> > > >       system registers in macros by ourselves. But the cons are that
> > > >       we have to update Makefiles to support GCC 11 for Armv8-R64:
> > > >       1.1. Check for GCC version 11 for Armv8-R64.
> > > >       1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> > > >       1.3. Resolve the conflict between march=armv8r and mcpu=generic.
> > > >       These changes will affect common Makefiles, not only Arm
> > > >       Makefiles. And GCC 11 is new; lots of toolchains and distros
> > > >       haven't supported it yet.
> >
> > I agree that forcing the use of GCC 11 is not a good idea. But I am not
> > sure I understand the problem with the -march=.... Ultimately,
> > shouldn't we aim to build Xen ARMv8-R with -march=armv8r?
> >
> 
> Actually, we had done that, but we reverted it from the RFC patch series.
> One reason has been listed above, but that is not the major one. The main
> reason is that Armv8-R AArch64 supports the A64 ISA instruction set with
> some modifications: it redefines DMB and DSB, and adds a DFB. But
> actually, the encodings of DMB and DSB are still the same as A64, and
> DFB is an alias of DSB #12.
> 
> In this case, we don't think we need a new arch flag to generate new
> instructions for Armv8-R. And we have discussed this with the Arm kernel
> folks; they will not update the build system for Linux running at
> Armv8-R64 EL1 either.
> 
> 
> > [...]
> >
> > > ### **2.2. Changes of the initialization process**
> > > In general, we still expect Armv8-R64 and Armv8-A64 to have a
> > > consistent initialization process. Apart from some architectural
> > > differences, which we will distinguish through CONFIG_ARM_MPU or
> > > CONFIG_ARM64_V8R, we want most of the initialization code to be
> > > reusable between Armv8-R64 and Armv8-A64.
> > >
> > > - We will reuse the original head.s and setup.c of Arm. But replace the
> > >    MMU and page table operations in these files with configuration
> > operations
> > >    for MPU and MPU regions.
> > >
> > > - We provide a boot-time MPU configuration. This MPU configuration will
> > >    support Xen to finish its initialization. And this boot-time MPU
> > >    configuration will record the memory regions that will be parsed from
> > >    device tree.
> > >
> > >    In the end of Xen initialization, we will use a runtime MPU
> > configuration
> > >    to replace boot-time MPU configuration. The runtime MPU
> > > configuration
> > will
> > >    merge and reorder memory regions to save more MPU regions for
> guests.
> > >
> > > ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1P
> q
> > > R
> > DoacQVTwUtWIGU)
> > >
> > > - Defer system unpausing domain.
> > >    When Xen initialization is about to end, Xen unpause guests created
> > >    during initialization. But this will cause some issues. The unpause
> > >    action occurs before free_init_memory, however the runtime MPU
> > configuration
> > >    is built after free_init_memory.
> >
> > I was half expecting that free_init_memory() would not be called for
> > Xen Armv8R.
> >
> 
> We had called free_init_memory for Xen Armv8R, but it doesn't gain much.
> As we have a static heap, we don't reclaim init memory to the heap. And
> this reclaimed memory could not be used by Xen data and bss either. But
> from the security perspective, free_init_memory will drop the Xen init
> code & data, which reduces the code an attacker can exploit.
> 
> > >
> > >    So if the unpaused guests start executing the context switch at this
> > >    point, then its MPU context will base on the boot-time MPU
> > configuration.
> >
> > Can you explain why you want to switch the MPU configuration that late?
> >
> 

It is more related to the implementation.

In the boot stage, we allocate MPU regions in sequence until the maximum.
Since a few MPU regions will get removed along the way, holes are left
behind. For example, once the heap is ready, the fdt is reallocated in the
heap, which means the MPU region for the device tree is no longer needed.
Also, in free_init_memory, although we do not give the init memory back to
the heap, we destroy the corresponding MPU regions to make them
inaccessible. Without ordering, we would need a bitmap to record this
information.

In the context switch, the memory layout is quite different between guest
mode and hypervisor mode. When switching to guest mode, only the guest's
RAM, emulated/passthrough devices, etc. can be seen, but in hypervisor
mode, all guests' RAM and device memory shall be seen. And without
reordering, we would need to iterate over all MPU regions to find the
regions to disable during the runtime context switch, which is definitely
an overhead.

So we propose an ordering at the tail of boot time: put all fixed MPU
regions at the head, like Xen text/data, etc., and put all flexible ones
at the tail, like device memory and guests' RAM. Then later, in the
context switch, we can easily disable regions from the tail and insert
new ones at the tail.

> In the boot stage, Xen is the only user of the MPU. It may add some memory
> nodes or device memory to MPU regions for temporary usage. After freeing
> init memory, we want to reclaim these MPU regions so that more MPU regions
> can be used for guests. We will also do some merging and reordering work,
> which makes the MPU regions easier to manage in the guest context switch.
> 
> > >    Probably it will be inconsistent with runtime MPU configuration, this
> > >    will cause unexpected problems (This may not happen in a single core
> > >    system, but on SMP systems, this problem is foreseeable, so we
> > > hope
> > to
> > >    solve it at the beginning).
> >
> > [...]
> >
> > > ### **2.4. Changes of memory management** Xen is coupled with VMSA,
> > > in order to port Xen to Armv8-R64, we have to decouple Xen from
> > > VMSA. And give Xen the ability to manage memory in
> > PMSA.
> > >
> > > 1. ***Use buddy allocator to manage physical pages for PMSA***
> > >     From the view of a physical page, PMSA and VMSA don't have any
> > >     difference, so we can reuse the buddy allocator on Armv8-R64 to
> > >     manage physical pages. The difference is that, in VMSA, Xen will
> > >     map allocated pages to virtual addresses, but in PMSA, Xen just
> > >     converts the pages to physical addresses.
> > >
> > > 2. ***Cannot use virtual addresses for memory management***
> > >     As Armv8-R64 only has PMSA in EL2, Xen loses the ability of using
> > >     virtual addresses to manage memory. This brings some problems:
> > >     some virtual-address-based features cannot work well on Armv8-R64,
> > >     like `FIXMAP`, `vmap/vunmap`, `ioremap` and `alternative`.
> > >
> > >     But the functions or macros of these features are used in lots of
> > >     common code, so it's not good to gate the related code with
> > >     `#ifdef CONFIG_ARM_MPU` everywhere. In this case, we propose to
> > >     use stub helpers to make the changes transparent to common code.
> > >     1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
> > operations.
> > >        This will return physical address directly of fixmapped item.
> > >     2. For `vmap/vumap`, we will use some empty inline stub helpers:
> > >          ```
> > >          static inline void vm_init_type(...) {}
> > >          static inline void *__vmap(...)
> > >          {
> > >              return NULL;
> > >          }
> > >          static inline void vunmap(const void *va) {}
> > >          static inline void *vmalloc(size_t size)
> > >          {
> > >              return NULL;
> > >          }
> > >          static inline void *vmalloc_xen(size_t size)
> > >          {
> > >              return NULL;
> > >          }
> > >          static inline void vfree(void *va) {}
> > >          ```
> > >
> > >     3. For `ioremap`, it depends on `vmap`. As we have made `vmap`
> > >        always return `NULL`, it could not work on Armv8-R64 without
> > >        changes. `ioremap` will return the input address directly.
> > >          ```
> > >          static inline void *ioremap_attr(...)
> > >          {
> > >              /* We don't have the ability to change input PA cache
> > attributes */
> > OOI, who will set them?
> 
> Some callers that want to change a memory range's attributes will set
> them, something like ioremap_nocache. I am not sure if this is what you
> were asking : )
> 
> >
> > >              if ( CACHE_ATTR_need_change )
> > >                  return NULL;
> > >              return (void *)pa;
> > >          }
> > >          static inline void __iomem *ioremap_nocache(...)
> > >          {
> > >              return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
> > >          }
> > >          static inline void __iomem *ioremap_cache(...)
> > >          {
> > >              return ioremap_attr(start, len, PAGE_HYPERVISOR);
> > >          }
> > >          static inline void __iomem *ioremap_wc(...)
> > >          {
> > >              return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> > >          }
> > >          void *ioremap(...)
> > >          {
> > >              return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> > >          }
> > >
> > >          ```
> > >      4. For `alternative`, it depends on `vmap` too.
> >
> > The only reason we depend on vmap() is because we map the text section
> > read-only and we enforce WnX. For VMSA, it would be possible to avoid
> > vmap() with some rework. I don't know for PMSA.
> >
> 
> For PMSA, we still enforce WnX. For your use case, I assume it's
> alternative. It may still be possible to avoid vmap(), but there may be
> some security issues. We had thought of disabling the MPU -> updating Xen
> text -> enabling the MPU to mimic the VMSA alternative's behavior. The
> problem with this, however, is that at some point all memory is RWX,
> which may be a security risk. But because it's in the init stage, it
> probably doesn't matter as much as I thought.
> 

On an MMU system, we use vmap() to temporarily remap the requested Xen text
(a few lines of code) as RW to apply the alternative code; the granularity
for it can be 4KB.

But on an MPU system, the whole Xen text is covered by a single MPU region,
so we would otherwise have to disable the whole MPU to make it happen. That
means either a small risk of running C code with the MPU disabled, or all
text memory becoming RWX while the alternatives are applied.
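The sequence under discussion can be sketched as follows (purely illustrative: mpu_disable()/mpu_enable() and apply_alternative() are hypothetical helpers standing in for the real MPU enable-bit toggling, barriers, and cache maintenance):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helpers: on real hardware these would clear/set the MPU
 * enable bit (plus barriers); here they only track state for illustration. */
static int mpu_on = 1;
static void mpu_disable(void) { mpu_on = 0; }
static void mpu_enable(void)  { mpu_on = 1; }

/*
 * Apply an alternative patch to a text range with the MPU off. The whole
 * window between disable and enable runs unprotected, which is exactly
 * the risk weighed above against making all text RWX.
 */
static void apply_alternative(uint8_t *text, const uint8_t *repl, size_t len)
{
    mpu_disable();              /* text region becomes writable */
    memcpy(text, repl, len);    /* patch the instructions */
    /* cache clean + I-cache invalidation would go here on real hardware */
    mpu_enable();               /* restore WnX protection */
}
```

Since this only runs during init, before any guest executes, the unprotected window may be acceptable, as noted above.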
 
> > > We will simply disable it on Armv8-R64 at the current stage. How to
> > > implement `alternative` on Armv8-R64 is better discussed after the
> > > basic functions of Xen on Armv8-R64 work well.
> > Alternatives are mostly helpful to handle errata or enable features
> > that are not present on all CPUs. I wouldn't expect this to be
> > necessary at the beginning. In fact, on Arm, it was introduced > 4
> > years after the initial port :).
> 
> I hope it won't take us so long, this time : )
> 
> >
> > [...]
> >
> > > ### **2.5. Changes of device driver**
> > > 1. Because Armv8-R64 only has a single security state, this will
> > > affect some devices that have two security states, like the GIC. But
> > > fortunately, most vendors will not connect a two-security-state GIC
> > > to Armv8-R64 processors. The current GIC driver can work well with a
> > > single-security-state GIC for Armv8-R64.
> > > 2. Xen should use the secure hypervisor timer in Secure EL2. We will
> > > introduce a CONFIG_ARM_SECURE_STATE to make Xen use the secure
> > > registers for the timer.
> > >
> > > ### **2.7. Changes of virtual device**
> > > Currently, we only support pass-through devices in guests, because
> > > event channels, xen-bus, xen-storage and other advanced Xen features
> > > haven't been enabled on Armv8-R64.
> >
> > That's fine. I expect it to require quite a bit of work to move from Xen
> > sharing the pages (e.g. like for grant-tables) to the guest sharing pages.
> >
> 
> Yes.
> 
> > Cheers,
> >
> > --
> > Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-02  6:43           ` Wei Chen
@ 2022-03-02 10:24             ` Julien Grall
  2022-03-03  1:35               ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-03-02 10:24 UTC (permalink / raw)
  To: Wei Chen, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Wei,

On 02/03/2022 06:43, Wei Chen wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2022年3月1日 21:17
>> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>> <sstabellini@kernel.org>
>> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
>> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
>> <Henry.Wang@arm.com>; nd <nd@arm.com>
>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>>
>> On 01/03/2022 06:29, Wei Chen wrote:
>>> Hi Julien,
>>
>> Hi,
>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 2022年2月26日 4:12
>>>> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
>>>> <sstabellini@kernel.org>
>>>> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
>>>> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry
>> Wang
>>>> <Henry.Wang@arm.com>; nd <nd@arm.com>
>>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>>>>
>>>> Hi Wei,
>>>>
>>>> On 25/02/2022 10:48, Wei Chen wrote:
>>>>>>>        Armv8-R64 can support up to 256 MPU regions, but that's just
>>>>>>>        theoretical. So we don't want to define `pr_t mpu_regions[256]`;
>>>>>>>        this is a memory waste most of the time. So we decided to let
>>>>>>>        the user specify it through a Kconfig option. The default value
>>>>>>>        of `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` can be `32`, which is
>>>>>>>        a typical implementation on Armv8-R64. Users will recompile Xen
>>>>>>>        when their platform changes, so when the MPU changes,
>>>>>>>        respecifying the MPU protection region number will not cause
>>>>>>>        additional problems.
>>>>>>
>>>>>> I wonder if we could probe the number of MPU regions at runtime and
>>>>>> dynamically allocate the memory needed to store them in arch_vcpu.
>>>>>>
>>>>>
>>>>> We had considered using a pr_t mpu_regions[0] in arch_vcpu. But it
>>>>> seems we would encounter some statically allocated arch_vcpu
>>>>> problems and a sizeof issue.
>>>>
>>>> Does it need to be embedded in arch_vcpu? If not, then we could
>> allocate
>>>> memory outside and add a pointer in arch_vcpu.
>>>>
>>>
>>> We had thought to use a pointer in arch_vcpu instead of embedding
>>> mpu_regions into arch_vcpu. But we noticed that arch_vcpu has a
>>> __cacheline_aligned attribute, which may be because arch_vcpu is used
>>> very frequently in some critical paths. So using a pointer for
>>> mpu_regions may cause some cache misses in these critical paths, for
>>> example in context_switch.
>>
>>   From my understanding, the idea behind ``cacheline_aligned`` is to
>> avoid the struct vcpu being shared with another data structure.
>> Otherwise you may end up having two pCPUs frequently writing the same
>> cacheline, which is not ideal.
>>
>> arch_vcpu should embed anything that will be accessed often (e.g.
>> entry/exit), to a certain point. For instance, not everything related to
>> the vGIC is embedded in the vCPU/Domain structure.
>>
>> I am a bit split regarding the mpu_regions. If they are mainly used in
>> the context_switch() then I would argue this is a premature optimization
>> because the scheduling decision is probably going to take a lot more
>> time than the context switch itself.
> 
> mpu_regions in arch_vcpu are used to save the guest's EL1 MPU context.
> So, yes, they are mainly used in context_switch. In terms of the number
> of registers, there is more save/restore work than on the original V8A.
> And on V8R we also need to keep most of the original V8A save/restore
> work, so it will take longer than the original V8A context_switch. I
> think this is due to the architectural difference, so it's impossible for
> us not to save/restore the EL1 MPU region registers in context_switch.
> And we have done some optimization for the EL1 MPU save/restore:
> 1. Assembly code for the EL1 MPU context_switch

This discussion reminds me of when KVM decided to rewrite their context 
switch from assembly to C. The outcome was that the compiler is able to 
do a better job than us when it comes to optimizing.

With a C version, we could also share the save/restore code with 32-bit 
and it is easier to read/maintain.

So I would suggest running some numbers to check whether it is really 
worth implementing the MPU save/restore in assembly.

> 2. Use the real MPU region number instead of
>     CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS in context_switch.
>     CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS defines the maximum number of
>     EL1 MPU regions supported by this Xen image. All platforms that
>     implement EL1 MPU regions within this range can work well with this
>     Xen image. But if the implemented EL1 MPU region number exceeds
>     CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS, this Xen image cannot work on
>     that platform.

This sounds similar to the GICv3. The number of LRs depends on the 
hardware. See how we dealt with it in gicv3_save_lrs().
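For reference, the pattern being suggested looks roughly like this hedged sketch (probe_nr_regions() stands in for reading the implemented region count from MPUIR_EL1, and the loop mirrors how gicv3_save_lrs() only touches the implemented list registers; all names here are illustrative, not Xen code):

```c
#include <stdint.h>
#include <assert.h>

#define CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS 32  /* build-time maximum */

/* Context storage is sized for the build-time maximum... */
struct mpu_ctxt {
    uint64_t prbar[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
    uint64_t prlar[CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS];
};

/* ...but only the implemented regions are saved. On real hardware this
 * would read the region count from MPUIR_EL1; here it is a stub. */
static unsigned int probe_nr_regions(void)
{
    return 16; /* pretend the hardware implements 16 regions */
}

static unsigned int save_el1_mpu(struct mpu_ctxt *c,
                                 const uint64_t *hw_prbar,
                                 const uint64_t *hw_prlar)
{
    unsigned int nr = probe_nr_regions();

    for (unsigned int i = 0; i < nr; i++) {
        c->prbar[i] = hw_prbar[i];   /* would be a PRBAR<n>_EL1 read */
        c->prlar[i] = hw_prlar[i];   /* would be a PRLAR<n>_EL1 read */
    }
    return nr;
}
```

The one-time probe keeps the per-switch loop bounded by the hardware's real region count rather than the Kconfig maximum.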

>    
>>
>> Note that for the P2M we already have that indirection because it is
>> embbed in the struct domain.
> 
> It's different from the V8A P2M case. In the V8A context_switch we just
> need to save/restore the VTTBR; we don't need to do a P2M table walk. But
> on V8R, we need to access the valid mpu_regions for save/restore.

The save/restore for the P2M is a bit more complicated than simply 
saving/restoring the VTTBR. But yes, I agree the code for the MPU will 
likely be more complicated.

> 
>>
>> This raises one question: why will the MPU regions be per-vCPU
>> rather than per-domain?
>>
> 
> Because there is an EL1 MPU component for each pCPU. We can't assume the guest
> uses the same EL1 MPU configuration for all vCPUs.

Ah. Sorry, I thought you were referring to whatever Xen will use to 
prevent the guest from accessing outside of its designated region.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-01  7:51   ` Wei Chen
  2022-03-02  7:21     ` Penny Zheng
@ 2022-03-02 12:00     ` Julien Grall
  2022-03-03  2:06       ` Wei Chen
  1 sibling, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-03-02 12:00 UTC (permalink / raw)
  To: Wei Chen, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd



On 01/03/2022 07:51, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 26 February 2022 4:55
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org; Stefano
>> Stabellini <sstabellini@kernel.org>
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
>> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>>> ### 1.2. Xen Challenges with PMSA Virtualization
>>> Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications to run
>>> with an MPU and host multiple guest OSes.
>>>
>>> - No MMU at EL2:
>>>       - No EL2 Stage 1 address translation
>>>           - Xen provides a fixed ARM64 virtual memory layout as the basis
>>>             of EL2 stage 1 address translation, which is not applicable on
>>>             an MPU system, where there is no virtual addressing. As a
>>>             result, any operation involving a transition from PA to VA,
>>>             like ioremap, needs modification on an MPU system.
>>>       - Xen's run-time addresses are the same as the link-time addresses.
>>>           - Enabling PIC (position-independent code) on a real-time target
>>>             processor is probably very rare.
>>
>> Aside the assembly boot code and UEFI stub, Xen already runs at the same
>> address as it was linked.
>>
> 
> But the difference is that, with the MMU, we can use the same link address
> for all platforms. On an MPU system, we can't do it in the same way.

I agree that we currently use the same link address for all the 
platforms. But this is also a problem when using MMU because EL2 has a 
single TTBR.

At the moment we are switching page-tables with the MMU on, which is not 
safe. Instead we need to turn the MMU off, switch page-tables and 
then turn the MMU back on. This means we need to have an identity mapping of 
Xen in the page-tables. Assuming Xen is not relocated, the identity 
mapping may clash with Xen (or the rest of the virtual address map).

My initial idea was to enable PIC and update the relocation at boot 
time. But this is a bit cumbersome to do. So now I am looking to have a 
semi-dynamic virtual layout and find some place to relocate part of Xen 
to use for CPU bring-up.

Anyway, my point is we could possibly look at PIC if that would allow a 
generic Xen image.

>>>       - Xen will need to use the EL2 MPU memory region descriptors to
>> manage
>>>         access permissions and attributes for accesses made by VMs at
>> EL1/0.
>>>           - Xen currently relies on MMU EL1 stage 2 table to manage these
>>>             accesses.
>>> - No MMU Stage 2 translation at EL1:
>>>       - A guest doesn't have an independent guest physical address space
>>>       - A guest cannot reuse the current Intermediate Physical Address
>>>         memory layout
>>>       - A guest uses physical addresses to access memory and devices
>>>       - The MPU at EL2 manages EL1 stage 2 access permissions and
>> attributes
>>> - There are a limited number of MPU protection regions at both EL2 and
>> EL1:
>>>       - Architecturally, the maximum number of protection regions is 256,
>>>         typical implementations have 32.
>>>       - By contrast, Xen does not need to consider the number of page
>> table
>>>         entries in theory when using MMU.
>>> - The MPU protection regions at EL2 need to be shared between the
>> hypervisor
>>>     and the guest stage 2.
>>>       - Requires careful consideration - may impact feature 'fullness' of
>> both
>>>         the hypervisor and the guest
>>>       - By contrast, when using MMU, Xen has standalone P2M table for
>> guest
>>>         stage 2 accesses.
>>
>> [...]
>>
>>> - ***Define new system registers for compilers***:
>>>     Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
>>>     specific system registers. As Armv8-R64 only has the Secure state,
>>>     at least `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen. And the
>>>     first GCC version that supports Armv8.4 is GCC 8.1. In addition to
>>>     these, PMSA of Armv8-R64 introduced lots of MPU related system
>> registers:
>>>     `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and
>>>     `MPUIR_ELx`. But the first GCC version to support these system
>> registers
>>>     is GCC 11. So we have two ways to make compilers work properly
>>>     with these system registers.
>>>     1. Bump GCC version to GCC 11.
>>>        The advantage of this method is that we don't need to encode these
>>>        system registers in macros by ourselves. The downside is that
>>>        we have to update Makefiles to support GCC 11 for Armv8-R64.
>>>        1.1. Check the GCC version 11 for Armv8-R64.
>>>        1.2. Add march=armv8r to CFLAGS for Armv8-R64.
>>>        1.3. Solve the conflict between march=armv8r and mcpu=generic
>>>       These changes will affect common Makefiles, not only Arm Makefiles.
>>>       And GCC 11 is new; many toolchains and distros haven't supported it
>>>       yet.
>>
>> I agree that forcing to use GCC11 is not a good idea. But I am not sure
>> to understand the problem with the -march=.... Ultimately, shouldn't we
>> aim to build Xen ARMv8-R with -march=armv8r?
>>
> 
> Actually, we had done that, but we reverted it from the RFC patch series. The
> reason has been listed above, but that is not the major one. The main reason
> is that:
> Armv8-R AArch64 supports the A64 ISA instruction set with some modifications:
> it redefines DMB and DSB, and adds a DFB. But actually, the encodings of DMB and
> DSB are still the same as A64, and DFB is an alias of DSB #12.
> 
> In this case, we don't think we need a new arch flag to generate new
> instructions for Armv8-R. And we have discussed with the Arm kernel folks; they
> will not update the build system to build a Linux that will be running on
> Armv8-R64 EL1 either.

Good to know that the kernel folks plan to do the same. Thanks for the 
explanation!

> 
> 
>> [...]
>>
>>> ### **2.2. Changes of the initialization process**
>>> In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent
>>> initialization process. Apart from some architectural differences, the
>>> code is largely reusable, and we will distinguish the differences through
>>> CONFIG_ARM_MPU or CONFIG_ARM64_V8R. We want most of the initialization
>>> code to be reusable between Armv8-R64 and Armv8-A64.
>>>
>>> - We will reuse the original head.s and setup.c of Arm. But replace the
>>>     MMU and page table operations in these files with configuration
>> operations
>>>     for MPU and MPU regions.
>>>
>>> - We provide a boot-time MPU configuration. This MPU configuration will
>>>     allow Xen to finish its initialization. And this boot-time MPU
>>>     configuration will record the memory regions that will be parsed from
>>>     the device tree.
>>>
>>>     At the end of Xen initialization, we will use a runtime MPU
>>>     configuration to replace the boot-time MPU configuration. The runtime
>>>     MPU configuration will merge and reorder memory regions to free up
>>>     more MPU regions for guests.
>>>     ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqR
>> DoacQVTwUtWIGU)
>>>
>>> - Defer unpausing domains.
>>>     When Xen initialization is about to end, Xen unpauses the guests created
>>>     during initialization. But this will cause some issues. The unpause
>>>     action occurs before free_init_memory, however the runtime MPU
>>>     configuration is built after free_init_memory.
>>
>> I was half expecting that free_init_memory() would not be called for Xen
>> Armv8R.
>>
> 
> We had called free_init_memory for Xen Armv8R, but it doesn't achieve much.
> As we have a static heap, we don't reclaim init memory to the heap, and
> this reclaimed memory could not be used by Xen data and bss either. But
> from the security perspective, free_init_memory will drop the Xen init
> code & data, which reduces the code an attacker can exploit.
IIUC, zero-ing the region (or something similar) will be sufficient 
here. IOW, you don't necessarily need to remove the mappings.

>>>
>>>     So if the unpaused guests start executing the context switch at this
>>>     point, then their MPU context will be based on the boot-time MPU
>>>     configuration.
>>
>> Can you explain why you want to switch the MPU configuration that late?
>>
> 
> In the boot stage, Xen is the only user of the MPU. It may add some memory
> nodes or device memory to MPU regions for temporary usage. After freeing
> init memory, we want to reclaim these MPU regions so that more MPU regions
> can be used for guests. We will also do some merge and reorder work. This
> work makes MPU regions easier to manage in the guest context switch.

Do you have any example of such regions?
> 
>>>     It will probably be inconsistent with the runtime MPU configuration, and this
>>>     will cause unexpected problems (this may not happen in a single-core
>>>     system, but on SMP systems this problem is foreseeable, so we hope
>>>     to solve it at the beginning).
>>
>> [...]
>>
>>> ### **2.4. Changes of memory management**
>>> Xen is coupled with VMSA; in order to port Xen to Armv8-R64, we have to
>>> decouple Xen from VMSA and give Xen the ability to manage memory in
>>> PMSA.
>>>
>>> 1. ***Use buddy allocator to manage physical pages for PMSA***
>>>      From the view of physical pages, PMSA and VMSA don't have any
>>>      difference. So we can reuse the buddy allocator on Armv8-R64 to
>>>      manage physical pages. The difference is that, in VMSA, Xen will map
>>>      allocated pages to virtual addresses. But in PMSA, Xen just converts
>>>      the pages to physical addresses.
>>>
>>> 2. ***Cannot use virtual addresses for memory management***
>>>      As Armv8-R64 only has PMSA in EL2, Xen loses the ability to use
>>>      virtual addresses to manage memory. This brings some problems: some
>>>      virtual-address-based features could not work well on Armv8-R64,
>>>      like `FIXMAP`, `vmap/vunmap`, `ioremap` and `alternative`.
>>>
>>>      But the functions or macros of these features are used in lots of
>>>      common code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to gate
>>>      related code everywhere. In this case, we propose to use stub helpers
>>>      to make the changes transparent to common code.
>>>      1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
>>>         operations. This will directly return the physical address of the
>>>         fixmapped item.
>>>      2. For `vmap/vumap`, we will use some empty inline stub helpers:
>>>           ```
>>>           static inline void vm_init_type(...) {}
>>>           static inline void *__vmap(...)
>>>           {
>>>               return NULL;
>>>           }
>>>           static inline void vunmap(const void *va) {}
>>>           static inline void *vmalloc(size_t size)
>>>           {
>>>               return NULL;
>>>           }
>>>           static inline void *vmalloc_xen(size_t size)
>>>           {
>>>               return NULL;
>>>           }
>>>           static inline void vfree(void *va) {}
>>>           ```
>>>
>>>      3. For `ioremap`, it depends on `vmap`. As we have made `vmap`
>>>         always return `NULL`, they could not work well on Armv8-R64
>>>         without changes. `ioremap` will return the input address directly.
>>>           ```
>>>           static inline void *ioremap_attr(...)
>>>           {
>>>               /* We don't have the ability to change the input PA's cache
>>>                  attributes */
>> OOI, who will set them?
> 
> Some callers that want to change a memory's attributes will set them, something like
> ioremap_nocache. I am not sure if this is what you asked : )

I am a bit confused. If ioremap_nocache() can change the attribute, then 
why would ioremap_attr() not be able to do it?

> 
>>
>>>               if ( CACHE_ATTR_need_change )
>>>                   return NULL;
>>>               return (void *)pa;
>>>           }
>>>           static inline void __iomem *ioremap_nocache(...)
>>>           {
>>>               return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
>>>           }
>>>           static inline void __iomem *ioremap_cache(...)
>>>           {
>>>               return ioremap_attr(start, len, PAGE_HYPERVISOR);
>>>           }
>>>           static inline void __iomem *ioremap_wc(...)
>>>           {
>>>               return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
>>>           }
>>>           void *ioremap(...)
>>>           {
>>>               return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>>>           }
>>>
>>>           ```
>>>       4. For `alternative`, it depends on `vmap` too.
>>
>> The only reason we depend on vmap() is because we map the text sections
>> read-only and we enforce WnX. For VMSA, it would be possible to
>> avoid vmap() with some rework. I don't know for PMSA.
>>
> 
> For PMSA, we still enforce WnX. For your use case, I assume it's alternative.
> It still may have some possibility to avoid vmap(). But there may be some
> security issues. We had thought to disable the MPU -> update Xen text -> enable
> the MPU to copy VMSA alternative's behavior. The problem with this, however,
> is that at some point, all memory is RWX. There may be some security risk.
> But because it's in the init stage, it probably doesn't matter as much as
> I thought.

For boot code, we need to ensure the code is compliant with the Arm Arm. 
Other than that, it is OK to have the memory RWX for a short period of time.

In fact, when we originally boot Xen, we don't enforce WnX. We only 
start to enforce it when initializing the memory. But there is no blocker 
to delaying it (other than writing the code :)).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-02  7:21     ` Penny Zheng
@ 2022-03-02 12:06       ` Julien Grall
  0 siblings, 0 replies; 34+ messages in thread
From: Julien Grall @ 2022-03-02 12:06 UTC (permalink / raw)
  To: Penny Zheng, Wei Chen, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Henry Wang, nd



On 02/03/2022 07:21, Penny Zheng wrote:
> Hi Julien

Hi Penny,

>>>>
>>>>     So if the unpaused guests start executing the context switch at this
>>>>     point, then its MPU context will base on the boot-time MPU
>>> configuration.
>>>
>>> Can you explain why you want to switch the MPU configuration that late?
>>>
>>
> 
> It is more related to the implementation.
> 
> In the boot stage, we allocate MPU regions in sequence until the max.
> Since a few MPU regions will get removed along the way, holes are left there.
> For example, when the heap is ready, the fdt will be reallocated in the heap, which
> means the MPU region for the device tree is no longer needed. And also in
> free_init_memory, although we do not give init memory back to the heap, we will
> also destroy the corresponding MPU regions to make them inaccessible.
> Without ordering, we need a bitmap to record such information.
> 
> In the context switch, the memory layout is quite different for guest mode and
> hypervisor mode. When switching to guest mode, only the guest RAM, emulated/passthrough
> devices, etc. can be seen, but in hypervisor mode, all guests' RAM and device memory
> shall be seen. And without reordering, we need to iterate over all MPU regions to find
> the corresponding regions to disable during the runtime context switch; that's definitely an overhead.
> 
> So we propose an ordering at the tail of boot time, to put all fixed MPU regions
> at the head, like Xen text/data, etc., and put all flexible ones at the tail, like
> device memory and guests' RAM.
> Then later in the context switch, we can easily just disable entries from the tail
> and insert new ones at the tail.

Thank you for the clarification. This makes sense to me. I would suggest 
updating the proposal to reflect this decision.
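For reference, the fixed-head/flexible-tail ordering described above could be sketched roughly as below. This is a hypothetical model, not the actual Xen code: the MPU is represented as a plain array and all names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_REGIONS 16

struct mpu_region {
    uint64_t base;
    uint64_t limit;
    int enabled;
};

/*
 * Hypothetical layout after the boot-time reorder: entries [0, nr_fixed)
 * are Xen's own fixed mappings (text/data, ...); entries
 * [nr_fixed, nr_used) are the flexible per-context ones (guest RAM,
 * passthrough devices).
 */
struct mpu_table {
    struct mpu_region r[MPU_REGIONS];
    unsigned int nr_fixed;   /* fixed entries at the head */
    unsigned int nr_used;    /* total entries currently enabled */
};

/* Drop the previous context's flexible tail, then install the new one. */
static void mpu_switch_flexible(struct mpu_table *t,
                                const struct mpu_region *new_regions,
                                unsigned int nr_new)
{
    /* Disable the outgoing context's flexible entries. */
    for ( unsigned int i = t->nr_fixed; i < t->nr_used; i++ )
        t->r[i].enabled = 0;

    /* Insert the incoming context's entries at the tail. */
    for ( unsigned int i = 0; i < nr_new; i++ )
    {
        t->r[t->nr_fixed + i] = new_regions[i];
        t->r[t->nr_fixed + i].enabled = 1;
    }

    t->nr_used = t->nr_fixed + nr_new;
}
```

The point of the ordering is visible here: no iteration over the whole table is needed to find which regions belong to the outgoing context.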

>> For PMSA, we still enforce WnX. For your use case, I assume it's alternative.
>> It still may have some possibility to avoid vmap(). But there may be some
>> security issues. We had thought to disable MPU -> update xen text -> enable
>> MPU to copy VMSA alternative's behavior. The problem with this, however,
>> is that at some point, all memory is RWX. There may be some security risk.
>> But because it's in the init stage, it probably doesn't matter as much as I thought.
>>
> 
> In an MMU system, we use vmap() to temporarily change the requested Xen text code
> (a few lines) to RW to apply the alternative code; the granularity for it can be 4KB.
> 
> But on an MPU system, we give the whole Xen text code one MPU region, so we would
> otherwise disable the whole MPU to make it happen, which carries a little risk from
> running C code with the MPU disabled, or from all text memory becoming RWX at this
> alternative-patching time.

See my answer to Wei. As long as the code is compliant with the Arm Arm, it 
would be acceptable to have boot code running with RWX for a short 
period of time.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-02  7:13           ` Wei Chen
@ 2022-03-02 22:55             ` Stefano Stabellini
  2022-03-03  1:05               ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2022-03-02 22:55 UTC (permalink / raw)
  To: Wei Chen
  Cc: Stefano Stabellini, xen-devel, julien, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd

[-- Attachment #1: Type: text/plain, Size: 7932 bytes --]

On Wed, 2 Mar 2022, Wei Chen wrote:
> > > > > > If not, and considering that we have to generate
> > > > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make
> > sense
> > > > to
> > > > > > also generate mpu,guest-memory-section, xen,static-mem, etc. at
> > build
> > > > > > time rather than passing it via device tree to Xen at runtime?
> > > > > >
> > > > >
> > > > > Did you mean we still add this information in the device tree, but for
> > > > > build time only, and at runtime we don't parse it?
> > > >
> > > > Yes, something like that, but see below.
> > > >
> > > >
> > > > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at build time
> > and
> > > > > > everything else at runtime?
> > > > >
> > > > > ARM_MPU_*_MEMORY_START/END is defined by the platform. But the other
> > > > > things are user-customized. Users can change their usage without
> > > > > rebuilding the image.
> > > >
> > > > Good point.
> > > >
> > > > We don't want to have to rebuild Xen if the user updated a guest
> > kernel,
> > > > resulting in a larger boot-module-section.
> > > >
> > > > So I think it makes sense that "mpu,boot-module-section" is generated
> > by
> > > > the scripts (e.g. ImageBuilder) at build time, and Xen reads the
> > > > property at boot from the runtime device tree.
> > > >
> > > > I think we need to divide the information into two groups:
> > > >
> > > >
> > > > # Group1: board info
> > > >
> > > > This information is platform specific and it is not meant to change
> > > > depending on the VM configuration. Ideally, we build Xen for a
> > platform
> > > > once, then we can use the same Xen binary together with any
> > combination
> > > > of dom0/domU kernels and ramdisks.
> > > >
> > > > This kind of information doesn't need to be exposed to the runtime
> > > > device tree. But we can still use a build-time device tree to generate
> > > > the addresses if it is convenient.
> > > >
> > > > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and
> > ARM_MPU_NORMAL_MEMORY_*
> > > > seem to be part of this group.
> > > >
> > >
> > > Yes.
> > >
> > > >
> > > > # Group2: boot configuration
> > > >
> > > > This information is about the specific set of binaries and VMs that we
> > > > need to boot. It is conceptually similar to the dom0less device tree
> > > > nodes that we already have. If we change one of the VM binaries, we
> > > > likely have to refresh the information here.
> > > >
> > > > "mpu,boot-module-section" probably belongs to this group (unless we
> > find
> > > > a way to define "mpu,boot-module-section" generically so that we don't
> > > > need to change it any time the set of boot modules change.)
> > > >
> > > >
> > >
> > > I agree.
> > >
> > > > > > It looks like we are forced to have the sections definitions at
> > build
> > > > > > time because we need them before we can parse device tree. In that
> > > > case,
> > > > > > we might as well define all the sections at build time.
> > > > > >
> > > > > > But I think it would be even better if Xen could automatically
> > choose
> > > > > > xen,static-mem, mpu,guest-memory-section, etc. on its own based on
> > the
> > > > > > regular device tree information (/memory, /amba, etc.), without
> > any
> > > > need
> > > > > > for explicitly describing each range with these new properties.
> > > > > >
> > > > >
> > > > > for mpu,guest-memory-section, with the limitations: no other usage
> > > > between
> > > > > different guests' memory nodes, this is OK. But for xen,static-mem
> > (heap),
> > > > > we just want everything on an MPU system to be deterministic. But, of
> > course
> > > > Xen
> > > > > can select the leftover memory for the heap without static-mem.
> > > >
> > > > It is good that you think they can be chosen by Xen.
> > > >
> > > > Differently from "boot-module-section", which has to do with the boot
> > > > modules selected by the user for a specific execution,
> > > > guest-memory-section and static-mem are Xen specific memory
> > > > policies/allocations.
> > > >
> > > > A user wouldn't know how to fill them in. And I worry that even a
> > script
> > >
> > > But users should know it, because static-mem for guest must be allocated
> > > in this range. And users take the responsibility to set the DomU's
> > > static allocate memory ranges.
> > 
> > Let me premise that my goal is to avoid having many users reporting
> > errors to xen-devel and xen-users when actually it is just a wrong
> > choice of addresses.
> > 
> > I think we need to make a distinction between addresses for the boot
> > modules, e.g. addresses where to load xen, the dom0/U kernel, dom0/U
> > ramdisk in memory at boot time, and VM static memory addresses.
> > 
> > The boot modules addresses are particularly difficult to fill in because
> > they are many and a small update in one of the modules could invalidate
> > all the other addresses. This is why I ended up writing ImageBuilder.
> > > Since then, I have received several emails from users thanking me for
> > ImageBuilder :-)
> > 
> 
> Thanks +999 😊
> 
> 
> > The static VM memory addresses (xen,static-mem) should be a bit easier
> > to fill in correctly. They are meant to be chosen once, and it shouldn't
> > happen that an update on a kernel forces the user to change all the VM
> > static memory addresses. Also, I know that some users actually want to
> > be able to choose the domU addresses by hand because they have specific
> > needs. So it is good that we can let the user choose the addresses if
> > they want to.
> > 
> 
> Yes.
> 
> > With all of that said, I do think that many users won't have an opinion
> > on the VM static memory addresses and won't know how to choose them.
> > It would be error prone to let them try to fill them in by hand. So I
> > was already planning on adding support to ImageBuilder to automatically
> > generate xen,static-mem for dom0less domains.
> > 
> 
> > Let me make sure that's what you said: users give a VM memory size to
> > ImageBuilder, and ImageBuilder will generate xen,static-mem = <start, size>.
> > For a specific VM, ImageBuilder can also accept start and size as inputs?
> 
> Do I understand this correctly?

Yes, exactly
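As a rough illustration of what such a generation step might do (all names, the RAM base, and the 2MB alignment below are hypothetical, not ImageBuilder's actual scheme): given only per-domU sizes, it walks a cursor through the RAM bank and emits an aligned start address for each xen,static-mem range.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024)

/*
 * Hypothetical helper: carve consecutive, aligned static-mem ranges for
 * each domU out of a platform RAM bank, the way an ImageBuilder-style
 * script might generate "xen,static-mem = <start size>" properties.
 * Returns the aligned start address and advances the cursor past the
 * allocated range.
 */
static uint64_t alloc_static_mem(uint64_t *cursor, uint64_t size,
                                 uint64_t align)
{
    uint64_t start = (*cursor + align - 1) & ~(align - 1);

    *cursor = start + size;
    return start;
}
```

Each call consumes one range, so feeding it the DOMU_STATIC_MEM sizes in order yields non-overlapping <start, size> pairs automatically.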

 
> > Going back to this specific discussion about boot-module-section: I can
> > see now that, given xen,static-mem is chosen by ImageBuilder (or
> 
> By hand : )
> 
> > similar) and not Xen, then it makes sense to have ImageBuilder (or
> > similar) also generate boot-module-section.
> > 
> 
> If my above understanding is right, then yes.

Yes, I think we are on the same page
 
 
> > > > like ImageBuilder wouldn't be the best place to pick these values --
> > > > they seem too "important" to leave to a script.
> > > >
> > > > But it seems possible to choose the values in Xen:
> > > > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at build
> > time
> > > > - Xen reads boot-module-section from device tree
> > > >
> > > > It should be possible at this point for Xen to pick the best values
> > for
> > > > guest-memory-section and static-mem based on the memory available.
> > > >
> > >
> > > How would Xen pick? Does it mean that in the static-allocation DomU DT node,
> > > we just need a size, but don't require a start address for static-mem?
> > 
> > Yes the idea was that the user would only provide the size (e.g.
> > DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
> > calculated. But I didn't mean to change the existing xen,static-mem
> > device tree bindings. So it is best if the xen,static-mem addresses
> > generation is done by ImageBuilder (or similar tool) instead of Xen.
> > 
> 
> If we still keep the option for user to specify the start and size
> parameters for VM memory, because it maybe very important for a
> deterministic system (fully static system), I agree with you.
> 
> And in the current static-allocation, I think Xen doesn't generate
> xen,static-mem addresses; it is all done by hand...

Yeah 


> > Sorry for the confusion!
> > 
> 
> NP ; )

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-02 22:55             ` Stefano Stabellini
@ 2022-03-03  1:05               ` Wei Chen
  2022-03-03  2:03                 ` Stefano Stabellini
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-03  1:05 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, julien, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 3 March 2022 6:56
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org; julien@xen.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> On Wed, 2 Mar 2022, Wei Chen wrote:
> > > > > > > If not, and considering that we have to generate
> > > > > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make
> > > sense
> > > > > to
> > > > > > > also generate mpu,guest-memory-section, xen,static-mem, etc.
> at
> > > build
> > > > > > > time rather than passing it via device tree to Xen at runtime?
> > > > > > >
> > > > > >
> > > > > > Did you mean we still add this information in the device tree, but
> > > > > > for build time only, and at runtime we don't parse it?
> > > > >
> > > > > Yes, something like that, but see below.
> > > > >
> > > > >
> > > > > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at build
> time
> > > and
> > > > > > > everything else at runtime?
> > > > > >
> > > > > > ARM_MPU_*_MEMORY_START/END is defined by the platform. But the
> > > > > > other things are user-customized. Users can change their usage
> > > > > > without rebuilding the image.
> > > > >
> > > > > Good point.
> > > > >
> > > > > We don't want to have to rebuild Xen if the user updated a guest
> > > kernel,
> > > > > resulting in a larger boot-module-section.
> > > > >
> > > > > So I think it makes sense that "mpu,boot-module-section" is
> generated
> > > by
> > > > > the scripts (e.g. ImageBuilder) at build time, and Xen reads the
> > > > > property at boot from the runtime device tree.
> > > > >
> > > > > I think we need to divide the information into two groups:
> > > > >
> > > > >
> > > > > # Group1: board info
> > > > >
> > > > > This information is platform specific and it is not meant to
> change
> > > > > depending on the VM configuration. Ideally, we build Xen for a
> > > platform
> > > > > once, then we can use the same Xen binary together with any
> > > combination
> > > > > of dom0/domU kernels and ramdisks.
> > > > >
> > > > > This kind of information doesn't need to be exposed to the runtime
> > > > > device tree. But we can still use a build-time device tree to
> generate
> > > > > the addresses if it is convenient.
> > > > >
> > > > > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and
> > > ARM_MPU_NORMAL_MEMORY_*
> > > > > seem to be part of this group.
> > > > >
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > # Group2: boot configuration
> > > > >
> > > > > This information is about the specific set of binaries and VMs
> that we
> > > > > need to boot. It is conceptually similar to the dom0less device
> tree
> > > > > nodes that we already have. If we change one of the VM binaries,
> we
> > > > > likely have to refresh the information here.
> > > > >
> > > > > "mpu,boot-module-section" probably belongs to this group (unless
> we
> > > find
> > > > > a way to define "mpu,boot-module-section" generically so that we
> don't
> > > > > need to change it any time the set of boot modules change.)
> > > > >
> > > > >
> > > >
> > > > I agree.
> > > >
> > > > > > > It looks like we are forced to have the sections definitions
> at
> > > build
> > > > > > > time because we need them before we can parse device tree. In
> that
> > > > > case,
> > > > > > > we might as well define all the sections at build time.
> > > > > > >
> > > > > > > But I think it would be even better if Xen could automatically
> > > choose
> > > > > > > xen,static-mem, mpu,guest-memory-section, etc. on its own
> based on
> > > the
> > > > > > > regular device tree information (/memory, /amba, etc.),
> without
> > > any
> > > > > need
> > > > > > > for explicitly describing each range with these new properties.
> > > > > > >
> > > > > >
> > > > > > for mpu,guest-memory-section, with the limitations: no other
> usage
> > > > > between
> > > > > > different guests' memory nodes, this is OK. But for xen,static-mem
> > > (heap),
> > > > > > we just want everything on an MPU system to be deterministic. But, of
> > > course
> > > > > Xen
> > > > > > can select the leftover memory for the heap without static-mem.
> > > > >
> > > > > It is good that you think they can be chosen by Xen.
> > > > >
> > > > > Differently from "boot-module-section", which has to do with the
> boot
> > > > > modules selected by the user for a specific execution,
> > > > > guest-memory-section and static-mem are Xen specific memory
> > > > > policies/allocations.
> > > > >
> > > > > A user wouldn't know how to fill them in. And I worry that even a
> > > script
> > > >
> > > > But users should know it, because static-mem for guest must be
> allocated
> > > > in this range. And users take the responsibility to set the DomU's
> > > > static allocate memory ranges.
> > >
> > > Let me premise that my goal is to avoid having many users reporting
> > > errors to xen-devel and xen-users when actually it is just a wrong
> > > choice of addresses.
> > >
> > > I think we need to make a distinction between addresses for the boot
> > > modules, e.g. addresses where to load xen, the dom0/U kernel, dom0/U
> > > ramdisk in memory at boot time, and VM static memory addresses.
> > >
> > > The boot modules addresses are particularly difficult to fill in
> because
> > > they are many and a small update in one of the modules could
> invalidate
> > > all the other addresses. This is why I ended up writing ImageBuilder.
> > > Since then, I received several emails from users thanking me for
> > > ImageBuilder :-)
> > >
> >
> > Thanks +999 😊
> >
> >
> > > The static VM memory addresses (xen,static-mem) should be a bit easier
> > > to fill in correctly. They are meant to be chosen once, and it
> shouldn't
> > > happen that an update on a kernel forces the user to change all the VM
> > > static memory addresses. Also, I know that some users actually want to
> > > be able to choose the domU addresses by hand because they have
> specific
> > > needs. So it is good that we can let the user choose the addresses if
> > > they want to.
> > >
> >
> > Yes.
> >
> > > With all of that said, I do think that many users won't have an
> opinion
> > > on the VM static memory addresses and won't know how to choose them.
> > > It would be error prone to let them try to fill them in by hand. So I
> > > was already planning on adding support to ImageBuilder to
> automatically
> > > generate xen,static-mem for dom0less domains.
> > >
> >
> > Let me make sure that's what you said: Users give a VM memory size to
> > ImageBuilder, and ImageBuilder will generate xen,static-mem = <start,
> size>.
> > For specific VM, ImageBuilder also can accept start and size as inputs?
> >
> > Do I understand this correctly?
> 
> Yes, exactly
> 
> 
> > > Going back to this specific discussion about boot-module-section: I
> can
> > > see now that, given xen,static-mem is chosen by ImageBuilder (or
> >
> > By hand : )
> >
> > > similar) and not Xen, then it makes sense to have ImageBuilder (or
> > > similar) also generate boot-module-section.
> > >
> >
> > If my above understanding is right, then yes.
> 
> Yes, I think we are on the same page
> 
> 
> > > > > like ImageBuilder wouldn't be the best place to pick these values
> --
> > > > > they seem too "important" to leave to a script.
> > > > >
> > > > > But it seems possible to choose the values in Xen:
> > > > > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at
> build
> > > time
> > > > > - Xen reads boot-module-section from device tree
> > > > >
> > > > > It should be possible at this point for Xen to pick the best
> values
> > > for
> > > > > guest-memory-section and static-mem based on the memory available.
> > > > >
> > > >
> > > > How would Xen pick? Does it mean that in the static-allocation DomU DT node, we
> just
> > > > need a size, but don't require a start address for static-mem?
> > >
> > > Yes the idea was that the user would only provide the size (e.g.
> > > DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
> > > calculated. But I didn't mean to change the existing xen,static-mem
> > > device tree bindings. So it is best if the xen,static-mem addresses
> > > generation is done by ImageBuilder (or similar tool) instead of Xen.
> > >
> >
> > If we still keep the option for the user to specify the start and size
> > parameters for VM memory, because it may be very important for a
> > deterministic system (fully static system), I agree with you.
> >
> > And in current static-allocation, I think Xen doesn't generate
> > xen,static-mem addresses, all by hand...
> 
> Yeah
> 
> 

I will update my proposal to cover the discussion above, but I forgot one
thing. As the platform header files will be generated from DTS, does that
mean we have to maintain platform dts files in Xen, as Zephyr does? And do
you have any ideas on how to integrate ImageBuilder? Make it a submodule
of Xen, or integrate it into xen-tools?
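For reference, a dom0less domU node using the existing xen,static-mem binding (whether filled in by hand or generated by a tool like ImageBuilder) looks roughly like the fragment below; the addresses and sizes are illustrative only:

```dts
/* Illustrative only: a dom0less domU node with statically allocated
 * memory. ImageBuilder (or the user, by hand) supplies the base
 * address and size of xen,static-mem. */
domU1 {
    compatible = "xen,domain";
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    cpus = <1>;
    memory = <0x0 0x40000>;                 /* 256MB, expressed in KB */
    #xen,static-mem-address-cells = <0x2>;
    #xen,static-mem-size-cells = <0x2>;
    xen,static-mem = <0x0 0x30000000 0x0 0x10000000>; /* start, size */
};
```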


> > > Sorry for the confusion!
> > >
> >
> > NP ; )

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-02 10:24             ` Julien Grall
@ 2022-03-03  1:35               ` Wei Chen
  2022-03-03  9:15                 ` Julien Grall
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-03  1:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2022年3月2日 18:25
> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> On 02/03/2022 06:43, Wei Chen wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 2022年3月1日 21:17
> >> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >> <sstabellini@kernel.org>
> >> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> >> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry
> Wang
> >> <Henry.Wang@arm.com>; nd <nd@arm.com>
> >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>
> >> On 01/03/2022 06:29, Wei Chen wrote:
> >>> Hi Julien,
> >>
> >> Hi,
> >>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: 2022年2月26日 4:12
> >>>> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> >>>> <sstabellini@kernel.org>
> >>>> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> >>>> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry
> >> Wang
> >>>> <Henry.Wang@arm.com>; nd <nd@arm.com>
> >>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>>>
> >>>> Hi Wei,
> >>>>
> >>>> On 25/02/2022 10:48, Wei Chen wrote:
> >>>>>>>        Armv8-R64 can support max to 256 MPU regions. But that's
> just
> >>>>>> theoretical.
> >>>>>>>        So we don't want to define `pr_t mpu_regions[256]`, this is
> a
> >>>> memory
> >>>>>> waste
> >>>>>>>        in most of time. So we decided to let the user specify
> through
> >> a
> >>>>>> Kconfig
> >>>>>>>        option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default
> value
> >> can
> >>>> be
> >>>>>> `32`,
> >>>>>>>        it's a typical implementation on Armv8-R64. Users will
> >> recompile
> >>>> Xen
> >>>>>> when
> >>>>>>>        their platform changes. So when the MPU changes,
> respecifying
> >> the
> >>>>>> MPU
> >>>>>>>        protection regions number will not cause additional
> problems.
> >>>>>>
> >>>>>> I wonder if we could probe the number of MPU regions at runtime and
> >>>>>> dynamically allocate the memory needed to store them in arch_vcpu.
> >>>>>>
> >>>>>
> >>>>> We have considered to used a pr_t mpu_regions[0] in arch_vcpu. But
> it
> >>>> seems
> >>>>> we will encounter some static allocated arch_vcpu problems and
> sizeof
> >>>> issue.
> >>>>
> >>>> Does it need to be embedded in arch_vcpu? If not, then we could
> >> allocate
> >>>> memory outside and add a pointer in arch_vcpu.
> >>>>
> >>>
> >>> We had thought to use a pointer in arch_vcpu instead of embedding
> >> mpu_regions
> >>> into arch_vcpu. But we noticed that arch_vcpu has a
> __cacheline_aligned
> >>> attribute, this may be because of arch_vcpu will be used very
> frequently
> >>> in some critical path. So if we use the pointer for mpu_regions, may
> >> cause
> >> some cache misses in these critical paths, for example, in context_switch.
> >>
> >>   From my understanding, the idea behind ``cacheline_aligned`` is to
> >> avoid the struct vcpu to be shared with other datastructure. Otherwise
> >> you may end up to have two pCPUs to frequently write the same cacheline
> >> which is not ideal.
> >>
> >> arch_vcpu should embed anything that will be accessed often (e.g.
> >> entry/exit), up to a certain point. For instance, not everything related
> >> to the vGIC is embedded in the vCPU/Domain structure.
> >>
> >> I am a bit split regarding the mpu_regions. If they are mainly used in
> >> the context_switch() then I would argue this is a premature
> optimization
> >> because the scheduling decision is probably going to take a lot more
> >> time than the context switch itself.
> >
> > mpu_regions in arch_vcpu are used to save guest's EL1 MPU context. So,
> yes,
> > they are mainly used in context_switch. In terms of the number of
> registers,
> > it will save/restore more work than the original V8A. And on V8R we also
> need
> > to keep most of the original V8A save/restore work. So it will take
> longer
> > than the original V8A context_switch. And I think this is due to
> architecture's
> > difference. So it's impossible for us not to save/restore EL1 MPU region
> > registers in context_switch. And we have done some optimization for EL1
> MPU
> > save/restore:
> > 1. Assembly code for EL1 MPU context_switch
> 
> This discussion reminds me when KVM decided to rewrite their context
> switch from assembly to C. The outcome was the compiler is able to do a
> better job than us when it comes to optimizing.
> 
> With a C version, we could also share the save/restore code with 32-bit
> and it is easier to read/maintain.
> 
> So I would suggest to run some numbers to check if it really worth
> implementing the MPU save/restore in assembly.
> 

It's interesting to hear that the KVM folks had a similar discussion. Yes,
if the gain from the assembly code is not very obvious, then reusing the
code for 32-bit would be more important. As our current platform (FVP)
cannot do very precise performance measurement, I want to keep the current
assembly code for now; when we have a platform that can do such
measurement, we can start a thread to discuss it.
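To make the trade-off concrete, here is a minimal C sketch of what the EL1 MPU save/restore loop in context_switch would look like. The structure and function names are illustrative, not the actual Xen ones, and the PRSELR_EL1/PRBAR_EL1/PRLAR_EL1 system-register accesses are stubbed with an array so the loop logic is self-contained:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MPU_REGIONS 32 /* compile-time maximum, e.g. from Kconfig */

/* Hypothetical per-vCPU EL1 MPU region descriptor. */
typedef struct {
    uint64_t prbar; /* region base address + attributes */
    uint64_t prlar; /* region limit address + enable bit */
} pr_t;

struct mpu_ctx {
    pr_t region[MAX_MPU_REGIONS];
};

/*
 * On real hardware these would be MSR/MRS accesses to PRSELR_EL1,
 * PRBAR_EL1 and PRLAR_EL1 (selecting one region at a time). They are
 * stubbed with an array here so the sketch stands alone.
 */
static pr_t hw_el1_mpu[MAX_MPU_REGIONS];
static unsigned int hw_sel;

static void write_prselr_el1(unsigned int sel) { hw_sel = sel; }
static uint64_t read_prbar_el1(void) { return hw_el1_mpu[hw_sel].prbar; }
static uint64_t read_prlar_el1(void) { return hw_el1_mpu[hw_sel].prlar; }
static void write_prbar_el1(uint64_t v) { hw_el1_mpu[hw_sel].prbar = v; }
static void write_prlar_el1(uint64_t v) { hw_el1_mpu[hw_sel].prlar = v; }

/* Save/restore only the regions the hardware implements, as probed at
 * boot, instead of looping over the compile-time maximum. */
static void el1_mpu_save(struct mpu_ctx *ctx, unsigned int nr_regions)
{
    for (unsigned int i = 0; i < nr_regions; i++) {
        write_prselr_el1(i);
        ctx->region[i].prbar = read_prbar_el1();
        ctx->region[i].prlar = read_prlar_el1();
    }
}

static void el1_mpu_restore(const struct mpu_ctx *ctx,
                            unsigned int nr_regions)
{
    for (unsigned int i = 0; i < nr_regions; i++) {
        write_prselr_el1(i);
        write_prbar_el1(ctx->region[i].prbar);
        write_prlar_el1(ctx->region[i].prlar);
    }
}
```

Writing the loop in C like this is what would let the 32-bit port share the code, at the cost of trusting the compiler to schedule the register accesses well.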

> > 2. Use real MPU regions number instead of
> CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS
> >     in context_switch. CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS is defined
> the Max
> >     supported EL1 MPU regions for this Xen image. All platforms that
> implement
> >     EL1 MPU regions in this range can work well with this Xen Image. But
> if the
> >     implemented EL1 MPU region number exceeds
> CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS,
> >     this Xen image could not work well on this platform.
> 
> This sounds similar to the GICv3. The number of LRs depends on the
> hardware. See how we dealt with it in gicv3_save_lrs().
>

This is a good suggestion; we will check the GIC code.
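Following the gicv3_save_lrs() pointer, what we have in mind is a one-time boot probe of the implemented region count (MPUIR_EL1.REGION, bits [7:0], reports the number of implemented EL1 MPU regions), which the context switch then uses instead of the Kconfig maximum. The register read is stubbed below so the logic stands alone; whether to clamp or refuse to boot when the hardware implements more regions than the build supports is a policy choice:

```c
#include <assert.h>
#include <stdint.h>

#define CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS 32 /* build-time maximum */

/* Stub standing in for an MRS from MPUIR_EL1; REGION (bits [7:0])
 * reports the number of implemented EL1 MPU regions. */
static uint64_t fake_mpuir_el1;
static uint64_t read_mpuir_el1(void) { return fake_mpuir_el1; }

/* Probed once at boot; the context switch then iterates over this
 * count rather than the compile-time maximum, mirroring how
 * gicv3_save_lrs() uses the probed number of LRs. */
static unsigned int probe_el1_mpu_regions(void)
{
    unsigned int nr = read_mpuir_el1() & 0xff;

    /* This image cannot drive more regions than it was built for. */
    if (nr > CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS)
        nr = CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS;

    return nr;
}
```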

> >
> >>
> >> Note that for the P2M we already have that indirection because it is
> >> embedded in the struct domain.
> >
> > It's different from the V8A P2M case. In V8A context_switch we just need to
> > save/restore VTTBR, we don't need to do P2M table walk. But on V8R, we
> > need to access valid mpu_regions for save/restore.
> 
> The save/restore for the P2M is a bit more complicated than simply
> save/restore the VTTBR. But yes, I agree the code for the MPU will
> likely be more complicated.
> 
> >
> >>
> >> This raises one question, why is the MPUs regions will be per-vCPU
> >> rather per domain?
> >>
> >
> > Because there is an EL1 MPU component for each pCPU. We can't assume
> > a guest uses the same EL1 MPU configuration for all vCPUs.
> 
> Ah. Sorry, I thought you were referring to whatever Xen will use to
> prevent the guest accessing outside of its designated region.
> 

NP : )

Thanks,
Wei Chen

> Cheers,
> 
> --
> Julien Grall


* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  1:05               ` Wei Chen
@ 2022-03-03  2:03                 ` Stefano Stabellini
  2022-03-03  2:12                   ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2022-03-03  2:03 UTC (permalink / raw)
  To: Wei Chen
  Cc: Stefano Stabellini, xen-devel, julien, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd, George.Dunlap


On Thu, 3 Mar 2022, Wei Chen wrote:
> > On Wed, 2 Mar 2022, Wei Chen wrote:
> > > > > > > > If not, and considering that we have to generate
> > > > > > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it make
> > > > sense
> > > > > > to
> > > > > > > > also generate mpu,guest-memory-section, xen,static-mem, etc.
> > at
> > > > build
> > > > > > > > time rather than passing it via device tree to Xen at runtime?
> > > > > > > >
> > > > > > >
> > > > > > > Did you mean we still add these information in device tree, but
> > for
> > > > > > build
> > > > > > > time only. In runtime we don't parse them?
> > > > > >
> > > > > > Yes, something like that, but see below.
> > > > > >
> > > > > >
> > > > > > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at build
> > time
> > > > and
> > > > > > > > everything else at runtime?
> > > > > > >
> > > > > > > ARM_MPU_*_MEMORY_START/END is defined by platform. But other
> > things
> > > > are
> > > > > > > users customized. They can change their usage without rebuild
> > the
> > > > image.
> > > > > >
> > > > > > Good point.
> > > > > >
> > > > > > We don't want to have to rebuild Xen if the user updated a guest
> > > > kernel,
> > > > > > resulting in a larger boot-module-section.
> > > > > >
> > > > > > So I think it makes sense that "mpu,boot-module-section" is
> > generated
> > > > by
> > > > > > the scripts (e.g. ImageBuilder) at build time, and Xen reads the
> > > > > > property at boot from the runtime device tree.
> > > > > >
> > > > > > I think we need to divide the information into two groups:
> > > > > >
> > > > > >
> > > > > > # Group1: board info
> > > > > >
> > > > > > This information is platform specific and it is not meant to
> > change
> > > > > > depending on the VM configuration. Ideally, we build Xen for a
> > > > platform
> > > > > > once, then we can use the same Xen binary together with any
> > > > combination
> > > > > > of dom0/domU kernels and ramdisks.
> > > > > >
> > > > > > This kind of information doesn't need to be exposed to the runtime
> > > > > > device tree. But we can still use a build-time device tree to
> > generate
> > > > > > the addresses if it is convenient.
> > > > > >
> > > > > > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and
> > > > ARM_MPU_NORMAL_MEMORY_*
> > > > > > seem to be part of this group.
> > > > > >
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > # Group2: boot configuration
> > > > > >
> > > > > > This information is about the specific set of binaries and VMs
> > that we
> > > > > > need to boot. It is conceptually similar to the dom0less device
> > tree
> > > > > > nodes that we already have. If we change one of the VM binaries,
> > we
> > > > > > likely have to refresh the information here.
> > > > > >
> > > > > > "mpu,boot-module-section" probably belongs to this group (unless
> > we
> > > > find
> > > > > > a way to define "mpu,boot-module-section" generically so that we
> > don't
> > > > > > need to change it any time the set of boot modules change.)
> > > > > >
> > > > > >
> > > > >
> > > > > I agree.
> > > > >
> > > > > > > > It looks like we are forced to have the sections definitions
> > at
> > > > build
> > > > > > > > time because we need them before we can parse device tree. In
> > that
> > > > > > case,
> > > > > > > > we might as well define all the sections at build time.
> > > > > > > >
> > > > > > > > But I think it would be even better if Xen could automatically
> > > > choose
> > > > > > > > xen,static-mem, mpu,guest-memory-section, etc. on its own
> > based on
> > > > the
> > > > > > > > regular device tree information (/memory, /amba, etc.),
> > without
> > > > any
> > > > > > need
> > > > > > > > for explicitly describing each range with these new properties.
> > > > > > > >
> > > > > > >
> > > > > > > for mpu,guest-memory-section, with the limitations: no other
> > usage
> > > > > > between
> > > > > > > different guest' memory nodes, this is OK. But for xen,static-
> > mem
> > > > (heap),
> > > > > > > we just want everything on an MPU system to be deterministic. But, of
> > > > course
> > > > > > Xen
> > > > > > > can select left memory for heap without static-mem.
> > > > > >
> > > > > > It is good that you think they can be chosen by Xen.
> > > > > >
> > > > > > Differently from "boot-module-section", which has to do with the
> > boot
> > > > > > modules selected by the user for a specific execution,
> > > > > > guest-memory-section and static-mem are Xen specific memory
> > > > > > policies/allocations.
> > > > > >
> > > > > > A user wouldn't know how to fill them in. And I worry that even a
> > > > script
> > > > >
> > > > > But users should know it, because static-mem for guest must be
> > allocated
> > > > > in this range. And users take the responsibility to set the DomU's
> > > > > statically allocated memory ranges.
> > > >
> > > > Let me premise that my goal is to avoid having many users reporting
> > > > errors to xen-devel and xen-users when actually it is just a wrong
> > > > choice of addresses.
> > > >
> > > > I think we need to make a distinction between addresses for the boot
> > > > modules, e.g. addresses where to load xen, the dom0/U kernel, dom0/U
> > > > ramdisk in memory at boot time, and VM static memory addresses.
> > > >
> > > > The boot modules addresses are particularly difficult to fill in
> > because
> > > > they are many and a small update in one of the modules could
> > invalidate
> > > > all the other addresses. This is why I ended up writing ImageBuilder.
> > > > Since then, I received several emails from users thanking me for
> > > > ImageBuilder :-)
> > > >
> > >
> > > Thanks +999 😊
> > >
> > >
> > > > The static VM memory addresses (xen,static-mem) should be a bit easier
> > > > to fill in correctly. They are meant to be chosen once, and it
> > shouldn't
> > > > happen that an update on a kernel forces the user to change all the VM
> > > > static memory addresses. Also, I know that some users actually want to
> > > > be able to choose the domU addresses by hand because they have
> > specific
> > > > needs. So it is good that we can let the user choose the addresses if
> > > > they want to.
> > > >
> > >
> > > Yes.
> > >
> > > > With all of that said, I do think that many users won't have an
> > opinion
> > > > on the VM static memory addresses and won't know how to choose them.
> > > > It would be error prone to let them try to fill them in by hand. So I
> > > > was already planning on adding support to ImageBuilder to
> > automatically
> > > > generate xen,static-mem for dom0less domains.
> > > >
> > >
> > > Let me make sure that's what you said: Users give a VM memory size to
> > > ImageBuilder, and ImageBuilder will generate xen,static-mem = <start,
> > size>.
> > > For specific VM, ImageBuilder also can accept start and size as inputs?
> > >
> > > Do I understand this correctly?
> > 
> > Yes, exactly
> > 
> > 
> > > > Going back to this specific discussion about boot-module-section: I
> > can
> > > > see now that, given xen,static-mem is chosen by ImageBuilder (or
> > >
> > > By hand : )
> > >
> > > > similar) and not Xen, then it makes sense to have ImageBuilder (or
> > > > similar) also generate boot-module-section.
> > > >
> > >
> > > If my above understanding is right, then yes.
> > 
> > Yes, I think we are on the same page
> > 
> > 
> > > > > > like ImageBuilder wouldn't be the best place to pick these values
> > --
> > > > > > they seem too "important" to leave to a script.
> > > > > >
> > > > > > But it seems possible to choose the values in Xen:
> > > > > > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at
> > build
> > > > time
> > > > > > - Xen reads boot-module-section from device tree
> > > > > >
> > > > > > It should be possible at this point for Xen to pick the best
> > values
> > > > for
> > > > > > guest-memory-section and static-mem based on the memory available.
> > > > > >
> > > > >
> > > > > How would Xen pick? Does it mean that in the static-allocation DomU DT node, we
> > just
> > > > > need a size, but don't require a start address for static-mem?
> > > >
> > > > Yes the idea was that the user would only provide the size (e.g.
> > > > DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
> > > > calculated. But I didn't mean to change the existing xen,static-mem
> > > > device tree bindings. So it is best if the xen,static-mem addresses
> > > > generation is done by ImageBuilder (or similar tool) instead of Xen.
> > > >
> > >
> > > If we still keep the option for the user to specify the start and size
> > > parameters for VM memory, because it may be very important for a
> > > deterministic system (fully static system), I agree with you.
> > >
> > > And in current static-allocation, I think Xen doesn't generate
> > > xen,static-mem addresses, all by hand...
> > 
> > Yeah
> > 
> 
> I will update my proposal to cover our above discussion, but I forgot one
> thing. As the platform header files will be generated from DTS, does it
> mean we have to maintain platform dts files in Xen like what Zephyr has
> done?

I would prefer not to have to maintain platform dts files in Xen like
Zephyr is doing. Ideally, the user should be able to take any
spec-compliant device tree file and use it. I would say: let's start
without adding the dts files to Xen (we might have one under docs/ but
just as an example.) We can add them later if the need arises.


> And do you have some idea to integrate the "ImageBuilder"? Make it
> as a submodule of Xen or integrate to xen-tools?

I think it would be best if ImageBuilder was kept as a separate
repository because there should be no strong ties between ImageBuilder
versions and Xen versions. It is more convenient to handle it in a
separate repository, especially as Yocto and other build systems might
clone ImageBuilder during the build to generate boot.scr (it is already
the case).

That said, it might be good to make it more "official" by moving it to the
Xen Project. I can talk to George about creating
http://xenbits.xen.org/git-http/imagebuilder.git or
https://gitlab.com/xen-project/imagebuilder.


* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-02 12:00     ` Julien Grall
@ 2022-03-03  2:06       ` Wei Chen
  2022-03-03 19:51         ` Julien Grall
  2022-03-07  2:12         ` Wei Chen
  0 siblings, 2 replies; 34+ messages in thread
From: Wei Chen @ 2022-03-03  2:06 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2022年3月2日 20:00
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org; Stefano
> Stabellini <sstabellini@kernel.org>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> 
> 
> On 01/03/2022 07:51, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 2022年2月26日 4:55
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini <sstabellini@kernel.org>
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> >> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>> ### 1.2. Xen Challenges with PMSA Virtualization
> >>> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to
> run
> >>> with an MPU and host multiple guest OSes.
> >>>
> >>> - No MMU at EL2:
> >>>       - No EL2 Stage 1 address translation
> >>>           - Xen provides fixed ARM64 virtual memory layout as basis of
> >> EL2
> >>>             stage 1 address translation, which is not applicable on
> MPU
> >> system,
> >>>             where there is no virtual addressing. As a result, any
> >> operation
> >>>             involving transition from PA to VA, like ioremap, needs
> >> modification
> >>>             on MPU system.
> >>>       - Xen's run-time addresses are the same as the link time
> addresses.
> >>>           - Enable PIC (position-independent code) on a real-time
> target
> >>>             processor probably very rare.
> >>
> >> Aside the assembly boot code and UEFI stub, Xen already runs at the
> same
> >> address as it was linked.
> >>
> >
> > But the difference is that, based on the MMU, we can use the same link
> address
> > for all platforms. But on MPU system, we can't do it in the same way.
> 
> I agree that we currently use the same link address for all the
> platforms. But this is also a problem when using MMU because EL2 has a
> single TTBR.
> 
> At the moment we are switching page-tables with the MMU which is not
> safe. Instead we need to turn out the MMU off, switch page-tables and
> then turn on the MMU. This means we need to have an identity mapping of
> Xen in the page-tables. Assuming Xen is not relocated, the identity
> mapping may clash with Xen (or the rest of the virtual address map).
> 

Is this the same reason we create a dummy reloc section for the EFI loader?

> My initial idea was to enable PIC and update the relocation at boot
> time. But this is a bit cumbersome to do. So now I am looking to have a
> semi-dynamic virtual layout and find some place to relocate part of Xen
> to use for CPU bring-up.
> 
> Anyway, my point is we possibly could look at PIC if that could allow
> generic Xen image.
> 

I understand your concern. IMO, PIC could achieve this, but obviously it's
not a small amount of work. I would also like to hear suggestions from
Stefano, because he proposed some solutions in a previous thread.

> >>>       - Xen will need to use the EL2 MPU memory region descriptors to
> >> manage
> >>>         access permissions and attributes for accesses made by VMs at
> >> EL1/0.
> >>>           - Xen currently relies on MMU EL1 stage 2 table to manage
> these
> >>>             accesses.
> >>> - No MMU Stage 2 translation at EL1:
> >>>       - A guest doesn't have an independent guest physical address
> space
> >>>       - A guest can not reuse the current Intermediate Physical
> Address
> >>>         memory layout
> >>>       - A guest uses physical addresses to access memory and devices
> >>>       - The MPU at EL2 manages EL1 stage 2 access permissions and
> >> attributes
> >>> - There are a limited number of MPU protection regions at both EL2 and
> >> EL1:
> >>>       - Architecturally, the maximum number of protection regions is
> 256,
> >>>         typical implementations have 32.
> >>>       - By contrast, Xen does not need to consider the number of page
> >> table
> >>>         entries in theory when using MMU.
> >>> - The MPU protection regions at EL2 need to be shared between the
> >> hypervisor
> >>>     and the guest stage 2.
> >>>       - Requires careful consideration - may impact feature 'fullness'
> of
> >> both
> >>>         the hypervisor and the guest
> >>>       - By contrast, when using MMU, Xen has standalone P2M table for
> >> guest
> >>>         stage 2 accesses.
> >>
> >> [...]
> >>
> >>> - ***Define new system registers for compilers***:
> >>>     Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> >>>     specific system registers. As Armv8-R64 only have secure state, so
> >>>     at least, `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen. And
> the
> >>>     first GCC version that supports Armv8.4 is GCC 8.1. In addition to
> >>>     these, PMSA of Armv8-R64 introduced lots of MPU related system
> >> registers:
> >>>     `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx`
> and
> >>>     `MPUIR_ELx`. But the first GCC version to support these system
> >> registers
> >>>     is GCC 11. So we have two ways to make compilers work properly
> >> with
> >>>     these system registers.
> >>>     1. Bump GCC version to GCC 11.
> >>>        The pros of this method is that, we don't need to encode these
> >>>        system registers in macros by ourselves. But the cons are that,
> >>>        we have to update Makefiles to support GCC 11 for Armv8-R64.
> >>>        1.1. Check the GCC version 11 for Armv8-R64.
> >>>        1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> >>>        1.3. Solve the conflict between march=armv8r and mcpu=generic
> >>>       These changes will affect common Makefiles, not only Arm
> Makefiles.
> >>>       And GCC 11 is new; many toolchains and distros haven't
> supported
> >> it.
> >>
> >> I agree that forcing to use GCC11 is not a good idea. But I am not sure
> >> to understand the problem with the -march=.... Ultimately, shouldn't we
> >> aim to build Xen ARMv8-R with -march=armv8r?
> >>
> >
> > Actually, we had done that, but we reverted it from the RFC patch series. The
> reason
> > has been listed above. But that is not the major reason. The main reason
> > is that:
> > Armv8-R AArch64 supports the A64 ISA instruction set with some
> modifications:
> > Redefines DMB, DSB, and adds a DFB. But actually, the encodings of DMB
> > and DSB are still the same as in A64. And DFB is an alias of DSB #12.
> >
> > In this case, we don't think we need a new arch flag to generate new
> > instructions for Armv8-R. And we have discussed with Arm kernel guys,
> they
> > will not update the build system to build Linux that will be running on
> > Armv8-R64 EL1 either.
> 
> Good to know that the kernel folks plan to do the same. Thanks for the
> explanation!
> 
> >
> >
> >> [...]
> >>
> >>> ### **2.2. Changes of the initialization process**
> >>> In general, we still expect Armv8-R64 and Armv8-A64 to have a
> consistent
> >>> initialization process. In addition to some architecture differences,
> >> there
> >>> is no more than reusable code that we will distinguish through
> >> CONFIG_ARM_MPU
> >>> or CONFIG_ARM64_V8R. We want most of the initialization code to be
> >> reusable
> >>> between Armv8-R64 and Armv8-A64.
> >>>
> >>> - We will reuse the original head.s and setup.c of Arm. But replace
> the
> >>>     MMU and page table operations in these files with configuration
> >> operations
> >>>     for MPU and MPU regions.
> >>>
> >>> - We provide a boot-time MPU configuration. This MPU configuration
> will
> >>>     support Xen to finish its initialization. And this boot-time MPU
> >>>     configuration will record the memory regions that will be parsed
> from
> >>>     device tree.
> >>>
> >>>     In the end of Xen initialization, we will use a runtime MPU
> >> configuration
> >>>     to replace boot-time MPU configuration. The runtime MPU
> configuration
> >> will
> >>>     merge and reorder memory regions to save more MPU regions for
> guests.
> >>>     ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1
> PqR
> >> DoacQVTwUtWIGU)
> >>>
> >>> - Defer system unpausing domain.
> >>>     When Xen initialization is about to end, Xen unpause guests
> created
> >>>     during initialization. But this will cause some issues. The
> unpause
> >>>     action occurs before free_init_memory, however the runtime MPU
> >> configuration
> >>>     is built after free_init_memory.
> >>
> >> I was half expecting that free_init_memory() would not be called for
> Xen
> >> Armv8R.
> >>
> >
> > We had called free_init_memory for Xen Armv8R, but it doesn't really
> mean
> > much. As we have static heap, so we don't reclaim init memory to heap.
> And
> > this reclaimed memory could not be used by Xen data and bss either. But
> > from the security perspective, free_init_memory will drop the Xen init
> > code & data, this will reduce the code an attacker can exploit.
> IIUC, zero-ing the region (or something similar) will be sufficient
> here. IOW, you don't necessarily need to remove the mappings.
> 
> >>>
> >>>     So if the unpaused guests start executing the context switch at
> this
> >>>     point, then its MPU context will base on the boot-time MPU
> >> configuration.
> >>
> >> Can you explain why you want to switch the MPU configuration that late?
> >>
> >
> > In the boot stage, Xen is the only user of MPU. It may add some memory
> > nodes or device memory to MPU regions for temporary usage. After free
> > init memory, we want to reclaim these MPU regions to give more MPU
> regions
> > can be used for guests. Also we will do some merge and reorder work.
> This
> > work can make MPU regions to be easier managed in guest context switch.
> 
> Do you have any example of such regions?
> >
> >>>     Probably it will be inconsistent with runtime MPU configuration,
> this
> >>>     will cause unexpected problems (This may not happen in a single
> core
> >>>     system, but on SMP systems, this problem is foreseeable, so we
> hope
> >> to
> >>>     solve it at the beginning).
> >>
> >> [...]
> >>
> >>> ### **2.4. Changes of memory management**
> >>> Xen is coupled with VMSA, in order to port Xen to Armv8-R64, we have
> to
> >>> decouple Xen from VMSA. And give Xen the ability to manage memory in
> >> PMSA.
> >>>
> >>> 1. ***Use buddy allocator to manage physical pages for PMSA***
> >>>      From the view of physical page, PMSA and VMSA don't have any
> >> difference.
> >>>      So we can reuse buddy allocator on Armv8-R64 to manage physical
> >> pages.
> >>>      The difference is that, in VMSA, Xen will map allocated pages to
> >> virtual
> >>>      addresses. But in PMSA, Xen just convert the pages to physical
> >> address.
> >>>
> >>> 2. ***Can not use virtual address for memory management***
> >>>      As Armv8-R64 only has PMSA in EL2, Xen loses the ability of using
> >> virtual
> >>>      address to manage memory. This brings some problems, some virtual
> >> address
> >>>      based features could not work well on Armv8-R64, like `FIXMAP`,
> >> `vmap/vumap`,
> >>>      `ioremap` and `alternative`.
> >>>
> >>>      But the functions or macros of these features are used in lots of
> >> common
> >>>      code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to gate
> relate
> >> code
> >>>      everywhere. In this case, we propose to use stub helpers to make
> the
> >> changes
> >>>      transparently to common code.
> >>>      1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
> >> operations.
> >>>         This will return physical address directly of fixmapped item.
> >>>      2. For `vmap/vumap`, we will use some empty inline stub helpers:
> >>>           ```
> >>>           static inline void vm_init_type(...) {}
> >>>           static inline void *__vmap(...)
> >>>           {
> >>>               return NULL;
> >>>           }
> >>>           static inline void vunmap(const void *va) {}
> >>>           static inline void *vmalloc(size_t size)
> >>>           {
> >>>               return NULL;
> >>>           }
> >>>           static inline void *vmalloc_xen(size_t size)
> >>>           {
> >>>               return NULL;
> >>>           }
> >>>           static inline void vfree(void *va) {}
> >>>           ```
> >>>
> >>>      3. For `ioremap`, it depends on `vmap`. As we have make `vmap` to
> >> always
> >>>         return `NULL`, they could not work well on Armv8-R64 without
> >> changes.
> >>>         `ioremap` will return input address directly.
> >>>           ```
> >>>           static inline void *ioremap_attr(...)
> >>>           {
> >>>               /* We don't have the ability to change input PA cache
> >> attributes */
> >> OOI, who will set them?
> >
> > Some callers that want to change a memory's attribute will set them.
> Something like
> > ioremap_nocache. I am not sure is this what you had asked : )
> 
> I am a bit confused. If ioremap_nocache() can change the attribute, then
> why would ioremap_attr() not be able to do it?
> 

An MMU-based ioremap_xxx can use a new VA and a new PTE to do this. But for
MPU, we can't, unless we change the attribute of the whole MPU region. The
reasons are:
1. In the Armv8-R PMSA, one physical address can belong to only one MPU region.
2. There are not enough MPU regions to split one MPU region into multiple
   regions (one region for the changed pages plus regions for the unmodified
   pages).

> >
> >>
> >>>               if ( CACHE_ATTR_need_change )
> >>>                   return NULL;
> >>>               return (void *)pa;
> >>>           }
> >>>           static inline void __iomem *ioremap_nocache(...)
> >>>           {
> >>>               return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
> >>>           }
> >>>           static inline void __iomem *ioremap_cache(...)
> >>>           {
> >>>               return ioremap_attr(start, len, PAGE_HYPERVISOR);
> >>>           }
> >>>           static inline void __iomem *ioremap_wc(...)
> >>>           {
> >>>               return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> >>>           }
> >>>           void *ioremap(...)
> >>>           {
> >>>               return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> >>>           }
> >>>
> >>>           ```
> >>>       4. For `alternative`, it depends on `vmap` too.
> >>
> >> The only reason we depend on vmap() is because the map the sections
> >> *text read-only and we enforce WnX. For VMSA, it would be possible to
> >> avoid vmap() with some rework. I don't know for PMSA.
> >>
> >
> > For PMSA, we still enforce WnX. For your use case, I assume it's
> alternative.
> > It still may have some possibility to avoid vmap(). But there may be
> some
> > security issues. We had thought to disable MPU -> update xen text ->
> enable
> > MPU to copy VMSA alternative's behavior. The problem with this, however,
> > is that at some point, all memory is RWX. There may be some security
> > risk. But because it's in the init stage, it probably doesn't matter as
> > much as I thought.
> 
> For boot code, we need to ensure the code is compliant to the Arm Arm.
> Other than that, it is OK to have the memory RWX for a short period of
> time.
> 
> In fact, when we originally boot Xen, we don't enforce WnX. We will
> start to enforce when initializing the memory. But there are no blocker
> to delay it (other than writing the code :)).

Ah, OK, it seems we can still implement alternative on an MPU system.
I will cover it in the new version of the proposal, but place it in the
TODO list; I don't want to include it before single-CPU support is merged,
because the current patch series is already huge enough : )

> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  2:03                 ` Stefano Stabellini
@ 2022-03-03  2:12                   ` Wei Chen
  2022-03-03  2:15                     ` Stefano Stabellini
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-03  2:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, julien, Bertrand Marquis, Penny Zheng, Henry Wang, nd,
	George.Dunlap

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 2022年3月3日 10:04
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org; julien@xen.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>; George.Dunlap@citrix.com
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> On Thu, 3 Mar 2022, Wei Chen wrote:
> > > On Wed, 2 Mar 2022, Wei Chen wrote:
> > > > > > > > > If not, and considering that we have to generate
> > > > > > > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it
> make
> > > > > sense
> > > > > > > to
> > > > > > > > > also generate mpu,guest-memory-section, xen,static-mem,
> etc.
> > > at
> > > > > build
> > > > > > > > > time rather than passing it via device tree to Xen at
> runtime?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Did you mean we still add these information in device tree,
> but
> > > for
> > > > > > > build
> > > > > > > > time only. In runtime we don't parse them?
> > > > > > >
> > > > > > > Yes, something like that, but see below.
> > > > > > >
> > > > > > >
> > > > > > > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at
> build
> > > time
> > > > > and
> > > > > > > > > everything else at runtime?
> > > > > > > >
> > > > > > > > ARM_MPU_*_MEMORY_START/END is defined by platform. But other
> > > things
> > > > > are
> > > > > > > > users customized. They can change their usage without
> rebuild
> > > the
> > > > > image.
> > > > > > >
> > > > > > > Good point.
> > > > > > >
> > > > > > > We don't want to have to rebuild Xen if the user updated a
> guest
> > > > > kernel,
> > > > > > > resulting in a larger boot-module-section.
> > > > > > >
> > > > > > > So I think it makes sense that "mpu,boot-module-section" is
> > > generated
> > > > > by
> > > > > > > the scripts (e.g. ImageBuilder) at build time, and Xen reads
> the
> > > > > > > property at boot from the runtime device tree.
> > > > > > >
> > > > > > > I think we need to divide the information into two groups:
> > > > > > >
> > > > > > >
> > > > > > > # Group1: board info
> > > > > > >
> > > > > > > This information is platform specific and it is not meant to
> > > change
> > > > > > > depending on the VM configuration. Ideally, we build Xen for a
> > > > > platform
> > > > > > > once, then we can use the same Xen binary together with any
> > > > > combination
> > > > > > > of dom0/domU kernels and ramdisks.
> > > > > > >
> > > > > > > This kind of information doesn't need to be exposed to the
> runtime
> > > > > > > device tree. But we can still use a build-time device tree to
> > > generate
> > > > > > > the addresses if it is convenient.
> > > > > > >
> > > > > > > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and
> > > > > ARM_MPU_NORMAL_MEMORY_*
> > > > > > > seem to be part of this group.
> > > > > > >
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > >
> > > > > > > # Group2: boot configuration
> > > > > > >
> > > > > > > This information is about the specific set of binaries and VMs
> > > that we
> > > > > > > need to boot. It is conceptually similar to the dom0less
> device
> > > tree
> > > > > > > nodes that we already have. If we change one of the VM
> binaries,
> > > we
> > > > > > > likely have to refresh the information here.
> > > > > > >
> > > > > > > "mpu,boot-module-section" probably belongs to this group
> (unless
> > > we
> > > > > find
> > > > > > > a way to define "mpu,boot-module-section" generically so that
> we
> > > don't
> > > > > > > need to change it any time the set of boot modules change.)
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I agree.
> > > > > >
> > > > > > > > > It looks like we are forced to have the sections
> definitions
> > > at
> > > > > build
> > > > > > > > > time because we need them before we can parse device tree.
> In
> > > that
> > > > > > > case,
> > > > > > > > > we might as well define all the sections at build time.
> > > > > > > > >
> > > > > > > > > But I think it would be even better if Xen could
> automatically
> > > > > choose
> > > > > > > > > xen,static-mem, mpu,guest-memory-section, etc. on its own
> > > based on
> > > > > the
> > > > > > > > > regular device tree information (/memory, /amba, etc.),
> > > without
> > > > > any
> > > > > > > need
> > > > > > > > > for explicitly describing each range with these new
> properties.
> > > > > > > > >
> > > > > > > >
> > > > > > > > for mpu,guest-memory-section, with the limitations: no other
> > > usage
> > > > > > > between
> > > > > > > > different guest' memory nodes, this is OK. But for
> xen,static-
> > > mem
> > > > > (heap),
> > > > > > > > we just want everything on a MPU system is dertermistic. But,
> of
> > > > > course
> > > > > > > Xen
> > > > > > > > can select left memory for heap without static-mem.
> > > > > > >
> > > > > > > It is good that you think they can be chosen by Xen.
> > > > > > >
> > > > > > > Differently from "boot-module-section", which has to do with
> the
> > > boot
> > > > > > > modules selected by the user for a specific execution,
> > > > > > > guest-memory-section and static-mem are Xen specific memory
> > > > > > > policies/allocations.
> > > > > > >
> > > > > > > A user wouldn't know how to fill them in. And I worry that
> even a
> > > > > script
> > > > > >
> > > > > > But users should know it, because static-mem for guest must be
> > > allocated
> > > > > > in this range. And users take the responsibility to set the
> DomU's
> > > > > > static allocate memory ranges.
> > > > >
> > > > > Let me premise that my goal is to avoid having many users
> reporting
> > > > > errors to xen-devel and xen-users when actually it is just a wrong
> > > > > choice of addresses.
> > > > >
> > > > > I think we need to make a distinction between addresses for the
> boot
> > > > > modules, e.g. addresses where to load xen, the dom0/U kernel,
> dom0/U
> > > > > ramdisk in memory at boot time, and VM static memory addresses.
> > > > >
> > > > > The boot modules addresses are particularly difficult to fill in
> > > because
> > > > > they are many and a small update in one of the modules could
> > > invalidate
> > > > > all the other addresses. This is why I ended up writing
> ImageBuilder.
> > > > > Since them, I received several emails from users thanking me for
> > > > > ImageBuilder :-)
> > > > >
> > > >
> > > > Thanks +999 😊
> > > >
> > > >
> > > > > The static VM memory addresses (xen,static-mem) should be a bit
> easier
> > > > > to fill in correctly. They are meant to be chosen once, and it
> > > shouldn't
> > > > > happen that an update on a kernel forces the user to change all
> the VM
> > > > > static memory addresses. Also, I know that some users actually
> want to
> > > > > be able to choose the domU addresses by hand because they have
> > > specific
> > > > > needs. So it is good that we can let the user choose the addresses
> if
> > > > > they want to.
> > > > >
> > > >
> > > > Yes.
> > > >
> > > > > With all of that said, I do think that many users won't have an
> > > opinion
> > > > > on the VM static memory addresses and won't know how to choose
> them.
> > > > > It would be error prone to let them try to fill them in by hand.
> So I
> > > > > was already planning on adding support to ImageBuilder to
> > > automatically
> > > > > generate xen,static-mem for dom0less domains.
> > > > >
> > > >
> > > > Let me make sure that's what you said: Users give an VM memory size
> to
> > > > ImageBuilder, and ImageBuilder will generate xen,static-mem = <start,
> > > size>.
> > > > For specific VM, ImageBuilder also can accept start and size as
> inputs?
> > > >
> > > > Do I understand this correctly?
> > >
> > > Yes, exactly
> > >
> > >
> > > > > Going back to this specific discussion about boot-module-section:
> I
> > > can
> > > > > see now that, given xen,static-mem is chosen by ImageBuilder (or
> > > >
> > > > By hand : )
> > > >
> > > > > similar) and not Xen, then it makes sense to have ImageBuilder (or
> > > > > similar) also generate boot-module-section.
> > > > >
> > > >
> > > > If my above understanding is right, then yes.
> > >
> > > Yes, I think we are on the same page
> > >
> > >
> > > > > > > like ImageBuilder wouldn't be the best place to pick these
> values
> > > --
> > > > > > > they seem too "important" to leave to a script.
> > > > > > >
> > > > > > > But it seems possible to choose the values in Xen:
> > > > > > > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at
> > > build
> > > > > time
> > > > > > > - Xen reads boot-module-section from device tree
> > > > > > >
> > > > > > > It should be possible at this point for Xen to pick the best
> > > values
> > > > > for
> > > > > > > guest-memory-section and static-mem based on the memory
> available.
> > > > > > >
> > > > > >
> > > > > > How Xen to pick? Does it mean in static allocation DomU DT node,
> we
> > > just
> > > > > > need a size, but don't require a start address for static-mem?
> > > > >
> > > > > Yes the idea was that the user would only provide the size (e.g.
> > > > > DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
> > > > > calculated. But I didn't mean to change the existing xen,static-
> mem
> > > > > device tree bindings. So it is best if the xen,static-mem
> addresses
> > > > > generation is done by ImageBuilder (or similar tool) instead of
> Xen.
> > > > >
> > > >
> > > > If we still keep the option for user to specify the start and size
> > > > parameters for VM memory, because it maybe very important for a
> > > > deterministic system (fully static system), I agree with you.
> > > >
> > > > And in current static-allocation, I think Xen doesn't generate
> > > > xen,static-mem addresses, all by hands...
> > >
> > > Yeah
> > >
> >
> > I will update my proposal to cover our above discussion, but I forgot
> one
> > thing. As the platform header files will be generated from DTS, does it
> > mean we have to maintain platform dts files in Xen like what Zephyr has
> > done?
> 
> I would prefer not to have to maintain platform dts files in Xen like
> Zephyr is doing. Ideally, the user should be able to take any
> spec-compliant device tree file and use it. I would say: let's start
> without adding the dts files to Xen (we might have one under docs/ but
> just as an example.) We can add them later if the need arise.
> 

But without any default dts, does that mean we can't start to build Xen for
v8R? It seems in this case, we need the Makefile to print a message telling
users to specify their own dts path.

> 
> > And do you have some idea to integrate the "ImageBuilder"? Make it
> > as a submodule of Xen or integrate to xen-tools?
> 
> I think it would be best if ImageBuilder was kept as a separate
> repository because there should be no strong ties between ImageBuilder
> versions and Xen versions. It is more convenient to handle it in a
> separate repository, especially as Yocto and other build systems might
> clone ImageBuilder during the build to generate boot.scr (it is already
> the case).
> 
> That said, it might be good to make it more "official" but moving it to
> Xen Project. I can talk to George about creating
> http://xenbits.xen.org/git-http/imagebuilder.git or
> https://gitlab.com/xen-project/imagebuilder.

That's good : )



* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  2:12                   ` Wei Chen
@ 2022-03-03  2:15                     ` Stefano Stabellini
  0 siblings, 0 replies; 34+ messages in thread
From: Stefano Stabellini @ 2022-03-03  2:15 UTC (permalink / raw)
  To: Wei Chen
  Cc: Stefano Stabellini, xen-devel, julien, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd, George.Dunlap


On Thu, 3 Mar 2022, Wei Chen wrote:
> > On Thu, 3 Mar 2022, Wei Chen wrote:
> > > > On Wed, 2 Mar 2022, Wei Chen wrote:
> > > > > > > > > > If not, and considering that we have to generate
> > > > > > > > > > ARM_MPU_*_MEMORY_START/END anyway at build time, would it
> > make
> > > > > > sense
> > > > > > > > to
> > > > > > > > > > also generate mpu,guest-memory-section, xen,static-mem,
> > etc.
> > > > at
> > > > > > build
> > > > > > > > > > time rather than passing it via device tree to Xen at
> > runtime?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Did you mean we still add these information in device tree,
> > but
> > > > for
> > > > > > > > build
> > > > > > > > > time only. In runtime we don't parse them?
> > > > > > > >
> > > > > > > > Yes, something like that, but see below.
> > > > > > > >
> > > > > > > >
> > > > > > > > > > What's the value of doing ARM_MPU_*_MEMORY_START/END at
> > build
> > > > time
> > > > > > and
> > > > > > > > > > everything else at runtime?
> > > > > > > > >
> > > > > > > > > ARM_MPU_*_MEMORY_START/END is defined by platform. But other
> > > > things
> > > > > > are
> > > > > > > > > users customized. They can change their usage without
> > rebuild
> > > > the
> > > > > > image.
> > > > > > > >
> > > > > > > > Good point.
> > > > > > > >
> > > > > > > > We don't want to have to rebuild Xen if the user updated a
> > guest
> > > > > > kernel,
> > > > > > > > resulting in a larger boot-module-section.
> > > > > > > >
> > > > > > > > So I think it makes sense that "mpu,boot-module-section" is
> > > > generated
> > > > > > by
> > > > > > > > the scripts (e.g. ImageBuilder) at build time, and Xen reads
> > the
> > > > > > > > property at boot from the runtime device tree.
> > > > > > > >
> > > > > > > > I think we need to divide the information into two groups:
> > > > > > > >
> > > > > > > >
> > > > > > > > # Group1: board info
> > > > > > > >
> > > > > > > > This information is platform specific and it is not meant to
> > > > change
> > > > > > > > depending on the VM configuration. Ideally, we build Xen for a
> > > > > > platform
> > > > > > > > once, then we can use the same Xen binary together with any
> > > > > > combination
> > > > > > > > of dom0/domU kernels and ramdisks.
> > > > > > > >
> > > > > > > > This kind of information doesn't need to be exposed to the
> > runtime
> > > > > > > > device tree. But we can still use a build-time device tree to
> > > > generate
> > > > > > > > the addresses if it is convenient.
> > > > > > > >
> > > > > > > > XEN_START_ADDRESS, ARM_MPU_DEVICE_MEMORY_*, and
> > > > > > ARM_MPU_NORMAL_MEMORY_*
> > > > > > > > seem to be part of this group.
> > > > > > > >
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > # Group2: boot configuration
> > > > > > > >
> > > > > > > > This information is about the specific set of binaries and VMs
> > > > that we
> > > > > > > > need to boot. It is conceptually similar to the dom0less
> > device
> > > > tree
> > > > > > > > nodes that we already have. If we change one of the VM
> > binaries,
> > > > we
> > > > > > > > likely have to refresh the information here.
> > > > > > > >
> > > > > > > > "mpu,boot-module-section" probably belongs to this group
> > (unless
> > > > we
> > > > > > find
> > > > > > > > a way to define "mpu,boot-module-section" generically so that
> > we
> > > > don't
> > > > > > > > need to change it any time the set of boot modules change.)
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > I agree.
> > > > > > >
> > > > > > > > > > It looks like we are forced to have the sections
> > definitions
> > > > at
> > > > > > build
> > > > > > > > > > time because we need them before we can parse device tree.
> > In
> > > > that
> > > > > > > > case,
> > > > > > > > > > we might as well define all the sections at build time.
> > > > > > > > > >
> > > > > > > > > > But I think it would be even better if Xen could
> > automatically
> > > > > > choose
> > > > > > > > > > xen,static-mem, mpu,guest-memory-section, etc. on its own
> > > > based on
> > > > > > the
> > > > > > > > > > regular device tree information (/memory, /amba, etc.),
> > > > without
> > > > > > any
> > > > > > > > need
> > > > > > > > > > for explicitly describing each range with these new
> > properties.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > for mpu,guest-memory-section, with the limitations: no other
> > > > usage
> > > > > > > > between
> > > > > > > > > different guest' memory nodes, this is OK. But for
> > xen,static-
> > > > mem
> > > > > > (heap),
> > > > > > > > > we just want everything on a MPU system is dertermistic. But,
> > of
> > > > > > course
> > > > > > > > Xen
> > > > > > > > > can select left memory for heap without static-mem.
> > > > > > > >
> > > > > > > > It is good that you think they can be chosen by Xen.
> > > > > > > >
> > > > > > > > Differently from "boot-module-section", which has to do with
> > the
> > > > boot
> > > > > > > > modules selected by the user for a specific execution,
> > > > > > > > guest-memory-section and static-mem are Xen specific memory
> > > > > > > > policies/allocations.
> > > > > > > >
> > > > > > > > A user wouldn't know how to fill them in. And I worry that
> > even a
> > > > > > script
> > > > > > >
> > > > > > > But users should know it, because static-mem for guest must be
> > > > allocated
> > > > > > > in this range. And users take the responsibility to set the
> > DomU's
> > > > > > > static allocate memory ranges.
> > > > > >
> > > > > > Let me premise that my goal is to avoid having many users
> > reporting
> > > > > > errors to xen-devel and xen-users when actually it is just a wrong
> > > > > > choice of addresses.
> > > > > >
> > > > > > I think we need to make a distinction between addresses for the
> > boot
> > > > > > modules, e.g. addresses where to load xen, the dom0/U kernel,
> > dom0/U
> > > > > > ramdisk in memory at boot time, and VM static memory addresses.
> > > > > >
> > > > > > The boot modules addresses are particularly difficult to fill in
> > > > because
> > > > > > they are many and a small update in one of the modules could
> > > > invalidate
> > > > > > all the other addresses. This is why I ended up writing
> > ImageBuilder.
> > > > > > Since them, I received several emails from users thanking me for
> > > > > > ImageBuilder :-)
> > > > > >
> > > > >
> > > > > Thanks +999 😊
> > > > >
> > > > >
> > > > > > The static VM memory addresses (xen,static-mem) should be a bit
> > easier
> > > > > > to fill in correctly. They are meant to be chosen once, and it
> > > > shouldn't
> > > > > > happen that an update on a kernel forces the user to change all
> > the VM
> > > > > > static memory addresses. Also, I know that some users actually
> > want to
> > > > > > be able to choose the domU addresses by hand because they have
> > > > specific
> > > > > > needs. So it is good that we can let the user choose the addresses
> > if
> > > > > > they want to.
> > > > > >
> > > > >
> > > > > Yes.
> > > > >
> > > > > > With all of that said, I do think that many users won't have an
> > > > opinion
> > > > > > on the VM static memory addresses and won't know how to choose
> > them.
> > > > > > It would be error prone to let them try to fill them in by hand.
> > So I
> > > > > > was already planning on adding support to ImageBuilder to
> > > > automatically
> > > > > > generate xen,static-mem for dom0less domains.
> > > > > >
> > > > >
> > > > > Let me make sure that's what you said: Users give an VM memory size
> > to
> > > > > ImageBuilder, and ImageBuilder will generate xen,static-mem = <start,
> > > > size>.
> > > > > For specific VM, ImageBuilder also can accept start and size as
> > inputs?
> > > > >
> > > > > Do I understand this correctly?
> > > >
> > > > Yes, exactly
> > > >
> > > >
> > > > > > Going back to this specific discussion about boot-module-section:
> > I
> > > > can
> > > > > > see now that, given xen,static-mem is chosen by ImageBuilder (or
> > > > >
> > > > > By hand : )
> > > > >
> > > > > > similar) and not Xen, then it makes sense to have ImageBuilder (or
> > > > > > similar) also generate boot-module-section.
> > > > > >
> > > > >
> > > > > If my above understanding is right, then yes.
> > > >
> > > > Yes, I think we are on the same page
> > > >
> > > >
> > > > > > > > like ImageBuilder wouldn't be the best place to pick these
> > values
> > > > --
> > > > > > > > they seem too "important" to leave to a script.
> > > > > > > >
> > > > > > > > But it seems possible to choose the values in Xen:
> > > > > > > > - Xen knows ARM_MPU_NORMAL_MEMORY_* because it was defined at
> > > > build
> > > > > > time
> > > > > > > > - Xen reads boot-module-section from device tree
> > > > > > > >
> > > > > > > > It should be possible at this point for Xen to pick the best
> > > > values
> > > > > > for
> > > > > > > > guest-memory-section and static-mem based on the memory
> > available.
> > > > > > > >
> > > > > > >
> > > > > > > How Xen to pick? Does it mean in static allocation DomU DT node,
> > we
> > > > just
> > > > > > > need a size, but don't require a start address for static-mem?
> > > > > >
> > > > > > Yes the idea was that the user would only provide the size (e.g.
> > > > > > DOMU_STATIC_MEM[1]=1024) and the addresses would be automatically
> > > > > > calculated. But I didn't mean to change the existing xen,static-
> > mem
> > > > > > device tree bindings. So it is best if the xen,static-mem
> > addresses
> > > > > > generation is done by ImageBuilder (or similar tool) instead of
> > Xen.
> > > > > >
> > > > >
> > > > > If we still keep the option for user to specify the start and size
> > > > > parameters for VM memory, because it maybe very important for a
> > > > > deterministic system (fully static system), I agree with you.
> > > > >
> > > > > And in current static-allocation, I think Xen doesn't generate
> > > > > xen,static-mem addresses, all by hands...
> > > >
> > > > Yeah
> > > >
> > >
> > > I will update my proposal to cover our above discussion, but I forgot
> > one
> > > thing. As the platform header files will be generated from DTS, does it
> > > mean we have to maintain platform dts files in Xen like what Zephyr has
> > > done?
> > 
> > I would prefer not to have to maintain platform dts files in Xen like
> > Zephyr is doing. Ideally, the user should be able to take any
> > spec-compliant device tree file and use it. I would say: let's start
> > without adding the dts files to Xen (we might have one under docs/ but
> > just as an example.) We can add them later if the need arise.
> > 
> 
> But without any default dts, does that mean we can't start to build Xen for
> v8R? It seems in this case, we need the Makefile to print a message telling
> users to specify their own dts path.
 
Yes, exactly.



> > > And do you have some idea to integrate the "ImageBuilder"? Make it
> > > as a submodule of Xen or integrate to xen-tools?
> > 
> > I think it would be best if ImageBuilder was kept as a separate
> > repository because there should be no strong ties between ImageBuilder
> > versions and Xen versions. It is more convenient to handle it in a
> > separate repository, especially as Yocto and other build systems might
> > clone ImageBuilder during the build to generate boot.scr (it is already
> > the case).
> > 
> > That said, it might be good to make it more "official" but moving it to
> > Xen Project. I can talk to George about creating
> > http://xenbits.xen.org/git-http/imagebuilder.git or
> > https://gitlab.com/xen-project/imagebuilder.
> 
> That's good : )


* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  1:35               ` Wei Chen
@ 2022-03-03  9:15                 ` Julien Grall
  2022-03-03 10:43                   ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-03-03  9:15 UTC (permalink / raw)
  To: Wei Chen, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Wei,

On 03/03/2022 01:35, Wei Chen wrote:
>>> 1. Assembly code for EL1 MPU context_switch
>>
>> This discussion reminds me when KVM decided to rewrite their context
>> switch from assembly to C. The outcome was the compiler is able to do a
>> better job than us when it comes to optimizing.
>>
>> With a C version, we could also share the save/restore code with 32-bit
>> and it is easier to read/maintain.
>>
>> So I would suggest to run some numbers to check if it really worth
>> implementing the MPU save/restore in assembly.
>>
> 
It's interesting to hear that the KVM folks had a similar discussion. Yes,
if the gain from assembly code is not very obvious, then reusing the code
for 32-bit would be more important. As our current platform (FVP) cannot do
very precise performance measurement, I want to keep the current assembly
code for now; when we have a platform that can do such measurement, we can
start a thread to discuss it.

I briefly looked at the code: the assembly version is not going to be 
trivial to review, and we don't know yet whether it has an advantage. So 
I would say we should do the inverse here.

We want the C version first, until we can prove that the assembly version is 
better.

My gut feeling is we will not need the assembly version.

Cheers,

-- 
Julien Grall



* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  9:15                 ` Julien Grall
@ 2022-03-03 10:43                   ` Wei Chen
  0 siblings, 0 replies; 34+ messages in thread
From: Wei Chen @ 2022-03-03 10:43 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2022年3月3日 17:15
> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> On 03/03/2022 01:35, Wei Chen wrote:
> >>> 1. Assembly code for EL1 MPU context_switch
> >>
> >> This discussion reminds me when KVM decided to rewrite their context
> >> switch from assembly to C. The outcome was the compiler is able to do a
> >> better job than us when it comes to optimizing.
> >>
> >> With a C version, we could also share the save/restore code with 32-bit
> >> and it is easier to read/maintain.
> >>
> >> So I would suggest to run some numbers to check if it really worth
> >> implementing the MPU save/restore in assembly.
> >>
> >
> > It's interesting to hear KVM guys have similar discussion. Yes, if the
> > gains of assembly code is not very obvious, then reusing the code for
> 32-bit
> > would be more important. As our current platform (FVP) could not do very
> > precise performance measurement. I want to keep current assembly code
> there,
> > when we have a platform that can do such measurement we can have a
> thread
> > to discuss it.
> 
> I briefly looked at the code, the assembly version is not going to be
> trivial to review and we don't know yet whether it has an advantage. So
> I would say this should be the inverse here.
> 
> We want the C version first until we can prove the assembly version is
> better.
> 
> My gut feeling is we will not need the assembly version.
> 

OK, we will roll back to the C version. Once we have finished the measurements,
we will discuss it again (if the assembly version shows enough gain).

> Cheers,
> 
> --
> Julien Grall


* Re: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  2:06       ` Wei Chen
@ 2022-03-03 19:51         ` Julien Grall
  2022-03-04  5:38           ` Wei Chen
  2022-03-07  2:12         ` Wei Chen
  1 sibling, 1 reply; 34+ messages in thread
From: Julien Grall @ 2022-03-03 19:51 UTC (permalink / raw)
  To: Wei Chen, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Wei,

On 03/03/2022 02:06, Wei Chen wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2022年3月2日 20:00
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org; Stefano
>> Stabellini <sstabellini@kernel.org>
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
>> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>>
>>
>>
>> On 01/03/2022 07:51, Wei Chen wrote:
>>> Hi Julien,
>>
>> Hi Wei,
>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 2022年2月26日 4:55
>>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> Stefano
>>>> Stabellini <sstabellini@kernel.org>
>>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
>>>> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
>>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
>>>>> ### 1.2. Xen Challenges with PMSA Virtualization
>>>>> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to
>> run
>>>>> with an MPU and host multiple guest OSes.
>>>>>
>>>>> - No MMU at EL2:
>>>>>        - No EL2 Stage 1 address translation
>>>>>            - Xen provides fixed ARM64 virtual memory layout as basis of
>>>> EL2
>>>>>              stage 1 address translation, which is not applicable on
>> MPU
>>>> system,
>>>>>              where there is no virtual addressing. As a result, any
>>>> operation
>>>>>              involving transition from PA to VA, like ioremap, needs
>>>> modification
>>>>>              on MPU system.
>>>>>        - Xen's run-time addresses are the same as the link time
>> addresses.
>>>>>            - Enable PIC (position-independent code) on a real-time
>> target
>>>>>              processor probably very rare.
>>>>
>>>> Aside the assembly boot code and UEFI stub, Xen already runs at the
>> same
>>>> address as it was linked.
>>>>
>>>
>>> But the difference is that, base on MMU, we can use the same link
>> address
>>> for all platforms. But on MPU system, we can't do it in the same way.
>>
>> I agree that we currently use the same link address for all the
>> platforms. But this is also a problem when using MMU because EL2 has a
>> single TTBR.
>>
>> At the moment we are switching page-tables with the MMU which is not
>> safe. Instead we need to turn the MMU off, switch page-tables and
>> then turn on the MMU. This means we need to have an identity mapping of
>> Xen in the page-tables. Assuming Xen is not relocated, the identity
>> mapping may clash with Xen (or the rest of the virtual address map).
>>
> 
> Is this the same reason we create a dummy reloc section for EFI loader?

The relocations for the EFI loader are necessary because IIRC it is 
running with virt == phys.

But this brings all sorts of problems:

https://lore.kernel.org/all/20171221145521.29526-1-julien.grall@linaro.org/

[...]

>>>
>>> Some callers that want to change a memory's attribute will set them.
>> Something like
>>> ioremap_nocache. I am not sure is this what you had asked : )
>>
>> I am a bit confused. If ioremap_nocache() can change the attribute, then
>> why would ioremap_attr() not be able to do it?
>>
> 
> MMU-based ioremap_xxxx can use a new VA and new PTE to do this. But for
> MPU, we can't do it, unless you change the whole MPU region's attribute.
> The reasons are:
> 1. For V8R PMSA, one physical address can only exist in one MPU region.
> 2. There are not enough MPU regions for us to split one MPU region into
>     multiple MPU regions (changed pages region and unmodified pages regions).

Ok. I think we should at least check that the requested attributes match 
the ones in the MPU.

> 
>>>
>>>>
>>>>>                if ( CACHE_ATTR_need_change )
>>>>>                    return NULL;
>>>>>                return (void *)pa;
>>>>>            }
>>>>>            static inline void __iomem *ioremap_nocache(...)
>>>>>            {
>>>>>                return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
>>>>>            }
>>>>>            static inline void __iomem *ioremap_cache(...)
>>>>>            {
>>>>>                return ioremap_attr(start, len, PAGE_HYPERVISOR);
>>>>>            }
>>>>>            static inline void __iomem *ioremap_wc(...)
>>>>>            {
>>>>>                return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
>>>>>            }
>>>>>            void *ioremap(...)
>>>>>            {
>>>>>                return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>>>>>            }
>>>>>
>>>>>            ```
>>>>>        4. For `alternative`, it depends on `vmap` too.
>>>>
>>>> The only reason we depend on vmap() is because the map the sections
>>>> *text read-only and we enforce WnX. For VMSA, it would be possible to
>>>> avoid vmap() with some rework. I don't know for PMSA.
>>>>
>>>
>>> For PMSA, we still enforce WnX. For your use case, I assume it's
>> alternative.
>>> It still may have some possibility to avoid vmap(). But there may be
>> some
>>> security issues. We had thought to disable MPU -> update xen text ->
>> enable
>>> MPU to copy VMSA alternative's behavior. The problem with this, however,
>>> is that at some point, all memory is RWX. There maybe some security
>> risk. > But because it's in init stage, it probably doesn't matter as much
>> as
>> I thought.
>>
>> For boot code, we need to ensure the code is compliant to the Arm Arm.
>> Other than that, it is OK to have the memory RWX for a short period of
>> time.
>>
>> In fact, when we originally boot Xen, we don't enforce WnX. We will
>> start to enforce when initializing the memory. But there are no blocker
>> to delay it (other than writing the code :)).
> 
> Ah, ok, it seems we can still implement alternatives on an MPU system.
> I will update it in the new version of the proposal, but place it in TODO;
> I don't want to include it before single-CPU support is merged, because
> the current patch series is huge enough : )

That's fine with me. I am not expecting you to implement everything we 
discussed here from day 1! :)

Cheers,

-- 
Julien Grall



* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03 19:51         ` Julien Grall
@ 2022-03-04  5:38           ` Wei Chen
  0 siblings, 0 replies; 34+ messages in thread
From: Wei Chen @ 2022-03-04  5:38 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2022年3月4日 3:51
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org; Stefano
> Stabellini <sstabellini@kernel.org>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> On 03/03/2022 02:06, Wei Chen wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 2022年3月2日 20:00
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini <sstabellini@kernel.org>
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> >> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>
> >>
> >>
> >> On 01/03/2022 07:51, Wei Chen wrote:
> >>> Hi Julien,
> >>
> >> Hi Wei,
> >>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: 2022年2月26日 4:55
> >>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> Stefano
> >>>> Stabellini <sstabellini@kernel.org>
> >>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> >>>> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd
> <nd@arm.com>
> >>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>>>> ### 1.2. Xen Challenges with PMSA Virtualization
> >>>>> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to
> >> run
> >>>>> with an MPU and host multiple guest OSes.
> >>>>>
> >>>>> - No MMU at EL2:
> >>>>>        - No EL2 Stage 1 address translation
> >>>>>            - Xen provides fixed ARM64 virtual memory layout as basis
> of
> >>>> EL2
> >>>>>              stage 1 address translation, which is not applicable on
> >> MPU
> >>>> system,
> >>>>>              where there is no virtual addressing. As a result, any
> >>>> operation
> >>>>>              involving transition from PA to VA, like ioremap, needs
> >>>> modification
> >>>>>              on MPU system.
> >>>>>        - Xen's run-time addresses are the same as the link time
> >> addresses.
> >>>>>            - Enable PIC (position-independent code) on a real-time
> >> target
> >>>>>              processor probably very rare.
> >>>>
> >>>> Aside the assembly boot code and UEFI stub, Xen already runs at the
> >> same
> >>>> address as it was linked.
> >>>>
> >>>
> >>> But the difference is that, base on MMU, we can use the same link
> >> address
> >>> for all platforms. But on MPU system, we can't do it in the same way.
> >>
> >> I agree that we currently use the same link address for all the
> >> platforms. But this is also a problem when using MMU because EL2 has a
> >> single TTBR.
> >>
> >> At the moment we are switching page-tables with the MMU which is not
> >> safe. Instead we need to turn out the MMU off, switch page-tables and
> >> then turn on the MMU. This means we need to have an identity mapping of
> >> Xen in the page-tables. Assuming Xen is not relocated, the identity
> >> mapping may clash with Xen (or the rest of the virtual address map).
> >>
> >
> > Is this the same reason we create a dummy reloc section for EFI loader?
> 
> The relocations for the EFI loader are necessary because IIRC it is
> running with virt == phys.
> 
> But this brings to all sort of problem:
> 
> https://lore.kernel.org/all/20171221145521.29526-1-
> julien.grall@linaro.org/
> 

That's interesting; I will have a look at that thread.

> [...]
> 
> >>>
> >>> Some callers that want to change a memory's attribute will set them.
> >> Something like
> >>> ioremap_nocache. I am not sure is this what you had asked : )
> >>
> >> I am a bit confused. If ioremap_nocache() can change the attribute,
> then
> >> why would ioremap_attr() not be able to do it?
> >>
> >
> > MMU based iorepmap_xxxx can use a new VA and new PTE to do this. But for
> > MPU, we can't do it, except you change the whole MPU region's attribute.
> > The reasons are:
> > 1. For V8R PMSA, one physical address only be existed one MPU region.
> > 2. There's not enough MPU regions for us to split one MPU region to
> >     multiple MPU regions (changed pages region and unmodified pages
> regions).
> 
> Ok. I think we should at least check the attributes requested match the
> one in the MPU.
> 

Yes, this is what we want to do.

> >
> >>>
> >>>>
> >>>>>                if ( CACHE_ATTR_need_change )
> >>>>>                    return NULL;
> >>>>>                return (void *)pa;
> >>>>>            }
> >>>>>            static inline void __iomem *ioremap_nocache(...)
> >>>>>            {
> >>>>>                return ioremap_attr(start, len,
> PAGE_HYPERVISOR_NOCACHE);
> >>>>>            }
> >>>>>            static inline void __iomem *ioremap_cache(...)
> >>>>>            {
> >>>>>                return ioremap_attr(start, len, PAGE_HYPERVISOR);
> >>>>>            }
> >>>>>            static inline void __iomem *ioremap_wc(...)
> >>>>>            {
> >>>>>                return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> >>>>>            }
> >>>>>            void *ioremap(...)
> >>>>>            {
> >>>>>                return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> >>>>>            }
> >>>>>
> >>>>>            ```
> >>>>>        4. For `alternative`, it depends on `vmap` too.
> >>>>
> >>>> The only reason we depend on vmap() is because the map the sections
> >>>> *text read-only and we enforce WnX. For VMSA, it would be possible to
> >>>> avoid vmap() with some rework. I don't know for PMSA.
> >>>>
> >>>
> >>> For PMSA, we still enforce WnX. For your use case, I assume it's
> >> alternative.
> >>> It still may have some possibility to avoid vmap(). But there may be
> >> some
> >>> security issues. We had thought to disable MPU -> update xen text ->
> >> enable
> >>> MPU to copy VMSA alternative's behavior. The problem with this,
> however,
> >>> is that at some point, all memory is RWX. There maybe some security
> >> risk. > But because it's in init stage, it probably doesn't matter as
> much
> >> as
> >> I thought.
> >>
> >> For boot code, we need to ensure the code is compliant to the Arm Arm.
> >> Other than that, it is OK to have the memory RWX for a short period of
> >> time.
> >>
> >> In fact, when we originally boot Xen, we don't enforce WnX. We will
> >> start to enforce when initializing the memory. But there are no blocker
> >> to delay it (other than writing the code :)).
> >
> > Ah, ok, it seems we still can implement alternative on MPU system.
> > I will update it in new version proposal, but place it in TODO, I don't
> > want to include it before single CPU support be merged. Because current
> > patch series is huge enough : )
> 
> That's fine with me. I am not expecting you to implement everything we
> discussed here from day 1! :)
> 

Great! Thanks~

> Cheers,
> 
> --
> Julien Grall


* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-03  2:06       ` Wei Chen
  2022-03-03 19:51         ` Julien Grall
@ 2022-03-07  2:12         ` Wei Chen
  2022-03-07 22:58           ` Stefano Stabellini
  1 sibling, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-07  2:12 UTC (permalink / raw)
  To: Wei Chen, Julien Grall, xen-devel, Stefano Stabellini
  Cc: Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Stefano,

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Wei
> Chen
> Sent: 2022年3月3日 10:07
> To: Julien Grall <julien@xen.org>; xen-devel@lists.xenproject.org; Stefano
> Stabellini <sstabellini@kernel.org>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Julien,
> 
> > -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: 2022年3月2日 20:00
> > To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org; Stefano
> > Stabellini <sstabellini@kernel.org>
> > Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> > <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd <nd@arm.com>
> > Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >
> >
> >
> > On 01/03/2022 07:51, Wei Chen wrote:
> > > Hi Julien,
> >
> > Hi Wei,
> >
> > >> -----Original Message-----
> > >> From: Julien Grall <julien@xen.org>
> > >> Sent: 2022年2月26日 4:55
> > >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> > Stefano
> > >> Stabellini <sstabellini@kernel.org>
> > >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> > >> <Penny.Zheng@arm.com>; Henry Wang <Henry.Wang@arm.com>; nd
> <nd@arm.com>
> > >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> > >>> ### 1.2. Xen Challenges with PMSA Virtualization
> > >>> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to
> > run
> > >>> with an MPU and host multiple guest OSes.
> > >>>
> > >>> - No MMU at EL2:
> > >>>       - No EL2 Stage 1 address translation
> > >>>           - Xen provides fixed ARM64 virtual memory layout as basis
> of
> > >> EL2
> > >>>             stage 1 address translation, which is not applicable on
> > MPU
> > >> system,
> > >>>             where there is no virtual addressing. As a result, any
> > >> operation
> > >>>             involving transition from PA to VA, like ioremap, needs
> > >> modification
> > >>>             on MPU system.
> > >>>       - Xen's run-time addresses are the same as the link time
> > addresses.
> > >>>           - Enable PIC (position-independent code) on a real-time
> > target
> > >>>             processor probably very rare.
> > >>
> > >> Aside the assembly boot code and UEFI stub, Xen already runs at the
> > same
> > >> address as it was linked.
> > >>
> > >
> > > But the difference is that, base on MMU, we can use the same link
> > address
> > > for all platforms. But on MPU system, we can't do it in the same way.
> >
> > I agree that we currently use the same link address for all the
> > platforms. But this is also a problem when using MMU because EL2 has a
> > single TTBR.
> >
> > At the moment we are switching page-tables with the MMU which is not
> > safe. Instead we need to turn out the MMU off, switch page-tables and
> > then turn on the MMU. This means we need to have an identity mapping of
> > Xen in the page-tables. Assuming Xen is not relocated, the identity
> > mapping may clash with Xen (or the rest of the virtual address map).
> >
> 
> Is this the same reason we create a dummy reloc section for EFI loader?
> 
> > My initial idea was to enable PIC and update the relocation at boot
> > time. But this is a bit cumbersome to do. So now I am looking to have a
> > semi-dynamic virtual layout and find some place to relocate part of Xen
> > to use for CPU bring-up.
> >
> > Anyway, my point is we possibly could look at PIC if that could allow
> > generic Xen image.
> >
> 
> I understand your concern. IMO, PIC is possible to do this, but obviously,
> it's not a small amount of work. And I want to hear some suggestions from
> Stefano, because he also has some solutions in previous thread.
>

Can you have a look at the PIC discussion between Julien and me?
I think we could use some input from your side.

Thanks,
Wei Chen

[...]


* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-07  2:12         ` Wei Chen
@ 2022-03-07 22:58           ` Stefano Stabellini
  2022-03-08  7:28             ` Wei Chen
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2022-03-07 22:58 UTC (permalink / raw)
  To: Wei Chen
  Cc: Julien Grall, xen-devel, Stefano Stabellini, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd

On Mon, 7 Mar 2022, Wei Chen wrote:
> > > On 01/03/2022 07:51, Wei Chen wrote:
> > > >>> ### 1.2. Xen Challenges with PMSA Virtualization
> > > >>> Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to
> > > run
> > > >>> with an MPU and host multiple guest OSes.
> > > >>>
> > > >>> - No MMU at EL2:
> > > >>>       - No EL2 Stage 1 address translation
> > > >>>           - Xen provides fixed ARM64 virtual memory layout as basis
> > of
> > > >> EL2
> > > >>>             stage 1 address translation, which is not applicable on
> > > MPU
> > > >> system,
> > > >>>             where there is no virtual addressing. As a result, any
> > > >> operation
> > > >>>             involving transition from PA to VA, like ioremap, needs
> > > >> modification
> > > >>>             on MPU system.
> > > >>>       - Xen's run-time addresses are the same as the link time
> > > addresses.
> > > >>>           - Enable PIC (position-independent code) on a real-time
> > > target
> > > >>>             processor probably very rare.
> > > >>
> > > >> Aside the assembly boot code and UEFI stub, Xen already runs at the
> > > same
> > > >> address as it was linked.
> > > >>
> > > >
> > > > But the difference is that, base on MMU, we can use the same link
> > > address
> > > > for all platforms. But on MPU system, we can't do it in the same way.
> > >
> > > I agree that we currently use the same link address for all the
> > > platforms. But this is also a problem when using MMU because EL2 has a
> > > single TTBR.
> > >
> > > At the moment we are switching page-tables with the MMU which is not
> > > safe. Instead we need to turn out the MMU off, switch page-tables and
> > > then turn on the MMU. This means we need to have an identity mapping of
> > > Xen in the page-tables. Assuming Xen is not relocated, the identity
> > > mapping may clash with Xen (or the rest of the virtual address map).
> > >
> > 
> > Is this the same reason we create a dummy reloc section for EFI loader?
> > 
> > > My initial idea was to enable PIC and update the relocation at boot
> > > time. But this is a bit cumbersome to do. So now I am looking to have a
> > > semi-dynamic virtual layout and find some place to relocate part of Xen
> > > to use for CPU bring-up.
> > >
> > > Anyway, my point is we possibly could look at PIC if that could allow
> > > generic Xen image.
> > >
> > 
> > I understand your concern. IMO, PIC is possible to do this, but obviously,
> > it's not a small amount of work. And I want to hear some suggestions from
> > Stefano, because he also has some solutions in previous thread.
> >
> 
> Can you have a look at the PIC discussion between Julien and me?
> I think we may need some inputs from your view.

If we have to have a build-time device tree anyway, we could
automatically generate the link address, together with other required
addresses. There would be little benefit to doing PIC if we have to have a
build-time device tree in any case.

On the other hand, if we could get rid of the build-time device tree
altogether, then yes doing PIC provides some benefits. It would allow us
to have single Xen binary working on multiple Cortex-R boards. However,
I don't think that would be important from a user perspective. People
will not install Ubuntu on a Cortex-R and apt-get xen.  The target is
embedded: users will know from the start the board they will target, so
it would not be a problem for them to build Xen for a specific board.
ImageBuilder (or something like it) will still be required to generate
boot scripts and boot info. In other words, although it would be
convenient to produce a generic binary, it is not a must-have feature
and I would consider it low-priority compared to others.



* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-07 22:58           ` Stefano Stabellini
@ 2022-03-08  7:28             ` Wei Chen
  2022-03-08 19:49               ` Stefano Stabellini
  0 siblings, 1 reply; 34+ messages in thread
From: Wei Chen @ 2022-03-08  7:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, xen-devel, Bertrand Marquis, Penny Zheng, Henry Wang, nd

Hi Stefano,

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Stefano Stabellini
> Sent: 2022年3月8日 6:58
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Julien Grall <julien@xen.org>; xen-devel@lists.xenproject.org; Stefano
> Stabellini <sstabellini@kernel.org>; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Henry Wang
> <Henry.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> On Mon, 7 Mar 2022, Wei Chen wrote:
> > > > On 01/03/2022 07:51, Wei Chen wrote:
> > > > >>> ### 1.2. Xen Challenges with PMSA Virtualization
> > > > >>> Xen is PMSA unaware Type-1 Hypervisor, it will need
> modifications to
> > > > run
> > > > >>> with an MPU and host multiple guest OSes.
> > > > >>>
> > > > >>> - No MMU at EL2:
> > > > >>>       - No EL2 Stage 1 address translation
> > > > >>>           - Xen provides fixed ARM64 virtual memory layout as
> basis
> > > of
> > > > >> EL2
> > > > >>>             stage 1 address translation, which is not applicable
> on
> > > > MPU
> > > > >> system,
> > > > >>>             where there is no virtual addressing. As a result,
> any
> > > > >> operation
> > > > >>>             involving transition from PA to VA, like ioremap,
> needs
> > > > >> modification
> > > > >>>             on MPU system.
> > > > >>>       - Xen's run-time addresses are the same as the link time
> > > > addresses.
> > > > >>>           - Enable PIC (position-independent code) on a real-
> time
> > > > target
> > > > >>>             processor probably very rare.
> > > > >>
> > > > >> Aside the assembly boot code and UEFI stub, Xen already runs at
> the
> > > > same
> > > > >> address as it was linked.
> > > > >>
> > > > >
> > > > > But the difference is that, base on MMU, we can use the same link
> > > > address
> > > > > for all platforms. But on MPU system, we can't do it in the same
> way.
> > > >
> > > > I agree that we currently use the same link address for all the
> > > > platforms. But this is also a problem when using MMU because EL2 has
> a
> > > > single TTBR.
> > > >
> > > > At the moment we are switching page-tables with the MMU which is not
> > > > safe. Instead we need to turn out the MMU off, switch page-tables
> and
> > > > then turn on the MMU. This means we need to have an identity mapping
> of
> > > > Xen in the page-tables. Assuming Xen is not relocated, the identity
> > > > mapping may clash with Xen (or the rest of the virtual address map).
> > > >
> > >
> > > Is this the same reason we create a dummy reloc section for EFI loader?
> > >
> > > > My initial idea was to enable PIC and update the relocation at boot
> > > > time. But this is a bit cumbersome to do. So now I am looking to
> have a
> > > > semi-dynamic virtual layout and find some place to relocate part of
> Xen
> > > > to use for CPU bring-up.
> > > >
> > > > Anyway, my point is we possibly could look at PIC if that could
> allow
> > > > generic Xen image.
> > > >
> > >
> > > I understand your concern. IMO, PIC is possible to do this, but
> obviously,
> > > it's not a small amount of work. And I want to hear some suggestions
> from
> > > Stefano, because he also has some solutions in previous thread.
> > >
> >
> > Can you have a look at the PIC discussion between Julien and me?
> > I think we may need some inputs from your view.
> 
> If we have to have a build-time device tree anyway, we could
> automatically generate the link address, together with other required
> addresses. There would be little benefit to doing PIC if we have to have a
> build-time device tree in any case.
> 
> On the other hand, if we could get rid of the build-time device tree
> altogether, then yes doing PIC provides some benefits. It would allow us
> to have single Xen binary working on multiple Cortex-R boards. However,
> I don't think that would be important from a user perspective. People
> will not install Ubuntu on a Cortex-R and apt-get xen.  The target is
> embedded: users will know from the start the board they will target, so
> it would not be a problem for them to build Xen for a specific board.
> ImageBuilder (or something like it) will still be required to generate
> boot scripts and boot info. In other words, although it would be
> convenient to produce a generic binary, it is not a must-have feature
> and I would consider it low-priority compared to others.

I tend to agree with your opinion. We can get some benefit from PIC,
but the priority may be low. We have encountered a problem when trying
to use the EFI loader to boot xen.efi on V8R: due to the lack of relocation
capability, the EFI loader could not launch xen.efi on V8R. But Xen EFI
boot capability is a requirement of Arm EBBR [1]. So in order to support Xen
EFI boot on V8R, we may still need partial PIC support: only some boot code
would be position-independent, to make EFI relocation happy. This boot code
would let Xen check its load address and relocate the Xen image to Xen's
run-time address if needed.

How about we place PIC support on the TODO list for further discussion?
I don't think we can include so many items in day 1 : )

[1] https://arm-software.github.io/ebbr/index.html

Cheers,
Wei Chen




* RE: Proposal for Porting Xen to Armv8-R64 - DraftA
  2022-03-08  7:28             ` Wei Chen
@ 2022-03-08 19:49               ` Stefano Stabellini
  0 siblings, 0 replies; 34+ messages in thread
From: Stefano Stabellini @ 2022-03-08 19:49 UTC (permalink / raw)
  To: Wei Chen
  Cc: Stefano Stabellini, Julien Grall, xen-devel, Bertrand Marquis,
	Penny Zheng, Henry Wang, nd

On Tue, 8 Mar 2022, Wei Chen wrote:
> > On Mon, 7 Mar 2022, Wei Chen wrote:
> > > > > > On 01/03/2022 07:51, Wei Chen wrote:
> > > > > >>> ### 1.2. Xen Challenges with PMSA Virtualization
> > > > > >>> Xen is PMSA unaware Type-1 Hypervisor, it will need
> > > > > >>> modifications to run with an MPU and host multiple guest OSes.
> > > > > >>>
> > > > > >>> - No MMU at EL2:
> > > > > >>>       - No EL2 Stage 1 address translation
> > > > > >>>           - Xen provides fixed ARM64 virtual memory layout as
> > > > > >>>             basis of EL2 stage 1 address translation, which is
> > > > > >>>             not applicable on MPU system, where there is no
> > > > > >>>             virtual addressing. As a result, any operation
> > > > > >>>             involving transition from PA to VA, like ioremap,
> > > > > >>>             needs modification on MPU system.
> > > > > >>>       - Xen's run-time addresses are the same as the link
> > > > > >>>         time addresses.
> > > > > >>>           - Enable PIC (position-independent code) on a
> > > > > >>>             real-time target processor probably very rare.
> > > > > >>
> > > > > >> Aside the assembly boot code and UEFI stub, Xen already runs
> > > > > >> at the same address as it was linked.
> > > > > >>
> > > > > >
> > > > > > But the difference is that, base on MMU, we can use the same
> > > > > > link address for all platforms. But on MPU system, we can't do
> > > > > > it in the same way.
> > > > >
> > > > > I agree that we currently use the same link address for all the
> > > > > platforms. But this is also a problem when using MMU because EL2
> > > > > has a single TTBR.
> > > > >
> > > > > At the moment we are switching page-tables with the MMU which is
> > > > > not safe. Instead we need to turn the MMU off, switch
> > > > > page-tables and then turn on the MMU. This means we need to have
> > > > > an identity mapping of Xen in the page-tables. Assuming Xen is
> > > > > not relocated, the identity mapping may clash with Xen (or the
> > > > > rest of the virtual address map).
> > > > >
> > > >
> > > > Is this the same reason we create a dummy reloc section for EFI loader?
> > > >
> > > > > My initial idea was to enable PIC and update the relocation at
> > > > > boot time. But this is a bit cumbersome to do. So now I am
> > > > > looking to have a semi-dynamic virtual layout and find some
> > > > > place to relocate part of Xen to use for CPU bring-up.
> > > > >
> > > > > Anyway, my point is we possibly could look at PIC if that could
> > > > > allow generic Xen image.
> > > > >
> > > >
> > > > I understand your concern. IMO, PIC is possible to do this, but
> > > > obviously, it's not a small amount of work. And I want to hear
> > > > some suggestions from Stefano, because he also has some solutions
> > > > in previous thread.
> > > >
> > >
> > > Can you have a look at the PIC discussion between Julien and me?
> > > I think we may need some inputs from your view.
> > 
> > If we have to have a build-time device tree anyway, we could
> > automatically generate the link address, together with other required
> > addresses. There would be little benefit to doing PIC if we have to have a
> > build-time device tree in any case.
> > 
> > On the other hand, if we could get rid of the build-time device tree
> > altogether, then yes doing PIC provides some benefits. It would allow us
> > to have single Xen binary working on multiple Cortex-R boards. However,
> > I don't think that would be important from a user perspective. People
> > will not install Ubuntu on a Cortex-R and apt-get xen.  The target is
> > embedded: users will know from the start the board they will target, so
> > it would not be a problem for them to build Xen for a specific board.
> > ImageBuilder (or something like it) will still be required to generate
> > boot scripts and boot info. In other words, although it would be
> > convenient to produce a generic binary, it is not a must-have feature
> > and I would consider it low-priority compared to others.
> 
> I tend to agree with your opinion. We can get some benefit from PIC,
> but the priority may be low. We have encountered a problem when trying
> to use the EFI loader to boot xen.efi on V8R. Due to the lack of
> relocation capability, the EFI loader could not launch xen.efi on V8R.
> But Xen EFI boot capability is a requirement of Arm EBBR [1]. In order
> to support Xen EFI boot on V8R, maybe we still need partial PIC
> support: only some boot code would be position independent, to make EFI
> relocation happy. This boot code will help Xen check its loaded address
> and relocate the Xen image to Xen's run-time address if needed.
> 
> How about we place PIC support on the TODO list for further discussion?
> I don't think we can include so many items in day 1 : )
> 
> [1] https://arm-software.github.io/ebbr/index.html

Sounds good to me :-)



end of thread, other threads:[~2022-03-08 19:49 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
2022-02-24  6:01 Proposal for Porting Xen to Armv8-R64 - DraftA Wei Chen
2022-02-24 11:52 ` Ayan Kumar Halder
2022-02-25  6:33   ` Wei Chen
2022-02-25  0:55 ` Stefano Stabellini
2022-02-25 10:48   ` Wei Chen
2022-02-25 20:12     ` Julien Grall
2022-03-01  6:29       ` Wei Chen
2022-03-01 13:17         ` Julien Grall
2022-03-02  6:43           ` Wei Chen
2022-03-02 10:24             ` Julien Grall
2022-03-03  1:35               ` Wei Chen
2022-03-03  9:15                 ` Julien Grall
2022-03-03 10:43                   ` Wei Chen
2022-02-25 23:54     ` Stefano Stabellini
2022-03-01 12:55       ` Wei Chen
2022-03-01 23:38         ` Stefano Stabellini
2022-03-02  7:13           ` Wei Chen
2022-03-02 22:55             ` Stefano Stabellini
2022-03-03  1:05               ` Wei Chen
2022-03-03  2:03                 ` Stefano Stabellini
2022-03-03  2:12                   ` Wei Chen
2022-03-03  2:15                     ` Stefano Stabellini
2022-02-25 20:55 ` Julien Grall
2022-03-01  7:51   ` Wei Chen
2022-03-02  7:21     ` Penny Zheng
2022-03-02 12:06       ` Julien Grall
2022-03-02 12:00     ` Julien Grall
2022-03-03  2:06       ` Wei Chen
2022-03-03 19:51         ` Julien Grall
2022-03-04  5:38           ` Wei Chen
2022-03-07  2:12         ` Wei Chen
2022-03-07 22:58           ` Stefano Stabellini
2022-03-08  7:28             ` Wei Chen
2022-03-08 19:49               ` Stefano Stabellini
