xen-devel.lists.xenproject.org archive mirror
* [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs
@ 2024-04-09  4:53 Henry Wang
  2024-04-09  4:53 ` [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map Henry Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-09  4:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Anthony PERARD, Juergen Gross, Andrew Cooper,
	George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct-mapped domains. This cannot hold for the magic pages that
are allocated later for 1:1 Dom0less DomUs by the init-dom0less
helper application running in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error, "failed
to retrieve a reserved page", can be seen because the reserved memory
list is already empty at that point.
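
For reference, below is a condensed sketch of the two allocation paths in
populate_physmap() (xen/common/memory.c) that trigger the errors above;
this is illustrative only, the real code is shown in the hunks of patch 4:
```
    if ( is_domain_direct_mapped(d) )
    {
        /*
         * gfn == mfn is assumed, so the requested gpfn must already be
         * a frame belonging to the domain. Magic pages allocated later
         * from Dom0 are not, hence "mfn 0x39000 doesn't belong to d1".
         */
        mfn = _mfn(gpfn);
    }
    else if ( is_domain_using_staticmem(d) )
    {
        /*
         * Pages must come from the domain's reserved page list, which
         * is already empty when init-dom0less runs, hence
         * "failed to retrieve a reserved page".
         */
    }
```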

This series tries to fix this issue using a DOMCTL-based approach,
because for 1:1 direct-mapped domUs we need to avoid the RAM regions
and inform the toolstack about the region found by the hypervisor for
mapping the magic pages. Patch 1 introduces a new DOMCTL to get the
guest memory map, currently only used for the magic page regions.
Patch 2 generalizes the extended region finding logic so that it can
be reused for other use cases such as finding 1:1 domU magic regions.
Patch 3 uses the same approach as finding the extended regions to find
the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
hardcoding the base addresses of the guest magic regions in the
init-dom0less application by consuming the newly introduced DOMCTL.
Patch 5 is a simple patch that cleans up some code duplication in xc.

Gitlab pipeline for this series:
https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1245192195

Henry Wang (5):
  xen/domctl, tools: Introduce a new domctl to get guest memory map
  xen/arm: Generalize the extended region finding logic
  xen/arm: Find unallocated spaces for magic pages of direct-mapped domU
  xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  tools/libs/ctrl: Simplify xc helpers related to populate_physmap()

 tools/helpers/init-dom0less.c            |  35 ++--
 tools/include/xenctrl.h                  |  11 ++
 tools/libs/ctrl/xc_domain.c              | 204 +++++++++++++----------
 xen/arch/arm/dom0less-build.c            |  11 ++
 xen/arch/arm/domain.c                    |  14 ++
 xen/arch/arm/domain_build.c              | 131 ++++++++++-----
 xen/arch/arm/domctl.c                    |  28 +++-
 xen/arch/arm/include/asm/domain.h        |   8 +
 xen/arch/arm/include/asm/domain_build.h  |   4 +
 xen/arch/arm/include/asm/setup.h         |   5 +
 xen/arch/arm/include/asm/static-memory.h |   7 +
 xen/arch/arm/static-memory.c             |  39 +++++
 xen/common/memory.c                      |  30 +++-
 xen/include/public/arch-arm.h            |   7 +
 xen/include/public/domctl.h              |  29 ++++
 xen/include/public/memory.h              |   3 +-
 16 files changed, 415 insertions(+), 151 deletions(-)

-- 
2.34.1




* [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map
  2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
@ 2024-04-09  4:53 ` Henry Wang
  2024-04-18 12:37   ` Jan Beulich
  2024-04-09  4:53 ` [PATCH v4 2/5] xen/arm: Generalize the extended region finding logic Henry Wang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Henry Wang @ 2024-04-09  4:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Anthony PERARD, Juergen Gross, Andrew Cooper,
	George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Alec Kwapis

There are some use cases where the toolstack needs to know the guest
memory map. For example, the toolstack helper application
"init-dom0less" needs to know the guest magic page regions for 1:1
direct-mapped dom0less DomUs to allocate magic pages.

To address such needs, add the XEN_DOMCTL_get_mem_map hypercall and
related data structures to query the hypervisor for the guest memory
map. The guest memory map is recorded in the domain structure;
currently only the guest magic page region is recorded in it. The
guest magic page region is initialized at domain creation time
according to the layout in the public header, and it is updated for
1:1 dom0less DomUs (see a following commit) to avoid conflicts
with RAM.

Take the opportunity to drop an unnecessary empty line to keep the
coding style consistent in the file.
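
As an illustration (not part of this patch), a toolstack consumer could
query the map roughly as below, using the xc_get_domain_mem_map() wrapper
added here; print_magic_region() is a hypothetical helper name:
```
#include <stdio.h>
#include <inttypes.h>
#include <xenctrl.h>

static int print_magic_region(xc_interface *xch, uint32_t domid)
{
    struct xen_mem_region regions[XEN_MAX_MEM_REGIONS] = {0};
    uint32_t i, nr = XEN_MAX_MEM_REGIONS;
    int rc = xc_get_domain_mem_map(xch, domid, regions, &nr);

    if ( rc )
        return rc;

    for ( i = 0; i < nr; i++ )
        if ( regions[i].type == XEN_MEM_REGION_MAGIC )
            printf("magic region: start 0x%"PRIx64" size 0x%"PRIx64"\n",
                   regions[i].start, regions[i].size);

    return 0;
}
```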

Reported-by: Alec Kwapis <alec.kwapis@medtronic.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
---
RFC: The newly introduced "struct xen_domctl_mem_map" largely
duplicates "struct xen_memory_map"; any comments on reusing
"struct xen_memory_map" for simplicity?
v4:
- Drop the unnecessary initialization and printk.
- Use XEN_* prefix instead of GUEST_* for domctl.
- Move the mem region type to mem region structure.
- Drop the check of Xen internal state in the domctl.
- Handle nr_regions properly (fill only as much of the array as there
  is space for, but return the full count to the caller) so the caller
  can tell when it passed too small an array.
v3:
- Init the return rc for XEN_DOMCTL_get_mem_map.
- Copy the nr_mem_regions back as it should be both IN & OUT.
- Check if mem_map->nr_mem_regions exceeds the XEN_MAX_MEM_REGIONS
  when adding a new entry.
- Allow XEN_MAX_MEM_REGIONS to be different between different archs.
- Add explicit padding and check to the domctl structures.
v2:
- New patch
---
 tools/include/xenctrl.h           |  4 ++++
 tools/libs/ctrl/xc_domain.c       | 37 +++++++++++++++++++++++++++++++
 xen/arch/arm/domain.c             | 14 ++++++++++++
 xen/arch/arm/domctl.c             | 28 ++++++++++++++++++++++-
 xen/arch/arm/include/asm/domain.h |  8 +++++++
 xen/include/public/arch-arm.h     |  7 ++++++
 xen/include/public/domctl.h       | 29 ++++++++++++++++++++++++
 7 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e054..b25e9772a2 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1195,6 +1195,10 @@ int xc_domain_setmaxmem(xc_interface *xch,
                         uint32_t domid,
                         uint64_t max_memkb);
 
+int xc_get_domain_mem_map(xc_interface *xch, uint32_t domid,
+                          struct xen_mem_region mem_regions[],
+                          uint32_t *nr_regions);
+
 int xc_domain_set_memmap_limit(xc_interface *xch,
                                uint32_t domid,
                                unsigned long map_limitkb);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d..4dba55d01d 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -697,6 +697,43 @@ int xc_domain_setmaxmem(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
+int xc_get_domain_mem_map(xc_interface *xch, uint32_t domid,
+                          struct xen_mem_region mem_regions[],
+                          uint32_t *nr_regions)
+{
+    int rc;
+    uint32_t nr = *nr_regions;
+    struct xen_domctl domctl = {
+        .cmd         = XEN_DOMCTL_get_mem_map,
+        .domain      = domid,
+        .u.mem_map = {
+            .nr_mem_regions = nr,
+        },
+    };
+
+    DECLARE_HYPERCALL_BOUNCE(mem_regions, sizeof(xen_mem_region_t) * nr,
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( !mem_regions || xc_hypercall_bounce_pre(xch, mem_regions) || nr < 1 )
+        return -1;
+
+    set_xen_guest_handle(domctl.u.mem_map.buffer, mem_regions);
+
+    rc = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, mem_regions);
+
+    if ( nr < domctl.u.mem_map.nr_mem_regions )
+    {
+        PERROR("Too small nr_regions %u", nr);
+        return -1;
+    }
+
+    *nr_regions = domctl.u.mem_map.nr_mem_regions;
+
+    return rc;
+}
+
 #if defined(__i386__) || defined(__x86_64__)
 int xc_domain_set_memory_map(xc_interface *xch,
                                uint32_t domid,
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 34cbfe699a..0c9761b65b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -697,6 +697,7 @@ int arch_domain_create(struct domain *d,
 {
     unsigned int count = 0;
     int rc;
+    struct mem_map_domain *mem_map = &d->arch.mem_map;
 
     BUILD_BUG_ON(GUEST_MAX_VCPUS < MAX_VIRT_CPUS);
 
@@ -786,6 +787,19 @@ int arch_domain_create(struct domain *d,
     d->arch.sve_vl = config->arch.sve_vl;
 #endif
 
+    if ( mem_map->nr_mem_regions < XEN_MAX_MEM_REGIONS )
+    {
+        mem_map->regions[mem_map->nr_mem_regions].start = GUEST_MAGIC_BASE;
+        mem_map->regions[mem_map->nr_mem_regions].size = GUEST_MAGIC_SIZE;
+        mem_map->regions[mem_map->nr_mem_regions].type = XEN_MEM_REGION_MAGIC;
+        mem_map->nr_mem_regions++;
+    }
+    else
+    {
+        rc = -ENOSPC;
+        goto fail;
+    }
+
     return 0;
 
 fail:
diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..8f62719cfa 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -148,7 +148,6 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
 
         return 0;
     }
-
     case XEN_DOMCTL_vuart_op:
     {
         int rc;
@@ -176,6 +175,33 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
 
         return rc;
     }
+    case XEN_DOMCTL_get_mem_map:
+    {
+        int rc = 0;
+        uint32_t nr_regions;
+
+        if ( domctl->u.mem_map.pad )
+            return -EINVAL;
+
+        /*
+         * Fill the buffer only as much of the array as there is space for,
+         * but always return the full count in the hypervisor to the caller.
+         * This way we can avoid overflowing the buffer and also make sure
+         * the caller can know if it specifies too small an array.
+         */
+        nr_regions = min(d->arch.mem_map.nr_mem_regions,
+                         domctl->u.mem_map.nr_mem_regions);
+
+        domctl->u.mem_map.nr_mem_regions = d->arch.mem_map.nr_mem_regions;
+
+        if ( copy_to_guest(domctl->u.mem_map.buffer,
+                           d->arch.mem_map.regions, nr_regions) ||
+             __copy_field_to_guest(u_domctl, domctl,
+                                   u.mem_map.nr_mem_regions) )
+            rc = -EFAULT;
+
+        return rc;
+    }
     default:
         return subarch_do_domctl(domctl, d, u_domctl);
     }
diff --git a/xen/arch/arm/include/asm/domain.h b/xen/arch/arm/include/asm/domain.h
index f1d72c6e48..a559a9e499 100644
--- a/xen/arch/arm/include/asm/domain.h
+++ b/xen/arch/arm/include/asm/domain.h
@@ -10,6 +10,7 @@
 #include <asm/gic.h>
 #include <asm/vgic.h>
 #include <asm/vpl011.h>
+#include <public/domctl.h>
 #include <public/hvm/params.h>
 
 struct hvm_domain
@@ -59,6 +60,11 @@ struct paging_domain {
     unsigned long p2m_total_pages;
 };
 
+struct mem_map_domain {
+    unsigned int nr_mem_regions;
+    struct xen_mem_region regions[XEN_MAX_MEM_REGIONS];
+};
+
 struct arch_domain
 {
 #ifdef CONFIG_ARM_64
@@ -77,6 +83,8 @@ struct arch_domain
 
     struct paging_domain paging;
 
+    struct mem_map_domain mem_map;
+
     struct vmmio vmmio;
 
     /* Continuable domain_relinquish_resources(). */
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index e167e14f8d..eba61e1ac6 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -223,6 +223,13 @@ typedef uint64_t xen_pfn_t;
  */
 #define XEN_LEGACY_MAX_VCPUS 1
 
+/*
+ * Maximum number of memory map regions for guest memory layout.
+ * Used by XEN_DOMCTL_get_mem_map, currently there is only one region
+ * for the guest magic pages.
+ */
+#define XEN_MAX_MEM_REGIONS 1
+
 typedef uint64_t xen_ulong_t;
 #define PRI_xen_ulong PRIx64
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index a33f9ec32b..974c07ee61 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -946,6 +946,33 @@ struct xen_domctl_paging_mempool {
     uint64_aligned_t size; /* Size in bytes. */
 };
 
+#ifndef XEN_MAX_MEM_REGIONS
+#define XEN_MAX_MEM_REGIONS 1
+#endif
+
+struct xen_mem_region {
+    uint64_aligned_t start;
+    uint64_aligned_t size;
+#define XEN_MEM_REGION_DEFAULT    0
+#define XEN_MEM_REGION_MAGIC      1
+    uint32_t         type;
+    /* Must be zero */
+    uint32_t         pad;
+};
+typedef struct xen_mem_region xen_mem_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_mem_region_t);
+
+struct xen_domctl_mem_map {
+    /* IN & OUT */
+    uint32_t         nr_mem_regions;
+    /* Must be zero */
+    uint32_t         pad;
+    /* OUT */
+    XEN_GUEST_HANDLE_64(xen_mem_region_t) buffer;
+};
+typedef struct xen_domctl_mem_map xen_domctl_mem_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_mem_map_t);
+
 #if defined(__i386__) || defined(__x86_64__)
 struct xen_domctl_vcpu_msr {
     uint32_t         index;
@@ -1277,6 +1304,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_vmtrace_op                    84
 #define XEN_DOMCTL_get_paging_mempool_size       85
 #define XEN_DOMCTL_set_paging_mempool_size       86
+#define XEN_DOMCTL_get_mem_map                   87
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1339,6 +1367,7 @@ struct xen_domctl {
         struct xen_domctl_vuart_op          vuart_op;
         struct xen_domctl_vmtrace_op        vmtrace_op;
         struct xen_domctl_paging_mempool    paging_mempool;
+        struct xen_domctl_mem_map           mem_map;
         uint8_t                             pad[128];
     } u;
 };
-- 
2.34.1




* [PATCH v4 2/5] xen/arm: Generalize the extended region finding logic
  2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
  2024-04-09  4:53 ` [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map Henry Wang
@ 2024-04-09  4:53 ` Henry Wang
  2024-04-09  4:53 ` [PATCH v4 3/5] xen/arm: Find unallocated spaces for magic pages of direct-mapped domU Henry Wang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-09  4:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

Recently there have been needs to find unallocated/unused memory
regions for different use cases at boot time, and the logic for
finding extended regions can be reused for some of these new use
cases.

This commit therefore generalizes the extended region finding logic.
Firstly, extract the extended region finding logic into a dedicated
helper find_unused_regions(). Then add and pass down a
`min_region_size` parameter so that the helpers can take a flexible
region size instead of the 64MB hardcoded for extended regions.
Finally, adjust the variable and function names to make them general
instead of specific to extended regions.

Take the opportunity to update the stale in-code comments of
find_unallocated_memory() and find_memory_holes().
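
For illustration, the resulting call pattern looks as follows (names are
taken from the diff below; the GUEST_MAGIC_SIZE caller only appears in a
later patch of this series):
```
    /* Extended regions keep the existing 64MB minimum ... */
    res = find_unused_regions(d, kinfo, ext_regions, MB(64));

    /*
     * ... while other callers can now request a different minimum size,
     * e.g. a GUEST_MAGIC_SIZE region for the magic pages.
     */
    res = find_unused_regions(d, kinfo, magic_region, GUEST_MAGIC_SIZE);
```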

Signed-off-by: Henry Wang <xin.wang2@amd.com>
---
v4:
- No change
v3:
- New patch
---
 xen/arch/arm/domain_build.c             | 107 ++++++++++++++----------
 xen/arch/arm/include/asm/domain_build.h |   2 +
 xen/arch/arm/include/asm/setup.h        |   5 ++
 3 files changed, 70 insertions(+), 44 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 085d88671e..d2a9c047ea 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -814,19 +814,21 @@ int __init make_memory_node(const struct domain *d,
     return res;
 }
 
-static int __init add_ext_regions(unsigned long s_gfn, unsigned long e_gfn,
-                                  void *data)
+static int __init add_regions(unsigned long s_gfn, unsigned long e_gfn,
+                              void *data)
 {
-    struct meminfo *ext_regions = data;
+    struct mem_unused_info *unused = data;
+    struct meminfo *regions = unused->regions;
+    paddr_t min_region_size = unused->min_region_size;
     paddr_t start, size;
     paddr_t s = pfn_to_paddr(s_gfn);
     paddr_t e = pfn_to_paddr(e_gfn);
 
-    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
+    if ( regions->nr_banks >= ARRAY_SIZE(regions->bank) )
         return 0;
 
     /*
-     * Both start and size of the extended region should be 2MB aligned to
+     * Both start and size of the region should be 2MB aligned to
      * potentially allow superpage mapping.
      */
     start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
@@ -845,26 +847,27 @@ static int __init add_ext_regions(unsigned long s_gfn, unsigned long e_gfn,
      * not quite useful but increase bookkeeping and not too large
      * to skip a large proportion of unused address space.
      */
-    if ( size < MB(64) )
+    if ( size < min_region_size )
         return 0;
 
-    ext_regions->bank[ext_regions->nr_banks].start = start;
-    ext_regions->bank[ext_regions->nr_banks].size = size;
-    ext_regions->nr_banks++;
+    regions->bank[regions->nr_banks].start = start;
+    regions->bank[regions->nr_banks].size = size;
+    regions->nr_banks++;
 
     return 0;
 }
 
 /*
- * Find unused regions of Host address space which can be exposed to Dom0
- * as extended regions for the special memory mappings. In order to calculate
- * regions we exclude every region assigned to Dom0 from the Host RAM:
+ * Find unused regions of Host address space which can be exposed to
+ * direct-mapped domains as regions for the special memory mappings.
+ * In order to calculate regions we exclude every region assigned to
+ * direct-mapped domains from the Host RAM:
  * - domain RAM
  * - reserved-memory
  * - grant table space
  */
 static int __init find_unallocated_memory(const struct kernel_info *kinfo,
-                                          struct meminfo *ext_regions)
+                                          struct mem_unused_info *unused)
 {
     const struct meminfo *assign_mem = &kinfo->mem;
     struct rangeset *unalloc_mem;
@@ -893,7 +896,7 @@ static int __init find_unallocated_memory(const struct kernel_info *kinfo,
         }
     }
 
-    /* Remove RAM assigned to Dom0 */
+    /* Remove RAM assigned to domain */
     for ( i = 0; i < assign_mem->nr_banks; i++ )
     {
         start = assign_mem->bank[i].start;
@@ -942,10 +945,10 @@ static int __init find_unallocated_memory(const struct kernel_info *kinfo,
     start = 0;
     end = (1ULL << p2m_ipa_bits) - 1;
     res = rangeset_report_ranges(unalloc_mem, PFN_DOWN(start), PFN_DOWN(end),
-                                 add_ext_regions, ext_regions);
+                                 add_regions, unused);
     if ( res )
-        ext_regions->nr_banks = 0;
-    else if ( !ext_regions->nr_banks )
+        unused->regions->nr_banks = 0;
+    else if ( !unused->regions->nr_banks )
         res = -ENOENT;
 
 out:
@@ -982,16 +985,16 @@ static int __init handle_pci_range(const struct dt_device_node *dev,
 }
 
 /*
- * Find the holes in the Host DT which can be exposed to Dom0 as extended
- * regions for the special memory mappings. In order to calculate regions
- * we exclude every addressable memory region described by "reg" and "ranges"
+ * Find the holes in the Host DT which can be exposed to direct-mapped domains
+ * as regions for the special memory mappings. In order to calculate regions we
+ * exclude every addressable memory region described by "reg" and "ranges"
  * properties from the maximum possible addressable physical memory range:
  * - MMIO
  * - Host RAM
  * - PCI aperture
  */
 static int __init find_memory_holes(const struct kernel_info *kinfo,
-                                    struct meminfo *ext_regions)
+                                    struct mem_unused_info *unused)
 {
     struct dt_device_node *np;
     struct rangeset *mem_holes;
@@ -1068,10 +1071,10 @@ static int __init find_memory_holes(const struct kernel_info *kinfo,
     start = 0;
     end = (1ULL << p2m_ipa_bits) - 1;
     res = rangeset_report_ranges(mem_holes, PFN_DOWN(start), PFN_DOWN(end),
-                                 add_ext_regions,  ext_regions);
+                                 add_regions, unused);
     if ( res )
-        ext_regions->nr_banks = 0;
-    else if ( !ext_regions->nr_banks )
+        unused->regions->nr_banks = 0;
+    else if ( !unused->regions->nr_banks )
         res = -ENOENT;
 
 out:
@@ -1081,35 +1084,62 @@ out:
 }
 
 static int __init find_domU_holes(const struct kernel_info *kinfo,
-                                  struct meminfo *ext_regions)
+                                  struct mem_unused_info *unused)
 {
     unsigned int i;
     uint64_t bankend;
     const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
     const uint64_t banksize[] = GUEST_RAM_BANK_SIZES;
+    struct meminfo *regions = unused->regions;
+    paddr_t min_region_size = unused->min_region_size;
     int res = -ENOENT;
 
     for ( i = 0; i < GUEST_RAM_BANKS; i++ )
     {
-        struct membank *ext_bank = &(ext_regions->bank[ext_regions->nr_banks]);
+        struct membank *bank = &(regions->bank[regions->nr_banks]);
 
-        ext_bank->start = ROUNDUP(bankbase[i] + kinfo->mem.bank[i].size, SZ_2M);
+        bank->start = ROUNDUP(bankbase[i] + kinfo->mem.bank[i].size, SZ_2M);
 
         bankend = ~0ULL >> (64 - p2m_ipa_bits);
         bankend = min(bankend, bankbase[i] + banksize[i] - 1);
-        if ( bankend > ext_bank->start )
-            ext_bank->size = bankend - ext_bank->start + 1;
+        if ( bankend > bank->start )
+            bank->size = bankend - bank->start + 1;
 
-        /* 64MB is the minimum size of an extended region */
-        if ( ext_bank->size < MB(64) )
+        if ( bank->size < min_region_size )
             continue;
-        ext_regions->nr_banks++;
+        regions->nr_banks++;
         res = 0;
     }
 
     return res;
 }
 
+int __init find_unused_regions(struct domain *d,
+                               const struct kernel_info *kinfo,
+                               struct meminfo *regions,
+                               paddr_t min_region_size)
+{
+    int res;
+    struct mem_unused_info unused;
+
+    unused.regions = regions;
+    unused.min_region_size = min_region_size;
+
+    if ( is_domain_direct_mapped(d) )
+    {
+        if ( !is_iommu_enabled(d) )
+            res = find_unallocated_memory(kinfo, &unused);
+        else
+            res = find_memory_holes(kinfo, &unused);
+    }
+    else
+    {
+        res = find_domU_holes(kinfo, &unused);
+    }
+
+    return res;
+}
+
 int __init make_hypervisor_node(struct domain *d,
                                 const struct kernel_info *kinfo,
                                 int addrcells, int sizecells)
@@ -1161,18 +1191,7 @@ int __init make_hypervisor_node(struct domain *d,
         if ( !ext_regions )
             return -ENOMEM;
 
-        if ( is_domain_direct_mapped(d) )
-        {
-            if ( !is_iommu_enabled(d) )
-                res = find_unallocated_memory(kinfo, ext_regions);
-            else
-                res = find_memory_holes(kinfo, ext_regions);
-        }
-        else
-        {
-            res = find_domU_holes(kinfo, ext_regions);
-        }
-
+        res = find_unused_regions(d, kinfo, ext_regions, MB(64));
         if ( res )
             printk(XENLOG_WARNING "%pd: failed to allocate extended regions\n",
                    d);
diff --git a/xen/arch/arm/include/asm/domain_build.h b/xen/arch/arm/include/asm/domain_build.h
index da9e6025f3..74432123fe 100644
--- a/xen/arch/arm/include/asm/domain_build.h
+++ b/xen/arch/arm/include/asm/domain_build.h
@@ -10,6 +10,8 @@ bool allocate_bank_memory(struct domain *d, struct kernel_info *kinfo,
                           gfn_t sgfn, paddr_t tot_size);
 int construct_domain(struct domain *d, struct kernel_info *kinfo);
 int domain_fdt_begin_node(void *fdt, const char *name, uint64_t unit);
+int find_unused_regions(struct domain *d, const struct kernel_info *kinfo,
+                        struct meminfo *regions, paddr_t min_region_size);
 int make_chosen_node(const struct kernel_info *kinfo);
 int make_cpus_node(const struct domain *d, void *fdt);
 int make_hypervisor_node(struct domain *d, const struct kernel_info *kinfo,
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index d15a88d2e0..d24c6d31c8 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -61,6 +61,11 @@ struct meminfo {
     struct membank bank[NR_MEM_BANKS];
 };
 
+struct mem_unused_info {
+    struct meminfo *regions;
+    paddr_t min_region_size;
+};
+
 /*
  * The domU flag is set for kernels and ramdisks of "xen,domain" nodes.
  * The purpose of the domU flag is to avoid getting confused in
-- 
2.34.1




* [PATCH v4 3/5] xen/arm: Find unallocated spaces for magic pages of direct-mapped domU
  2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
  2024-04-09  4:53 ` [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map Henry Wang
  2024-04-09  4:53 ` [PATCH v4 2/5] xen/arm: Generalize the extended region finding logic Henry Wang
@ 2024-04-09  4:53 ` Henry Wang
  2024-04-09  4:53 ` [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less Henry Wang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-09  4:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Alec Kwapis

For 1:1 direct-mapped dom0less DomUs, the magic pages should not clash
with any RAM region. To find a proper region for guest magic pages,
we can reuse the logic of finding domain extended regions.

If extended regions are enabled, then since the extended region banks
are at least 64MB, carve out the first 16MB of the first extended
region bank for the magic pages of the direct-mapped domU. If extended
regions are disabled, call the newly introduced helper
find_11_domU_magic_region() to find a GUEST_MAGIC_SIZE-sized unused
region.

Reported-by: Alec Kwapis <alec.kwapis@medtronic.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
---
v4:
- No change
v3:
- Extract the logic of finding unallocated spaces for magic pages of
  direct-mapped domU to a dedicated function in static-memory.c
- Properly handle error and free memory in find_11_domU_magic_region()
v2:
- New patch
---
 xen/arch/arm/dom0less-build.c            | 11 +++++++
 xen/arch/arm/domain_build.c              | 24 ++++++++++++++-
 xen/arch/arm/include/asm/domain_build.h  |  2 ++
 xen/arch/arm/include/asm/static-memory.h |  7 +++++
 xen/arch/arm/static-memory.c             | 39 ++++++++++++++++++++++++
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index fb63ec6fd1..1963f029fe 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -682,6 +682,17 @@ static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
 
     if ( kinfo->dom0less_feature & DOM0LESS_ENHANCED_NO_XS )
     {
+        /*
+         * Find the guest magic region for 1:1 dom0less domU when the extended
+         * region is not enabled.
+         */
+        if ( !opt_ext_regions || is_32bit_domain(d) )
+        {
+            ret = find_11_domU_magic_region(d, kinfo);
+            if ( ret )
+                goto err;
+        }
+
         ret = make_hypervisor_node(d, kinfo, addrcells, sizecells);
         if ( ret )
             goto err;
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d2a9c047ea..d5a9baf8b0 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -46,7 +46,7 @@ integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
  * If true, the extended regions support is enabled for dom0 and
  * dom0less domUs.
  */
-static bool __initdata opt_ext_regions = true;
+bool __initdata opt_ext_regions = true;
 boolean_param("ext_regions", opt_ext_regions);
 
 static u64 __initdata dom0_mem;
@@ -1196,6 +1196,28 @@ int __init make_hypervisor_node(struct domain *d,
             printk(XENLOG_WARNING "%pd: failed to allocate extended regions\n",
                    d);
         nr_ext_regions = ext_regions->nr_banks;
+
+        /*
+         * If extended region is enabled, carve out the 16MB guest magic page
+         * regions from the first bank of extended region (at least 64MB) for
+         * the 1:1 dom0less DomUs
+         */
+        if ( is_domain_direct_mapped(d) && !is_hardware_domain(d) )
+        {
+            struct mem_map_domain *mem_map = &d->arch.mem_map;
+
+            for ( i = 0; i < mem_map->nr_mem_regions; i++ )
+            {
+                if ( mem_map->regions[i].type == XEN_MEM_REGION_MAGIC )
+                {
+                    mem_map->regions[i].start = ext_regions->bank[0].start;
+                    mem_map->regions[i].size = GUEST_MAGIC_SIZE;
+
+                    ext_regions->bank[0].start += GUEST_MAGIC_SIZE;
+                    ext_regions->bank[0].size -= GUEST_MAGIC_SIZE;
+                }
+            }
+        }
     }
 
     reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
diff --git a/xen/arch/arm/include/asm/domain_build.h b/xen/arch/arm/include/asm/domain_build.h
index 74432123fe..063ff727bb 100644
--- a/xen/arch/arm/include/asm/domain_build.h
+++ b/xen/arch/arm/include/asm/domain_build.h
@@ -4,6 +4,8 @@
 #include <xen/sched.h>
 #include <asm/kernel.h>
 
+extern bool opt_ext_regions;
+
 typedef __be32 gic_interrupt_t[3];
 
 bool allocate_bank_memory(struct domain *d, struct kernel_info *kinfo,
diff --git a/xen/arch/arm/include/asm/static-memory.h b/xen/arch/arm/include/asm/static-memory.h
index 3e3efd70c3..01e51217ca 100644
--- a/xen/arch/arm/include/asm/static-memory.h
+++ b/xen/arch/arm/include/asm/static-memory.h
@@ -12,6 +12,7 @@ void allocate_static_memory(struct domain *d, struct kernel_info *kinfo,
 void assign_static_memory_11(struct domain *d, struct kernel_info *kinfo,
                              const struct dt_device_node *node);
 void init_staticmem_pages(void);
+int find_11_domU_magic_region(struct domain *d, struct kernel_info *kinfo);
 
 #else /* !CONFIG_STATIC_MEMORY */
 
@@ -31,6 +32,12 @@ static inline void assign_static_memory_11(struct domain *d,
 
 static inline void init_staticmem_pages(void) {};
 
+static inline int find_11_domU_magic_region(struct domain *d,
+                                            struct kernel_info *kinfo)
+{
+    return 0;
+}
+
 #endif /* CONFIG_STATIC_MEMORY */
 
 #endif /* __ASM_STATIC_MEMORY_H_ */
diff --git a/xen/arch/arm/static-memory.c b/xen/arch/arm/static-memory.c
index cffbab7241..ab1ec5e73a 100644
--- a/xen/arch/arm/static-memory.c
+++ b/xen/arch/arm/static-memory.c
@@ -2,6 +2,7 @@
 
 #include <xen/sched.h>
 
+#include <asm/domain_build.h>
 #include <asm/static-memory.h>
 
 static bool __init append_static_memory_to_bank(struct domain *d,
@@ -276,6 +277,44 @@ void __init init_staticmem_pages(void)
     }
 }
 
+int __init find_11_domU_magic_region(struct domain *d,
+                                     struct kernel_info *kinfo)
+{
+    if ( is_domain_direct_mapped(d) )
+    {
+        struct meminfo *magic_region = xzalloc(struct meminfo);
+        struct mem_map_domain *mem_map = &d->arch.mem_map;
+        unsigned int i;
+        int ret = 0;
+
+        if ( !magic_region )
+            return -ENOMEM;
+
+        ret = find_unused_regions(d, kinfo, magic_region, GUEST_MAGIC_SIZE);
+        if ( ret )
+        {
+            printk(XENLOG_WARNING
+                   "%pd: failed to find a region for domain magic pages\n", d);
+            xfree(magic_region);
+            return -ENOENT;
+        }
+
+        /* Update the domain memory map. */
+        for ( i = 0; i < mem_map->nr_mem_regions; i++ )
+        {
+            if ( mem_map->regions[i].type == XEN_MEM_REGION_MAGIC )
+            {
+                mem_map->regions[i].start = magic_region->bank[0].start;
+                mem_map->regions[i].size = GUEST_MAGIC_SIZE;
+            }
+        }
+
+        xfree(magic_region);
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.34.1




* [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
                   ` (2 preceding siblings ...)
  2024-04-09  4:53 ` [PATCH v4 3/5] xen/arm: Find unallocated spaces for magic pages of direct-mapped domU Henry Wang
@ 2024-04-09  4:53 ` Henry Wang
  2024-04-18 12:54   ` Jan Beulich
  2024-04-09  4:53 ` [PATCH v4 5/5] tools/libs/ctrl: Simplify xc helpers related to populate_physmap() Henry Wang
  2024-04-18 14:16 ` [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Daniel P. Smith
  5 siblings, 1 reply; 17+ messages in thread
From: Henry Wang @ 2024-04-09  4:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Henry Wang, Anthony PERARD, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Juergen Gross,
	Alec Kwapis

Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.

Instead of hardcoding the guest magic pages region, use the
XEN_DOMCTL_get_mem_map domctl to get the start address of the guest
magic pages region. Add a new sub-op XENMEM_populate_physmap_heap_alloc
and the MEMF_force_heap_alloc flag to force populate_physmap() to
allocate pages from the domheap instead of using 1:1 or statically
allocated pages to map the magic pages.
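
Roughly, the intended toolstack flow then becomes the following sketch
(mirroring the init-dom0less change below, with error handling trimmed;
it assumes the magic region is the first and currently only map entry,
whereas the real code checks the region type):
```
    struct xen_mem_region regions[XEN_MAX_MEM_REGIONS] = {0};
    uint32_t nr = XEN_MAX_MEM_REGIONS;
    xen_pfn_t xs_pfn;

    /* Ask Xen where the magic region actually is ... */
    if ( xc_get_domain_mem_map(xch, domid, regions, &nr) < 0 )
        return -1;
    xs_pfn = (regions[0].start >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;

    /*
     * ... and force a plain domheap allocation for the xenstore page,
     * bypassing the 1:1/static paths in populate_physmap().
     */
    if ( xc_domain_populate_physmap_heap_exact(xch, domid, 1, 0, 0,
                                               &xs_pfn) < 0 )
        return -1;
```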

Reported-by: Alec Kwapis <alec.kwapis@medtronic.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
---
v4:
- Use the new subop.
- Add assert to check the flag isn't already set when coming back from
  construct_memop_from_reservation().
- Use &= to clear the flag instead of clear_bit().
- Move the alias to xen/common/memory.c
v3:
- Don't ignore the error from xc_get_domain_mem_map().
- Re-purposing the _MEMF_no_refcount as _MEMF_force_heap_alloc to
  avoid introduction of a new, single-use flag.
- Reject other reservation sub-ops to use the newly added flag.
- Replace all the GUEST_MAGIC_BASE usages.
v2:
- New patch
---
 tools/helpers/init-dom0less.c | 35 +++++++++++-----
 tools/include/xenctrl.h       |  7 ++++
 tools/libs/ctrl/xc_domain.c   | 79 ++++++++++++++++++++++++++---------
 xen/common/memory.c           | 30 +++++++++++--
 xen/include/public/memory.h   |  3 +-
 5 files changed, 120 insertions(+), 34 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..dccab7b29b 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -19,24 +19,42 @@
 #define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128
 
+static xen_pfn_t xs_page_base;
+static xen_pfn_t xs_page_p2m;
+
 static int alloc_xs_page(struct xc_interface_core *xch,
                          libxl_dominfo *info,
                          uint64_t *xenstore_pfn)
 {
-    int rc;
-    const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-    xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
+    int rc, i;
+    uint32_t nr_regions = XEN_MAX_MEM_REGIONS;
+    struct xen_mem_region mem_regions[XEN_MAX_MEM_REGIONS] = {0};
+
+    rc = xc_get_domain_mem_map(xch, info->domid, mem_regions, &nr_regions);
+    if (rc < 0)
+        return rc;
+
+    for ( i = 0; i < nr_regions; i++ )
+    {
+        if ( mem_regions[i].type == XEN_MEM_REGION_MAGIC )
+        {
+            xs_page_base = mem_regions[i].start >> XC_PAGE_SHIFT;
+            xs_page_p2m = (mem_regions[i].start >> XC_PAGE_SHIFT) +
+                          XENSTORE_PFN_OFFSET;
+        }
+    }
 
     rc = xc_domain_setmaxmem(xch, info->domid,
                              info->max_memkb + (XC_PAGE_SIZE/1024));
     if (rc < 0)
         return rc;
 
-    rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
+    rc = xc_domain_populate_physmap_heap_exact(xch, info->domid, 1, 0, 0,
+                                               &xs_page_p2m);
     if (rc < 0)
         return rc;
 
-    *xenstore_pfn = base + XENSTORE_PFN_OFFSET;
+    *xenstore_pfn = xs_page_base + XENSTORE_PFN_OFFSET;
     rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
     if (rc < 0)
         return rc;
@@ -145,8 +163,7 @@ static int create_xenstore(struct xs_handle *xsh,
     rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb);
     if (rc < 0 || rc >= STR_MAX_LENGTH)
         return rc;
-    rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-                  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+    rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu_xen_pfn, xs_page_p2m);
     if (rc < 0 || rc >= STR_MAX_LENGTH)
         return rc;
     rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -282,9 +299,7 @@ static int init_domain(struct xs_handle *xsh,
     if (rc)
         err(1, "writing to xenstore");
 
-    rc = xs_introduce_domain(xsh, info->domid,
-            (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET,
-            xenstore_evtchn);
+    rc = xs_introduce_domain(xsh, info->domid, xs_page_p2m, xenstore_evtchn);
     if (!rc)
         err(1, "xs_introduce_domain");
     return 0;
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index b25e9772a2..c1a601813a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1342,6 +1342,13 @@ int xc_domain_populate_physmap(xc_interface *xch,
                                unsigned int mem_flags,
                                xen_pfn_t *extent_start);
 
+int xc_domain_populate_physmap_heap_exact(xc_interface *xch,
+                                          uint32_t domid,
+                                          unsigned long nr_extents,
+                                          unsigned int extent_order,
+                                          unsigned int mem_flags,
+                                          xen_pfn_t *extent_start);
+
 int xc_domain_populate_physmap_exact(xc_interface *xch,
                                      uint32_t domid,
                                      unsigned long nr_extents,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index 4dba55d01d..82c1554613 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -916,6 +916,36 @@ int xc_domain_nr_gpfns(xc_interface *xch, uint32_t domid, xen_pfn_t *gpfns)
     return rc;
 }
 
+static int xc_populate_physmap_cmd(xc_interface *xch,
+                                   unsigned int cmd,
+                                   uint32_t domid,
+                                   unsigned long nr_extents,
+                                   unsigned int extent_order,
+                                   unsigned int mem_flags,
+                                   xen_pfn_t *extent_start)
+{
+    int err;
+    DECLARE_HYPERCALL_BOUNCE(extent_start, nr_extents * sizeof(*extent_start), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+    struct xen_memory_reservation reservation = {
+        .nr_extents   = nr_extents,
+        .extent_order = extent_order,
+        .mem_flags    = mem_flags,
+        .domid        = domid
+    };
+
+    if ( xc_hypercall_bounce_pre(xch, extent_start) )
+    {
+        PERROR("Could not bounce memory for XENMEM_populate_physmap hypercall");
+        return -1;
+    }
+    set_xen_guest_handle(reservation.extent_start, extent_start);
+
+    err = xc_memory_op(xch, cmd, &reservation, sizeof(reservation));
+
+    xc_hypercall_bounce_post(xch, extent_start);
+    return err;
+}
+
 int xc_domain_increase_reservation(xc_interface *xch,
                                    uint32_t domid,
                                    unsigned long nr_extents,
@@ -1135,26 +1165,9 @@ int xc_domain_populate_physmap(xc_interface *xch,
                                unsigned int mem_flags,
                                xen_pfn_t *extent_start)
 {
-    int err;
-    DECLARE_HYPERCALL_BOUNCE(extent_start, nr_extents * sizeof(*extent_start), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-    struct xen_memory_reservation reservation = {
-        .nr_extents   = nr_extents,
-        .extent_order = extent_order,
-        .mem_flags    = mem_flags,
-        .domid        = domid
-    };
-
-    if ( xc_hypercall_bounce_pre(xch, extent_start) )
-    {
-        PERROR("Could not bounce memory for XENMEM_populate_physmap hypercall");
-        return -1;
-    }
-    set_xen_guest_handle(reservation.extent_start, extent_start);
-
-    err = xc_memory_op(xch, XENMEM_populate_physmap, &reservation, sizeof(reservation));
-
-    xc_hypercall_bounce_post(xch, extent_start);
-    return err;
+    return xc_populate_physmap_cmd(xch, XENMEM_populate_physmap, domid,
+                                   nr_extents, extent_order, mem_flags,
+                                   extent_start);
 }
 
 int xc_domain_populate_physmap_exact(xc_interface *xch,
@@ -1182,6 +1195,32 @@ int xc_domain_populate_physmap_exact(xc_interface *xch,
     return err;
 }
 
+int xc_domain_populate_physmap_heap_exact(xc_interface *xch,
+                                          uint32_t domid,
+                                          unsigned long nr_extents,
+                                          unsigned int extent_order,
+                                          unsigned int mem_flags,
+                                          xen_pfn_t *extent_start)
+{
+    int err;
+
+    err = xc_populate_physmap_cmd(xch, XENMEM_populate_physmap_heap_alloc,
+                                  domid, nr_extents, extent_order, mem_flags,
+                                  extent_start);
+    if ( err == nr_extents )
+        return 0;
+
+    if ( err >= 0 )
+    {
+        DPRINTF("Failed allocation for dom %d: %ld extents of order %d\n",
+                domid, nr_extents, extent_order);
+        errno = EBUSY;
+        err = -1;
+    }
+
+    return err;
+}
+
 int xc_domain_memory_exchange_pages(xc_interface *xch,
                                     uint32_t domid,
                                     unsigned long nr_in_extents,
diff --git a/xen/common/memory.c b/xen/common/memory.c
index b4593f5f45..a4733869c2 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -155,6 +155,14 @@ static void increase_reservation(struct memop_args *a)
     a->nr_done = i;
 }
 
+/*
+ * Alias of _MEMF_no_refcount to avoid introduction of a new, single-use flag.
+ * This flag should be used for populate_physmap() only as a re-purposing of
+ * _MEMF_no_refcount to force a non-1:1 allocation from domheap.
+ */
+#define _MEMF_force_heap_alloc _MEMF_no_refcount
+#define  MEMF_force_heap_alloc (1U<<_MEMF_force_heap_alloc)
+
 static void populate_physmap(struct memop_args *a)
 {
     struct page_info *page;
@@ -219,7 +227,8 @@ static void populate_physmap(struct memop_args *a)
         }
         else
         {
-            if ( is_domain_direct_mapped(d) )
+            if ( is_domain_direct_mapped(d) &&
+                 !(a->memflags & MEMF_force_heap_alloc) )
             {
                 mfn = _mfn(gpfn);
 
@@ -246,7 +255,8 @@ static void populate_physmap(struct memop_args *a)
 
                 mfn = _mfn(gpfn);
             }
-            else if ( is_domain_using_staticmem(d) )
+            else if ( is_domain_using_staticmem(d) &&
+                      !(a->memflags & MEMF_force_heap_alloc) )
             {
                 /*
                  * No easy way to guarantee the retrieved pages are contiguous,
@@ -271,6 +281,14 @@ static void populate_physmap(struct memop_args *a)
             }
             else
             {
+                /*
+                 * Avoid passing MEMF_force_heap_alloc down to
+                 * alloc_domheap_pages() where the meaning would be the
+                 * original MEMF_no_refcount.
+                 */
+                if ( unlikely(a->memflags & MEMF_force_heap_alloc) )
+                    a->memflags &= ~MEMF_force_heap_alloc;
+
                 page = alloc_domheap_pages(d, a->extent_order, a->memflags);
 
                 if ( unlikely(!page) )
@@ -1404,6 +1422,7 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case XENMEM_increase_reservation:
     case XENMEM_decrease_reservation:
+    case XENMEM_populate_physmap_heap_alloc:
     case XENMEM_populate_physmap:
         if ( copy_from_guest(&reservation, arg, 1) )
             return start_extent;
@@ -1433,6 +1452,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
              && (reservation.mem_flags & XENMEMF_populate_on_demand) )
             args.memflags |= MEMF_populate_on_demand;
 
+        /* Assert flag is not set from construct_memop_from_reservation(). */
+        ASSERT(!(args.memflags & MEMF_force_heap_alloc));
+        if ( op == XENMEM_populate_physmap_heap_alloc )
+            args.memflags |= MEMF_force_heap_alloc;
+
         if ( xsm_memory_adjust_reservation(XSM_TARGET, curr_d, d) )
         {
             rcu_unlock_domain(d);
@@ -1453,7 +1477,7 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         case XENMEM_decrease_reservation:
             decrease_reservation(&args);
             break;
-        default: /* XENMEM_populate_physmap */
+        default: /* XENMEM_populate_{physmap, physmap_heap_alloc} */
             populate_physmap(&args);
             break;
         }
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 5e545ae9a4..5e79992671 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -21,6 +21,7 @@
 #define XENMEM_increase_reservation 0
 #define XENMEM_decrease_reservation 1
 #define XENMEM_populate_physmap     6
+#define XENMEM_populate_physmap_heap_alloc 29
 
 #if __XEN_INTERFACE_VERSION__ >= 0x00030209
 /*
@@ -731,7 +732,7 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 29 */
+/* Next available subop number is 30 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
-- 
2.34.1




* [PATCH v4 5/5] tools/libs/ctrl: Simplify xc helpers related to populate_physmap()
  2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
                   ` (3 preceding siblings ...)
  2024-04-09  4:53 ` [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less Henry Wang
@ 2024-04-09  4:53 ` Henry Wang
  2024-04-18 14:16 ` [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Daniel P. Smith
  5 siblings, 0 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-09  4:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Henry Wang, Anthony PERARD, Juergen Gross, Jan Beulich

There are currently a number of xc helpers that invoke
populate_physmap() and the related reservation hypercalls with
different parameters, such as:
- xc_domain_{increase, decrease}_{reservation, reservation_exact}
- xc_domain_populate_physmap
- xc_domain_populate_{physmap_exact, physmap_heap_exact}

Most of them share the same, duplicated logic.

Extract the duplicated code of these xc helpers into local helper
functions to simplify the code.

No functional change intended.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
---
v4:
- New patch.
---
 tools/libs/ctrl/xc_domain.c | 178 +++++++++++++-----------------------
 1 file changed, 62 insertions(+), 116 deletions(-)

diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index 82c1554613..d023596bed 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -935,7 +935,7 @@ static int xc_populate_physmap_cmd(xc_interface *xch,
 
     if ( xc_hypercall_bounce_pre(xch, extent_start) )
     {
-        PERROR("Could not bounce memory for XENMEM_populate_physmap hypercall");
+        PERROR("Could not bounce memory for hypercall %u", cmd);
         return -1;
     }
     set_xen_guest_handle(reservation.extent_start, extent_start);
@@ -946,39 +946,8 @@ static int xc_populate_physmap_cmd(xc_interface *xch,
     return err;
 }
 
-int xc_domain_increase_reservation(xc_interface *xch,
-                                   uint32_t domid,
-                                   unsigned long nr_extents,
-                                   unsigned int extent_order,
-                                   unsigned int mem_flags,
-                                   xen_pfn_t *extent_start)
-{
-    int err;
-    DECLARE_HYPERCALL_BOUNCE(extent_start, nr_extents * sizeof(*extent_start), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-    struct xen_memory_reservation reservation = {
-        .nr_extents   = nr_extents,
-        .extent_order = extent_order,
-        .mem_flags    = mem_flags,
-        .domid        = domid
-    };
-
-    /* may be NULL */
-    if ( xc_hypercall_bounce_pre(xch, extent_start) )
-    {
-        PERROR("Could not bounce memory for XENMEM_increase_reservation hypercall");
-        return -1;
-    }
-
-    set_xen_guest_handle(reservation.extent_start, extent_start);
-
-    err = xc_memory_op(xch, XENMEM_increase_reservation, &reservation, sizeof(reservation));
-
-    xc_hypercall_bounce_post(xch, extent_start);
-
-    return err;
-}
-
-int xc_domain_increase_reservation_exact(xc_interface *xch,
+static int xc_populate_physmap_cmd_exact(xc_interface *xch,
+                                         unsigned int cmd,
                                          uint32_t domid,
                                          unsigned long nr_extents,
                                          unsigned int extent_order,
@@ -987,58 +956,75 @@ int xc_domain_increase_reservation_exact(xc_interface *xch,
 {
     int err;
 
-    err = xc_domain_increase_reservation(xch, domid, nr_extents,
-                                         extent_order, mem_flags, extent_start);
-
+    err = xc_populate_physmap_cmd(xch, cmd, domid, nr_extents,
+                                  extent_order, mem_flags, extent_start);
     if ( err == nr_extents )
         return 0;
 
     if ( err >= 0 )
     {
-        DPRINTF("Failed allocation for dom %d: "
+        switch ( cmd )
+        {
+        case XENMEM_increase_reservation:
+            DPRINTF("Failed allocation for dom %d: "
                 "%ld extents of order %d, mem_flags %x\n",
                 domid, nr_extents, extent_order, mem_flags);
-        errno = ENOMEM;
+            errno = ENOMEM;
+            break;
+        case XENMEM_decrease_reservation:
+            DPRINTF("Failed deallocation for dom %d: %ld extents of order %d\n",
+                    domid, nr_extents, extent_order);
+            errno = EINVAL;
+            break;
+        case XENMEM_populate_physmap_heap_alloc:
+        case XENMEM_populate_physmap:
+            DPRINTF("Failed allocation for dom %d: %ld extents of order %d\n",
+                    domid, nr_extents, extent_order);
+            errno = EBUSY;
+            break;
+        default:
+            DPRINTF("Invalid cmd %u\n", cmd);
+            errno = EINVAL;
+            break;
+        }
         err = -1;
     }
 
     return err;
 }
 
-int xc_domain_decrease_reservation(xc_interface *xch,
+int xc_domain_increase_reservation(xc_interface *xch,
                                    uint32_t domid,
                                    unsigned long nr_extents,
                                    unsigned int extent_order,
+                                   unsigned int mem_flags,
                                    xen_pfn_t *extent_start)
 {
-    int err;
-    DECLARE_HYPERCALL_BOUNCE(extent_start, nr_extents * sizeof(*extent_start), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-    struct xen_memory_reservation reservation = {
-        .nr_extents   = nr_extents,
-        .extent_order = extent_order,
-        .mem_flags    = 0,
-        .domid        = domid
-    };
-
-    if ( extent_start == NULL )
-    {
-        DPRINTF("decrease_reservation extent_start is NULL!\n");
-        errno = EINVAL;
-        return -1;
-    }
-
-    if ( xc_hypercall_bounce_pre(xch, extent_start) )
-    {
-        PERROR("Could not bounce memory for XENMEM_decrease_reservation hypercall");
-        return -1;
-    }
-    set_xen_guest_handle(reservation.extent_start, extent_start);
-
-    err = xc_memory_op(xch, XENMEM_decrease_reservation, &reservation, sizeof(reservation));
+    return xc_populate_physmap_cmd(xch, XENMEM_increase_reservation, domid,
+                                   nr_extents, extent_order, mem_flags,
+                                   extent_start);
+}
 
-    xc_hypercall_bounce_post(xch, extent_start);
+int xc_domain_increase_reservation_exact(xc_interface *xch,
+                                         uint32_t domid,
+                                         unsigned long nr_extents,
+                                         unsigned int extent_order,
+                                         unsigned int mem_flags,
+                                         xen_pfn_t *extent_start)
+{
+    return xc_populate_physmap_cmd_exact(xch, XENMEM_increase_reservation,
+                                         domid, nr_extents, extent_order,
+                                         mem_flags, extent_start);
+}
 
-    return err;
+int xc_domain_decrease_reservation(xc_interface *xch,
+                                   uint32_t domid,
+                                   unsigned long nr_extents,
+                                   unsigned int extent_order,
+                                   xen_pfn_t *extent_start)
+{
+    return xc_populate_physmap_cmd(xch, XENMEM_decrease_reservation, domid,
+                                   nr_extents, extent_order, 0, extent_start);
 }
 
 int xc_domain_decrease_reservation_exact(xc_interface *xch,
@@ -1047,23 +1033,9 @@ int xc_domain_decrease_reservation_exact(xc_interface *xch,
                                          unsigned int extent_order,
                                          xen_pfn_t *extent_start)
 {
-    int err;
-
-    err = xc_domain_decrease_reservation(xch, domid, nr_extents,
-                                         extent_order, extent_start);
-
-    if ( err == nr_extents )
-        return 0;
-
-    if ( err >= 0 )
-    {
-        DPRINTF("Failed deallocation for dom %d: %ld extents of order %d\n",
-                domid, nr_extents, extent_order);
-        errno = EINVAL;
-        err = -1;
-    }
-
-    return err;
+    return xc_populate_physmap_cmd_exact(xch, XENMEM_decrease_reservation,
+                                         domid, nr_extents, extent_order,
+                                         0, extent_start);
 }
 
 int xc_domain_add_to_physmap(xc_interface *xch,
@@ -1177,22 +1149,9 @@ int xc_domain_populate_physmap_exact(xc_interface *xch,
                                      unsigned int mem_flags,
                                      xen_pfn_t *extent_start)
 {
-    int err;
-
-    err = xc_domain_populate_physmap(xch, domid, nr_extents,
-                                     extent_order, mem_flags, extent_start);
-    if ( err == nr_extents )
-        return 0;
-
-    if ( err >= 0 )
-    {
-        DPRINTF("Failed allocation for dom %d: %ld extents of order %d\n",
-                domid, nr_extents, extent_order);
-        errno = EBUSY;
-        err = -1;
-    }
-
-    return err;
+    return xc_populate_physmap_cmd_exact(xch, XENMEM_populate_physmap,
+                                         domid, nr_extents, extent_order,
+                                         mem_flags, extent_start);
 }
 
 int xc_domain_populate_physmap_heap_exact(xc_interface *xch,
@@ -1202,23 +1161,10 @@ int xc_domain_populate_physmap_heap_exact(xc_interface *xch,
                                           unsigned int mem_flags,
                                           xen_pfn_t *extent_start)
 {
-    int err;
-
-    err = xc_populate_physmap_cmd(xch, XENMEM_populate_physmap_heap_alloc,
-                                  domid, nr_extents, extent_order, mem_flags,
-                                  extent_start);
-    if ( err == nr_extents )
-        return 0;
-
-    if ( err >= 0 )
-    {
-        DPRINTF("Failed allocation for dom %d: %ld extents of order %d\n",
-                domid, nr_extents, extent_order);
-        errno = EBUSY;
-        err = -1;
-    }
-
-    return err;
+    return xc_populate_physmap_cmd_exact(xch,
+                                         XENMEM_populate_physmap_heap_alloc,
+                                         domid, nr_extents, extent_order,
+                                         mem_flags, extent_start);
 }
 
 int xc_domain_memory_exchange_pages(xc_interface *xch,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map
  2024-04-09  4:53 ` [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map Henry Wang
@ 2024-04-18 12:37   ` Jan Beulich
  2024-04-19  2:27     ` Henry Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-04-18 12:37 UTC (permalink / raw)
  To: Henry Wang
  Cc: Anthony PERARD, Juergen Gross, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Alec Kwapis, xen-devel

On 09.04.2024 06:53, Henry Wang wrote:
> --- a/tools/libs/ctrl/xc_domain.c
> +++ b/tools/libs/ctrl/xc_domain.c
> @@ -697,6 +697,43 @@ int xc_domain_setmaxmem(xc_interface *xch,
>      return do_domctl(xch, &domctl);
>  }
>  
> +int xc_get_domain_mem_map(xc_interface *xch, uint32_t domid,
> +                          struct xen_mem_region mem_regions[],
> +                          uint32_t *nr_regions)
> +{
> +    int rc;
> +    uint32_t nr = *nr_regions;
> +    struct xen_domctl domctl = {
> +        .cmd         = XEN_DOMCTL_get_mem_map,
> +        .domain      = domid,
> +        .u.mem_map = {
> +            .nr_mem_regions = nr,
> +        },
> +    };
> +
> +    DECLARE_HYPERCALL_BOUNCE(mem_regions, sizeof(xen_mem_region_t) * nr,
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +
> +    if ( !mem_regions || xc_hypercall_bounce_pre(xch, mem_regions) || nr < 1 )

Why the nr < 1 part? For a caller to size the necessary buffer, it may want
to pass in 0 (and a NULL buffer pointer) first.
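
I.e. the caller-side flow could then be along these lines (a sketch only,
assuming the implementation is adjusted to write the required count back
to *nr_regions):

    uint32_t nr = 0;
    struct xen_mem_region *regions;

    if ( xc_get_domain_mem_map(xch, domid, NULL, &nr) || !nr )
        return -1;                          /* query the count */
    regions = calloc(nr, sizeof(*regions));
    if ( !regions )
        return -1;
    if ( xc_get_domain_mem_map(xch, domid, regions, &nr) )
    {
        free(regions);
        return -1;
    }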

> @@ -176,6 +175,33 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
>  
>          return rc;
>      }
> +    case XEN_DOMCTL_get_mem_map:
> +    {
> +        int rc = 0;
> +        uint32_t nr_regions;

unsigned int (see ./CODING_STYLE)?

> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -223,6 +223,13 @@ typedef uint64_t xen_pfn_t;
>   */
>  #define XEN_LEGACY_MAX_VCPUS 1
>  
> +/*
> + * Maximum number of memory map regions for guest memory layout.
> + * Used by XEN_DOMCTL_get_mem_map, currently there is only one region
> + * for the guest magic pages.
> + */
> +#define XEN_MAX_MEM_REGIONS 1

Why is this in the public header? I can only find Xen-internal uses.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -946,6 +946,33 @@ struct xen_domctl_paging_mempool {
>      uint64_aligned_t size; /* Size in bytes. */
>  };
>  
> +#ifndef XEN_MAX_MEM_REGIONS
> +#define XEN_MAX_MEM_REGIONS 1
> +#endif
> +
> +struct xen_mem_region {
> +    uint64_aligned_t start;
> +    uint64_aligned_t size;
> +#define XEN_MEM_REGION_DEFAULT    0

I can't spot any use of this. What's its purpose?

> +#define XEN_MEM_REGION_MAGIC      1
> +    uint32_t         type;
> +    /* Must be zero */
> +    uint32_t         pad;

This being OUT only, I don't think the comment makes sense. I'd omit it
completely; if you absolutely want one, please say "will" instead of "must".

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  2024-04-09  4:53 ` [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less Henry Wang
@ 2024-04-18 12:54   ` Jan Beulich
  2024-04-19  2:31     ` Henry Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-04-18 12:54 UTC (permalink / raw)
  To: Henry Wang
  Cc: Anthony PERARD, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Alec Kwapis, xen-devel,
	Daniel Smith

On 09.04.2024 06:53, Henry Wang wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -155,6 +155,14 @@ static void increase_reservation(struct memop_args *a)
>      a->nr_done = i;
>  }
>  
> +/*
> + * Alias of _MEMF_no_refcount to avoid introduction of a new, single-use flag.
> + * This flag should be used for populate_physmap() only as a re-purposing of
> + * _MEMF_no_refcount to force a non-1:1 allocation from domheap.
> + */
> +#define _MEMF_force_heap_alloc _MEMF_no_refcount
> +#define  MEMF_force_heap_alloc (1U<<_MEMF_force_heap_alloc)

Nit (style): Blanks around << please.

Also do you really need both constants? I don't think so.

Plus please make sure to #undef the constant once no longer needed, to
help spotting / avoiding misuses.
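
Something along these lines, i.e. keeping just the mask form and
#undef-ing it after the last use in memory.c (a sketch only):

    #define MEMF_force_heap_alloc (1U << _MEMF_no_refcount)

    /* ... populate_physmap() / do_memory_op() uses ... */

    #undef MEMF_force_heap_alloc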

> @@ -219,7 +227,8 @@ static void populate_physmap(struct memop_args *a)
>          }
>          else
>          {
> -            if ( is_domain_direct_mapped(d) )
> +            if ( is_domain_direct_mapped(d) &&
> +                 !(a->memflags & MEMF_force_heap_alloc) )
>              {
>                  mfn = _mfn(gpfn);
>  
> @@ -246,7 +255,8 @@ static void populate_physmap(struct memop_args *a)
>  
>                  mfn = _mfn(gpfn);
>              }
> -            else if ( is_domain_using_staticmem(d) )
> +            else if ( is_domain_using_staticmem(d) &&
> +                      !(a->memflags & MEMF_force_heap_alloc) )
>              {
>                  /*
>                   * No easy way to guarantee the retrieved pages are contiguous,
> @@ -271,6 +281,14 @@ static void populate_physmap(struct memop_args *a)
>              }
>              else
>              {
> +                /*
> +                 * Avoid passing MEMF_force_heap_alloc down to
> +                 * alloc_domheap_pages() where the meaning would be the
> +                 * original MEMF_no_refcount.
> +                 */
> +                if ( unlikely(a->memflags & MEMF_force_heap_alloc) )
> +                    a->memflags &= ~MEMF_force_heap_alloc;

As asked before: Why the if()?

> @@ -1404,6 +1422,7 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      {
>      case XENMEM_increase_reservation:
>      case XENMEM_decrease_reservation:
> +    case XENMEM_populate_physmap_heap_alloc:
>      case XENMEM_populate_physmap:
>          if ( copy_from_guest(&reservation, arg, 1) )
>              return start_extent;

Nit or not: Please insert the new case label last.

> @@ -1433,6 +1452,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>               && (reservation.mem_flags & XENMEMF_populate_on_demand) )
>              args.memflags |= MEMF_populate_on_demand;
>  
> +        /* Assert flag is not set from construct_memop_from_reservation(). */
> +        ASSERT(!(args.memflags & MEMF_force_heap_alloc));
> +        if ( op == XENMEM_populate_physmap_heap_alloc )
> +            args.memflags |= MEMF_force_heap_alloc;

Wouldn't this more logically live ...

>          if ( xsm_memory_adjust_reservation(XSM_TARGET, curr_d, d) )
>          {
>              rcu_unlock_domain(d);
> @@ -1453,7 +1477,7 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          case XENMEM_decrease_reservation:
>              decrease_reservation(&args);
>              break;

here, as

          case XENMEM_populate_physmap_heap_alloc:
              ...
              fallthrough;
> -        default: /* XENMEM_populate_physmap */
> +        default: /* XENMEM_populate_{physmap, physmap_heap_alloc} */

Otherwise: Just XENMEM_populate_physmap{,_heap_alloc} perhaps?
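
Taken together, the dispatch could then end up roughly like this (sketch
only):

            switch ( op )
            {
            case XENMEM_increase_reservation:
                increase_reservation(&args);
                break;

            case XENMEM_decrease_reservation:
                decrease_reservation(&args);
                break;

            case XENMEM_populate_physmap_heap_alloc:
                args.memflags |= MEMF_force_heap_alloc;
                fallthrough;

            default: /* XENMEM_populate_physmap{,_heap_alloc} */
                populate_physmap(&args);
                break;
            }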

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -21,6 +21,7 @@
>  #define XENMEM_increase_reservation 0
>  #define XENMEM_decrease_reservation 1
>  #define XENMEM_populate_physmap     6
> +#define XENMEM_populate_physmap_heap_alloc 29

Without a comment, how is one supposed to know what the difference is of
this new sub-op compared to the "normal" one? I actually wonder whether
referring to a Xen internal (allocation requested to come from the heap)
is actually a good idea here. I'm inclined to suggest to name this after
the purpose it has from the guest or tool stack perspective.

Speaking of which: Is this supposed to be guest-accessible, or is it
intended for tool-stack use only (I have to admit I don't even know where
init-dom0less actually runs)? In the latter case that also wants enforcing.
This may require an adjustment to the XSM hook in use here. Cc-ing Daniel
for possible advice.

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs
  2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
                   ` (4 preceding siblings ...)
  2024-04-09  4:53 ` [PATCH v4 5/5] tools/libs/ctrl: Simplify xc helpers related to populate_physmap() Henry Wang
@ 2024-04-18 14:16 ` Daniel P. Smith
  2024-04-19  1:45   ` Henry Wang
  2024-04-25 22:18   ` Stefano Stabellini
  5 siblings, 2 replies; 17+ messages in thread
From: Daniel P. Smith @ 2024-04-18 14:16 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Anthony PERARD, Juergen Gross, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

On 4/9/24 00:53, Henry Wang wrote:
> An error message can seen from the init-dom0less application on
> direct-mapped 1:1 domains:
> ```
> Allocating magic pages
> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
> Error on alloc magic pages
> ```
> 
> This is because populate_physmap() automatically assumes gfn == mfn
> for direct mapped domains. This cannot be true for the magic pages
> that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
> helper application executed in Dom0. For domain using statically
> allocated memory but not 1:1 direct-mapped, similar error "failed to
> retrieve a reserved page" can be seen as the reserved memory list
> is empty at that time.
> 
> This series tries to fix this issue using a DOMCTL-based approach,
> because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
> and inform the toolstack about the region found by hypervisor for
> mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
> guest memory map, currently only used for the magic page regions.
> Patch 2 generalized the extended region finding logic so that it can
> be reused for other use cases such as finding 1:1 domU magic regions.
> Patch 3 uses the same approach as finding the extended regions to find
> the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
> hardcoding all base addresses of guest magic region in the init-dom0less
> application by consuming the newly introduced DOMCTL. Patch 5 is a
> simple patch to do some code duplication clean-up in xc.

Hey Henry,

To help provide some perspective, these issues are not experienced with 
hyperlaunch. This is because we understood early on that you cannot move 
a lightweight version of the toolstack into hypervisor init and not 
provide a mechanism to communicate what it did to the runtime control 
plane. We evaluated the possible mechanism, to include introducing a new 
hypercall op, and ultimately settled on using hypfs. The primary reason 
is this information is static data that, while informative later, is 
only necessary for the control plane to understand the state of the 
system. As a result, hyperlaunch is able to allocate any and all special 
pages required as part of domain construction and communicate their 
addresses to the control plane. As for XSM, hypfs is already protected 
and at this time we do not see any domain builder information needing to 
be restricted separately from the data already present in hypfs.

I would like to make the suggestion that instead of continuing down this 
path, perhaps you might consider adopting the hyperlaunch usage of 
hypfs. Then adjust dom0less domain construction to allocate the special 
pages at construction time. The original hyperlaunch series includes a 
patch that provides the helper app for the xenstore announcement. And I 
can provide you with updated versions if that would be helpful.

V/r,
Daniel P. Smith


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs
  2024-04-18 14:16 ` [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Daniel P. Smith
@ 2024-04-19  1:45   ` Henry Wang
  2024-04-25 22:18   ` Stefano Stabellini
  1 sibling, 0 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-19  1:45 UTC (permalink / raw)
  To: Daniel P. Smith, xen-devel
  Cc: Anthony PERARD, Juergen Gross, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

Hi Daniel,

On 4/18/2024 10:16 PM, Daniel P. Smith wrote:
> On 4/9/24 00:53, Henry Wang wrote:
>> An error message can seen from the init-dom0less application on
>> direct-mapped 1:1 domains:
>> ```
>> Allocating magic pages
>> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
>> Error on alloc magic pages
>> ```
>>
>> This is because populate_physmap() automatically assumes gfn == mfn
>> for direct mapped domains. This cannot be true for the magic pages
>> that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
>> helper application executed in Dom0. For domain using statically
>> allocated memory but not 1:1 direct-mapped, similar error "failed to
>> retrieve a reserved page" can be seen as the reserved memory list
>> is empty at that time.
>>
>> This series tries to fix this issue using a DOMCTL-based approach,
>> because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
>> and inform the toolstack about the region found by hypervisor for
>> mapping the magic pages.
>
> Hey Henry,
>
> To help provide some perspective, these issues are not experienced 
> with hyperlaunch. This is because we understood early on that you 
> cannot move a lightweight version of the toolstack into hypervisor 
> init and not provide a mechanism to communicate what it did to the 
> runtime control plane. We evaluated the possible mechanism, to include 
> introducing a new hypercall op, and ultimately settled on using hypfs. 
> The primary reason is this information is static data that, while 
> informative later, is only necessary for the control plane to 
> understand the state of the system. As a result, hyperlaunch is able 
> to allocate any and all special pages required as part of domain 
> construction and communicate their addresses to the control plane. As 
> for XSM, hypfs is already protected and at this time we do not see any 
> domain builder information needing to be restricted separately from 
> the data already present in hypfs.
>
> I would like to make the suggestion that instead of continuing down 
> this path, perhaps you might consider adopting the hyperlaunch usage 
> of hypfs. Then adjust dom0less domain construction to allocate the 
> special pages at construction time. 

Thank you for the suggestion. I think your proposal makes sense. However,
I am not familiar with hypfs, so may I ask some questions first to
confirm that I understand your proposal correctly: Do you mean I should
first find, allocate and map these special pages at dom0less domU
construction time, then store the GPA in hypfs and extract it from the
init-dom0less app later on? Should I use existing interfaces such as
xenhypfs_{open,cat,ls, etc.}, or will I probably need to add new
hypercall ops?
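
For my own understanding, is the app-side flow something like the below?
(Just a rough sketch; the node path is made up and error handling is
trimmed.)

    xenhypfs_handle *h = xenhypfs_open(NULL, 0);
    char *val;
    uint64_t magic_base = 0;

    if ( !h )
        return -1;
    /* "/builder/<domid>/magic-base" is only an illustrative node name. */
    val = xenhypfs_read(h, "/builder/1/magic-base");
    if ( val )
    {
        magic_base = strtoull(val, NULL, 0);
        free(val);
    }
    xenhypfs_close(h);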

> The original hyperlaunch series includes a patch that provides the 
> helper app for the xenstore announcement. And I can provide you with 
> updated versions if that would be helpful.

Thank you, yes, a pointer to the corresponding series and patch would 
definitely be helpful.

Kind regards,
Henry

>
> V/r,
> Daniel P. Smith



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map
  2024-04-18 12:37   ` Jan Beulich
@ 2024-04-19  2:27     ` Henry Wang
  2024-04-19  6:16       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Henry Wang @ 2024-04-19  2:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Anthony PERARD, Juergen Gross, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Alec Kwapis, xen-devel

Hi Jan,

On 4/18/2024 8:37 PM, Jan Beulich wrote:
> On 09.04.2024 06:53, Henry Wang wrote:
>> --- a/tools/libs/ctrl/xc_domain.c
>> +++ b/tools/libs/ctrl/xc_domain.c
>> @@ -697,6 +697,43 @@ int xc_domain_setmaxmem(xc_interface *xch,
>>       return do_domctl(xch, &domctl);
>>   }
>>   
>> +int xc_get_domain_mem_map(xc_interface *xch, uint32_t domid,
>> +                          struct xen_mem_region mem_regions[],
>> +                          uint32_t *nr_regions)
>> +{
>> +    int rc;
>> +    uint32_t nr = *nr_regions;
>> +    struct xen_domctl domctl = {
>> +        .cmd         = XEN_DOMCTL_get_mem_map,
>> +        .domain      = domid,
>> +        .u.mem_map = {
>> +            .nr_mem_regions = nr,
>> +        },
>> +    };
>> +
>> +    DECLARE_HYPERCALL_BOUNCE(mem_regions, sizeof(xen_mem_region_t) * nr,
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>> +
>> +    if ( !mem_regions || xc_hypercall_bounce_pre(xch, mem_regions) || nr < 1 )
> Why the nr < 1 part? For a caller to size the necessary buffer, it may want
> to pass in 0 (and a NULL buffer pointer) first.

I will drop this nr < 1 part.

>> @@ -176,6 +175,33 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
>>   
>>           return rc;
>>       }
>> +    case XEN_DOMCTL_get_mem_map:
>> +    {
>> +        int rc = 0;
>> +        uint32_t nr_regions;
> unsigned int (see ./CODING_STYLE)?

Ok, I will use unsigned int.

>> --- a/xen/include/public/arch-arm.h
>> +++ b/xen/include/public/arch-arm.h
>> @@ -223,6 +223,13 @@ typedef uint64_t xen_pfn_t;
>>    */
>>   #define XEN_LEGACY_MAX_VCPUS 1
>>   
>> +/*
>> + * Maximum number of memory map regions for guest memory layout.
>> + * Used by XEN_DOMCTL_get_mem_map, currently there is only one region
>> + * for the guest magic pages.
>> + */
>> +#define XEN_MAX_MEM_REGIONS 1
> Why is this in the public header? I can only find Xen-internal uses.

It will also be used in the init-dom0less app which is the toolstack side.

>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -946,6 +946,33 @@ struct xen_domctl_paging_mempool {
>>       uint64_aligned_t size; /* Size in bytes. */
>>   };
>>   
>> +#ifndef XEN_MAX_MEM_REGIONS
>> +#define XEN_MAX_MEM_REGIONS 1
>> +#endif
>> +
>> +struct xen_mem_region {
>> +    uint64_aligned_t start;
>> +    uint64_aligned_t size;
>> +#define XEN_MEM_REGION_DEFAULT    0
> I can't spot any use of this. What's its purpose?

I can drop it. My original intention was to define a default type, since 
struct arch_domain is zalloc-ed.

>> +#define XEN_MEM_REGION_MAGIC      1
>> +    uint32_t         type;
>> +    /* Must be zero */
>> +    uint32_t         pad;
> This being OUT only, I don't think the comment makes sense. I'd omit it
> completely; if you absolutely want one, please say "will" instead of "must".

Sure, I will follow your suggestion. Thanks.

Kind regards,
Henry


>
> Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  2024-04-18 12:54   ` Jan Beulich
@ 2024-04-19  2:31     ` Henry Wang
  2024-04-19  6:18       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Henry Wang @ 2024-04-19  2:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Anthony PERARD, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Alec Kwapis, xen-devel,
	Daniel Smith

Hi Jan,

On 4/18/2024 8:54 PM, Jan Beulich wrote:
> On 09.04.2024 06:53, Henry Wang wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -155,6 +155,14 @@ static void increase_reservation(struct memop_args *a)
>>       a->nr_done = i;
>>   }
>>   
>> +/*
>> + * Alias of _MEMF_no_refcount to avoid introduction of a new, single-use flag.
>> + * This flag should be used for populate_physmap() only as a re-purposing of
>> + * _MEMF_no_refcount to force a non-1:1 allocation from domheap.
>> + */
>> +#define _MEMF_force_heap_alloc _MEMF_no_refcount
>> +#define  MEMF_force_heap_alloc (1U<<_MEMF_force_heap_alloc)
> Nit (style): Blanks around << please.
>
> Also do you really need both constants? I don't think so.
>
> Plus please make sure to #undef the constant once no longer needed, to
> help spotting / avoiding misuses.

Sounds good, I will fix the NIT, drop the first #define and properly add 
#undef.

>> @@ -219,7 +227,8 @@ static void populate_physmap(struct memop_args *a)
>>           }
>>           else
>>           {
>> -            if ( is_domain_direct_mapped(d) )
>> +            if ( is_domain_direct_mapped(d) &&
>> +                 !(a->memflags & MEMF_force_heap_alloc) )
>>               {
>>                   mfn = _mfn(gpfn);
>>   
>> @@ -246,7 +255,8 @@ static void populate_physmap(struct memop_args *a)
>>   
>>                   mfn = _mfn(gpfn);
>>               }
>> -            else if ( is_domain_using_staticmem(d) )
>> +            else if ( is_domain_using_staticmem(d) &&
>> +                      !(a->memflags & MEMF_force_heap_alloc) )
>>               {
>>                   /*
>>                    * No easy way to guarantee the retrieved pages are contiguous,
>> @@ -271,6 +281,14 @@ static void populate_physmap(struct memop_args *a)
>>               }
>>               else
>>               {
>> +                /*
>> +                 * Avoid passing MEMF_force_heap_alloc down to
>> +                 * alloc_domheap_pages() where the meaning would be the
>> +                 * original MEMF_no_refcount.
>> +                 */
>> +                if ( unlikely(a->memflags & MEMF_force_heap_alloc) )
>> +                    a->memflags &= ~MEMF_force_heap_alloc;
> As asked before: Why the if()?

I think there is no need to clear the flag if it is not set. But you are 
correct, the if is not needed. I can drop it.

>> @@ -1404,6 +1422,7 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>       {
>>       case XENMEM_increase_reservation:
>>       case XENMEM_decrease_reservation:
>> +    case XENMEM_populate_physmap_heap_alloc:
>>       case XENMEM_populate_physmap:
>>           if ( copy_from_guest(&reservation, arg, 1) )
>>               return start_extent;
> Nit or not: Please insert the new case label last.

Sure.

>> @@ -1433,6 +1452,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>                && (reservation.mem_flags & XENMEMF_populate_on_demand) )
>>               args.memflags |= MEMF_populate_on_demand;
>>   
>> +        /* Assert flag is not set from construct_memop_from_reservation(). */
>> +        ASSERT(!(args.memflags & MEMF_force_heap_alloc));
>> +        if ( op == XENMEM_populate_physmap_heap_alloc )
>> +            args.memflags |= MEMF_force_heap_alloc;
> Wouldn't this more logically live ...
>
>>           if ( xsm_memory_adjust_reservation(XSM_TARGET, curr_d, d) )
>>           {
>>               rcu_unlock_domain(d);
>> @@ -1453,7 +1477,7 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           case XENMEM_decrease_reservation:
>>               decrease_reservation(&args);
>>               break;
> here, as
>
>            case XENMEM_populate_physmap_heap_alloc:
>                ...
>                fallthrough;

Ok.

>> -        default: /* XENMEM_populate_physmap */
>> +        default: /* XENMEM_populate_{physmap, physmap_heap_alloc} */
> Otherwise: Just XENMEM_populate_physmap{,_heap_alloc} perhaps?

Sounds good, thanks for the suggestion.

>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -21,6 +21,7 @@
>>   #define XENMEM_increase_reservation 0
>>   #define XENMEM_decrease_reservation 1
>>   #define XENMEM_populate_physmap     6
>> +#define XENMEM_populate_physmap_heap_alloc 29
> Without a comment, how is one supposed to know what the difference is of
> this new sub-op compared to the "normal" one? I actually wonder whether
> referring to a Xen internal (allocation requested to come from the heap)
> is actually a good idea here. I'm inclined to suggest to name this after
> the purpose it has from the guest or tool stack perspective.
>
> Speaking of which: Is this supposed to be guest-accessible, or is it
> intended for tool-stack use only (I have to admit I don't even know where
> init-dom0less actually runs)? In the latter case that also wants enforcing.
> This may require an adjustment to the XSM hook in use here. Cc-ing Daniel
> for possible advice.

This sub-op should be called by the init-dom0less application (toolstack 
side), which runs in Dom0. Daniel has proposed an alternative solution 
based on hypfs. If we decide to go that route, I think I will rewrite the 
series. I will wait for the discussion to settle. Thanks 
for looping in Daniel!

Kind regards,
Henry

>
> Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map
  2024-04-19  2:27     ` Henry Wang
@ 2024-04-19  6:16       ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-04-19  6:16 UTC (permalink / raw)
  To: Henry Wang
  Cc: Anthony PERARD, Juergen Gross, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Alec Kwapis, xen-devel

On 19.04.2024 04:27, Henry Wang wrote:
> On 4/18/2024 8:37 PM, Jan Beulich wrote:
>> On 09.04.2024 06:53, Henry Wang wrote:
>>> --- a/xen/include/public/arch-arm.h
>>> +++ b/xen/include/public/arch-arm.h
>>> @@ -223,6 +223,13 @@ typedef uint64_t xen_pfn_t;
>>>    */
>>>   #define XEN_LEGACY_MAX_VCPUS 1
>>>   
>>> +/*
>>> + * Maximum number of memory map regions for guest memory layout.
>>> + * Used by XEN_DOMCTL_get_mem_map, currently there is only one region
>>> + * for the guest magic pages.
>>> + */
>>> +#define XEN_MAX_MEM_REGIONS 1
>> Why is this in the public header? I can only find Xen-internal uses.
> 
> It will also be used in the init-dom0less app which is the toolstack side.

I've looked there. It's only a convenience to use it there. Imo you want to
do the buffer sizing dynamically (utilizing the change to the hypercall
implementation that I talked you into) and drop this constant from the
public interface.

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  2024-04-19  2:31     ` Henry Wang
@ 2024-04-19  6:18       ` Jan Beulich
  2024-04-19  6:30         ` Henry Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-04-19  6:18 UTC (permalink / raw)
  To: Henry Wang
  Cc: Anthony PERARD, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Alec Kwapis, xen-devel,
	Daniel Smith

On 19.04.2024 04:31, Henry Wang wrote:
> On 4/18/2024 8:54 PM, Jan Beulich wrote:
>> On 09.04.2024 06:53, Henry Wang wrote:
>>> --- a/xen/include/public/memory.h
>>> +++ b/xen/include/public/memory.h
>>> @@ -21,6 +21,7 @@
>>>   #define XENMEM_increase_reservation 0
>>>   #define XENMEM_decrease_reservation 1
>>>   #define XENMEM_populate_physmap     6
>>> +#define XENMEM_populate_physmap_heap_alloc 29
>> Without a comment, how is one supposed to know what the difference is of
>> this new sub-op compared to the "normal" one? I actually wonder whether
>> referring to a Xen internal (allocation requested to come from the heap)
>> is actually a good idea here. I'm inclined to suggest to name this after
>> the purpose it has from the guest or tool stack perspective.
>>
>> Speaking of which: Is this supposed to be guest-accessible, or is it
>> intended for tool-stack use only (I have to admit I don't even know where
>> init-dom0less actually runs)? In the latter case that also wants enforcing.
>> This may require an adjustment to the XSM hook in use here. Cc-ing Daniel
>> for possible advice.
> 
> This sub-op should be called by the init-dom0less application (toolstack 
> side), which runs in Dom0.

I'm puzzled: How can init-dom0less (note its name!) run in Dom0, when there
is none?

Jan

> Daniel has proposed an alternative solution 
> which is based on the hypfs. If we decide to go that route, I think I 
> will rewrite the series. I will wait for the discussion settled. Thanks 
> for looping in Daniel!
> 
> Kind regards,
> Henry



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  2024-04-19  6:18       ` Jan Beulich
@ 2024-04-19  6:30         ` Henry Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-19  6:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Anthony PERARD, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Alec Kwapis, xen-devel,
	Daniel Smith

Hi Jan,

On 4/19/2024 2:18 PM, Jan Beulich wrote:
> On 19.04.2024 04:31, Henry Wang wrote:
>> On 4/18/2024 8:54 PM, Jan Beulich wrote:
>>> On 09.04.2024 06:53, Henry Wang wrote:
>>>> --- a/xen/include/public/memory.h
>>>> +++ b/xen/include/public/memory.h
>>>> @@ -21,6 +21,7 @@
>>>>    #define XENMEM_increase_reservation 0
>>>>    #define XENMEM_decrease_reservation 1
>>>>    #define XENMEM_populate_physmap     6
>>>> +#define XENMEM_populate_physmap_heap_alloc 29
>>> Without a comment, how is one supposed to know what the difference is of
>>> this new sub-op compared to the "normal" one? I actually wonder whether
>>> referring to a Xen internal (allocation requested to come from the heap)
>>> is actually a good idea here. I'm inclined to suggest to name this after
>>> the purpose it has from the guest or tool stack perspective.
>>>
>>> Speaking of which: Is this supposed to be guest-accessible, or is it
>>> intended for tool-stack use only (I have to admit I don't even know where
>>> init-dom0less actually runs)? In the latter case that also wants enforcing.
>>> This may require an adjustment to the XSM hook in use here. Cc-ing Daniel
>>> for possible advice.
>> This sub-op should be called by the init-dom0less application (toolstack
>> side), which runs in Dom0.
> I'm puzzled: How can init-dom0less (note its name!) run in Dom0, when there
> is none?

[1] is the original patch that introduced this application (more details 
can be found in the cover letter of the series containing [1]). I think 
the use case for this application is to let dom0less domUs use PV drivers 
when dom0 and dom0less domUs exist at the same time. There was a 
discussion regarding the naming confusion, see the commit message of [2], 
but I cannot remember whether that discussion has settled.

[1] 
https://lore.kernel.org/xen-devel/20220505001656.395419-6-sstabellini@kernel.org/
[2] 
https://lore.kernel.org/xen-devel/20230630091210.3742121-1-luca.fancellu@arm.com/

Kind regards,
Henry

> Jan
>
>> Daniel has proposed an alternative solution
>> which is based on the hypfs. If we decide to go that route, I think I
>> will rewrite the series. I will wait for the discussion settled. Thanks
>> for looping in Daniel!
>>
>> Kind regards,
>> Henry



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs
  2024-04-18 14:16 ` [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Daniel P. Smith
  2024-04-19  1:45   ` Henry Wang
@ 2024-04-25 22:18   ` Stefano Stabellini
  2024-04-26  5:22     ` Henry Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Stefano Stabellini @ 2024-04-25 22:18 UTC (permalink / raw)
  To: Daniel P. Smith
  Cc: Henry Wang, xen-devel, Anthony PERARD, Juergen Gross,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk

On Thu, 18 Apr 2024, Daniel P. Smith wrote:
> On 4/9/24 00:53, Henry Wang wrote:
> > An error message can seen from the init-dom0less application on
> > direct-mapped 1:1 domains:
> > ```
> > Allocating magic pages
> > memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
> > Error on alloc magic pages
> > ```
> > 
> > This is because populate_physmap() automatically assumes gfn == mfn
> > for direct mapped domains. This cannot be true for the magic pages
> > that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
> > helper application executed in Dom0. For domain using statically
> > allocated memory but not 1:1 direct-mapped, similar error "failed to
> > retrieve a reserved page" can be seen as the reserved memory list
> > is empty at that time.
> > 
> > This series tries to fix this issue using a DOMCTL-based approach,
> > because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
> > and inform the toolstack about the region found by hypervisor for
> > mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
> > guest memory map, currently only used for the magic page regions.
> > Patch 2 generalized the extended region finding logic so that it can
> > be reused for other use cases such as finding 1:1 domU magic regions.
> > Patch 3 uses the same approach as finding the extended regions to find
> > the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
> > hardcoding all base addresses of guest magic region in the init-dom0less
> > application by consuming the newly introduced DOMCTL. Patch 5 is a
> > simple patch to do some code duplication clean-up in xc.
> 
> Hey Henry,
> 
> To help provide some perspective, these issues are not experienced with
> hyperlaunch. This is because we understood early on that you cannot move a
> lightweight version of the toolstack into hypervisor init and not provide a
> mechanism to communicate what it did to the runtime control plane. We
> evaluated the possible mechanism, to include introducing a new hypercall op,
> and ultimately settled on using hypfs. The primary reason is this information
> is static data that, while informative later, is only necessary for the
> control plane to understand the state of the system. As a result, hyperlaunch
> is able to allocate any and all special pages required as part of domain
> construction and communicate their addresses to the control plane. As for XSM,
> hypfs is already protected and at this time we do not see any domain builder
> information needing to be restricted separately from the data already present
> in hypfs.
> 
> I would like to make the suggestion that instead of continuing down this path,
> perhaps you might consider adopting the hyperlaunch usage of hypfs. Then
> adjust dom0less domain construction to allocate the special pages at
> construction time. The original hyperlaunch series includes a patch that
> provides the helper app for the xenstore announcement. And I can provide you
> with updated versions if that would be helpful.

I also think that the new domctl is not needed and that the dom0less
domain builder should allocate the magic pages. On ARM, we already
allocate HVM_PARAM_CALLBACK_IRQ during dom0less domain build and set
HVM_PARAM_STORE_PFN to ~0ULL. I think it would be only natural to extend
that code to also allocate the magic pages and set HVM_PARAM_STORE_PFN
(and others) correctly. If we do it that way it is simpler and
consistent with the HVM_PARAM_CALLBACK_IRQ allocation, and we don't even
need hypfs. Currently we do not enable hypfs in our safety
certifiability configuration.
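
On the init-dom0less side that would then reduce to roughly the below
(sketch only, assuming the dom0less builder fills the param in):

    uint64_t xs_pfn = 0;

    /* Read back what the domain builder stored, instead of assuming
       GUEST_MAGIC_BASE. */
    if ( xc_hvm_param_get(xch, domid, HVM_PARAM_STORE_PFN, &xs_pfn) ||
         xs_pfn == ~0ULL )
        return -1;   /* builder did not allocate the magic pages */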


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs
  2024-04-25 22:18   ` Stefano Stabellini
@ 2024-04-26  5:22     ` Henry Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Henry Wang @ 2024-04-26  5:22 UTC (permalink / raw)
  To: Stefano Stabellini, Daniel P. Smith
  Cc: xen-devel, Anthony PERARD, Juergen Gross, Andrew Cooper,
	George Dunlap, Jan Beulich, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

Hi Stefano, Daniel,

On 4/26/2024 6:18 AM, Stefano Stabellini wrote:
> On Thu, 18 Apr 2024, Daniel P. Smith wrote:
>> On 4/9/24 00:53, Henry Wang wrote:
>>> An error message can seen from the init-dom0less application on
>>> direct-mapped 1:1 domains:
>>> ```
>>> Allocating magic pages
>>> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
>>> Error on alloc magic pages
>>> ```
>>>
>>> This is because populate_physmap() automatically assumes gfn == mfn
>>> for direct mapped domains. This cannot be true for the magic pages
>>> that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
>>> helper application executed in Dom0. For domain using statically
>>> allocated memory but not 1:1 direct-mapped, similar error "failed to
>>> retrieve a reserved page" can be seen as the reserved memory list
>>> is empty at that time.
>>>
>>> This series tries to fix this issue using a DOMCTL-based approach,
>>> because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
>>> and inform the toolstack about the region found by hypervisor for
>>> mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
>>> guest memory map, currently only used for the magic page regions.
>>> Patch 2 generalized the extended region finding logic so that it can
>>> be reused for other use cases such as finding 1:1 domU magic regions.
>>> Patch 3 uses the same approach as finding the extended regions to find
>>> the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
>>> hardcoding all base addresses of guest magic region in the init-dom0less
>>> application by consuming the newly introduced DOMCTL. Patch 5 is a
>>> simple patch to do some code duplication clean-up in xc.
>> Hey Henry,
>>
>> To help provide some perspective, these issues are not experienced with
>> hyperlaunch. This is because we understood early on that you cannot move a
>> lightweight version of the toolstack into hypervisor init and not provide a
>> mechanism to communicate what it did to the runtime control plane. We
>> evaluated the possible mechanism, to include introducing a new hypercall op,
>> and ultimately settled on using hypfs. The primary reason is this information
>> is static data that, while informative later, is only necessary for the
>> control plane to understand the state of the system. As a result, hyperlaunch
>> is able to allocate any and all special pages required as part of domain
>> construction and communicate their addresses to the control plane. As for XSM,
>> hypfs is already protected and at this time we do not see any domain builder
>> information needing to be restricted separately from the data already present
>> in hypfs.
>>
>> I would like to make the suggestion that instead of continuing down this path,
>> perhaps you might consider adopting the hyperlaunch usage of hypfs. Then
>> adjust dom0less domain construction to allocate the special pages at
>> construction time. The original hyperlaunch series includes a patch that
>> provides the helper app for the xenstore announcement. And I can provide you
>> with updated versions if that would be helpful.
> I also think that the new domctl is not needed and that the dom0less
> domain builder should allocate the magic pages.

Yes, this is indeed much better. Thanks, Daniel, for suggesting this.

> On ARM, we already
> allocate HVM_PARAM_CALLBACK_IRQ during dom0less domain build and set
> HVM_PARAM_STORE_PFN to ~0ULL. I think it would be only natural to extend
> that code to also allocate the magic pages and set HVM_PARAM_STORE_PFN
> (and others) correctly. If we do it that way it is simpler and
> consistent with the HVM_PARAM_CALLBACK_IRQ allocation, and we don't even
> need hypfs. Currently we do not enable hypfs in our safety
> certifiability configuration.

It is indeed very important to consider the safety certification (which 
I completely missed). Therefore I've sent an updated version based on 
HVMOP [1]. In the future we can switch to hypfs if needed.

[1] 
https://lore.kernel.org/xen-devel/20240426031455.579637-1-xin.wang2@amd.com/

Kind regards,
Henry



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-04-26  5:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-09  4:53 [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Henry Wang
2024-04-09  4:53 ` [PATCH v4 1/5] xen/domctl, tools: Introduce a new domctl to get guest memory map Henry Wang
2024-04-18 12:37   ` Jan Beulich
2024-04-19  2:27     ` Henry Wang
2024-04-19  6:16       ` Jan Beulich
2024-04-09  4:53 ` [PATCH v4 2/5] xen/arm: Generalize the extended region finding logic Henry Wang
2024-04-09  4:53 ` [PATCH v4 3/5] xen/arm: Find unallocated spaces for magic pages of direct-mapped domU Henry Wang
2024-04-09  4:53 ` [PATCH v4 4/5] xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less Henry Wang
2024-04-18 12:54   ` Jan Beulich
2024-04-19  2:31     ` Henry Wang
2024-04-19  6:18       ` Jan Beulich
2024-04-19  6:30         ` Henry Wang
2024-04-09  4:53 ` [PATCH v4 5/5] tools/libs/ctrl: Simplify xc helpers related to populate_physmap() Henry Wang
2024-04-18 14:16 ` [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs Daniel P. Smith
2024-04-19  1:45   ` Henry Wang
2024-04-25 22:18   ` Stefano Stabellini
2024-04-26  5:22     ` Henry Wang
