All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space")
@ 2021-09-29 22:52 Oleksandr Tyshchenko
  2021-09-29 22:52 ` [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-29 22:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné,
	Henry Wang, Bertrand Marquis, Wei Chen

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

You can find an initial discussion at [1]-[4].

The extended region (safe range) is a region of guest physical address space
which is unused and could be safely used to create grant/foreign mappings instead
of wasting real RAM pages from the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and advertised
to it via "reg" property under hypervisor node in the guest device-tree
(the indexes for extended regions are 1...N).
No device tree bindings update is needed, guest infers the presense of extended
regions from the number of regions in "reg" property.
New compatible/property will be needed (but only after this patch [5] or alternative
goes in) to indicate that "region 0 is safe to use". Until this patch is merged it is
not safe to use extended regions for the grant table space.

The extended regions are calculated differently for direct mapped Dom0 (with and without
IOMMU) and non-direct mapped DomUs.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain currently.
- The ACPI case is not covered.

Xen patch series is also available at [6]. The corresponding Linux patch series is at [7]
for now (last 4 patches).

Tested on Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with updated virtio-disk backend [8]
running in Dom0 (256MB RAM) and DomD (2GB RAM). In both cases the backend pre-maps DomU memory
which is 3GB. All foreign memory gets mapped into extended regions (so the amount of RAM in
the backend domain is not reduced). No issues were observed.

[1] https://lore.kernel.org/xen-devel/1627489110-25633-1-git-send-email-olekstysh@gmail.com/
[2] https://lore.kernel.org/xen-devel/1631034578-12598-1-git-send-email-olekstysh@gmail.com/
[3] https://lore.kernel.org/xen-devel/1631297924-8658-1-git-send-email-olekstysh@gmail.com/
[4] https://lore.kernel.org/xen-devel/1632437334-12015-1-git-send-email-olekstysh@gmail.com/
[5] https://lore.kernel.org/xen-devel/1632425551-18910-1-git-send-email-olekstysh@gmail.com/
[6] https://github.com/otyshchenko1/xen/commits/map_opt_ml5
[7] https://github.com/otyshchenko1/linux/commits/map_opt_ml4
[8] https://github.com/otyshchenko1/virtio-disk/commits/map_opt_next

Oleksandr Tyshchenko (3):
  xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  xen/arm: Add handling of extended regions for Dom0
  libxl/arm: Add handling of extended regions for DomU

 docs/misc/xen-command-line.pandoc |  11 ++
 tools/include/libxl.h             |   7 +
 tools/libs/light/libxl.c          |   2 +
 tools/libs/light/libxl_arm.c      |  70 +++++++++-
 tools/libs/light/libxl_types.idl  |   2 +
 tools/xl/xl_info.c                |   2 +
 xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/sysctl.c             |   2 +
 xen/arch/x86/sysctl.c             |   2 +
 xen/include/public/arch-arm.h     |   3 +
 xen/include/public/sysctl.h       |   4 +-
 11 files changed, 382 insertions(+), 9 deletions(-)

-- 
2.7.4



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-09-29 22:52 [PATCH V4 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
@ 2021-09-29 22:52 ` Oleksandr Tyshchenko
  2021-09-30 23:00   ` Stefano Stabellini
  2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
  2021-09-29 22:52 ` [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
  2 siblings, 1 reply; 25+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-29 22:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

We need to pass info about maximum supported guest address
space size to the toolstack on Arm in order to properly
calculate the base and size of the extended region (safe range)
for the guest. The extended region is unused address space which
could be safely used by domain for foreign/grant mappings on Arm.
The extended region itself will be handled by the subsequents
patch.

Use p2m_ipa_bits variable on Arm, the x86 equivalent is
hap_paddr_bits.

As we change the size of structure bump the interface version.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
---
Please note, that review comments for the RFC version [1] haven't been addressed yet.
It is not forgotten, some clarification is needed. It will be addressed for the next version.

[1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/

Changes RFC -> V2:
   - update patch subject/description
   - replace arch-specific sub-struct with common gpaddr_bits
     field and update code to reflect that

Changes V2 -> V3:
   - make the field uint8_t and add uint8_t pad[7] after
   - remove leading blanks in libxl.h

Changes V3 -> V4:
   - also print gpaddr_bits from output_physinfo()
   - add Michal's R-b
---
 tools/include/libxl.h            | 7 +++++++
 tools/libs/light/libxl.c         | 2 ++
 tools/libs/light/libxl_types.idl | 2 ++
 tools/xl/xl_info.c               | 2 ++
 xen/arch/arm/sysctl.c            | 2 ++
 xen/arch/x86/sysctl.c            | 2 ++
 xen/include/public/sysctl.h      | 4 +++-
 7 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index b9ba16d..63f9550 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -856,6 +856,13 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_PHYSINFO_MAX_POSSIBLE_MFN 1
 
 /*
+ * LIBXL_HAVE_PHYSINFO_GPADDR_BITS
+ *
+ * If this is defined, libxl_physinfo has a "gpaddr_bits" field.
+ */
+#define LIBXL_HAVE_PHYSINFO_GPADDR_BITS 1
+
+/*
  * LIBXL_HAVE_DOMINFO_OUTSTANDING_MEMKB 1
  *
  * If this is defined, libxl_dominfo will contain a MemKB type field called
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index 204eb0b..c86624f 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -405,6 +405,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
     physinfo->cap_vmtrace =
         !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_vmtrace);
 
+    physinfo->gpaddr_bits = xcphysinfo.gpaddr_bits;
+
     GC_FREE;
     return 0;
 }
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 3f9fff6..bf27fe6 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -1061,6 +1061,8 @@ libxl_physinfo = Struct("physinfo", [
     ("cap_shadow", bool),
     ("cap_iommu_hap_pt_share", bool),
     ("cap_vmtrace", bool),
+
+    ("gpaddr_bits", uint8),
     ], dir=DIR_OUT)
 
 libxl_connectorinfo = Struct("connectorinfo", [
diff --git a/tools/xl/xl_info.c b/tools/xl/xl_info.c
index 8383e4a..dfbbeaa 100644
--- a/tools/xl/xl_info.c
+++ b/tools/xl/xl_info.c
@@ -221,6 +221,8 @@ static void output_physinfo(void)
          info.cap_vmtrace ? " vmtrace" : ""
         );
 
+    maybe_printf("gpaddr_bits            : %d\n", info.gpaddr_bits);
+
     vinfo = libxl_get_version_info(ctx);
     if (vinfo) {
         i = (1 << 20) / vinfo->pagesize;
diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
index f87944e..91dca4f 100644
--- a/xen/arch/arm/sysctl.c
+++ b/xen/arch/arm/sysctl.c
@@ -15,6 +15,8 @@
 void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
 {
     pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm | XEN_SYSCTL_PHYSCAP_hap;
+
+    pi->gpaddr_bits = p2m_ipa_bits;
 }
 
 long arch_do_sysctl(struct xen_sysctl *sysctl,
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index aff52a1..7b14865 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -135,6 +135,8 @@ void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_hap;
     if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_shadow;
+
+    pi->gpaddr_bits = hap_paddr_bits;
 }
 
 long arch_do_sysctl(
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 039ccf8..0450a78 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -35,7 +35,7 @@
 #include "domctl.h"
 #include "physdev.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
 
 /*
  * Read console content from Xen buffer ring.
@@ -120,6 +120,8 @@ struct xen_sysctl_physinfo {
     uint64_aligned_t outstanding_pages;
     uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
     uint32_t hw_cap[8];
+    uint8_t gpaddr_bits;
+    uint8_t pad[7];
 };
 
 /*
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-29 22:52 [PATCH V4 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
  2021-09-29 22:52 ` [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
@ 2021-09-29 22:52 ` Oleksandr Tyshchenko
  2021-09-30 15:36   ` Luca Fancellu
                     ` (3 more replies)
  2021-09-29 22:52 ` [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
  2 siblings, 4 replies; 25+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-29 22:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain
  currently.
- The ACPI case is not covered.

***

As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
the algorithm to choose extended regions for it is different
in comparison with the algorithm for non-direct mapped DomU.
What is more, that extended regions should be chosen differently
whether IOMMU is enabled or not.

Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
holes found in host device-tree if otherwise. Make sure that
extended regions are 2MB-aligned and located within maximum possible
addressable physical memory range. The minimum size of extended
region is 64MB. The maximum number of extended regions is 128,
which is an artificial limit to minimize code changes (we reuse
struct meminfo to describe extended regions, so there are an array
field for 128 elements).

It worth mentioning that unallocated memory solution (when the IOMMU
is disabled) will work safely until Dom0 is able to allocate memory
outside of the original range.

Also introduce command line option to be able to globally enable or
disable support for extended regions for Dom0 (enabled by default).

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Please note, we need to decide which approach to use in find_unallocated_memory(),
you can find details at:
https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/

Changes RFC -> V2:
   - update patch description
   - drop uneeded "extended-region" DT property

Changes V2 -> V3:
   - update patch description
   - add comment for "size" calculation in add_ext_regions()
   - clarify "end" calculation in find_unallocated_memory() and
     find_memory_holes()
   - only pick up regions with size >= 64MB
   - allocate reg dynamically instead of keeping on the stack in
     make_hypervisor_node()
   - do not show warning for 32-bit domain
   - drop Linux specific limits EXT_REGION_*
   - also cover "ranges" property in find_memory_holes()
   - add command line arg to enable/disable extended region support

Changes V3 -> V4:
  - update opt_ext_regions purpose and comment in code
  - reorganize make_hypervisor_node() to move allocations after initial
    checks, allocate only required amount of elements instead of maximum
    possible
---
 docs/misc/xen-command-line.pandoc |  11 ++
 xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 294 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 177e656..5cae4ad 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
 Note that specifying zero as domU value means zero, while for dom0 it means
 to use the default.
 
+### ext_regions (Arm)
+> `= <boolean>`
+
+> Default : `true`
+
+Flag to enable or disable support for extended regions for Dom0.
+
+Extended regions are ranges of unused address space exposed to Dom0 as
+"safe to use" for special memory mappings. Disable if your board device
+tree is incomplete.
+
 ### flask
 > `= permissive | enforcing | late | disabled`
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d233d63..c5afbe2 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -34,6 +34,10 @@
 static unsigned int __initdata opt_dom0_max_vcpus;
 integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
 
+/* If true, the extended regions support is enabled for dom0 */
+static bool __initdata opt_ext_regions = true;
+boolean_param("ext_regions", opt_ext_regions);
+
 static u64 __initdata dom0_mem;
 static bool __initdata dom0_mem_set;
 
@@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
     return res;
 }
 
+static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
+{
+    struct meminfo *ext_regions = data;
+    paddr_t start, size;
+
+    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
+        return 0;
+
+    /* Both start and size of the extended region should be 2MB aligned */
+    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
+    if ( start > e )
+        return 0;
+
+    /*
+     * e is actually "end-1" because it is called by rangeset functions
+     * which are inclusive of the last address.
+     */
+    e += 1;
+    size = (e - start) & ~(SZ_2M - 1);
+    if ( size < MB(64) )
+        return 0;
+
+    ext_regions->bank[ext_regions->nr_banks].start = start;
+    ext_regions->bank[ext_regions->nr_banks].size = size;
+    ext_regions->nr_banks++;
+
+    return 0;
+}
+
+static int __init find_unallocated_memory(const struct kernel_info *kinfo,
+                                          struct meminfo *ext_regions)
+{
+    const struct meminfo *assign_mem = &kinfo->mem;
+    struct rangeset *unalloc_mem;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find unallocated memory for extended regions\n");
+
+    unalloc_mem = rangeset_new(NULL, NULL, 0);
+    if ( !unalloc_mem )
+        return -ENOMEM;
+
+    /* Start with all available RAM */
+    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
+    {
+        start = bootinfo.mem.bank[i].start;
+        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
+        res = rangeset_add_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove RAM assigned to Dom0 */
+    for ( i = 0; i < assign_mem->nr_banks; i++ )
+    {
+        start = assign_mem->bank[i].start;
+        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
+        res = rangeset_remove_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove reserved-memory regions */
+    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        start = bootinfo.reserved_mem.bank[i].start;
+        end = bootinfo.reserved_mem.bank[i].start +
+            bootinfo.reserved_mem.bank[i].size;
+        res = rangeset_remove_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove grant table region */
+    start = kinfo->gnttab_start;
+    end = kinfo->gnttab_start + kinfo->gnttab_size;
+    res = rangeset_remove_range(unalloc_mem, start, end - 1);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_report_ranges(unalloc_mem, start, end,
+                                 add_ext_regions, ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(unalloc_mem);
+
+    return res;
+}
+
+static int __init find_memory_holes(const struct kernel_info *kinfo,
+                                    struct meminfo *ext_regions)
+{
+    struct dt_device_node *np;
+    struct rangeset *mem_holes;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find memory holes for extended regions\n");
+
+    mem_holes = rangeset_new(NULL, NULL, 0);
+    if ( !mem_holes )
+        return -ENOMEM;
+
+    /* Start with maximum possible addressable physical memory range */
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_add_range(mem_holes, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    /*
+     * Remove regions described by "reg" and "ranges" properties where
+     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
+     */
+    dt_for_each_device_node( dt_host, np )
+    {
+        unsigned int naddr;
+        u64 addr, size;
+
+        naddr = dt_number_of_address(np);
+
+        for ( i = 0; i < naddr; i++ )
+        {
+            res = dt_device_get_address(np, i, &addr, &size);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
+                       i, dt_node_full_name(np));
+                goto out;
+            }
+
+            start = addr & PAGE_MASK;
+            end = PAGE_ALIGN(addr + size);
+            res = rangeset_remove_range(mem_holes, start, end - 1);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                       start, end);
+                goto out;
+            }
+        }
+
+        if ( dt_device_type_is_equal(np, "pci" ) )
+        {
+            unsigned int range_size, nr_ranges;
+            int na, ns, pna;
+            const __be32 *ranges;
+            u32 len;
+
+            /*
+             * Looking for non-empty ranges property which in this context
+             * describes the PCI host bridge aperture.
+             */
+            ranges = dt_get_property(np, "ranges", &len);
+            if ( !ranges || !len )
+                continue;
+
+            pna = dt_n_addr_cells(np);
+            na = dt_child_n_addr_cells(np);
+            ns = dt_child_n_size_cells(np);
+            range_size = pna + na + ns;
+            nr_ranges = len / sizeof(__be32) / range_size;
+
+            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
+            {
+                /* Skip the child address and get the parent (CPU) address */
+                addr = dt_read_number(ranges + na, pna);
+                size = dt_read_number(ranges + na + pna, ns);
+
+                start = addr & PAGE_MASK;
+                end = PAGE_ALIGN(addr + size);
+                res = rangeset_remove_range(mem_holes, start, end - 1);
+                if ( res )
+                {
+                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                           start, end);
+                    goto out;
+                }
+            }
+        }
+    }
+
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_report_ranges(mem_holes, start, end,
+                                 add_ext_regions,  ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(mem_holes);
+
+    return res;
+}
+
 static int __init make_hypervisor_node(struct domain *d,
                                        const struct kernel_info *kinfo,
                                        int addrcells, int sizecells)
@@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
     const char compat[] =
         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
         "xen,xen";
-    __be32 reg[4];
+    __be32 *reg, *cells;
     gic_interrupt_t intr;
-    __be32 *cells;
     int res;
     void *fdt = kinfo->fdt;
+    struct meminfo *ext_regions = NULL;
+    unsigned int i, nr_ext_regions;
 
     dt_dprintk("Create hypervisor node\n");
 
@@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
     if ( res )
         return res;
 
+    if ( !opt_ext_regions )
+    {
+        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
+        nr_ext_regions = 0;
+    }
+    else if ( is_32bit_domain(d) )
+    {
+        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
+        nr_ext_regions = 0;
+    }
+    else
+    {
+        ext_regions = xzalloc(struct meminfo);
+        if ( !ext_regions )
+            return -ENOMEM;
+
+        if ( !is_iommu_enabled(d) )
+            res = find_unallocated_memory(kinfo, ext_regions);
+        else
+            res = find_memory_holes(kinfo, ext_regions);
+
+        if ( res )
+            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
+        nr_ext_regions = ext_regions->nr_banks;
+    }
+
+    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
+    if ( !reg )
+    {
+        xfree(ext_regions);
+        return -ENOMEM;
+    }
+
     /* reg 0 is grant table space */
     cells = &reg[0];
     dt_child_set_range(&cells, addrcells, sizecells,
                        kinfo->gnttab_start, kinfo->gnttab_size);
+    /* reg 1...N are extended regions */
+    for ( i = 0; i < nr_ext_regions; i++ )
+    {
+        u64 start = ext_regions->bank[i].start;
+        u64 size = ext_regions->bank[i].size;
+
+        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
+                   i, start, start + size);
+
+        dt_child_set_range(&cells, addrcells, sizecells, start, size);
+    }
+
     res = fdt_property(fdt, "reg", reg,
-                       dt_cells_to_size(addrcells + sizecells));
+                       dt_cells_to_size(addrcells + sizecells) *
+                       (nr_ext_regions + 1));
+    xfree(ext_regions);
+    xfree(reg);
+
     if ( res )
         return res;
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-09-29 22:52 [PATCH V4 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
  2021-09-29 22:52 ` [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
  2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
@ 2021-09-29 22:52 ` Oleksandr Tyshchenko
  2021-10-05 19:42   ` Oleksandr
  2 siblings, 1 reply; 25+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-29 22:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Juergen Gross, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain
  currently.
- The ACPI case is not covered.

***

The algorithm to choose extended regions for non-direct mapped
DomU is simpler in comparison with the algorithm for direct mapped
Dom0. As we have a lot of unused space above 4GB, provide single
2MB-aligned region from the second RAM bank taking into the account
the maximum supported guest address space size and the amount of
memory assigned to the guest. The maximum size of the region is 128GB.
The minimum size is 64MB.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Ian Jackson <iwj@xenproject.org>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Tested-by: Michal Orzel <michal.orzel@arm.com>
---
Changes RFC -> V2:
   - update patch description
   - drop uneeded "extended-region" DT property
   - clear reg array in finalise_ext_region() and add a TODO

Changes V2 -> V3:
   - update patch description, comments in code
   - only pick up regions with size >= 64MB
   - move the region calculation to make_hypervisor_node() and drop
     finalise_ext_region()
   - extend the list of arguments for make_hypervisor_node()
   - do not show warning for 32-bit domain
   - change the region alignment from 1GB to 2MB
   - move EXT_REGION_SIZE to public/arch-arm.h

Changes V3 -> V4:
   - add R-b, A-b and T-b
---
 tools/libs/light/libxl_arm.c  | 70 +++++++++++++++++++++++++++++++++++++++----
 xen/include/public/arch-arm.h |  3 ++
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index e3140a6..a67b68e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -598,9 +598,17 @@ static int make_timer_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
+
 static int make_hypervisor_node(libxl__gc *gc, void *fdt,
-                                const libxl_version_info *vers)
+                                const libxl_version_info *vers,
+                                const libxl_domain_build_info *b_info,
+                                const struct xc_dom_image *dom)
 {
+    uint64_t region_size = 0, region_base, ramsize, bank1size,
+        bank1end_align, bank1end_max;
+    uint8_t gpaddr_bits;
+    libxl_physinfo physinfo;
     int res;
     gic_interrupt intr;
 
@@ -615,9 +623,61 @@ static int make_hypervisor_node(libxl__gc *gc, void *fdt,
                               "xen,xen");
     if (res) return res;
 
-    /* reg 0 is grant table space */
-    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
+    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
+        LOG(DEBUG, "The extended regions are only supported for 64-bit guest currently");
+        goto out;
+    }
+
+    res = libxl_get_physinfo(CTX, &physinfo);
+    assert(!res);
+
+    gpaddr_bits = physinfo.gpaddr_bits;
+    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
+
+    /*
+     * Try to allocate single 2MB-aligned extended region from the second RAM
+     * bank (above 4GB) taking into the account the maximum supported guest
+     * address space size and the amount of memory assigned to the guest.
+     * As the guest memory layout is not populated yet we cannot rely on
+     * dom->rambank_size[1], so calculate the actual size of the second bank
+     * using "max_memkb" value.
+     */
+    bank1end_max = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
+    ramsize = b_info->max_memkb * 1024;
+    if (ramsize <= GUEST_RAM0_SIZE)
+        bank1size = 0;
+    else
+        bank1size = ramsize - GUEST_RAM0_SIZE;
+    bank1end_align = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(bank1size);
+
+    if (bank1end_max <= bank1end_align) {
+        LOG(WARN, "The extended region cannot be allocated, not enough space");
+        goto out;
+    }
+
+    if (bank1end_max - bank1end_align > GUEST_EXT_REGION_MAX_SIZE) {
+        region_base = bank1end_max - GUEST_EXT_REGION_MAX_SIZE;
+        region_size = GUEST_EXT_REGION_MAX_SIZE;
+    } else {
+        region_base = bank1end_align;
+        region_size = bank1end_max - bank1end_align;
+    }
+
+out:
+    /*
+     * The region 0 for grant table space must be always present. If we managed
+     * to allocate the extended region then insert it as region 1.
+     */
+    if (region_size >= GUEST_EXT_REGION_MIN_SIZE) {
+        LOG(DEBUG, "Extended region: %#"PRIx64"->%#"PRIx64"\n",
+            region_base, region_base + region_size);
+
+        res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                                2, GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE,
+                                region_base, region_size);
+    } else
+        res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                                1, GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
     if (res) return res;
 
     /*
@@ -963,7 +1023,7 @@ next_resize:
         }
 
         FDT( make_timer_node(gc, fdt, ainfo, state->clock_frequency) );
-        FDT( make_hypervisor_node(gc, fdt, vers) );
+        FDT( make_hypervisor_node(gc, fdt, vers, info, dom) );
 
         if (info->arch_arm.vuart == LIBXL_VUART_TYPE_SBSA_UART)
             FDT( make_vpl011_uart_node(gc, fdt, ainfo, dom) );
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 6b5a5f8..df59933 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -449,6 +449,9 @@ typedef uint64_t xen_callback_t;
 #define GUEST_RAM_BANK_BASES   { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
 #define GUEST_RAM_BANK_SIZES   { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
 
+#define GUEST_EXT_REGION_MAX_SIZE   xen_mk_ullong(0x2000000000) /* 128GB */
+#define GUEST_EXT_REGION_MIN_SIZE   xen_mk_ullong(0x0004000000) /* 64MB */
+
 /* Current supported guest VCPUs */
 #define GUEST_MAX_VCPUS 128
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
@ 2021-09-30 15:36   ` Luca Fancellu
  2021-09-30 22:53   ` Stefano Stabellini
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Luca Fancellu @ 2021-09-30 15:36 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Volodymyr Babchuk



> On 29 Sep 2021, at 23:52, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>  currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>   - update patch description
>   - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>   - update patch description
>   - add comment for "size" calculation in add_ext_regions()
>   - clarify "end" calculation in find_unallocated_memory() and
>     find_memory_holes()
>   - only pick up regions with size >= 64MB
>   - allocate reg dynamically instead of keeping on the stack in
>     make_hypervisor_node()
>   - do not show warning for 32-bit domain
>   - drop Linux specific limits EXT_REGION_*
>   - also cover "ranges" property in find_memory_holes()
>   - add command line arg to enable/disable extended region support
> 
> Changes V3 -> V4:
>  - update opt_ext_regions purpose and comment in code
>  - reorganize make_hypervisor_node() to move allocations after initial
>    checks, allocate only required amount of elements instead of maximum
>    possible
> ---
> docs/misc/xen-command-line.pandoc |  11 ++
> xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
> 2 files changed, 294 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..5cae4ad 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
> Note that specifying zero as domU value means zero, while for dom0 it means
> to use the default.
> 
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to enable or disable support for extended regions for Dom0.
> +
> +Extended regions are ranges of unused address space exposed to Dom0 as
> +"safe to use" for special memory mappings. Disable if your board device
> +tree is incomplete.
> +
> ### flask
>> `= permissive | enforcing | late | disabled`
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..c5afbe2 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
> static unsigned int __initdata opt_dom0_max_vcpus;
> integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
> 
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
> static u64 __initdata dom0_mem;
> static bool __initdata dom0_mem_set;
> 
> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>     return res;
> }
> 
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}
> +
> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)
> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)
> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;
> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {
> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in this context
> +             * describes the PCI host bridge aperture.
> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +            range_size = pna + na + ns;
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +                size = dt_read_number(ranges + na + pna, ns);
> +
> +                start = addr & PAGE_MASK;
> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
> static int __init make_hypervisor_node(struct domain *d,
>                                        const struct kernel_info *kinfo,
>                                        int addrcells, int sizecells)
> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>     const char compat[] =
>         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>         "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>     gic_interrupt_t intr;
> -    __be32 *cells;
>     int res;
>     void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
> 
>     dt_dprintk("Create hypervisor node\n");
> 
> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>     if ( res )
>         return res;
> 
> +    if ( !opt_ext_regions )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
> +        nr_ext_regions = 0;
> +    }
> +    else if ( is_32bit_domain(d) )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
> +        nr_ext_regions = 0;
> +    }
> +    else
> +    {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
> +    }
> +
>     /* reg 0 is grant table space */
>     cells = &reg[0];
>     dt_child_set_range(&cells, addrcells, sizecells,
>                        kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < nr_ext_regions; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);
> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>     res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (nr_ext_regions + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>     if ( res )
>         return res;
> 
> -- 
> 2.7.4
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
  2021-09-30 15:36   ` Luca Fancellu
@ 2021-09-30 22:53   ` Stefano Stabellini
  2021-10-02  0:33   ` Stefano Stabellini
  2021-10-04  6:59   ` Julien Grall
  3 siblings, 0 replies; 25+ messages in thread
From: Stefano Stabellini @ 2021-09-30 22:53 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Volodymyr Babchuk

On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>    - update patch description
>    - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>    - update patch description
>    - add comment for "size" calculation in add_ext_regions()
>    - clarify "end" calculation in find_unallocated_memory() and
>      find_memory_holes()
>    - only pick up regions with size >= 64MB
>    - allocate reg dynamically instead of keeping on the stack in
>      make_hypervisor_node()
>    - do not show warning for 32-bit domain
>    - drop Linux specific limits EXT_REGION_*
>    - also cover "ranges" property in find_memory_holes()
>    - add command line arg to enable/disable extended region support
> 
> Changes V3 -> V4:
>   - update opt_ext_regions purpose and comment in code
>   - reorganize make_hypervisor_node() to move allocations after initial
>     checks, allocate only required amount of elements instead of maximum
>     possible
> ---
>  docs/misc/xen-command-line.pandoc |  11 ++
>  xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 294 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..5cae4ad 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>  Note that specifying zero as domU value means zero, while for dom0 it means
>  to use the default.
>  
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to enable or disable support for extended regions for Dom0.
> +
> +Extended regions are ranges of unused address space exposed to Dom0 as
> +"safe to use" for special memory mappings. Disable if your board device
> +tree is incomplete.
> +
>  ### flask
>  > `= permissive | enforcing | late | disabled`
>  
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..c5afbe2 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
>  static unsigned int __initdata opt_dom0_max_vcpus;
>  integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>  
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
>  static u64 __initdata dom0_mem;
>  static bool __initdata dom0_mem_set;
>  
> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>      return res;
>  }
>  
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}
> +
> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)
> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)
> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;
> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {
> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in this context
> +             * describes the PCI host bridge aperture.
> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +            range_size = pna + na + ns;
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +                size = dt_read_number(ranges + na + pna, ns);
> +
> +                start = addr & PAGE_MASK;
> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
>  static int __init make_hypervisor_node(struct domain *d,
>                                         const struct kernel_info *kinfo,
>                                         int addrcells, int sizecells)
> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>          "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>      gic_interrupt_t intr;
> -    __be32 *cells;
>      int res;
>      void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
>  
>      dt_dprintk("Create hypervisor node\n");
>  
> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>  
> +    if ( !opt_ext_regions )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
> +        nr_ext_regions = 0;
> +    }
> +    else if ( is_32bit_domain(d) )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
> +        nr_ext_regions = 0;
> +    }
> +    else
> +    {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
> +    }
> +
>      /* reg 0 is grant table space */
>      cells = &reg[0];
>      dt_child_set_range(&cells, addrcells, sizecells,
>                         kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < nr_ext_regions; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);
> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>      res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (nr_ext_regions + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>      if ( res )
>          return res;
>  
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-09-29 22:52 ` [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
@ 2021-09-30 23:00   ` Stefano Stabellini
  2021-10-01  7:50     ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Stefano Stabellini @ 2021-09-30 23:00 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Ian Jackson, Wei Liu,
	Anthony PERARD, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Juergen Gross,
	Volodymyr Babchuk, Roger Pau Monné

On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> We need to pass info about maximum supported guest address
> space size to the toolstack on Arm in order to properly
> calculate the base and size of the extended region (safe range)
> for the guest. The extended region is unused address space which
> could be safely used by domain for foreign/grant mappings on Arm.
> The extended region itself will be handled by the subsequents
> patch.
> 
> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
> hap_paddr_bits.
> 
> As we change the size of structure bump the interface version.
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Please note, that review comments for the RFC version [1] haven't been addressed yet.
> It is not forgotten, some clarification is needed. It will be addressed for the next version.
> 
> [1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
> 
> Changes RFC -> V2:
>    - update patch subject/description
>    - replace arch-specific sub-struct with common gpaddr_bits
>      field and update code to reflect that
> 
> Changes V2 -> V3:
>    - make the field uint8_t and add uint8_t pad[7] after
>    - remove leading blanks in libxl.h
> 
> Changes V3 -> V4:
>    - also print gpaddr_bits from output_physinfo()
>    - add Michal's R-b
> ---
>  tools/include/libxl.h            | 7 +++++++
>  tools/libs/light/libxl.c         | 2 ++
>  tools/libs/light/libxl_types.idl | 2 ++
>  tools/xl/xl_info.c               | 2 ++
>  xen/arch/arm/sysctl.c            | 2 ++
>  xen/arch/x86/sysctl.c            | 2 ++
>  xen/include/public/sysctl.h      | 4 +++-
>  7 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/include/libxl.h b/tools/include/libxl.h
> index b9ba16d..63f9550 100644
> --- a/tools/include/libxl.h
> +++ b/tools/include/libxl.h
> @@ -856,6 +856,13 @@ typedef struct libxl__ctx libxl_ctx;
>  #define LIBXL_HAVE_PHYSINFO_MAX_POSSIBLE_MFN 1
>  
>  /*
> + * LIBXL_HAVE_PHYSINFO_GPADDR_BITS
> + *
> + * If this is defined, libxl_physinfo has a "gpaddr_bits" field.
> + */
> +#define LIBXL_HAVE_PHYSINFO_GPADDR_BITS 1
> +
> +/*
>   * LIBXL_HAVE_DOMINFO_OUTSTANDING_MEMKB 1
>   *
>   * If this is defined, libxl_dominfo will contain a MemKB type field called
> diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
> index 204eb0b..c86624f 100644
> --- a/tools/libs/light/libxl.c
> +++ b/tools/libs/light/libxl.c
> @@ -405,6 +405,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
>      physinfo->cap_vmtrace =
>          !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_vmtrace);
>  
> +    physinfo->gpaddr_bits = xcphysinfo.gpaddr_bits;
> +
>      GC_FREE;
>      return 0;
>  }
> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index 3f9fff6..bf27fe6 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -1061,6 +1061,8 @@ libxl_physinfo = Struct("physinfo", [
>      ("cap_shadow", bool),
>      ("cap_iommu_hap_pt_share", bool),
>      ("cap_vmtrace", bool),
> +
> +    ("gpaddr_bits", uint8),
>      ], dir=DIR_OUT)
>  
>  libxl_connectorinfo = Struct("connectorinfo", [
> diff --git a/tools/xl/xl_info.c b/tools/xl/xl_info.c
> index 8383e4a..dfbbeaa 100644
> --- a/tools/xl/xl_info.c
> +++ b/tools/xl/xl_info.c
> @@ -221,6 +221,8 @@ static void output_physinfo(void)
>           info.cap_vmtrace ? " vmtrace" : ""
>          );
>  
> +    maybe_printf("gpaddr_bits            : %d\n", info.gpaddr_bits);
> +
>      vinfo = libxl_get_version_info(ctx);
>      if (vinfo) {
>          i = (1 << 20) / vinfo->pagesize;
> diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
> index f87944e..91dca4f 100644
> --- a/xen/arch/arm/sysctl.c
> +++ b/xen/arch/arm/sysctl.c
> @@ -15,6 +15,8 @@
>  void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
>  {
>      pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm | XEN_SYSCTL_PHYSCAP_hap;
> +
> +    pi->gpaddr_bits = p2m_ipa_bits;
>  }
>  
>  long arch_do_sysctl(struct xen_sysctl *sysctl,
> diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
> index aff52a1..7b14865 100644
> --- a/xen/arch/x86/sysctl.c
> +++ b/xen/arch/x86/sysctl.c
> @@ -135,6 +135,8 @@ void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
>          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hap;
>      if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
>          pi->capabilities |= XEN_SYSCTL_PHYSCAP_shadow;
> +
> +    pi->gpaddr_bits = hap_paddr_bits;
>  }
>  
>  long arch_do_sysctl(
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 039ccf8..0450a78 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -35,7 +35,7 @@
>  #include "domctl.h"
>  #include "physdev.h"
>  
> -#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
> +#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
>  
>  /*
>   * Read console content from Xen buffer ring.
> @@ -120,6 +120,8 @@ struct xen_sysctl_physinfo {
>      uint64_aligned_t outstanding_pages;
>      uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
>      uint32_t hw_cap[8];
> +    uint8_t gpaddr_bits;
> +    uint8_t pad[7];
>  };
>  
>  /*
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-09-30 23:00   ` Stefano Stabellini
@ 2021-10-01  7:50     ` Jan Beulich
  2021-10-01  8:19       ` Oleksandr
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2021-10-01  7:50 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Oleksandr Tyshchenko, Ian Jackson, Wei Liu,
	Anthony PERARD, Andrew Cooper, George Dunlap, Julien Grall,
	Juergen Gross, Volodymyr Babchuk, Roger Pau Monné,
	Oleksandr Tyshchenko

On 01.10.2021 01:00, Stefano Stabellini wrote:
> On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> We need to pass info about maximum supported guest address
>> space size to the toolstack on Arm in order to properly
>> calculate the base and size of the extended region (safe range)
>> for the guest. The extended region is unused address space which
>> could be safely used by domain for foreign/grant mappings on Arm.
>> The extended region itself will be handled by the subsequents
>> patch.
>>
>> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>> hap_paddr_bits.
>>
>> As we change the size of structure bump the interface version.
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> 
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

I have to admit that I'm a little puzzled to see these R-b-s when ...

>> Please note, that review comments for the RFC version [1] haven't been addressed yet.
>> It is not forgotten, some clarification is needed. It will be addressed for the next version.
>>
>> [1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/

... Oleksandr makes clear this patch isn't really ready yet. The tags
could misguide a committer into putting in this series despite the
open issue(s).

Jan



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-10-01  7:50     ` Jan Beulich
@ 2021-10-01  8:19       ` Oleksandr
  2021-10-01 23:24         ` Stefano Stabellini
  0 siblings, 1 reply; 25+ messages in thread
From: Oleksandr @ 2021-10-01  8:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Anthony PERARD, Andrew Cooper, George Dunlap,
	Julien Grall, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné


On 01.10.21 10:50, Jan Beulich wrote:

Hi Jan

> On 01.10.2021 01:00, Stefano Stabellini wrote:
>> On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> We need to pass info about maximum supported guest address
>>> space size to the toolstack on Arm in order to properly
>>> calculate the base and size of the extended region (safe range)
>>> for the guest. The extended region is unused address space which
>>> could be safely used by domain for foreign/grant mappings on Arm.
>>> The extended region itself will be handled by the subsequents
>>> patch.
>>>
>>> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>>> hap_paddr_bits.
>>>
>>> As we change the size of structure bump the interface version.
>>>
>>> Suggested-by: Julien Grall <jgrall@amazon.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> I have to admit that I'm a little puzzled to see these R-b-s when ...
>
>>> Please note, that review comments for the RFC version [1] haven't been addressed yet.
>>> It is not forgotten, some clarification is needed. It will be addressed for the next version.
>>>
>>> [1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
> ... Oleksandr makes clear this patch isn't really ready yet.

Unfortunately, this is true. I am still waiting for the clarification [1]



> The tags
> could misguide a committer into putting in this series despite the
> open issue(s).
>
> Jan


[1] 
https://lore.kernel.org/xen-devel/6a2a183d-c9d8-df2a-41aa-b25283fab197@gmail.com/

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-10-01  8:19       ` Oleksandr
@ 2021-10-01 23:24         ` Stefano Stabellini
  2021-10-02  7:35           ` Julien Grall
  0 siblings, 1 reply; 25+ messages in thread
From: Stefano Stabellini @ 2021-10-01 23:24 UTC (permalink / raw)
  To: Oleksandr
  Cc: Jan Beulich, Stefano Stabellini, xen-devel, Oleksandr Tyshchenko,
	Ian Jackson, Wei Liu, Anthony PERARD, Andrew Cooper,
	George Dunlap, Julien Grall, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné,
	Bertrand.Marquis

Bertrand, see comment on ID_AA64MMFR0_EL1 below, any ideas?


On Fri, 1 Oct 2021, Oleksandr wrote:
> On 01.10.21 10:50, Jan Beulich wrote:
> > On 01.10.2021 01:00, Stefano Stabellini wrote:
> > > On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > 
> > > > We need to pass info about maximum supported guest address
> > > > space size to the toolstack on Arm in order to properly
> > > > calculate the base and size of the extended region (safe range)
> > > > for the guest. The extended region is unused address space which
> > > > could be safely used by domain for foreign/grant mappings on Arm.
> > > > The extended region itself will be handled by the subsequents
> > > > patch.
> > > > 
> > > > Use p2m_ipa_bits variable on Arm, the x86 equivalent is
> > > > hap_paddr_bits.
> > > > 
> > > > As we change the size of structure bump the interface version.
> > > > 
> > > > Suggested-by: Julien Grall <jgrall@amazon.com>
> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> > I have to admit that I'm a little puzzled to see these R-b-s when ...
> > 
> > > > Please note, that review comments for the RFC version [1] haven't been
> > > > addressed yet.
> > > > It is not forgotten, some clarification is needed. It will be addressed
> > > > for the next version.
> > > > 
> > > > [1]
> > > > https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
> > ... Oleksandr makes clear this patch isn't really ready yet.
> 
> Unfortunately, this is true. I am still waiting for the clarification [1]

Although I was aware of comments to older versions, this is actually the
first version of this patch that I reviewed with any level of details; I
didn't read previous comments very closely. I tried to find any bugs or
problems with it and I couldn't see any, so I gave my reviewed-by. I
should have clarified that was meant for the ARM part as I don't have a
full understanding of the implications of using hap_paddr_bits on x86
for VM migration.


But let me take this opportunity to say that although I think the
hypercall is OK, I wish we didn't need this patch at all: it is
problematic because it touches tools, x86 and ARM hypervisor code all
together. It needs at least three acks/reviewed-by to get accepted: from
an x86 maintainer, an arm maintainer and from a tools maintainer. I
don't say this to criticize the patch acceptance process: this patch
makes changes to an existing hypercall so it is only fair that it needs
to go through extra levels of scrutiny. For the sake of simplicity and
decoupling (reducing dependencies between patches and between
components), I think it would be best to introduce an #define for the
minimum value of gpaddr_bits and then move this patch at the end of the
series; that way it becomes optional. Unfortunately the minimum value
is 32 (in practice I have never seen less than 40 but the architecture
supports 32 as minimum).


Actually, the info we are looking for is already exposed via
ID_AA64MMFR0_EL1. ID_AA64MMFR0_EL1 can be read from a virtual machine,
and Linux let userspace read it [1]. Regardless of this patch series, we
should make sure that Xen exposes the right mm64.pa_range value to guest
virtual machines. If that is done right, then you can just add support
for reading ID_AA64MMFR0_EL1 in libxl/libxc and then we don't need any
hypercall modifications changes.

So, in theory we already have all the interfaces we need, but in
practice they don't work: unfortunaly both Xen and Linux mark
ID_AA64MMFR0_EL1 as FTR_HIDDEN in cpufeature.c so neither Linux from
Xen, not userspace from Linux can actually read the real value :-/
They always read zero.

(Also I think we have an issue today with p2m_restrict_ipa_bits not
updating the mm64.pa_range value. I think that it should be fixed.)

Bertrand, do you have any ideas in regards to ID_AA64MMFR0_EL1?

If not, maybe we could just go with
#define MIN_GPADDR_BITS 32


[1] https://01.org/linuxgraphics/gfx-docs/drm/arm64/cpu-feature-registers.html


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
  2021-09-30 15:36   ` Luca Fancellu
  2021-09-30 22:53   ` Stefano Stabellini
@ 2021-10-02  0:33   ` Stefano Stabellini
  2021-10-02 12:40     ` Oleksandr
  2021-10-04  6:41     ` Julien Grall
  2021-10-04  6:59   ` Julien Grall
  3 siblings, 2 replies; 25+ messages in thread
From: Stefano Stabellini @ 2021-10-02  0:33 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Volodymyr Babchuk

On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

I thought about it and I decided to commit this patch because it doesn't
actually need anything from the other two patches, and it is very useful
on its own (both of them are for domU, while this one is for dom0).

In regards to Julien's suggestion: as explained in earlier emails I
prefer this version but I don't have a strong opinion. If Julien still
prefers the other approach we can still change it in time for 4.16
(Oleksandr has already implemented both and I am happy to review.)


> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>    - update patch description
>    - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>    - update patch description
>    - add comment for "size" calculation in add_ext_regions()
>    - clarify "end" calculation in find_unallocated_memory() and
>      find_memory_holes()
>    - only pick up regions with size >= 64MB
>    - allocate reg dynamically instead of keeping on the stack in
>      make_hypervisor_node()
>    - do not show warning for 32-bit domain
>    - drop Linux specific limits EXT_REGION_*
>    - also cover "ranges" property in find_memory_holes()
>    - add command line arg to enable/disable extended region support
> 
> Changes V3 -> V4:
>   - update opt_ext_regions purpose and comment in code
>   - reorganize make_hypervisor_node() to move allocations after initial
>     checks, allocate only required amount of elements instead of maximum
>     possible
> ---
>  docs/misc/xen-command-line.pandoc |  11 ++
>  xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 294 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..5cae4ad 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>  Note that specifying zero as domU value means zero, while for dom0 it means
>  to use the default.
>  
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to enable or disable support for extended regions for Dom0.
> +
> +Extended regions are ranges of unused address space exposed to Dom0 as
> +"safe to use" for special memory mappings. Disable if your board device
> +tree is incomplete.
> +
>  ### flask
>  > `= permissive | enforcing | late | disabled`
>  
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..c5afbe2 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
>  static unsigned int __initdata opt_dom0_max_vcpus;
>  integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>  
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
>  static u64 __initdata dom0_mem;
>  static bool __initdata dom0_mem_set;
>  
> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>      return res;
>  }
>  
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}
> +
> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)
> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)
> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;
> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {
> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in this context
> +             * describes the PCI host bridge aperture.
> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +            range_size = pna + na + ns;
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +                size = dt_read_number(ranges + na + pna, ns);
> +
> +                start = addr & PAGE_MASK;
> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
>  static int __init make_hypervisor_node(struct domain *d,
>                                         const struct kernel_info *kinfo,
>                                         int addrcells, int sizecells)
> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>          "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>      gic_interrupt_t intr;
> -    __be32 *cells;
>      int res;
>      void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
>  
>      dt_dprintk("Create hypervisor node\n");
>  
> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>  
> +    if ( !opt_ext_regions )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
> +        nr_ext_regions = 0;
> +    }
> +    else if ( is_32bit_domain(d) )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
> +        nr_ext_regions = 0;
> +    }
> +    else
> +    {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
> +    }
> +
>      /* reg 0 is grant table space */
>      cells = &reg[0];
>      dt_child_set_range(&cells, addrcells, sizecells,
>                         kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < nr_ext_regions; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);
> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>      res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (nr_ext_regions + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>      if ( res )
>          return res;
>  
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-10-01 23:24         ` Stefano Stabellini
@ 2021-10-02  7:35           ` Julien Grall
  2021-10-02 14:08             ` Oleksandr
  0 siblings, 1 reply; 25+ messages in thread
From: Julien Grall @ 2021-10-02  7:35 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr, Jan Beulich, xen-devel, Oleksandr Tyshchenko,
	Ian Jackson, Wei Liu, Anthony PERARD, Andrew Cooper,
	George Dunlap, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné,
	Bertrand Marquis

[-- Attachment #1: Type: text/plain, Size: 6037 bytes --]

Hi

On Sat, 2 Oct 2021, 01:24 Stefano Stabellini, <sstabellini@kernel.org>
wrote:

> Bertrand, see comment on ID_AA64MMFR0_EL1 below, any ideas?
>
>
> On Fri, 1 Oct 2021, Oleksandr wrote:
> > On 01.10.21 10:50, Jan Beulich wrote:
> > > On 01.10.2021 01:00, Stefano Stabellini wrote:
> > > > On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > >
> > > > > We need to pass info about maximum supported guest address
> > > > > space size to the toolstack on Arm in order to properly
> > > > > calculate the base and size of the extended region (safe range)
> > > > > for the guest. The extended region is unused address space which
> > > > > could be safely used by domain for foreign/grant mappings on Arm.
> > > > > The extended region itself will be handled by the subsequents
> > > > > patch.
> > > > >
> > > > > Use p2m_ipa_bits variable on Arm, the x86 equivalent is
> > > > > hap_paddr_bits.
> > > > >
> > > > > As we change the size of structure bump the interface version.
> > > > >
> > > > > Suggested-by: Julien Grall <jgrall@amazon.com>
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com
> >
> > > > > Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> > > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> > > I have to admit that I'm a little puzzled to see these R-b-s when ...
> > >
> > > > > Please note, that review comments for the RFC version [1] haven't
> been
> > > > > addressed yet.
> > > > > It is not forgotten, some clarification is needed. It will be
> addressed
> > > > > for the next version.
> > > > >
> > > > > [1]
> > > > >
> https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
> > > ... Oleksandr makes clear this patch isn't really ready yet.
> >
> > Unfortunately, this is true. I am still waiting for the clarification [1]
>
> Although I was aware of comments to older versions, this is actually the
> first version of this patch that I reviewed with any level of details; I
> didn't read previous comments very closely. I tried to find any bugs or
> problems with it and I couldn't see any, so I gave my reviewed-by. I
> should have clarified that was meant for the ARM part as I don't have a
> full understanding of the implications of using hap_paddr_bits on x86
> for VM migration.


>
> But let me take this opportunity to say that although I think the
> hypercall is OK, I wish we didn't need this patch at all: it is
> problematic because it touches tools, x86 and ARM hypervisor code all
> together. It needs at least three acks/reviewed-by to get accepted: from
> an x86 maintainer, an arm maintainer and from a tools maintainer. I
> don't say this to criticize the patch acceptance process: this patch
> makes changes to an existing hypercall so it is only fair that it needs
> to go through extra levels of scrutiny. For the sake of simplicity and
> decoupling (reducing dependencies between patches and between
> components), I think it would be best to introduce an #define for the
> minimum value of gpaddr_bits and then move this patch at the end of the
> series; that way it becomes optional.


It depends what you mean by optional. Yes we can add hack to avoid the
hypercall... But the more scalable solution is the hypercall.

I am slightly concerned that if we don't push for the hypercall now, then
there will be no incentive to do it afterwards...

So I went through Andrew's e-mail to understand what's the request. I
understand that there are some problem with migration. But it doesn't look
like we need to solve them now. Instead,  AFAICT, his main ask for this
series is to switch to a domctl.

It seems the conversation is simply stuck on waiting for Andrew to provide
details on what would look like. Did we ping Andrew on IRC?

Unfortunately the minimum value
> is 32 (in practice I have never seen less than 40 but the architecture
> supports 32 as minimum).


>
> Actually, the info we are looking for is already exposed via
> ID_AA64MMFR0_EL1. ID_AA64MMFR0_EL1 can be read from a virtual machine,
> and Linux let userspace read it [1]. Regardless of this patch series, we
> should make sure that Xen exposes the right mm64.pa_range value to guest
> virtual machines. If that is done right, then you can just add support
> for reading ID_AA64MMFR0_EL1 in libxl/libxc and then we don't need any
> hypercall modifications changes.


From my understanding, from a VM PoV "pa_range" should represent the size
of the guest physical address space.

Today, it happens that every VM is using the same P2M size. However, I
would rather not make such assumption in the userspace.


> So, in theory we already have all the interfaces we need, but in
> practice they don't work: unfortunaly both Xen and Linux mark
> ID_AA64MMFR0_EL1 as FTR_HIDDEN in cpufeature.c so neither Linux from
> Xen, not userspace from Linux can actually read the real value :-/
> They always read zero.
>
> (Also I think we have an issue today with p2m_restrict_ipa_bits not
> updating the mm64.pa_range value. I think that it should be fixed.)


It looks like it. That should be handled in a separate patch though.


> Bertrand, do you have any ideas in regards to ID_AA64MMFR0_EL1?
>
> If not, maybe we could just go with
> #define MIN_GPADDR_BITS 32.


The toolstack would have to consider it as the "maximum" because it may not
be safe to expose anything above.

With 32, we are going to be limited in term of space we can find.

We could potentially use 40 bits as a minimum. Although it still feels a
bit of a hack to me given that the IOMMU may restrict it further and the
architecture can in theory support less.

Overall, I still strongly prefer the hypercall approach. If a common one is
difficult to achieve, then we can extend the domctl to create a domain to
provide the p2m_bits (in the same way as we deal for the GIC version) in an
arch specific way.

Cheers,


>
> [1]
> https://01.org/linuxgraphics/gfx-docs/drm/arm64/cpu-feature-registers.html
>

[-- Attachment #2: Type: text/html, Size: 9473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-10-02  0:33   ` Stefano Stabellini
@ 2021-10-02 12:40     ` Oleksandr
  2021-10-04  6:41     ` Julien Grall
  1 sibling, 0 replies; 25+ messages in thread
From: Oleksandr @ 2021-10-02 12:40 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Wei Liu,
	Volodymyr Babchuk


On 02.10.21 03:33, Stefano Stabellini wrote:

Hi Stefano

> On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> I thought about it and I decided to commit this patch because it doesn't
> actually need anything from the other two patches, and it is very useful
> on its own (both of them are for domU, while this one is for dom0).

Thank you.


>
> In regards to Julien's suggestion: as explained in earlier emails I
> prefer this version but I don't have a strong opinion. If Julien still
> prefers the other approach we can still change it in time for 4.16
> (Oleksandr has already implemented both and I am happy to review.)

Sure, we will able to do it if needed.


>
>
>> ---
>> Please note, we need to decide which approach to use in find_unallocated_memory(),
>> you can find details at:
>> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
>>
>> Changes RFC -> V2:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>>
>> Changes V2 -> V3:
>>     - update patch description
>>     - add comment for "size" calculation in add_ext_regions()
>>     - clarify "end" calculation in find_unallocated_memory() and
>>       find_memory_holes()
>>     - only pick up regions with size >= 64MB
>>     - allocate reg dynamically instead of keeping on the stack in
>>       make_hypervisor_node()
>>     - do not show warning for 32-bit domain
>>     - drop Linux specific limits EXT_REGION_*
>>     - also cover "ranges" property in find_memory_holes()
>>     - add command line arg to enable/disable extended region support
>>
>> Changes V3 -> V4:
>>    - update opt_ext_regions purpose and comment in code
>>    - reorganize make_hypervisor_node() to move allocations after initial
>>      checks, allocate only required amount of elements instead of maximum
>>      possible
>> ---
>>   docs/misc/xen-command-line.pandoc |  11 ++
>>   xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 294 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>> index 177e656..5cae4ad 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>>   Note that specifying zero as domU value means zero, while for dom0 it means
>>   to use the default.
>>   
>> +### ext_regions (Arm)
>> +> `= <boolean>`
>> +
>> +> Default : `true`
>> +
>> +Flag to enable or disable support for extended regions for Dom0.
>> +
>> +Extended regions are ranges of unused address space exposed to Dom0 as
>> +"safe to use" for special memory mappings. Disable if your board device
>> +tree is incomplete.
>> +
>>   ### flask
>>   > `= permissive | enforcing | late | disabled`
>>   
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index d233d63..c5afbe2 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -34,6 +34,10 @@
>>   static unsigned int __initdata opt_dom0_max_vcpus;
>>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>>   
>> +/* If true, the extended regions support is enabled for dom0 */
>> +static bool __initdata opt_ext_regions = true;
>> +boolean_param("ext_regions", opt_ext_regions);
>> +
>>   static u64 __initdata dom0_mem;
>>   static bool __initdata dom0_mem_set;
>>   
>> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>>       return res;
>>   }
>>   
>> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
>> +{
>> +    struct meminfo *ext_regions = data;
>> +    paddr_t start, size;
>> +
>> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> +        return 0;
>> +
>> +    /* Both start and size of the extended region should be 2MB aligned */
>> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> +    if ( start > e )
>> +        return 0;
>> +
>> +    /*
>> +     * e is actually "end-1" because it is called by rangeset functions
>> +     * which are inclusive of the last address.
>> +     */
>> +    e += 1;
>> +    size = (e - start) & ~(SZ_2M - 1);
>> +    if ( size < MB(64) )
>> +        return 0;
>> +
>> +    ext_regions->bank[ext_regions->nr_banks].start = start;
>> +    ext_regions->bank[ext_regions->nr_banks].size = size;
>> +    ext_regions->nr_banks++;
>> +
>> +    return 0;
>> +}
>> +
>> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
>> +                                          struct meminfo *ext_regions)
>> +{
>> +    const struct meminfo *assign_mem = &kinfo->mem;
>> +    struct rangeset *unalloc_mem;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
>> +    if ( !unalloc_mem )
>> +        return -ENOMEM;
>> +
>> +    /* Start with all available RAM */
>> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.mem.bank[i].start;
>> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
>> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove RAM assigned to Dom0 */
>> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
>> +    {
>> +        start = assign_mem->bank[i].start;
>> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove reserved-memory regions */
>> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.reserved_mem.bank[i].start;
>> +        end = bootinfo.reserved_mem.bank[i].start +
>> +            bootinfo.reserved_mem.bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove grant table region */
>> +    start = kinfo->gnttab_start;
>> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
>> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(unalloc_mem, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(unalloc_mem);
>> +
>> +    return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> +                                    struct meminfo *ext_regions)
>> +{
>> +    struct dt_device_node *np;
>> +    struct rangeset *mem_holes;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find memory holes for extended regions\n");
>> +
>> +    mem_holes = rangeset_new(NULL, NULL, 0);
>> +    if ( !mem_holes )
>> +        return -ENOMEM;
>> +
>> +    /* Start with maximum possible addressable physical memory range */
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_add_range(mem_holes, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Remove regions described by "reg" and "ranges" properties where
>> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
>> +     */
>> +    dt_for_each_device_node( dt_host, np )
>> +    {
>> +        unsigned int naddr;
>> +        u64 addr, size;
>> +
>> +        naddr = dt_number_of_address(np);
>> +
>> +        for ( i = 0; i < naddr; i++ )
>> +        {
>> +            res = dt_device_get_address(np, i, &addr, &size);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
>> +                       i, dt_node_full_name(np));
>> +                goto out;
>> +            }
>> +
>> +            start = addr & PAGE_MASK;
>> +            end = PAGE_ALIGN(addr + size);
>> +            res = rangeset_remove_range(mem_holes, start, end - 1);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                       start, end);
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        if ( dt_device_type_is_equal(np, "pci" ) )
>> +        {
>> +            unsigned int range_size, nr_ranges;
>> +            int na, ns, pna;
>> +            const __be32 *ranges;
>> +            u32 len;
>> +
>> +            /*
>> +             * Looking for non-empty ranges property which in this context
>> +             * describes the PCI host bridge aperture.
>> +             */
>> +            ranges = dt_get_property(np, "ranges", &len);
>> +            if ( !ranges || !len )
>> +                continue;
>> +
>> +            pna = dt_n_addr_cells(np);
>> +            na = dt_child_n_addr_cells(np);
>> +            ns = dt_child_n_size_cells(np);
>> +            range_size = pna + na + ns;
>> +            nr_ranges = len / sizeof(__be32) / range_size;
>> +
>> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
>> +            {
>> +                /* Skip the child address and get the parent (CPU) address */
>> +                addr = dt_read_number(ranges + na, pna);
>> +                size = dt_read_number(ranges + na + pna, ns);
>> +
>> +                start = addr & PAGE_MASK;
>> +                end = PAGE_ALIGN(addr + size);
>> +                res = rangeset_remove_range(mem_holes, start, end - 1);
>> +                if ( res )
>> +                {
>> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                           start, end);
>> +                    goto out;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(mem_holes, start, end,
>> +                                 add_ext_regions,  ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(mem_holes);
>> +
>> +    return res;
>> +}
>> +
>>   static int __init make_hypervisor_node(struct domain *d,
>>                                          const struct kernel_info *kinfo,
>>                                          int addrcells, int sizecells)
>> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>>       const char compat[] =
>>           "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>>           "xen,xen";
>> -    __be32 reg[4];
>> +    __be32 *reg, *cells;
>>       gic_interrupt_t intr;
>> -    __be32 *cells;
>>       int res;
>>       void *fdt = kinfo->fdt;
>> +    struct meminfo *ext_regions = NULL;
>> +    unsigned int i, nr_ext_regions;
>>   
>>       dt_dprintk("Create hypervisor node\n");
>>   
>> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>>       if ( res )
>>           return res;
>>   
>> +    if ( !opt_ext_regions )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
>> +        nr_ext_regions = 0;
>> +    }
>> +    else if ( is_32bit_domain(d) )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
>> +        nr_ext_regions = 0;
>> +    }
>> +    else
>> +    {
>> +        ext_regions = xzalloc(struct meminfo);
>> +        if ( !ext_regions )
>> +            return -ENOMEM;
>> +
>> +        if ( !is_iommu_enabled(d) )
>> +            res = find_unallocated_memory(kinfo, ext_regions);
>> +        else
>> +            res = find_memory_holes(kinfo, ext_regions);
>> +
>> +        if ( res )
>> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
>> +        nr_ext_regions = ext_regions->nr_banks;
>> +    }
>> +
>> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
>> +    if ( !reg )
>> +    {
>> +        xfree(ext_regions);
>> +        return -ENOMEM;
>> +    }
>> +
>>       /* reg 0 is grant table space */
>>       cells = &reg[0];
>>       dt_child_set_range(&cells, addrcells, sizecells,
>>                          kinfo->gnttab_start, kinfo->gnttab_size);
>> +    /* reg 1...N are extended regions */
>> +    for ( i = 0; i < nr_ext_regions; i++ )
>> +    {
>> +        u64 start = ext_regions->bank[i].start;
>> +        u64 size = ext_regions->bank[i].size;
>> +
>> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> +                   i, start, start + size);
>> +
>> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> +    }
>> +
>>       res = fdt_property(fdt, "reg", reg,
>> -                       dt_cells_to_size(addrcells + sizecells));
>> +                       dt_cells_to_size(addrcells + sizecells) *
>> +                       (nr_ext_regions + 1));
>> +    xfree(ext_regions);
>> +    xfree(reg);
>> +
>>       if ( res )
>>           return res;
>>   
>> -- 
>> 2.7.4
>>
-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-10-02  7:35           ` Julien Grall
@ 2021-10-02 14:08             ` Oleksandr
  2021-10-04 21:11               ` Stefano Stabellini
  0 siblings, 1 reply; 25+ messages in thread
From: Oleksandr @ 2021-10-02 14:08 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Jan Beulich, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Anthony PERARD, Andrew Cooper, George Dunlap,
	Juergen Gross, Volodymyr Babchuk, Roger Pau Monné,
	Bertrand Marquis

[-- Attachment #1: Type: text/plain, Size: 7963 bytes --]


On 02.10.21 10:35, Julien Grall wrote:

Hi Julien, Stefano.


Thank you for your comments!

> Hi
>
> On Sat, 2 Oct 2021, 01:24 Stefano Stabellini, <sstabellini@kernel.org 
> <mailto:sstabellini@kernel.org>> wrote:
>
>     Bertrand, see comment on ID_AA64MMFR0_EL1 below, any ideas?
>
>
>     On Fri, 1 Oct 2021, Oleksandr wrote:
>     > On 01.10.21 10:50, Jan Beulich wrote:
>     > > On 01.10.2021 01:00, Stefano Stabellini wrote:
>     > > > On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>     > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com
>     <mailto:oleksandr_tyshchenko@epam.com>>
>     > > > >
>     > > > > We need to pass info about maximum supported guest address
>     > > > > space size to the toolstack on Arm in order to properly
>     > > > > calculate the base and size of the extended region (safe
>     range)
>     > > > > for the guest. The extended region is unused address space
>     which
>     > > > > could be safely used by domain for foreign/grant mappings
>     on Arm.
>     > > > > The extended region itself will be handled by the subsequents
>     > > > > patch.
>     > > > >
>     > > > > Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>     > > > > hap_paddr_bits.
>     > > > >
>     > > > > As we change the size of structure bump the interface version.
>     > > > >
>     > > > > Suggested-by: Julien Grall <jgrall@amazon.com
>     <mailto:jgrall@amazon.com>>
>     > > > > Signed-off-by: Oleksandr Tyshchenko
>     <oleksandr_tyshchenko@epam.com <mailto:oleksandr_tyshchenko@epam.com>>
>     > > > > Reviewed-by: Michal Orzel <michal.orzel@arm.com
>     <mailto:michal.orzel@arm.com>>
>     > > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org
>     <mailto:sstabellini@kernel.org>>
>     > > I have to admit that I'm a little puzzled to see these R-b-s
>     when ...
>     > >
>     > > > > Please note, that review comments for the RFC version [1]
>     haven't been
>     > > > > addressed yet.
>     > > > > It is not forgotten, some clarification is needed. It will
>     be addressed
>     > > > > for the next version.
>     > > > >
>     > > > > [1]
>     > > > >
>     https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
>     > > ... Oleksandr makes clear this patch isn't really ready yet.
>     >
>     > Unfortunately, this is true. I am still waiting for the
>     clarification [1]
>
>     Although I was aware of comments to older versions, this is
>     actually the
>     first version of this patch that I reviewed with any level of
>     details; I
>     didn't read previous comments very closely. I tried to find any
>     bugs or
>     problems with it and I couldn't see any, so I gave my reviewed-by. I
>     should have clarified that was meant for the ARM part as I don't
>     have a
>     full understanding of the implications of using hap_paddr_bits on x86
>     for VM migration.
>
>
>
>     But let me take this opportunity to say that although I think the
>     hypercall is OK, I wish we didn't need this patch at all: it is
>     problematic because it touches tools, x86 and ARM hypervisor code all
>     together. It needs at least three acks/reviewed-by to get
>     accepted: from
>     an x86 maintainer, an arm maintainer and from a tools maintainer. I
>     don't say this to criticize the patch acceptance process: this patch
>     makes changes to an existing hypercall so it is only fair that it
>     needs
>     to go through extra levels of scrutiny. For the sake of simplicity and
>     decoupling (reducing dependencies between patches and between
>     components), I think it would be best to introduce an #define for the
>     minimum value of gpaddr_bits and then move this patch at the end
>     of the
>     series; that way it becomes optional.
>
>
> It depends what you mean by optional. Yes we can add hack to avoid the 
> hypercall... But the more scalable solution is the hypercall.
>
> I am slightly concerned that if we don't push for the hypercall now, 
> then there will be no incentive to do it afterwards...
>
> So I went through Andrew's e-mail to understand what's the request. I 
> understand that there are some problem with migration. But it doesn't 
> look like we need to solve them now. Instead,  AFAICT, his main ask 
> for this series is to switch to a domctl.
>
> It seems the conversation is simply stuck on waiting for Andrew to 
> provide details on what would look like. Did we ping Andrew on IRC?
>
>     Unfortunately the minimum value
>     is 32 (in practice I have never seen less than 40 but the architecture
>     supports 32 as minimum).
>
>
>
>     Actually, the info we are looking for is already exposed via
>     ID_AA64MMFR0_EL1. ID_AA64MMFR0_EL1 can be read from a virtual machine,
>     and Linux let userspace read it [1]. Regardless of this patch
>     series, we
>     should make sure that Xen exposes the right mm64.pa_range value to
>     guest
>     virtual machines. If that is done right, then you can just add support
>     for reading ID_AA64MMFR0_EL1 in libxl/libxc and then we don't need any
>     hypercall modifications changes.
>
>
> From my understanding, from a VM PoV "pa_range" should represent the 
> size of the guest physical address space.
>
> Today, it happens that every VM is using the same P2M size. However, I 
> would rather not make such assumption in the userspace.
>
>
>     So, in theory we already have all the interfaces we need, but in
>     practice they don't work: unfortunaly both Xen and Linux mark
>     ID_AA64MMFR0_EL1 as FTR_HIDDEN in cpufeature.c so neither Linux from
>     Xen, not userspace from Linux can actually read the real value :-/
>     They always read zero.
>
>     (Also I think we have an issue today with p2m_restrict_ipa_bits not
>     updating the mm64.pa_range value. I think that it should be fixed.)
>
>
> It looks like it. That should be handled in a separate patch though.
>
>
>     Bertrand, do you have any ideas in regards to ID_AA64MMFR0_EL1?
>
>     If not, maybe we could just go with
>     #define MIN_GPADDR_BITS 32.
>
>
> The toolstack would have to consider it as the "maximum" because it 
> may not be safe to expose anything above.
>
> With 32, we are going to be limited in term of space we can find.
>
> We could potentially use 40 bits as a minimum. Although it still feels 
> a bit of a hack to me given that the IOMMU may restrict it further and 
> the architecture can in theory support less.
>
> Overall, I still strongly prefer the hypercall approach. If a common 
> one is difficult to achieve, then we can extend the domctl to create a 
> domain to provide the p2m_bits (in the same way as we deal for the GIC 
> version) in an arch specific way.


To summarize:
If we don't query the hypervisor to provide gpaddr_bits we have two options:
- The safe option is to use minimum possible value which is 32 bits on 
Arm64. But, there would be of no practical use.
- The unsafe option is to use let's say "default" 40 bits and pray it 
will work in all cases on Arm64 (it is ok on Arm32).

So we definitely need to query the hypervisor. As it turned out the 
sysctl approach is not welcome, in the long term we want to have this 
information per domain. I have been absolutely OK with that valid ask 
since RFC, I just wanted to know what was the preferred way to do this 
(new domctl, existing, etc)...

I analyzed what Julien had suggested regarding pass gpaddr_bits via 
Arm's struct xen_arch_domainconfig (I assume, this should be an OUT 
parameter) and I think it makes sense. Taking into the account that the 
feature freeze date is coming, I will wait a few days, and if there are 
no objections I will send updated version (patch #3 also needs updating 
as it expects the gpaddr_bits to be in physinfo).


>
> Cheers,
>
>
>
>     [1]
>     https://01.org/linuxgraphics/gfx-docs/drm/arm64/cpu-feature-registers.html
>
-- 
Regards,

Oleksandr Tyshchenko


[-- Attachment #2: Type: text/html, Size: 14941 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-10-02  0:33   ` Stefano Stabellini
  2021-10-02 12:40     ` Oleksandr
@ 2021-10-04  6:41     ` Julien Grall
  1 sibling, 0 replies; 25+ messages in thread
From: Julien Grall @ 2021-10-04  6:41 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Wei Liu, Volodymyr Babchuk

Hi Stefano,

On 02/10/2021 02:33, Stefano Stabellini wrote:
> On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> I thought about it and I decided to commit this patch because it doesn't
> actually need anything from the other two patches, and it is very useful
> on its own (both of them are for domU, while this one is for dom0).
> 
> In regards to Julien's suggestion: as explained in earlier emails I
> prefer this version but I don't have a strong opinion. If Julien still
> prefers the other approach we can still change it in time for 4.16
> (Oleksandr has already implemented both and I am happy to review.)

Lets keep the committed approach for the 4.16. This is not something we 
tie into the ABI so it can be modified later on if we find some issues 
(i.e. more and more ranges to exclude).

However, I would still like to see some changes on top of this patch for 
4.16 (will comment separately).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
                     ` (2 preceding siblings ...)
  2021-10-02  0:33   ` Stefano Stabellini
@ 2021-10-04  6:59   ` Julien Grall
  2021-10-04 12:08     ` Oleksandr
  3 siblings, 1 reply; 25+ messages in thread
From: Julien Grall @ 2021-10-04  6:59 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Stefano Stabellini, Wei Liu, Volodymyr Babchuk

Hi Oleksandr,

I saw Stefano committed this patch on Friday. However, I didn't have a 
chance go to through a second time and would like to request some 
follow-up changes.

On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>    currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. 

You explained below why the 128 limits, but I don't see any explanation 
on why 2MB and 64MB.

IIRC, 2MB was to potentally allow superpage mapping. I am not entirely 
sure for 64MB.

Could you add an in-code comment explaining the two limits?

> The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>     - update patch description
>     - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>     - update patch description
>     - add comment for "size" calculation in add_ext_regions()
>     - clarify "end" calculation in find_unallocated_memory() and
>       find_memory_holes()
>     - only pick up regions with size >= 64MB
>     - allocate reg dynamically instead of keeping on the stack in
>       make_hypervisor_node()
>     - do not show warning for 32-bit domain
>     - drop Linux specific limits EXT_REGION_*
>     - also cover "ranges" property in find_memory_holes()
>     - add command line arg to enable/disable extended region support
> 
> Changes V3 -> V4:
>    - update opt_ext_regions purpose and comment in code
>    - reorganize make_hypervisor_node() to move allocations after initial
>      checks, allocate only required amount of elements instead of maximum
>      possible
> ---
>   docs/misc/xen-command-line.pandoc |  11 ++
>   xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-

The document in docs/misc/arm/device-tree/guest.txt needs to be updated 
to reflect the change in the binding.

>   2 files changed, 294 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..5cae4ad 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>   Note that specifying zero as domU value means zero, while for dom0 it means
>   to use the default.
>   
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to enable or disable support for extended regions for Dom0.
> +
> +Extended regions are ranges of unused address space exposed to Dom0 as
> +"safe to use" for special memory mappings. Disable if your board device
> +tree is incomplete.
> +
>   ### flask
>   > `= permissive | enforcing | late | disabled`
>   
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..c5afbe2 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
>   static unsigned int __initdata opt_dom0_max_vcpus;
>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>   
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
>   static u64 __initdata dom0_mem;
>   static bool __initdata dom0_mem_set;
>   
> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>       return res;
>   }
>   
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}
> +
> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)

It would be good to have a comment on top of this function summarizing 
how this is meant to work.

> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",

start and end are paddr_t. So you want to use PRIpaddr rather than PRIx64.

> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

Ditto.

> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

Ditto.

> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

ditto.

> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)

I think it would be good to have a comment on top of the function how 
this is meant to work.

> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",

Please use PRIpaddr.

> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;
> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {

The code below looks like an open-coding version of dt_for_each_range(). 
Can you try to re-use it please? This will help to reduce the complexity 
of this function.

> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in this context
> +             * describes the PCI host bridge aperture.
> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +            range_size = pna + na + ns;
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +                size = dt_read_number(ranges + na + pna, ns);
> +
> +                start = addr & PAGE_MASK;
> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

This should be PRIpaddr.

> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
>   static int __init make_hypervisor_node(struct domain *d,
>                                          const struct kernel_info *kinfo,
>                                          int addrcells, int sizecells)
> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>       const char compat[] =
>           "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>           "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>       gic_interrupt_t intr;
> -    __be32 *cells;
>       int res;
>       void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
>   
>       dt_dprintk("Create hypervisor node\n");
>   
> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>       if ( res )
>           return res;
>   
> +    if ( !opt_ext_regions )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");

The extended regions is going to be critical for the performance in 
dom0. So I think this at least want to be a XENLOG_INFO.

> +        nr_ext_regions = 0;
> +    }
> +    else if ( is_32bit_domain(d) )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");

This would want to be use XENLOG_WARN.

> +        nr_ext_regions = 0;
> +    }
> +    else
> +    {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
> +    }
> +
>       /* reg 0 is grant table space */
>       cells = &reg[0];
>       dt_child_set_range(&cells, addrcells, sizecells,
>                          kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < nr_ext_regions; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);

I would prefer if this is a printk() so we get the extended region 
listed from the beginning.

> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>       res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (nr_ext_regions + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>       if ( res )
>           return res;
>   
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-10-04  6:59   ` Julien Grall
@ 2021-10-04 12:08     ` Oleksandr
  2021-10-06 18:11       ` Julien Grall
  0 siblings, 1 reply; 25+ messages in thread
From: Oleksandr @ 2021-10-04 12:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu,
	Volodymyr Babchuk


On 04.10.21 09:59, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien


>
> I saw Stefano committed this patch on Friday. However, I didn't have a 
> chance go to through a second time and would like to request some 
> follow-up changes.

ok, do you prefer the follow-up patch to be pushed separately or within 
the rest patches of this series (#1 and #3)?  If the former, I will try 
to push it today to close this question.


>
>
> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. 
>
> You explained below why the 128 limits, but I don't see any 
> explanation on why 2MB and 64MB.
>
> IIRC, 2MB was to potentally allow superpage mapping. I am not entirely 
> sure for 64MB.
>
> Could you add an in-code comment explaining the two limits?

Yes. There was a discussion at [1]. 64MB was chosen as a reasonable 
value to deal with between initial 2MB (we might end up having a lot of 
small ranges which are not quite useful but increase bookkeeping) and 
suggested 1GB (we might not be able find a suitable regions at all).


>
>
>> The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Please note, we need to decide which approach to use in 
>> find_unallocated_memory(),
>> you can find details at:
>> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/ 
>>
>>
>> Changes RFC -> V2:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>>
>> Changes V2 -> V3:
>>     - update patch description
>>     - add comment for "size" calculation in add_ext_regions()
>>     - clarify "end" calculation in find_unallocated_memory() and
>>       find_memory_holes()
>>     - only pick up regions with size >= 64MB
>>     - allocate reg dynamically instead of keeping on the stack in
>>       make_hypervisor_node()
>>     - do not show warning for 32-bit domain
>>     - drop Linux specific limits EXT_REGION_*
>>     - also cover "ranges" property in find_memory_holes()
>>     - add command line arg to enable/disable extended region support
>>
>> Changes V3 -> V4:
>>    - update opt_ext_regions purpose and comment in code
>>    - reorganize make_hypervisor_node() to move allocations after initial
>>      checks, allocate only required amount of elements instead of 
>> maximum
>>      possible
>> ---
>>   docs/misc/xen-command-line.pandoc |  11 ++
>>   xen/arch/arm/domain_build.c       | 286 
>> +++++++++++++++++++++++++++++++++++++-
>
> The document in docs/misc/arm/device-tree/guest.txt needs to be 
> updated to reflect the change in the binding.

Good point. Will update.


>
>>   2 files changed, 294 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc 
>> b/docs/misc/xen-command-line.pandoc
>> index 177e656..5cae4ad 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>>   Note that specifying zero as domU value means zero, while for dom0 
>> it means
>>   to use the default.
>>   +### ext_regions (Arm)
>> +> `= <boolean>`
>> +
>> +> Default : `true`
>> +
>> +Flag to enable or disable support for extended regions for Dom0.
>> +
>> +Extended regions are ranges of unused address space exposed to Dom0 as
>> +"safe to use" for special memory mappings. Disable if your board device
>> +tree is incomplete.
>> +
>>   ### flask
>>   > `= permissive | enforcing | late | disabled`
>>   diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index d233d63..c5afbe2 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -34,6 +34,10 @@
>>   static unsigned int __initdata opt_dom0_max_vcpus;
>>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>>   +/* If true, the extended regions support is enabled for dom0 */
>> +static bool __initdata opt_ext_regions = true;
>> +boolean_param("ext_regions", opt_ext_regions);
>> +
>>   static u64 __initdata dom0_mem;
>>   static bool __initdata dom0_mem_set;
>>   @@ -886,6 +890,232 @@ static int __init make_memory_node(const 
>> struct domain *d,
>>       return res;
>>   }
>>   +static int __init add_ext_regions(unsigned long s, unsigned long 
>> e, void *data)
>> +{
>> +    struct meminfo *ext_regions = data;
>> +    paddr_t start, size;
>> +
>> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> +        return 0;
>> +
>> +    /* Both start and size of the extended region should be 2MB 
>> aligned */
>> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> +    if ( start > e )
>> +        return 0;
>> +
>> +    /*
>> +     * e is actually "end-1" because it is called by rangeset functions
>> +     * which are inclusive of the last address.
>> +     */
>> +    e += 1;
>> +    size = (e - start) & ~(SZ_2M - 1);
>> +    if ( size < MB(64) )
>> +        return 0;
>> +
>> +    ext_regions->bank[ext_regions->nr_banks].start = start;
>> +    ext_regions->bank[ext_regions->nr_banks].size = size;
>> +    ext_regions->nr_banks++;
>> +
>> +    return 0;
>> +}
>> +
>> +static int __init find_unallocated_memory(const struct kernel_info 
>> *kinfo,
>> +                                          struct meminfo *ext_regions)
>
> It would be good to have a comment on top of this function summarizing 
> how this is meant to work.

Will add.


>
>
>> +{
>> +    const struct meminfo *assign_mem = &kinfo->mem;
>> +    struct rangeset *unalloc_mem;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
>> +    if ( !unalloc_mem )
>> +        return -ENOMEM;
>> +
>> +    /* Start with all available RAM */
>> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.mem.bank[i].start;
>> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
>> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to add: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> start and end are paddr_t. So you want to use PRIpaddr rather than PRIx64.

ok for this and the similar comments below.


>
>
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove RAM assigned to Dom0 */
>> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
>> +    {
>> +        start = assign_mem->bank[i].start;
>> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> Ditto.
>
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove reserved-memory regions */
>> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.reserved_mem.bank[i].start;
>> +        end = bootinfo.reserved_mem.bank[i].start +
>> +            bootinfo.reserved_mem.bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> Ditto.
>
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove grant table region */
>> +    start = kinfo->gnttab_start;
>> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
>> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>
> ditto.
>
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(unalloc_mem, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(unalloc_mem);
>> +
>> +    return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> +                                    struct meminfo *ext_regions)
>
> I think it would be good to have a comment on top of the function how 
> this is meant to work.

Will add.


>
>
>> +{
>> +    struct dt_device_node *np;
>> +    struct rangeset *mem_holes;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find memory holes for extended regions\n");
>> +
>> +    mem_holes = rangeset_new(NULL, NULL, 0);
>> +    if ( !mem_holes )
>> +        return -ENOMEM;
>> +
>> +    /* Start with maximum possible addressable physical memory range */
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_add_range(mem_holes, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>
> Please use PRIpaddr.
>
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Remove regions described by "reg" and "ranges" properties where
>> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
>> +     */
>> +    dt_for_each_device_node( dt_host, np )
>> +    {
>> +        unsigned int naddr;
>> +        u64 addr, size;
>> +
>> +        naddr = dt_number_of_address(np);
>> +
>> +        for ( i = 0; i < naddr; i++ )
>> +        {
>> +            res = dt_device_get_address(np, i, &addr, &size);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Unable to retrieve address %u for 
>> %s\n",
>> +                       i, dt_node_full_name(np));
>> +                goto out;
>> +            }
>> +
>> +            start = addr & PAGE_MASK;
>> +            end = PAGE_ALIGN(addr + size);
>> +            res = rangeset_remove_range(mem_holes, start, end - 1);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>> +                       start, end);
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        if ( dt_device_type_is_equal(np, "pci" ) )
>> +        {
>
> The code below looks like an open-coding version of 
> dt_for_each_range(). Can you try to re-use it please? This will help 
> to reduce the complexity of this function.

You are right, it makes sense, will definitely reuse. If I was aware of 
that function before I would safe some time I spent on the investigation 
how to parse that)


diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index ef2a177..8820b9c 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1003,6 +1003,26 @@ out:
      return res;
  }

+static int __init handle_pci_range(const struct dt_device_node *dev,
+                                   u64 addr, u64 len, void *data)
+{
+    struct rangeset *mem_holes = data;
+    paddr_t start, end;
+    int res;
+
+    start = addr & PAGE_MASK;
+    end = PAGE_ALIGN(addr + len);
+    res = rangeset_remove_range(mem_holes, start, end - 1);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        return res;
+    }
+
+    return 0;
+}
+
  static int __init find_memory_holes(const struct kernel_info *kinfo,
                                      struct meminfo *ext_regions)
  {
@@ -1061,43 +1081,19 @@ static int __init find_memory_holes(const struct 
kernel_info *kinfo,
              }
          }

-        if ( dt_device_type_is_equal(np, "pci" ) )
+        if ( dt_device_type_is_equal(np, "pci") )
          {
-            unsigned int range_size, nr_ranges;
-            int na, ns, pna;
-            const __be32 *ranges;
-            u32 len;
-
              /*
-             * Looking for non-empty ranges property which in this context
-             * describes the PCI host bridge aperture.
+             * The ranges property in this context describes the PCI host
+             * bridge aperture. It shall be absent if no addresses are 
mapped
+             * through the bridge.
               */
-            ranges = dt_get_property(np, "ranges", &len);
-            if ( !ranges || !len )
+            if ( !dt_get_property(np, "ranges", NULL) )
                  continue;

-            pna = dt_n_addr_cells(np);
-            na = dt_child_n_addr_cells(np);
-            ns = dt_child_n_size_cells(np);
-            range_size = pna + na + ns;
-            nr_ranges = len / sizeof(__be32) / range_size;
-
-            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
-            {
-                /* Skip the child address and get the parent (CPU) 
address */
-                addr = dt_read_number(ranges + na, pna);
-                size = dt_read_number(ranges + na + pna, ns);
-
-                start = addr & PAGE_MASK;
-                end = PAGE_ALIGN(addr + size);
-                res = rangeset_remove_range(mem_holes, start, end - 1);
-                if ( res )
-                {
-                    printk(XENLOG_ERR "Failed to remove: 
%#"PRIx64"->%#"PRIx64"\n",
-                           start, end);
-                    goto out;
-                }
-            }
+            res = dt_for_each_range(np, &handle_pci_range, mem_holes);
+            if ( res )
+                goto out;
          }
      }

(END)


>
>
>> +            unsigned int range_size, nr_ranges;
>> +            int na, ns, pna;
>> +            const __be32 *ranges;
>> +            u32 len;
>> +
>> +            /*
>> +             * Looking for non-empty ranges property which in this 
>> context
>> +             * describes the PCI host bridge aperture.
>> +             */
>> +            ranges = dt_get_property(np, "ranges", &len);
>> +            if ( !ranges || !len )
>> +                continue;
>> +
>> +            pna = dt_n_addr_cells(np);
>> +            na = dt_child_n_addr_cells(np);
>> +            ns = dt_child_n_size_cells(np);
>> +            range_size = pna + na + ns;
>> +            nr_ranges = len / sizeof(__be32) / range_size;
>> +
>> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
>> +            {
>> +                /* Skip the child address and get the parent (CPU) 
>> address */
>> +                addr = dt_read_number(ranges + na, pna);
>> +                size = dt_read_number(ranges + na + pna, ns);
>> +
>> +                start = addr & PAGE_MASK;
>> +                end = PAGE_ALIGN(addr + size);
>> +                res = rangeset_remove_range(mem_holes, start, end - 1);
>> +                if ( res )
>> +                {
>> +                    printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> This should be PRIpaddr.
>
>> +                           start, end);
>> +                    goto out;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(mem_holes, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(mem_holes);
>> +
>> +    return res;
>> +}
>> +
>>   static int __init make_hypervisor_node(struct domain *d,
>>                                          const struct kernel_info 
>> *kinfo,
>>                                          int addrcells, int sizecells)
>> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct 
>> domain *d,
>>       const char compat[] =
>> "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>>           "xen,xen";
>> -    __be32 reg[4];
>> +    __be32 *reg, *cells;
>>       gic_interrupt_t intr;
>> -    __be32 *cells;
>>       int res;
>>       void *fdt = kinfo->fdt;
>> +    struct meminfo *ext_regions = NULL;
>> +    unsigned int i, nr_ext_regions;
>>         dt_dprintk("Create hypervisor node\n");
>>   @@ -919,12 +1150,61 @@ static int __init 
>> make_hypervisor_node(struct domain *d,
>>       if ( res )
>>           return res;
>>   +    if ( !opt_ext_regions )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions support is 
>> disabled\n");
>
> The extended regions is going to be critical for the performance in 
> dom0. So I think this at least want to be a XENLOG_INFO.

ok


>
>
>> +        nr_ext_regions = 0;
>> +    }
>> +    else if ( is_32bit_domain(d) )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions are only supported 
>> for 64-bit guest currently\n");
>
> This would want to be use XENLOG_WARN.

ok, it seems this was in my initial version.


>
>
>> +        nr_ext_regions = 0;
>> +    }
>> +    else
>> +    {
>> +        ext_regions = xzalloc(struct meminfo);
>> +        if ( !ext_regions )
>> +            return -ENOMEM;
>> +
>> +        if ( !is_iommu_enabled(d) )
>> +            res = find_unallocated_memory(kinfo, ext_regions);
>> +        else
>> +            res = find_memory_holes(kinfo, ext_regions);
>> +
>> +        if ( res )
>> +            printk(XENLOG_WARNING "Failed to allocate extended 
>> regions\n");
>> +        nr_ext_regions = ext_regions->nr_banks;
>> +    }
>> +
>> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + 
>> sizecells));
>> +    if ( !reg )
>> +    {
>> +        xfree(ext_regions);
>> +        return -ENOMEM;
>> +    }
>> +
>>       /* reg 0 is grant table space */
>>       cells = &reg[0];
>>       dt_child_set_range(&cells, addrcells, sizecells,
>>                          kinfo->gnttab_start, kinfo->gnttab_size);
>> +    /* reg 1...N are extended regions */
>> +    for ( i = 0; i < nr_ext_regions; i++ )
>> +    {
>> +        u64 start = ext_regions->bank[i].start;
>> +        u64 size = ext_regions->bank[i].size;
>> +
>> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> +                   i, start, start + size);
>
> I would prefer if this is a printk() so we get the extended region 
> listed from the beginning.

ok


>
>
>> +
>> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> +    }
>> +
>>       res = fdt_property(fdt, "reg", reg,
>> -                       dt_cells_to_size(addrcells + sizecells));
>> +                       dt_cells_to_size(addrcells + sizecells) *
>> +                       (nr_ext_regions + 1));
>> +    xfree(ext_regions);
>> +    xfree(reg);
>> +
>>       if ( res )
>>           return res;
>>
>
[1] 
https://lore.kernel.org/xen-devel/b349e43a-91d9-16ba-57c6-34e790b8b31c@gmail.com/


Thank you.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-10-02 14:08             ` Oleksandr
@ 2021-10-04 21:11               ` Stefano Stabellini
  2021-10-05 19:49                 ` Oleksandr
  0 siblings, 1 reply; 25+ messages in thread
From: Stefano Stabellini @ 2021-10-04 21:11 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, Stefano Stabellini, Jan Beulich, xen-devel,
	Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Andrew Cooper, George Dunlap, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné,
	Bertrand Marquis

[-- Attachment #1: Type: text/plain, Size: 7744 bytes --]

On Sat, 2 Oct 2021, Oleksandr wrote:
> On 02.10.21 10:35, Julien Grall wrote:
> 
> Thank you for your comments!
> 
>       Hi
> 
>       On Sat, 2 Oct 2021, 01:24 Stefano Stabellini, <sstabellini@kernel.org> wrote:
>       Bertrand, see comment on ID_AA64MMFR0_EL1 below, any ideas?
> 
> 
>       On Fri, 1 Oct 2021, Oleksandr wrote:
>       > On 01.10.21 10:50, Jan Beulich wrote:
>       > > On 01.10.2021 01:00, Stefano Stabellini wrote:
>       > > > On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>       > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>       > > > >
>       > > > > We need to pass info about maximum supported guest address
>       > > > > space size to the toolstack on Arm in order to properly
>       > > > > calculate the base and size of the extended region (safe range)
>       > > > > for the guest. The extended region is unused address space which
>       > > > > could be safely used by domain for foreign/grant mappings on Arm.
>       > > > > The extended region itself will be handled by the subsequents
>       > > > > patch.
>       > > > >
>       > > > > Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>       > > > > hap_paddr_bits.
>       > > > >
>       > > > > As we change the size of structure bump the interface version.
>       > > > >
>       > > > > Suggested-by: Julien Grall <jgrall@amazon.com>
>       > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>       > > > > Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>       > > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>       > > I have to admit that I'm a little puzzled to see these R-b-s when ...
>       > >
>       > > > > Please note, that review comments for the RFC version [1] haven't been
>       > > > > addressed yet.
>       > > > > It is not forgotten, some clarification is needed. It will be addressed
>       > > > > for the next version.
>       > > > >
>       > > > > [1]
>       > > > > https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
>       > > ... Oleksandr makes clear this patch isn't really ready yet.
>       >
>       > Unfortunately, this is true. I am still waiting for the clarification [1]
> 
>       Although I was aware of comments to older versions, this is actually the
>       first version of this patch that I reviewed with any level of details; I
>       didn't read previous comments very closely. I tried to find any bugs or
>       problems with it and I couldn't see any, so I gave my reviewed-by. I
>       should have clarified that was meant for the ARM part as I don't have a
>       full understanding of the implications of using hap_paddr_bits on x86
>       for VM migration.
> 
> 
> 
>       But let me take this opportunity to say that although I think the
>       hypercall is OK, I wish we didn't need this patch at all: it is
>       problematic because it touches tools, x86 and ARM hypervisor code all
>       together. It needs at least three acks/reviewed-by to get accepted: from
>       an x86 maintainer, an arm maintainer and from a tools maintainer. I
>       don't say this to criticize the patch acceptance process: this patch
>       makes changes to an existing hypercall so it is only fair that it needs
>       to go through extra levels of scrutiny. For the sake of simplicity and
>       decoupling (reducing dependencies between patches and between
>       components), I think it would be best to introduce an #define for the
>       minimum value of gpaddr_bits and then move this patch at the end of the
>       series; that way it becomes optional.
> 
> 
> It depends what you mean by optional. Yes we can add hack to avoid the hypercall... But the more scalable solution is the hypercall.
> 
> I am slightly concerned that if we don't push for the hypercall now, then there will be no incentive to do it afterwards...
> 
> So I went through Andrew's e-mail to understand what's the request. I understand that there are some problem with migration. But it
> doesn't look like we need to solve them now. Instead,  AFAICT, his main ask for this series is to switch to a domctl.
> 
> It seems the conversation is simply stuck on waiting for Andrew to provide details on what would look like. Did we ping Andrew on
> IRC?
> 
>       Unfortunately the minimum value
>       is 32 (in practice I have never seen less than 40 but the architecture
>       supports 32 as minimum).
> 
> 
> 
>       Actually, the info we are looking for is already exposed via
>       ID_AA64MMFR0_EL1. ID_AA64MMFR0_EL1 can be read from a virtual machine,
>       and Linux let userspace read it [1]. Regardless of this patch series, we
>       should make sure that Xen exposes the right mm64.pa_range value to guest
>       virtual machines. If that is done right, then you can just add support
>       for reading ID_AA64MMFR0_EL1 in libxl/libxc and then we don't need any
>       hypercall modifications changes.
> 
> 
> From my understanding, from a VM PoV "pa_range" should represent the size of the guest physical address space.
> 
> Today, it happens that every VM is using the same P2M size. However, I would rather not make such assumption in the userspace.
> 
> 
>       So, in theory we already have all the interfaces we need, but in
>       practice they don't work: unfortunaly both Xen and Linux mark
>       ID_AA64MMFR0_EL1 as FTR_HIDDEN in cpufeature.c so neither Linux from
>       Xen, not userspace from Linux can actually read the real value :-/
>       They always read zero.
> 
>       (Also I think we have an issue today with p2m_restrict_ipa_bits not
>       updating the mm64.pa_range value. I think that it should be fixed.)
> 
> 
> It looks like it. That should be handled in a separate patch though.
> 
> 
>       Bertrand, do you have any ideas in regards to ID_AA64MMFR0_EL1?
> 
>       If not, maybe we could just go with
>       #define MIN_GPADDR_BITS 32.
> 
> 
> The toolstack would have to consider it as the "maximum" because it may not be safe to expose anything above.
> 
> With 32, we are going to be limited in term of space we can find.
> 
> We could potentially use 40 bits as a minimum. Although it still feels a bit of a hack to me given that the IOMMU may restrict it
> further and the architecture can in theory support less.
> 
> Overall, I still strongly prefer the hypercall approach. If a common one is difficult to achieve, then we can extend the domctl to
> create a domain to provide the p2m_bits (in the same way as we deal for the GIC version) in an arch specific way.
> 
> 
> To summarize:
> If we don't query the hypervisor to provide gpaddr_bits we have two options:
> - The safe option is to use minimum possible value which is 32 bits on Arm64. But, there would be of no practical use.
> - The unsafe option is to use let's say "default" 40 bits and pray it will work in all cases on Arm64 (it is ok on Arm32).
> 
> So we definitely need to query the hypervisor. As it turned out the sysctl approach is not welcome, in the long term we want to have this
> information per domain. I have been absolutely OK with that valid ask since RFC, I just wanted to know what was the preferred way to do
> this (new domctl, existing, etc)...
> 
> I analyzed what Julien had suggested regarding pass gpaddr_bits via Arm's struct xen_arch_domainconfig (I assume, this should be an OUT
> parameter) and I think it makes sense. Taking into the account that the feature freeze date is coming, I will wait a few days, and if there
> are no objections I will send updated version (patch #3 also needs updating as it expects the gpaddr_bits to be in physinfo).


No objections from me, I think Julien's suggestion is a good one.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-09-29 22:52 ` [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
@ 2021-10-05 19:42   ` Oleksandr
  2021-10-05 21:36     ` Stefano Stabellini
  0 siblings, 1 reply; 25+ messages in thread
From: Oleksandr @ 2021-10-05 19:42 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Ian Jackson, Wei Liu,
	Anthony PERARD, Juergen Gross, Volodymyr Babchuk


On 30.09.21 01:52, Oleksandr Tyshchenko wrote:

Hi Stefano, Julien.

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
>
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
>
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>    currently.
> - The ACPI case is not covered.
>
> ***
>
> The algorithm to choose extended regions for non-direct mapped
> DomU is simpler in comparison with the algorithm for direct mapped
> Dom0. As we have a lot of unused space above 4GB, provide single
> 2MB-aligned region from the second RAM bank taking into the account
> the maximum supported guest address space size and the amount of
> memory assigned to the guest. The maximum size of the region is 128GB.
> The minimum size is 64MB.
>
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> Acked-by: Ian Jackson <iwj@xenproject.org>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> Tested-by: Michal Orzel <michal.orzel@arm.com>

I though a bit more on this and decided to make a patch more functional 
by trying to also allocate extended region below 4GB, I think we could 
do with it.
Actually if guest memory size is less than GUEST_RAM0_SIZE, we are able 
to provide unused space. I have tested with with various guest memory 
sizes and it worked fine. Also I decided to drop limit for maximum 
extended region size (128GB), we don't apply this limit in Dom0 and I 
don't see why we need it here, moreover the calculation became more 
obvious. I will drop all acks and send updated version. Are there any 
objections?



-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-10-04 21:11               ` Stefano Stabellini
@ 2021-10-05 19:49                 ` Oleksandr
  0 siblings, 0 replies; 25+ messages in thread
From: Oleksandr @ 2021-10-05 19:49 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Jan Beulich, xen-devel, Oleksandr Tyshchenko,
	Ian Jackson, Wei Liu, Anthony PERARD, Andrew Cooper,
	George Dunlap, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné,
	Bertrand Marquis


On 05.10.21 00:11, Stefano Stabellini wrote:

Hi Stefano

> On Sat, 2 Oct 2021, Oleksandr wrote:
>> On 02.10.21 10:35, Julien Grall wrote:
>>
>> Thank you for your comments!
>>
>>        Hi
>>
>>        On Sat, 2 Oct 2021, 01:24 Stefano Stabellini, <sstabellini@kernel.org> wrote:
>>        Bertrand, see comment on ID_AA64MMFR0_EL1 below, any ideas?
>>
>>
>>        On Fri, 1 Oct 2021, Oleksandr wrote:
>>        > On 01.10.21 10:50, Jan Beulich wrote:
>>        > > On 01.10.2021 01:00, Stefano Stabellini wrote:
>>        > > > On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>>        > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>        > > > >
>>        > > > > We need to pass info about maximum supported guest address
>>        > > > > space size to the toolstack on Arm in order to properly
>>        > > > > calculate the base and size of the extended region (safe range)
>>        > > > > for the guest. The extended region is unused address space which
>>        > > > > could be safely used by domain for foreign/grant mappings on Arm.
>>        > > > > The extended region itself will be handled by the subsequents
>>        > > > > patch.
>>        > > > >
>>        > > > > Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>>        > > > > hap_paddr_bits.
>>        > > > >
>>        > > > > As we change the size of structure bump the interface version.
>>        > > > >
>>        > > > > Suggested-by: Julien Grall <jgrall@amazon.com>
>>        > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>        > > > > Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>>        > > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>        > > I have to admit that I'm a little puzzled to see these R-b-s when ...
>>        > >
>>        > > > > Please note, that review comments for the RFC version [1] haven't been
>>        > > > > addressed yet.
>>        > > > > It is not forgotten, some clarification is needed. It will be addressed
>>        > > > > for the next version.
>>        > > > >
>>        > > > > [1]
>>        > > > > https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
>>        > > ... Oleksandr makes clear this patch isn't really ready yet.
>>        >
>>        > Unfortunately, this is true. I am still waiting for the clarification [1]
>>
>>        Although I was aware of comments to older versions, this is actually the
>>        first version of this patch that I reviewed with any level of details; I
>>        didn't read previous comments very closely. I tried to find any bugs or
>>        problems with it and I couldn't see any, so I gave my reviewed-by. I
>>        should have clarified that was meant for the ARM part as I don't have a
>>        full understanding of the implications of using hap_paddr_bits on x86
>>        for VM migration.
>>
>>
>>
>>        But let me take this opportunity to say that although I think the
>>        hypercall is OK, I wish we didn't need this patch at all: it is
>>        problematic because it touches tools, x86 and ARM hypervisor code all
>>        together. It needs at least three acks/reviewed-by to get accepted: from
>>        an x86 maintainer, an arm maintainer and from a tools maintainer. I
>>        don't say this to criticize the patch acceptance process: this patch
>>        makes changes to an existing hypercall so it is only fair that it needs
>>        to go through extra levels of scrutiny. For the sake of simplicity and
>>        decoupling (reducing dependencies between patches and between
>>        components), I think it would be best to introduce an #define for the
>>        minimum value of gpaddr_bits and then move this patch at the end of the
>>        series; that way it becomes optional.
>>
>>
>> It depends what you mean by optional. Yes we can add hack to avoid the hypercall... But the more scalable solution is the hypercall.
>>
>> I am slightly concerned that if we don't push for the hypercall now, then there will be no incentive to do it afterwards...
>>
>> So I went through Andrew's e-mail to understand what's the request. I understand that there are some problem with migration. But it
>> doesn't look like we need to solve them now. Instead,  AFAICT, his main ask for this series is to switch to a domctl.
>>
>> It seems the conversation is simply stuck on waiting for Andrew to provide details on what would look like. Did we ping Andrew on
>> IRC?
>>
>>        Unfortunately the minimum value
>>        is 32 (in practice I have never seen less than 40 but the architecture
>>        supports 32 as minimum).
>>
>>
>>
>>        Actually, the info we are looking for is already exposed via
>>        ID_AA64MMFR0_EL1. ID_AA64MMFR0_EL1 can be read from a virtual machine,
>>        and Linux let userspace read it [1]. Regardless of this patch series, we
>>        should make sure that Xen exposes the right mm64.pa_range value to guest
>>        virtual machines. If that is done right, then you can just add support
>>        for reading ID_AA64MMFR0_EL1 in libxl/libxc and then we don't need any
>>        hypercall modifications changes.
>>
>>
>>  From my understanding, from a VM PoV "pa_range" should represent the size of the guest physical address space.
>>
>> Today, it happens that every VM is using the same P2M size. However, I would rather not make such assumption in the userspace.
>>
>>
>>        So, in theory we already have all the interfaces we need, but in
>>        practice they don't work: unfortunaly both Xen and Linux mark
>>        ID_AA64MMFR0_EL1 as FTR_HIDDEN in cpufeature.c so neither Linux from
>>        Xen, not userspace from Linux can actually read the real value :-/
>>        They always read zero.
>>
>>        (Also I think we have an issue today with p2m_restrict_ipa_bits not
>>        updating the mm64.pa_range value. I think that it should be fixed.)
>>
>>
>> It looks like it. That should be handled in a separate patch though.
>>
>>
>>        Bertrand, do you have any ideas in regards to ID_AA64MMFR0_EL1?
>>
>>        If not, maybe we could just go with
>>        #define MIN_GPADDR_BITS 32.
>>
>>
>> The toolstack would have to consider it as the "maximum" because it may not be safe to expose anything above.
>>
>> With 32, we are going to be limited in term of space we can find.
>>
>> We could potentially use 40 bits as a minimum. Although it still feels a bit of a hack to me given that the IOMMU may restrict it
>> further and the architecture can in theory support less.
>>
>> Overall, I still strongly prefer the hypercall approach. If a common one is difficult to achieve, then we can extend the domctl to
>> create a domain to provide the p2m_bits (in the same way as we deal for the GIC version) in an arch specific way.
>>
>>
>> To summarize:
>> If we don't query the hypervisor to provide gpaddr_bits we have two options:
>> - The safe option is to use minimum possible value which is 32 bits on Arm64. But, there would be of no practical use.
>> - The unsafe option is to use let's say "default" 40 bits and pray it will work in all cases on Arm64 (it is ok on Arm32).
>>
>> So we definitely need to query the hypervisor. As it turned out the sysctl approach is not welcome, in the long term we want to have this
>> information per domain. I have been absolutely OK with that valid ask since RFC, I just wanted to know what was the preferred way to do
>> this (new domctl, existing, etc)...
>>
>> I analyzed what Julien had suggested regarding pass gpaddr_bits via Arm's struct xen_arch_domainconfig (I assume, this should be an OUT
>> parameter) and I think it makes sense. Taking into the account that the feature freeze date is coming, I will wait a few days, and if there
>> are no objections I will send updated version (patch #3 also needs updating as it expects the gpaddr_bits to be in physinfo).
>
> No objections from me, I think Julien's suggestion is a good one.

ok, great. I have already implemented/tested that.


Thank you.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-10-05 19:42   ` Oleksandr
@ 2021-10-05 21:36     ` Stefano Stabellini
  2021-10-06 10:11       ` Oleksandr
  0 siblings, 1 reply; 25+ messages in thread
From: Stefano Stabellini @ 2021-10-05 21:36 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Julien Grall, xen-devel,
	Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Juergen Gross, Volodymyr Babchuk

On Tue, 5 Oct 2021, Oleksandr wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > 
> > The extended region (safe range) is a region of guest physical
> > address space which is unused and could be safely used to create
> > grant/foreign mappings instead of wasting real RAM pages from
> > the domain memory for establishing these mappings.
> > 
> > The extended regions are chosen at the domain creation time and
> > advertised to it via "reg" property under hypervisor node in
> > the guest device-tree. As region 0 is reserved for grant table
> > space (always present), the indexes for extended regions are 1...N.
> > If extended regions could not be allocated for some reason,
> > Xen doesn't fail and behaves as usual, so only inserts region 0.
> > 
> > Please note the following limitations:
> > - The extended region feature is only supported for 64-bit domain
> >    currently.
> > - The ACPI case is not covered.
> > 
> > ***
> > 
> > The algorithm to choose extended regions for non-direct mapped
> > DomU is simpler in comparison with the algorithm for direct mapped
> > Dom0. As we have a lot of unused space above 4GB, provide single
> > 2MB-aligned region from the second RAM bank taking into the account
> > the maximum supported guest address space size and the amount of
> > memory assigned to the guest. The maximum size of the region is 128GB.
> > The minimum size is 64MB.
> > 
> > Suggested-by: Julien Grall <jgrall@amazon.com>
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> > Acked-by: Ian Jackson <iwj@xenproject.org>
> > Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> > Tested-by: Michal Orzel <michal.orzel@arm.com>
> 
> I though a bit more on this and decided to make a patch more functional by
> trying to also allocate extended region below 4GB, I think we could do with
> it.
> Actually if guest memory size is less than GUEST_RAM0_SIZE, we are able to
> provide unused space. I have tested with with various guest memory sizes and
> it worked fine. Also I decided to drop limit for maximum extended region size
> (128GB), we don't apply this limit in Dom0 and I don't see why we need it
> here, moreover the calculation became more obvious. I will drop all acks and
> send updated version. Are there any objections?

I am OK with it; it looks like you made good improvements. One caveat is
that I volunteer to review again no problem, but we'll need a new ack
from Ian Jackson to commit.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-10-05 21:36     ` Stefano Stabellini
@ 2021-10-06 10:11       ` Oleksandr
  0 siblings, 0 replies; 25+ messages in thread
From: Oleksandr @ 2021-10-06 10:11 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Anthony PERARD, Juergen Gross, Volodymyr Babchuk


On 06.10.21 00:36, Stefano Stabellini wrote:

Hi Stefano

> On Tue, 5 Oct 2021, Oleksandr wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> The extended region (safe range) is a region of guest physical
>>> address space which is unused and could be safely used to create
>>> grant/foreign mappings instead of wasting real RAM pages from
>>> the domain memory for establishing these mappings.
>>>
>>> The extended regions are chosen at the domain creation time and
>>> advertised to it via "reg" property under hypervisor node in
>>> the guest device-tree. As region 0 is reserved for grant table
>>> space (always present), the indexes for extended regions are 1...N.
>>> If extended regions could not be allocated for some reason,
>>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>>
>>> Please note the following limitations:
>>> - The extended region feature is only supported for 64-bit domain
>>>     currently.
>>> - The ACPI case is not covered.
>>>
>>> ***
>>>
>>> The algorithm to choose extended regions for non-direct mapped
>>> DomU is simpler in comparison with the algorithm for direct mapped
>>> Dom0. As we have a lot of unused space above 4GB, provide single
>>> 2MB-aligned region from the second RAM bank taking into the account
>>> the maximum supported guest address space size and the amount of
>>> memory assigned to the guest. The maximum size of the region is 128GB.
>>> The minimum size is 64MB.
>>>
>>> Suggested-by: Julien Grall <jgrall@amazon.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>> Acked-by: Ian Jackson <iwj@xenproject.org>
>>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>>> Tested-by: Michal Orzel <michal.orzel@arm.com>
>> I though a bit more on this and decided to make a patch more functional by
>> trying to also allocate extended region below 4GB, I think we could do with
>> it.
>> Actually if guest memory size is less than GUEST_RAM0_SIZE, we are able to
>> provide unused space. I have tested with with various guest memory sizes and
>> it worked fine. Also I decided to drop limit for maximum extended region size
>> (128GB), we don't apply this limit in Dom0 and I don't see why we need it
>> here, moreover the calculation became more obvious. I will drop all acks and
>> send updated version. Are there any objections?
> I am OK with it; it looks like you made good improvements. One caveat is
> that I volunteer to review again no problem,

Great, thank you.


> but we'll need a new ack
> from Ian Jackson to commit.

Yes, I know that.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-10-04 12:08     ` Oleksandr
@ 2021-10-06 18:11       ` Julien Grall
  2021-10-06 18:15         ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Julien Grall @ 2021-10-06 18:11 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu,
	Volodymyr Babchuk

Hi Oleksandr,

On 04/10/2021 14:08, Oleksandr wrote:
> 
> On 04.10.21 09:59, Julien Grall wrote:
>> Hi Oleksandr,
> 
> Hi Julien

Hi Oleksandr,

> 
> 
>>
>> I saw Stefano committed this patch on Friday. However, I didn't have a 
>> chance go to through a second time and would like to request some 
>> follow-up changes.
> 
> ok, do you prefer the follow-up patch to be pushed separately or within 
> the rest patches of this series (#1 and #3)?  If the former, I will try 
> to push it today to close this question.

I don't mind. My main ask is they are addressed for 4.16.

> 
> 
>>
>>
>> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> The extended region (safe range) is a region of guest physical
>>> address space which is unused and could be safely used to create
>>> grant/foreign mappings instead of wasting real RAM pages from
>>> the domain memory for establishing these mappings.
>>>
>>> The extended regions are chosen at the domain creation time and
>>> advertised to it via "reg" property under hypervisor node in
>>> the guest device-tree. As region 0 is reserved for grant table
>>> space (always present), the indexes for extended regions are 1...N.
>>> If extended regions could not be allocated for some reason,
>>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>>
>>> Please note the following limitations:
>>> - The extended region feature is only supported for 64-bit domain
>>>    currently.
>>> - The ACPI case is not covered.
>>>
>>> ***
>>>
>>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>>> the algorithm to choose extended regions for it is different
>>> in comparison with the algorithm for non-direct mapped DomU.
>>> What is more, that extended regions should be chosen differently
>>> whether IOMMU is enabled or not.
>>>
>>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>>> holes found in host device-tree if otherwise. Make sure that
>>> extended regions are 2MB-aligned and located within maximum possible
>>> addressable physical memory range. The minimum size of extended
>>> region is 64MB. 
>>
>> You explained below why the 128 limits, but I don't see any 
>> explanation on why 2MB and 64MB.
>>
>> IIRC, 2MB was to potentally allow superpage mapping. I am not entirely 
>> sure for 64MB.
>>
>> Could you add an in-code comment explaining the two limits?
> 
> Yes. There was a discussion at [1]. 64MB was chosen as a reasonable 
> value to deal with between initial 2MB (we might end up having a lot of 
> small ranges which are not quite useful but increase bookkeeping) and 
> suggested 1GB (we might not be able find a suitable regions at all).

Ok. Please document in the code. Note that I don't think this choice 
should be advertised to the OS. This would give us some flexibility to 
change the size in the future (e.g. if we have platform where chunk of 
less than 64MB is beneficial).

>> The code below looks like an open-coding version of 
>> dt_for_each_range(). Can you try to re-use it please? This will help 
>> to reduce the complexity of this function.
> 
> You are right, it makes sense, will definitely reuse. If I was aware of 
> that function before I would safe some time I spent on the investigation 
> how to parse that)

I have only skimmed through the diff below. This looks fine to me. 
Please submit a formal patch.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-10-06 18:11       ` Julien Grall
@ 2021-10-06 18:15         ` Oleksandr Tyshchenko
  2021-10-06 18:35           ` Julien Grall
  0 siblings, 1 reply; 25+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-06 18:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu,
	Volodymyr Babchuk

[-- Attachment #1: Type: text/plain, Size: 3860 bytes --]

On Wed, Oct 6, 2021 at 9:11 PM Julien Grall <julien@xen.org> wrote:

> Hi Oleksandr,
>

Hi Julien

[Sorry for the possible format issues]



>
> On 04/10/2021 14:08, Oleksandr wrote:
> >
> > On 04.10.21 09:59, Julien Grall wrote:
> >> Hi Oleksandr,
> >
> > Hi Julien
>
> Hi Oleksandr,
>
> >
> >
> >>
> >> I saw Stefano committed this patch on Friday. However, I didn't have a
> >> chance go to through a second time and would like to request some
> >> follow-up changes.
> >
> > ok, do you prefer the follow-up patch to be pushed separately or within
> > the rest patches of this series (#1 and #3)?  If the former, I will try
> > to push it today to close this question.
>
> I don't mind. My main ask is they are addressed for 4.16.
>
> >
> >
> >>
> >>
> >> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
> >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>
> >>> The extended region (safe range) is a region of guest physical
> >>> address space which is unused and could be safely used to create
> >>> grant/foreign mappings instead of wasting real RAM pages from
> >>> the domain memory for establishing these mappings.
> >>>
> >>> The extended regions are chosen at the domain creation time and
> >>> advertised to it via "reg" property under hypervisor node in
> >>> the guest device-tree. As region 0 is reserved for grant table
> >>> space (always present), the indexes for extended regions are 1...N.
> >>> If extended regions could not be allocated for some reason,
> >>> Xen doesn't fail and behaves as usual, so only inserts region 0.
> >>>
> >>> Please note the following limitations:
> >>> - The extended region feature is only supported for 64-bit domain
> >>>    currently.
> >>> - The ACPI case is not covered.
> >>>
> >>> ***
> >>>
> >>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> >>> the algorithm to choose extended regions for it is different
> >>> in comparison with the algorithm for non-direct mapped DomU.
> >>> What is more, that extended regions should be chosen differently
> >>> whether IOMMU is enabled or not.
> >>>
> >>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> >>> holes found in host device-tree if otherwise. Make sure that
> >>> extended regions are 2MB-aligned and located within maximum possible
> >>> addressable physical memory range. The minimum size of extended
> >>> region is 64MB.
> >>
> >> You explained below why the 128 limits, but I don't see any
> >> explanation on why 2MB and 64MB.
> >>
> >> IIRC, 2MB was to potentally allow superpage mapping. I am not entirely
> >> sure for 64MB.
> >>
> >> Could you add an in-code comment explaining the two limits?
> >
> > Yes. There was a discussion at [1]. 64MB was chosen as a reasonable
> > value to deal with between initial 2MB (we might end up having a lot of
> > small ranges which are not quite useful but increase bookkeeping) and
> > suggested 1GB (we might not be able find a suitable regions at all).
>
> Ok. Please document in the code. Note that I don't think this choice
> should be advertised to the OS. This would give us some flexibility to
> change the size in the future (e.g. if we have platform where chunk of
> less than 64MB is beneficial).
>
> >> The code below looks like an open-coding version of
> >> dt_for_each_range(). Can you try to re-use it please? This will help
> >> to reduce the complexity of this function.
> >
> > You are right, it makes sense, will definitely reuse. If I was aware of
> > that function before I would safe some time I spent on the investigation
> > how to parse that)
>
> I have only skimmed through the diff below. This looks fine to me.
> Please submit a formal patch.
>

Already submitted, please take a look at [1].

 [1]
https://lore.kernel.org/xen-devel/1633519346-3686-4-git-send-email-olekstysh@gmail.com/

-- 
Regards,

Oleksandr Tyshchenko

[-- Attachment #2: Type: text/html, Size: 5807 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-10-06 18:15         ` Oleksandr Tyshchenko
@ 2021-10-06 18:35           ` Julien Grall
  0 siblings, 0 replies; 25+ messages in thread
From: Julien Grall @ 2021-10-06 18:35 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu,
	Volodymyr Babchuk

Hi Oleksandr,

On 06/10/2021 20:15, Oleksandr Tyshchenko wrote:
> 
> 
> On Wed, Oct 6, 2021 at 9:11 PM Julien Grall <julien@xen.org 
> <mailto:julien@xen.org>> wrote:
> 
>     Hi Oleksandr,
> 
> 
> Hi Julien
> 
> [Sorry for the possible format issues]
> 
> 
>     On 04/10/2021 14:08, Oleksandr wrote:
>      >
>      > On 04.10.21 09:59, Julien Grall wrote:
>      >> Hi Oleksandr,
>      >
>      > Hi Julien
> 
>     Hi Oleksandr,
> 
>      >
>      >
>      >>
>      >> I saw Stefano committed this patch on Friday. However, I didn't
>     have a
>      >> chance go to through a second time and would like to request some
>      >> follow-up changes.
>      >
>      > ok, do you prefer the follow-up patch to be pushed separately or
>     within
>      > the rest patches of this series (#1 and #3)?  If the former, I
>     will try
>      > to push it today to close this question.
> 
>     I don't mind. My main ask is they are addressed for 4.16.
> 
>      >
>      >
>      >>
>      >>
>      >> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
>      >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com
>     <mailto:oleksandr_tyshchenko@epam.com>>
>      >>>
>      >>> The extended region (safe range) is a region of guest physical
>      >>> address space which is unused and could be safely used to create
>      >>> grant/foreign mappings instead of wasting real RAM pages from
>      >>> the domain memory for establishing these mappings.
>      >>>
>      >>> The extended regions are chosen at the domain creation time and
>      >>> advertised to it via "reg" property under hypervisor node in
>      >>> the guest device-tree. As region 0 is reserved for grant table
>      >>> space (always present), the indexes for extended regions are 1...N.
>      >>> If extended regions could not be allocated for some reason,
>      >>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>      >>>
>      >>> Please note the following limitations:
>      >>> - The extended region feature is only supported for 64-bit domain
>      >>>    currently.
>      >>> - The ACPI case is not covered.
>      >>>
>      >>> ***
>      >>>
>      >>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>      >>> the algorithm to choose extended regions for it is different
>      >>> in comparison with the algorithm for non-direct mapped DomU.
>      >>> What is more, that extended regions should be chosen differently
>      >>> whether IOMMU is enabled or not.
>      >>>
>      >>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>      >>> holes found in host device-tree if otherwise. Make sure that
>      >>> extended regions are 2MB-aligned and located within maximum
>     possible
>      >>> addressable physical memory range. The minimum size of extended
>      >>> region is 64MB.
>      >>
>      >> You explained below why the 128 limits, but I don't see any
>      >> explanation on why 2MB and 64MB.
>      >>
>      >> IIRC, 2MB was to potentally allow superpage mapping. I am not
>     entirely
>      >> sure for 64MB.
>      >>
>      >> Could you add an in-code comment explaining the two limits?
>      >
>      > Yes. There was a discussion at [1]. 64MB was chosen as a reasonable
>      > value to deal with between initial 2MB (we might end up having a
>     lot of
>      > small ranges which are not quite useful but increase bookkeeping)
>     and
>      > suggested 1GB (we might not be able find a suitable regions at all).
> 
>     Ok. Please document in the code. Note that I don't think this choice
>     should be advertised to the OS. This would give us some flexibility to
>     change the size in the future (e.g. if we have platform where chunk of
>     less than 64MB is beneficial).
> 
>      >> The code below looks like an open-coding version of
>      >> dt_for_each_range(). Can you try to re-use it please? This will
>     help
>      >> to reduce the complexity of this function.
>      >
>      > You are right, it makes sense, will definitely reuse. If I was
>     aware of
>      > that function before I would safe some time I spent on the
>     investigation
>      > how to parse that)
> 
>     I have only skimmed through the diff below. This looks fine to me.
>     Please submit a formal patch.
> 
> 
> Already submitted, please take a look at [1].
> 
>   [1] 
> https://lore.kernel.org/xen-devel/1633519346-3686-4-git-send-email-olekstysh@gmail.com/ 
> <https://lore.kernel.org/xen-devel/1633519346-3686-4-git-send-email-olekstysh@gmail.com/>

Thanks! Sorry, my inbox has been growing quite a lot while I was away. I 
will have a look when I am fully back next week :).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-10-06 18:35 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29 22:52 [PATCH V4 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
2021-09-29 22:52 ` [PATCH V4 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
2021-09-30 23:00   ` Stefano Stabellini
2021-10-01  7:50     ` Jan Beulich
2021-10-01  8:19       ` Oleksandr
2021-10-01 23:24         ` Stefano Stabellini
2021-10-02  7:35           ` Julien Grall
2021-10-02 14:08             ` Oleksandr
2021-10-04 21:11               ` Stefano Stabellini
2021-10-05 19:49                 ` Oleksandr
2021-09-29 22:52 ` [PATCH V4 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
2021-09-30 15:36   ` Luca Fancellu
2021-09-30 22:53   ` Stefano Stabellini
2021-10-02  0:33   ` Stefano Stabellini
2021-10-02 12:40     ` Oleksandr
2021-10-04  6:41     ` Julien Grall
2021-10-04  6:59   ` Julien Grall
2021-10-04 12:08     ` Oleksandr
2021-10-06 18:11       ` Julien Grall
2021-10-06 18:15         ` Oleksandr Tyshchenko
2021-10-06 18:35           ` Julien Grall
2021-09-29 22:52 ` [PATCH V4 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
2021-10-05 19:42   ` Oleksandr
2021-10-05 21:36     ` Stefano Stabellini
2021-10-06 10:11       ` Oleksandr

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.