All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space")
@ 2021-09-23 22:48 Oleksandr Tyshchenko
  2021-09-23 22:48 ` [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-23 22:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné,
	Henry Wang, Bertrand Marquis, Wei Chen

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

You can find an initial discussion at [1],[2] and [3].

The extended region (safe range) is a region of guest physical address space
which is unused and could be safely used to create grant/foreign mappings instead
of wasting real RAM pages from the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and advertised
to it via "reg" property under hypervisor node in the guest device-tree
(the indexes for extended regions are 1...N).
No device tree bindings update is needed, guest infers the presense of extended
regions from the number of regions in "reg" property.
New compatible/property will be needed (but only after this patch [4] or alternative
goes in) to indicate that "region 0 is safe to use". Until this patch is merged it is
not safe to use extended regions for the grant table space.

The extended regions are calculated differently for direct mapped
Dom0 (with and without IOMMU) and non-direct mapped DomUs.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain currently.
- The ACPI case is not covered.

Xen patch series is also available at [5]. The corresponding Linux patch series is at [6] for now
(last 4 patches).

Tested on Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with updated virtio-disk backend [7]
running in Dom0 (256MB RAM) and DomD (2GB RAM). In both cases the backend pre-maps DomU memory
which is 3GB. All foreign memory gets mapped into extended regions (so the amount of RAM in
the backend domain is not reduced). No issues were observed.

[1] https://lore.kernel.org/xen-devel/1627489110-25633-1-git-send-email-olekstysh@gmail.com/
[2] https://lore.kernel.org/xen-devel/1631034578-12598-1-git-send-email-olekstysh@gmail.com/
[3] https://lore.kernel.org/xen-devel/1631297924-8658-1-git-send-email-olekstysh@gmail.com/
[4] https://lore.kernel.org/xen-devel/1632425551-18910-1-git-send-email-olekstysh@gmail.com/
[5] https://github.com/otyshchenko1/xen/commits/map_opt_ml4
[6] https://github.com/otyshchenko1/linux/commits/map_opt_ml3
[7] https://github.com/otyshchenko1/virtio-disk/commits/map_opt_next

Oleksandr Tyshchenko (3):
  xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  xen/arm: Add handling of extended regions for Dom0
  libxl/arm: Add handling of extended regions for DomU

 docs/misc/xen-command-line.pandoc |   7 +
 tools/include/libxl.h             |   7 +
 tools/libs/light/libxl.c          |   2 +
 tools/libs/light/libxl_arm.c      |  70 +++++++++-
 tools/libs/light/libxl_types.idl  |   2 +
 xen/arch/arm/domain_build.c       | 280 +++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/sysctl.c             |   2 +
 xen/arch/x86/sysctl.c             |   2 +
 xen/include/public/arch-arm.h     |   3 +
 xen/include/public/sysctl.h       |   4 +-
 10 files changed, 370 insertions(+), 9 deletions(-)

-- 
2.7.4



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-09-23 22:48 [PATCH V3 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
@ 2021-09-23 22:48 ` Oleksandr Tyshchenko
  2021-09-28  6:28   ` Michal Orzel
  2021-09-23 22:48 ` [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
  2021-09-23 22:48 ` [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
  2 siblings, 1 reply; 12+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-23 22:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

We need to pass info about maximum supported guest address
space size to the toolstack on Arm in order to properly
calculate the base and size of the extended region (safe range)
for the guest. The extended region is unused address space which
could be safely used by domain for foreign/grant mappings on Arm.
The extended region itself will be handled by the subsequents
patch.

Use p2m_ipa_bits variable on Arm, the x86 equivalent is
hap_paddr_bits.

As we change the size of structure bump the interface version.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Please note, that review comments for the RFC version [1] haven't been addressed yet.
It is not forgotten, some clarification is needed. It will be addressed for the next version.

[1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/

Changes RFC -> V2:
   - update patch subject/description
   - replace arch-specific sub-struct with common gpaddr_bits
     field and update code to reflect that

Changes V2 -> V3:
   - make the field uint8_t and add uint8_t pad[7] after
   - remove leading blanks in libxl.h
---
 tools/include/libxl.h            | 7 +++++++
 tools/libs/light/libxl.c         | 2 ++
 tools/libs/light/libxl_types.idl | 2 ++
 xen/arch/arm/sysctl.c            | 2 ++
 xen/arch/x86/sysctl.c            | 2 ++
 xen/include/public/sysctl.h      | 4 +++-
 6 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index b9ba16d..63f9550 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -856,6 +856,13 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_PHYSINFO_MAX_POSSIBLE_MFN 1
 
 /*
+ * LIBXL_HAVE_PHYSINFO_GPADDR_BITS
+ *
+ * If this is defined, libxl_physinfo has a "gpaddr_bits" field.
+ */
+#define LIBXL_HAVE_PHYSINFO_GPADDR_BITS 1
+
+/*
  * LIBXL_HAVE_DOMINFO_OUTSTANDING_MEMKB 1
  *
  * If this is defined, libxl_dominfo will contain a MemKB type field called
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index 204eb0b..c86624f 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -405,6 +405,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
     physinfo->cap_vmtrace =
         !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_vmtrace);
 
+    physinfo->gpaddr_bits = xcphysinfo.gpaddr_bits;
+
     GC_FREE;
     return 0;
 }
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 3f9fff6..bf27fe6 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -1061,6 +1061,8 @@ libxl_physinfo = Struct("physinfo", [
     ("cap_shadow", bool),
     ("cap_iommu_hap_pt_share", bool),
     ("cap_vmtrace", bool),
+
+    ("gpaddr_bits", uint8),
     ], dir=DIR_OUT)
 
 libxl_connectorinfo = Struct("connectorinfo", [
diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
index f87944e..91dca4f 100644
--- a/xen/arch/arm/sysctl.c
+++ b/xen/arch/arm/sysctl.c
@@ -15,6 +15,8 @@
 void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
 {
     pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm | XEN_SYSCTL_PHYSCAP_hap;
+
+    pi->gpaddr_bits = p2m_ipa_bits;
 }
 
 long arch_do_sysctl(struct xen_sysctl *sysctl,
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index aff52a1..7b14865 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -135,6 +135,8 @@ void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_hap;
     if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_shadow;
+
+    pi->gpaddr_bits = hap_paddr_bits;
 }
 
 long arch_do_sysctl(
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 039ccf8..0450a78 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -35,7 +35,7 @@
 #include "domctl.h"
 #include "physdev.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
 
 /*
  * Read console content from Xen buffer ring.
@@ -120,6 +120,8 @@ struct xen_sysctl_physinfo {
     uint64_aligned_t outstanding_pages;
     uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
     uint32_t hw_cap[8];
+    uint8_t gpaddr_bits;
+    uint8_t pad[7];
 };
 
 /*
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-23 22:48 [PATCH V3 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
  2021-09-23 22:48 ` [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
@ 2021-09-23 22:48 ` Oleksandr Tyshchenko
  2021-09-24 22:39   ` Stefano Stabellini
  2021-09-23 22:48 ` [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
  2 siblings, 1 reply; 12+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-23 22:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain
  currently.
- The ACPI case is not covered.

***

As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
the algorithm to choose extended regions for it is different
in comparison with the algorithm for non-direct mapped DomU.
What is more, that extended regions should be chosen differently
whether IOMMU is enabled or not.

Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
holes found in host device-tree if otherwise. Make sure that
extended regions are 2MB-aligned and located within maximum possible
addressable physical memory range. The minimum size of extended
region is 64MB. The maximum number of extended regions is 128,
which is an artificial limit to minimize code changes (we reuse
struct meminfo to describe extended regions, so there are an array
field for 128 elements).

It worth mentioning that unallocated memory solution (when the IOMMU
is disabled) will work safely until Dom0 is able to allocate memory
outside of the original range.

Also introduce command line option to be able to globally enable or
disable support for extended regions for Dom0 (enabled by default).

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, we need to decide which approach to use in find_unallocated_memory(),
you can find details at:
https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/

Changes RFC -> V2:
   - update patch description
   - drop uneeded "extended-region" DT property

Changes V2 -> V3:
   - update patch description
   - add comment for "size" calculation in add_ext_regions()
   - clarify "end" calculation in find_unallocated_memory() and
     find_memory_holes()
   - only pick up regions with size >= 64MB
   - allocate reg dynamically instead of keeping on the stack in
     make_hypervisor_node()
   - do not show warning for 32-bit domain
   - drop Linux specific limits EXT_REGION_*
   - also cover "ranges" property in find_memory_holes()
   - add command line arg to enable/disable extended region support
---
 docs/misc/xen-command-line.pandoc |   7 +
 xen/arch/arm/domain_build.c       | 280 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 284 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 177e656..3bb8eb7 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1081,6 +1081,13 @@ hardware domain is architecture dependent.
 Note that specifying zero as domU value means zero, while for dom0 it means
 to use the default.
 
+### ext_regions (Arm)
+> `= <boolean>`
+
+> Default : `true`
+
+Flag to globally enable or disable support for extended regions for dom0.
+
 ### flask
 > `= permissive | enforcing | late | disabled`
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d233d63..81997d5 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -34,6 +34,10 @@
 static unsigned int __initdata opt_dom0_max_vcpus;
 integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
 
+/* If true, the extended regions support is enabled for dom0 */
+static bool __initdata opt_ext_regions = true;
+boolean_param("ext_regions", opt_ext_regions);
+
 static u64 __initdata dom0_mem;
 static bool __initdata dom0_mem_set;
 
@@ -886,6 +890,233 @@ static int __init make_memory_node(const struct domain *d,
     return res;
 }
 
+static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
+{
+    struct meminfo *ext_regions = data;
+    paddr_t start, size;
+
+    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
+        return 0;
+
+    /* Both start and size of the extended region should be 2MB aligned */
+    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
+    if ( start > e )
+        return 0;
+
+    /*
+     * e is actually "end-1" because it is called by rangeset functions
+     * which are inclusive of the last address.
+     */
+    e += 1;
+    size = (e - start) & ~(SZ_2M - 1);
+    if ( size < MB(64) )
+        return 0;
+
+    ext_regions->bank[ext_regions->nr_banks].start = start;
+    ext_regions->bank[ext_regions->nr_banks].size = size;
+    ext_regions->nr_banks++;
+
+    return 0;
+}
+
+static int __init find_unallocated_memory(const struct kernel_info *kinfo,
+                                          struct meminfo *ext_regions)
+{
+    const struct meminfo *assign_mem = &kinfo->mem;
+    struct rangeset *unalloc_mem;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find unallocated memory for extended regions\n");
+
+    unalloc_mem = rangeset_new(NULL, NULL, 0);
+    if ( !unalloc_mem )
+        return -ENOMEM;
+
+    /* Start with all available RAM */
+    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
+    {
+        start = bootinfo.mem.bank[i].start;
+        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
+        res = rangeset_add_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove RAM assigned to Dom0 */
+    for ( i = 0; i < assign_mem->nr_banks; i++ )
+    {
+        start = assign_mem->bank[i].start;
+        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
+        res = rangeset_remove_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove reserved-memory regions */
+    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        start = bootinfo.reserved_mem.bank[i].start;
+        end = bootinfo.reserved_mem.bank[i].start +
+            bootinfo.reserved_mem.bank[i].size;
+        res = rangeset_remove_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove grant table region */
+    start = kinfo->gnttab_start;
+    end = kinfo->gnttab_start + kinfo->gnttab_size;
+    res = rangeset_remove_range(unalloc_mem, start, end - 1);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_report_ranges(unalloc_mem, start, end,
+                                 add_ext_regions, ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(unalloc_mem);
+
+    return res;
+}
+
+static int __init find_memory_holes(const struct kernel_info *kinfo,
+                                    struct meminfo *ext_regions)
+{
+    struct dt_device_node *np;
+    struct rangeset *mem_holes;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find memory holes for extended regions\n");
+
+    mem_holes = rangeset_new(NULL, NULL, 0);
+    if ( !mem_holes )
+        return -ENOMEM;
+
+    /* Start with maximum possible addressable physical memory range */
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_add_range(mem_holes, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    /*
+     * Remove regions described by "reg" and "ranges" properties where
+     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
+     */
+    dt_for_each_device_node( dt_host, np )
+    {
+        unsigned int naddr;
+        u64 addr, size;
+
+        naddr = dt_number_of_address(np);
+
+        for ( i = 0; i < naddr; i++ )
+        {
+            res = dt_device_get_address(np, i, &addr, &size);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
+                       i, dt_node_full_name(np));
+                goto out;
+            }
+
+            start = addr & PAGE_MASK;
+            end = PAGE_ALIGN(addr + size);
+            res = rangeset_remove_range(mem_holes, start, end - 1);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                       start, end);
+                goto out;
+            }
+        }
+
+        if ( dt_device_type_is_equal(np, "pci" ) )
+        {
+            unsigned int range_size, nr_ranges;
+            int na, ns, pna;
+            const __be32 *ranges;
+            u32 len;
+
+            /*
+             * Looking for non-empty ranges property which in the context
+             * of the PCI host bridge device node describes the BARs for
+             * the PCI devices.
+             */
+            ranges = dt_get_property(np, "ranges", &len);
+            if ( !ranges || !len )
+                continue;
+
+            pna = dt_n_addr_cells(np);
+            na = dt_child_n_addr_cells(np);
+            ns = dt_child_n_size_cells(np);
+            range_size = pna + na + ns;
+            nr_ranges = len / sizeof(__be32) / range_size;
+
+            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
+            {
+                /* Skip the child address and get the parent (CPU) address */
+                addr = dt_read_number(ranges + na, pna);
+                size = dt_read_number(ranges + na + pna, ns);
+
+                start = addr & PAGE_MASK;
+                end = PAGE_ALIGN(addr + size);
+                res = rangeset_remove_range(mem_holes, start, end - 1);
+                if ( res )
+                {
+                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                           start, end);
+                    goto out;
+                }
+            }
+        }
+    }
+
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_report_ranges(mem_holes, start, end,
+                                 add_ext_regions,  ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(mem_holes);
+
+    return res;
+}
+
 static int __init make_hypervisor_node(struct domain *d,
                                        const struct kernel_info *kinfo,
                                        int addrcells, int sizecells)
@@ -893,11 +1124,12 @@ static int __init make_hypervisor_node(struct domain *d,
     const char compat[] =
         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
         "xen,xen";
-    __be32 reg[4];
+    __be32 *reg, *cells;
     gic_interrupt_t intr;
-    __be32 *cells;
     int res;
     void *fdt = kinfo->fdt;
+    struct meminfo *ext_regions;
+    unsigned int i;
 
     dt_dprintk("Create hypervisor node\n");
 
@@ -919,12 +1151,54 @@ static int __init make_hypervisor_node(struct domain *d,
     if ( res )
         return res;
 
+    reg = xzalloc_array(__be32, (NR_MEM_BANKS + 1) * 4);
+    if ( !reg )
+        return -ENOMEM;
+
+    ext_regions = xzalloc(struct meminfo);
+    if ( !ext_regions )
+    {
+        xfree(reg);
+        return -ENOMEM;
+    }
+
+    if ( !opt_ext_regions )
+        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
+    else if ( is_32bit_domain(d) )
+        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
+    else
+    {
+        if ( !is_iommu_enabled(d) )
+            res = find_unallocated_memory(kinfo, ext_regions);
+        else
+            res = find_memory_holes(kinfo, ext_regions);
+
+        if ( res )
+            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
+    }
+
     /* reg 0 is grant table space */
     cells = &reg[0];
     dt_child_set_range(&cells, addrcells, sizecells,
                        kinfo->gnttab_start, kinfo->gnttab_size);
+    /* reg 1...N are extended regions */
+    for ( i = 0; i < ext_regions->nr_banks; i++ )
+    {
+        u64 start = ext_regions->bank[i].start;
+        u64 size = ext_regions->bank[i].size;
+
+        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
+                   i, start, start + size);
+
+        dt_child_set_range(&cells, addrcells, sizecells, start, size);
+    }
+
     res = fdt_property(fdt, "reg", reg,
-                       dt_cells_to_size(addrcells + sizecells));
+                       dt_cells_to_size(addrcells + sizecells) *
+                       (ext_regions->nr_banks + 1));
+    xfree(ext_regions);
+    xfree(reg);
+
     if ( res )
         return res;
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-09-23 22:48 [PATCH V3 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
  2021-09-23 22:48 ` [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
  2021-09-23 22:48 ` [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
@ 2021-09-23 22:48 ` Oleksandr Tyshchenko
  2021-09-24 23:14   ` Stefano Stabellini
  2021-09-28  7:28   ` Michal Orzel
  2 siblings, 2 replies; 12+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-23 22:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Juergen Gross, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain
  currently.
- The ACPI case is not covered.

***

The algorithm to choose extended regions for non-direct mapped
DomU is simpler in comparison with the algorithm for direct mapped
Dom0. As we have a lot of unused space above 4GB, provide single
2MB-aligned region from the second RAM bank taking into the account
the maximum supported guest address space size and the amount of
memory assigned to the guest. The maximum size of the region is 128GB.
The minimum size is 64MB.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Changes RFC -> V2:
   - update patch description
   - drop uneeded "extended-region" DT property
   - clear reg array in finalise_ext_region() and add a TODO

Changes V2 -> V3:
   - update patch description, comments in code
   - only pick up regions with size >= 64MB
   - move the region calculation to make_hypervisor_node() and drop
     finalise_ext_region()
   - extend the list of arguments for make_hypervisor_node()
   - do not show warning for 32-bit domain
   - change the region alignment from 1GB to 2MB
   - move EXT_REGION_SIZE to public/arch-arm.h
---
 tools/libs/light/libxl_arm.c  | 70 +++++++++++++++++++++++++++++++++++++++----
 xen/include/public/arch-arm.h |  3 ++
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index e3140a6..a67b68e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -598,9 +598,17 @@ static int make_timer_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
+
 static int make_hypervisor_node(libxl__gc *gc, void *fdt,
-                                const libxl_version_info *vers)
+                                const libxl_version_info *vers,
+                                const libxl_domain_build_info *b_info,
+                                const struct xc_dom_image *dom)
 {
+    uint64_t region_size = 0, region_base, ramsize, bank1size,
+        bank1end_align, bank1end_max;
+    uint8_t gpaddr_bits;
+    libxl_physinfo physinfo;
     int res;
     gic_interrupt intr;
 
@@ -615,9 +623,61 @@ static int make_hypervisor_node(libxl__gc *gc, void *fdt,
                               "xen,xen");
     if (res) return res;
 
-    /* reg 0 is grant table space */
-    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
+    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
+        LOG(DEBUG, "The extended regions are only supported for 64-bit guest currently");
+        goto out;
+    }
+
+    res = libxl_get_physinfo(CTX, &physinfo);
+    assert(!res);
+
+    gpaddr_bits = physinfo.gpaddr_bits;
+    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
+
+    /*
+     * Try to allocate single 2MB-aligned extended region from the second RAM
+     * bank (above 4GB) taking into the account the maximum supported guest
+     * address space size and the amount of memory assigned to the guest.
+     * As the guest memory layout is not populated yet we cannot rely on
+     * dom->rambank_size[1], so calculate the actual size of the second bank
+     * using "max_memkb" value.
+     */
+    bank1end_max = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
+    ramsize = b_info->max_memkb * 1024;
+    if (ramsize <= GUEST_RAM0_SIZE)
+        bank1size = 0;
+    else
+        bank1size = ramsize - GUEST_RAM0_SIZE;
+    bank1end_align = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(bank1size);
+
+    if (bank1end_max <= bank1end_align) {
+        LOG(WARN, "The extended region cannot be allocated, not enough space");
+        goto out;
+    }
+
+    if (bank1end_max - bank1end_align > GUEST_EXT_REGION_MAX_SIZE) {
+        region_base = bank1end_max - GUEST_EXT_REGION_MAX_SIZE;
+        region_size = GUEST_EXT_REGION_MAX_SIZE;
+    } else {
+        region_base = bank1end_align;
+        region_size = bank1end_max - bank1end_align;
+    }
+
+out:
+    /*
+     * The region 0 for grant table space must be always present. If we managed
+     * to allocate the extended region then insert it as region 1.
+     */
+    if (region_size >= GUEST_EXT_REGION_MIN_SIZE) {
+        LOG(DEBUG, "Extended region: %#"PRIx64"->%#"PRIx64"\n",
+            region_base, region_base + region_size);
+
+        res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                                2, GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE,
+                                region_base, region_size);
+    } else
+        res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                                1, GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
     if (res) return res;
 
     /*
@@ -963,7 +1023,7 @@ next_resize:
         }
 
         FDT( make_timer_node(gc, fdt, ainfo, state->clock_frequency) );
-        FDT( make_hypervisor_node(gc, fdt, vers) );
+        FDT( make_hypervisor_node(gc, fdt, vers, info, dom) );
 
         if (info->arch_arm.vuart == LIBXL_VUART_TYPE_SBSA_UART)
             FDT( make_vpl011_uart_node(gc, fdt, ainfo, dom) );
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 6b5a5f8..df59933 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -449,6 +449,9 @@ typedef uint64_t xen_callback_t;
 #define GUEST_RAM_BANK_BASES   { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
 #define GUEST_RAM_BANK_SIZES   { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
 
+#define GUEST_EXT_REGION_MAX_SIZE   xen_mk_ullong(0x2000000000) /* 128GB */
+#define GUEST_EXT_REGION_MIN_SIZE   xen_mk_ullong(0x0004000000) /* 64MB */
+
 /* Current supported guest VCPUs */
 #define GUEST_MAX_VCPUS 128
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-23 22:48 ` [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
@ 2021-09-24 22:39   ` Stefano Stabellini
  2021-09-25 18:20     ` Oleksandr
  0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2021-09-24 22:39 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Volodymyr Babchuk

On Fri, 24 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>    - update patch description
>    - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>    - update patch description
>    - add comment for "size" calculation in add_ext_regions()
>    - clarify "end" calculation in find_unallocated_memory() and
>      find_memory_holes()
>    - only pick up regions with size >= 64MB
>    - allocate reg dynamically instead of keeping on the stack in
>      make_hypervisor_node()
>    - do not show warning for 32-bit domain
>    - drop Linux specific limits EXT_REGION_*
>    - also cover "ranges" property in find_memory_holes()
>    - add command line arg to enable/disable extended region support
> ---
>  docs/misc/xen-command-line.pandoc |   7 +
>  xen/arch/arm/domain_build.c       | 280 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 284 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..3bb8eb7 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,13 @@ hardware domain is architecture dependent.
>  Note that specifying zero as domU value means zero, while for dom0 it means
>  to use the default.
>  
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to globally enable or disable support for extended regions for dom0.

I'd say:

Flag to enable or disable extended regions for Dom0.

Extended regions are ranges of unused address space exposed to Dom0 as
"safe to use" for special memory mappings. Disable if your board device
tree is incomplete.


>  ### flask
>  > `= permissive | enforcing | late | disabled`
>  
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..81997d5 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
>  static unsigned int __initdata opt_dom0_max_vcpus;
>  integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>  
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
>  static u64 __initdata dom0_mem;
>  static bool __initdata dom0_mem_set;
>  
> @@ -886,6 +890,233 @@ static int __init make_memory_node(const struct domain *d,
>      return res;
>  }
>  
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}

This is a lot better


> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)
> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)
> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;

PAGE_ALIGN(addr)


> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {
> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in the context
> +             * of the PCI host bridge device node describes the BARs for
> +             * the PCI devices.

I'd say that "it describes the PCI host bridge aperture"


> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +
> +            range_size = pna + na + ns;
> +
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +
> +                size = dt_read_number(ranges + na + pna, ns);

Parsing the PCI ranges property is never intuitive, but everything
checks out as far as I can tell


> +                start = addr & PAGE_MASK;

PAGE_ALIGN(start)


> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
>  static int __init make_hypervisor_node(struct domain *d,
>                                         const struct kernel_info *kinfo,
>                                         int addrcells, int sizecells)
> @@ -893,11 +1124,12 @@ static int __init make_hypervisor_node(struct domain *d,
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>          "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>      gic_interrupt_t intr;
> -    __be32 *cells;
>      int res;
>      void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions;
> +    unsigned int i;
>  
>      dt_dprintk("Create hypervisor node\n");
>  
> @@ -919,12 +1151,54 @@ static int __init make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>  
> +    reg = xzalloc_array(__be32, (NR_MEM_BANKS + 1) * 4);
> +    if ( !reg )
> +        return -ENOMEM;

Instead of (NR_MEM_BANKS + 1) * 4, shouldn't it be:

(ext_regions->nr_banks + 1) * (addrcells + sizecells)

?

xzalloc_array allocates a number of elements, not a number of bytes.


> +
> +    ext_regions = xzalloc(struct meminfo);
> +    if ( !ext_regions )
> +    {
> +        xfree(reg);
> +        return -ENOMEM;
> +    }
> +
> +    if ( !opt_ext_regions )
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
> +    else if ( is_32bit_domain(d) )
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");

These checks should be before the memory allocations


> +    else
> +    {
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +    }
> +
>      /* reg 0 is grant table space */
>      cells = &reg[0];
>      dt_child_set_range(&cells, addrcells, sizecells,
>                         kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < ext_regions->nr_banks; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);
> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>      res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (ext_regions->nr_banks + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>      if ( res )
>          return res;
>  
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-09-23 22:48 ` [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
@ 2021-09-24 23:14   ` Stefano Stabellini
  2021-09-27 10:48     ` Ian Jackson
  2021-09-28  7:28   ` Michal Orzel
  1 sibling, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2021-09-24 23:14 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Ian Jackson, Wei Liu,
	Anthony PERARD, Juergen Gross, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk

On Fri, 24 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> The algorithm to choose extended regions for non-direct mapped
> DomU is simpler in comparison with the algorithm for direct mapped
> Dom0. As we have a lot of unused space above 4GB, provide single
> 2MB-aligned region from the second RAM bank taking into the account
> the maximum supported guest address space size and the amount of
> memory assigned to the guest. The maximum size of the region is 128GB.
> The minimum size is 64MB.
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Changes RFC -> V2:
>    - update patch description
>    - drop uneeded "extended-region" DT property
>    - clear reg array in finalise_ext_region() and add a TODO
> 
> Changes V2 -> V3:
>    - update patch description, comments in code
>    - only pick up regions with size >= 64MB
>    - move the region calculation to make_hypervisor_node() and drop
>      finalise_ext_region()
>    - extend the list of arguments for make_hypervisor_node()
>    - do not show warning for 32-bit domain
>    - change the region alignment from 1GB to 2MB
>    - move EXT_REGION_SIZE to public/arch-arm.h
> ---
>  tools/libs/light/libxl_arm.c  | 70 +++++++++++++++++++++++++++++++++++++++----
>  xen/include/public/arch-arm.h |  3 ++
>  2 files changed, 68 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> index e3140a6..a67b68e 100644
> --- a/tools/libs/light/libxl_arm.c
> +++ b/tools/libs/light/libxl_arm.c
> @@ -598,9 +598,17 @@ static int make_timer_node(libxl__gc *gc, void *fdt,
>      return 0;
>  }
>  
> +#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
> +
>  static int make_hypervisor_node(libxl__gc *gc, void *fdt,
> -                                const libxl_version_info *vers)
> +                                const libxl_version_info *vers,
> +                                const libxl_domain_build_info *b_info,
> +                                const struct xc_dom_image *dom)
>  {
> +    uint64_t region_size = 0, region_base, ramsize, bank1size,
> +        bank1end_align, bank1end_max;
> +    uint8_t gpaddr_bits;
> +    libxl_physinfo physinfo;
>      int res;
>      gic_interrupt intr;
>  
> @@ -615,9 +623,61 @@ static int make_hypervisor_node(libxl__gc *gc, void *fdt,
>                                "xen,xen");
>      if (res) return res;
>  
> -    /* reg 0 is grant table space */
> -    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> -                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
> +    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
> +        LOG(DEBUG, "The extended regions are only supported for 64-bit guest currently");
> +        goto out;
> +    }
> +
> +    res = libxl_get_physinfo(CTX, &physinfo);
> +    assert(!res);
> +
> +    gpaddr_bits = physinfo.gpaddr_bits;
> +    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
> +
> +    /*
> +     * Try to allocate single 2MB-aligned extended region from the second RAM
> +     * bank (above 4GB) taking into the account the maximum supported guest
> +     * address space size and the amount of memory assigned to the guest.
> +     * As the guest memory layout is not populated yet we cannot rely on
> +     * dom->rambank_size[1], so calculate the actual size of the second bank
> +     * using "max_memkb" value.
> +     */
> +    bank1end_max = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
> +    ramsize = b_info->max_memkb * 1024;
> +    if (ramsize <= GUEST_RAM0_SIZE)
> +        bank1size = 0;
> +    else
> +        bank1size = ramsize - GUEST_RAM0_SIZE;
> +    bank1end_align = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(bank1size);
> +
> +    if (bank1end_max <= bank1end_align) {
> +        LOG(WARN, "The extended region cannot be allocated, not enough space");
> +        goto out;
> +    }
> +
> +    if (bank1end_max - bank1end_align > GUEST_EXT_REGION_MAX_SIZE) {
> +        region_base = bank1end_max - GUEST_EXT_REGION_MAX_SIZE;
> +        region_size = GUEST_EXT_REGION_MAX_SIZE;
> +    } else {
> +        region_base = bank1end_align;
> +        region_size = bank1end_max - bank1end_align;
> +    }
> +
> +out:
> +    /*
> +     * The region 0 for grant table space must be always present. If we managed
> +     * to allocate the extended region then insert it as region 1.
> +     */
> +    if (region_size >= GUEST_EXT_REGION_MIN_SIZE) {
> +        LOG(DEBUG, "Extended region: %#"PRIx64"->%#"PRIx64"\n",
> +            region_base, region_base + region_size);
> +
> +        res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> +                                2, GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE,
> +                                region_base, region_size);
> +    } else
> +        res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> +                                1, GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
>      if (res) return res;
>  
>      /*
> @@ -963,7 +1023,7 @@ next_resize:
>          }
>  
>          FDT( make_timer_node(gc, fdt, ainfo, state->clock_frequency) );
> -        FDT( make_hypervisor_node(gc, fdt, vers) );
> +        FDT( make_hypervisor_node(gc, fdt, vers, info, dom) );
>  
>          if (info->arch_arm.vuart == LIBXL_VUART_TYPE_SBSA_UART)
>              FDT( make_vpl011_uart_node(gc, fdt, ainfo, dom) );
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index 6b5a5f8..df59933 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -449,6 +449,9 @@ typedef uint64_t xen_callback_t;
>  #define GUEST_RAM_BANK_BASES   { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
>  #define GUEST_RAM_BANK_SIZES   { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
>  
> +#define GUEST_EXT_REGION_MAX_SIZE   xen_mk_ullong(0x2000000000) /* 128GB */
> +#define GUEST_EXT_REGION_MIN_SIZE   xen_mk_ullong(0x0004000000) /* 64MB */
> +
>  /* Current supported guest VCPUs */
>  #define GUEST_MAX_VCPUS 128
>  
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-24 22:39   ` Stefano Stabellini
@ 2021-09-25 18:20     ` Oleksandr
  2021-09-25 19:07       ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Oleksandr @ 2021-09-25 18:20 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Wei Liu,
	Volodymyr Babchuk


On 25.09.21 01:39, Stefano Stabellini wrote:

Hi Stefano

> On Fri, 24 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Please note, we need to decide which approach to use in find_unallocated_memory(),
>> you can find details at:
>> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
>>
>> Changes RFC -> V2:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>>
>> Changes V2 -> V3:
>>     - update patch description
>>     - add comment for "size" calculation in add_ext_regions()
>>     - clarify "end" calculation in find_unallocated_memory() and
>>       find_memory_holes()
>>     - only pick up regions with size >= 64MB
>>     - allocate reg dynamically instead of keeping on the stack in
>>       make_hypervisor_node()
>>     - do not show warning for 32-bit domain
>>     - drop Linux specific limits EXT_REGION_*
>>     - also cover "ranges" property in find_memory_holes()
>>     - add command line arg to enable/disable extended region support
>> ---
>>   docs/misc/xen-command-line.pandoc |   7 +
>>   xen/arch/arm/domain_build.c       | 280 +++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 284 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>> index 177e656..3bb8eb7 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1081,6 +1081,13 @@ hardware domain is architecture dependent.
>>   Note that specifying zero as domU value means zero, while for dom0 it means
>>   to use the default.
>>   
>> +### ext_regions (Arm)
>> +> `= <boolean>`
>> +
>> +> Default : `true`
>> +
>> +Flag to globally enable or disable support for extended regions for dom0.
> I'd say:
>
> Flag to enable or disable extended regions for Dom0.
>
> Extended regions are ranges of unused address space exposed to Dom0 as
> "safe to use" for special memory mappings. Disable if your board device
> tree is incomplete.


ok, will update


>
>
>>   ### flask
>>   > `= permissive | enforcing | late | disabled`
>>   
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index d233d63..81997d5 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -34,6 +34,10 @@
>>   static unsigned int __initdata opt_dom0_max_vcpus;
>>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>>   
>> +/* If true, the extended regions support is enabled for dom0 */
>> +static bool __initdata opt_ext_regions = true;
>> +boolean_param("ext_regions", opt_ext_regions);
>> +
>>   static u64 __initdata dom0_mem;
>>   static bool __initdata dom0_mem_set;
>>   
>> @@ -886,6 +890,233 @@ static int __init make_memory_node(const struct domain *d,
>>       return res;
>>   }
>>   
>> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
>> +{
>> +    struct meminfo *ext_regions = data;
>> +    paddr_t start, size;
>> +
>> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> +        return 0;
>> +
>> +    /* Both start and size of the extended region should be 2MB aligned */
>> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> +    if ( start > e )
>> +        return 0;
>> +
>> +    /*
>> +     * e is actually "end-1" because it is called by rangeset functions
>> +     * which are inclusive of the last address.
>> +     */
>> +    e += 1;
>> +    size = (e - start) & ~(SZ_2M - 1);
>> +    if ( size < MB(64) )
>> +        return 0;
>> +
>> +    ext_regions->bank[ext_regions->nr_banks].start = start;
>> +    ext_regions->bank[ext_regions->nr_banks].size = size;
>> +    ext_regions->nr_banks++;
>> +
>> +    return 0;
>> +}
> This is a lot better

Great!


>
>
>> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
>> +                                          struct meminfo *ext_regions)
>> +{
>> +    const struct meminfo *assign_mem = &kinfo->mem;
>> +    struct rangeset *unalloc_mem;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
>> +    if ( !unalloc_mem )
>> +        return -ENOMEM;
>> +
>> +    /* Start with all available RAM */
>> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.mem.bank[i].start;
>> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
>> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove RAM assigned to Dom0 */
>> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
>> +    {
>> +        start = assign_mem->bank[i].start;
>> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove reserved-memory regions */
>> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.reserved_mem.bank[i].start;
>> +        end = bootinfo.reserved_mem.bank[i].start +
>> +            bootinfo.reserved_mem.bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove grant table region */
>> +    start = kinfo->gnttab_start;
>> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
>> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(unalloc_mem, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(unalloc_mem);
>> +
>> +    return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> +                                    struct meminfo *ext_regions)
>> +{
>> +    struct dt_device_node *np;
>> +    struct rangeset *mem_holes;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find memory holes for extended regions\n");
>> +
>> +    mem_holes = rangeset_new(NULL, NULL, 0);
>> +    if ( !mem_holes )
>> +        return -ENOMEM;
>> +
>> +    /* Start with maximum possible addressable physical memory range */
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_add_range(mem_holes, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Remove regions described by "reg" and "ranges" properties where
>> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
>> +     */
>> +    dt_for_each_device_node( dt_host, np )
>> +    {
>> +        unsigned int naddr;
>> +        u64 addr, size;
>> +
>> +        naddr = dt_number_of_address(np);
>> +
>> +        for ( i = 0; i < naddr; i++ )
>> +        {
>> +            res = dt_device_get_address(np, i, &addr, &size);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
>> +                       i, dt_node_full_name(np));
>> +                goto out;
>> +            }
>> +
>> +            start = addr & PAGE_MASK;
> PAGE_ALIGN(addr)

Let's imagine we have reg = <0x0 0xee080200 0x0 0x700>;
So using PAGE_ALIGN(0xee080200) we will get a result aligned up - 
0xee081000, but this is not what we want here. But, the end should be 
aligned up.


>
>
>> +            end = PAGE_ALIGN(addr + size);
>> +            res = rangeset_remove_range(mem_holes, start, end - 1);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                       start, end);
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        if ( dt_device_type_is_equal(np, "pci" ) )
>> +        {
>> +            unsigned int range_size, nr_ranges;
>> +            int na, ns, pna;
>> +            const __be32 *ranges;
>> +            u32 len;
>> +
>> +            /*
>> +             * Looking for non-empty ranges property which in the context
>> +             * of the PCI host bridge device node describes the BARs for
>> +             * the PCI devices.
> I'd say that "it describes the PCI host bridge aperture"

ok, will update


>
>
>> +             */
>> +            ranges = dt_get_property(np, "ranges", &len);
>> +            if ( !ranges || !len )
>> +                continue;
>> +
>> +            pna = dt_n_addr_cells(np);
>> +            na = dt_child_n_addr_cells(np);
>> +            ns = dt_child_n_size_cells(np);
>> +
>> +            range_size = pna + na + ns;
>> +
>> +            nr_ranges = len / sizeof(__be32) / range_size;
>> +
>> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
>> +            {
>> +                /* Skip the child address and get the parent (CPU) address */
>> +                addr = dt_read_number(ranges + na, pna);
>> +
>> +                size = dt_read_number(ranges + na + pna, ns);
> Parsing the PCI ranges property is never intuitive, but everything
> checks out as far as I can tell
>
>
>> +                start = addr & PAGE_MASK;
> PAGE_ALIGN(start)

same here


>
>
>> +                end = PAGE_ALIGN(addr + size);
>> +                res = rangeset_remove_range(mem_holes, start, end - 1);
>> +                if ( res )
>> +                {
>> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                           start, end);
>> +                    goto out;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(mem_holes, start, end,
>> +                                 add_ext_regions,  ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(mem_holes);
>> +
>> +    return res;
>> +}
>> +
>>   static int __init make_hypervisor_node(struct domain *d,
>>                                          const struct kernel_info *kinfo,
>>                                          int addrcells, int sizecells)
>> @@ -893,11 +1124,12 @@ static int __init make_hypervisor_node(struct domain *d,
>>       const char compat[] =
>>           "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>>           "xen,xen";
>> -    __be32 reg[4];
>> +    __be32 *reg, *cells;
>>       gic_interrupt_t intr;
>> -    __be32 *cells;
>>       int res;
>>       void *fdt = kinfo->fdt;
>> +    struct meminfo *ext_regions;
>> +    unsigned int i;
>>   
>>       dt_dprintk("Create hypervisor node\n");
>>   
>> @@ -919,12 +1151,54 @@ static int __init make_hypervisor_node(struct domain *d,
>>       if ( res )
>>           return res;
>>   
>> +    reg = xzalloc_array(__be32, (NR_MEM_BANKS + 1) * 4);
>> +    if ( !reg )
>> +        return -ENOMEM;
> Instead of (NR_MEM_BANKS + 1) * 4, shouldn't it be:
>
> (ext_regions->nr_banks + 1) * (addrcells + sizecells)
>
> ?

I think, yes ... But, with other modifications. Please see below


>
> xzalloc_array allocates a number of elements, not a number of bytes.
>
>
>> +
>> +    ext_regions = xzalloc(struct meminfo);
>> +    if ( !ext_regions )
>> +    {
>> +        xfree(reg);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    if ( !opt_ext_regions )
>> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
>> +    else if ( is_32bit_domain(d) )
>> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
> These checks should be before the memory allocations

Good point! Indeed there is no sense of whole "ext_regions" allocations 
if we are not going to handle extended regions. Also there is no need
to allocate "reg" array with maximum possible elements in advance 
(NR_MEM_BANKS + 1) * 4, we can allocate it afterwards when we clearly 
know how many elements we really need
(nr_ext_regions + 1) * 4, as you suggested above.


Below the changes to this function on top of current patch:

@@ -1128,8 +1127,8 @@ static int __init make_hypervisor_node(struct 
domain *d,
      gic_interrupt_t intr;
      int res;
      void *fdt = kinfo->fdt;
-    struct meminfo *ext_regions;
-    unsigned int i;
+    struct meminfo *ext_regions = NULL;
+    unsigned int i, nr_ext_regions;

      dt_dprintk("Create hypervisor node\n");

@@ -1151,23 +1150,22 @@ static int __init make_hypervisor_node(struct 
domain *d,
      if ( res )
          return res;

-    reg = xzalloc_array(__be32, (NR_MEM_BANKS + 1) * 4);
-    if ( !reg )
-        return -ENOMEM;
-
-    ext_regions = xzalloc(struct meminfo);
-    if ( !ext_regions )
-    {
-        xfree(reg);
-        return -ENOMEM;
-    }
-
      if ( !opt_ext_regions )
+    {
          printk(XENLOG_DEBUG "The extended regions support is disabled\n");
+        nr_ext_regions = 0;
+    }
      else if ( is_32bit_domain(d) )
+    {
          printk(XENLOG_DEBUG "The extended regions are only supported 
for 64-bit guest currently\n");
+        nr_ext_regions = 0;
+    }
      else
      {
+        ext_regions = xzalloc(struct meminfo);
+        if ( !ext_regions )
+            return -ENOMEM;
+
          if ( !is_iommu_enabled(d) )
              res = find_unallocated_memory(kinfo, ext_regions);
          else
@@ -1175,6 +1173,14 @@ static int __init make_hypervisor_node(struct 
domain *d,

          if ( res )
              printk(XENLOG_WARNING "Failed to allocate extended 
regions\n");
+        nr_ext_regions = ext_regions->nr_banks;
+    }
+
+    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + 
sizecells));
+    if ( !reg )
+    {
+        xfree(ext_regions);
+        return -ENOMEM;
      }

      /* reg 0 is grant table space */
@@ -1182,7 +1188,7 @@ static int __init make_hypervisor_node(struct 
domain *d,
      dt_child_set_range(&cells, addrcells, sizecells,
                         kinfo->gnttab_start, kinfo->gnttab_size);
      /* reg 1...N are extended regions */
-    for ( i = 0; i < ext_regions->nr_banks; i++ )
+    for ( i = 0; i < nr_ext_regions; i++ )
      {
          u64 start = ext_regions->bank[i].start;
          u64 size = ext_regions->bank[i].size;
@@ -1195,7 +1201,7 @@ static int __init make_hypervisor_node(struct 
domain *d,

      res = fdt_property(fdt, "reg", reg,
                         dt_cells_to_size(addrcells + sizecells) *
-                       (ext_regions->nr_banks + 1));
+                       (nr_ext_regions + 1));
      xfree(ext_regions);
      xfree(reg);

(END)



>
>
>> +    else
>> +    {
>> +        if ( !is_iommu_enabled(d) )
>> +            res = find_unallocated_memory(kinfo, ext_regions);
>> +        else
>> +            res = find_memory_holes(kinfo, ext_regions);
>> +
>> +        if ( res )
>> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
>> +    }
>> +
>>       /* reg 0 is grant table space */
>>       cells = &reg[0];
>>       dt_child_set_range(&cells, addrcells, sizecells,
>>                          kinfo->gnttab_start, kinfo->gnttab_size);
>> +    /* reg 1...N are extended regions */
>> +    for ( i = 0; i < ext_regions->nr_banks; i++ )
>> +    {
>> +        u64 start = ext_regions->bank[i].start;
>> +        u64 size = ext_regions->bank[i].size;
>> +
>> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> +                   i, start, start + size);
>> +
>> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> +    }
>> +
>>       res = fdt_property(fdt, "reg", reg,
>> -                       dt_cells_to_size(addrcells + sizecells));
>> +                       dt_cells_to_size(addrcells + sizecells) *
>> +                       (ext_regions->nr_banks + 1));
>> +    xfree(ext_regions);
>> +    xfree(reg);
>> +
>>       if ( res )
>>           return res;
>>   
>> -- 
>> 2.7.4
>>
-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0
  2021-09-25 18:20     ` Oleksandr
@ 2021-09-25 19:07       ` Stefano Stabellini
  0 siblings, 0 replies; 12+ messages in thread
From: Stefano Stabellini @ 2021-09-25 19:07 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko,
	Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Julien Grall, Wei Liu, Volodymyr Babchuk

[-- Attachment #1: Type: text/plain, Size: 19776 bytes --]

On Sat, 25 Sep 2021, Oleksandr wrote:
> On 25.09.21 01:39, Stefano Stabellini wrote:
> > On Fri, 24 Sep 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > > 
> > > The extended regions are chosen at the domain creation time and
> > > advertised to it via "reg" property under hypervisor node in
> > > the guest device-tree. As region 0 is reserved for grant table
> > > space (always present), the indexes for extended regions are 1...N.
> > > If extended regions could not be allocated for some reason,
> > > Xen doesn't fail and behaves as usual, so only inserts region 0.
> > > 
> > > Please note the following limitations:
> > > - The extended region feature is only supported for 64-bit domain
> > >    currently.
> > > - The ACPI case is not covered.
> > > 
> > > ***
> > > 
> > > As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> > > the algorithm to choose extended regions for it is different
> > > in comparison with the algorithm for non-direct mapped DomU.
> > > What is more, that extended regions should be chosen differently
> > > whether IOMMU is enabled or not.
> > > 
> > > Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> > > holes found in host device-tree if otherwise. Make sure that
> > > extended regions are 2MB-aligned and located within maximum possible
> > > addressable physical memory range. The minimum size of extended
> > > region is 64MB. The maximum number of extended regions is 128,
> > > which is an artificial limit to minimize code changes (we reuse
> > > struct meminfo to describe extended regions, so there are an array
> > > field for 128 elements).
> > > 
> > > It worth mentioning that unallocated memory solution (when the IOMMU
> > > is disabled) will work safely until Dom0 is able to allocate memory
> > > outside of the original range.
> > > 
> > > Also introduce command line option to be able to globally enable or
> > > disable support for extended regions for Dom0 (enabled by default).
> > > 
> > > Suggested-by: Julien Grall <jgrall@amazon.com>
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > Please note, we need to decide which approach to use in
> > > find_unallocated_memory(),
> > > you can find details at:
> > > https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> > > 
> > > Changes RFC -> V2:
> > >     - update patch description
> > >     - drop uneeded "extended-region" DT property
> > > 
> > > Changes V2 -> V3:
> > >     - update patch description
> > >     - add comment for "size" calculation in add_ext_regions()
> > >     - clarify "end" calculation in find_unallocated_memory() and
> > >       find_memory_holes()
> > >     - only pick up regions with size >= 64MB
> > >     - allocate reg dynamically instead of keeping on the stack in
> > >       make_hypervisor_node()
> > >     - do not show warning for 32-bit domain
> > >     - drop Linux specific limits EXT_REGION_*
> > >     - also cover "ranges" property in find_memory_holes()
> > >     - add command line arg to enable/disable extended region support
> > > ---
> > >   docs/misc/xen-command-line.pandoc |   7 +
> > >   xen/arch/arm/domain_build.c       | 280
> > > +++++++++++++++++++++++++++++++++++++-
> > >   2 files changed, 284 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/docs/misc/xen-command-line.pandoc
> > > b/docs/misc/xen-command-line.pandoc
> > > index 177e656..3bb8eb7 100644
> > > --- a/docs/misc/xen-command-line.pandoc
> > > +++ b/docs/misc/xen-command-line.pandoc
> > > @@ -1081,6 +1081,13 @@ hardware domain is architecture dependent.
> > >   Note that specifying zero as domU value means zero, while for dom0 it
> > > means
> > >   to use the default.
> > >   +### ext_regions (Arm)
> > > +> `= <boolean>`
> > > +
> > > +> Default : `true`
> > > +
> > > +Flag to globally enable or disable support for extended regions for dom0.
> > I'd say:
> > 
> > Flag to enable or disable extended regions for Dom0.
> > 
> > Extended regions are ranges of unused address space exposed to Dom0 as
> > "safe to use" for special memory mappings. Disable if your board device
> > tree is incomplete.
> 
> 
> ok, will update
> 
> 
> > 
> > 
> > >   ### flask
> > >   > `= permissive | enforcing | late | disabled`
> > >   diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > > index d233d63..81997d5 100644
> > > --- a/xen/arch/arm/domain_build.c
> > > +++ b/xen/arch/arm/domain_build.c
> > > @@ -34,6 +34,10 @@
> > >   static unsigned int __initdata opt_dom0_max_vcpus;
> > >   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
> > >   +/* If true, the extended regions support is enabled for dom0 */
> > > +static bool __initdata opt_ext_regions = true;
> > > +boolean_param("ext_regions", opt_ext_regions);
> > > +
> > >   static u64 __initdata dom0_mem;
> > >   static bool __initdata dom0_mem_set;
> > >   @@ -886,6 +890,233 @@ static int __init make_memory_node(const struct
> > > domain *d,
> > >       return res;
> > >   }
> > >   +static int __init add_ext_regions(unsigned long s, unsigned long e,
> > > void *data)
> > > +{
> > > +    struct meminfo *ext_regions = data;
> > > +    paddr_t start, size;
> > > +
> > > +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> > > +        return 0;
> > > +
> > > +    /* Both start and size of the extended region should be 2MB aligned
> > > */
> > > +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> > > +    if ( start > e )
> > > +        return 0;
> > > +
> > > +    /*
> > > +     * e is actually "end-1" because it is called by rangeset functions
> > > +     * which are inclusive of the last address.
> > > +     */
> > > +    e += 1;
> > > +    size = (e - start) & ~(SZ_2M - 1);
> > > +    if ( size < MB(64) )
> > > +        return 0;
> > > +
> > > +    ext_regions->bank[ext_regions->nr_banks].start = start;
> > > +    ext_regions->bank[ext_regions->nr_banks].size = size;
> > > +    ext_regions->nr_banks++;
> > > +
> > > +    return 0;
> > > +}
> > This is a lot better
> 
> Great!
> 
> 
> > 
> > 
> > > +static int __init find_unallocated_memory(const struct kernel_info
> > > *kinfo,
> > > +                                          struct meminfo *ext_regions)
> > > +{
> > > +    const struct meminfo *assign_mem = &kinfo->mem;
> > > +    struct rangeset *unalloc_mem;
> > > +    paddr_t start, end;
> > > +    unsigned int i;
> > > +    int res;
> > > +
> > > +    dt_dprintk("Find unallocated memory for extended regions\n");
> > > +
> > > +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> > > +    if ( !unalloc_mem )
> > > +        return -ENOMEM;
> > > +
> > > +    /* Start with all available RAM */
> > > +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> > > +    {
> > > +        start = bootinfo.mem.bank[i].start;
> > > +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> > > +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> > > +        if ( res )
> > > +        {
> > > +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> > > +                   start, end);
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > > +    /* Remove RAM assigned to Dom0 */
> > > +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> > > +    {
> > > +        start = assign_mem->bank[i].start;
> > > +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> > > +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> > > +        if ( res )
> > > +        {
> > > +            printk(XENLOG_ERR "Failed to remove:
> > > %#"PRIx64"->%#"PRIx64"\n",
> > > +                   start, end);
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > > +    /* Remove reserved-memory regions */
> > > +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> > > +    {
> > > +        start = bootinfo.reserved_mem.bank[i].start;
> > > +        end = bootinfo.reserved_mem.bank[i].start +
> > > +            bootinfo.reserved_mem.bank[i].size;
> > > +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> > > +        if ( res )
> > > +        {
> > > +            printk(XENLOG_ERR "Failed to remove:
> > > %#"PRIx64"->%#"PRIx64"\n",
> > > +                   start, end);
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > > +    /* Remove grant table region */
> > > +    start = kinfo->gnttab_start;
> > > +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> > > +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> > > +    if ( res )
> > > +    {
> > > +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> > > +               start, end);
> > > +        goto out;
> > > +    }
> > > +
> > > +    start = 0;
> > > +    end = (1ULL << p2m_ipa_bits) - 1;
> > > +    res = rangeset_report_ranges(unalloc_mem, start, end,
> > > +                                 add_ext_regions, ext_regions);
> > > +    if ( res )
> > > +        ext_regions->nr_banks = 0;
> > > +    else if ( !ext_regions->nr_banks )
> > > +        res = -ENOENT;
> > > +
> > > +out:
> > > +    rangeset_destroy(unalloc_mem);
> > > +
> > > +    return res;
> > > +}
> > > +
> > > +static int __init find_memory_holes(const struct kernel_info *kinfo,
> > > +                                    struct meminfo *ext_regions)
> > > +{
> > > +    struct dt_device_node *np;
> > > +    struct rangeset *mem_holes;
> > > +    paddr_t start, end;
> > > +    unsigned int i;
> > > +    int res;
> > > +
> > > +    dt_dprintk("Find memory holes for extended regions\n");
> > > +
> > > +    mem_holes = rangeset_new(NULL, NULL, 0);
> > > +    if ( !mem_holes )
> > > +        return -ENOMEM;
> > > +
> > > +    /* Start with maximum possible addressable physical memory range */
> > > +    start = 0;
> > > +    end = (1ULL << p2m_ipa_bits) - 1;
> > > +    res = rangeset_add_range(mem_holes, start, end);
> > > +    if ( res )
> > > +    {
> > > +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> > > +               start, end);
> > > +        goto out;
> > > +    }
> > > +
> > > +    /*
> > > +     * Remove regions described by "reg" and "ranges" properties where
> > > +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> > > +     */
> > > +    dt_for_each_device_node( dt_host, np )
> > > +    {
> > > +        unsigned int naddr;
> > > +        u64 addr, size;
> > > +
> > > +        naddr = dt_number_of_address(np);
> > > +
> > > +        for ( i = 0; i < naddr; i++ )
> > > +        {
> > > +            res = dt_device_get_address(np, i, &addr, &size);
> > > +            if ( res )
> > > +            {
> > > +                printk(XENLOG_ERR "Unable to retrieve address %u for
> > > %s\n",
> > > +                       i, dt_node_full_name(np));
> > > +                goto out;
> > > +            }
> > > +
> > > +            start = addr & PAGE_MASK;
> > PAGE_ALIGN(addr)
> 
> Let's imagine we have reg = <0x0 0xee080200 0x0 0x700>;
> So using PAGE_ALIGN(0xee080200) we will get a result aligned up - 0xee081000,
> but this is not what we want here. But, the end should be aligned up.

You are right, we want to be conservative here, So PAGE_ALIGN is fine
for end, but for start we need "& PAGE_MASK".


 
> > > +            end = PAGE_ALIGN(addr + size);
> > > +            res = rangeset_remove_range(mem_holes, start, end - 1);
> > > +            if ( res )
> > > +            {
> > > +                printk(XENLOG_ERR "Failed to remove:
> > > %#"PRIx64"->%#"PRIx64"\n",
> > > +                       start, end);
> > > +                goto out;
> > > +            }
> > > +        }
> > > +
> > > +        if ( dt_device_type_is_equal(np, "pci" ) )
> > > +        {
> > > +            unsigned int range_size, nr_ranges;
> > > +            int na, ns, pna;
> > > +            const __be32 *ranges;
> > > +            u32 len;
> > > +
> > > +            /*
> > > +             * Looking for non-empty ranges property which in the context
> > > +             * of the PCI host bridge device node describes the BARs for
> > > +             * the PCI devices.
> > I'd say that "it describes the PCI host bridge aperture"
> 
> ok, will update
> 
> 
> > 
> > 
> > > +             */
> > > +            ranges = dt_get_property(np, "ranges", &len);
> > > +            if ( !ranges || !len )
> > > +                continue;
> > > +
> > > +            pna = dt_n_addr_cells(np);
> > > +            na = dt_child_n_addr_cells(np);
> > > +            ns = dt_child_n_size_cells(np);
> > > +
> > > +            range_size = pna + na + ns;
> > > +
> > > +            nr_ranges = len / sizeof(__be32) / range_size;
> > > +
> > > +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> > > +            {
> > > +                /* Skip the child address and get the parent (CPU)
> > > address */
> > > +                addr = dt_read_number(ranges + na, pna);
> > > +
> > > +                size = dt_read_number(ranges + na + pna, ns);
> > Parsing the PCI ranges property is never intuitive, but everything
> > checks out as far as I can tell
> > 
> > 
> > > +                start = addr & PAGE_MASK;
> > PAGE_ALIGN(start)
> 
> same here
> 
> 
> > 
> > 
> > > +                end = PAGE_ALIGN(addr + size);
> > > +                res = rangeset_remove_range(mem_holes, start, end - 1);
> > > +                if ( res )
> > > +                {
> > > +                    printk(XENLOG_ERR "Failed to remove:
> > > %#"PRIx64"->%#"PRIx64"\n",
> > > +                           start, end);
> > > +                    goto out;
> > > +                }
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > +    start = 0;
> > > +    end = (1ULL << p2m_ipa_bits) - 1;
> > > +    res = rangeset_report_ranges(mem_holes, start, end,
> > > +                                 add_ext_regions,  ext_regions);
> > > +    if ( res )
> > > +        ext_regions->nr_banks = 0;
> > > +    else if ( !ext_regions->nr_banks )
> > > +        res = -ENOENT;
> > > +
> > > +out:
> > > +    rangeset_destroy(mem_holes);
> > > +
> > > +    return res;
> > > +}
> > > +
> > >   static int __init make_hypervisor_node(struct domain *d,
> > >                                          const struct kernel_info *kinfo,
> > >                                          int addrcells, int sizecells)
> > > @@ -893,11 +1124,12 @@ static int __init make_hypervisor_node(struct
> > > domain *d,
> > >       const char compat[] =
> > >           "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
> > >           "xen,xen";
> > > -    __be32 reg[4];
> > > +    __be32 *reg, *cells;
> > >       gic_interrupt_t intr;
> > > -    __be32 *cells;
> > >       int res;
> > >       void *fdt = kinfo->fdt;
> > > +    struct meminfo *ext_regions;
> > > +    unsigned int i;
> > >         dt_dprintk("Create hypervisor node\n");
> > >   @@ -919,12 +1151,54 @@ static int __init make_hypervisor_node(struct
> > > domain *d,
> > >       if ( res )
> > >           return res;
> > >   +    reg = xzalloc_array(__be32, (NR_MEM_BANKS + 1) * 4);
> > > +    if ( !reg )
> > > +        return -ENOMEM;
> > Instead of (NR_MEM_BANKS + 1) * 4, shouldn't it be:
> > 
> > (ext_regions->nr_banks + 1) * (addrcells + sizecells)
> > 
> > ?
> 
> I think, yes ... But, with other modifications. Please see below
> 
> 
> > 
> > xzalloc_array allocates a number of elements, not a number of bytes.
> > 
> > 
> > > +
> > > +    ext_regions = xzalloc(struct meminfo);
> > > +    if ( !ext_regions )
> > > +    {
> > > +        xfree(reg);
> > > +        return -ENOMEM;
> > > +    }
> > > +
> > > +    if ( !opt_ext_regions )
> > > +        printk(XENLOG_DEBUG "The extended regions support is
> > > disabled\n");
> > > +    else if ( is_32bit_domain(d) )
> > > +        printk(XENLOG_DEBUG "The extended regions are only supported for
> > > 64-bit guest currently\n");
> > These checks should be before the memory allocations
> 
> Good point! Indeed there is no sense of whole "ext_regions" allocations if we
> are not going to handle extended regions. Also there is no need
> to allocate "reg" array with maximum possible elements in advance
> (NR_MEM_BANKS + 1) * 4, we can allocate it afterwards when we clearly know how
> many elements we really need
> (nr_ext_regions + 1) * 4, as you suggested above.
> 
> 
> Below the changes to this function on top of current patch:
> 
> @@ -1128,8 +1127,8 @@ static int __init make_hypervisor_node(struct domain *d,
>      gic_interrupt_t intr;
>      int res;
>      void *fdt = kinfo->fdt;
> -    struct meminfo *ext_regions;
> -    unsigned int i;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
> 
>      dt_dprintk("Create hypervisor node\n");
> 
> @@ -1151,23 +1150,22 @@ static int __init make_hypervisor_node(struct domain
> *d,
>      if ( res )
>          return res;
> 
> -    reg = xzalloc_array(__be32, (NR_MEM_BANKS + 1) * 4);
> -    if ( !reg )
> -        return -ENOMEM;
> -
> -    ext_regions = xzalloc(struct meminfo);
> -    if ( !ext_regions )
> -    {
> -        xfree(reg);
> -        return -ENOMEM;
> -    }
> -
>      if ( !opt_ext_regions )
> +    {
>          printk(XENLOG_DEBUG "The extended regions support is disabled\n");
> +        nr_ext_regions = 0;
> +    }
>      else if ( is_32bit_domain(d) )
> +    {
>          printk(XENLOG_DEBUG "The extended regions are only supported for
> 64-bit guest currently\n");
> +        nr_ext_regions = 0;
> +    }
>      else
>      {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
>          if ( !is_iommu_enabled(d) )
>              res = find_unallocated_memory(kinfo, ext_regions);
>          else
> @@ -1175,6 +1173,14 @@ static int __init make_hypervisor_node(struct domain
> *d,
> 
>          if ( res )
>              printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells +
> sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
>      }
> 
>      /* reg 0 is grant table space */
> @@ -1182,7 +1188,7 @@ static int __init make_hypervisor_node(struct domain *d,
>      dt_child_set_range(&cells, addrcells, sizecells,
>                         kinfo->gnttab_start, kinfo->gnttab_size);
>      /* reg 1...N are extended regions */
> -    for ( i = 0; i < ext_regions->nr_banks; i++ )
> +    for ( i = 0; i < nr_ext_regions; i++ )
>      {
>          u64 start = ext_regions->bank[i].start;
>          u64 size = ext_regions->bank[i].size;
> @@ -1195,7 +1201,7 @@ static int __init make_hypervisor_node(struct domain *d,
> 
>      res = fdt_property(fdt, "reg", reg,
>                         dt_cells_to_size(addrcells + sizecells) *
> -                       (ext_regions->nr_banks + 1));
> +                       (nr_ext_regions + 1));
>      xfree(ext_regions);
>      xfree(reg);
> 

Yep, that's what I meant, thanks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-09-24 23:14   ` Stefano Stabellini
@ 2021-09-27 10:48     ` Ian Jackson
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Jackson @ 2021-09-27 10:48 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Ian Jackson, Anthony PERARD, Juergen Gross, Julien Grall,
	Volodymyr Babchuk

Stefano Stabellini writes ("Re: [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU"):
> On Fri, 24 Sep 2021, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
...
> > Suggested-by: Julien Grall <jgrall@amazon.com>
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Acked-by: Ian Jackson <iwj@xenproject.org>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-09-23 22:48 ` [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
@ 2021-09-28  6:28   ` Michal Orzel
  2021-09-28 16:05     ` Oleksandr
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Orzel @ 2021-09-28  6:28 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Juergen Gross, Volodymyr Babchuk,
	Roger Pau Monné

Hi Oleksandr,

On 24.09.2021 00:48, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> We need to pass info about maximum supported guest address
> space size to the toolstack on Arm in order to properly
> calculate the base and size of the extended region (safe range)
> for the guest. The extended region is unused address space which
> could be safely used by domain for foreign/grant mappings on Arm.
> The extended region itself will be handled by the subsequents
> patch.
> 
> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
> hap_paddr_bits.
> 
> As we change the size of structure bump the interface version.
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Please note, that review comments for the RFC version [1] haven't been addressed yet.
> It is not forgotten, some clarification is needed. It will be addressed for the next version.
> 
> [1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
> 
> Changes RFC -> V2:
>    - update patch subject/description
>    - replace arch-specific sub-struct with common gpaddr_bits
>      field and update code to reflect that
> 
> Changes V2 -> V3:
>    - make the field uint8_t and add uint8_t pad[7] after
>    - remove leading blanks in libxl.h
> ---
>  tools/include/libxl.h            | 7 +++++++
>  tools/libs/light/libxl.c         | 2 ++
>  tools/libs/light/libxl_types.idl | 2 ++
>  xen/arch/arm/sysctl.c            | 2 ++
>  xen/arch/x86/sysctl.c            | 2 ++
>  xen/include/public/sysctl.h      | 4 +++-
>  6 files changed, 18 insertions(+), 1 deletion(-)
> 

Don't you want to print gpaddr_bits field of xen_sysctl_physinfo from output_physinfo (xl_info.c)?

Apart from that:
Reviewed-by: Michal Orzel <michal.orzel@arm.com>

Cheers


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU
  2021-09-23 22:48 ` [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
  2021-09-24 23:14   ` Stefano Stabellini
@ 2021-09-28  7:28   ` Michal Orzel
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Orzel @ 2021-09-28  7:28 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Juergen Gross, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk

Hi,

On 24.09.2021 00:48, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> The algorithm to choose extended regions for non-direct mapped
> DomU is simpler in comparison with the algorithm for direct mapped
> Dom0. As we have a lot of unused space above 4GB, provide single
> 2MB-aligned region from the second RAM bank taking into the account
> the maximum supported guest address space size and the amount of
> memory assigned to the guest. The maximum size of the region is 128GB.
> The minimum size is 64MB.
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> ---
> Changes RFC -> V2:
>    - update patch description
>    - drop uneeded "extended-region" DT property
>    - clear reg array in finalise_ext_region() and add a TODO
> 
> Changes V2 -> V3:
>    - update patch description, comments in code
>    - only pick up regions with size >= 64MB
>    - move the region calculation to make_hypervisor_node() and drop
>      finalise_ext_region()
>    - extend the list of arguments for make_hypervisor_node()
>    - do not show warning for 32-bit domain
>    - change the region alignment from 1GB to 2MB
>    - move EXT_REGION_SIZE to public/arch-arm.h
> ---
>  tools/libs/light/libxl_arm.c  | 70 +++++++++++++++++++++++++++++++++++++++----
>  xen/include/public/arch-arm.h |  3 ++
>  2 files changed, 68 insertions(+), 5 deletions(-)
> 
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Tested-by: Michal Orzel <michal.orzel@arm.com>

Cheers


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo
  2021-09-28  6:28   ` Michal Orzel
@ 2021-09-28 16:05     ` Oleksandr
  0 siblings, 0 replies; 12+ messages in thread
From: Oleksandr @ 2021-09-28 16:05 UTC (permalink / raw)
  To: Michal Orzel
  Cc: xen-devel, Oleksandr Tyshchenko, Ian Jackson, Wei Liu,
	Anthony PERARD, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Juergen Gross,
	Volodymyr Babchuk, Roger Pau Monné


On 28.09.21 09:28, Michal Orzel wrote:
> Hi Oleksandr,

Hi Michal



>
> On 24.09.2021 00:48, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> We need to pass info about maximum supported guest address
>> space size to the toolstack on Arm in order to properly
>> calculate the base and size of the extended region (safe range)
>> for the guest. The extended region is unused address space which
>> could be safely used by domain for foreign/grant mappings on Arm.
>> The extended region itself will be handled by the subsequents
>> patch.
>>
>> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>> hap_paddr_bits.
>>
>> As we change the size of structure bump the interface version.
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Please note, that review comments for the RFC version [1] haven't been addressed yet.
>> It is not forgotten, some clarification is needed. It will be addressed for the next version.
>>
>> [1] https://lore.kernel.org/xen-devel/973f5344-aa10-3ad6-ff02-ad5f358ad279@citrix.com/
>>
>> Changes RFC -> V2:
>>     - update patch subject/description
>>     - replace arch-specific sub-struct with common gpaddr_bits
>>       field and update code to reflect that
>>
>> Changes V2 -> V3:
>>     - make the field uint8_t and add uint8_t pad[7] after
>>     - remove leading blanks in libxl.h
>> ---
>>   tools/include/libxl.h            | 7 +++++++
>>   tools/libs/light/libxl.c         | 2 ++
>>   tools/libs/light/libxl_types.idl | 2 ++
>>   xen/arch/arm/sysctl.c            | 2 ++
>>   xen/arch/x86/sysctl.c            | 2 ++
>>   xen/include/public/sysctl.h      | 4 +++-
>>   6 files changed, 18 insertions(+), 1 deletion(-)
>>
> Don't you want to print gpaddr_bits field of xen_sysctl_physinfo from output_physinfo (xl_info.c)?

Good point, will do, thank you.


>
> Apart from that:
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>

Thanks


>
> Cheers

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-09-28 16:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-23 22:48 [PATCH V3 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
2021-09-23 22:48 ` [PATCH V3 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
2021-09-28  6:28   ` Michal Orzel
2021-09-28 16:05     ` Oleksandr
2021-09-23 22:48 ` [PATCH V3 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
2021-09-24 22:39   ` Stefano Stabellini
2021-09-25 18:20     ` Oleksandr
2021-09-25 19:07       ` Stefano Stabellini
2021-09-23 22:48 ` [PATCH V3 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
2021-09-24 23:14   ` Stefano Stabellini
2021-09-27 10:48     ` Ian Jackson
2021-09-28  7:28   ` Michal Orzel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.