From: Oleksandr <olekstysh@gmail.com>
To: Julien Grall <julien@xen.org>
Cc: xen-devel@lists.xenproject.org,
Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
Henry Wang <Henry.Wang@arm.com>,
Bertrand Marquis <bertrand.marquis@arm.com>,
Wei Chen <Wei.Chen@arm.com>
Subject: Re: [PATCH V2 2/3] xen/arm: Add handling of extended regions for Dom0
Date: Fri, 17 Sep 2021 22:51:20 +0300 [thread overview]
Message-ID: <08294c53-109a-8544-3a23-85e034d2992d@gmail.com> (raw)
In-Reply-To: <0a72559e-5742-dc33-1c8f-5903c50b27be@xen.org>
On 17.09.21 18:48, Julien Grall wrote:
> Hi Oleksandr,
Hi Julien
>
> On 10/09/2021 23:18, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain.
>> - The ACPI case is not covered.
>
> I understand the ACPI is not covered because we would need to create a
> new binding. But I am not sure to understand why 32-bit domain is not
> supported. Can you explain it?
The 32-bit domain is not supported for simplifying things from the
beginning. It is a little bit difficult to get everything working at
start. As I understand from discussion at [1] we can afford that
simplification. However, I should have mentioned that 32-bit domain is
not supported "for now".
>
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise.
>
> For the case when the IOMMU is disabled, this will only work if dom0
> cannot allocate memory outside of the original range. This is
> currently the case... but I think this should be spelled out in at
> least the commit message.
Agree, will update commit description.
>
>
>> Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The maximum number of extended
>> regions is 128.
>
> Please explain how this limit was chosen.
Well, I decided to not introduce new data struct and etc to represent
extended regions but reuse existing struct meminfo
used for memory/reserved-memory and, as I though, perfectly fitted. So,
that limit come from NR_MEM_BANKS which is 128.
>
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes since RFC:
>> - update patch description
>> - drop uneeded "extended-region" DT property
>> ---
>>
>> xen/arch/arm/domain_build.c | 226
>> +++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 224 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 206038d..070ec27 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -724,6 +724,196 @@ static int __init make_memory_node(const struct
>> domain *d,
>> return res;
>> }
>> +static int __init add_ext_regions(unsigned long s, unsigned long
>> e, void *data)
>> +{
>> + struct meminfo *ext_regions = data;
>> + paddr_t start, size;
>> +
>> + if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> + return 0;
>> +
>> + /* Both start and size of the extended region should be 2MB
>> aligned */
>> + start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> + if ( start > e )
>> + return 0;
>> +
>> + size = (e - start + 1) & ~(SZ_2M - 1);
>> + if ( !size )
>> + return 0;
>> +
>> + ext_regions->bank[ext_regions->nr_banks].start = start;
>> + ext_regions->bank[ext_regions->nr_banks].size = size;
>> + ext_regions->nr_banks ++;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * The extended regions will be prevalidated by the memory hotplug path
>> + * in Linux which requires for any added address range to be within
>> maximum
>> + * possible addressable physical memory range for which the linear
>> mapping
>> + * could be created.
>> + * For 48-bit VA space size the maximum addressable range are:
>
> When I read "maximum", I understand an upper limit. But below, you are
> providing a range. So should you drop "maximum"?
yes, it is a little bit confusing.
>
>
> Also, this is tailored to Linux using 48-bit VA. How about other limits?
These limits are calculated at [2]. Sorry, I didn't investigate yet what
values would be for other CONFIG_ARM64_VA_BITS_XXX. Also looks like some
configs depend on 16K/64K pages...
I will try to investigate and provide limits later on.
>
>
>> + * 0x40000000 - 0x80003fffffff
>> + */
>> +#define EXT_REGION_START 0x40000000ULL
>
> I am probably missing something here.... There are platform out there
> with memory starting at 0 (IIRC ZynqMP is one example). So wouldn't
> this potentially rule out the extended region on such platform?
From my understanding the extended region cannot be in 0...0x40000000
range. If these platforms have memory above first GB, I believe the
extended region(s) can be allocated for them.
>
>
>> +#define EXT_REGION_END 0x80003fffffffULL
>> +
>> +static int __init find_unallocated_memory(const struct kernel_info
>> *kinfo,
>> + struct meminfo *ext_regions)
>> +{
>> + const struct meminfo *assign_mem = &kinfo->mem;
>> + struct rangeset *unalloc_mem;
>> + paddr_t start, end;
>> + unsigned int i;
>> + int res;
>
> We technically already know which range of memory is unused. This is
> pretty much any region in the freelist of the page allocator. So how
> about walking the freelist instead?
ok, I will investigate the page allocator code (right now I have no
understanding of how to do that). BTW, I have just grepped "freelist"
through the code and all page context related appearances are in x86
code only.
>
> The advantage is we don't need to worry about modifying the function
> when adding new memory type.
>
> One disavantage is this will not cover *all* the unused memory as this
> is doing. But I think this is an acceptable downside.
>
>> +
>> + dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> + unalloc_mem = rangeset_new(NULL, NULL, 0);
>> + if ( !unalloc_mem )
>> + return -ENOMEM;
>> +
>> + /* Start with all available RAM */
>> + for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> + {
>> + start = bootinfo.mem.bank[i].start;
>> + end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size
>> - 1;
>> + res = rangeset_add_range(unalloc_mem, start, end);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Failed to add:
>> %#"PRIx64"->%#"PRIx64"\n",
>> + start, end);
>> + goto out;
>> + }
>> + }
>> +
>> + /* Remove RAM assigned to Dom0 */
>> + for ( i = 0; i < assign_mem->nr_banks; i++ )
>> + {
>> + start = assign_mem->bank[i].start;
>> + end = assign_mem->bank[i].start + assign_mem->bank[i].size - 1;
>> + res = rangeset_remove_range(unalloc_mem, start, end);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Failed to remove:
>> %#"PRIx64"->%#"PRIx64"\n",
>> + start, end);
>> + goto out;
>> + }
>> + }
>> +
>> + /* Remove reserved-memory regions */
>> + for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> + {
>> + start = bootinfo.reserved_mem.bank[i].start;
>> + end = bootinfo.reserved_mem.bank[i].start +
>> + bootinfo.reserved_mem.bank[i].size - 1;
>> + res = rangeset_remove_range(unalloc_mem, start, end);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Failed to remove:
>> %#"PRIx64"->%#"PRIx64"\n",
>> + start, end);
>> + goto out;
>> + }
>> + }
>> +
>> + /* Remove grant table region */
>> + start = kinfo->gnttab_start;
>> + end = kinfo->gnttab_start + kinfo->gnttab_size - 1;
>> + res = rangeset_remove_range(unalloc_mem, start, end);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> + start, end);
>> + goto out;
>> + }
>> +
>> + start = EXT_REGION_START;
>> + end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
>> + res = rangeset_report_ranges(unalloc_mem, start, end,
>> + add_ext_regions, ext_regions);
>> + if ( res )
>> + ext_regions->nr_banks = 0;
>> + else if ( !ext_regions->nr_banks )
>> + res = -ENOENT;
>> +
>> +out:
>> + rangeset_destroy(unalloc_mem);
>> +
>> + return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> + struct meminfo *ext_regions)
>> +{
>> + struct dt_device_node *np;
>> + struct rangeset *mem_holes;
>> + paddr_t start, end;
>> + unsigned int i;
>> + int res;
>> +
>> + dt_dprintk("Find memory holes for extended regions\n");
>> +
>> + mem_holes = rangeset_new(NULL, NULL, 0);
>> + if ( !mem_holes )
>> + return -ENOMEM;
>> +
>> + /* Start with maximum possible addressable physical memory range */
>> + start = EXT_REGION_START;
>> + end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
>> + res = rangeset_add_range(mem_holes, start, end);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> + start, end);
>> + goto out;
>> + }
>> +
>> + /* Remove all regions described by "reg" property (MMIO, RAM,
>> etc) */
>
> Well... The loop below is not going to handle all the regions
> described in the property "reg". Instead, it will cover a subset of
> "reg" where the memory is addressable.
As I understand, we are only interested in subset of "reg" where the
memory is addressable.
>
>
> You will also need to cover "ranges" that will describe the BARs for
> the PCI devices.
Good point. Could you please clarify how to recognize whether it is a
PCI device as long as PCI support is not merged? Or just to find any
device nodes with non-empty "ranges" property
and retrieve addresses?
>
>
>> + dt_for_each_device_node( dt_host, np )
>> + {
>> + unsigned int naddr;
>> + u64 addr, size;
>> +
>> + naddr = dt_number_of_address(np);
>> +
>> + for ( i = 0; i < naddr; i++ )
>> + {
>> + res = dt_device_get_address(np, i, &addr, &size);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Unable to retrieve address %u for
>> %s\n",
>> + i, dt_node_full_name(np));
>> + goto out;
>> + }
>> +
>> + start = addr & PAGE_MASK;
>> + end = PAGE_ALIGN(addr + size) - 1;
>> + res = rangeset_remove_range(mem_holes, start, end);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Failed to remove:
>> %#"PRIx64"->%#"PRIx64"\n",
>> + start, end);
>> + goto out;
>> + }
>> + }
>> + }
>> +
>> + start = EXT_REGION_START;
>> + end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
>> + res = rangeset_report_ranges(mem_holes, start, end,
>> + add_ext_regions, ext_regions);
>> + if ( res )
>> + ext_regions->nr_banks = 0;
>> + else if ( !ext_regions->nr_banks )
>> + res = -ENOENT;
>> +
>> +out:
>> + rangeset_destroy(mem_holes);
>> +
>> + return res;
>> +}
>> +
>> static int __init make_hypervisor_node(struct domain *d,
>> const struct kernel_info
>> *kinfo,
>> int addrcells, int sizecells)
>> @@ -731,11 +921,13 @@ static int __init make_hypervisor_node(struct
>> domain *d,
>> const char compat[] =
>> "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>> "xen,xen";
>> - __be32 reg[4];
>> + __be32 reg[(NR_MEM_BANKS + 1) * 4];
>
> This is a fairly large allocation on the stack. Could we move to a
> dynamic allocation?
Of course, will do.
>
>
>> gic_interrupt_t intr;
>> __be32 *cells;
>> int res;
>> void *fdt = kinfo->fdt;
>> + struct meminfo *ext_regions;
>> + unsigned int i;
>> dt_dprintk("Create hypervisor node\n");
>> @@ -757,12 +949,42 @@ static int __init make_hypervisor_node(struct
>> domain *d,
>> if ( res )
>> return res;
>> + ext_regions = xzalloc(struct meminfo);
>> + if ( !ext_regions )
>> + return -ENOMEM;
>> +
>> + if ( is_32bit_domain(d) )
>> + printk(XENLOG_WARNING "The extended region is only supported
>> for 64-bit guest\n");
>> + else
>> + {
>> + if ( !is_iommu_enabled(d) )
>> + res = find_unallocated_memory(kinfo, ext_regions);
>> + else
>> + res = find_memory_holes(kinfo, ext_regions);
>> +
>> + if ( res )
>> + printk(XENLOG_WARNING "Failed to allocate extended
>> regions\n");
>> + }
>> +
>> /* reg 0 is grant table space */
>> cells = ®[0];
>> dt_child_set_range(&cells, addrcells, sizecells,
>> kinfo->gnttab_start, kinfo->gnttab_size);
>> + /* reg 1...N are extended regions */
>> + for ( i = 0; i < ext_regions->nr_banks; i++ )
>> + {
>> + u64 start = ext_regions->bank[i].start;
>> + u64 size = ext_regions->bank[i].size;
>> +
>> + dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> + i, start, start + size);
>> +
>> + dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> + }
>> + xfree(ext_regions);
>> +
>> res = fdt_property(fdt, "reg", reg,
>> - dt_cells_to_size(addrcells + sizecells));
>> + dt_cells_to_size(addrcells + sizecells) * (i
>> + 1));
>> if ( res )
>> return res;
>>
>
> Cheers,
[1]
https://lore.kernel.org/xen-devel/cb1c8fd4-a4c5-c18e-c8db-f8e317d95526@xen.org/
[2]
https://elixir.bootlin.com/linux/v5.15-rc1/source/arch/arm64/mm/mmu.c#L1448
Thank you.
--
Regards,
Oleksandr Tyshchenko
next prev parent reply other threads:[~2021-09-17 19:51 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-10 18:18 [PATCH V2 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
2021-09-10 18:18 ` [PATCH V2 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
2021-09-16 14:49 ` Jan Beulich
2021-09-16 15:43 ` Oleksandr
2021-09-16 15:47 ` Jan Beulich
2021-09-16 16:05 ` Oleksandr
2021-09-10 18:18 ` [PATCH V2 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
2021-09-14 0:55 ` Stefano Stabellini
2021-09-15 19:10 ` Oleksandr
2021-09-15 21:21 ` Stefano Stabellini
2021-09-16 20:57 ` Oleksandr
2021-09-16 21:30 ` Stefano Stabellini
2021-09-17 7:28 ` Oleksandr
2021-09-17 14:08 ` Oleksandr
2021-09-17 15:52 ` Julien Grall
2021-09-17 20:13 ` Oleksandr
2021-09-17 15:48 ` Julien Grall
2021-09-17 19:51 ` Oleksandr [this message]
2021-09-17 21:56 ` Stefano Stabellini
2021-09-17 22:37 ` Stefano Stabellini
2021-09-19 14:34 ` Julien Grall
2021-09-19 20:18 ` Oleksandr
2021-09-20 23:21 ` Stefano Stabellini
2021-09-21 18:14 ` Oleksandr
2021-09-21 22:00 ` Stefano Stabellini
2021-09-22 18:25 ` Oleksandr
2021-09-22 20:50 ` Stefano Stabellini
2021-09-23 10:10 ` Oleksandr
2021-09-20 23:55 ` Stefano Stabellini
2021-09-21 19:43 ` Oleksandr
2021-09-22 18:18 ` Oleksandr
2021-09-22 21:05 ` Stefano Stabellini
2021-09-23 10:11 ` Oleksandr
2021-09-18 16:59 ` Oleksandr
2021-09-23 10:41 ` Oleksandr
2021-09-23 16:38 ` Stefano Stabellini
2021-09-23 17:44 ` Oleksandr
2021-09-19 14:00 ` Julien Grall
2021-09-19 17:59 ` Oleksandr
2021-09-10 18:18 ` [PATCH V2 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
2021-09-16 22:35 ` Stefano Stabellini
2021-09-20 20:07 ` Oleksandr
2021-09-21 17:35 ` Oleksandr
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=08294c53-109a-8544-3a23-85e034d2992d@gmail.com \
--to=olekstysh@gmail.com \
--cc=Henry.Wang@arm.com \
--cc=Volodymyr_Babchuk@epam.com \
--cc=Wei.Chen@arm.com \
--cc=bertrand.marquis@arm.com \
--cc=julien@xen.org \
--cc=oleksandr_tyshchenko@epam.com \
--cc=sstabellini@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).