All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleksandr <olekstysh@gmail.com>
To: Julien Grall <julien@xen.org>
Cc: xen-devel@lists.xenproject.org,
	Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Henry Wang <Henry.Wang@arm.com>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Wei Chen <Wei.Chen@arm.com>
Subject: Re: [PATCH V2 2/3] xen/arm: Add handling of extended regions for Dom0
Date: Fri, 17 Sep 2021 22:51:20 +0300	[thread overview]
Message-ID: <08294c53-109a-8544-3a23-85e034d2992d@gmail.com> (raw)
In-Reply-To: <0a72559e-5742-dc33-1c8f-5903c50b27be@xen.org>


On 17.09.21 18:48, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien


>
> On 10/09/2021 23:18, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain.
>> - The ACPI case is not covered.
>
> I understand the ACPI is not covered because we would need to create a 
> new binding. But I am not sure to understand why 32-bit domain is not 
> supported. Can you explain it?

The 32-bit domain is not supported for simplifying things from the 
beginning. It is a little bit difficult to get everything working at 
start. As I understand from discussion at [1] we can afford that 
simplification. However, I should have mentioned that 32-bit domain is 
not supported "for now".

>
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. 
>
> For the case when the IOMMU is disabled, this will only work if dom0 
> cannot allocate memory outside of the original range. This is 
> currently the case... but I think this should be spelled out in at 
> least the commit message.

Agree, will update commit description.


>
>
>> Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The maximum number of extended
>> regions is 128.
>
> Please explain how this limit was chosen.
Well, I decided to not introduce new data struct and etc to represent 
extended regions but reuse existing struct meminfo
used for memory/reserved-memory and, as I though, perfectly fitted. So, 
that limit come from NR_MEM_BANKS which is 128.


>
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes since RFC:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>> ---
>>
>>   xen/arch/arm/domain_build.c | 226 
>> +++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 224 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 206038d..070ec27 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -724,6 +724,196 @@ static int __init make_memory_node(const struct 
>> domain *d,
>>       return res;
>>   }
>>   +static int __init add_ext_regions(unsigned long s, unsigned long 
>> e, void *data)
>> +{
>> +    struct meminfo *ext_regions = data;
>> +    paddr_t start, size;
>> +
>> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> +        return 0;
>> +
>> +    /* Both start and size of the extended region should be 2MB 
>> aligned */
>> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> +    if ( start > e )
>> +        return 0;
>> +
>> +    size = (e - start + 1) & ~(SZ_2M - 1);
>> +    if ( !size )
>> +        return 0;
>> +
>> +    ext_regions->bank[ext_regions->nr_banks].start = start;
>> +    ext_regions->bank[ext_regions->nr_banks].size = size;
>> +    ext_regions->nr_banks ++;
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * The extended regions will be prevalidated by the memory hotplug path
>> + * in Linux which requires for any added address range to be within 
>> maximum
>> + * possible addressable physical memory range for which the linear 
>> mapping
>> + * could be created.
>> + * For 48-bit VA space size the maximum addressable range are:
>
> When I read "maximum", I understand an upper limit. But below, you are 
> providing a range. So should you drop "maximum"?

yes, it is a little bit confusing.


>
>
> Also, this is tailored to Linux using 48-bit VA. How about other limits?
These limits are calculated at [2]. Sorry, I didn't investigate yet what 
values would be for other CONFIG_ARM64_VA_BITS_XXX. Also looks like some 
configs depend on 16K/64K pages...
I will try to investigate and provide limits later on.


>
>
>> + * 0x40000000 - 0x80003fffffff
>> + */
>> +#define EXT_REGION_START   0x40000000ULL
>
> I am probably missing something here.... There are platform out there 
> with memory starting at 0 (IIRC ZynqMP is one example). So wouldn't 
> this potentially rule out the extended region on such platform?

 From my understanding the extended region cannot be in 0...0x40000000 
range. If these platforms have memory above first GB, I believe the 
extended region(s) can be allocated for them.


>
>
>> +#define EXT_REGION_END 0x80003fffffffULL
>> +
>> +static int __init find_unallocated_memory(const struct kernel_info 
>> *kinfo,
>> +                                          struct meminfo *ext_regions)
>> +{
>> +    const struct meminfo *assign_mem = &kinfo->mem;
>> +    struct rangeset *unalloc_mem;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>
> We technically already know which range of memory is unused. This is 
> pretty much any region in the freelist of the page allocator. So how 
> about walking the freelist instead?

ok, I will investigate the page allocator code (right now I have no 
understanding of how to do that). BTW, I have just grepped "freelist" 
through the code and all page context related appearances are in x86 
code only.

>
> The advantage is we don't need to worry about modifying the function 
> when adding new memory type.
>
> One disavantage is this will not cover *all* the unused memory as this 
> is doing. But I think this is an acceptable downside.
>
>> +
>> +    dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
>> +    if ( !unalloc_mem )
>> +        return -ENOMEM;
>> +
>> +    /* Start with all available RAM */
>> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.mem.bank[i].start;
>> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size 
>> - 1;
>> +        res = rangeset_add_range(unalloc_mem, start, end);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to add: 
>> %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove RAM assigned to Dom0 */
>> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
>> +    {
>> +        start = assign_mem->bank[i].start;
>> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size - 1;
>> +        res = rangeset_remove_range(unalloc_mem, start, end);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove reserved-memory regions */
>> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.reserved_mem.bank[i].start;
>> +        end = bootinfo.reserved_mem.bank[i].start +
>> +            bootinfo.reserved_mem.bank[i].size - 1;
>> +        res = rangeset_remove_range(unalloc_mem, start, end);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove grant table region */
>> +    start = kinfo->gnttab_start;
>> +    end = kinfo->gnttab_start + kinfo->gnttab_size - 1;
>> +    res = rangeset_remove_range(unalloc_mem, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    start = EXT_REGION_START;
>> +    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
>> +    res = rangeset_report_ranges(unalloc_mem, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(unalloc_mem);
>> +
>> +    return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> +                                    struct meminfo *ext_regions)
>> +{
>> +    struct dt_device_node *np;
>> +    struct rangeset *mem_holes;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find memory holes for extended regions\n");
>> +
>> +    mem_holes = rangeset_new(NULL, NULL, 0);
>> +    if ( !mem_holes )
>> +        return -ENOMEM;
>> +
>> +    /* Start with maximum possible addressable physical memory range */
>> +    start = EXT_REGION_START;
>> +    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
>> +    res = rangeset_add_range(mem_holes, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    /* Remove all regions described by "reg" property (MMIO, RAM, 
>> etc) */
>
> Well... The loop below is not going to handle all the regions 
> described in the property "reg". Instead, it will cover a subset of 
> "reg" where the memory is addressable.

As I understand, we are only interested in subset of "reg" where the 
memory is addressable.


>
>
> You will also need to cover "ranges" that will describe the BARs for 
> the PCI devices.
Good point. Could you please clarify how to recognize whether it is a 
PCI device as long as PCI support is not merged? Or just to find any 
device nodes with non-empty "ranges" property
and retrieve addresses?


>
>
>> +    dt_for_each_device_node( dt_host, np )
>> +    {
>> +        unsigned int naddr;
>> +        u64 addr, size;
>> +
>> +        naddr = dt_number_of_address(np);
>> +
>> +        for ( i = 0; i < naddr; i++ )
>> +        {
>> +            res = dt_device_get_address(np, i, &addr, &size);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Unable to retrieve address %u for 
>> %s\n",
>> +                       i, dt_node_full_name(np));
>> +                goto out;
>> +            }
>> +
>> +            start = addr & PAGE_MASK;
>> +            end = PAGE_ALIGN(addr + size) - 1;
>> +            res = rangeset_remove_range(mem_holes, start, end);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>> +                       start, end);
>> +                goto out;
>> +            }
>> +        }
>> +    }
>> +
>> +    start = EXT_REGION_START;
>> +    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
>> +    res = rangeset_report_ranges(mem_holes, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(mem_holes);
>> +
>> +    return res;
>> +}
>> +
>>   static int __init make_hypervisor_node(struct domain *d,
>>                                          const struct kernel_info 
>> *kinfo,
>>                                          int addrcells, int sizecells)
>> @@ -731,11 +921,13 @@ static int __init make_hypervisor_node(struct 
>> domain *d,
>>       const char compat[] =
>> "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>>           "xen,xen";
>> -    __be32 reg[4];
>> +    __be32 reg[(NR_MEM_BANKS + 1) * 4];
>
> This is a fairly large allocation on the stack. Could we move to a 
> dynamic allocation?

Of course, will do.


>
>
>>       gic_interrupt_t intr;
>>       __be32 *cells;
>>       int res;
>>       void *fdt = kinfo->fdt;
>> +    struct meminfo *ext_regions;
>> +    unsigned int i;
>>         dt_dprintk("Create hypervisor node\n");
>>   @@ -757,12 +949,42 @@ static int __init make_hypervisor_node(struct 
>> domain *d,
>>       if ( res )
>>           return res;
>>   +    ext_regions = xzalloc(struct meminfo);
>> +    if ( !ext_regions )
>> +        return -ENOMEM;
>> +
>> +    if ( is_32bit_domain(d) )
>> +        printk(XENLOG_WARNING "The extended region is only supported 
>> for 64-bit guest\n");
>> +    else
>> +    {
>> +        if ( !is_iommu_enabled(d) )
>> +            res = find_unallocated_memory(kinfo, ext_regions);
>> +        else
>> +            res = find_memory_holes(kinfo, ext_regions);
>> +
>> +        if ( res )
>> +            printk(XENLOG_WARNING "Failed to allocate extended 
>> regions\n");
>> +    }
>> +
>>       /* reg 0 is grant table space */
>>       cells = &reg[0];
>>       dt_child_set_range(&cells, addrcells, sizecells,
>>                          kinfo->gnttab_start, kinfo->gnttab_size);
>> +    /* reg 1...N are extended regions */
>> +    for ( i = 0; i < ext_regions->nr_banks; i++ )
>> +    {
>> +        u64 start = ext_regions->bank[i].start;
>> +        u64 size = ext_regions->bank[i].size;
>> +
>> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> +                   i, start, start + size);
>> +
>> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> +    }
>> +    xfree(ext_regions);
>> +
>>       res = fdt_property(fdt, "reg", reg,
>> -                       dt_cells_to_size(addrcells + sizecells));
>> +                       dt_cells_to_size(addrcells + sizecells) * (i 
>> + 1));
>>       if ( res )
>>           return res;
>>
>
> Cheers,

[1] 
https://lore.kernel.org/xen-devel/cb1c8fd4-a4c5-c18e-c8db-f8e317d95526@xen.org/

[2] 
https://elixir.bootlin.com/linux/v5.15-rc1/source/arch/arm64/mm/mmu.c#L1448


Thank you.


-- 
Regards,

Oleksandr Tyshchenko



  reply	other threads:[~2021-09-17 19:51 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10 18:18 [PATCH V2 0/3] Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") Oleksandr Tyshchenko
2021-09-10 18:18 ` [PATCH V2 1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo Oleksandr Tyshchenko
2021-09-16 14:49   ` Jan Beulich
2021-09-16 15:43     ` Oleksandr
2021-09-16 15:47       ` Jan Beulich
2021-09-16 16:05         ` Oleksandr
2021-09-10 18:18 ` [PATCH V2 2/3] xen/arm: Add handling of extended regions for Dom0 Oleksandr Tyshchenko
2021-09-14  0:55   ` Stefano Stabellini
2021-09-15 19:10     ` Oleksandr
2021-09-15 21:21       ` Stefano Stabellini
2021-09-16 20:57         ` Oleksandr
2021-09-16 21:30           ` Stefano Stabellini
2021-09-17  7:28             ` Oleksandr
2021-09-17 14:08       ` Oleksandr
2021-09-17 15:52       ` Julien Grall
2021-09-17 20:13         ` Oleksandr
2021-09-17 15:48   ` Julien Grall
2021-09-17 19:51     ` Oleksandr [this message]
2021-09-17 21:56       ` Stefano Stabellini
2021-09-17 22:37         ` Stefano Stabellini
2021-09-19 14:34           ` Julien Grall
2021-09-19 20:18             ` Oleksandr
2021-09-20 23:21               ` Stefano Stabellini
2021-09-21 18:14                 ` Oleksandr
2021-09-21 22:00                   ` Stefano Stabellini
2021-09-22 18:25                     ` Oleksandr
2021-09-22 20:50                       ` Stefano Stabellini
2021-09-23 10:10                         ` Oleksandr
2021-09-20 23:55             ` Stefano Stabellini
2021-09-21 19:43         ` Oleksandr
2021-09-22 18:18           ` Oleksandr
2021-09-22 21:05             ` Stefano Stabellini
2021-09-23 10:11               ` Oleksandr
2021-09-18 16:59       ` Oleksandr
2021-09-23 10:41         ` Oleksandr
2021-09-23 16:38           ` Stefano Stabellini
2021-09-23 17:44             ` Oleksandr
2021-09-19 14:00       ` Julien Grall
2021-09-19 17:59         ` Oleksandr
2021-09-10 18:18 ` [PATCH V2 3/3] libxl/arm: Add handling of extended regions for DomU Oleksandr Tyshchenko
2021-09-16 22:35   ` Stefano Stabellini
2021-09-20 20:07     ` Oleksandr
2021-09-21 17:35       ` Oleksandr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08294c53-109a-8544-3a23-85e034d2992d@gmail.com \
    --to=olekstysh@gmail.com \
    --cc=Henry.Wang@arm.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=Wei.Chen@arm.com \
    --cc=bertrand.marquis@arm.com \
    --cc=julien@xen.org \
    --cc=oleksandr_tyshchenko@epam.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.