On Wed, 10 Nov 2021, Oleksandr wrote: > On 28.10.21 04:40, Stefano Stabellini wrote: > > Hi Stefano > > I am sorry for the late response. > > > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote: > > > From: Oleksandr Tyshchenko > > > > > > This patch implements arch_xen_unpopulated_init() on Arm where > > > the extended regions (if any) are gathered from DT and inserted > > > into passed Xen resource to be used as unused address space > > > for Xen scratch pages by unpopulated-alloc code. > > > > > > The extended region (safe range) is a region of guest physical > > > address space which is unused and could be safely used to create > > > grant/foreign mappings instead of wasting real RAM pages from > > > the domain memory for establishing these mappings. > > > > > > The extended regions are chosen by the hypervisor at the domain > > > creation time and advertised to it via "reg" property under > > > hypervisor node in the guest device-tree. As region 0 is reserved > > > for grant table space (always present), the indexes for extended > > > regions are 1...N. > > > > > > If arch_xen_unpopulated_init() fails for some reason the default > > > behaviour will be restored (allocate xenballooned pages). > > > > > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86. 
> > > > > > Signed-off-by: Oleksandr Tyshchenko > > > --- > > > Changes RFC -> V2: > > > - new patch, instead of > > > "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide > > > unallocated space" > > > --- > > > arch/arm/xen/enlighten.c | 112 > > > +++++++++++++++++++++++++++++++++++++++++++++++ > > > drivers/xen/Kconfig | 2 +- > > > 2 files changed, 113 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c > > > index dea46ec..1a1e0d3 100644 > > > --- a/arch/arm/xen/enlighten.c > > > +++ b/arch/arm/xen/enlighten.c > > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq; > > > static phys_addr_t xen_grant_frames; > > > #define GRANT_TABLE_INDEX 0 > > > +#define EXT_REGION_INDEX 1 > > > uint32_t xen_start_flags; > > > EXPORT_SYMBOL(xen_start_flags); > > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void) > > > #endif > > > } > > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC > > > +int arch_xen_unpopulated_init(struct resource *res) > > > +{ > > > + struct device_node *np; > > > + struct resource *regs, *tmp_res; > > > + uint64_t min_gpaddr = -1, max_gpaddr = 0; > > > + unsigned int i, nr_reg = 0; > > > + struct range mhp_range; > > > + int rc; > > > + > > > + if (!xen_domain()) > > > + return -ENODEV; > > > + > > > + np = of_find_compatible_node(NULL, NULL, "xen,xen"); > > > + if (WARN_ON(!np)) > > > + return -ENODEV; > > > + > > > + /* Skip region 0 which is reserved for grant table space */ > > > + while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL)) > > > + nr_reg++; > > > + if (!nr_reg) { > > > + pr_err("No extended regions are found\n"); > > > + return -EINVAL; > > > + } > > > + > > > + regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL); > > > + if (!regs) > > > + return -ENOMEM; > > > + > > > + /* > > > + * Create resource from extended regions provided by the hypervisor to > > > be > > > + * used as unused address space for Xen scratch pages. 
> > > + */ > > > + for (i = 0; i < nr_reg; i++) { > > > + rc = of_address_to_resource(np, i + EXT_REGION_INDEX, > > > &regs[i]); > > > + if (rc) > > > + goto err; > > > + > > > + if (max_gpaddr < regs[i].end) > > > + max_gpaddr = regs[i].end; > > > + if (min_gpaddr > regs[i].start) > > > + min_gpaddr = regs[i].start; > > > + } > > > + > > > + /* Check whether the resource range is within the hotpluggable range > > > */ > > > + mhp_range = mhp_get_pluggable_range(true); > > > + if (min_gpaddr < mhp_range.start) > > > + min_gpaddr = mhp_range.start; > > > + if (max_gpaddr > mhp_range.end) > > > + max_gpaddr = mhp_range.end; > > > + > > > + res->start = min_gpaddr; > > > + res->end = max_gpaddr; > > > + > > > + /* > > > + * Mark holes between extended regions as unavailable. The rest of > > > that > > > + * address space will be available for the allocation. > > > + */ > > > + for (i = 1; i < nr_reg; i++) { > > > + resource_size_t start, end; > > > + > > > + start = regs[i - 1].end + 1; > > > + end = regs[i].start - 1; > > > + > > > + if (start > (end + 1)) { > > Should this be: > > > > if (start >= end) > > > > ? > > Yes, we can do this here (since the checks are equivalent) but ... > > > > + rc = -EINVAL; > > > + goto err; > > > + } > > > + > > > + /* There is no hole between regions */ > > > + if (start == (end + 1)) > > Also here, shouldn't it be: > > > > if (start == end) > > > > ? > > ... not here. > > As > > "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)" > > but > > "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)" OK. But the check: if (start >= end) Actually covers both cases so that's the only check we need?
> > > > I think I am missing again something in termination accounting :-) > > If I understand correctly, we need to follow "end = start + size - 1" rule, so > the "end" is the last address inside a range, but not the "first" address > outside of a range)) yeah > > > + continue; > > > + > > > + /* Check whether the hole range is within the resource range > > > */ > > > + if (start < res->start || end > res->end) { > > By definition I don't think this check is necessary as either condition > > is impossible? > > > This is a good question, let me please explain. > Not all extended regions provided by the hypervisor can be used here. This is > because the addressable physical memory range for which the linear mapping > could be created has limits on Arm, and maximum addressable range depends on > the VA space size (CONFIG_ARM64_VA_BITS_XXX). So we decided to not filter them > in hypervisor as this logic could be quite complex as different OS may have > different requirement, etc. This means that we need to make sure that regions > are within the hotpluggable range to avoid a failure later on when a region is > pre-validated by the memory hotplug path. > > The following code limits the resource range based on that: > > +    /* Check whether the resource range is within the hotpluggable range */ > +    mhp_range = mhp_get_pluggable_range(true); > +    if (min_gpaddr < mhp_range.start) > +        min_gpaddr = mhp_range.start; > +    if (max_gpaddr > mhp_range.end) > +        max_gpaddr = mhp_range.end; > + > +    res->start = min_gpaddr; > +    res->end = max_gpaddr; > > In current loop (when calculating and inserting holes) we also need to make > sure that resulting hole range is within the resource range (and adjust/skip > it if not true) as regs[] used for the calculations contains raw regions as > they described in DT so not updated. Otherwise insert_resource() down the > function will return an error for the conflicting operations. 
Yes, I could have > taken a different route and updated regs[] in advance to adjust/skip > non-suitable regions in front, but I decided to do it on the fly in the loop > here, I thought doing it in advance would add some overhead/complexity. What > do you think? I understand now. > So I am afraid this check is necessary here. > > For example in my environment the extended regions are: > > (XEN) Extended region 0: 0->0x8000000 > (XEN) Extended region 1: 0xc000000->0x30000000 > (XEN) Extended region 2: 0x40000000->0x47e00000 > (XEN) Extended region 3: 0xd0000000->0xe6000000 > (XEN) Extended region 4: 0xe7800000->0xec000000 > (XEN) Extended region 5: 0xf1200000->0xfd000000 > (XEN) Extended region 6: 0x100000000->0x500000000 > (XEN) Extended region 7: 0x580000000->0x600000000 > (XEN) Extended region 8: 0x680000000->0x700000000 > (XEN) Extended region 9: 0x780000000->0x10000000000 > > *With* the check the holes are: > > holes [47e00000 - cfffffff] > holes [e6000000 - e77fffff] > holes [ec000000 - f11fffff] > holes [fd000000 - ffffffff] > holes [500000000 - 57fffffff] > holes [600000000 - 67fffffff] > holes [700000000 - 77fffffff] > > And they seem to look correct, you can see that two possible holes between > extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were > skipped as they are entirely located below res->start > which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff). > > *Without* the check these two holes won't be skipped and as a result > insert_resource() will fail.
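For reference, the hole calculation discussed above can be modelled in plain user-space C. This is only a sketch, not the kernel code: struct span, find_holes() and example_regs are hypothetical names introduced for illustration, and the region list is the one quoted above converted to inclusive ends ("end = start + size - 1"). It assumes the regions are sorted and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range: "end" is the last address inside the range. */
struct span {
	uint64_t start;
	uint64_t end;
};

/* Extended regions 0..9 from the log above, with inclusive ends. */
static const struct span example_regs[] = {
	{ 0x0ULL,         0x7ffffffULL },
	{ 0xc000000ULL,   0x2fffffffULL },
	{ 0x40000000ULL,  0x47dfffffULL },
	{ 0xd0000000ULL,  0xe5ffffffULL },
	{ 0xe7800000ULL,  0xebffffffULL },
	{ 0xf1200000ULL,  0xfcffffffULL },
	{ 0x100000000ULL, 0x4ffffffffULL },
	{ 0x580000000ULL, 0x5ffffffffULL },
	{ 0x680000000ULL, 0x6ffffffffULL },
	{ 0x780000000ULL, 0xffffffffffULL },
};

/*
 * Collect the holes between consecutive regions, clamped to the window
 * [win.start, win.end]. Holes that fall entirely outside the window are
 * skipped. Returns the number of holes written to "holes".
 */
static size_t find_holes(const struct span *regs, size_t nr_reg,
			 struct span win, struct span *holes,
			 size_t max_holes)
{
	size_t n = 0;

	for (size_t i = 1; i < nr_reg; i++) {
		/* Sorted, non-overlapping input; also rules out wrap-around. */
		assert(regs[i - 1].end < regs[i].start);

		uint64_t start = regs[i - 1].end + 1;
		uint64_t end = regs[i].start - 1;

		/* Adjacent regions: there is no hole between them. */
		if (start == end + 1)
			continue;

		/* Trim the hole to the usable window. */
		if (start < win.start)
			start = win.start;
		if (end > win.end)
			end = win.end;

		/* Nothing left after trimming: hole was outside the window. */
		if (start > end)
			continue;

		if (n < max_holes)
			holes[n++] = (struct span){ start, end };
	}

	return n;
}
```

Called with example_regs and the window [0x40000000, 0xffffffffff] (the clamped resource range for the environment above), this yields the same seven holes as listed, and the two candidates below 0x40000000 are dropped by the final start > end test.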
> > > ********** > > > I have one idea how we can simplify filter logic, we can drop all checks here > (including confusing one) in Arm code and update common code a bit: > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c > index 1a1e0d3..ed5b855 100644 > --- a/arch/arm/xen/enlighten.c > +++ b/arch/arm/xen/enlighten.c > @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res) >         struct resource *regs, *tmp_res; >         uint64_t min_gpaddr = -1, max_gpaddr = 0; >         unsigned int i, nr_reg = 0; > -       struct range mhp_range; >         int rc; > >         if (!xen_domain()) > @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res) >                         min_gpaddr = regs[i].start; >         } > > -       /* Check whether the resource range is within the hotpluggable range > */ > -       mhp_range = mhp_get_pluggable_range(true); > -       if (min_gpaddr < mhp_range.start) > -               min_gpaddr = mhp_range.start; > -       if (max_gpaddr > mhp_range.end) > -               max_gpaddr = mhp_range.end; > - >         res->start = min_gpaddr; >         res->end = max_gpaddr; > > @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res) >                 if (start == (end + 1)) >                         continue; > > -               /* Check whether the hole range is within the resource range > */ > -               if (start < res->start || end > res->end) { > -                       if (start < res->start) > -                               start = res->start; > -                       if (end > res->end) > -                               end = res->end; > - > -                       if (start >= (end + 1)) > -                               continue; > -               } > - >                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL); >                 if (!tmp_res) { >                         rc = -ENOMEM; > diff --git a/drivers/xen/unpopulated-alloc.c 
b/drivers/xen/unpopulated-alloc.c > index 1f1d8d8..a5d3ebb 100644 > --- a/drivers/xen/unpopulated-alloc.c > +++ b/drivers/xen/unpopulated-alloc.c > @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages) >         void *vaddr; >         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION); >         int ret; > +       struct range mhp_range; > >         res = kzalloc(sizeof(*res), GFP_KERNEL); >         if (!res) > @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages) >         res->name = "Xen scratch"; >         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; > > +       mhp_range = mhp_get_pluggable_range(true); > + >         ret = allocate_resource(target_resource, res, > -                               alloc_pages * PAGE_SIZE, 0, -1, > +                               alloc_pages * PAGE_SIZE, mhp_range.start, > mhp_range.end, >                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL); >         if (ret < 0) { >                 pr_err("Cannot allocate new IOMEM resource\n"); > (END) > > I believe, this will work on x86 as arch_get_mappable_range() is not > implemented there, > and the default option contains exactly what being used currently (0, -1). > > struct range __weak arch_get_mappable_range(void) > { >     struct range mhp_range = { >         .start = 0UL, >         .end = -1ULL, >     }; >     return mhp_range; > } > > And this is going to be more generic and clear, what do you think? Yeah this is much better, good thinking!
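To illustrate why passing the pluggable range as min/max should be a no-op with the default (0, -1) range while still filtering out unsuitable space on Arm, here is a small user-space model. It is a sketch under stated assumptions, not the kernel's allocate_resource(): struct addr_range, default_mappable_range() and place_block() are hypothetical names, and the 128 MiB size/alignment in the usage note is picked purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range: "end" is the last address inside the range. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* Model of the generic default: the whole address space is allowed. */
static struct addr_range default_mappable_range(void)
{
	return (struct addr_range){ .start = 0, .end = UINT64_MAX };
}

/*
 * Find the lowest "align"-aligned block of "size" bytes that lies inside
 * both "free" and "limit". "align" must be a power of two. Returns false
 * if no such block exists.
 */
static bool place_block(struct addr_range free, struct addr_range limit,
			uint64_t size, uint64_t align, uint64_t *out)
{
	uint64_t lo = free.start > limit.start ? free.start : limit.start;
	uint64_t hi = free.end < limit.end ? free.end : limit.end;
	uint64_t base;

	if (size == 0 || lo > hi)
		return false;

	/* Round lo up to the alignment, guarding against wrap-around. */
	base = (lo + (align - 1)) & ~(align - 1);
	if (base < lo)
		return false;

	/* The block [base, base + size - 1] must end at or before hi. */
	if (base > hi || size - 1 > hi - base)
		return false;

	*out = base;
	return true;
}
```

With the default range, a 128 MiB block fits at address 0 inside extended region 0 (0 - 0x7ffffff). With the limit set to the 48-bit VA range mentioned above (0x40000000 - 0x80003fffffff), the same request against region 0 fails, and a request against region 6 (0x100000000 - 0x4ffffffff) succeeds at 0x100000000.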