From: Oleksandr <olekstysh@gmail.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
	Russell King <linux@armlinux.org.uk>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>, Julien Grall <julien@xen.org>
Subject: Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
Date: Wed, 10 Nov 2021 22:21:40 +0200
Message-ID: <237f832d-5175-5653-18ee-058a7d7fa7a6@gmail.com>
In-Reply-To: <alpine.DEB.2.21.2110271803060.20134@sstabellini-ThinkPad-T480s>


On 28.10.21 04:40, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.

> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch implements arch_xen_unpopulated_init() on Arm where
>> the extended regions (if any) are gathered from DT and inserted
>> into passed Xen resource to be used as unused address space
>> for Xen scratch pages by unpopulated-alloc code.
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen by the hypervisor at the domain
>> creation time and advertised to it via "reg" property under
>> hypervisor node in the guest device-tree. As region 0 is reserved
>> for grant table space (always present), the indexes for extended
>> regions are 1...N.
>>
>> If arch_xen_unpopulated_init() fails for some reason the default
>> behaviour will be restored (allocate xenballooned pages).
>>
>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes RFC -> V2:
>>     - new patch, instead of
>>      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>> ---
>>   arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/xen/Kconfig      |   2 +-
>>   2 files changed, 113 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index dea46ec..1a1e0d3 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>   static phys_addr_t xen_grant_frames;
>>   
>>   #define GRANT_TABLE_INDEX   0
>> +#define EXT_REGION_INDEX    1
>>   
>>   uint32_t xen_start_flags;
>>   EXPORT_SYMBOL(xen_start_flags);
>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>   #endif
>>   }
>>   
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>> +int arch_xen_unpopulated_init(struct resource *res)
>> +{
>> +	struct device_node *np;
>> +	struct resource *regs, *tmp_res;
>> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
>> +	unsigned int i, nr_reg = 0;
>> +	struct range mhp_range;
>> +	int rc;
>> +
>> +	if (!xen_domain())
>> +		return -ENODEV;
>> +
>> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
>> +	if (WARN_ON(!np))
>> +		return -ENODEV;
>> +
>> +	/* Skip region 0 which is reserved for grant table space */
>> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>> +		nr_reg++;
>> +	if (!nr_reg) {
>> +		pr_err("No extended regions are found\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>> +	if (!regs)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * Create resource from extended regions provided by the hypervisor to be
>> +	 * used as unused address space for Xen scratch pages.
>> +	 */
>> +	for (i = 0; i < nr_reg; i++) {
>> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
>> +		if (rc)
>> +			goto err;
>> +
>> +		if (max_gpaddr < regs[i].end)
>> +			max_gpaddr = regs[i].end;
>> +		if (min_gpaddr > regs[i].start)
>> +			min_gpaddr = regs[i].start;
>> +	}
>> +
>> +	/* Check whether the resource range is within the hotpluggable range */
>> +	mhp_range = mhp_get_pluggable_range(true);
>> +	if (min_gpaddr < mhp_range.start)
>> +		min_gpaddr = mhp_range.start;
>> +	if (max_gpaddr > mhp_range.end)
>> +		max_gpaddr = mhp_range.end;
>> +
>> +	res->start = min_gpaddr;
>> +	res->end = max_gpaddr;
>> +
>> +	/*
>> +	 * Mark holes between extended regions as unavailable. The rest of that
>> +	 * address space will be available for the allocation.
>> +	 */
>> +	for (i = 1; i < nr_reg; i++) {
>> +		resource_size_t start, end;
>> +
>> +		start = regs[i - 1].end + 1;
>> +		end = regs[i].start - 1;
>> +
>> +		if (start > (end + 1)) {
> Should this be:
>
> if (start >= end)
>
> ?

Yes, we can do this here (since the checks are equivalent) but ...


>
>
>> +			rc = -EINVAL;
>> +			goto err;
>> +		}
>> +
>> +		/* There is no hole between regions */
>> +		if (start == (end + 1))
> Also here, shouldn't it be:
>
> if (start == end)
>
> ?

    ... not here.

This is because

"(start == (end + 1))" is equivalent to "(regs[i - 1].end + 1 == regs[i].start)"

but

"(start == end)" is equivalent to "(regs[i - 1].end + 1 == regs[i].start - 1)"


>
> I think I am missing again something in termination accounting :-)

If I understand correctly, we need to follow the "end = start + size - 1"
rule, so "end" is the last address inside a range, not the first
address outside of it :-)

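To make the inclusive-end convention concrete, here is a minimal
user-space sketch (not kernel code). The names struct span, enum gap_kind
and classify_gap() are hypothetical and exist only for this illustration;
the three checks mirror the comparisons discussed above.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct resource: both bounds are inclusive. */
struct span {
	uint64_t start;
	uint64_t end;	/* last address inside the range: start + size - 1 */
};

enum gap_kind { GAP_OVERLAP, GAP_NONE, GAP_HOLE };

/*
 * Classify the space between two regions sorted by address.
 * Assumes prev.end < UINT64_MAX and next.start > 0, so the
 * arithmetic below cannot wrap around.
 */
static enum gap_kind classify_gap(struct span prev, struct span next)
{
	uint64_t start = prev.end + 1;	/* first address after prev */
	uint64_t end = next.start - 1;	/* last address before next */

	if (start > end + 1)
		return GAP_OVERLAP;	/* prev.end >= next.start */
	if (start == end + 1)
		return GAP_NONE;	/* regions are adjacent */
	return GAP_HOLE;		/* the hole is [start, end] */
}
```

With this convention a one-byte hole has start == end, so only the
comparison against end + 1 detects two adjacent regions.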

>
>
>> +			continue;
>> +
>> +		/* Check whether the hole range is within the resource range */
>> +		if (start < res->start || end > res->end) {
> By definition I don't think this check is necessary as either condition
> is impossible?


This is a good question, let me explain.
Not all extended regions provided by the hypervisor can be used here.
This is because the physical address range for which a linear mapping
can be created is limited on Arm, and the maximum addressable range
depends on the VA space size (CONFIG_ARM64_VA_BITS_XXX). We decided
not to filter the regions in the hypervisor, as that logic could become
quite complex: different OSes may have different requirements, etc.
This means that we need to make sure the regions are within the
hotpluggable range, to avoid a failure later on when a region is
pre-validated by the memory hotplug path.

The following code limits the resource range based on that:

+    /* Check whether the resource range is within the hotpluggable range */
+    mhp_range = mhp_get_pluggable_range(true);
+    if (min_gpaddr < mhp_range.start)
+        min_gpaddr = mhp_range.start;
+    if (max_gpaddr > mhp_range.end)
+        max_gpaddr = mhp_range.end;
+
+    res->start = min_gpaddr;
+    res->end = max_gpaddr;

In the current loop (when calculating and inserting holes) we also need
to make sure that the resulting hole range is within the resource range
(and adjust or skip it if it is not), because regs[], which is used for
the calculations, contains the raw regions as they are described in DT,
i.e. not updated. Otherwise insert_resource() further down the function
will return an error for the conflicting operations. Yes, I could have
taken a different route and updated regs[] in advance to adjust/skip
unsuitable regions up front, but I decided to do it on the fly in the
loop here, as I thought doing it in advance would add some
overhead/complexity. What do you think?

So I am afraid this check is necessary here.

For example in my environment the extended regions are:

(XEN) Extended region 0: 0->0x8000000
(XEN) Extended region 1: 0xc000000->0x30000000
(XEN) Extended region 2: 0x40000000->0x47e00000
(XEN) Extended region 3: 0xd0000000->0xe6000000
(XEN) Extended region 4: 0xe7800000->0xec000000
(XEN) Extended region 5: 0xf1200000->0xfd000000
(XEN) Extended region 6: 0x100000000->0x500000000
(XEN) Extended region 7: 0x580000000->0x600000000
(XEN) Extended region 8: 0x680000000->0x700000000
(XEN) Extended region 9: 0x780000000->0x10000000000

*With* the check the holes are:

holes [47e00000 - cfffffff]
holes [e6000000 - e77fffff]
holes [ec000000 - f11fffff]
holes [fd000000 - ffffffff]
holes [500000000 - 57fffffff]
holes [600000000 - 67fffffff]
holes [700000000 - 77fffffff]

And they look correct. You can see that the two possible holes
between extended regions 0-1 (8000000-bffffff) and 1-2
(30000000-3fffffff) were skipped, as they are located entirely below
res->start, which is 0x40000000 in my case (48-bit VA: 0x40000000 -
0x80003fffffff).

*Without* the check these two holes won't be skipped and as a result
insert_resource() will fail.
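
The filtering described above can be reproduced with a small user-space
sketch (not the kernel code itself). struct span and collect_holes() are
hypothetical names used only for this illustration. The sketch assumes the
regions are sorted and non-overlapping, and that the "a->b" lines in the
Xen log give an exclusive upper bound, so each region is stored here as
[a, b - 1].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: both bounds are inclusive. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the holes between consecutive regions, clamp each hole to
 * [lo, hi] and drop holes that end up entirely outside that window.
 * Assumes regs[] is sorted by address and the regions do not overlap.
 * Returns the number of holes written to out[] (at most n - 1).
 */
static size_t collect_holes(const struct span *regs, size_t n,
			    uint64_t lo, uint64_t hi, struct span *out)
{
	size_t count = 0;

	for (size_t i = 1; i < n; i++) {
		uint64_t start, end;

		/* Adjacent regions: there is no hole between them */
		if (regs[i - 1].end + 1 == regs[i].start)
			continue;

		start = regs[i - 1].end + 1;
		end = regs[i].start - 1;

		/* Clamp the hole to the usable window */
		if (start < lo)
			start = lo;
		if (end > hi)
			end = hi;

		/* Nothing left: the hole was entirely outside [lo, hi] */
		if (start > end)
			continue;

		out[count].start = start;
		out[count].end = end;
		count++;
	}

	return count;
}
```

Fed with the ten regions listed above, lo = 0x40000000 and
hi = 0xffffffffff, this sketch drops the holes between regions 0-1 and 1-2
and keeps the other seven.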


**********


I have an idea for how we can simplify the filter logic: we can drop all
the checks here (including the confusing one) in the Arm code and update
the common code a bit:

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 1a1e0d3..ed5b855 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
         struct resource *regs, *tmp_res;
         uint64_t min_gpaddr = -1, max_gpaddr = 0;
         unsigned int i, nr_reg = 0;
-       struct range mhp_range;
         int rc;

         if (!xen_domain())
@@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
                         min_gpaddr = regs[i].start;
         }

-       /* Check whether the resource range is within the hotpluggable range */
-       mhp_range = mhp_get_pluggable_range(true);
-       if (min_gpaddr < mhp_range.start)
-               min_gpaddr = mhp_range.start;
-       if (max_gpaddr > mhp_range.end)
-               max_gpaddr = mhp_range.end;
-
         res->start = min_gpaddr;
         res->end = max_gpaddr;

@@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
                 if (start == (end + 1))
                         continue;

-               /* Check whether the hole range is within the resource range */
-               if (start < res->start || end > res->end) {
-                       if (start < res->start)
-                               start = res->start;
-                       if (end > res->end)
-                               end = res->end;
-
-                       if (start >= (end + 1))
-                               continue;
-               }
-
                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
                 if (!tmp_res) {
                         rc = -ENOMEM;
diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index 1f1d8d8..a5d3ebb 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
         void *vaddr;
         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
         int ret;
+       struct range mhp_range;

         res = kzalloc(sizeof(*res), GFP_KERNEL);
         if (!res)
@@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
         res->name = "Xen scratch";
         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

+       mhp_range = mhp_get_pluggable_range(true);
+
         ret = allocate_resource(target_resource, res,
-                               alloc_pages * PAGE_SIZE, 0, -1,
+                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
         if (ret < 0) {
                 pr_err("Cannot allocate new IOMEM resource\n");

I believe this will work on x86, as arch_get_mappable_range() is not
implemented there and the default implementation returns exactly what
is being used currently (0, -1).

struct range __weak arch_get_mappable_range(void)
{
     struct range mhp_range = {
         .start = 0UL,
         .end = -1ULL,
     };
     return mhp_range;
}

And this is going to be more generic and clear, what do you think?
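
To illustrate the intent of passing the pluggable range as the min/max
arguments, here is a rough user-space sketch of the bound intersection
only. It is not the kernel's allocate_resource() and does not model
alignment or existing child resources. struct span and clamp_window() are
hypothetical names, and the Arm64 numbers are the 48-bit VA range quoted
earlier in this mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an address range: both bounds are inclusive. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Intersect the bounds of a target resource with an allowed [min, max]
 * window. Returns false when nothing is left to allocate from.
 */
static bool clamp_window(struct span res, uint64_t min, uint64_t max,
			 struct span *out)
{
	out->start = res.start > min ? res.start : min;
	out->end = res.end < max ? res.end : max;

	return out->start <= out->end;
}
```

With (0, UINT64_MAX) the window equals the target resource, which is the
current behaviour. With (0x40000000, 0x80003fffffff) a resource starting
at 0 is narrowed to start at 0x40000000, and a resource lying entirely
below 0x40000000 leaves an empty window.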


>
>> +			if (start < res->start)
>> +				start = res->start;
>> +			if (end > res->end)
>> +				end = res->end;
>> +
>> +			if (start >= (end + 1))
>> +				continue;
>> +		}
>> +
>> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>> +		if (!tmp_res) {
>> +			rc = -ENOMEM;
>> +			goto err;
>> +		}
>> +
>> +		tmp_res->name = "Unavailable space";
>> +		tmp_res->start = start;
>> +		tmp_res->end = end;
> Do we need to set any flags so that the system can reuse the memory in
> the hole, e.g. IORESOURCE_MEM? Or is it not necessary?


I might be wrong, but I don't think it is necessary. I don't see how the
system could reuse the memory in the holes, as the Xen resource we are
constructing here will be used exclusively by the unpopulated-alloc
code. I would leave a type-less resource here. Or have I missed
something?


>
>
>> +		rc = insert_resource(res, tmp_res);
>> +		if (rc) {
>> +			pr_err("Cannot insert resource [%llx - %llx] %d\n",
>> +					tmp_res->start, tmp_res->end, rc);
> Although it is impossible to enable XEN_UNPOPULATED_ALLOC on arm32 due
> to unmet dependencies, I would like to keep the implementation of
> arch_xen_unpopulated_init 32bit clean.
>
> I am getting build errors like (by forcing arch_xen_unpopulated_init to
> compile on arm32):
>
> ./include/linux/kern_levels.h:5:18: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’ [-Wformat=]

Thank you for pointing this out. I will use the %pR specifier here and
in the common code where I print the same message.


>
>
>> +			kfree(tmp_res);
>> +			goto err;
>> +		}
>> +	}
>> +
>> +err:
>> +	kfree(regs);
>> +
>> +	return rc;
>> +}
>> +#endif
>> +
>>   static void __init xen_dt_guest_init(void)
>>   {
>>   	struct device_node *xen_node;
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index 1b2c3ac..e6031fc 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -297,7 +297,7 @@ config XEN_FRONT_PGDIR_SHBUF
>>   
>>   config XEN_UNPOPULATED_ALLOC
>>   	bool "Use unpopulated memory ranges for guest mappings"
>> -	depends on X86 && ZONE_DEVICE
>> +	depends on ZONE_DEVICE
>>   	default XEN_BACKEND || XEN_GNTDEV || XEN_DOM0
>>   	help
>>   	  Use unpopulated memory ranges in order to create mappings for guest

-- 
Regards,

Oleksandr Tyshchenko


