* [PATCH V2 0/4] xen: Add support of extended regions (safe ranges) on Arm
@ 2021-10-26 16:05 ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 41+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-26 16:05 UTC (permalink / raw)
  To: xen-devel, linux-arm-kernel, linux-kernel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Russell King,
	Boris Ostrovsky, Juergen Gross, Julien Grall, Bertrand Marquis,
	Wei Chen, Henry Wang

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

You can find the RFC patch series at [1].

The corresponding Xen support (for both Dom0 and DomU) is already committed and
has been available in mainline Xen since the following commit:
57f87857dc2de452a796d6bad4f476510efd2aba libxl/arm: Add handling of extended regions for DomU

An extended region (safe range) is a region of guest physical address space
that is unused and can safely be used to create grant/foreign mappings, instead
of ballooning out real RAM pages to obtain physical address space for these
mappings (which wastes domain memory and shatters superpages in the P2M table).

The problem is that on Arm we cannot rely on Linux's own view of which memory
ranges are unused: the P2M table (stage 2) may contain identity mappings the
guest is not aware of, and not all device I/O regions may be known (registered)
by the time the guest starts creating grant/foreign mappings. This is why we
need hints from the hypervisor, which knows all the details in advance and can
choose extended regions that won't clash with other resources.

The extended regions are chosen at domain creation time and advertised to the
guest via the "reg" property under the hypervisor node in the guest device tree [2].
As region 0 is reserved for the grant table space (always present), the indexes
for extended regions are 1...N. No device tree bindings update is needed; the
guest infers the presence of extended regions from the number of regions in the
"reg" property.
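
To illustrate the indexing described above, here is a minimal user-space sketch
(not part of this series). It models the "reg" property as an array of
(base, size) pairs. struct reg_entry, nr_ext_regions() and ext_region() are
hypothetical names chosen for illustration only; the kernel code in this series
reads the property with of_get_address() and of_address_to_resource() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANT_TABLE_INDEX 0	/* region 0: grant table space */
#define EXT_REGION_INDEX  1	/* regions 1...N: extended regions */

/* Hypothetical model of one (base, size) pair of the "reg" property. */
struct reg_entry {
	uint64_t base;
	uint64_t size;
};

/* Number of extended regions implied by the total number of "reg" entries. */
static size_t nr_ext_regions(size_t nr_reg)
{
	return nr_reg > EXT_REGION_INDEX ? nr_reg - EXT_REGION_INDEX : 0;
}

/* Return the i-th extended region (0-based), or NULL if it is absent. */
static const struct reg_entry *ext_region(const struct reg_entry *reg,
					  size_t nr_reg, size_t i)
{
	assert(reg != NULL);
	if (i >= nr_ext_regions(nr_reg))
		return NULL;
	return &reg[EXT_REGION_INDEX + i];
}
```

With one grant table entry plus two extended regions, nr_ext_regions(3) yields 2.
A guest that sees only a single entry has no extended regions and, with this
series, falls back to ballooned pages.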
    
Please note the following:
- The ACPI case is not covered for now.
- The patch series was written to retain the existing behavior on x86.

The patch series is based on v5.15-rc7 and is also available at [3]. It was fully
tested on Arm64 and only compile-tested on x86.

[1] https://lore.kernel.org/all/1627490656-1267-1-git-send-email-olekstysh@gmail.com/
    https://lore.kernel.org/all/1627490656-1267-2-git-send-email-olekstysh@gmail.com/

[2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/misc/arm/device-tree/guest.txt;hb=refs/heads/master

[3] https://github.com/otyshchenko1/linux/commits/map_opt_ml5

Oleksandr Tyshchenko (4):
  xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list()
  arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
  xen/unpopulated-alloc: Add mechanism to use Xen resource
  arm/xen: Read extended regions from DT and init Xen resource

 arch/arm/xen/enlighten.c        | 144 ++++++++++++++++++++++++++++++++++++++--
 drivers/xen/Kconfig             |   2 +-
 drivers/xen/unpopulated-alloc.c |  90 +++++++++++++++++++++++--
 include/xen/xen.h               |   2 +
 4 files changed, 226 insertions(+), 12 deletions(-)

-- 
2.7.4



* [PATCH V2 1/4] xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list()
  2021-10-26 16:05 ` Oleksandr Tyshchenko
  (?)
@ 2021-10-26 16:05 ` Oleksandr Tyshchenko
  2021-10-28 18:57   ` Boris Ostrovsky
  -1 siblings, 1 reply; 41+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-26 16:05 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Oleksandr Tyshchenko, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

If memremap_pages() succeeds, the range is guaranteed to have a proper page
table, so there is no need for an additional virt_addr_valid() check.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V2:
   - new patch, instead of
     "[RFC PATCH 1/2] arm64: mm: Make virt_addr_valid to check for pfn_valid again"
---
 drivers/xen/unpopulated-alloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index 87e6b7d..a03dc5b 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -85,7 +85,6 @@ static int fill_list(unsigned int nr_pages)
 	for (i = 0; i < alloc_pages; i++) {
 		struct page *pg = virt_to_page(vaddr + PAGE_SIZE * i);
 
-		BUG_ON(!virt_addr_valid(vaddr + PAGE_SIZE * i));
 		pg->zone_device_data = page_list;
 		page_list = pg;
 		list_count++;
-- 
2.7.4



* [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
  2021-10-26 16:05 ` Oleksandr Tyshchenko
@ 2021-10-26 16:05   ` Oleksandr Tyshchenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-26 16:05 UTC (permalink / raw)
  To: xen-devel, linux-arm-kernel, linux-kernel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Russell King, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Read the start address of the grant table space from DT
(region 0).

This patch mostly restores the behaviour from before commit 3cf4095d7446
("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table"),
while trying not to break the ACPI support added after that commit.
So the patch touches the DT part only and leaves the ACPI part with
xen_xlate_map_ballooned_pages().

This is a preparation for using the Xen extended region feature,
where unused regions of guest physical address space (provided
by the hypervisor) will be used to create grant/foreign/whatever
mappings instead of wasting real RAM pages from the domain memory
for establishing these mappings.

The immediate benefits of this change:
- Avoid superpage shattering in the Xen P2M when establishing
  the stage-2 mapping (GFN <-> MFN) for the grant table space
- Avoid wasting real RAM pages (reducing the amount of usable
  memory) for mapping the grant table space
- The grant table space is always mapped at the exact
  same place (region 0 is reserved for the grant table)

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V2:
   - new patch
---
 arch/arm/xen/enlighten.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 7f1c106b..dea46ec 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -59,6 +59,9 @@ unsigned long xen_released_pages;
 struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
 
 static __read_mostly unsigned int xen_events_irq;
+static phys_addr_t xen_grant_frames;
+
+#define GRANT_TABLE_INDEX   0
 
 uint32_t xen_start_flags;
 EXPORT_SYMBOL(xen_start_flags);
@@ -303,6 +306,7 @@ static void __init xen_acpi_guest_init(void)
 static void __init xen_dt_guest_init(void)
 {
 	struct device_node *xen_node;
+	struct resource res;
 
 	xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
 	if (!xen_node) {
@@ -310,6 +314,12 @@ static void __init xen_dt_guest_init(void)
 		return;
 	}
 
+	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
+		pr_err("Xen grant table region is not found\n");
+		return;
+	}
+	xen_grant_frames = res.start;
+
 	xen_events_irq = irq_of_parse_and_map(xen_node, 0);
 }
 
@@ -317,16 +327,20 @@ static int __init xen_guest_init(void)
 {
 	struct xen_add_to_physmap xatp;
 	struct shared_info *shared_info_page = NULL;
-	int cpu;
+	int rc, cpu;
 
 	if (!xen_domain())
 		return 0;
 
 	if (!acpi_disabled)
 		xen_acpi_guest_init();
-	else
+	else {
 		xen_dt_guest_init();
 
+		if (!xen_grant_frames)
+			return -ENODEV;
+	}
+
 	if (!xen_events_irq) {
 		pr_err("Xen event channel interrupt not found\n");
 		return -ENODEV;
@@ -370,12 +384,16 @@ static int __init xen_guest_init(void)
 	for_each_possible_cpu(cpu)
 		per_cpu(xen_vcpu_id, cpu) = cpu;
 
-	xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
-	if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
-					  &xen_auto_xlat_grant_frames.vaddr,
-					  xen_auto_xlat_grant_frames.count)) {
+	if (!acpi_disabled) {
+		xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
+		rc = xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
+						   &xen_auto_xlat_grant_frames.vaddr,
+						   xen_auto_xlat_grant_frames.count);
+	} else
+		rc = gnttab_setup_auto_xlat_frames(xen_grant_frames);
+	if (rc) {
 		free_percpu(xen_vcpu_info);
-		return -ENOMEM;
+		return rc;
 	}
 	gnttab_init();
 
-- 
2.7.4



* [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-10-26 16:05 ` Oleksandr Tyshchenko
                   ` (2 preceding siblings ...)
  (?)
@ 2021-10-26 16:05 ` Oleksandr Tyshchenko
  2021-10-28 16:37   ` Stefano Stabellini
  2021-10-28 19:08   ` Boris Ostrovsky
  -1 siblings, 2 replies; 41+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-26 16:05 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Oleksandr Tyshchenko, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The main reason for this change is that the unpopulated-alloc
code cannot be used in its current form on Arm, but there
is a desire to reuse it to avoid wasting real RAM pages
for the grant/foreign mappings.

The problem is that the system "iomem_resource" is used for
the address space allocation, but the really unallocated
space can't be figured out precisely by the domain on Arm
without hypervisor involvement. For example, not all device
I/O regions are known by the time the domain starts creating
grant/foreign mappings, and by following the advice from
"iomem_resource" we might end up reusing these regions by
mistake. So the hypervisor, which maintains the P2M for
the domain, is in the best position to provide unused regions
of guest physical address space which can safely be used
to create grant/foreign mappings.

Introduce a new helper, arch_xen_unpopulated_init(), whose purpose
is to create a specific Xen resource based on the memory regions
provided by the hypervisor, to be used as unused space for Xen
scratch pages.

If the arch doesn't implement arch_xen_unpopulated_init() to
initialize the Xen resource, the default "iomem_resource" will be
used, so the behavior on x86 won't change.

Also fall back to allocating xenballooned pages (stealing real RAM
pages) if we do not have any suitable resource to work with and,
as a result, cannot provide unpopulated pages.
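
The three outcomes described above can be summarized with a small user-space
sketch (not part of the patch). enum alloc_source and pick_source() are
hypothetical names used only for illustration; the patch itself tracks the
choice in the "target_resource" pointer, which stays NULL in the ballooning
case.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for where scratch address space comes from. */
enum alloc_source {
	SRC_XEN_RESOURCE,	/* regions provided by the hypervisor */
	SRC_IOMEM_RESOURCE,	/* default system "iomem_resource" */
	SRC_BALLOON,		/* fall back to xenballooned pages */
};

/*
 * Map the return value of the arch hook to an allocation source:
 *   0       -> the arch initialized the Xen resource, use it
 *   -ENOSYS -> the arch has no hook, keep using "iomem_resource"
 *   other   -> the hook failed, no suitable resource, use ballooning
 */
static enum alloc_source pick_source(int arch_init_ret)
{
	/* Kernel convention assumed here: 0 or a negative errno value. */
	assert(arch_init_ret <= 0);

	if (arch_init_ret == 0)
		return SRC_XEN_RESOURCE;
	if (arch_init_ret == -ENOSYS)
		return SRC_IOMEM_RESOURCE;
	return SRC_BALLOON;
}
```

On x86, where the weak default returns -ENOSYS, this always selects
"iomem_resource", which is why the existing behavior is retained there.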

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V2:
   - new patch, instead of
    "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
---
 drivers/xen/unpopulated-alloc.c | 89 +++++++++++++++++++++++++++++++++++++++--
 include/xen/xen.h               |  2 +
 2 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index a03dc5b..1f1d8d8 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -8,6 +8,7 @@
 
 #include <asm/page.h>
 
+#include <xen/balloon.h>
 #include <xen/page.h>
 #include <xen/xen.h>
 
@@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
 static struct page *page_list;
 static unsigned int list_count;
 
+static struct resource *target_resource;
+static struct resource xen_resource = {
+	.name = "Xen unused space",
+};
+
+/*
+ * If arch is not happy with system "iomem_resource" being used for
+ * the region allocation, it can provide its own view by initializing
+ * "xen_resource" with unused regions of guest physical address space
+ * provided by the hypervisor.
+ */
+int __weak arch_xen_unpopulated_init(struct resource *res)
+{
+	return -ENOSYS;
+}
+
 static int fill_list(unsigned int nr_pages)
 {
 	struct dev_pagemap *pgmap;
-	struct resource *res;
+	struct resource *res, *tmp_res = NULL;
 	void *vaddr;
 	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
-	int ret = -ENOMEM;
+	int ret;
 
 	res = kzalloc(sizeof(*res), GFP_KERNEL);
 	if (!res)
@@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
 	res->name = "Xen scratch";
 	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 
-	ret = allocate_resource(&iomem_resource, res,
+	ret = allocate_resource(target_resource, res,
 				alloc_pages * PAGE_SIZE, 0, -1,
 				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
 	if (ret < 0) {
@@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
 		goto err_resource;
 	}
 
+	/*
+	 * Reserve the region previously allocated from Xen resource to avoid
+	 * re-using it by someone else.
+	 */
+	if (target_resource != &iomem_resource) {
+		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
+		if (!tmp_res) {
+			ret = -ENOMEM;
+			goto err_insert;
+		}
+
+		tmp_res->name = res->name;
+		tmp_res->start = res->start;
+		tmp_res->end = res->end;
+		tmp_res->flags = res->flags;
+
+		ret = insert_resource(&iomem_resource, tmp_res);
+		if (ret < 0) {
+			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
+			       tmp_res->start, tmp_res->end);
+			kfree(tmp_res);
+			goto err_insert;
+		}
+	}
+
 	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
 	if (!pgmap) {
 		ret = -ENOMEM;
@@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
 err_memremap:
 	kfree(pgmap);
 err_pgmap:
+	if (tmp_res) {
+		release_resource(tmp_res);
+		kfree(tmp_res);
+	}
+err_insert:
 	release_resource(res);
 err_resource:
 	kfree(res);
 	return ret;
 }
 
+static void unpopulated_init(void)
+{
+	static bool inited = false;
+	int ret;
+
+	if (inited)
+		return;
+
+	/*
+	 * Try to initialize the Xen resource first and fall back to the default
+	 * resource if the arch doesn't offer one.
+	 */
+	ret = arch_xen_unpopulated_init(&xen_resource);
+	if (!ret)
+		target_resource = &xen_resource;
+	else if (ret == -ENOSYS)
+		target_resource = &iomem_resource;
+	else
+		pr_err("Cannot initialize Xen resource\n");
+
+	inited = true;
+}
+
 /**
  * xen_alloc_unpopulated_pages - alloc unpopulated pages
  * @nr_pages: Number of pages
@@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
 	unsigned int i;
 	int ret = 0;
 
+	unpopulated_init();
+
+	/*
+	 * Fall back to the default behavior if we do not have any suitable
+	 * resource to allocate the required region from and, as a result, won't
+	 * be able to construct pages.
+	 */
+	if (!target_resource)
+		return alloc_xenballooned_pages(nr_pages, pages);
+
 	mutex_lock(&list_lock);
 	if (list_count < nr_pages) {
 		ret = fill_list(nr_pages - list_count);
@@ -159,6 +239,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
 {
 	unsigned int i;
 
+	if (!target_resource)
+		return free_xenballooned_pages(nr_pages, pages);
+
 	mutex_lock(&list_lock);
 	for (i = 0; i < nr_pages; i++) {
 		pages[i]->zone_device_data = page_list;
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 43efba0..55d2ef8 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -55,6 +55,8 @@ extern u64 xen_saved_max_mem_size;
 #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
 int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
 void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
+struct resource;
+int arch_xen_unpopulated_init(struct resource *res);
 #else
 #define xen_alloc_unpopulated_pages alloc_xenballooned_pages
 #define xen_free_unpopulated_pages free_xenballooned_pages
-- 
2.7.4



* [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-10-26 16:05 ` Oleksandr Tyshchenko
@ 2021-10-26 16:05   ` Oleksandr Tyshchenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-26 16:05 UTC (permalink / raw)
  To: xen-devel, linux-arm-kernel, linux-kernel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Russell King,
	Boris Ostrovsky, Juergen Gross, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch implements arch_xen_unpopulated_init() on Arm, where
the extended regions (if any) are gathered from the DT and inserted
into the passed Xen resource to be used as unused address space
for Xen scratch pages by the unpopulated-alloc code.

An extended region (safe range) is a region of guest physical
address space which is unused and can safely be used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen by the hypervisor at domain
creation time and advertised to the guest via the "reg" property
under the hypervisor node in the guest device tree. As region 0 is
reserved for the grant table space (always present), the indexes
for extended regions are 1...N.
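
As a rough illustration of how the gaps between extended regions end up marked
as unavailable, here is a simplified user-space sketch (not the kernel code).
It assumes the regions are sorted by address and do not overlap, and it omits
the clamping to the hotpluggable range. struct span and find_holes() are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range [start, end]. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the holes between consecutive regions.
 * "regs" must be sorted by address; "holes" needs room for nr - 1 entries.
 * Returns the number of holes found, or -1 if regions overlap or are
 * out of order.
 */
static int find_holes(const struct span *regs, size_t nr, struct span *holes)
{
	size_t i;
	int n = 0;

	assert(regs != NULL && holes != NULL);

	for (i = 1; i < nr; i++) {
		/* First address past the previous region. */
		uint64_t start = regs[i - 1].end + 1;
		uint64_t next = regs[i].start;

		if (start > next)
			return -1;	/* overlapping or unsorted input */
		if (start == next)
			continue;	/* adjacent regions, no hole */

		holes[n].start = start;
		holes[n].end = next - 1;
		n++;
	}

	return n;
}
```

Everything between the lowest start and the highest end that is not reported
as a hole remains available for allocation.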

If arch_xen_unpopulated_init() fails for some reason, the default
behaviour will be restored (allocate xenballooned pages).

This patch also removes the XEN_UNPOPULATED_ALLOC dependency on x86.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V2:
   - new patch, instead of
    "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
---
 arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/Kconfig      |   2 +-
 2 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index dea46ec..1a1e0d3 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
 static phys_addr_t xen_grant_frames;
 
 #define GRANT_TABLE_INDEX   0
+#define EXT_REGION_INDEX    1
 
 uint32_t xen_start_flags;
 EXPORT_SYMBOL(xen_start_flags);
@@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
 #endif
 }
 
+#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
+int arch_xen_unpopulated_init(struct resource *res)
+{
+	struct device_node *np;
+	struct resource *regs, *tmp_res;
+	uint64_t min_gpaddr = -1, max_gpaddr = 0;
+	unsigned int i, nr_reg = 0;
+	struct range mhp_range;
+	int rc;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	np = of_find_compatible_node(NULL, NULL, "xen,xen");
+	if (WARN_ON(!np))
+		return -ENODEV;
+
+	/* Skip region 0 which is reserved for grant table space */
+	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
+		nr_reg++;
+
+	if (!nr_reg) {
+		pr_err("No extended regions are found\n");
+		return -EINVAL;
+	}
+
+	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
+	if (!regs)
+		return -ENOMEM;
+
+	/*
+	 * Create resource from extended regions provided by the hypervisor to be
+	 * used as unused address space for Xen scratch pages.
+	 */
+	for (i = 0; i < nr_reg; i++) {
+		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
+		if (rc)
+			goto err;
+
+		if (max_gpaddr < regs[i].end)
+			max_gpaddr = regs[i].end;
+		if (min_gpaddr > regs[i].start)
+			min_gpaddr = regs[i].start;
+	}
+
+	/* Check whether the resource range is within the hotpluggable range */
+	mhp_range = mhp_get_pluggable_range(true);
+	if (min_gpaddr < mhp_range.start)
+		min_gpaddr = mhp_range.start;
+	if (max_gpaddr > mhp_range.end)
+		max_gpaddr = mhp_range.end;
+
+	res->start = min_gpaddr;
+	res->end = max_gpaddr;
+
+	/*
+	 * Mark holes between extended regions as unavailable. The rest of that
+	 * address space will be available for the allocation.
+	 */
+	for (i = 1; i < nr_reg; i++) {
+		resource_size_t start, end;
+
+		start = regs[i - 1].end + 1;
+		end = regs[i].start - 1;
+
+		if (start > (end + 1)) {
+			rc = -EINVAL;
+			goto err;
+		}
+
+		/* There is no hole between regions */
+		if (start == (end + 1))
+			continue;
+
+		/* Check whether the hole range is within the resource range */
+		if (start < res->start || end > res->end) {
+			if (start < res->start)
+				start = res->start;
+			if (end > res->end)
+				end = res->end;
+
+			if (start >= (end + 1))
+				continue;
+		}
+
+		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
+		if (!tmp_res) {
+			rc = -ENOMEM;
+			goto err;
+		}
+
+		tmp_res->name = "Unavailable space";
+		tmp_res->start = start;
+		tmp_res->end = end;
+
+		rc = insert_resource(res, tmp_res);
+		if (rc) {
+			pr_err("Cannot insert resource [%llx - %llx] %d\n",
+					tmp_res->start, tmp_res->end, rc);
+			kfree(tmp_res);
+			goto err;
+		}
+	}
+
+err:
+	kfree(regs);
+
+	return rc;
+}
+#endif
+
 static void __init xen_dt_guest_init(void)
 {
 	struct device_node *xen_node;
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 1b2c3ac..e6031fc 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -297,7 +297,7 @@ config XEN_FRONT_PGDIR_SHBUF
 
 config XEN_UNPOPULATED_ALLOC
 	bool "Use unpopulated memory ranges for guest mappings"
-	depends on X86 && ZONE_DEVICE
+	depends on ZONE_DEVICE
 	default XEN_BACKEND || XEN_GNTDEV || XEN_DOM0
 	help
 	  Use unpopulated memory ranges in order to create mappings for guest
-- 
2.7.4



 	bool "Use unpopulated memory ranges for guest mappings"
-	depends on X86 && ZONE_DEVICE
+	depends on ZONE_DEVICE
 	default XEN_BACKEND || XEN_GNTDEV || XEN_DOM0
 	help
 	  Use unpopulated memory ranges in order to create mappings for guest
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
  2021-10-26 16:05   ` Oleksandr Tyshchenko
@ 2021-10-28  1:28     ` Stefano Stabellini
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-10-28  1:28 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Stefano Stabellini, Russell King, Julien Grall

On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Read the start address of the grant table space from DT
> (region 0).
> 
> This patch mostly restores behaviour before commit 3cf4095d7446
> ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table")
> but trying not to break the ACPI support added after that commit.
> So the patch touches DT part only and leaves the ACPI part with
> xen_xlate_map_ballooned_pages().
> 
> This is a preparation for using Xen extended region feature
> where unused regions of guest physical address space (provided
> by the hypervisor) will be used to create grant/foreign/whatever
> mappings instead of wasting real RAM pages from the domain memory
> for establishing these mappings.
> 
> The immediate benefit of this change:
> - Avoid superpage shattering in Xen P2M when establishing
>   stage-2 mapping (GFN <-> MFN) for the grant table space
> - Avoid wasting real RAM pages (reducing the amount of memory
>   usable) for mapping grant table space
> - The grant table space is always mapped at the exact
>   same place (region 0 is reserved for the grant table)
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Changes RFC -> V2:
>    - new patch
> ---
>  arch/arm/xen/enlighten.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 7f1c106b..dea46ec 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -59,6 +59,9 @@ unsigned long xen_released_pages;
>  struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
>  
>  static __read_mostly unsigned int xen_events_irq;
> +static phys_addr_t xen_grant_frames;

__read_mostly


> +#define GRANT_TABLE_INDEX   0
>  
>  uint32_t xen_start_flags;
>  EXPORT_SYMBOL(xen_start_flags);
> @@ -303,6 +306,7 @@ static void __init xen_acpi_guest_init(void)
>  static void __init xen_dt_guest_init(void)
>  {
>  	struct device_node *xen_node;
> +	struct resource res;
>  
>  	xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
>  	if (!xen_node) {
> @@ -310,6 +314,12 @@ static void __init xen_dt_guest_init(void)
>  		return;
>  	}
>  
> +	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
> +		pr_err("Xen grant table region is not found\n");
> +		return;
> +	}
> +	xen_grant_frames = res.start;
> +
>  	xen_events_irq = irq_of_parse_and_map(xen_node, 0);
>  }
>  
> @@ -317,16 +327,20 @@ static int __init xen_guest_init(void)
>  {
>  	struct xen_add_to_physmap xatp;
>  	struct shared_info *shared_info_page = NULL;
> -	int cpu;
> +	int rc, cpu;
>  
>  	if (!xen_domain())
>  		return 0;
>  
>  	if (!acpi_disabled)
>  		xen_acpi_guest_init();
> -	else
> +	else {
>  		xen_dt_guest_init();
>  
> +		if (!xen_grant_frames)
> +			return -ENODEV;

maybe we can avoid this, see below


> +	}
> +
>  	if (!xen_events_irq) {
>  		pr_err("Xen event channel interrupt not found\n");
>  		return -ENODEV;
> @@ -370,12 +384,16 @@ static int __init xen_guest_init(void)
>  	for_each_possible_cpu(cpu)
>  		per_cpu(xen_vcpu_id, cpu) = cpu;
>  
> -	xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
> -	if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
> -					  &xen_auto_xlat_grant_frames.vaddr,
> -					  xen_auto_xlat_grant_frames.count)) {
> +	if (!acpi_disabled) {

To make the code more resilient couldn't we do:

if (!acpi_disabled || !xen_grant_frames) {
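
To spell out what that condition would do, here is a minimal userspace
sketch of the resulting path selection. The enum and the function name
are made up for illustration and are not part of the patch; only the
condition itself is taken from the suggestion above.

```c
#include <stdbool.h>

/* Hypothetical labels for the two grant table setup paths in the patch. */
enum gnttab_path {
	PATH_BALLOONED,	/* xen_xlate_map_ballooned_pages() */
	PATH_AUTO_XLAT,	/* gnttab_setup_auto_xlat_frames(xen_grant_frames) */
};

/*
 * Use the ballooned-pages path for ACPI, and also for DT when region 0
 * could not be read (xen_grant_frames == 0), instead of failing the init.
 */
static enum gnttab_path pick_gnttab_path(bool acpi_disabled,
					 unsigned long long xen_grant_frames)
{
	if (!acpi_disabled || !xen_grant_frames)
		return PATH_BALLOONED;

	return PATH_AUTO_XLAT;
}
```

With this shape, a DT boot without a usable region 0 would presumably
degrade to the old behaviour rather than return -ENODEV early.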


> +		xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
> +		rc = xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
> +										   &xen_auto_xlat_grant_frames.vaddr,
> +										   xen_auto_xlat_grant_frames.count);
> +	} else
> +		rc = gnttab_setup_auto_xlat_frames(xen_grant_frames);
> +	if (rc) {
>  		free_percpu(xen_vcpu_info);
> -		return -ENOMEM;
> +		return rc;
>  	}
>  	gnttab_init();


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-10-26 16:05   ` Oleksandr Tyshchenko
@ 2021-10-28  1:40     ` Stefano Stabellini
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-10-28  1:40 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Stefano Stabellini, Russell King, Boris Ostrovsky, Juergen Gross,
	Julien Grall

On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch implements arch_xen_unpopulated_init() on Arm where
> the extended regions (if any) are gathered from DT and inserted
> into passed Xen resource to be used as unused address space
> for Xen scratch pages by unpopulated-alloc code.
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen by the hypervisor at the domain
> creation time and advertised to it via "reg" property under
> hypervisor node in the guest device-tree. As region 0 is reserved
> for grant table space (always present), the indexes for extended
> regions are 1...N.
> 
> If arch_xen_unpopulated_init() fails for some reason the default
> behaviour will be restored (allocate xenballooned pages).
> 
> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Changes RFC -> V2:
>    - new patch, instead of
>     "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
> ---
>  arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/Kconfig      |   2 +-
>  2 files changed, 113 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index dea46ec..1a1e0d3 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>  static phys_addr_t xen_grant_frames;
>  
>  #define GRANT_TABLE_INDEX   0
> +#define EXT_REGION_INDEX    1
>  
>  uint32_t xen_start_flags;
>  EXPORT_SYMBOL(xen_start_flags);
> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>  #endif
>  }
>  
> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> +int arch_xen_unpopulated_init(struct resource *res)
> +{
> +	struct device_node *np;
> +	struct resource *regs, *tmp_res;
> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
> +	unsigned int i, nr_reg = 0;
> +	struct range mhp_range;
> +	int rc;
> +
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
> +	if (WARN_ON(!np))
> +		return -ENODEV;
> +
> +	/* Skip region 0 which is reserved for grant table space */
> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
> +		nr_reg++;
> +	if (!nr_reg) {
> +		pr_err("No extended regions are found\n");
> +		return -EINVAL;
> +	}
> +
> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> +	if (!regs)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Create resource from extended regions provided by the hypervisor to be
> +	 * used as unused address space for Xen scratch pages.
> +	 */
> +	for (i = 0; i < nr_reg; i++) {
> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
> +		if (rc)
> +			goto err;
> +
> +		if (max_gpaddr < regs[i].end)
> +			max_gpaddr = regs[i].end;
> +		if (min_gpaddr > regs[i].start)
> +			min_gpaddr = regs[i].start;
> +	}
> +
> +	/* Check whether the resource range is within the hotpluggable range */
> +	mhp_range = mhp_get_pluggable_range(true);
> +	if (min_gpaddr < mhp_range.start)
> +		min_gpaddr = mhp_range.start;
> +	if (max_gpaddr > mhp_range.end)
> +		max_gpaddr = mhp_range.end;
> +
> +	res->start = min_gpaddr;
> +	res->end = max_gpaddr;
> +
> +	/*
> +	 * Mark holes between extended regions as unavailable. The rest of that
> +	 * address space will be available for the allocation.
> +	 */
> +	for (i = 1; i < nr_reg; i++) {
> +		resource_size_t start, end;
> +
> +		start = regs[i - 1].end + 1;
> +		end = regs[i].start - 1;
> +
> +		if (start > (end + 1)) {

Should this be:

if (start >= end)

?


> +			rc = -EINVAL;
> +			goto err;
> +		}
> +
> +		/* There is no hole between regions */
> +		if (start == (end + 1))

Also here, shouldn't it be:

if (start == end)

?

I think I am again missing something in the termination accounting :-)
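
For what it is worth, here is a small userspace sketch of the arithmetic,
assuming both regs[].end and the computed hole use inclusive end addresses
(as struct resource does). The struct and function names are hypothetical;
only the two comparisons are copied from the patch.

```c
#include <stdint.h>

/* Inclusive [start, end] range, following the struct resource convention. */
struct incl_range {
	uint64_t start;
	uint64_t end;
};

enum hole_kind { HOLE_OVERLAP, HOLE_NONE, HOLE_PRESENT };

/* Classify the gap between two regions with the same checks as the patch. */
static enum hole_kind classify_hole(struct incl_range prev,
				    struct incl_range next,
				    struct incl_range *hole)
{
	uint64_t start = prev.end + 1;	/* first address after prev */
	uint64_t end = next.start - 1;	/* last address before next */

	if (start > end + 1)		/* next starts inside prev */
		return HOLE_OVERLAP;
	if (start == end + 1)		/* next starts right after prev */
		return HOLE_NONE;

	hole->start = start;
	hole->end = end;
	return HOLE_PRESENT;
}

/* Size in bytes of an inclusive range. */
static uint64_t hole_size(struct incl_range hole)
{
	return hole.end - hole.start + 1;
}
```

Under that assumption, start == end describes a hole of exactly one byte,
and adjacent regions give start == end + 1, so the checks in the patch
look consistent with inclusive ends.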


> +			continue;
> +
> +		/* Check whether the hole range is within the resource range */
> +		if (start < res->start || end > res->end) {

By definition I don't think this check is necessary as either condition
is impossible?
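
One case where it might matter is when the clamping to the hotpluggable
range above actually shrinks res. The sketch below is a userspace model
with made-up addresses and hypothetical names; it assumes
mhp_get_pluggable_range() can return a range narrower than the span of
the extended regions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive [start, end] range, following the struct resource convention. */
struct incl_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Trim a hole to the resource range, as the patch does. Returns false
 * when nothing of the hole is left inside the resource range.
 */
static bool clamp_hole(struct incl_range res, struct incl_range *hole)
{
	if (hole->start < res.start)
		hole->start = res.start;
	if (hole->end > res.end)
		hole->end = res.end;

	return hole->start < hole->end + 1;
}
```

For example, with extended regions ending at 0x4fff and starting at
0x9000, and res clamped to end at 0x7fff, the hole [0x5000, 0x8fff]
has end > res->end and would be trimmed to [0x5000, 0x7fff].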


> +			if (start < res->start)
> +				start = res->start;
> +			if (end > res->end)
> +				end = res->end;
> +
> +			if (start >= (end + 1))
> +				continue;
> +		}
> +
> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
> +		if (!tmp_res) {
> +			rc = -ENOMEM;
> +			goto err;
> +		}
> +
> +		tmp_res->name = "Unavailable space";
> +		tmp_res->start = start;
> +		tmp_res->end = end;

Do we need to set any flags so that the system can reuse the memory in
the hole, e.g. IORESOURCE_MEM? Or is it not necessary?


> +		rc = insert_resource(res, tmp_res);
> +		if (rc) {
> +			pr_err("Cannot insert resource [%llx - %llx] %d\n",
> +					tmp_res->start, tmp_res->end, rc);

Although it is impossible to enable XEN_UNPOPULATED_ALLOC on arm32 due
to unmet dependencies, I would like to keep the implementation of
arch_xen_unpopulated_init 32bit clean.

I am getting build errors like (by forcing arch_xen_unpopulated_init to
compile on arm32):

./include/linux/kern_levels.h:5:18: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’ [-Wformat=]
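
One way to keep this 32-bit clean is to cast to unsigned long long before
printing. The snippet below is a userspace sketch only: the typedef models
a 32-bit resource_size_t and format_range() is a hypothetical helper. In
the kernel, the %pa (pointer to phys_addr_t/resource_size_t) or %pR
(struct resource *) printk specifiers should be an alternative.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: models arm32 without LPAE, where resource_size_t is 32 bits. */
typedef uint32_t resource_size_t;

/*
 * Format an address range with %llx regardless of the width of
 * resource_size_t by widening both arguments explicitly.
 */
static int format_range(char *buf, size_t len,
			resource_size_t start, resource_size_t end)
{
	return snprintf(buf, len, "[%llx - %llx]",
			(unsigned long long)start,
			(unsigned long long)end);
}
```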


> +			kfree(tmp_res);
> +			goto err;
> +		}
> +	}
> +
> +err:
> +	kfree(regs);
> +
> +	return rc;
> +}
> +#endif
> +
>  static void __init xen_dt_guest_init(void)
>  {
>  	struct device_node *xen_node;
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 1b2c3ac..e6031fc 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -297,7 +297,7 @@ config XEN_FRONT_PGDIR_SHBUF
>  
>  config XEN_UNPOPULATED_ALLOC
>  	bool "Use unpopulated memory ranges for guest mappings"
> -	depends on X86 && ZONE_DEVICE
> +	depends on ZONE_DEVICE
>  	default XEN_BACKEND || XEN_GNTDEV || XEN_DOM0
>  	help
>  	  Use unpopulated memory ranges in order to create mappings for guest

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-10-26 16:05 ` [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource Oleksandr Tyshchenko
@ 2021-10-28 16:37   ` Stefano Stabellini
  2021-11-09 18:34     ` Oleksandr
  2021-10-28 19:08   ` Boris Ostrovsky
  1 sibling, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2021-10-28 16:37 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, linux-kernel, Oleksandr Tyshchenko, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Julien Grall

On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The main reason for this change is that unpopulated-alloc
> code cannot be used in its current form on Arm, but there
> is a desire to reuse it to avoid wasting real RAM pages
> for the grant/foreign mappings.
> 
> The problem is that system "iomem_resource" is used for
> the address space allocation, but the really unallocated
> space can't be figured out precisely by the domain on Arm
> without hypervisor involvement. For example, not all device
> I/O regions are known by the time domain starts creating
> grant/foreign mappings. And following the advice from
> "iomem_resource" we might end up reusing these regions by
> mistake. So, the hypervisor which maintains the P2M for
> the domain is in the best position to provide unused regions
> of guest physical address space which could be safely used
> to create grant/foreign mappings.
> 
> Introduce a new helper arch_xen_unpopulated_init() whose purpose
> is to create a specific Xen resource based on the memory regions
> provided by the hypervisor to be used as unused space for Xen
> scratch pages.
> 
> If arch doesn't implement arch_xen_unpopulated_init() to
> initialize Xen resource the default "iomem_resource" will be used.
> So the behavior on x86 won't be changed.
> 
> Also fall back to allocating xenballooned pages (stealing real RAM
> pages) if we do not have any suitable resource to work with and,
> as a result, cannot provide unpopulated pages.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Changes RFC -> V2:
>    - new patch, instead of
>     "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
> ---
>  drivers/xen/unpopulated-alloc.c | 89 +++++++++++++++++++++++++++++++++++++++--
>  include/xen/xen.h               |  2 +
>  2 files changed, 88 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> index a03dc5b..1f1d8d8 100644
> --- a/drivers/xen/unpopulated-alloc.c
> +++ b/drivers/xen/unpopulated-alloc.c
> @@ -8,6 +8,7 @@
>  
>  #include <asm/page.h>
>  
> +#include <xen/balloon.h>
>  #include <xen/page.h>
>  #include <xen/xen.h>
>  
> @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
>  static struct page *page_list;
>  static unsigned int list_count;
>  
> +static struct resource *target_resource;
> +static struct resource xen_resource = {
> +	.name = "Xen unused space",
> +};
> +
> +/*
> + * If arch is not happy with system "iomem_resource" being used for
> + * the region allocation it can provide it's own view by initializing
> + * "xen_resource" with unused regions of guest physical address space
> + * provided by the hypervisor.
> + */
> +int __weak arch_xen_unpopulated_init(struct resource *res)
> +{
> +	return -ENOSYS;
> +}
> +
>  static int fill_list(unsigned int nr_pages)
>  {
>  	struct dev_pagemap *pgmap;
> -	struct resource *res;
> +	struct resource *res, *tmp_res = NULL;
>  	void *vaddr;
>  	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
> -	int ret = -ENOMEM;
> +	int ret;
>  
>  	res = kzalloc(sizeof(*res), GFP_KERNEL);
>  	if (!res)
> @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
>  	res->name = "Xen scratch";
>  	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>  
> -	ret = allocate_resource(&iomem_resource, res,
> +	ret = allocate_resource(target_resource, res,
>  				alloc_pages * PAGE_SIZE, 0, -1,
>  				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>  	if (ret < 0) {
> @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
>  		goto err_resource;
>  	}
>  
> +	/*
> +	 * Reserve the region previously allocated from Xen resource to avoid
> +	 * re-using it by someone else.
> +	 */
> +	if (target_resource != &iomem_resource) {
> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
> +		if (!res) {
> +			ret = -ENOMEM;
> +			goto err_insert;
> +		}
> +
> +		tmp_res->name = res->name;
> +		tmp_res->start = res->start;
> +		tmp_res->end = res->end;
> +		tmp_res->flags = res->flags;
> +
> +		ret = insert_resource(&iomem_resource, tmp_res);
> +		if (ret < 0) {
> +			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
> +			       tmp_res->start, tmp_res->end);
> +			kfree(tmp_res);
> +			goto err_insert;
> +		}
> +	}

I am a bit confused.. why do we need to do this? Who could be
erroneously re-using the region? Are you saying that the next time
allocate_resource is called it could find the same region again? It
doesn't seem possible?


>  	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>  	if (!pgmap) {
>  		ret = -ENOMEM;
> @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
>  err_memremap:
>  	kfree(pgmap);
>  err_pgmap:
> +	if (tmp_res) {
> +		release_resource(tmp_res);
> +		kfree(tmp_res);
> +	}
> +err_insert:
>  	release_resource(res);
>  err_resource:
>  	kfree(res);
>  	return ret;
>  }
>  
> +static void unpopulated_init(void)
> +{
> +	static bool inited = false;

initialized = false


> +	int ret;
> +
> +	if (inited)
> +		return;
> +
> +	/*
> +	 * Try to initialize Xen resource the first and fall back to default
> +	 * resource if arch doesn't offer one.
> +	 */
> +	ret = arch_xen_unpopulated_init(&xen_resource);
> +	if (!ret)
> +		target_resource = &xen_resource;
> +	else if (ret == -ENOSYS)
> +		target_resource = &iomem_resource;
> +	else
> +		pr_err("Cannot initialize Xen resource\n");
> +
> +	inited = true;
> +}

Would it make sense to call unpopulated_init from an init function,
rather than every time xen_alloc_unpopulated_pages is called?


>  /**
>   * xen_alloc_unpopulated_pages - alloc unpopulated pages
>   * @nr_pages: Number of pages
> @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>  	unsigned int i;
>  	int ret = 0;
>  
> +	unpopulated_init();
> +
> +	/*
> +	 * Fall back to default behavior if we do not have any suitable resource
> +	 * to allocate required region from and as the result we won't be able to
> +	 * construct pages.
> +	 */
> +	if (!target_resource)
> +		return alloc_xenballooned_pages(nr_pages, pages);

The commit message says that the behavior on x86 doesn't change but this
seems to be a change that could impact x86?


>  	mutex_lock(&list_lock);
>  	if (list_count < nr_pages) {
>  		ret = fill_list(nr_pages - list_count);
> @@ -159,6 +239,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>  {
>  	unsigned int i;
>  
> +	if (!target_resource)
> +		return free_xenballooned_pages(nr_pages, pages);
> +
>  	mutex_lock(&list_lock);
>  	for (i = 0; i < nr_pages; i++) {
>  		pages[i]->zone_device_data = page_list;
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 43efba0..55d2ef8 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -55,6 +55,8 @@ extern u64 xen_saved_max_mem_size;
>  #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>  int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>  void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
> +struct resource;

This is to avoid having to #include linux/ioport.h, right? Is it a
problem or is it just to minimize the headers dependencies?

It looks like adding #include <linux/ioport.h> below #include
<linux/types.h> in include/xen/xen.h would work too. I am not sure what
is the best way though, I'll let Juergen comment.


> +int arch_xen_unpopulated_init(struct resource *res);
>  #else
>  #define xen_alloc_unpopulated_pages alloc_xenballooned_pages
>  #define xen_free_unpopulated_pages free_xenballooned_pages
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 1/4] xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list()
  2021-10-26 16:05 ` [PATCH V2 1/4] xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list() Oleksandr Tyshchenko
@ 2021-10-28 18:57   ` Boris Ostrovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Boris Ostrovsky @ 2021-10-28 18:57 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel, linux-kernel
  Cc: Oleksandr Tyshchenko, Juergen Gross, Stefano Stabellini, Julien Grall


On 10/26/21 12:05 PM, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> If memremap_pages() succeeds the range is guaranteed to have proper page
> table, there is no need for an additional virt_addr_valid() check.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-10-26 16:05 ` [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource Oleksandr Tyshchenko
  2021-10-28 16:37   ` Stefano Stabellini
@ 2021-10-28 19:08   ` Boris Ostrovsky
  2021-11-09 18:51     ` Oleksandr
  1 sibling, 1 reply; 41+ messages in thread
From: Boris Ostrovsky @ 2021-10-28 19:08 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel, linux-kernel
  Cc: Oleksandr Tyshchenko, Juergen Gross, Stefano Stabellini, Julien Grall


On 10/26/21 12:05 PM, Oleksandr Tyshchenko wrote:
>   
> +static void unpopulated_init(void)
> +{
> +	static bool inited = false;
> +	int ret;
> +
> +	if (inited)
> +		return;
> +
> +	/*
> +	 * Try to initialize Xen resource the first and fall back to default
> +	 * resource if arch doesn't offer one.
> +	 */
> +	ret = arch_xen_unpopulated_init(&xen_resource);
> +	if (!ret)
> +		target_resource = &xen_resource;
> +	else if (ret == -ENOSYS)
> +		target_resource = &iomem_resource;
> +	else
> +		pr_err("Cannot initialize Xen resource\n");


I'd pass target_resource as a parameter to arch_xen_unpopulated_init() instead. Default routine will assign it iomem_resource and you won't have to deal with -ENOSYS.


Also, what happens in case of error? Is it fatal? I don't think your changes in fill_list() will work.


> +
> +	inited = true;


I agree with Stefano in that it would be better to call this from an init function, and you won't have to worry about multiple calls here.


-boris

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-10-28 16:37   ` Stefano Stabellini
@ 2021-11-09 18:34     ` Oleksandr
  2021-11-19  0:59       ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Oleksandr @ 2021-11-09 18:34 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, Oleksandr Tyshchenko, Boris Ostrovsky,
	Juergen Gross, Julien Grall


On 28.10.21 19:37, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.

> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The main reason of this change is that unpopulated-alloc
>> code cannot be used in its current form on Arm, but there
>> is a desire to reuse it to avoid wasting real RAM pages
>> for the grant/foreign mappings.
>>
>> The problem is that system "iomem_resource" is used for
>> the address space allocation, but the really unallocated
>> space can't be figured out precisely by the domain on Arm
>> without hypervisor involvement. For example, not all device
>> I/O regions are known by the time domain starts creating
>> grant/foreign mappings. And following the advise from
>> "iomem_resource" we might end up reusing these regions by
>> a mistake. So, the hypervisor which maintains the P2M for
>> the domain is in the best position to provide unused regions
>> of guest physical address space which could be safely used
>> to create grant/foreign mappings.
>>
>> Introduce new helper arch_xen_unpopulated_init() which purpose
>> is to create specific Xen resource based on the memory regions
>> provided by the hypervisor to be used as unused space for Xen
>> scratch pages.
>>
>> If arch doesn't implement arch_xen_unpopulated_init() to
>> initialize Xen resource the default "iomem_resource" will be used.
>> So the behavior on x86 won't be changed.
>>
>> Also fall back to allocate xenballooned pages (steal real RAM
>> pages) if we do not have any suitable resource to work with and
>> as the result we won't be able to provide unpopulated pages.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes RFC -> V2:
>>     - new patch, instead of
>>      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>> ---
>>   drivers/xen/unpopulated-alloc.c | 89 +++++++++++++++++++++++++++++++++++++++--
>>   include/xen/xen.h               |  2 +
>>   2 files changed, 88 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>> index a03dc5b..1f1d8d8 100644
>> --- a/drivers/xen/unpopulated-alloc.c
>> +++ b/drivers/xen/unpopulated-alloc.c
>> @@ -8,6 +8,7 @@
>>   
>>   #include <asm/page.h>
>>   
>> +#include <xen/balloon.h>
>>   #include <xen/page.h>
>>   #include <xen/xen.h>
>>   
>> @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
>>   static struct page *page_list;
>>   static unsigned int list_count;
>>   
>> +static struct resource *target_resource;
>> +static struct resource xen_resource = {
>> +	.name = "Xen unused space",
>> +};
>> +
>> +/*
>> + * If arch is not happy with system "iomem_resource" being used for
>> + * the region allocation it can provide it's own view by initializing
>> + * "xen_resource" with unused regions of guest physical address space
>> + * provided by the hypervisor.
>> + */
>> +int __weak arch_xen_unpopulated_init(struct resource *res)
>> +{
>> +	return -ENOSYS;
>> +}
>> +
>>   static int fill_list(unsigned int nr_pages)
>>   {
>>   	struct dev_pagemap *pgmap;
>> -	struct resource *res;
>> +	struct resource *res, *tmp_res = NULL;
>>   	void *vaddr;
>>   	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>> -	int ret = -ENOMEM;
>> +	int ret;
>>   
>>   	res = kzalloc(sizeof(*res), GFP_KERNEL);
>>   	if (!res)
>> @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
>>   	res->name = "Xen scratch";
>>   	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>   
>> -	ret = allocate_resource(&iomem_resource, res,
>> +	ret = allocate_resource(target_resource, res,
>>   				alloc_pages * PAGE_SIZE, 0, -1,
>>   				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>   	if (ret < 0) {
>> @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
>>   		goto err_resource;
>>   	}
>>   
>> +	/*
>> +	 * Reserve the region previously allocated from Xen resource to avoid
>> +	 * re-using it by someone else.
>> +	 */
>> +	if (target_resource != &iomem_resource) {
>> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>> +		if (!res) {
>> +			ret = -ENOMEM;
>> +			goto err_insert;
>> +		}
>> +
>> +		tmp_res->name = res->name;
>> +		tmp_res->start = res->start;
>> +		tmp_res->end = res->end;
>> +		tmp_res->flags = res->flags;
>> +
>> +		ret = insert_resource(&iomem_resource, tmp_res);
>> +		if (ret < 0) {
>> +			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
>> +			       tmp_res->start, tmp_res->end);
>> +			kfree(tmp_res);
>> +			goto err_insert;
>> +		}
>> +	}
> I am a bit confused.. why do we need to do this? Who could be
> erroneously re-using the region? Are you saying that the next time
> allocate_resource is called it could find the same region again? It
> doesn't seem possible?


No, as I understand it, allocate_resource() called again for the same
root resource won't return the same region. We only need to do this
(insert the region into "iomem_resource") if we allocated it from our
*internal* "xen_resource", because the *global* "iomem_resource" (which
is used everywhere) is not aware that the region has already been
allocated. By inserting the region here we reserve it; otherwise it
could be reused elsewhere.


>
>
>>   	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>>   	if (!pgmap) {
>>   		ret = -ENOMEM;
>> @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
>>   err_memremap:
>>   	kfree(pgmap);
>>   err_pgmap:
>> +	if (tmp_res) {
>> +		release_resource(tmp_res);
>> +		kfree(tmp_res);
>> +	}
>> +err_insert:
>>   	release_resource(res);
>>   err_resource:
>>   	kfree(res);
>>   	return ret;
>>   }
>>   
>> +static void unpopulated_init(void)
>> +{
>> +	static bool inited = false;
> initialized = false

ok.


>
>
>> +	int ret;
>> +
>> +	if (inited)
>> +		return;
>> +
>> +	/*
>> +	 * Try to initialize Xen resource the first and fall back to default
>> +	 * resource if arch doesn't offer one.
>> +	 */
>> +	ret = arch_xen_unpopulated_init(&xen_resource);
>> +	if (!ret)
>> +		target_resource = &xen_resource;
>> +	else if (ret == -ENOSYS)
>> +		target_resource = &iomem_resource;
>> +	else
>> +		pr_err("Cannot initialize Xen resource\n");
>> +
>> +	inited = true;
>> +}
> Would it make sense to call unpopulated_init from an init function,
> rather than every time xen_alloc_unpopulated_pages is called?

Good point, thank you. Will do. To be honest, I also don't like the 
current approach much.


>
>
>>   /**
>>    * xen_alloc_unpopulated_pages - alloc unpopulated pages
>>    * @nr_pages: Number of pages
>> @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>   	unsigned int i;
>>   	int ret = 0;
>>   
>> +	unpopulated_init();
>> +
>> +	/*
>> +	 * Fall back to default behavior if we do not have any suitable resource
>> +	 * to allocate required region from and as the result we won't be able to
>> +	 * construct pages.
>> +	 */
>> +	if (!target_resource)
>> +		return alloc_xenballooned_pages(nr_pages, pages);
> The commit message says that the behavior on x86 doesn't change but this
> seems to be a change that could impact x86?
I don't think so. I haven't tested on x86 and might be wrong, but
according to the current patch, on x86 "target_resource" is always
valid and points to "iomem_resource", as arch_xen_unpopulated_init()
is not implemented there. So there won't be any fallback to
alloc_(free)_xenballooned_pages() here and fill_list() will behave as usual.

You raised a really good question. On Arm we need a fallback to
ballooning out RAM pages if the hypervisor doesn't provide extended
regions (we run on an old version, there are no unused regions of
reasonable size, etc), so I decided to put the fallback code here; the
indicator of the failure is an invalid "target_resource". I noticed a
patch that is about to be upstreamed which removes the
alloc_(free)xenballooned_pages API [1]. Right now I have no idea
how/where this fallback could be implemented, as this is under build
option control (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same
name is used either for unpopulated pages (if set) or for ballooned
pages (if not set). I would appreciate suggestions regarding that. I am
wondering whether it would be possible and correct to have both
mechanisms (unpopulated and ballooned) enabled by default, with some
init code deciding which one to use at runtime, or something of the sort?


>
>>   	mutex_lock(&list_lock);
>>   	if (list_count < nr_pages) {
>>   		ret = fill_list(nr_pages - list_count);
>> @@ -159,6 +239,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>   {
>>   	unsigned int i;
>>   
>> +	if (!target_resource)
>> +		return free_xenballooned_pages(nr_pages, pages);
>> +
>>   	mutex_lock(&list_lock);
>>   	for (i = 0; i < nr_pages; i++) {
>>   		pages[i]->zone_device_data = page_list;
>> diff --git a/include/xen/xen.h b/include/xen/xen.h
>> index 43efba0..55d2ef8 100644
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -55,6 +55,8 @@ extern u64 xen_saved_max_mem_size;
>>   #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>   int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>>   void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>> +struct resource;
> This is to avoid having to #include linux/ioport.h, right? Is it a
> problem or is it just to minimize the headers dependencies?
>
> It looks like adding #include <linux/ioport.h> below #include
> <linux/types.h> in include/xen/xen.h would work too. I am not sure what
> is the best way though, I'll let Juergen comment.
Yes, the initial reason to use a forward declaration here was to
minimize header dependencies.
I have rechecked, and your suggestion works as well, thank you. So I
would be OK either way, let's wait for other opinions.


>
>
>> +int arch_xen_unpopulated_init(struct resource *res);
>>   #else
>>   #define xen_alloc_unpopulated_pages alloc_xenballooned_pages
>>   #define xen_free_unpopulated_pages free_xenballooned_pages
>> -- 
>> 2.7.4
>>

[1] https://lore.kernel.org/lkml/20211102092234.17852-1-jgross@suse.com/

-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-10-28 19:08   ` Boris Ostrovsky
@ 2021-11-09 18:51     ` Oleksandr
  0 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-09 18:51 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: xen-devel, linux-kernel, Oleksandr Tyshchenko, Juergen Gross,
	Stefano Stabellini, Julien Grall


On 28.10.21 22:08, Boris Ostrovsky wrote:

Hi Boris

I am sorry for the late response.

>
> On 10/26/21 12:05 PM, Oleksandr Tyshchenko wrote:
>>   +static void unpopulated_init(void)
>> +{
>> +    static bool inited = false;
>> +    int ret;
>> +
>> +    if (inited)
>> +        return;
>> +
>> +    /*
>> +     * Try to initialize Xen resource the first and fall back to default
>> +     * resource if arch doesn't offer one.
>> +     */
>> +    ret = arch_xen_unpopulated_init(&xen_resource);
>> +    if (!ret)
>> +        target_resource = &xen_resource;
>> +    else if (ret == -ENOSYS)
>> +        target_resource = &iomem_resource;
>> +    else
>> +        pr_err("Cannot initialize Xen resource\n");
>
>
> I'd pass target_resource as a parameter to arch_xen_unpopulated_init() 
> instead. Default routine will assign it iomem_resource and you won't 
> have to deal with -ENOSYS.

That would be much better, thank you. Will do.
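
A minimal userspace sketch of what this suggestion might look like,
assuming the hook is changed to take a "struct resource **" (a
hypothetical signature, not taken from the patch as posted). The
"struct resource" and "iomem_resource" definitions below are stand-ins
for the kernel ones, and the weak attribute is the GCC/Clang spelling of
the kernel's __weak:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct resource (illustration only). */
struct resource {
	const char *name;
	unsigned long long start, end;
};

/* Stand-in for the kernel's global iomem_resource. */
struct resource iomem_resource = {
	.name = "iomem (stand-in)",
	.start = 0,
	.end = ~0ULL,
};

/*
 * Hypothetical reworked hook: the arch code selects the resource itself.
 * The weak default picks iomem_resource, so the caller never has to
 * special-case -ENOSYS.
 */
__attribute__((weak)) int arch_xen_unpopulated_init(struct resource **res)
{
	*res = &iomem_resource;
	return 0;
}

struct resource *target_resource;

/* Caller: any non-zero return is a real error. */
int unpopulated_init(void)
{
	int ret = arch_xen_unpopulated_init(&target_resource);

	if (ret)
		target_resource = NULL;

	/* A successful hook must have selected a resource. */
	assert(ret != 0 || target_resource != NULL);
	return ret;
}
```

With this shape an architecture overrides the weak default only when it
has its own view of the unused address space; everything else keeps
using the global resource without an extra return code.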


>
>
>
> Also, what happens in case of error? Is it fatal? I don't think your 
> changes in fill_list() will work.

The error is fatal as we don't have a suitable resource to allocate a 
region from, and yes, the fill_list() must not be called.


>
>
>
>> +
>> +    inited = true;
>
>
> I agree with Stefano in that it would be better to call this from an 
> init function, and you won't have to worry about multiple calls here.

Yes, that's a good point, thank you. Will do.


>
>
>
> -boris

-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-10-28  1:40     ` Stefano Stabellini
@ 2021-11-10 20:21       ` Oleksandr
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-10 20:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Boris Ostrovsky, Juergen Gross, Julien Grall


On 28.10.21 04:40, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.

> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch implements arch_xen_unpopulated_init() on Arm where
>> the extended regions (if any) are gathered from DT and inserted
>> into passed Xen resource to be used as unused address space
>> for Xen scratch pages by unpopulated-alloc code.
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen by the hypervisor at the domain
>> creation time and advertised to it via "reg" property under
>> hypervisor node in the guest device-tree. As region 0 is reserved
>> for grant table space (always present), the indexes for extended
>> regions are 1...N.
>>
>> If arch_xen_unpopulated_init() fails for some reason the default
>> behaviour will be restored (allocate xenballooned pages).
>>
>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes RFC -> V2:
>>     - new patch, instead of
>>      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>> ---
>>   arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/xen/Kconfig      |   2 +-
>>   2 files changed, 113 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index dea46ec..1a1e0d3 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>   static phys_addr_t xen_grant_frames;
>>   
>>   #define GRANT_TABLE_INDEX   0
>> +#define EXT_REGION_INDEX    1
>>   
>>   uint32_t xen_start_flags;
>>   EXPORT_SYMBOL(xen_start_flags);
>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>   #endif
>>   }
>>   
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>> +int arch_xen_unpopulated_init(struct resource *res)
>> +{
>> +	struct device_node *np;
>> +	struct resource *regs, *tmp_res;
>> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
>> +	unsigned int i, nr_reg = 0;
>> +	struct range mhp_range;
>> +	int rc;
>> +
>> +	if (!xen_domain())
>> +		return -ENODEV;
>> +
>> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
>> +	if (WARN_ON(!np))
>> +		return -ENODEV;
>> +
>> +	/* Skip region 0 which is reserved for grant table space */
>> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>> +		nr_reg++;
>> +	if (!nr_reg) {
>> +		pr_err("No extended regions are found\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>> +	if (!regs)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * Create resource from extended regions provided by the hypervisor to be
>> +	 * used as unused address space for Xen scratch pages.
>> +	 */
>> +	for (i = 0; i < nr_reg; i++) {
>> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
>> +		if (rc)
>> +			goto err;
>> +
>> +		if (max_gpaddr < regs[i].end)
>> +			max_gpaddr = regs[i].end;
>> +		if (min_gpaddr > regs[i].start)
>> +			min_gpaddr = regs[i].start;
>> +	}
>> +
>> +	/* Check whether the resource range is within the hotpluggable range */
>> +	mhp_range = mhp_get_pluggable_range(true);
>> +	if (min_gpaddr < mhp_range.start)
>> +		min_gpaddr = mhp_range.start;
>> +	if (max_gpaddr > mhp_range.end)
>> +		max_gpaddr = mhp_range.end;
>> +
>> +	res->start = min_gpaddr;
>> +	res->end = max_gpaddr;
>> +
>> +	/*
>> +	 * Mark holes between extended regions as unavailable. The rest of that
>> +	 * address space will be available for the allocation.
>> +	 */
>> +	for (i = 1; i < nr_reg; i++) {
>> +		resource_size_t start, end;
>> +
>> +		start = regs[i - 1].end + 1;
>> +		end = regs[i].start - 1;
>> +
>> +		if (start > (end + 1)) {
> Should this be:
>
> if (start >= end)
>
> ?

Yes, we can do this here (since the checks are equivalent) but ...


>
>
>> +			rc = -EINVAL;
>> +			goto err;
>> +		}
>> +
>> +		/* There is no hole between regions */
>> +		if (start == (end + 1))
> Also here, shouldn't it be:
>
> if (start == end)
>
> ?

    ... not here.

As

"(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"

but

"(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"


>
> I think I am missing again something in termination accounting :-)

If I understand correctly, we need to follow the "end = start + size - 1"
rule, so "end" is the last address inside a range, not the first
address outside of it.
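
To make the inclusive-end convention concrete, here is a small
userspace sketch. The type and helper names (range_incl,
range_from_size, ranges_adjacent, hole_size) are made up for this
illustration and are not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive range: "end" is the last address that belongs to the range. */
struct range_incl {
	uint64_t start;
	uint64_t end;
};

/* Build a range from start and size using end = start + size - 1. */
struct range_incl range_from_size(uint64_t start, uint64_t size)
{
	assert(size > 0);
	return (struct range_incl){ .start = start, .end = start + size - 1 };
}

/* Two ranges are adjacent (no hole) when prev.end + 1 == next.start. */
bool ranges_adjacent(struct range_incl prev, struct range_incl next)
{
	return prev.end + 1 == next.start;
}

/*
 * Size of the hole between prev and next, 0 if they are adjacent.
 * Assumes prev lies entirely below next (prev.end < next.start).
 */
uint64_t hole_size(struct range_incl prev, struct range_incl next)
{
	return next.start - (prev.end + 1);
}
```

For a hole described by start = prev.end + 1 and end = next.start - 1,
"start == end + 1" therefore means an empty hole, while "start == end"
would describe a hole that is exactly one byte long.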


>
>
>> +			continue;
>> +
>> +		/* Check whether the hole range is within the resource range */
>> +		if (start < res->start || end > res->end) {
> By definition I don't think this check is necessary as either condition
> is impossible?


This is a good question, let me explain.
Not all extended regions provided by the hypervisor can be used here.
On Arm, the addressable physical memory range for which the linear
mapping can be created is limited, and the maximum addressable range
depends on the VA space size (CONFIG_ARM64_VA_BITS_XXX). We decided not
to filter the regions in the hypervisor, as this logic could become
quite complex (different OSes may have different requirements, etc).
This means that we need to make sure that the regions are within the
hotpluggable range to avoid a failure later on, when a region is
pre-validated by the memory hotplug path.

The following code limits the resource range based on that:

+    /* Check whether the resource range is within the hotpluggable range */
+    mhp_range = mhp_get_pluggable_range(true);
+    if (min_gpaddr < mhp_range.start)
+        min_gpaddr = mhp_range.start;
+    if (max_gpaddr > mhp_range.end)
+        max_gpaddr = mhp_range.end;
+
+    res->start = min_gpaddr;
+    res->end = max_gpaddr;

In the current loop (when calculating and inserting holes) we also need
to make sure that the resulting hole range is within the resource range
(and adjust/skip it if not), as regs[] used for the calculations
contains the raw regions as described in DT, so they are not updated.
Otherwise insert_resource() further down the function will return an
error for the conflicting operations. Yes, I could have taken a
different route and updated regs[] in advance to adjust/skip unsuitable
regions up front, but I decided to do it on the fly in the loop here, as
I thought doing it in advance would add some overhead/complexity. What
do you think?

So I am afraid this check is necessary here.

For example in my environment the extended regions are:

(XEN) Extended region 0: 0->0x8000000
(XEN) Extended region 1: 0xc000000->0x30000000
(XEN) Extended region 2: 0x40000000->0x47e00000
(XEN) Extended region 3: 0xd0000000->0xe6000000
(XEN) Extended region 4: 0xe7800000->0xec000000
(XEN) Extended region 5: 0xf1200000->0xfd000000
(XEN) Extended region 6: 0x100000000->0x500000000
(XEN) Extended region 7: 0x580000000->0x600000000
(XEN) Extended region 8: 0x680000000->0x700000000
(XEN) Extended region 9: 0x780000000->0x10000000000

*With* the check the holes are:

holes [47e00000 - cfffffff]
holes [e6000000 - e77fffff]
holes [ec000000 - f11fffff]
holes [fd000000 - ffffffff]
holes [500000000 - 57fffffff]
holes [600000000 - 67fffffff]
holes [700000000 - 77fffffff]

And they look correct. You can see that two possible holes, between
extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff),
were skipped as they are located entirely below res->start,
which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).

*Without* the check these two holes won't be skipped and as a result
insert_resource() will fail.
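
The hole calculation with the check can be modelled in userspace
roughly as follows. This is only a sketch of the logic under
discussion: struct region and compute_holes() are names invented for
the example, ends are inclusive, and the regions are assumed to be
sorted and non-overlapping:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Compute the holes between consecutive regions, clamp each hole to
 * [res_start, res_end] and skip holes that end up empty.
 * Returns the number of holes written to "holes", which must have room
 * for at least nr - 1 entries.
 */
size_t compute_holes(const struct region *regs, size_t nr,
		     uint64_t res_start, uint64_t res_end,
		     struct region *holes)
{
	size_t i, count = 0;

	for (i = 1; i < nr; i++) {
		uint64_t start = regs[i - 1].end + 1;
		uint64_t end = regs[i].start - 1;

		/* Regions must be sorted and must not overlap. */
		assert(start <= end + 1);

		/* There is no hole between the regions. */
		if (start == end + 1)
			continue;

		/* Clamp the hole to the resource range. */
		if (start < res_start)
			start = res_start;
		if (end > res_end)
			end = res_end;

		/* The hole lies entirely outside the resource range. */
		if (start >= end + 1)
			continue;

		holes[count].start = start;
		holes[count].end = end;
		count++;
	}

	return count;
}
```

Feeding it the ten extended regions listed above (converted to inclusive
ends) with res_start = 0x40000000 and res_end = 0xffffffffff yields the
seven holes shown, and the two candidate holes below 0x40000000 are
dropped by the final emptiness check.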


**********


I have an idea of how we can simplify the filter logic: we can drop all
the checks here (including the confusing one) in the Arm code and update
the common code a bit:

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 1a1e0d3..ed5b855 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
         struct resource *regs, *tmp_res;
         uint64_t min_gpaddr = -1, max_gpaddr = 0;
         unsigned int i, nr_reg = 0;
-       struct range mhp_range;
         int rc;

         if (!xen_domain())
@@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
                         min_gpaddr = regs[i].start;
         }

-       /* Check whether the resource range is within the hotpluggable range */
-       mhp_range = mhp_get_pluggable_range(true);
-       if (min_gpaddr < mhp_range.start)
-               min_gpaddr = mhp_range.start;
-       if (max_gpaddr > mhp_range.end)
-               max_gpaddr = mhp_range.end;
-
         res->start = min_gpaddr;
         res->end = max_gpaddr;

@@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
                 if (start == (end + 1))
                         continue;

-               /* Check whether the hole range is within the resource range */
-               if (start < res->start || end > res->end) {
-                       if (start < res->start)
-                               start = res->start;
-                       if (end > res->end)
-                               end = res->end;
-
-                       if (start >= (end + 1))
-                               continue;
-               }
-
                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
                 if (!tmp_res) {
                         rc = -ENOMEM;
diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index 1f1d8d8..a5d3ebb 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
         void *vaddr;
         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
         int ret;
+       struct range mhp_range;

         res = kzalloc(sizeof(*res), GFP_KERNEL);
         if (!res)
@@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
         res->name = "Xen scratch";
         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

+       mhp_range = mhp_get_pluggable_range(true);
+
         ret = allocate_resource(target_resource, res,
-                               alloc_pages * PAGE_SIZE, 0, -1,
+                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
         if (ret < 0) {
                 pr_err("Cannot allocate new IOMEM resource\n");

I believe this will work on x86, as arch_get_mappable_range() is not
implemented there, and the default implementation returns exactly what
is being used currently (0, -1).

struct range __weak arch_get_mappable_range(void)
{
     struct range mhp_range = {
         .start = 0UL,
         .end = -1ULL,
     };
     return mhp_range;
}

And this is going to be more generic and clear, what do you think?


>
>> +			if (start < res->start)
>> +				start = res->start;
>> +			if (end > res->end)
>> +				end = res->end;
>> +
>> +			if (start >= (end + 1))
>> +				continue;
>> +		}
>> +
>> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>> +		if (!tmp_res) {
>> +			rc = -ENOMEM;
>> +			goto err;
>> +		}
>> +
>> +		tmp_res->name = "Unavailable space";
>> +		tmp_res->start = start;
>> +		tmp_res->end = end;
> Do we need to set any flags so that the system can reuse the memory in
> the hole, e.g. IORESOURCE_MEM? Or is it not necessary?


I might be wrong, but I don't think it is necessary. I don't see how the
system can reuse memory in the holes, as the Xen resource we are
constructing here will be used exclusively by the unpopulated-alloc
code. I would leave the resource type-less here. Or did I miss something?


>
>
>> +		rc = insert_resource(res, tmp_res);
>> +		if (rc) {
>> +			pr_err("Cannot insert resource [%llx - %llx] %d\n",
>> +					tmp_res->start, tmp_res->end, rc);
> Although it is impossible to enable XEN_UNPOPULATED_ALLOC on arm32 due
> to unmet dependencies, I would like to keep the implementation of
> arch_xen_unpopulated_init 32bit clean.
>
> I am getting build errors like (by forcing arch_xen_unpopulated_init to
> compile on arm32):
>
> ./include/linux/kern_levels.h:5:18: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’ [-Wformat=]

Thank you for pointing this out. I will use the %pR specifier here and
in the common code where I print the same message.
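
For reference, the kernel's %pR takes a pointer to a struct resource and
so avoids the width problem entirely. It is not available outside the
kernel, so the userspace sketch below shows the other common way of
keeping such a message 32-bit clean: casting to unsigned long long
before printing with %llx. The typedef and format_range() are stand-ins
written for this example, with resource_size_t deliberately made 32-bit
to mimic the arm32 build from the warning:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel type; 32-bit here to mimic the arm32 case. */
typedef uint32_t resource_size_t;

struct resource {
	resource_size_t start;
	resource_size_t end;
};

/*
 * Format a range without assuming the width of resource_size_t: the
 * explicit casts make the arguments match %llx on every build.
 * Returns what snprintf() returns (the length of the formatted string).
 */
int format_range(char *buf, size_t len, const struct resource *res)
{
	assert(buf && res);
	return snprintf(buf, len, "[%llx - %llx]",
			(unsigned long long)res->start,
			(unsigned long long)res->end);
}
```

The same source compiles without format warnings whether the typedef is
uint32_t or uint64_t, which is the property asked for in the review.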


>
>
>> +			kfree(tmp_res);
>> +			goto err;
>> +		}
>> +	}
>> +
>> +err:
>> +	kfree(regs);
>> +
>> +	return rc;
>> +}
>> +#endif
>> +
>>   static void __init xen_dt_guest_init(void)
>>   {
>>   	struct device_node *xen_node;
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index 1b2c3ac..e6031fc 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -297,7 +297,7 @@ config XEN_FRONT_PGDIR_SHBUF
>>   
>>   config XEN_UNPOPULATED_ALLOC
>>   	bool "Use unpopulated memory ranges for guest mappings"
>> -	depends on X86 && ZONE_DEVICE
>> +	depends on ZONE_DEVICE
>>   	default XEN_BACKEND || XEN_GNTDEV || XEN_DOM0
>>   	help
>>   	  Use unpopulated memory ranges in order to create mappings for guest

-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
@ 2021-11-10 20:21       ` Oleksandr
  0 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-10 20:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Boris Ostrovsky, Juergen Gross, Julien Grall


On 28.10.21 04:40, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.

> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch implements arch_xen_unpopulated_init() on Arm where
>> the extended regions (if any) are gathered from DT and inserted
>> into passed Xen resource to be used as unused address space
>> for Xen scratch pages by unpopulated-alloc code.
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen by the hypervisor at the domain
>> creation time and advertised to it via "reg" property under
>> hypervisor node in the guest device-tree. As region 0 is reserved
>> for grant table space (always present), the indexes for extended
>> regions are 1...N.
>>
>> If arch_xen_unpopulated_init() fails for some reason the default
>> behaviour will be restored (allocate xenballooned pages).
>>
>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes RFC -> V2:
>>     - new patch, instead of
>>      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>> ---
>>   arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/xen/Kconfig      |   2 +-
>>   2 files changed, 113 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index dea46ec..1a1e0d3 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>   static phys_addr_t xen_grant_frames;
>>   
>>   #define GRANT_TABLE_INDEX   0
>> +#define EXT_REGION_INDEX    1
>>   
>>   uint32_t xen_start_flags;
>>   EXPORT_SYMBOL(xen_start_flags);
>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>   #endif
>>   }
>>   
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>> +int arch_xen_unpopulated_init(struct resource *res)
>> +{
>> +	struct device_node *np;
>> +	struct resource *regs, *tmp_res;
>> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
>> +	unsigned int i, nr_reg = 0;
>> +	struct range mhp_range;
>> +	int rc;
>> +
>> +	if (!xen_domain())
>> +		return -ENODEV;
>> +
>> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
>> +	if (WARN_ON(!np))
>> +		return -ENODEV;
>> +
>> +	/* Skip region 0 which is reserved for grant table space */
>> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>> +		nr_reg++;
>> +	if (!nr_reg) {
>> +		pr_err("No extended regions are found\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>> +	if (!regs)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * Create resource from extended regions provided by the hypervisor to be
>> +	 * used as unused address space for Xen scratch pages.
>> +	 */
>> +	for (i = 0; i < nr_reg; i++) {
>> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
>> +		if (rc)
>> +			goto err;
>> +
>> +		if (max_gpaddr < regs[i].end)
>> +			max_gpaddr = regs[i].end;
>> +		if (min_gpaddr > regs[i].start)
>> +			min_gpaddr = regs[i].start;
>> +	}
>> +
>> +	/* Check whether the resource range is within the hotpluggable range */
>> +	mhp_range = mhp_get_pluggable_range(true);
>> +	if (min_gpaddr < mhp_range.start)
>> +		min_gpaddr = mhp_range.start;
>> +	if (max_gpaddr > mhp_range.end)
>> +		max_gpaddr = mhp_range.end;
>> +
>> +	res->start = min_gpaddr;
>> +	res->end = max_gpaddr;
>> +
>> +	/*
>> +	 * Mark holes between extended regions as unavailable. The rest of that
>> +	 * address space will be available for the allocation.
>> +	 */
>> +	for (i = 1; i < nr_reg; i++) {
>> +		resource_size_t start, end;
>> +
>> +		start = regs[i - 1].end + 1;
>> +		end = regs[i].start - 1;
>> +
>> +		if (start > (end + 1)) {
> Should this be:
>
> if (start >= end)
>
> ?

Yes, we can do this here (since the checks are equivalent) but ...


>
>
>> +			rc = -EINVAL;
>> +			goto err;
>> +		}
>> +
>> +		/* There is no hole between regions */
>> +		if (start == (end + 1))
> Also here, shouldn't it be:
>
> if (start == end)
>
> ?

    ... not here.

As

"(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"

but

"(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"


>
> I think I am missing again something in termination accounting :-)

If I understand correctly, we need to follow the "end = start + size - 1"
rule, so "end" is the last address inside a range, not the first address
outside of it.
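
To make the inclusive-end arithmetic concrete, here is a minimal
user-space sketch (not the kernel code itself). The names incl_range,
gap_kind and classify_gap() are hypothetical and exist only for this
illustration. It assumes the two regions are sorted by address and that
neither computation wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range: 'end' is the last address inside the range. */
struct incl_range {
	uint64_t start;
	uint64_t end;
};

enum gap_kind { GAP_OVERLAP, GAP_NONE, GAP_HOLE };

/*
 * Classify the space between two regions sorted by address.
 * On GAP_HOLE, *hole receives the inclusive hole range.
 */
enum gap_kind classify_gap(struct incl_range prev, struct incl_range next,
			   struct incl_range *hole)
{
	uint64_t start, end;

	assert(prev.start <= prev.end && next.start <= next.end);

	start = prev.end + 1;	/* first address after prev */
	end = next.start - 1;	/* last address before next */

	if (start > end + 1)	/* next begins inside prev */
		return GAP_OVERLAP;
	if (start == end + 1)	/* prev.end + 1 == next.start */
		return GAP_NONE;

	hole->start = start;
	hole->end = end;
	return GAP_HOLE;
}
```

In this model a "start >= end" test would also match the adjacent case
(start == end + 1), which is why the sketch keeps the "end + 1" form in
both comparisons.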


>
>
>> +			continue;
>> +
>> +		/* Check whether the hole range is within the resource range */
>> +		if (start < res->start || end > res->end) {
> By definition I don't think this check is necessary as either condition
> is impossible?


This is a good question, let me explain.
Not all extended regions provided by the hypervisor can be used here.
This is because the physical address range for which the linear mapping
can be created is limited on Arm, and the maximum addressable range
depends on the VA space size (CONFIG_ARM64_VA_BITS_XXX). We decided not
to filter the regions in the hypervisor, as that logic could become
quite complex (different OSes may have different requirements, etc.).
This means that we need to make sure that the regions are within the
hotpluggable range, to avoid a failure later on when a region is
pre-validated by the memory hotplug path.

The following code limits the resource range based on that:

+    /* Check whether the resource range is within the hotpluggable range */
+    mhp_range = mhp_get_pluggable_range(true);
+    if (min_gpaddr < mhp_range.start)
+        min_gpaddr = mhp_range.start;
+    if (max_gpaddr > mhp_range.end)
+        max_gpaddr = mhp_range.end;
+
+    res->start = min_gpaddr;
+    res->end = max_gpaddr;

In the current loop (when calculating and inserting holes) we also need
to make sure that the resulting hole range is within the resource range
(and adjust or skip it if it is not), because regs[] used for the
calculations contains the raw regions as they are described in DT, so it
is not updated. Otherwise insert_resource() further down the function
will return an error for the conflicting operations. Yes, I could have
taken a different route and updated regs[] in advance to adjust or skip
unsuitable regions up front, but I decided to do it on the fly in the
loop here, as I thought doing it in advance would add some
overhead/complexity. What do you think?

So I am afraid this check is necessary here.

For example in my environment the extended regions are:

(XEN) Extended region 0: 0->0x8000000
(XEN) Extended region 1: 0xc000000->0x30000000
(XEN) Extended region 2: 0x40000000->0x47e00000
(XEN) Extended region 3: 0xd0000000->0xe6000000
(XEN) Extended region 4: 0xe7800000->0xec000000
(XEN) Extended region 5: 0xf1200000->0xfd000000
(XEN) Extended region 6: 0x100000000->0x500000000
(XEN) Extended region 7: 0x580000000->0x600000000
(XEN) Extended region 8: 0x680000000->0x700000000
(XEN) Extended region 9: 0x780000000->0x10000000000

*With* the check the holes are:

holes [47e00000 - cfffffff]
holes [e6000000 - e77fffff]
holes [ec000000 - f11fffff]
holes [fd000000 - ffffffff]
holes [500000000 - 57fffffff]
holes [600000000 - 67fffffff]
holes [700000000 - 77fffffff]

And they look correct: you can see that the two possible holes between
extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
skipped, as they are located entirely below res->start, which is
0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).

*Without* the check these two holes won't be skipped and, as a result,
insert_resource() will fail.
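
The hole list above can be reproduced with a small user-space sketch
(an illustration only, not the patch code). The names incl_range,
clamp_holes(), example_regs and example_res are hypothetical. The sketch
assumes that the "a->b" values printed by Xen have an exclusive end,
which is consistent with the printed holes, and that the regions are
sorted and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct incl_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* The ten extended regions from the log, converted to inclusive ends. */
const struct incl_range example_regs[] = {
	{ 0x0,         0x7ffffff },
	{ 0xc000000,   0x2fffffff },
	{ 0x40000000,  0x47dfffff },
	{ 0xd0000000,  0xe5ffffff },
	{ 0xe7800000,  0xebffffff },
	{ 0xf1200000,  0xfcffffff },
	{ 0x100000000, 0x4ffffffff },
	{ 0x580000000, 0x5ffffffff },
	{ 0x680000000, 0x6ffffffff },
	{ 0x780000000, 0xffffffffff },
};

/*
 * Span of all regions [0, 0xffffffffff] limited to the hotpluggable
 * range [0x40000000, 0x80003fffffff] quoted for the 48-bit VA case.
 */
const struct incl_range example_res = { 0x40000000, 0xffffffffff };

/*
 * Compute the holes between regions, clamped to 'res'. Holes that end up
 * entirely outside 'res' are skipped. Returns the number of holes written
 * to 'holes' (at most nr - 1 entries).
 */
size_t clamp_holes(const struct incl_range *regs, size_t nr,
		   struct incl_range res, struct incl_range *holes)
{
	size_t i, n = 0;

	for (i = 1; i < nr; i++) {
		uint64_t start = regs[i - 1].end + 1;
		uint64_t end = regs[i].start - 1;

		assert(start <= end + 1);	/* regions must not overlap */
		if (start == end + 1)		/* adjacent regions, no hole */
			continue;

		if (start < res.start)
			start = res.start;
		if (end > res.end)
			end = res.end;
		if (start >= end + 1)		/* nothing left inside res */
			continue;

		holes[n].start = start;
		holes[n].end = end;
		n++;
	}
	return n;
}
```

Calling clamp_holes(example_regs, 10, example_res, holes) in this model
yields the seven holes listed above. The two candidates below 0x40000000
collapse to empty ranges after clamping and are dropped.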


**********


I have an idea how we can simplify the filter logic: we can drop all
these checks (including the confusing one) in the Arm code and update
the common code a bit:

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 1a1e0d3..ed5b855 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
         struct resource *regs, *tmp_res;
         uint64_t min_gpaddr = -1, max_gpaddr = 0;
         unsigned int i, nr_reg = 0;
-       struct range mhp_range;
         int rc;

         if (!xen_domain())
@@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
                         min_gpaddr = regs[i].start;
         }

-       /* Check whether the resource range is within the hotpluggable range */
-       mhp_range = mhp_get_pluggable_range(true);
-       if (min_gpaddr < mhp_range.start)
-               min_gpaddr = mhp_range.start;
-       if (max_gpaddr > mhp_range.end)
-               max_gpaddr = mhp_range.end;
-
         res->start = min_gpaddr;
         res->end = max_gpaddr;

@@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
                 if (start == (end + 1))
                         continue;

-               /* Check whether the hole range is within the resource range */
-               if (start < res->start || end > res->end) {
-                       if (start < res->start)
-                               start = res->start;
-                       if (end > res->end)
-                               end = res->end;
-
-                       if (start >= (end + 1))
-                               continue;
-               }
-
                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
                 if (!tmp_res) {
                         rc = -ENOMEM;
diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index 1f1d8d8..a5d3ebb 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
         void *vaddr;
         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
         int ret;
+       struct range mhp_range;

         res = kzalloc(sizeof(*res), GFP_KERNEL);
         if (!res)
@@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
         res->name = "Xen scratch";
         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

+       mhp_range = mhp_get_pluggable_range(true);
+
         ret = allocate_resource(target_resource, res,
-                               alloc_pages * PAGE_SIZE, 0, -1,
+                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
         if (ret < 0) {
                 pr_err("Cannot allocate new IOMEM resource\n");

I believe this will work on x86, as arch_get_mappable_range() is not
implemented there and the default implementation returns exactly what
is being used currently (0, -1).

struct range __weak arch_get_mappable_range(void)
{
     struct range mhp_range = {
         .start = 0UL,
         .end = -1ULL,
     };
     return mhp_range;
}

And this is going to be more generic and clear, what do you think?
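
A minimal user-space sketch of why the generic fallback is a no-op for
the allocation window. It models only the weak default quoted above; any
further limiting done inside mhp_get_pluggable_range() is not modeled.
The names default_mappable_range() and limit_window() are hypothetical,
and struct range is redefined locally for the illustration.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Models the generic fallback quoted above: the whole address space. */
struct range default_mappable_range(void)
{
	/* (uint64_t)-1 is all ones, i.e. the same value as "-1ULL" */
	struct range r = { .start = 0, .end = (uint64_t)-1 };

	return r;
}

/* Intersect an allocation window [start, end] with the pluggable range. */
struct range limit_window(struct range window, struct range pluggable)
{
	assert(pluggable.start <= pluggable.end);

	if (window.start < pluggable.start)
		window.start = pluggable.start;
	if (window.end > pluggable.end)
		window.end = pluggable.end;
	return window;
}
```

With the default range the (0, -1) window stays untouched, while with a
limited range such as 0x40000000 - 0x80003fffffff the window shrinks to
exactly that range.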


>
>> +			if (start < res->start)
>> +				start = res->start;
>> +			if (end > res->end)
>> +				end = res->end;
>> +
>> +			if (start >= (end + 1))
>> +				continue;
>> +		}
>> +
>> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>> +		if (!tmp_res) {
>> +			rc = -ENOMEM;
>> +			goto err;
>> +		}
>> +
>> +		tmp_res->name = "Unavailable space";
>> +		tmp_res->start = start;
>> +		tmp_res->end = end;
> Do we need to set any flags so that the system can reuse the memory in
> the hole, e.g. IORESOURCE_MEM? Or is it not necessary?


I might be wrong, but I don't think it is necessary. I don't see how the
system could reuse memory in the holes, as the Xen resource we are
constructing here will be used exclusively by the unpopulated-alloc
code. I would leave a type-less resource here. Or have I missed
something?


>
>
>> +		rc = insert_resource(res, tmp_res);
>> +		if (rc) {
>> +			pr_err("Cannot insert resource [%llx - %llx] %d\n",
>> +					tmp_res->start, tmp_res->end, rc);
> Although it is impossible to enable XEN_UNPOPULATED_ALLOC on arm32 due
> to unmet dependencies, I would like to keep the implementation of
> arch_xen_unpopulated_init 32bit clean.
>
> I am getting build errors like (by forcing arch_xen_unpopulated_init to
> compile on arm32):
>
> ./include/linux/kern_levels.h:5:18: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’ [-Wformat=]

Thank you for pointing this out. I will use the %pR specifier here and
in the common code where I print the same message.


>
>
>> +			kfree(tmp_res);
>> +			goto err;
>> +		}
>> +	}
>> +
>> +err:
>> +	kfree(regs);
>> +
>> +	return rc;
>> +}
>> +#endif
>> +
>>   static void __init xen_dt_guest_init(void)
>>   {
>>   	struct device_node *xen_node;
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index 1b2c3ac..e6031fc 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -297,7 +297,7 @@ config XEN_FRONT_PGDIR_SHBUF
>>   
>>   config XEN_UNPOPULATED_ALLOC
>>   	bool "Use unpopulated memory ranges for guest mappings"
>> -	depends on X86 && ZONE_DEVICE
>> +	depends on ZONE_DEVICE
>>   	default XEN_BACKEND || XEN_GNTDEV || XEN_DOM0
>>   	help
>>   	  Use unpopulated memory ranges in order to create mappings for guest

-- 
Regards,

Oleksandr Tyshchenko


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
  2021-10-28  1:28     ` Stefano Stabellini
@ 2021-11-10 22:14       ` Oleksandr
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-10 22:14 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Julien Grall


On 28.10.21 04:28, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.

> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Read the start address of the grant table space from DT
>> (region 0).
>>
>> This patch mostly restores behaviour before commit 3cf4095d7446
>> ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table")
>> but trying not to break the ACPI support added after that commit.
>> So the patch touches DT part only and leaves the ACPI part with
>> xen_xlate_map_ballooned_pages().
>>
>> This is a preparation for using Xen extended region feature
>> where unused regions of guest physical address space (provided
>> by the hypervisor) will be used to create grant/foreign/whatever
>> mappings instead of wasting real RAM pages from the domain memory
>> for establishing these mappings.
>>
>> The immediate benefit of this change:
>> - Avoid superpage shattering in Xen P2M when establishing
>>    stage-2 mapping (GFN <-> MFN) for the grant table space
>> - Avoid wasting real RAM pages (reducing the amount of memory
>>    usable) for mapping grant table space
>> - The grant table space is always mapped at the exact
>>    same place (region 0 is reserved for the grant table)
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes RFC -> V2:
>>     - new patch
>> ---
>>   arch/arm/xen/enlighten.c | 32 +++++++++++++++++++++++++-------
>>   1 file changed, 25 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index 7f1c106b..dea46ec 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -59,6 +59,9 @@ unsigned long xen_released_pages;
>>   struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
>>   
>>   static __read_mostly unsigned int xen_events_irq;
>> +static phys_addr_t xen_grant_frames;
> __read_mostly

ok


>
>
>> +#define GRANT_TABLE_INDEX   0
>>   
>>   uint32_t xen_start_flags;
>>   EXPORT_SYMBOL(xen_start_flags);
>> @@ -303,6 +306,7 @@ static void __init xen_acpi_guest_init(void)
>>   static void __init xen_dt_guest_init(void)
>>   {
>>   	struct device_node *xen_node;
>> +	struct resource res;
>>   
>>   	xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
>>   	if (!xen_node) {
>> @@ -310,6 +314,12 @@ static void __init xen_dt_guest_init(void)
>>   		return;
>>   	}
>>   
>> +	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
>> +		pr_err("Xen grant table region is not found\n");
>> +		return;
>> +	}
>> +	xen_grant_frames = res.start;
>> +
>>   	xen_events_irq = irq_of_parse_and_map(xen_node, 0);
>>   }
>>   
>> @@ -317,16 +327,20 @@ static int __init xen_guest_init(void)
>>   {
>>   	struct xen_add_to_physmap xatp;
>>   	struct shared_info *shared_info_page = NULL;
>> -	int cpu;
>> +	int rc, cpu;
>>   
>>   	if (!xen_domain())
>>   		return 0;
>>   
>>   	if (!acpi_disabled)
>>   		xen_acpi_guest_init();
>> -	else
>> +	else {
>>   		xen_dt_guest_init();
>>   
>> +		if (!xen_grant_frames)
>> +			return -ENODEV;
> maybe we can avoid this, see below
>
>
>> +	}
>> +
>>   	if (!xen_events_irq) {
>>   		pr_err("Xen event channel interrupt not found\n");
>>   		return -ENODEV;
>> @@ -370,12 +384,16 @@ static int __init xen_guest_init(void)
>>   	for_each_possible_cpu(cpu)
>>   		per_cpu(xen_vcpu_id, cpu) = cpu;
>>   
>> -	xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
>> -	if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
>> -					  &xen_auto_xlat_grant_frames.vaddr,
>> -					  xen_auto_xlat_grant_frames.count)) {
>> +	if (!acpi_disabled) {
> To make the code more resilient couldn't we do:
>
> if (!acpi_disabled || !xen_grant_frames) {
I think, we can.

On the one hand, the code is indeed more resilient and the change is
smaller.
On the other hand, if the grant table region is not found then something
weird happened, as region 0 is always present in the reg property if the
hypervisor node is exposed to the guest.
The behavior before commit 3cf4095d7446 ("arm/xen: Use
xen_xlate_map_ballooned_pages to setup grant table") was exactly the
same with respect to the failure if the region wasn't found.

...

Well, if we want to make the code more resilient, I will update it. But
it looks like we also need to reorder the actions in xen_dt_guest_init()
so that xen_events_irq is processed before xen_grant_frames. Otherwise
we may return after failing to find the region and end up not
initializing xen_events_irq, so xen_guest_init() will fail before it
reaches that check.
What do you think?
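
The ordering concern can be illustrated with a small user-space model
(a sketch under stated assumptions, not the actual kernel change). The
types dt_node_model and guest_state and both functions are hypothetical
stand-ins for the DT node, the two static variables and the suggested
"!acpi_disabled || !xen_grant_frames" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for what the "xen,xen" node provides. */
struct dt_node_model {
	bool has_region0;	/* is "reg" region 0 present? */
	uint64_t region0_start;
	unsigned int irq;
};

/* Stand-in for xen_events_irq and xen_grant_frames. */
struct guest_state {
	unsigned int events_irq;
	uint64_t grant_frames;
};

/*
 * The IRQ is handled first, so an early return on a missing region 0
 * no longer leaves events_irq unset.
 */
void dt_guest_init_model(const struct dt_node_model *node,
			 struct guest_state *st)
{
	assert(node && st);

	st->events_irq = node->irq;

	if (!node->has_region0)
		return;		/* grant_frames stays 0 */
	st->grant_frames = node->region0_start;
}

/* True when the grant table has to be mapped with ballooned pages. */
bool use_ballooned_pages(bool acpi_disabled, const struct guest_state *st)
{
	return !acpi_disabled || !st->grant_frames;
}
```

In this model a DT guest without region 0 still gets its event channel
IRQ and simply falls back to the ballooned-pages path, instead of
failing on the IRQ check first.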


>
>> +		xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
>> +		rc = xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
>> +										   &xen_auto_xlat_grant_frames.vaddr,
>> +										   xen_auto_xlat_grant_frames.count);
>> +	} else
>> +		rc = gnttab_setup_auto_xlat_frames(xen_grant_frames);
>> +	if (rc) {
>>   		free_percpu(xen_vcpu_info);
>> -		return -ENOMEM;
>> +		return rc;
>>   	}
>>   	gnttab_init();

-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
  2021-11-10 22:14       ` Oleksandr
@ 2021-11-19  0:32         ` Stefano Stabellini
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-19  0:32 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-arm-kernel, linux-kernel,
	Oleksandr Tyshchenko, Russell King, Julien Grall

On Thu, 11 Nov 2021, Oleksandr wrote:
> On 28.10.21 04:28, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> I am sorry for the late response.
> 
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > Read the start address of the grant table space from DT
> > > (region 0).
> > > 
> > > This patch mostly restores behaviour before commit 3cf4095d7446
> > > ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table")
> > > but trying not to break the ACPI support added after that commit.
> > > So the patch touches DT part only and leaves the ACPI part with
> > > xen_xlate_map_ballooned_pages().
> > > 
> > > This is a preparation for using Xen extended region feature
> > > where unused regions of guest physical address space (provided
> > > by the hypervisor) will be used to create grant/foreign/whatever
> > > mappings instead of wasting real RAM pages from the domain memory
> > > for establishing these mappings.
> > > 
> > > The immediate benefit of this change:
> > > - Avoid superpage shattering in Xen P2M when establishing
> > >    stage-2 mapping (GFN <-> MFN) for the grant table space
> > > - Avoid wasting real RAM pages (reducing the amount of memory
> > >    usable) for mapping grant table space
> > > - The grant table space is always mapped at the exact
> > >    same place (region 0 is reserved for the grant table)
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > Changes RFC -> V2:
> > >     - new patch
> > > ---
> > >   arch/arm/xen/enlighten.c | 32 +++++++++++++++++++++++++-------
> > >   1 file changed, 25 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index 7f1c106b..dea46ec 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -59,6 +59,9 @@ unsigned long xen_released_pages;
> > >   struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS]
> > > __initdata;
> > >     static __read_mostly unsigned int xen_events_irq;
> > > +static phys_addr_t xen_grant_frames;
> > __read_mostly
> 
> ok
> 
> 
> > 
> > 
> > > +#define GRANT_TABLE_INDEX   0
> > >     uint32_t xen_start_flags;
> > >   EXPORT_SYMBOL(xen_start_flags);
> > > @@ -303,6 +306,7 @@ static void __init xen_acpi_guest_init(void)
> > >   static void __init xen_dt_guest_init(void)
> > >   {
> > >   	struct device_node *xen_node;
> > > +	struct resource res;
> > >     	xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
> > >   	if (!xen_node) {
> > > @@ -310,6 +314,12 @@ static void __init xen_dt_guest_init(void)
> > >   		return;
> > >   	}
> > >   +	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
> > > +		pr_err("Xen grant table region is not found\n");
> > > +		return;
> > > +	}
> > > +	xen_grant_frames = res.start;
> > > +
> > >   	xen_events_irq = irq_of_parse_and_map(xen_node, 0);
> > >   }
> > >   @@ -317,16 +327,20 @@ static int __init xen_guest_init(void)
> > >   {
> > >   	struct xen_add_to_physmap xatp;
> > >   	struct shared_info *shared_info_page = NULL;
> > > -	int cpu;
> > > +	int rc, cpu;
> > >     	if (!xen_domain())
> > >   		return 0;
> > >     	if (!acpi_disabled)
> > >   		xen_acpi_guest_init();
> > > -	else
> > > +	else {
> > >   		xen_dt_guest_init();
> > >   +		if (!xen_grant_frames)
> > > +			return -ENODEV;
> > maybe we can avoid this, see below
> > 
> > 
> > > +	}
> > > +
> > >   	if (!xen_events_irq) {
> > >   		pr_err("Xen event channel interrupt not found\n");
> > >   		return -ENODEV;
> > > @@ -370,12 +384,16 @@ static int __init xen_guest_init(void)
> > >   	for_each_possible_cpu(cpu)
> > >   		per_cpu(xen_vcpu_id, cpu) = cpu;
> > >   -	xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
> > > -	if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
> > > -					  &xen_auto_xlat_grant_frames.vaddr,
> > > -					  xen_auto_xlat_grant_frames.count)) {
> > > +	if (!acpi_disabled) {
> > To make the code more resilient couldn't we do:
> > 
> > if (!acpi_disabled || !xen_grant_frames) {
> I think, we can.
> 
> On the one hand, indeed the code more resilient and less change.
> From the other hand if grant table region is not found then something weird
> happened as region 0 is always present in reg property if hypervisor node is
> exposed to the guest.
> The behavior before commit 3cf4095d7446 ("arm/xen: Use
> xen_xlate_map_ballooned_pages to setup grant table") was exactly the same in
> the context of the failure if region wasn't found.
> 
> ...
> 
> Well, if we want to make code more resilient, I will update. But, looks like
> we also need to switch actions in xen_dt_guest_init() in order to process
> xen_events_irq before xen_grant_frames, otherwise we may return after failing
> with region and end up not initializing xen_events_irq so xen_guest_init()
> will fail earlier than reaches that check.
> What do you think?
 
Yes, you are right. I was re-reading the patch to refresh my memory and
I noticed immediately that xen_dt_guest_init also needs to be changed so
that xen_events_irq is set before xen_grant_frames.
 
I think it is a minor change that doesn't add complexity but makes the
code more robust, so I think it is a good idea.

 
> > > +		xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
> > > +		rc =
> > > xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
> > > +
> > > &xen_auto_xlat_grant_frames.vaddr,
> > > +
> > > xen_auto_xlat_grant_frames.count);
> > > +	} else
> > > +		rc = gnttab_setup_auto_xlat_frames(xen_grant_frames);
> > > +	if (rc) {
> > >   		free_percpu(xen_vcpu_info);
> > > -		return -ENOMEM;
> > > +		return rc;
> > >   	}
> > >   	gnttab_init();


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
@ 2021-11-19  0:32         ` Stefano Stabellini
  0 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-19  0:32 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-arm-kernel, linux-kernel,
	Oleksandr Tyshchenko, Russell King, Julien Grall

On Thu, 11 Nov 2021, Oleksandr wrote:
> On 28.10.21 04:28, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> I am sorry for the late response.
> 
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > Read the start address of the grant table space from DT
> > > (region 0).
> > > 
> > > This patch mostly restores behaviour before commit 3cf4095d7446
> > > ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table")
> > > but trying not to break the ACPI support added after that commit.
> > > So the patch touches DT part only and leaves the ACPI part with
> > > xen_xlate_map_ballooned_pages().
> > > 
> > > This is a preparation for using Xen extended region feature
> > > where unused regions of guest physical address space (provided
> > > by the hypervisor) will be used to create grant/foreign/whatever
> > > mappings instead of wasting real RAM pages from the domain memory
> > > for establishing these mappings.
> > > 
> > > The immediate benefit of this change:
> > > - Avoid superpage shattering in Xen P2M when establishing
> > >    stage-2 mapping (GFN <-> MFN) for the grant table space
> > > - Avoid wasting real RAM pages (reducing the amount of memory
> > >    usuable) for mapping grant table space
> > > - The grant table space is always mapped at the exact
> > >    same place (region 0 is reserved for the grant table)
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > Changes RFC -> V2:
> > >     - new patch
> > > ---
> > >   arch/arm/xen/enlighten.c | 32 +++++++++++++++++++++++++-------
> > >   1 file changed, 25 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index 7f1c106b..dea46ec 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -59,6 +59,9 @@ unsigned long xen_released_pages;
> > > >   struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
> > >     static __read_mostly unsigned int xen_events_irq;
> > > +static phys_addr_t xen_grant_frames;
> > __read_mostly
> 
> ok
> 
> 
> > 
> > 
> > > +#define GRANT_TABLE_INDEX   0
> > >     uint32_t xen_start_flags;
> > >   EXPORT_SYMBOL(xen_start_flags);
> > > @@ -303,6 +306,7 @@ static void __init xen_acpi_guest_init(void)
> > >   static void __init xen_dt_guest_init(void)
> > >   {
> > >   	struct device_node *xen_node;
> > > +	struct resource res;
> > >     	xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
> > >   	if (!xen_node) {
> > > @@ -310,6 +314,12 @@ static void __init xen_dt_guest_init(void)
> > >   		return;
> > >   	}
> > >   +	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
> > > +		pr_err("Xen grant table region is not found\n");
> > > +		return;
> > > +	}
> > > +	xen_grant_frames = res.start;
> > > +
> > >   	xen_events_irq = irq_of_parse_and_map(xen_node, 0);
> > >   }
> > >   @@ -317,16 +327,20 @@ static int __init xen_guest_init(void)
> > >   {
> > >   	struct xen_add_to_physmap xatp;
> > >   	struct shared_info *shared_info_page = NULL;
> > > -	int cpu;
> > > +	int rc, cpu;
> > >     	if (!xen_domain())
> > >   		return 0;
> > >     	if (!acpi_disabled)
> > >   		xen_acpi_guest_init();
> > > -	else
> > > +	else {
> > >   		xen_dt_guest_init();
> > >   +		if (!xen_grant_frames)
> > > +			return -ENODEV;
> > maybe we can avoid this, see below
> > 
> > 
> > > +	}
> > > +
> > >   	if (!xen_events_irq) {
> > >   		pr_err("Xen event channel interrupt not found\n");
> > >   		return -ENODEV;
> > > @@ -370,12 +384,16 @@ static int __init xen_guest_init(void)
> > >   	for_each_possible_cpu(cpu)
> > >   		per_cpu(xen_vcpu_id, cpu) = cpu;
> > >   -	xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
> > > -	if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
> > > -					  &xen_auto_xlat_grant_frames.vaddr,
> > > -					  xen_auto_xlat_grant_frames.count)) {
> > > +	if (!acpi_disabled) {
> > To make the code more resilient couldn't we do:
> > 
> > if (!acpi_disabled || !xen_grant_frames) {
> I think, we can.
> 
> On the one hand, the code would indeed be more resilient, with less change.
> On the other hand, if the grant table region is not found then something
> weird has happened, as region 0 is always present in the "reg" property if
> the hypervisor node is exposed to the guest.
> The behavior before commit 3cf4095d7446 ("arm/xen: Use
> xen_xlate_map_ballooned_pages to setup grant table") was exactly the same:
> it failed if the region wasn't found.
> 
> ...
> 
> Well, if we want to make the code more resilient, I will update it. But it
> looks like we also need to reorder xen_dt_guest_init() so that it processes
> xen_events_irq before xen_grant_frames. Otherwise we may return early when
> the region lookup fails, leaving xen_events_irq uninitialized, and
> xen_guest_init() will then fail on the IRQ check before it reaches the
> grant table check.
> What do you think?
 
Yes, you are right. I was re-reading the patch to refresh my memory and
I noticed immediately that xen_dt_guest_init() also needs to be changed
so that xen_events_irq is set before xen_grant_frames.
 
I think it is a minor change that doesn't add complexity but makes the
code more robust, so it is a good idea.
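
To make the ordering concrete, here is a minimal user-space sketch (not
kernel code) of what is being agreed on. The names fake_dt, guest_state,
dt_guest_init() and pick_gnttab_path() are made up for illustration; only
the "IRQ before region 0" ordering and the
"!acpi_disabled || !xen_grant_frames" condition come from this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for what the guest finds under the hypervisor node. */
struct fake_dt {
	bool has_irq;		/* event channel interrupt present */
	bool has_region0;	/* "reg" index 0 (grant table space) present */
	unsigned int irq;
	uint64_t region0_start;
};

/* Hypothetical mirror of xen_events_irq / xen_grant_frames; 0 = not found. */
struct guest_state {
	unsigned int events_irq;
	uint64_t grant_frames;
};

enum gnttab_path { GNTTAB_BALLOONED, GNTTAB_DT_REGION };

/* Read the IRQ first, so a missing region 0 cannot leave it unset. */
static void dt_guest_init(const struct fake_dt *dt, struct guest_state *st)
{
	assert(dt && st);

	st->events_irq = dt->has_irq ? dt->irq : 0;

	if (!dt->has_region0)
		return;		/* grant_frames stays 0, caller falls back */
	st->grant_frames = dt->region0_start;
}

/* ACPI boot, or DT boot without region 0: use ballooned pages. */
static enum gnttab_path pick_gnttab_path(bool acpi_disabled,
					 const struct guest_state *st)
{
	if (!acpi_disabled || !st->grant_frames)
		return GNTTAB_BALLOONED;
	return GNTTAB_DT_REGION;
}
```

With has_irq set and has_region0 clear, events_irq is still filled in and
the DT path degrades to the ballooned one instead of failing the whole init.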

 
> > > +		xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
> > > +		rc = xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
> > > +						   &xen_auto_xlat_grant_frames.vaddr,
> > > +						   xen_auto_xlat_grant_frames.count);
> > > +	} else
> > > +		rc = gnttab_setup_auto_xlat_frames(xen_grant_frames);
> > > +	if (rc) {
> > >   		free_percpu(xen_vcpu_info);
> > > -		return -ENOMEM;
> > > +		return rc;
> > >   	}
> > >   	gnttab_init();


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-09 18:34     ` Oleksandr
@ 2021-11-19  0:59       ` Stefano Stabellini
  2021-11-19 18:18         ` Oleksandr
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-19  0:59 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-kernel,
	Oleksandr Tyshchenko, Boris Ostrovsky, Juergen Gross,
	Julien Grall

On Tue, 9 Nov 2021, Oleksandr wrote:
> On 28.10.21 19:37, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> I am sorry for the late response.
> 
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > The main reason of this change is that unpopulated-alloc
> > > code cannot be used in its current form on Arm, but there
> > > is a desire to reuse it to avoid wasting real RAM pages
> > > for the grant/foreign mappings.
> > > 
> > > The problem is that system "iomem_resource" is used for
> > > the address space allocation, but the really unallocated
> > > space can't be figured out precisely by the domain on Arm
> > > without hypervisor involvement. For example, not all device
> > > I/O regions are known by the time domain starts creating
> > > > grant/foreign mappings. And following the advice from
> > > > "iomem_resource" we might end up reusing these regions by
> > > > mistake. So, the hypervisor which maintains the P2M for
> > > the domain is in the best position to provide unused regions
> > > of guest physical address space which could be safely used
> > > to create grant/foreign mappings.
> > > 
> > > > Introduce a new helper arch_xen_unpopulated_init() whose purpose
> > > > is to create a specific Xen resource based on the memory regions
> > > provided by the hypervisor to be used as unused space for Xen
> > > scratch pages.
> > > 
> > > If arch doesn't implement arch_xen_unpopulated_init() to
> > > initialize Xen resource the default "iomem_resource" will be used.
> > > So the behavior on x86 won't be changed.
> > > 
> > > Also fall back to allocate xenballooned pages (steal real RAM
> > > pages) if we do not have any suitable resource to work with and
> > > as the result we won't be able to provide unpopulated pages.
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > Changes RFC -> V2:
> > >     - new patch, instead of
> > >      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
> > > unallocated space"
> > > ---
> > >   drivers/xen/unpopulated-alloc.c | 89
> > > +++++++++++++++++++++++++++++++++++++++--
> > >   include/xen/xen.h               |  2 +
> > >   2 files changed, 88 insertions(+), 3 deletions(-)
> > > 
> > > > diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> > > index a03dc5b..1f1d8d8 100644
> > > --- a/drivers/xen/unpopulated-alloc.c
> > > +++ b/drivers/xen/unpopulated-alloc.c
> > > @@ -8,6 +8,7 @@
> > >     #include <asm/page.h>
> > >   +#include <xen/balloon.h>
> > >   #include <xen/page.h>
> > >   #include <xen/xen.h>
> > >   @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
> > >   static struct page *page_list;
> > >   static unsigned int list_count;
> > >   +static struct resource *target_resource;
> > > +static struct resource xen_resource = {
> > > +	.name = "Xen unused space",
> > > +};
> > > +
> > > +/*
> > > + * If arch is not happy with system "iomem_resource" being used for
> > > > + * the region allocation it can provide its own view by initializing
> > > + * "xen_resource" with unused regions of guest physical address space
> > > + * provided by the hypervisor.
> > > + */
> > > +int __weak arch_xen_unpopulated_init(struct resource *res)
> > > +{
> > > +	return -ENOSYS;
> > > +}
> > > +
> > >   static int fill_list(unsigned int nr_pages)
> > >   {
> > >   	struct dev_pagemap *pgmap;
> > > -	struct resource *res;
> > > +	struct resource *res, *tmp_res = NULL;
> > >   	void *vaddr;
> > >   	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
> > > -	int ret = -ENOMEM;
> > > +	int ret;
> > >     	res = kzalloc(sizeof(*res), GFP_KERNEL);
> > >   	if (!res)
> > > @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
> > >   	res->name = "Xen scratch";
> > >   	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > >   -	ret = allocate_resource(&iomem_resource, res,
> > > +	ret = allocate_resource(target_resource, res,
> > >   				alloc_pages * PAGE_SIZE, 0, -1,
> > >   				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
> > >   	if (ret < 0) {
> > > @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
> > >   		goto err_resource;
> > >   	}
> > >   +	/*
> > > +	 * Reserve the region previously allocated from Xen resource to avoid
> > > +	 * re-using it by someone else.
> > > +	 */
> > > +	if (target_resource != &iomem_resource) {
> > > +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
> > > +		if (!res) {
> > > +			ret = -ENOMEM;
> > > +			goto err_insert;
> > > +		}
> > > +
> > > +		tmp_res->name = res->name;
> > > +		tmp_res->start = res->start;
> > > +		tmp_res->end = res->end;
> > > +		tmp_res->flags = res->flags;
> > > +
> > > +		ret = insert_resource(&iomem_resource, tmp_res);
> > > +		if (ret < 0) {
> > > +			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
> > > +			       tmp_res->start, tmp_res->end);
> > > +			kfree(tmp_res);
> > > +			goto err_insert;
> > > +		}
> > > +	}
> > I am a bit confused.. why do we need to do this? Who could be
> > erroneously re-using the region? Are you saying that the next time
> > allocate_resource is called it could find the same region again? It
> > doesn't seem possible?
> 
> 
> No, as I understand it, allocate_resource() called for the same root
> resource won't provide the same region... We only need to do this (insert the
> region into "iomem_resource") if we allocated it from our *internal*
> "xen_resource", as the *global* "iomem_resource" (which is used everywhere) is
> not aware that the region has already been allocated. So by inserting the
> region here we reserve it; otherwise it could be reused elsewhere.

But elsewhere where?

Let's say that allocate_resource allocates a range from xen_resource.
From reading the code, it doesn't look like iomem_resource would have
that range because the extended regions described under /hypervisor are
not added automatically to iomem_resource.

So what if we don't call insert_resource? Nothing could allocate the
same range because iomem_resource doesn't have it at all and
xen_resource is not used anywhere but here.

What am I missing?


Or maybe it is the other way around: core Linux code assumes everything
is described in iomem_resource so something under kernel/ or mm/ would
crash if we start using a page pointing to an address missing from
iomem_resource?
 
 
> > >   	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
> > >   	if (!pgmap) {
> > >   		ret = -ENOMEM;
> > > @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
> > >   err_memremap:
> > >   	kfree(pgmap);
> > >   err_pgmap:
> > > +	if (tmp_res) {
> > > +		release_resource(tmp_res);
> > > +		kfree(tmp_res);
> > > +	}
> > > +err_insert:
> > >   	release_resource(res);
> > >   err_resource:
> > >   	kfree(res);
> > >   	return ret;
> > >   }
> > >   +static void unpopulated_init(void)
> > > +{
> > > +	static bool inited = false;
> > initialized = false
> 
> ok.
> 
> 
> > 
> > 
> > > +	int ret;
> > > +
> > > +	if (inited)
> > > +		return;
> > > +
> > > +	/*
> > > +	 * Try to initialize Xen resource the first and fall back to default
> > > +	 * resource if arch doesn't offer one.
> > > +	 */
> > > +	ret = arch_xen_unpopulated_init(&xen_resource);
> > > +	if (!ret)
> > > +		target_resource = &xen_resource;
> > > +	else if (ret == -ENOSYS)
> > > +		target_resource = &iomem_resource;
> > > +	else
> > > +		pr_err("Cannot initialize Xen resource\n");
> > > +
> > > +	inited = true;
> > > +}
> > Would it make sense to call unpopulated_init from an init function,
> > rather than every time xen_alloc_unpopulated_pages is called?
> 
> Good point, thank you. Will do. To be honest, I also don't like the current
> approach much.
> 
> 
> > 
> > 
> > >   /**
> > >    * xen_alloc_unpopulated_pages - alloc unpopulated pages
> > >    * @nr_pages: Number of pages
> > > @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int
> > > nr_pages, struct page **pages)
> > >   	unsigned int i;
> > >   	int ret = 0;
> > >   +	unpopulated_init();
> > > +
> > > +	/*
> > > +	 * Fall back to default behavior if we do not have any suitable
> > > resource
> > > +	 * to allocate required region from and as the result we won't be able
> > > to
> > > +	 * construct pages.
> > > +	 */
> > > +	if (!target_resource)
> > > +		return alloc_xenballooned_pages(nr_pages, pages);
> > The commit message says that the behavior on x86 doesn't change but this
> > seems to be a change that could impact x86?
> I don't think so. I haven't tested on x86 and might be wrong, but
> according to the current patch, on x86 the "target_resource" is always valid
> and points to the "iomem_resource" as arch_xen_unpopulated_init() is not
> implemented. So there won't be any fallback to
> alloc_(free)_xenballooned_pages() here and fill_list() will behave as usual.
 
If target_resource is always valid, then we don't need this special
check. In fact, the condition should never be true.


> You raised a really good question. On Arm we need a fallback to ballooning
> out RAM pages if the hypervisor doesn't provide extended regions (we run on
> an old version, there are no unused regions of reasonable size, etc.), so I
> decided to put the fallback code here; an invalid "target_resource" is the
> indicator of the failure.

I think it is unnecessary as we already assume today that
&iomem_resource is always available.


> I noticed a patch that is about to be upstreamed which removes the
> alloc_(free)xenballooned_pages API [1]. Right now I have no idea how/where
> this fallback could be implemented, as this is under build option control
> (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same name is either used
> for unpopulated pages (if set) or ballooned pages (if not set). I would
> appreciate suggestions regarding that. I am wondering whether it would be
> possible and correct to have both mechanisms (unpopulated and ballooned)
> enabled by default, with some init code deciding which one to use at runtime?

I would keep it simple and remove the fallback from this patch. So:

- if not CONFIG_XEN_UNPOPULATED_ALLOC, then balloon
- if CONFIG_XEN_UNPOPULATED_ALLOC, then
    - xen_resource if present
    - otherwise iomem_resource

The xen_resource/iomem_resource config can be done at init time using
target_resource. At runtime, target_resource is always != NULL so we
just go ahead and use it.
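
A minimal user-space sketch of that init-time selection, assuming that any
failure of the arch hook (not only -ENOSYS) should fall back to
iomem_resource. The reduced struct resource and the arch_init_ok() /
arch_init_unsupported() hooks are stand-ins invented for the example, not
the kernel definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct resource. */
struct resource {
	const char *name;
};

static struct resource iomem_resource = { .name = "iomem" };
static struct resource xen_resource = { .name = "Xen unused space" };
static struct resource *target_resource;

/* Hypothetical arch hook that successfully fills in the Xen resource. */
static int arch_init_ok(struct resource *res)
{
	(void)res;
	return 0;
}

/* Hypothetical arch hook that mimics the __weak default. */
static int arch_init_unsupported(struct resource *res)
{
	(void)res;
	return -ENOSYS;
}

/* Run once at init: xen_resource if the arch provides it, else iomem. */
static void unpopulated_init(int (*arch_init)(struct resource *))
{
	assert(arch_init);

	if (arch_init(&xen_resource) == 0)
		target_resource = &xen_resource;
	else
		target_resource = &iomem_resource;
}
```

After unpopulated_init() has run, target_resource is never NULL, so the
alloc/free paths need no runtime check and no balloon fallback.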

 
> > 
> > >   	mutex_lock(&list_lock);
> > >   	if (list_count < nr_pages) {
> > >   		ret = fill_list(nr_pages - list_count);
> > > @@ -159,6 +239,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages,
> > > struct page **pages)
> > >   {
> > >   	unsigned int i;
> > >   +	if (!target_resource)
> > > +		return free_xenballooned_pages(nr_pages, pages);
> > > +
> > >   	mutex_lock(&list_lock);
> > >   	for (i = 0; i < nr_pages; i++) {
> > >   		pages[i]->zone_device_data = page_list;
> > > diff --git a/include/xen/xen.h b/include/xen/xen.h
> > > index 43efba0..55d2ef8 100644
> > > --- a/include/xen/xen.h
> > > +++ b/include/xen/xen.h
> > > @@ -55,6 +55,8 @@ extern u64 xen_saved_max_mem_size;
> > >   #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > >   int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page
> > > **pages);
> > >   void xen_free_unpopulated_pages(unsigned int nr_pages, struct page
> > > **pages);
> > > +struct resource;
> > This is to avoid having to #include linux/ioport.h, right? Is it a
> > problem or is it just to minimize the headers dependencies?
> > 
> > It looks like adding #include <linux/ioport.h> below #include
> > <linux/types.h> in include/xen/xen.h would work too. I am not sure what
> > is the best way though, I'll let Juergen comment.
> Yes, the initial reason to use forward declaration here was to minimize the
> headers dependencies.
> I have rechecked, your suggestion works as well, thank you. So I would be OK
> either way, let's wait for other opinions.
> 
> 
> > 
> > 
> > > +int arch_xen_unpopulated_init(struct resource *res);
> > >   #else
> > >   #define xen_alloc_unpopulated_pages alloc_xenballooned_pages
> > >   #define xen_free_unpopulated_pages free_xenballooned_pages
> > > -- 
> > > 2.7.4
> > > 
> 
> [1] https://lore.kernel.org/lkml/20211102092234.17852-1-jgross@suse.com/
> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-11-10 20:21       ` Oleksandr
@ 2021-11-19  1:19         ` Stefano Stabellini
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-19  1:19 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-arm-kernel, linux-kernel,
	Oleksandr Tyshchenko, Russell King, Boris Ostrovsky,
	Juergen Gross, Julien Grall

On Wed, 10 Nov 2021, Oleksandr wrote:
> On 28.10.21 04:40, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> I am sorry for the late response.
> 
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > This patch implements arch_xen_unpopulated_init() on Arm where
> > > the extended regions (if any) are gathered from DT and inserted
> > > into passed Xen resource to be used as unused address space
> > > for Xen scratch pages by unpopulated-alloc code.
> > > 
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > > 
> > > The extended regions are chosen by the hypervisor at the domain
> > > creation time and advertised to it via "reg" property under
> > > hypervisor node in the guest device-tree. As region 0 is reserved
> > > for grant table space (always present), the indexes for extended
> > > regions are 1...N.
> > > 
> > > If arch_xen_unpopulated_init() fails for some reason the default
> > > behaviour will be restored (allocate xenballooned pages).
> > > 
> > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > Changes RFC -> V2:
> > >     - new patch, instead of
> > >      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
> > > unallocated space"
> > > ---
> > >   arch/arm/xen/enlighten.c | 112
> > > +++++++++++++++++++++++++++++++++++++++++++++++
> > >   drivers/xen/Kconfig      |   2 +-
> > >   2 files changed, 113 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index dea46ec..1a1e0d3 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
> > >   static phys_addr_t xen_grant_frames;
> > >     #define GRANT_TABLE_INDEX   0
> > > +#define EXT_REGION_INDEX    1
> > >     uint32_t xen_start_flags;
> > >   EXPORT_SYMBOL(xen_start_flags);
> > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
> > >   #endif
> > >   }
> > >   +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > +int arch_xen_unpopulated_init(struct resource *res)
> > > +{
> > > +	struct device_node *np;
> > > +	struct resource *regs, *tmp_res;
> > > +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
> > > +	unsigned int i, nr_reg = 0;
> > > +	struct range mhp_range;
> > > +	int rc;
> > > +
> > > +	if (!xen_domain())
> > > +		return -ENODEV;
> > > +
> > > +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
> > > +	if (WARN_ON(!np))
> > > +		return -ENODEV;
> > > +
> > > +	/* Skip region 0 which is reserved for grant table space */
> > > +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
> > > +		nr_reg++;
> > > +	if (!nr_reg) {
> > > +		pr_err("No extended regions are found\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> > > +	if (!regs)
> > > +		return -ENOMEM;
> > > +
> > > +	/*
> > > +	 * Create resource from extended regions provided by the hypervisor to
> > > be
> > > +	 * used as unused address space for Xen scratch pages.
> > > +	 */
> > > +	for (i = 0; i < nr_reg; i++) {
> > > +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
> > > &regs[i]);
> > > +		if (rc)
> > > +			goto err;
> > > +
> > > +		if (max_gpaddr < regs[i].end)
> > > +			max_gpaddr = regs[i].end;
> > > +		if (min_gpaddr > regs[i].start)
> > > +			min_gpaddr = regs[i].start;
> > > +	}
> > > +
> > > +	/* Check whether the resource range is within the hotpluggable range
> > > */
> > > +	mhp_range = mhp_get_pluggable_range(true);
> > > +	if (min_gpaddr < mhp_range.start)
> > > +		min_gpaddr = mhp_range.start;
> > > +	if (max_gpaddr > mhp_range.end)
> > > +		max_gpaddr = mhp_range.end;
> > > +
> > > +	res->start = min_gpaddr;
> > > +	res->end = max_gpaddr;
> > > +
> > > +	/*
> > > +	 * Mark holes between extended regions as unavailable. The rest of
> > > that
> > > +	 * address space will be available for the allocation.
> > > +	 */
> > > +	for (i = 1; i < nr_reg; i++) {
> > > +		resource_size_t start, end;
> > > +
> > > +		start = regs[i - 1].end + 1;
> > > +		end = regs[i].start - 1;
> > > +
> > > +		if (start > (end + 1)) {
> > Should this be:
> > 
> > if (start >= end)
> > 
> > ?
> 
> Yes, we can do this here (since the checks are equivalent) but ...
>
> > > +			rc = -EINVAL;
> > > +			goto err;
> > > +		}
> > > +
> > > +		/* There is no hole between regions */
> > > +		if (start == (end + 1))
> > Also here, shouldn't it be:
> > 
> > if (start == end)
> > 
> > ?
> 
>    ... not here.
> 
> As
> 
> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
> 
> but
> 
> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
 
OK. But the check:

  if (start >= end)

Actually covers both cases so that's the only check we need?


> > 
> > I think I am missing again something in termination accounting :-)
> 
> If I understand correctly, we need to follow "end = start + size - 1" rule, so
> the "end" is the last address inside a range, but not the "first" address
> outside of a range))

yeah
 

> > > +			continue;
> > > +
> > > +		/* Check whether the hole range is within the resource range
> > > */
> > > +		if (start < res->start || end > res->end) {
> > By definition I don't think this check is necessary as either condition
> > is impossible?
> 
> 
> This is a good question, let me explain.
> Not all extended regions provided by the hypervisor can be used here. This is
> because the addressable physical memory range for which the linear mapping
> can be created is limited on Arm, and the maximum addressable range depends
> on the VA space size (CONFIG_ARM64_VA_BITS_XXX). We decided not to filter
> them in the hypervisor, as this logic could be quite complex (different OSes
> may have different requirements, etc.). This means that we need to make sure
> that the regions are within the hotpluggable range to avoid a failure later
> on when a region is pre-validated by the memory hotplug path.
> 
> The following code limits the resource range based on that:
> 
> +    /* Check whether the resource range is within the hotpluggable range */
> +    mhp_range = mhp_get_pluggable_range(true);
> +    if (min_gpaddr < mhp_range.start)
> +        min_gpaddr = mhp_range.start;
> +    if (max_gpaddr > mhp_range.end)
> +        max_gpaddr = mhp_range.end;
> +
> +    res->start = min_gpaddr;
> +    res->end = max_gpaddr;
> 
> In the current loop (when calculating and inserting holes) we also need to
> make sure that the resulting hole range is within the resource range (and
> adjust/skip it if not), as the regs[] used for the calculations contains the
> raw regions as they are described in DT, so they are not updated. Otherwise
> insert_resource() further down the function will return an error for the
> conflicting operations. Yes, I could have taken a different route and updated
> regs[] in advance to adjust/skip unsuitable regions up front, but I decided
> to do it on the fly in the loop here; I thought doing it in advance would add
> some overhead/complexity. What do you think?

I understand now.


> So I am afraid this check is necessary here.
> 
> For example in my environment the extended regions are:
> 
> (XEN) Extended region 0: 0->0x8000000
> (XEN) Extended region 1: 0xc000000->0x30000000
> (XEN) Extended region 2: 0x40000000->0x47e00000
> (XEN) Extended region 3: 0xd0000000->0xe6000000
> (XEN) Extended region 4: 0xe7800000->0xec000000
> (XEN) Extended region 5: 0xf1200000->0xfd000000
> (XEN) Extended region 6: 0x100000000->0x500000000
> (XEN) Extended region 7: 0x580000000->0x600000000
> (XEN) Extended region 8: 0x680000000->0x700000000
> (XEN) Extended region 9: 0x780000000->0x10000000000
> 
> *With* the check the holes are:
> 
> holes [47e00000 - cfffffff]
> holes [e6000000 - e77fffff]
> holes [ec000000 - f11fffff]
> holes [fd000000 - ffffffff]
> holes [500000000 - 57fffffff]
> holes [600000000 - 67fffffff]
> holes [700000000 - 77fffffff]
> 
> And they look correct: you can see that two possible holes between
> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
> skipped as they are located entirely below res->start,
> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
> 
> *Without* the check these two holes won't be skipped and as a result
> insert_resource() will fail.
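
For reference, the hole arithmetic above can be reproduced with a small
user-space sketch. It assumes the Xen log prints exclusive end addresses
(so they are converted to inclusive ends here, following the
"end = start + size - 1" rule) and that the regions are sorted. The names
span, example_regs and find_holes() are hypothetical and exist only for
this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct span {
	uint64_t start;
	uint64_t end;		/* inclusive: last address inside the range */
};

/* The ten extended regions quoted above, with inclusive ends. */
static const struct span example_regs[] = {
	{ 0x0ULL,         0x7ffffffULL },
	{ 0xc000000ULL,   0x2fffffffULL },
	{ 0x40000000ULL,  0x47dfffffULL },
	{ 0xd0000000ULL,  0xe5ffffffULL },
	{ 0xe7800000ULL,  0xebffffffULL },
	{ 0xf1200000ULL,  0xfcffffffULL },
	{ 0x100000000ULL, 0x4ffffffffULL },
	{ 0x580000000ULL, 0x5ffffffffULL },
	{ 0x680000000ULL, 0x6ffffffffULL },
	{ 0x780000000ULL, 0xffffffffffULL },
};

/*
 * Collect the holes between consecutive regions, clamped to "window"
 * (the res->start..res->end range). holes[] must have room for
 * nr_reg - 1 entries. Returns the number of holes, or -1 on overlap.
 */
static int find_holes(const struct span *regs, size_t nr_reg,
		      struct span window, struct span *holes)
{
	int n = 0;

	assert(regs && holes);

	for (size_t i = 1; i < nr_reg; i++) {
		uint64_t start = regs[i - 1].end + 1;
		uint64_t end = regs[i].start - 1;

		if (start > end + 1)
			return -1;	/* regions overlap or are unsorted */
		if (start == end + 1)
			continue;	/* regions are adjacent, no hole */

		/* Adjust the hole to the window, skip it if nothing is left. */
		if (start < window.start)
			start = window.start;
		if (end > window.end)
			end = window.end;
		if (start >= end + 1)
			continue;

		holes[n].start = start;
		holes[n].end = end;
		n++;
	}
	return n;
}
```

Called with a window of 0x40000000 - 0xffffffffff, this yields the seven
holes listed above; the two candidate holes below 0x40000000 collapse to
nothing after clamping and are skipped.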
> 
> 
> **********
> 
> 
> I have an idea for how we can simplify the filter logic: we can drop all the
> checks here (including the confusing one) in the Arm code and update the
> common code a bit:
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 1a1e0d3..ed5b855 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>         struct resource *regs, *tmp_res;
>         uint64_t min_gpaddr = -1, max_gpaddr = 0;
>         unsigned int i, nr_reg = 0;
> -       struct range mhp_range;
>         int rc;
> 
>         if (!xen_domain())
> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>                         min_gpaddr = regs[i].start;
>         }
> 
> -       /* Check whether the resource range is within the hotpluggable range */
> -       mhp_range = mhp_get_pluggable_range(true);
> -       if (min_gpaddr < mhp_range.start)
> -               min_gpaddr = mhp_range.start;
> -       if (max_gpaddr > mhp_range.end)
> -               max_gpaddr = mhp_range.end;
> -
>         res->start = min_gpaddr;
>         res->end = max_gpaddr;
> 
> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>                 if (start == (end + 1))
>                         continue;
> 
> -               /* Check whether the hole range is within the resource range */
> -               if (start < res->start || end > res->end) {
> -                       if (start < res->start)
> -                               start = res->start;
> -                       if (end > res->end)
> -                               end = res->end;
> -
> -                       if (start >= (end + 1))
> -                               continue;
> -               }
> -
>                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>                 if (!tmp_res) {
>                         rc = -ENOMEM;
> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> index 1f1d8d8..a5d3ebb 100644
> --- a/drivers/xen/unpopulated-alloc.c
> +++ b/drivers/xen/unpopulated-alloc.c
> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
>         void *vaddr;
>         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>         int ret;
> +       struct range mhp_range;
> 
>         res = kzalloc(sizeof(*res), GFP_KERNEL);
>         if (!res)
> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
>         res->name = "Xen scratch";
>         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> 
> +       mhp_range = mhp_get_pluggable_range(true);
> +
>         ret = allocate_resource(target_resource, res,
> -                               alloc_pages * PAGE_SIZE, 0, -1,
> +                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
>                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>         if (ret < 0) {
>                 pr_err("Cannot allocate new IOMEM resource\n");
> 
> I believe this will work on x86, as arch_get_mappable_range() is not
> implemented there, and the default implementation returns exactly what is
> being used currently (0, -1).
> 
> struct range __weak arch_get_mappable_range(void)
> {
>     struct range mhp_range = {
>         .start = 0UL,
>         .end = -1ULL,
>     };
>     return mhp_range;
> }
> 
> And this is going to be more generic and clear, what do you think?

Yeah this is much better, good thinking!
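
As a rough illustration of why this leaves x86 untouched: passing the
mappable range as the min/max arguments effectively intersects the search
window with it. The sketch below models only that intersection, not
allocate_resource() itself; range64 and clamp_range() are hypothetical
names, and the Arm limits are the 48-bit VA numbers quoted earlier in this
mail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct range64 {
	uint64_t start;
	uint64_t end;		/* inclusive */
};

/*
 * Intersect *r with limit in place. Returns false if nothing is left,
 * in which case *r is not a valid range any more.
 */
static bool clamp_range(struct range64 *r, struct range64 limit)
{
	assert(r);

	if (r->start < limit.start)
		r->start = limit.start;
	if (r->end > limit.end)
		r->end = limit.end;

	return r->start <= r->end;
}
```

With the default limit of { 0, UINT64_MAX } the range comes back unchanged,
which matches the (0, -1) used today. With the Arm limit of
{ 0x40000000, 0x80003fffffff } a range of 0 - 0xffffffffff shrinks to
0x40000000 - 0xffffffffff, and a range lying entirely below 0x40000000 is
rejected.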

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
@ 2021-11-19  1:19         ` Stefano Stabellini
  0 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-19  1:19 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-arm-kernel, linux-kernel,
	Oleksandr Tyshchenko, Russell King, Boris Ostrovsky,
	Juergen Gross, Julien Grall

On Wed, 10 Nov 2021, Oleksandr wrote:
> On 28.10.21 04:40, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> I am sorry for the late response.
> 
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > This patch implements arch_xen_unpopulated_init() on Arm where
> > > the extended regions (if any) are gathered from DT and inserted
> > > into passed Xen resource to be used as unused address space
> > > for Xen scratch pages by unpopulated-alloc code.
> > > 
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > > 
> > > The extended regions are chosen by the hypervisor at the domain
> > > creation time and advertised to it via "reg" property under
> > > hypervisor node in the guest device-tree. As region 0 is reserved
> > > for grant table space (always present), the indexes for extended
> > > regions are 1...N.
> > > 
> > > If arch_xen_unpopulated_init() fails for some reason the default
> > > behaviour will be restored (allocate xenballooned pages).
> > > 
> > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > Changes RFC -> V2:
> > >     - new patch, instead of
> > >      "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
> > > unallocated space"
> > > ---
> > >   arch/arm/xen/enlighten.c | 112
> > > +++++++++++++++++++++++++++++++++++++++++++++++
> > >   drivers/xen/Kconfig      |   2 +-
> > >   2 files changed, 113 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index dea46ec..1a1e0d3 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
> > >   static phys_addr_t xen_grant_frames;
> > >     #define GRANT_TABLE_INDEX   0
> > > +#define EXT_REGION_INDEX    1
> > >     uint32_t xen_start_flags;
> > >   EXPORT_SYMBOL(xen_start_flags);
> > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
> > >   #endif
> > >   }
> > >   +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > +int arch_xen_unpopulated_init(struct resource *res)
> > > +{
> > > +	struct device_node *np;
> > > +	struct resource *regs, *tmp_res;
> > > +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
> > > +	unsigned int i, nr_reg = 0;
> > > +	struct range mhp_range;
> > > +	int rc;
> > > +
> > > +	if (!xen_domain())
> > > +		return -ENODEV;
> > > +
> > > +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
> > > +	if (WARN_ON(!np))
> > > +		return -ENODEV;
> > > +
> > > +	/* Skip region 0 which is reserved for grant table space */
> > > +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
> > > +		nr_reg++;
> > > +	if (!nr_reg) {
> > > +		pr_err("No extended regions are found\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> > > +	if (!regs)
> > > +		return -ENOMEM;
> > > +
> > > +	/*
> > > +	 * Create resource from extended regions provided by the hypervisor to
> > > be
> > > +	 * used as unused address space for Xen scratch pages.
> > > +	 */
> > > +	for (i = 0; i < nr_reg; i++) {
> > > +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
> > > &regs[i]);
> > > +		if (rc)
> > > +			goto err;
> > > +
> > > +		if (max_gpaddr < regs[i].end)
> > > +			max_gpaddr = regs[i].end;
> > > +		if (min_gpaddr > regs[i].start)
> > > +			min_gpaddr = regs[i].start;
> > > +	}
> > > +
> > > +	/* Check whether the resource range is within the hotpluggable range
> > > */
> > > +	mhp_range = mhp_get_pluggable_range(true);
> > > +	if (min_gpaddr < mhp_range.start)
> > > +		min_gpaddr = mhp_range.start;
> > > +	if (max_gpaddr > mhp_range.end)
> > > +		max_gpaddr = mhp_range.end;
> > > +
> > > +	res->start = min_gpaddr;
> > > +	res->end = max_gpaddr;
> > > +
> > > +	/*
> > > +	 * Mark holes between extended regions as unavailable. The rest of
> > > that
> > > +	 * address space will be available for the allocation.
> > > +	 */
> > > +	for (i = 1; i < nr_reg; i++) {
> > > +		resource_size_t start, end;
> > > +
> > > +		start = regs[i - 1].end + 1;
> > > +		end = regs[i].start - 1;
> > > +
> > > +		if (start > (end + 1)) {
> > Should this be:
> > 
> > if (start >= end)
> > 
> > ?
> 
> Yes, we can do this here (since the checks are equivalent) but ...
>
> > > +			rc = -EINVAL;
> > > +			goto err;
> > > +		}
> > > +
> > > +		/* There is no hole between regions */
> > > +		if (start == (end + 1))
> > Also here, shouldn't it be:
> > 
> > if (start == end)
> > 
> > ?
> 
>    ... not here.
> 
> As
> 
> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
> 
> but
> 
> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
 
OK. But the check:

  if (start >= end)

Actually covers both cases so that's the only check we need?


> > 
> > I think I am missing again something in termination accounting :-)
> 
> If I understand correctly, we need to follow "end = start + size - 1" rule, so
> the "end" is the last address inside a range, but not the "first" address
> outside of a range))

yeah
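
For reference, a minimal user-space sketch of the inclusive-end rule
discussed above ("end = start + size - 1"). It is not the patch code:
hole_between() and the enum are hypothetical names, and the comparisons
are written on the raw region bounds so that no "+ 1" can wrap around.

```c
#include <assert.h>
#include <stdint.h>

enum hole_status {
	HOLE_OVERLAP = -1,	/* regions overlap or are out of order */
	HOLE_NONE = 0,		/* regions are adjacent, nothing between them */
	HOLE_PRESENT = 1,	/* *start..*end (inclusive) lies between them */
};

/*
 * prev_end:   last address inside the lower region (inclusive)
 * next_start: first address inside the upper region
 */
static enum hole_status hole_between(uint64_t prev_end, uint64_t next_start,
				     uint64_t *start, uint64_t *end)
{
	assert(start && end);

	/* Same condition as "start > end + 1" in the patch. */
	if (prev_end >= next_start)
		return HOLE_OVERLAP;

	/* Same condition as "start == end + 1": the regions touch. */
	if (prev_end + 1 == next_start)
		return HOLE_NONE;

	*start = prev_end + 1;
	*end = next_start - 1;	/* start == end would be a one-byte hole */
	return HOLE_PRESENT;
}
```

With inclusive ends, "start == end" describes a hole of exactly one
address, which is why it is not the same test as "start == end + 1".
For page-aligned regions a one-byte hole cannot occur in practice.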
 

> > > +			continue;
> > > +
> > > +		/* Check whether the hole range is within the resource range
> > > */
> > > +		if (start < res->start || end > res->end) {
> > By definition I don't think this check is necessary as either condition
> > is impossible?
> 
> 
> This is a good question, let me explain.
> Not all extended regions provided by the hypervisor can be used here. This is
> because the addressable physical memory range for which the linear mapping
> can be created is limited on Arm, and the maximum addressable range depends on
> the VA space size (CONFIG_ARM64_VA_BITS_XXX). We decided not to filter them
> in the hypervisor, as this logic could be quite complex and different OSes may
> have different requirements, etc. This means that we need to make sure that
> regions are within the hotpluggable range to avoid a failure later on when a
> region is pre-validated by the memory hotplug path.
> 
> The following code limits the resource range based on that:
> 
> +    /* Check whether the resource range is within the hotpluggable range */
> +    mhp_range = mhp_get_pluggable_range(true);
> +    if (min_gpaddr < mhp_range.start)
> +        min_gpaddr = mhp_range.start;
> +    if (max_gpaddr > mhp_range.end)
> +        max_gpaddr = mhp_range.end;
> +
> +    res->start = min_gpaddr;
> +    res->end = max_gpaddr;
> 
> In the current loop (when calculating and inserting holes) we also need to
> make sure that the resulting hole range is within the resource range (and
> adjust/skip it if not), as regs[] used for the calculations contains the raw
> regions as described in DT, i.e. not adjusted. Otherwise insert_resource()
> further down the function will return an error for the conflicting
> operations. Yes, I could have taken a different route and updated regs[] in
> advance to adjust/skip unsuitable regions up front, but I decided to do it on
> the fly in the loop here, as I thought doing it in advance would add some
> overhead/complexity. What do you think?

I understand now.


> So I am afraid this check is necessary here.
> 
> For example in my environment the extended regions are:
> 
> (XEN) Extended region 0: 0->0x8000000
> (XEN) Extended region 1: 0xc000000->0x30000000
> (XEN) Extended region 2: 0x40000000->0x47e00000
> (XEN) Extended region 3: 0xd0000000->0xe6000000
> (XEN) Extended region 4: 0xe7800000->0xec000000
> (XEN) Extended region 5: 0xf1200000->0xfd000000
> (XEN) Extended region 6: 0x100000000->0x500000000
> (XEN) Extended region 7: 0x580000000->0x600000000
> (XEN) Extended region 8: 0x680000000->0x700000000
> (XEN) Extended region 9: 0x780000000->0x10000000000
> 
> *With* the check the holes are:
> 
> holes [47e00000 - cfffffff]
> holes [e6000000 - e77fffff]
> holes [ec000000 - f11fffff]
> holes [fd000000 - ffffffff]
> holes [500000000 - 57fffffff]
> holes [600000000 - 67fffffff]
> holes [700000000 - 77fffffff]
> 
> And they look correct: you can see that two possible holes between
> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
> skipped as they are located entirely below res->start,
> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
> 
> *Without* the check these two holes won't be skipped and as a result
> insert_resource() will fail.
> 
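The filtering in the quoted example can be reproduced with a small
user-space sketch. This is an illustration only, assuming sorted,
non-overlapping regions with inclusive ends; struct toy_range and
collect_holes() are hypothetical names, and overlapping neighbours are
simply skipped here, whereas the patch returns -EINVAL for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Walk the regions and store the holes between neighbours that survive
 * clamping to [res_start, res_end].
 * Returns the number of holes written to out[] (at most max_out).
 */
static size_t collect_holes(const struct toy_range *regs, size_t nr_reg,
			    uint64_t res_start, uint64_t res_end,
			    struct toy_range *out, size_t max_out)
{
	size_t i, n = 0;

	assert(regs && out);

	for (i = 1; i < nr_reg && n < max_out; i++) {
		uint64_t start = regs[i - 1].end + 1;
		uint64_t end = regs[i].start - 1;

		if (start > end)	/* adjacent regions: no hole */
			continue;

		/* Clamp the hole to the resource range. */
		if (start < res_start)
			start = res_start;
		if (end > res_end)
			end = res_end;

		if (start > end)	/* hole lies entirely outside the range */
			continue;

		out[n].start = start;
		out[n].end = end;
		n++;
	}
	return n;
}
```

Fed with the ten regions listed above (ends converted to inclusive) and
a resource range starting at 0x40000000, the sketch keeps the same seven
holes; with a resource range starting at 0 it would also report the two
holes below 0x40000000.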
> 
> **********
> 
> 
> I have an idea for how we can simplify the filter logic: we can drop all
> checks here (including the confusing one) in the Arm code and update the
> common code a bit:
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 1a1e0d3..ed5b855 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>         struct resource *regs, *tmp_res;
>         uint64_t min_gpaddr = -1, max_gpaddr = 0;
>         unsigned int i, nr_reg = 0;
> -       struct range mhp_range;
>         int rc;
> 
>         if (!xen_domain())
> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>                         min_gpaddr = regs[i].start;
>         }
> 
> -       /* Check whether the resource range is within the hotpluggable range */
> -       mhp_range = mhp_get_pluggable_range(true);
> -       if (min_gpaddr < mhp_range.start)
> -               min_gpaddr = mhp_range.start;
> -       if (max_gpaddr > mhp_range.end)
> -               max_gpaddr = mhp_range.end;
> -
>         res->start = min_gpaddr;
>         res->end = max_gpaddr;
> 
> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>                 if (start == (end + 1))
>                         continue;
> 
> -               /* Check whether the hole range is within the resource range */
> -               if (start < res->start || end > res->end) {
> -                       if (start < res->start)
> -                               start = res->start;
> -                       if (end > res->end)
> -                               end = res->end;
> -
> -                       if (start >= (end + 1))
> -                               continue;
> -               }
> -
>                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>                 if (!tmp_res) {
>                         rc = -ENOMEM;
> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> index 1f1d8d8..a5d3ebb 100644
> --- a/drivers/xen/unpopulated-alloc.c
> +++ b/drivers/xen/unpopulated-alloc.c
> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
>         void *vaddr;
>         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>         int ret;
> +       struct range mhp_range;
> 
>         res = kzalloc(sizeof(*res), GFP_KERNEL);
>         if (!res)
> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
>         res->name = "Xen scratch";
>         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> 
> +       mhp_range = mhp_get_pluggable_range(true);
> +
>         ret = allocate_resource(target_resource, res,
> -                               alloc_pages * PAGE_SIZE, 0, -1,
> +                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
>                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>         if (ret < 0) {
>                 pr_err("Cannot allocate new IOMEM resource\n");
> 
> I believe this will work on x86, as arch_get_mappable_range() is not
> implemented there, and the default option contains exactly what is being
> used currently (0, -1).
> 
> struct range __weak arch_get_mappable_range(void)
> {
>     struct range mhp_range = {
>         .start = 0UL,
>         .end = -1ULL,
>     };
>     return mhp_range;
> }
> 
> And this is going to be more generic and clear, what do you think?

Yeah this is much better, good thinking!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-19  0:59       ` Stefano Stabellini
@ 2021-11-19 18:18         ` Oleksandr
  2021-11-20  2:19           ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Oleksandr @ 2021-11-19 18:18 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, Oleksandr Tyshchenko, Boris Ostrovsky,
	Juergen Gross, Julien Grall


On 19.11.21 02:59, Stefano Stabellini wrote:


Hi Stefano

> On Tue, 9 Nov 2021, Oleksandr wrote:
>> On 28.10.21 19:37, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> I am sorry for the late response.
>>
>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> The main reason of this change is that unpopulated-alloc
>>>> code cannot be used in its current form on Arm, but there
>>>> is a desire to reuse it to avoid wasting real RAM pages
>>>> for the grant/foreign mappings.
>>>>
>>>> The problem is that system "iomem_resource" is used for
>>>> the address space allocation, but the really unallocated
>>>> space can't be figured out precisely by the domain on Arm
>>>> without hypervisor involvement. For example, not all device
>>>> I/O regions are known by the time domain starts creating
>>>> grant/foreign mappings. And following the advice from
>>>> "iomem_resource" we might end up reusing these regions by
>>>> mistake. So, the hypervisor which maintains the P2M for
>>>> the domain is in the best position to provide unused regions
>>>> of guest physical address space which could be safely used
>>>> to create grant/foreign mappings.
>>>>
>>>> Introduce a new helper arch_xen_unpopulated_init() whose purpose
>>>> is to create a specific Xen resource based on the memory regions
>>>> provided by the hypervisor to be used as unused space for Xen
>>>> scratch pages.
>>>>
>>>> If arch doesn't implement arch_xen_unpopulated_init() to
>>>> initialize Xen resource the default "iomem_resource" will be used.
>>>> So the behavior on x86 won't be changed.
>>>>
>>>> Also fall back to allocate xenballooned pages (steal real RAM
>>>> pages) if we do not have any suitable resource to work with and
>>>> as the result we won't be able to provide unpopulated pages.
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> Changes RFC -> V2:
>>>>      - new patch, instead of
>>>>       "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
>>>> unallocated space"
>>>> ---
>>>>    drivers/xen/unpopulated-alloc.c | 89
>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>    include/xen/xen.h               |  2 +
>>>>    2 files changed, 88 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>>>> index a03dc5b..1f1d8d8 100644
>>>> --- a/drivers/xen/unpopulated-alloc.c
>>>> +++ b/drivers/xen/unpopulated-alloc.c
>>>> @@ -8,6 +8,7 @@
>>>>      #include <asm/page.h>
>>>>    +#include <xen/balloon.h>
>>>>    #include <xen/page.h>
>>>>    #include <xen/xen.h>
>>>>    @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
>>>>    static struct page *page_list;
>>>>    static unsigned int list_count;
>>>>    +static struct resource *target_resource;
>>>> +static struct resource xen_resource = {
>>>> +	.name = "Xen unused space",
>>>> +};
>>>> +
>>>> +/*
>>>> + * If arch is not happy with system "iomem_resource" being used for
>>>> + * the region allocation it can provide its own view by initializing
>>>> + * "xen_resource" with unused regions of guest physical address space
>>>> + * provided by the hypervisor.
>>>> + */
>>>> +int __weak arch_xen_unpopulated_init(struct resource *res)
>>>> +{
>>>> +	return -ENOSYS;
>>>> +}
>>>> +
>>>>    static int fill_list(unsigned int nr_pages)
>>>>    {
>>>>    	struct dev_pagemap *pgmap;
>>>> -	struct resource *res;
>>>> +	struct resource *res, *tmp_res = NULL;
>>>>    	void *vaddr;
>>>>    	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>>>> -	int ret = -ENOMEM;
>>>> +	int ret;
>>>>      	res = kzalloc(sizeof(*res), GFP_KERNEL);
>>>>    	if (!res)
>>>> @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
>>>>    	res->name = "Xen scratch";
>>>>    	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>>>    -	ret = allocate_resource(&iomem_resource, res,
>>>> +	ret = allocate_resource(target_resource, res,
>>>>    				alloc_pages * PAGE_SIZE, 0, -1,
>>>>    				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>>>    	if (ret < 0) {
>>>> @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
>>>>    		goto err_resource;
>>>>    	}
>>>>    +	/*
>>>> +	 * Reserve the region previously allocated from Xen resource to avoid
>>>> +	 * re-using it by someone else.
>>>> +	 */
>>>> +	if (target_resource != &iomem_resource) {
>>>> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>>> +		if (!tmp_res) {
>>>> +			ret = -ENOMEM;
>>>> +			goto err_insert;
>>>> +		}
>>>> +
>>>> +		tmp_res->name = res->name;
>>>> +		tmp_res->start = res->start;
>>>> +		tmp_res->end = res->end;
>>>> +		tmp_res->flags = res->flags;
>>>> +
>>>> +		ret = insert_resource(&iomem_resource, tmp_res);
>>>> +		if (ret < 0) {
>>>> +			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
>>>> +			       tmp_res->start, tmp_res->end);
>>>> +			kfree(tmp_res);
>>>> +			goto err_insert;
>>>> +		}
>>>> +	}
>>> I am a bit confused.. why do we need to do this? Who could be
>>> erroneously re-using the region? Are you saying that the next time
>>> allocate_resource is called it could find the same region again? It
>>> doesn't seem possible?
>>
>> No, as I understand it, allocate_resource() called for the same root
>> resource won't provide the same region... We only need to do this (insert the
>> region into "iomem_resource") if we allocated it from our *internal*
>> "xen_resource", as the *global* "iomem_resource" (which is used everywhere) is
>> not aware that the region has already been allocated. So by inserting a region
>> here we are reserving it; otherwise it could be reused elsewhere.
> But elsewhere where?

I think, theoretically, anywhere
allocate_resource(&iomem_resource, ...) is called.


> Let's say that allocate_resource allocates a range from xen_resource.
>  From reading the code, it doesn't look like iomem_resource would have
> that range because the extended regions described under /hypervisor are
> not added automatically to iomem_resource.
>
> So what if we don't call insert_resource? Nothing could allocate the
> same range because iomem_resource doesn't have it at all and
> xen_resource is not used anywhere if not here.
>
> What am I missing?


Below is my understanding, which, of course, might be wrong.

If we don't claim the resource by calling insert_resource() (or even
request_resource()) here, then the same range could be allocated anywhere
allocate_resource(&iomem_resource, ...) is called.
I don't see what prevents the same range from being allocated. Why
couldn't allocate_resource(&iomem_resource, ...) provide the same range
if it is free (not yet reserved) from its point of view? The comment above
allocate_resource() says "allocate empty slot in the resource tree given
range & alignment". So this "empty slot" could be exactly the same range.

I experimented with that a bit, calling
allocate_resource(&iomem_resource, ...) several times in another place
to see what ranges it returns in both cases (with and without calling
insert_resource() here). The experiment confirmed (assuming, of course,
that I did it correctly) that the same range could be allocated if we
didn't call insert_resource() here. As I understand it, there is nothing
strange here, as iomem_resource initially covers the whole address space
(0, -1) and everything *not* yet inserted/requested (in other words,
reserved) is considered free and could be provided if it fits the
constraints. Or did I really miss something?

It feels to me that it would be better to call request_resource()
instead of insert_resource(). It seems that if no conflict happens both
functions behave in the same way, but in case of a conflict where the
conflicting resource fits entirely within the new resource, the former
(request_resource()) will return an error. I think this way we will be
able to detect that a range we are trying to reserve is already present
and bail out early.
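
To make the argument concrete, here is a toy user-space model of two
independent "resource trees". It is only a sketch of the reasoning above,
not the kernel's resource code: struct toy_tree, toy_request() and
toy_allocate() are hypothetical names, the allocator is a plain first-fit
scan, and all addresses are assumed to stay far below 2^64 so the
arithmetic cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BUSY 8

struct toy_span {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

struct toy_tree {
	struct toy_span root;			/* address space covered */
	struct toy_span busy[TOY_MAX_BUSY];	/* ranges already claimed */
	size_t nr_busy;
};

/* Claim [start, end]; fails if it leaves the root or hits a busy range. */
static bool toy_request(struct toy_tree *t, uint64_t start, uint64_t end)
{
	size_t i;

	assert(t);
	if (start > end || start < t->root.start || end > t->root.end)
		return false;
	for (i = 0; i < t->nr_busy; i++)
		if (start <= t->busy[i].end && t->busy[i].start <= end)
			return false;
	if (t->nr_busy == TOY_MAX_BUSY)
		return false;
	t->busy[t->nr_busy].start = start;
	t->busy[t->nr_busy].end = end;
	t->nr_busy++;
	return true;
}

/* First-fit: claim the lowest free, size-aligned slot at or above min. */
static bool toy_allocate(struct toy_tree *t, uint64_t size, uint64_t min,
			 uint64_t *out)
{
	uint64_t addr;

	assert(t && out && size);
	addr = min > t->root.start ? min : t->root.start;
	addr = (addr + size - 1) / size * size;	/* round up to a multiple of size */

	while (addr <= t->root.end && t->root.end - addr >= size - 1) {
		if (toy_request(t, addr, addr + size - 1)) {
			*out = addr;
			return true;
		}
		addr += size;
	}
	return false;
}
```

In this model, a range handed out by the "Xen" tree is still free from the
point of view of the global tree until it is explicitly claimed there with
toy_request(), which plays the role of insert_resource() or
request_resource() in the discussion.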


>
> Or maybe it is the other way around: core Linux code assumes everything
> is described in iomem_resource so something under kernel/ or mm/ would
> crash if we start using a page pointing to an address missing from
> iomem_resource?
>   
>   
>>>>    	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>>>>    	if (!pgmap) {
>>>>    		ret = -ENOMEM;
>>>> @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
>>>>    err_memremap:
>>>>    	kfree(pgmap);
>>>>    err_pgmap:
>>>> +	if (tmp_res) {
>>>> +		release_resource(tmp_res);
>>>> +		kfree(tmp_res);
>>>> +	}
>>>> +err_insert:
>>>>    	release_resource(res);
>>>>    err_resource:
>>>>    	kfree(res);
>>>>    	return ret;
>>>>    }
>>>>    +static void unpopulated_init(void)
>>>> +{
>>>> +	static bool inited = false;
>>> initialized = false
>> ok.
>>
>>
>>>
>>>> +	int ret;
>>>> +
>>>> +	if (inited)
>>>> +		return;
>>>> +
>>>> +	/*
>>>> +	 * Try to initialize Xen resource the first and fall back to default
>>>> +	 * resource if arch doesn't offer one.
>>>> +	 */
>>>> +	ret = arch_xen_unpopulated_init(&xen_resource);
>>>> +	if (!ret)
>>>> +		target_resource = &xen_resource;
>>>> +	else if (ret == -ENOSYS)
>>>> +		target_resource = &iomem_resource;
>>>> +	else
>>>> +		pr_err("Cannot initialize Xen resource\n");
>>>> +
>>>> +	inited = true;
>>>> +}
>>> Would it make sense to call unpopulated_init from an init function,
>>> rather than every time xen_alloc_unpopulated_pages is called?
>> Good point, thank you. Will do. To be honest, I also don't like the current
>> approach much.
>>
>>
>>>
>>>>    /**
>>>>     * xen_alloc_unpopulated_pages - alloc unpopulated pages
>>>>     * @nr_pages: Number of pages
>>>> @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>>>    	unsigned int i;
>>>>    	int ret = 0;
>>>>    +	unpopulated_init();
>>>> +
>>>> +	/*
>>>> +	 * Fall back to default behavior if we do not have any suitable resource
>>>> +	 * to allocate required region from and as the result we won't be able to
>>>> +	 * construct pages.
>>>> +	 */
>>>> +	if (!target_resource)
>>>> +		return alloc_xenballooned_pages(nr_pages, pages);
>>> The commit message says that the behavior on x86 doesn't change but this
>>> seems to be a change that could impact x86?
>> I don't think so. I haven't tested on x86 and might be wrong, but
>> according to the current patch, on x86 the "target_resource" is always valid
>> and points to the "iomem_resource" as arch_xen_unpopulated_init() is not
>> implemented. So there won't be any fallback to
>> alloc_(free)_xenballooned_pages() here and fill_list() will behave as usual.
>   
> If target_resource is always valid, then we don't need this special
> check. In fact, the condition should never be true.


The target_resource is always valid and points to the "iomem_resource"
on x86 (this is equivalent to the behavior before this patch).
On Arm, target_resource might be NULL if arch_xen_unpopulated_init()
failed, for example, if no extended regions are reported by the hypervisor.
We cannot use "iomem_resource" on Arm, only a resource constructed from
extended regions. This is why I added that check (and the fallback to
xenballooned pages).
My thinking was that when running on an old Xen we would still be able to
keep working (although we would need to balloon out RAM pages), so there
is no need to disable CONFIG_XEN_UNPOPULATED_ALLOC on such setups.
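
The selection logic of the V2 patch described above can be summarised in
a small stand-alone sketch. The toy_* names and select_target_resource()
are hypothetical; only the three outcomes (0, -ENOSYS, any other error)
come from unpopulated_init() in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_resource {
	const char *name;
};

static struct toy_resource toy_iomem_resource = { .name = "iomem" };
static struct toy_resource toy_xen_resource = { .name = "Xen unused space" };

/*
 * arch_ret models the return value of arch_xen_unpopulated_init():
 *   0        the arch filled in the Xen resource (Arm with extended regions)
 *   -ENOSYS  the arch does not implement the hook (weak default, x86)
 *   other    the arch hook exists but failed (e.g. no extended regions)
 */
static struct toy_resource *select_target_resource(int arch_ret)
{
	assert(arch_ret <= 0);	/* 0 or a negative errno value */

	if (!arch_ret)
		return &toy_xen_resource;
	if (arch_ret == -ENOSYS)
		return &toy_iomem_resource;
	return NULL;	/* V2 falls back to ballooned pages in this case */
}
```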


>
>
>> You raised a really good question. On Arm we need a fallback to ballooning
>> out RAM pages if the hypervisor doesn't provide extended regions (we run on an
>> old version, there are no unused regions of reasonable size, etc.), so I
>> decided to put the fallback code here; an invalid "target_resource" is the
>> indicator of the failure.
> I think it is unnecessary as we already assume today that
> &iomem_resource is always available.
>> I noticed a patch that is about to be upstreamed which removes the
>> alloc_(free)xenballooned_pages API [1]. Right now I have no idea how/where
>> this fallback could be implemented as this is under build option control
>> (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same name is used either
>> for unpopulated pages (if set) or ballooned pages (if not set). I would
>> appreciate suggestions regarding that. I am wondering whether it would be
>> possible and correct to have both mechanisms (unpopulated and ballooned)
>> enabled by default, with some init code deciding which one to use at runtime?
> I would keep it simple and remove the fallback from this patch. So:
>
> - if not CONFIG_XEN_UNPOPULATED_ALLOC, then balloon
> - if CONFIG_XEN_UNPOPULATED_ALLOC, then
>      - xen_resource if present
>      - otherwise iomem_resource

Unfortunately, we cannot safely use iomem_resource on Arm: it is either
xen_resource or failure (if no fallback exists).


>
> The xen_resource/iomem_resource config can be done at init time using
> target_resource. At runtime, target_resource is always != NULL so we
> just go ahead and use it.


Thank you for the suggestion. OK, let's keep it simple and drop the
fallback attempts for now. With one remark:
We will make CONFIG_XEN_UNPOPULATED_ALLOC disabled by default on Arm in
the next patch. So by default everything will behave as usual on Arm
(balloon out RAM pages);
if the user knows for sure that Xen reports extended regions, he/she can
enable the config. This way we won't break anything. What do you think?


[snip]


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
  2021-11-19  0:32         ` Stefano Stabellini
@ 2021-11-19 18:25           ` Oleksandr
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-19 18:25 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Julien Grall


On 19.11.21 02:32, Stefano Stabellini wrote:

Hi Stefano

> On Thu, 11 Nov 2021, Oleksandr wrote:
>> On 28.10.21 04:28, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> I am sorry for the late response.
>>
>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> Read the start address of the grant table space from DT
>>>> (region 0).
>>>>
>>>> This patch mostly restores behaviour before commit 3cf4095d7446
>>>> ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table")
>>>> but trying not to break the ACPI support added after that commit.
>>>> So the patch touches DT part only and leaves the ACPI part with
>>>> xen_xlate_map_ballooned_pages().
>>>>
>>>> This is a preparation for using Xen extended region feature
>>>> where unused regions of guest physical address space (provided
>>>> by the hypervisor) will be used to create grant/foreign/whatever
>>>> mappings instead of wasting real RAM pages from the domain memory
>>>> for establishing these mappings.
>>>>
>>>> The immediate benefit of this change:
>>>> - Avoid superpage shattering in Xen P2M when establishing
>>>>     stage-2 mapping (GFN <-> MFN) for the grant table space
>>>> - Avoid wasting real RAM pages (reducing the amount of memory
>>>>     usable) for mapping grant table space
>>>> - The grant table space is always mapped at the exact
>>>>     same place (region 0 is reserved for the grant table)
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> Changes RFC -> V2:
>>>>      - new patch
>>>> ---
>>>>    arch/arm/xen/enlighten.c | 32 +++++++++++++++++++++++++-------
>>>>    1 file changed, 25 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>> index 7f1c106b..dea46ec 100644
>>>> --- a/arch/arm/xen/enlighten.c
>>>> +++ b/arch/arm/xen/enlighten.c
>>>> @@ -59,6 +59,9 @@ unsigned long xen_released_pages;
>>>>    struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS]
>>>> __initdata;
>>>>      static __read_mostly unsigned int xen_events_irq;
>>>> +static phys_addr_t xen_grant_frames;
>>> __read_mostly
>> ok
>>
>>
>>>
>>>> +#define GRANT_TABLE_INDEX   0
>>>>      uint32_t xen_start_flags;
>>>>    EXPORT_SYMBOL(xen_start_flags);
>>>> @@ -303,6 +306,7 @@ static void __init xen_acpi_guest_init(void)
>>>>    static void __init xen_dt_guest_init(void)
>>>>    {
>>>>    	struct device_node *xen_node;
>>>> +	struct resource res;
>>>>      	xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
>>>>    	if (!xen_node) {
>>>> @@ -310,6 +314,12 @@ static void __init xen_dt_guest_init(void)
>>>>    		return;
>>>>    	}
>>>>    +	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
>>>> +		pr_err("Xen grant table region is not found\n");
>>>> +		return;
>>>> +	}
>>>> +	xen_grant_frames = res.start;
>>>> +
>>>>    	xen_events_irq = irq_of_parse_and_map(xen_node, 0);
>>>>    }
>>>>    @@ -317,16 +327,20 @@ static int __init xen_guest_init(void)
>>>>    {
>>>>    	struct xen_add_to_physmap xatp;
>>>>    	struct shared_info *shared_info_page = NULL;
>>>> -	int cpu;
>>>> +	int rc, cpu;
>>>>      	if (!xen_domain())
>>>>    		return 0;
>>>>      	if (!acpi_disabled)
>>>>    		xen_acpi_guest_init();
>>>> -	else
>>>> +	else {
>>>>    		xen_dt_guest_init();
>>>>    +		if (!xen_grant_frames)
>>>> +			return -ENODEV;
>>> maybe we can avoid this, see below
>>>
>>>
>>>> +	}
>>>> +
>>>>    	if (!xen_events_irq) {
>>>>    		pr_err("Xen event channel interrupt not found\n");
>>>>    		return -ENODEV;
>>>> @@ -370,12 +384,16 @@ static int __init xen_guest_init(void)
>>>>    	for_each_possible_cpu(cpu)
>>>>    		per_cpu(xen_vcpu_id, cpu) = cpu;
>>>>    -	xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
>>>> -	if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
>>>> -					  &xen_auto_xlat_grant_frames.vaddr,
>>>> -					  xen_auto_xlat_grant_frames.count)) {
>>>> +	if (!acpi_disabled) {
>>> To make the code more resilient couldn't we do:
>>>
>>> if (!acpi_disabled || !xen_grant_frames) {
>> I think, we can.
>>
>> On the one hand, the code is indeed more resilient and the change is smaller.
>> On the other hand, if the grant table region is not found then something weird
>> happened, as region 0 is always present in the reg property if the hypervisor
>> node is exposed to the guest.
>> The behavior before commit 3cf4095d7446 ("arm/xen: Use
>> xen_xlate_map_ballooned_pages to setup grant table") was exactly the same in
>> the context of the failure if the region wasn't found.
>>
>> ...
>>
>> Well, if we want to make the code more resilient, I will update it. But it
>> looks like we also need to swap the actions in xen_dt_guest_init() in order to
>> process xen_events_irq before xen_grant_frames; otherwise we may return after
>> failing to find the region and end up not initializing xen_events_irq, so
>> xen_guest_init() will fail before it reaches that check.
>> What do you think?
>   
> Yes, you are right. I was re-reading the patch to refresh my memory and
> I noticed immediately that xen_dt_guest_init also needs to be changed so
> that xen_events_irq is set before xen_grant_frames.
>   
> I think it is a minor change that doesn't add complexity but makes the
> code more robust, so I think it is a good idea


Great, thank you. Will do in next version.
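
A minimal sketch of the agreed change, assuming a simplified user-space
model of the DT node and guest state (all toy_* names are hypothetical):
the event channel IRQ is recorded before the grant table region is looked
up, and a missing region selects the ballooned-pages path instead of
failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_dt_node {
	bool has_region0;	/* "reg" entry 0 (grant table space) present */
	uint64_t region0_start;
	unsigned int irq;	/* 0 means "no event channel interrupt" */
};

struct toy_guest_state {
	unsigned int events_irq;
	uint64_t grant_frames;
};

enum toy_gnttab_path {
	TOY_GNTTAB_BALLOONED,	/* xen_xlate_map_ballooned_pages() path */
	TOY_GNTTAB_AUTO_XLAT,	/* gnttab_setup_auto_xlat_frames() path */
};

/* IRQ first, so that an early return for a missing region cannot skip it. */
static void toy_dt_guest_init(const struct toy_dt_node *dt,
			      struct toy_guest_state *st)
{
	assert(dt && st);

	st->events_irq = dt->irq;
	if (!dt->has_region0)
		return;
	st->grant_frames = dt->region0_start;
}

/* Mirrors the suggested "if (!acpi_disabled || !xen_grant_frames)" test. */
static enum toy_gnttab_path toy_pick_gnttab_path(bool acpi_disabled,
						 const struct toy_guest_state *st)
{
	assert(st);

	if (!acpi_disabled || !st->grant_frames)
		return TOY_GNTTAB_BALLOONED;
	return TOY_GNTTAB_AUTO_XLAT;
}
```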


>
>   
>>>> +		xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
>>>> +		rc = xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
>>>> +						   &xen_auto_xlat_grant_frames.vaddr,
>>>> +						   xen_auto_xlat_grant_frames.count);
>>>> +	} else
>>>> +		rc = gnttab_setup_auto_xlat_frames(xen_grant_frames);
>>>> +	if (rc) {
>>>>    		free_percpu(xen_vcpu_info);
>>>> -		return -ENOMEM;
>>>> +		return rc;
>>>>    	}
>>>>    	gnttab_init();

-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-11-19  1:19         ` Stefano Stabellini
@ 2021-11-19 20:23           ` Oleksandr
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-19 20:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Boris Ostrovsky, Juergen Gross, Julien Grall


On 19.11.21 03:19, Stefano Stabellini wrote:

Hi Stefano

> On Wed, 10 Nov 2021, Oleksandr wrote:
>> On 28.10.21 04:40, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> I am sorry for the late response.
>>
>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> This patch implements arch_xen_unpopulated_init() on Arm where
>>>> the extended regions (if any) are gathered from DT and inserted
>>>> into passed Xen resource to be used as unused address space
>>>> for Xen scratch pages by unpopulated-alloc code.
>>>>
>>>> The extended region (safe range) is a region of guest physical
>>>> address space which is unused and could be safely used to create
>>>> grant/foreign mappings instead of wasting real RAM pages from
>>>> the domain memory for establishing these mappings.
>>>>
>>>> The extended regions are chosen by the hypervisor at the domain
>>>> creation time and advertised to it via "reg" property under
>>>> hypervisor node in the guest device-tree. As region 0 is reserved
>>>> for grant table space (always present), the indexes for extended
>>>> regions are 1...N.
>>>>
>>>> If arch_xen_unpopulated_init() fails for some reason the default
>>>> behaviour will be restored (allocate xenballooned pages).
>>>>
>>>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> Changes RFC -> V2:
>>>>      - new patch, instead of
>>>>       "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>>>> ---
>>>>    arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>    drivers/xen/Kconfig      |   2 +-
>>>>    2 files changed, 113 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>> index dea46ec..1a1e0d3 100644
>>>> --- a/arch/arm/xen/enlighten.c
>>>> +++ b/arch/arm/xen/enlighten.c
>>>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>>>    static phys_addr_t xen_grant_frames;
>>>>      #define GRANT_TABLE_INDEX   0
>>>> +#define EXT_REGION_INDEX    1
>>>>      uint32_t xen_start_flags;
>>>>    EXPORT_SYMBOL(xen_start_flags);
>>>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>>>    #endif
>>>>    }
>>>>    +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>> +int arch_xen_unpopulated_init(struct resource *res)
>>>> +{
>>>> +	struct device_node *np;
>>>> +	struct resource *regs, *tmp_res;
>>>> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
>>>> +	unsigned int i, nr_reg = 0;
>>>> +	struct range mhp_range;
>>>> +	int rc;
>>>> +
>>>> +	if (!xen_domain())
>>>> +		return -ENODEV;
>>>> +
>>>> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
>>>> +	if (WARN_ON(!np))
>>>> +		return -ENODEV;
>>>> +
>>>> +	/* Skip region 0 which is reserved for grant table space */
>>>> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>>>> +		nr_reg++;
>>>> +	if (!nr_reg) {
>>>> +		pr_err("No extended regions are found\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>>>> +	if (!regs)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/*
>>>> +	 * Create resource from extended regions provided by the hypervisor to be
>>>> +	 * used as unused address space for Xen scratch pages.
>>>> +	 */
>>>> +	for (i = 0; i < nr_reg; i++) {
>>>> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
>>>> +		if (rc)
>>>> +			goto err;
>>>> +
>>>> +		if (max_gpaddr < regs[i].end)
>>>> +			max_gpaddr = regs[i].end;
>>>> +		if (min_gpaddr > regs[i].start)
>>>> +			min_gpaddr = regs[i].start;
>>>> +	}
>>>> +
>>>> +	/* Check whether the resource range is within the hotpluggable range */
>>>> +	mhp_range = mhp_get_pluggable_range(true);
>>>> +	if (min_gpaddr < mhp_range.start)
>>>> +		min_gpaddr = mhp_range.start;
>>>> +	if (max_gpaddr > mhp_range.end)
>>>> +		max_gpaddr = mhp_range.end;
>>>> +
>>>> +	res->start = min_gpaddr;
>>>> +	res->end = max_gpaddr;
>>>> +
>>>> +	/*
>>>> +	 * Mark holes between extended regions as unavailable. The rest of that
>>>> +	 * address space will be available for the allocation.
>>>> +	 */
>>>> +	for (i = 1; i < nr_reg; i++) {
>>>> +		resource_size_t start, end;
>>>> +
>>>> +		start = regs[i - 1].end + 1;
>>>> +		end = regs[i].start - 1;
>>>> +
>>>> +		if (start > (end + 1)) {
>>> Should this be:
>>>
>>> if (start >= end)
>>>
>>> ?
>> Yes, we can do this here (since the checks are equivalent) but ...
>>
>>>> +			rc = -EINVAL;
>>>> +			goto err;
>>>> +		}
>>>> +
>>>> +		/* There is no hole between regions */
>>>> +		if (start == (end + 1))
>>> Also here, shouldn't it be:
>>>
>>> if (start == end)
>>>
>>> ?
>>     ... not here.
>>
>> As
>>
>> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
>>
>> but
>>
>> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
>   
> OK. But the check:
>
>    if (start >= end)
>
> Actually covers both cases so that's the only check we need?

Sorry, I don't entirely understand the question.
Is the question to use only a single check in that loop?

For convenience, here is the updated code that I have locally.

  [snip]

     /*
      * Mark holes between extended regions as unavailable. The rest of that
      * address space will be available for the allocation.
      */
     for (i = 1; i < nr_reg; i++) {
         resource_size_t start, end;

         start = regs[i - 1].end + 1;
         end = regs[i].start - 1;

         if (start > (end + 1)) {
             rc = -EINVAL;
             goto err;
         }

         /* There is no hole between regions */
         if (start == (end + 1))
             continue;

         tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
         if (!tmp_res) {
             rc = -ENOMEM;
             goto err;
         }

         tmp_res->name = "Unavailable space";
         tmp_res->start = start;
         tmp_res->end = end;

         rc = insert_resource(&xen_resource, tmp_res);
         if (rc) {
             pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc);
             kfree(tmp_res);
             goto err;
         }
     }

[snip]


1. The first check detects an overlap (which is a wrong configuration,
correct?) and bails out if one is found (for example, regX:
0x81000000...0x82FFFFFF and regY: 0x82000000...0x83FFFFFF).
2. The second check just skips the current iteration when there is no
space/hole between the regions (for example, regX: 0x81000000...0x82FFFFFF
and regY: 0x83000000...0x83FFFFFF).
Therefore I think they should be distinguished.

Yes, both checks could be merged into a single one, but then the
overlaps would be silently ignored:
if (start >= (end + 1))
     continue;

Or did I really miss something?
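
To make the distinction concrete, here is a small user-space sketch of the two
checks. It is an illustration only: struct region, enum gap_kind and
classify_gap() are hypothetical names, not kernel API. Ranges are inclusive,
following the "end = start + size - 1" rule, and the regions are assumed to be
sorted by start address.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range: "end" is the last address inside the region. */
struct region {
	uint64_t start;
	uint64_t end;
};

enum gap_kind { GAP_OVERLAP, GAP_NONE, GAP_HOLE };

/*
 * Classify the space between two consecutive regions.
 * On GAP_HOLE, *hole receives the inclusive hole range.
 */
static enum gap_kind classify_gap(struct region prev, struct region next,
				  struct region *hole)
{
	assert(hole);
	assert(prev.start <= prev.end && next.start <= next.end);

	/* Same condition as "start > (end + 1)", written without the +1 wrap */
	if (prev.end >= next.start)
		return GAP_OVERLAP;

	/* Same condition as "start == (end + 1)": regions are back to back */
	if (prev.end + 1 == next.start)
		return GAP_NONE;

	hole->start = prev.end + 1;
	hole->end = next.start - 1;
	return GAP_HOLE;
}
```

Merging the first two branches into one "skip" branch would make GAP_OVERLAP
indistinguishable from GAP_NONE, which is exactly the concern raised above.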


>
>
>>> I think I am missing again something in termination accounting :-)
>> If I understand correctly, we need to follow "end = start + size - 1" rule, so
>> the "end" is the last address inside a range, but not the "first" address
>> outside of a range))
> yeah
>   
>
>>>> +			continue;
>>>> +
>>>> +		/* Check whether the hole range is within the resource range */
>>>> +		if (start < res->start || end > res->end) {
>>> By definition I don't think this check is necessary as either condition
>>> is impossible?
>>
>> This is a good question, let me explain.
>> Not all extended regions provided by the hypervisor can be used here. This is
>> because the addressable physical memory range for which the linear mapping
>> can be created is limited on Arm, and the maximum addressable range depends on
>> the VA space size (CONFIG_ARM64_VA_BITS_XXX). We decided not to filter them
>> in the hypervisor, as that logic could be quite complex: different OSes may
>> have different requirements, etc. This means that we need to make sure that
>> regions are within the hotpluggable range to avoid a failure later on, when a
>> region is pre-validated by the memory hotplug path.
>>
>> The following code limits the resource range based on that:
>>
>> +    /* Check whether the resource range is within the hotpluggable range */
>> +    mhp_range = mhp_get_pluggable_range(true);
>> +    if (min_gpaddr < mhp_range.start)
>> +        min_gpaddr = mhp_range.start;
>> +    if (max_gpaddr > mhp_range.end)
>> +        max_gpaddr = mhp_range.end;
>> +
>> +    res->start = min_gpaddr;
>> +    res->end = max_gpaddr;
>>
>> In the current loop (when calculating and inserting holes) we also need to make
>> sure that the resulting hole range is within the resource range (and adjust/skip
>> it if not), as regs[] used for the calculations contains the raw regions as
>> they are described in DT, i.e. not clamped. Otherwise insert_resource() further
>> down the function will return an error for the conflicting operations. Yes, I
>> could have taken a different route and updated regs[] in advance to adjust/skip
>> non-suitable regions up front, but I decided to do it on the fly in the loop
>> here, as I thought doing it in advance would add some overhead/complexity. What
>> do you think?
> I understand now.
>
>
>> So I am afraid this check is necessary here.
>>
>> For example in my environment the extended regions are:
>>
>> (XEN) Extended region 0: 0->0x8000000
>> (XEN) Extended region 1: 0xc000000->0x30000000
>> (XEN) Extended region 2: 0x40000000->0x47e00000
>> (XEN) Extended region 3: 0xd0000000->0xe6000000
>> (XEN) Extended region 4: 0xe7800000->0xec000000
>> (XEN) Extended region 5: 0xf1200000->0xfd000000
>> (XEN) Extended region 6: 0x100000000->0x500000000
>> (XEN) Extended region 7: 0x580000000->0x600000000
>> (XEN) Extended region 8: 0x680000000->0x700000000
>> (XEN) Extended region 9: 0x780000000->0x10000000000
>>
>> *With* the check the holes are:
>>
>> holes [47e00000 - cfffffff]
>> holes [e6000000 - e77fffff]
>> holes [ec000000 - f11fffff]
>> holes [fd000000 - ffffffff]
>> holes [500000000 - 57fffffff]
>> holes [600000000 - 67fffffff]
>> holes [700000000 - 77fffffff]
>>
>> And they look correct: you can see that the two possible holes between
>> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
>> skipped, as they are located entirely below res->start,
>> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
>>
>> *Without* the check these two holes won't be skipped and as a result
>> insert_resource() will fail.
>>
>>
>> **********
>>
>>
>> I have an idea for simplifying the filter logic: we can drop all checks here
>> (including the confusing one) in the Arm code and update the common code a bit:
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index 1a1e0d3..ed5b855 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>>          struct resource *regs, *tmp_res;
>>          uint64_t min_gpaddr = -1, max_gpaddr = 0;
>>          unsigned int i, nr_reg = 0;
>> -       struct range mhp_range;
>>          int rc;
>>
>>          if (!xen_domain())
>> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>>                          min_gpaddr = regs[i].start;
>>          }
>>
>> -       /* Check whether the resource range is within the hotpluggable range */
>> -       mhp_range = mhp_get_pluggable_range(true);
>> -       if (min_gpaddr < mhp_range.start)
>> -               min_gpaddr = mhp_range.start;
>> -       if (max_gpaddr > mhp_range.end)
>> -               max_gpaddr = mhp_range.end;
>> -
>>          res->start = min_gpaddr;
>>          res->end = max_gpaddr;
>>
>> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>>                  if (start == (end + 1))
>>                          continue;
>>
>> -               /* Check whether the hole range is within the resource range */
>> -               if (start < res->start || end > res->end) {
>> -                       if (start < res->start)
>> -                               start = res->start;
>> -                       if (end > res->end)
>> -                               end = res->end;
>> -
>> -                       if (start >= (end + 1))
>> -                               continue;
>> -               }
>> -
>>                  tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>                  if (!tmp_res) {
>>                          rc = -ENOMEM;
>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>> index 1f1d8d8..a5d3ebb 100644
>> --- a/drivers/xen/unpopulated-alloc.c
>> +++ b/drivers/xen/unpopulated-alloc.c
>> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
>>          void *vaddr;
>>          unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>>          int ret;
>> +       struct range mhp_range;
>>
>>          res = kzalloc(sizeof(*res), GFP_KERNEL);
>>          if (!res)
>> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
>>          res->name = "Xen scratch";
>>          res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>
>> +       mhp_range = mhp_get_pluggable_range(true);
>> +
>>          ret = allocate_resource(target_resource, res,
>> -                               alloc_pages * PAGE_SIZE, 0, -1,
>> +                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
>>                                  PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>          if (ret < 0) {
>>                  pr_err("Cannot allocate new IOMEM resource\n");
>>
>> I believe this will work on x86, as arch_get_mappable_range() is not
>> implemented there,
>> and the default implementation returns exactly what is being used currently (0, -1).
>>
>> struct range __weak arch_get_mappable_range(void)
>> {
>>      struct range mhp_range = {
>>          .start = 0UL,
>>          .end = -1ULL,
>>      };
>>      return mhp_range;
>> }
>>
>> And this is going to be more generic and clear, what do you think?
> Yeah this is much better, good thinking!

Great, thank you. Will do in next version.
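
As a rough illustration of why passing the pluggable range as min/max is
enough, here is a user-space model of a first-fit search restricted to a
[min, max] window. It is a sketch under stated assumptions, not the kernel's
allocate_resource() implementation: struct span and pick_window() are
hypothetical names, and the 128 MiB size/alignment used in the usage note is
only an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address span, e.g. one extended region. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * First-fit search for "size" bytes aligned to "align" (a power of two)
 * inside the free spans, restricted to the window [min, max].
 * Returns 0 and stores the base address in *out, or -1 if nothing fits.
 */
static int pick_window(const struct span *free_spans, size_t n,
		       uint64_t size, uint64_t align,
		       uint64_t min, uint64_t max, uint64_t *out)
{
	size_t i;

	assert(out && size > 0);
	assert(align > 0 && (align & (align - 1)) == 0);

	for (i = 0; i < n; i++) {
		/* Intersect the span with the allowed window */
		uint64_t lo = free_spans[i].start > min ? free_spans[i].start : min;
		uint64_t hi = free_spans[i].end < max ? free_spans[i].end : max;
		uint64_t base;

		if (lo > hi)
			continue; /* span lies entirely outside the window */

		/* Round lo up to the alignment, skipping on wrap-around */
		base = (lo + align - 1) & ~(align - 1);
		if (base < lo)
			continue;

		/* Inclusive length check: hi - base + 1 >= size */
		if (base <= hi && hi - base >= size - 1) {
			*out = base;
			return 0;
		}
	}

	return -1;
}
```

With the first four extended regions listed earlier and a 128 MiB request, the
default window (0, -1) picks address 0, while the window
0x40000000 - 0x80003fffffff skips regions 0-2 and picks 0xd0000000. Spans below
the window are ignored without any per-hole clamping in the Arm code.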


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
@ 2021-11-19 20:23           ` Oleksandr
  0 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-19 20:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Boris Ostrovsky, Juergen Gross, Julien Grall


On 19.11.21 03:19, Stefano Stabellini wrote:

Hi Stefano

> On Wed, 10 Nov 2021, Oleksandr wrote:
>> On 28.10.21 04:40, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> I am sorry for the late response.
>>
>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> This patch implements arch_xen_unpopulated_init() on Arm where
>>>> the extended regions (if any) are gathered from DT and inserted
>>>> into passed Xen resource to be used as unused address space
>>>> for Xen scratch pages by unpopulated-alloc code.
>>>>
>>>> The extended region (safe range) is a region of guest physical
>>>> address space which is unused and could be safely used to create
>>>> grant/foreign mappings instead of wasting real RAM pages from
>>>> the domain memory for establishing these mappings.
>>>>
>>>> The extended regions are chosen by the hypervisor at the domain
>>>> creation time and advertised to it via "reg" property under
>>>> hypervisor node in the guest device-tree. As region 0 is reserved
>>>> for grant table space (always present), the indexes for extended
>>>> regions are 1...N.
>>>>
>>>> If arch_xen_unpopulated_init() fails for some reason the default
>>>> behaviour will be restored (allocate xenballooned pages).
>>>>
>>>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> Changes RFC -> V2:
>>>>      - new patch, instead of
>>>>       "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
>>>> unallocated space"
>>>> ---
>>>>    arch/arm/xen/enlighten.c | 112
>>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>>    drivers/xen/Kconfig      |   2 +-
>>>>    2 files changed, 113 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>> index dea46ec..1a1e0d3 100644
>>>> --- a/arch/arm/xen/enlighten.c
>>>> +++ b/arch/arm/xen/enlighten.c
>>>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>>>    static phys_addr_t xen_grant_frames;
>>>>      #define GRANT_TABLE_INDEX   0
>>>> +#define EXT_REGION_INDEX    1
>>>>      uint32_t xen_start_flags;
>>>>    EXPORT_SYMBOL(xen_start_flags);
>>>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>>>    #endif
>>>>    }
>>>>    +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>> +int arch_xen_unpopulated_init(struct resource *res)
>>>> +{
>>>> +	struct device_node *np;
>>>> +	struct resource *regs, *tmp_res;
>>>> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
>>>> +	unsigned int i, nr_reg = 0;
>>>> +	struct range mhp_range;
>>>> +	int rc;
>>>> +
>>>> +	if (!xen_domain())
>>>> +		return -ENODEV;
>>>> +
>>>> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
>>>> +	if (WARN_ON(!np))
>>>> +		return -ENODEV;
>>>> +
>>>> +	/* Skip region 0 which is reserved for grant table space */
>>>> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>>>> +		nr_reg++;
>>>> +	if (!nr_reg) {
>>>> +		pr_err("No extended regions are found\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>>>> +	if (!regs)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/*
>>>> +	 * Create resource from extended regions provided by the hypervisor to
>>>> be
>>>> +	 * used as unused address space for Xen scratch pages.
>>>> +	 */
>>>> +	for (i = 0; i < nr_reg; i++) {
>>>> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
>>>> &regs[i]);
>>>> +		if (rc)
>>>> +			goto err;
>>>> +
>>>> +		if (max_gpaddr < regs[i].end)
>>>> +			max_gpaddr = regs[i].end;
>>>> +		if (min_gpaddr > regs[i].start)
>>>> +			min_gpaddr = regs[i].start;
>>>> +	}
>>>> +
>>>> +	/* Check whether the resource range is within the hotpluggable range
>>>> */
>>>> +	mhp_range = mhp_get_pluggable_range(true);
>>>> +	if (min_gpaddr < mhp_range.start)
>>>> +		min_gpaddr = mhp_range.start;
>>>> +	if (max_gpaddr > mhp_range.end)
>>>> +		max_gpaddr = mhp_range.end;
>>>> +
>>>> +	res->start = min_gpaddr;
>>>> +	res->end = max_gpaddr;
>>>> +
>>>> +	/*
>>>> +	 * Mark holes between extended regions as unavailable. The rest of
>>>> that
>>>> +	 * address space will be available for the allocation.
>>>> +	 */
>>>> +	for (i = 1; i < nr_reg; i++) {
>>>> +		resource_size_t start, end;
>>>> +
>>>> +		start = regs[i - 1].end + 1;
>>>> +		end = regs[i].start - 1;
>>>> +
>>>> +		if (start > (end + 1)) {
>>> Should this be:
>>>
>>> if (start >= end)
>>>
>>> ?
>> Yes, we can do this here (since the checks are equivalent) but ...
>>
>>>> +			rc = -EINVAL;
>>>> +			goto err;
>>>> +		}
>>>> +
>>>> +		/* There is no hole between regions */
>>>> +		if (start == (end + 1))
>>> Also here, shouldn't it be:
>>>
>>> if (start == end)
>>>
>>> ?
>>     ... not here.
>>
>> As
>>
>> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
>>
>> but
>>
>> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
>   
> OK. But the check:
>
>    if (start >= end)
>
> Actually covers both cases so that's the only check we need?

Sorry, I don't entirely understand the question.
Is the question to use only a single check in that loop?

Paste the updated code which I have locally for the convenience.

  [snip]

     /*
      * Mark holes between extended regions as unavailable. The rest of that
      * address space will be available for the allocation.
      */
     for (i = 1; i < nr_reg; i++) {
         resource_size_t start, end;

         start = regs[i - 1].end + 1;
         end = regs[i].start - 1;

         if (start > (end + 1)) {
             rc = -EINVAL;
             goto err;
         }

         /* There is no hole between regions */
         if (start == (end + 1))
             continue;

         tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
         if (!tmp_res) {
             rc = -ENOMEM;
             goto err;
         }

         tmp_res->name = "Unavailable space";
         tmp_res->start = start;
         tmp_res->end = end;

         rc = insert_resource(&xen_resource, tmp_res);
         if (rc) {
             pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc);
             kfree(tmp_res);
             goto err;
         }
     }

[snip]


1. The first check is to detect an overlap (which is a wrong 
configuration, correct?) and bail out if true (for example, regX: 
0x81000000...0x82FFFFFF and regY: 0x82000000...0x83FFFFFF).
2. The second check is just to skip current iteration as there is no 
space/hole between regions (for example, regX: 0x81000000...0x82FFFFFF 
and regY: 0x83000000...0x83FFFFFF).
Therefore I think they should be distinguished.

Yes, both check could be transformed to a single one, but this way the 
overlaps will be ignored:
if (start >= (end + 1))
     continue;

Or I really missed something?


>
>
>>> I think I am missing again something in termination accounting :-)
>> If I understand correctly, we need to follow "end = start + size - 1" rule, so
>> the "end" is the last address inside a range, but not the "first" address
>> outside of a range))
> yeah
>   
>
>>>> +			continue;
>>>> +
>>>> +		/* Check whether the hole range is within the resource range
>>>> */
>>>> +		if (start < res->start || end > res->end) {
>>> By definition I don't think this check is necessary as either condition
>>> is impossible?
>>
>> This is a good question, let me please explain.
>> Not all extended regions provided by the hypervisor can be used here. This is
>> because the addressable physical memory range for which the linear mapping
>> could be created has limits on Arm, and maximum addressable range depends on
>> the VA space size (CONFIG_ARM64_VA_BITS_XXX). So we decided to not filter them
>> in hypervisor as this logic could be quite complex as different OS may have
>> different requirement, etc. This means that we need to make sure that regions
>> are within the hotpluggable range to avoid a failure later on when a region is
>> pre-validated by the memory hotplug path.
>>
>> The following code limits the resource range based on that:
>>
>> +    /* Check whether the resource range is within the hotpluggable range */
>> +    mhp_range = mhp_get_pluggable_range(true);
>> +    if (min_gpaddr < mhp_range.start)
>> +        min_gpaddr = mhp_range.start;
>> +    if (max_gpaddr > mhp_range.end)
>> +        max_gpaddr = mhp_range.end;
>> +
>> +    res->start = min_gpaddr;
>> +    res->end = max_gpaddr;
>>
>> In current loop (when calculating and inserting holes) we also need to make
>> sure that resulting hole range is within the resource range (and adjust/skip
>> it if not true) as regs[] used for the calculations contains raw regions as
>> they described in DT so not updated. Otherwise insert_resource() down the
>> function will return an error for the conflicting operations. Yes, I could
>> took a different route and update regs[] in advance to adjust/skip
>> non-suitable regions in front, but I decided to do it on the fly in the loop
>> here, I thought doing it in advance would add some overhead/complexity. What
>> do you think?
> I understand now.
>
>
>> So I am afraid this check is necessary here.
>>
>> For example in my environment the extended regions are:
>>
>> (XEN) Extended region 0: 0->0x8000000
>> (XEN) Extended region 1: 0xc000000->0x30000000
>> (XEN) Extended region 2: 0x40000000->0x47e00000
>> (XEN) Extended region 3: 0xd0000000->0xe6000000
>> (XEN) Extended region 4: 0xe7800000->0xec000000
>> (XEN) Extended region 5: 0xf1200000->0xfd000000
>> (XEN) Extended region 6: 0x100000000->0x500000000
>> (XEN) Extended region 7: 0x580000000->0x600000000
>> (XEN) Extended region 8: 0x680000000->0x700000000
>> (XEN) Extended region 9: 0x780000000->0x10000000000
>>
>> *With* the check the holes are:
>>
>> holes [47e00000 - cfffffff]
>> holes [e6000000 - e77fffff]
>> holes [ec000000 - f11fffff]
>> holes [fd000000 - ffffffff]
>> holes [500000000 - 57fffffff]
>> holes [600000000 - 67fffffff]
>> holes [700000000 - 77fffffff]
>>
>> And they seem to look correct, you can see that two possible holes between
>> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
>> skipped as they entirely located below res->start
>> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
>>
>> *Without* the check these two holes won't be skipped and as the result
>> insert_resource() will fail.
>>
>>
>> **********
>>
>>
>> I have one idea how we can simplify filter logic, we can drop all checks here
>> (including confusing one) in Arm code and update common code a bit:
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index 1a1e0d3..ed5b855 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>>          struct resource *regs, *tmp_res;
>>          uint64_t min_gpaddr = -1, max_gpaddr = 0;
>>          unsigned int i, nr_reg = 0;
>> -       struct range mhp_range;
>>          int rc;
>>
>>          if (!xen_domain())
>> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>>                          min_gpaddr = regs[i].start;
>>          }
>>
>> -       /* Check whether the resource range is within the hotpluggable range */
>> -       mhp_range = mhp_get_pluggable_range(true);
>> -       if (min_gpaddr < mhp_range.start)
>> -               min_gpaddr = mhp_range.start;
>> -       if (max_gpaddr > mhp_range.end)
>> -               max_gpaddr = mhp_range.end;
>> -
>>          res->start = min_gpaddr;
>>          res->end = max_gpaddr;
>>
>> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>>                  if (start == (end + 1))
>>                          continue;
>>
>> -               /* Check whether the hole range is within the resource range */
>> -               if (start < res->start || end > res->end) {
>> -                       if (start < res->start)
>> -                               start = res->start;
>> -                       if (end > res->end)
>> -                               end = res->end;
>> -
>> -                       if (start >= (end + 1))
>> -                               continue;
>> -               }
>> -
>>                  tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>                  if (!tmp_res) {
>>                          rc = -ENOMEM;
>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>> index 1f1d8d8..a5d3ebb 100644
>> --- a/drivers/xen/unpopulated-alloc.c
>> +++ b/drivers/xen/unpopulated-alloc.c
>> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
>>          void *vaddr;
>>          unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>>          int ret;
>> +       struct range mhp_range;
>>
>>          res = kzalloc(sizeof(*res), GFP_KERNEL);
>>          if (!res)
>> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
>>          res->name = "Xen scratch";
>>          res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>
>> +       mhp_range = mhp_get_pluggable_range(true);
>> +
>>          ret = allocate_resource(target_resource, res,
>> -                               alloc_pages * PAGE_SIZE, 0, -1,
>> +                               alloc_pages * PAGE_SIZE, mhp_range.start, mhp_range.end,
>>                                  PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>          if (ret < 0) {
>>                  pr_err("Cannot allocate new IOMEM resource\n");
>>
>> I believe this will work on x86, as arch_get_mappable_range() is not
>> implemented there and the default implementation returns exactly what is
>> being used currently (0, -1).
>>
>> struct range __weak arch_get_mappable_range(void)
>> {
>>      struct range mhp_range = {
>>          .start = 0UL,
>>          .end = -1ULL,
>>      };
>>      return mhp_range;
>> }
>>
>> And this is going to be more generic and clear, what do you think?
> Yeah this is much better, good thinking!

Great, thank you. Will do in next version.
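
For readers following the hole filtering discussed above, here is a minimal userspace sketch (not the kernel code). It recomputes the holes between the extended regions from the quoted log and clamps them to a window [lo, hi]. The names span, collect_holes and example_regs are hypothetical. The inclusive end addresses are an assumption inferred from the quoted hole list, where each hole starts exactly at the logged end of the preceding region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct span {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

/* Extended regions 0..9 from the quoted log, converted to inclusive ends */
static const struct span example_regs[] = {
	{ 0x0ULL,         0x7ffffffULL },
	{ 0xc000000ULL,   0x2fffffffULL },
	{ 0x40000000ULL,  0x47dfffffULL },
	{ 0xd0000000ULL,  0xe5ffffffULL },
	{ 0xe7800000ULL,  0xebffffffULL },
	{ 0xf1200000ULL,  0xfcffffffULL },
	{ 0x100000000ULL, 0x4ffffffffULL },
	{ 0x580000000ULL, 0x5ffffffffULL },
	{ 0x680000000ULL, 0x6ffffffffULL },
	{ 0x780000000ULL, 0xffffffffffULL },
};

/*
 * Collect the holes between consecutive regions, clamped to [lo, hi].
 * Regions are assumed to be sorted by address and non-overlapping.
 * Returns the number of holes written to out (at most nr_reg - 1).
 */
static size_t collect_holes(const struct span *regs, size_t nr_reg,
			    uint64_t lo, uint64_t hi, struct span *out)
{
	size_t i, n = 0;

	for (i = 1; i < nr_reg; i++) {
		uint64_t start = regs[i - 1].end + 1;
		uint64_t end = regs[i].start - 1;

		assert(regs[i - 1].start <= regs[i].start);

		/* Adjacent regions: there is no hole to record */
		if (start == end + 1)
			continue;

		/* Clamp to the window and drop holes entirely outside it */
		if (start < lo)
			start = lo;
		if (end > hi)
			end = hi;
		if (start >= end + 1)
			continue;

		out[n].start = start;
		out[n].end = end;
		n++;
	}
	return n;
}
```

Called with lo = 0x40000000 and hi = 0x80003fffffff, the sketch keeps the seven holes listed in the mail. Called with lo = 0 and hi = UINT64_MAX, it also returns the two holes below 0x40000000.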


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-19 18:18         ` Oleksandr
@ 2021-11-20  2:19           ` Stefano Stabellini
  2021-11-23 16:46             ` Oleksandr
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-20  2:19 UTC (permalink / raw)
  To: Oleksandr, jgross
  Cc: Stefano Stabellini, xen-devel, linux-kernel,
	Oleksandr Tyshchenko, Boris Ostrovsky, Julien Grall

Juergen, please see the bottom of the email.

On Fri, 19 Nov 2021, Oleksandr wrote:
> On 19.11.21 02:59, Stefano Stabellini wrote:
> > On Tue, 9 Nov 2021, Oleksandr wrote:
> > > On 28.10.21 19:37, Stefano Stabellini wrote:
> > > 
> > > Hi Stefano
> > > 
> > > I am sorry for the late response.
> > > 
> > > > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > 
> > > > > The main reason of this change is that unpopulated-alloc
> > > > > code cannot be used in its current form on Arm, but there
> > > > > is a desire to reuse it to avoid wasting real RAM pages
> > > > > for the grant/foreign mappings.
> > > > > 
> > > > > The problem is that system "iomem_resource" is used for
> > > > > the address space allocation, but the really unallocated
> > > > > space can't be figured out precisely by the domain on Arm
> > > > > without hypervisor involvement. For example, not all device
> > > > > I/O regions are known by the time domain starts creating
> > > > > > grant/foreign mappings. And following the advice from
> > > > > > "iomem_resource" we might end up reusing these regions by
> > > > > > mistake. So, the hypervisor which maintains the P2M for
> > > > > the domain is in the best position to provide unused regions
> > > > > of guest physical address space which could be safely used
> > > > > to create grant/foreign mappings.
> > > > > 
> > > > > > Introduce a new helper arch_xen_unpopulated_init() whose purpose
> > > > > > is to create a specific Xen resource based on the memory regions
> > > > > provided by the hypervisor to be used as unused space for Xen
> > > > > scratch pages.
> > > > > 
> > > > > If arch doesn't implement arch_xen_unpopulated_init() to
> > > > > initialize Xen resource the default "iomem_resource" will be used.
> > > > > So the behavior on x86 won't be changed.
> > > > > 
> > > > > Also fall back to allocate xenballooned pages (steal real RAM
> > > > > pages) if we do not have any suitable resource to work with and
> > > > > as the result we won't be able to provide unpopulated pages.
> > > > > 
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > ---
> > > > > Changes RFC -> V2:
> > > > >      - new patch, instead of
> > > > >       "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
> > > > > provide
> > > > > unallocated space"
> > > > > ---
> > > > >    drivers/xen/unpopulated-alloc.c | 89
> > > > > +++++++++++++++++++++++++++++++++++++++--
> > > > >    include/xen/xen.h               |  2 +
> > > > >    2 files changed, 88 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/xen/unpopulated-alloc.c
> > > > > b/drivers/xen/unpopulated-alloc.c
> > > > > index a03dc5b..1f1d8d8 100644
> > > > > --- a/drivers/xen/unpopulated-alloc.c
> > > > > +++ b/drivers/xen/unpopulated-alloc.c
> > > > > @@ -8,6 +8,7 @@
> > > > >      #include <asm/page.h>
> > > > >    +#include <xen/balloon.h>
> > > > >    #include <xen/page.h>
> > > > >    #include <xen/xen.h>
> > > > >    @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
> > > > >    static struct page *page_list;
> > > > >    static unsigned int list_count;
> > > > >    +static struct resource *target_resource;
> > > > > +static struct resource xen_resource = {
> > > > > +	.name = "Xen unused space",
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * If arch is not happy with system "iomem_resource" being used for
> > > > > > + * the region allocation it can provide its own view by initializing
> > > > > + * "xen_resource" with unused regions of guest physical address space
> > > > > + * provided by the hypervisor.
> > > > > + */
> > > > > +int __weak arch_xen_unpopulated_init(struct resource *res)
> > > > > +{
> > > > > +	return -ENOSYS;
> > > > > +}
> > > > > +
> > > > >    static int fill_list(unsigned int nr_pages)
> > > > >    {
> > > > >    	struct dev_pagemap *pgmap;
> > > > > -	struct resource *res;
> > > > > +	struct resource *res, *tmp_res = NULL;
> > > > >    	void *vaddr;
> > > > >    	unsigned int i, alloc_pages = round_up(nr_pages,
> > > > > PAGES_PER_SECTION);
> > > > > -	int ret = -ENOMEM;
> > > > > +	int ret;
> > > > >      	res = kzalloc(sizeof(*res), GFP_KERNEL);
> > > > >    	if (!res)
> > > > > @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
> > > > >    	res->name = "Xen scratch";
> > > > >    	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > > > >    -	ret = allocate_resource(&iomem_resource, res,
> > > > > +	ret = allocate_resource(target_resource, res,
> > > > >    				alloc_pages * PAGE_SIZE, 0, -1,
> > > > >    				PAGES_PER_SECTION * PAGE_SIZE, NULL,
> > > > > NULL);
> > > > >    	if (ret < 0) {
> > > > > @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
> > > > >    		goto err_resource;
> > > > >    	}
> > > > >    +	/*
> > > > > +	 * Reserve the region previously allocated from Xen resource
> > > > > to avoid
> > > > > +	 * re-using it by someone else.
> > > > > +	 */
> > > > > +	if (target_resource != &iomem_resource) {
> > > > > +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
> > > > > +		if (!res) {
> > > > > +			ret = -ENOMEM;
> > > > > +			goto err_insert;
> > > > > +		}
> > > > > +
> > > > > +		tmp_res->name = res->name;
> > > > > +		tmp_res->start = res->start;
> > > > > +		tmp_res->end = res->end;
> > > > > +		tmp_res->flags = res->flags;
> > > > > +
> > > > > +		ret = insert_resource(&iomem_resource, tmp_res);
> > > > > +		if (ret < 0) {
> > > > > +			pr_err("Cannot insert IOMEM resource [%llx -
> > > > > %llx]\n",
> > > > > +			       tmp_res->start, tmp_res->end);
> > > > > +			kfree(tmp_res);
> > > > > +			goto err_insert;
> > > > > +		}
> > > > > +	}
> > > > I am a bit confused.. why do we need to do this? Who could be
> > > > erroneously re-using the region? Are you saying that the next time
> > > > allocate_resource is called it could find the same region again? It
> > > > doesn't seem possible?
> > > 
> > > No, as I understand it, allocate_resource() called for the same root
> > > resource won't provide the same region... We only need to do this
> > > (insert the region into "iomem_resource") if we allocated it from our
> > > *internal* "xen_resource", as the *global* "iomem_resource" (which is
> > > used everywhere) is not aware that the region has already been
> > > allocated. So by inserting the region here we reserve it; otherwise it
> > > could be reused elsewhere.
> > But elsewhere where?
> 
> I think, theoretically everywhere where allocate_resource(&iomem_resource,
> ...) is called.
> 
> 
> > Let's say that allocate_resource allocates a range from xen_resource.
> >  From reading the code, it doesn't look like iomem_resource would have
> > that range because the extended regions described under /hypervisor are
> > not added automatically to iomem_resource.
> > 
> > So what if we don't call insert_resource? Nothing could allocate the
> > same range because iomem_resource doesn't have it at all and
> > xen_resource is not used anywhere if not here.
> > 
> > What am I missing?
> 
> 
> Below is my understanding which, of course, might be wrong.
> 
> If we don't claim resource by calling insert_resource (or even
> request_resource) here then the same range could be allocated everywhere where
> allocate_resource(&iomem_resource, ...) is called.
> I don't see what prevents the same range from being allocated. Why actually
> allocate_resource(&iomem_resource, ...) can't provide the same range if it is
> free (not-reserved-yet) from its PoV? The comment above allocate_resource()
> says "allocate empty slot in the resource tree given range & alignment". So
> this "empty slot" could be exactly the same range.
> 
> I experimented with that a bit trying to call
> allocate_resource(&iomem_resource, ...) several times in another place to see
> what ranges it returns in both cases (w/ and w/o calling insert_resource
> here). So an experiment confirmed (of course, if I made it correctly) that the
> same range could be allocated if we didn't call insert_resource() here. And as
> I understand there is nothing strange here, as iomem_resource covers all
> address space initially (0, -1) and everything *not* inserted/requested (in
> other words, reserved) yet is considered as free and could be provided if fits
> constraints. Or I really missed something?

Thanks for the explanation! It was me that didn't know that
iomem_resource covers all the address space initially. I thought it was
populated only with actual iomem ranges. Now it makes sense, thanks!
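
To illustrate the point, here is a toy userspace model (a sketch only, not the kernel's resource tree). Each root starts out covering its whole range as free, and an allocator hands out the lowest free range at or above a minimum address. All names (toy_root, toy_insert, toy_allocate) are hypothetical, and the min parameter is a simplification of the min/max constraints of allocate_resource().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BUSY 16

struct toy_span {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

struct toy_root {
	struct toy_span range;			/* everything here is free initially */
	struct toy_span busy[TOY_MAX_BUSY];	/* ranges already claimed */
	size_t nr_busy;
};

static int toy_overlaps(const struct toy_span *a, const struct toy_span *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Mark a range busy in root; fails if it clashes with an existing one */
static int toy_insert(struct toy_root *root, struct toy_span s)
{
	size_t i;

	if (root->nr_busy == TOY_MAX_BUSY)
		return -1;
	for (i = 0; i < root->nr_busy; i++)
		if (toy_overlaps(&root->busy[i], &s))
			return -1;
	root->busy[root->nr_busy++] = s;
	return 0;
}

/* First fit: lowest free range of the given size at or above min */
static int toy_allocate(struct toy_root *root, uint64_t min, uint64_t size,
			struct toy_span *out)
{
	struct toy_span c;
	size_t i = 0;

	assert(size > 0);
	c.start = min > root->range.start ? min : root->range.start;
	c.end = c.start + size - 1;

	while (i < root->nr_busy) {
		if (toy_overlaps(&root->busy[i], &c)) {
			/* No room left above a range that reaches the end */
			if (root->busy[i].end >= root->range.end)
				return -1;
			c.start = root->busy[i].end + 1;
			c.end = c.start + size - 1;
			i = 0;	/* rescan with the new candidate */
			continue;
		}
		i++;
	}
	if (c.end < c.start || c.end > root->range.end)
		return -1;
	if (toy_insert(root, c))
		return -1;
	*out = c;
	return 0;
}
```

In this model, a range allocated from a separate "xen" root is still free from the point of view of an "iomem" root spanning (0, -1), so a later allocation from the latter can return the very same range. Calling toy_insert() on the "iomem" root first makes the second allocation land elsewhere.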


> It feels to me that it would be better to call request_resource() instead of
> insert_resource(). It seems that if no conflict happens both functions will
> behave in the same way, but in case of a conflict where the conflicting
> resource fits entirely within the new resource, request_resource() will
> return an error. I think this way we will be able to detect that a range we
> are trying to reserve is already present and bail out early.
> 
> 
> > 
> > Or maybe it is the other way around: core Linux code assumes everything
> > is described in iomem_resource so something under kernel/ or mm/ would
> > crash if we start using a page pointing to an address missing from
> > iomem_resource?
> >     
> > > > >    	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
> > > > >    	if (!pgmap) {
> > > > >    		ret = -ENOMEM;
> > > > > @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
> > > > >    err_memremap:
> > > > >    	kfree(pgmap);
> > > > >    err_pgmap:
> > > > > +	if (tmp_res) {
> > > > > +		release_resource(tmp_res);
> > > > > +		kfree(tmp_res);
> > > > > +	}
> > > > > +err_insert:
> > > > >    	release_resource(res);
> > > > >    err_resource:
> > > > >    	kfree(res);
> > > > >    	return ret;
> > > > >    }
> > > > >    +static void unpopulated_init(void)
> > > > > +{
> > > > > +	static bool inited = false;
> > > > initialized = false
> > > ok.
> > > 
> > > 
> > > > 
> > > > > +	int ret;
> > > > > +
> > > > > +	if (inited)
> > > > > +		return;
> > > > > +
> > > > > +	/*
> > > > > +	 * Try to initialize Xen resource the first and fall back to
> > > > > default
> > > > > +	 * resource if arch doesn't offer one.
> > > > > +	 */
> > > > > +	ret = arch_xen_unpopulated_init(&xen_resource);
> > > > > +	if (!ret)
> > > > > +		target_resource = &xen_resource;
> > > > > +	else if (ret == -ENOSYS)
> > > > > +		target_resource = &iomem_resource;
> > > > > +	else
> > > > > +		pr_err("Cannot initialize Xen resource\n");
> > > > > +
> > > > > +	inited = true;
> > > > > +}
> > > > Would it make sense to call unpopulated_init from an init function,
> > > > rather than every time xen_alloc_unpopulated_pages is called?
> > > Good point, thank you. Will do. To be honest, I also don't like the
> > > current
> > > approach much.
> > > 
> > > 
> > > > 
> > > > >    /**
> > > > >     * xen_alloc_unpopulated_pages - alloc unpopulated pages
> > > > >     * @nr_pages: Number of pages
> > > > > @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int
> > > > > nr_pages, struct page **pages)
> > > > >    	unsigned int i;
> > > > >    	int ret = 0;
> > > > >    +	unpopulated_init();
> > > > > +
> > > > > +	/*
> > > > > +	 * Fall back to default behavior if we do not have any
> > > > > suitable
> > > > > resource
> > > > > +	 * to allocate required region from and as the result we won't
> > > > > be able
> > > > > to
> > > > > +	 * construct pages.
> > > > > +	 */
> > > > > +	if (!target_resource)
> > > > > +		return alloc_xenballooned_pages(nr_pages, pages);
> > > > The commit message says that the behavior on x86 doesn't change but this
> > > > seems to be a change that could impact x86?
> > > I don't think so. I haven't tested on x86 and might be wrong, but
> > > according to the current patch, on x86 the "target_resource" is always
> > > valid and points to the "iomem_resource", as arch_xen_unpopulated_init()
> > > is not implemented. So there won't be any fallback to
> > > alloc_(free)_xenballooned_pages() here and fill_list() will behave as
> > > usual.
> >   If target_resource is always valid, then we don't need this special
> > check. In fact, the condition should never be true.
> 
> 
> The target_resource is always valid and points to the "iomem_resource" on x86
> (this is equivalent to the behavior before this patch).
> On Arm target_resource might be NULL if arch_xen_unpopulated_init() failed,
> for example, if no extended regions reported by the hypervisor.
> We cannot use "iomem_resource" on Arm, only a resource constructed from
> extended regions. This is why I added that check (and fallback to xenballooned
> pages).
> What I was thinking is that in case of using old Xen (although we would need
> to balloon out RAM pages) we still would be able to keep working, so no need
> to disable CONFIG_XEN_UNPOPULATED_ALLOC on such setups.
>  
>    
> > > You raised a really good question. On Arm we need a fallback to balloon
> > > out RAM pages again if the hypervisor doesn't provide extended regions
> > > (we run on an old version, there are no unused regions of reasonable
> > > size, etc.), so I decided to put the fallback code here; an indicator of
> > > the failure is an invalid "target_resource".
> > I think it is unnecessary as we already assume today that
> > &iomem_resource is always available.
> > > I noticed the patch which is about to be upstreamed that removes
> > > alloc_(free)xenballooned_pages API [1]. Right now I have no idea how/where
> > > this fallback could be implemented as this is under build option control
> > > (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same name is either
> > > used
> > > for unpopulated pages (if set) or ballooned pages (if not set). I would
> > > appreciate suggestions regarding that. I am wondering would it be possible
> > > and
> > > correctly to have both mechanisms (unpopulated and ballooned) enabled by
> > > default and some init code to decide which one to use at runtime or some
> > > sort?
> > I would keep it simple and remove the fallback from this patch. So:
> > 
> > - if not CONFIG_XEN_UNPOPULATED_ALLOC, then balloon
> > - if CONFIG_XEN_UNPOPULATED_ALLOC, then
> >      - xen_resource if present
> >      - otherwise iomem_resource
> 
> Unfortunately, we cannot use iomem_resource on Arm safely, either xen_resource
> or fail (if no fallback exists).
> 
> 
> > 
> > The xen_resource/iomem_resource config can be done at init time using
> > target_resource. At runtime, target_resource is always != NULL so we
> > just go ahead and use it.
> 
> 
> Thank you for the suggestion. OK, let's keep it simple and drop fallback
> attempts for now. With one remark:
> We will make CONFIG_XEN_UNPOPULATED_ALLOC disabled by default on Arm in next
> patch. So by default everything will behave as usual on Arm (balloon out RAM
> pages),
> if user knows for sure that Xen reports extended regions, he/she can enable
> the config. This way we won't break anything. What do you think?

Actually, after reading your replies and explanation I changed my opinion: I
think we do need the fallback because Linux cannot really assume that
it is running on "new Xen", so it definitely needs to keep working if
CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not
advertised.

I think we'll have to roll back some of the changes introduced by
121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
enabled we cannot know if we can use unpopulated-alloc or whether we
have to use alloc_xenballooned_pages until we parse the /hypervisor node
in device tree at runtime.

In short, we cannot switch between unpopulated-alloc and
alloc_xenballooned_pages at build time, we have to do it at runtime
(boot time).

Juergen, what do you think?
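
A minimal sketch of the runtime decision discussed in this message, assuming the return-value convention of arch_xen_unpopulated_init() from the quoted patch (0 on success, -ENOSYS when the arch provides no hook, any other error when the hook fails). The enum and pick_backend() are hypothetical names used only for illustration. This is not the code that was eventually merged.

```c
#include <errno.h>

enum alloc_backend {
	BACKEND_BALLOON,	/* balloon out real RAM pages */
	BACKEND_XEN_RESOURCE,	/* extended regions advertised by the hypervisor */
	BACKEND_IOMEM_RESOURCE,	/* generic iomem_resource */
};

/*
 * Decide once, at boot, which allocator to use:
 *   0        the arch built a Xen resource from the extended regions
 *   -ENOSYS  the arch has no hook, so the generic resource is used
 *   other    the hook failed (for example no extended regions were
 *            advertised), so fall back to ballooned pages
 */
static enum alloc_backend pick_backend(int arch_init_ret)
{
	if (arch_init_ret == 0)
		return BACKEND_XEN_RESOURCE;
	if (arch_init_ret == -ENOSYS)
		return BACKEND_IOMEM_RESOURCE;
	return BACKEND_BALLOON;
}
```

The last branch is the case that cannot be resolved at build time: a kernel built with CONFIG_XEN_UNPOPULATED_ALLOC may still boot on a Xen that does not advertise extended regions.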

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-11-19 20:23           ` Oleksandr
@ 2021-11-20  2:36             ` Stefano Stabellini
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-20  2:36 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-arm-kernel, linux-kernel,
	Oleksandr Tyshchenko, Russell King, Boris Ostrovsky,
	Juergen Gross, Julien Grall


On Fri, 19 Nov 2021, Oleksandr wrote:
> On 19.11.21 03:19, Stefano Stabellini wrote:
> > On Wed, 10 Nov 2021, Oleksandr wrote:
> > > On 28.10.21 04:40, Stefano Stabellini wrote:
> > > 
> > > Hi Stefano
> > > 
> > > I am sorry for the late response.
> > > 
> > > > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > 
> > > > > This patch implements arch_xen_unpopulated_init() on Arm where
> > > > > the extended regions (if any) are gathered from DT and inserted
> > > > > into passed Xen resource to be used as unused address space
> > > > > for Xen scratch pages by unpopulated-alloc code.
> > > > > 
> > > > > The extended region (safe range) is a region of guest physical
> > > > > address space which is unused and could be safely used to create
> > > > > grant/foreign mappings instead of wasting real RAM pages from
> > > > > the domain memory for establishing these mappings.
> > > > > 
> > > > > The extended regions are chosen by the hypervisor at the domain
> > > > > creation time and advertised to it via "reg" property under
> > > > > hypervisor node in the guest device-tree. As region 0 is reserved
> > > > > for grant table space (always present), the indexes for extended
> > > > > regions are 1...N.
> > > > > 
> > > > > If arch_xen_unpopulated_init() fails for some reason the default
> > > > > behaviour will be restored (allocate xenballooned pages).
> > > > > 
> > > > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> > > > > 
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > ---
> > > > > Changes RFC -> V2:
> > > > >      - new patch, instead of
> > > > >       "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
> > > > > provide
> > > > > unallocated space"
> > > > > ---
> > > > >    arch/arm/xen/enlighten.c | 112
> > > > > +++++++++++++++++++++++++++++++++++++++++++++++
> > > > >    drivers/xen/Kconfig      |   2 +-
> > > > >    2 files changed, 113 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > > > index dea46ec..1a1e0d3 100644
> > > > > --- a/arch/arm/xen/enlighten.c
> > > > > +++ b/arch/arm/xen/enlighten.c
> > > > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
> > > > >    static phys_addr_t xen_grant_frames;
> > > > >      #define GRANT_TABLE_INDEX   0
> > > > > +#define EXT_REGION_INDEX    1
> > > > >      uint32_t xen_start_flags;
> > > > >    EXPORT_SYMBOL(xen_start_flags);
> > > > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
> > > > >    #endif
> > > > >    }
> > > > >    +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > > > +int arch_xen_unpopulated_init(struct resource *res)
> > > > > +{
> > > > > +	struct device_node *np;
> > > > > +	struct resource *regs, *tmp_res;
> > > > > +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
> > > > > +	unsigned int i, nr_reg = 0;
> > > > > +	struct range mhp_range;
> > > > > +	int rc;
> > > > > +
> > > > > +	if (!xen_domain())
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
> > > > > +	if (WARN_ON(!np))
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	/* Skip region 0 which is reserved for grant table space */
> > > > > +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL,
> > > > > NULL))
> > > > > +		nr_reg++;
> > > > > +	if (!nr_reg) {
> > > > > +		pr_err("No extended regions are found\n");
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> > > > > +	if (!regs)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	/*
> > > > > +	 * Create resource from extended regions provided by the
> > > > > hypervisor to
> > > > > be
> > > > > +	 * used as unused address space for Xen scratch pages.
> > > > > +	 */
> > > > > +	for (i = 0; i < nr_reg; i++) {
> > > > > +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
> > > > > &regs[i]);
> > > > > +		if (rc)
> > > > > +			goto err;
> > > > > +
> > > > > +		if (max_gpaddr < regs[i].end)
> > > > > +			max_gpaddr = regs[i].end;
> > > > > +		if (min_gpaddr > regs[i].start)
> > > > > +			min_gpaddr = regs[i].start;
> > > > > +	}
> > > > > +
> > > > > +	/* Check whether the resource range is within the hotpluggable
> > > > > range
> > > > > */
> > > > > +	mhp_range = mhp_get_pluggable_range(true);
> > > > > +	if (min_gpaddr < mhp_range.start)
> > > > > +		min_gpaddr = mhp_range.start;
> > > > > +	if (max_gpaddr > mhp_range.end)
> > > > > +		max_gpaddr = mhp_range.end;
> > > > > +
> > > > > +	res->start = min_gpaddr;
> > > > > +	res->end = max_gpaddr;
> > > > > +
> > > > > +	/*
> > > > > +	 * Mark holes between extended regions as unavailable. The
> > > > > rest of
> > > > > that
> > > > > +	 * address space will be available for the allocation.
> > > > > +	 */
> > > > > +	for (i = 1; i < nr_reg; i++) {
> > > > > +		resource_size_t start, end;
> > > > > +
> > > > > +		start = regs[i - 1].end + 1;
> > > > > +		end = regs[i].start - 1;
> > > > > +
> > > > > +		if (start > (end + 1)) {
> > > > Should this be:
> > > > 
> > > > if (start >= end)
> > > > 
> > > > ?
> > > Yes, we can do this here (since the checks are equivalent) but ...
> > > 
> > > > > +			rc = -EINVAL;
> > > > > +			goto err;
> > > > > +		}
> > > > > +
> > > > > +		/* There is no hole between regions */
> > > > > +		if (start == (end + 1))
> > > > Also here, shouldn't it be:
> > > > 
> > > > if (start == end)
> > > > 
> > > > ?
> > >     ... not here.
> > > 
> > > As
> > > 
> > > "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 ==
> > > regs[i].start)"
> > > 
> > > but
> > > 
> > > "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
> >   OK. But the check:
> > 
> >    if (start >= end)
> > 
> > Actually covers both cases so that's the only check we need?
> 
> Sorry, I don't entirely understand the question.
> Is the question to use only a single check in that loop?
> 
> Paste the updated code which I have locally for the convenience.
> 
>  [snip]
> 
>     /*
>      * Mark holes between extended regions as unavailable. The rest of that
>      * address space will be available for the allocation.
>      */
>     for (i = 1; i < nr_reg; i++) {
>         resource_size_t start, end;
> 
>         start = regs[i - 1].end + 1;
>         end = regs[i].start - 1;
> 
>         if (start > (end + 1)) {
>             rc = -EINVAL;
>             goto err;
>         }
> 
>         /* There is no hole between regions */
>         if (start == (end + 1))
>             continue;
> 
>         tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>         if (!tmp_res) {
>             rc = -ENOMEM;
>             goto err;
>         }
> 
>         tmp_res->name = "Unavailable space";
>         tmp_res->start = start;
>         tmp_res->end = end;
> 
>         rc = insert_resource(&xen_resource, tmp_res);
>         if (rc) {
>             pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc);
>             kfree(tmp_res);
>             goto err;
>         }
>     }
> 
> [snip]
> 
> 
> 1. The first check is to detect an overlap (which is a wrong configuration,
> correct?) and bail out if true (for example, regX: 0x81000000...0x82FFFFFF and
> regY: 0x82000000...0x83FFFFFF).
> 2. The second check is just to skip current iteration as there is no
> space/hole between regions (for example, regX: 0x81000000...0x82FFFFFF and
> regY: 0x83000000...0x83FFFFFF).
> Therefore I think they should be distinguished.
> 
> Yes, both checks could be transformed into a single one, but this way the
> overlaps will be ignored:
> if (start >= (end + 1))
>     continue;
> 
> Or I really missed something?

You are right, it is better to distinguish the two cases. I suggest the
code below because I think it is clearer, even if it might be slightly
less efficient. I don't feel too strongly about it though.

		resource_size_t start, end;

		/* There is no hole between regions */
		if (regs[i - 1].end + 1 == regs[i].start)
			continue;

		if (regs[i - 1].end + 1 > regs[i].start) {
			rc = -EINVAL;
			goto err;
		}

		start = regs[i - 1].end + 1;
		end = regs[i].start - 1;
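
The three cases can be checked in isolation with a small userspace sketch. It assumes inclusive end addresses, as in the regX/regY examples quoted above; the names region, gap_kind and classify_gap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum gap_kind { GAP_ADJACENT, GAP_OVERLAP, GAP_HOLE };

struct region {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

/*
 * Classify the space between two consecutive regions. For GAP_HOLE the
 * inclusive bounds of the hole are stored in hole_start and hole_end.
 */
static enum gap_kind classify_gap(const struct region *prev,
				  const struct region *next,
				  uint64_t *hole_start, uint64_t *hole_end)
{
	/* prev->end + 1 must not wrap around */
	assert(prev->end < UINT64_MAX);

	/* There is no hole between regions */
	if (prev->end + 1 == next->start)
		return GAP_ADJACENT;

	/* The next region starts inside the previous one */
	if (prev->end + 1 > next->start)
		return GAP_OVERLAP;

	*hole_start = prev->end + 1;
	*hole_end = next->start - 1;
	return GAP_HOLE;
}
```

With the quoted examples, 0x81000000...0x82FFFFFF followed by 0x82000000...0x83FFFFFF is an overlap, and followed by 0x83000000...0x83FFFFFF it is adjacent. A one-byte hole gives hole_start == hole_end, which is why a single "start >= end" test would wrongly reject it.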

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
@ 2021-11-20  2:36             ` Stefano Stabellini
  0 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-20  2:36 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, linux-arm-kernel, linux-kernel,
	Oleksandr Tyshchenko, Russell King, Boris Ostrovsky,
	Juergen Gross, Julien Grall

[-- Attachment #1: Type: text/plain, Size: 8464 bytes --]

On Fri, 19 Nov 2021, Oleksandr wrote:
> On 19.11.21 03:19, Stefano Stabellini wrote:
> > On Wed, 10 Nov 2021, Oleksandr wrote:
> > > On 28.10.21 04:40, Stefano Stabellini wrote:
> > > 
> > > Hi Stefano
> > > 
> > > I am sorry for the late response.
> > > 
> > > > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > 
> > > > > This patch implements arch_xen_unpopulated_init() on Arm where
> > > > > the extended regions (if any) are gathered from DT and inserted
> > > > > into passed Xen resource to be used as unused address space
> > > > > for Xen scratch pages by unpopulated-alloc code.
> > > > > 
> > > > > The extended region (safe range) is a region of guest physical
> > > > > address space which is unused and could be safely used to create
> > > > > grant/foreign mappings instead of wasting real RAM pages from
> > > > > the domain memory for establishing these mappings.
> > > > > 
> > > > > The extended regions are chosen by the hypervisor at the domain
> > > > > creation time and advertised to it via "reg" property under
> > > > > hypervisor node in the guest device-tree. As region 0 is reserved
> > > > > for grant table space (always present), the indexes for extended
> > > > > regions are 1...N.
> > > > > 
> > > > > If arch_xen_unpopulated_init() fails for some reason the default
> > > > > behaviour will be restored (allocate xenballooned pages).
> > > > > 
> > > > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> > > > > 
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > ---
> > > > > Changes RFC -> V2:
> > > > >      - new patch, instead of
> > > > >       "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
> > > > > provide
> > > > > unallocated space"
> > > > > ---
> > > > >    arch/arm/xen/enlighten.c | 112
> > > > > +++++++++++++++++++++++++++++++++++++++++++++++
> > > > >    drivers/xen/Kconfig      |   2 +-
> > > > >    2 files changed, 113 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > > > index dea46ec..1a1e0d3 100644
> > > > > --- a/arch/arm/xen/enlighten.c
> > > > > +++ b/arch/arm/xen/enlighten.c
> > > > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
> > > > >    static phys_addr_t xen_grant_frames;
> > > > >      #define GRANT_TABLE_INDEX   0
> > > > > +#define EXT_REGION_INDEX    1
> > > > >      uint32_t xen_start_flags;
> > > > >    EXPORT_SYMBOL(xen_start_flags);
> > > > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
> > > > >    #endif
> > > > >    }
> > > > >    +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > > > +int arch_xen_unpopulated_init(struct resource *res)
> > > > > +{
> > > > > +	struct device_node *np;
> > > > > +	struct resource *regs, *tmp_res;
> > > > > +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
> > > > > +	unsigned int i, nr_reg = 0;
> > > > > +	struct range mhp_range;
> > > > > +	int rc;
> > > > > +
> > > > > +	if (!xen_domain())
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
> > > > > +	if (WARN_ON(!np))
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	/* Skip region 0 which is reserved for grant table space */
> > > > > +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL,
> > > > > NULL))
> > > > > +		nr_reg++;
> > > > > +	if (!nr_reg) {
> > > > > +		pr_err("No extended regions are found\n");
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> > > > > +	if (!regs)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	/*
> > > > > +	 * Create resource from extended regions provided by the
> > > > > hypervisor to
> > > > > be
> > > > > +	 * used as unused address space for Xen scratch pages.
> > > > > +	 */
> > > > > +	for (i = 0; i < nr_reg; i++) {
> > > > > +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
> > > > > &regs[i]);
> > > > > +		if (rc)
> > > > > +			goto err;
> > > > > +
> > > > > +		if (max_gpaddr < regs[i].end)
> > > > > +			max_gpaddr = regs[i].end;
> > > > > +		if (min_gpaddr > regs[i].start)
> > > > > +			min_gpaddr = regs[i].start;
> > > > > +	}
> > > > > +
> > > > > +	/* Check whether the resource range is within the hotpluggable
> > > > > range
> > > > > */
> > > > > +	mhp_range = mhp_get_pluggable_range(true);
> > > > > +	if (min_gpaddr < mhp_range.start)
> > > > > +		min_gpaddr = mhp_range.start;
> > > > > +	if (max_gpaddr > mhp_range.end)
> > > > > +		max_gpaddr = mhp_range.end;
> > > > > +
> > > > > +	res->start = min_gpaddr;
> > > > > +	res->end = max_gpaddr;
> > > > > +
> > > > > +	/*
> > > > > +	 * Mark holes between extended regions as unavailable. The
> > > > > rest of
> > > > > that
> > > > > +	 * address space will be available for the allocation.
> > > > > +	 */
> > > > > +	for (i = 1; i < nr_reg; i++) {
> > > > > +		resource_size_t start, end;
> > > > > +
> > > > > +		start = regs[i - 1].end + 1;
> > > > > +		end = regs[i].start - 1;
> > > > > +
> > > > > +		if (start > (end + 1)) {
> > > > Should this be:
> > > > 
> > > > if (start >= end)
> > > > 
> > > > ?
> > > Yes, we can do this here (since the checks are equivalent) but ...
> > > 
> > > > > +			rc = -EINVAL;
> > > > > +			goto err;
> > > > > +		}
> > > > > +
> > > > > +		/* There is no hole between regions */
> > > > > +		if (start == (end + 1))
> > > > Also here, shouldn't it be:
> > > > 
> > > > if (start == end)
> > > > 
> > > > ?
> > >     ... not here.
> > > 
> > > As
> > > 
> > > "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 ==
> > > regs[i].start)"
> > > 
> > > but
> > > 
> > > "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
> >   OK. But the check:
> > 
> >    if (start >= end)
> > 
> > Actually covers both cases so that's the only check we need?
> 
> Sorry, I don't entirely understand the question.
> Is the question to use only a single check in that loop?
> 
> Paste the updated code which I have locally for the convenience.
> 
>  [snip]
> 
>     /*
>      * Mark holes between extended regions as unavailable. The rest of that
>      * address space will be available for the allocation.
>      */
>     for (i = 1; i < nr_reg; i++) {
>         resource_size_t start, end;
> 
>         start = regs[i - 1].end + 1;
>         end = regs[i].start - 1;
> 
>         if (start > (end + 1)) {
>             rc = -EINVAL;
>             goto err;
>         }
> 
>         /* There is no hole between regions */
>         if (start == (end + 1))
>             continue;
> 
>         tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>         if (!tmp_res) {
>             rc = -ENOMEM;
>             goto err;
>         }
> 
>         tmp_res->name = "Unavailable space";
>         tmp_res->start = start;
>         tmp_res->end = end;
> 
>         rc = insert_resource(&xen_resource, tmp_res);
>         if (rc) {
>             pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc);
>             kfree(tmp_res);
>             goto err;
>         }
>     }
> 
> [snip]
> 
> 
> 1. The first check is to detect an overlap (which is a wrong configuration,
> correct?) and bail out if true (for example, regX: 0x81000000...0x82FFFFFF and
> regY: 0x82000000...0x83FFFFFF).
> 2. The second check is just to skip current iteration as there is no
> space/hole between regions (for example, regX: 0x81000000...0x82FFFFFF and
> regY: 0x83000000...0x83FFFFFF).
> Therefore I think they should be distinguished.
> 
> Yes, both check could be transformed to a single one, but this way the
> overlaps will be ignored:
> if (start >= (end + 1))
>     continue;
> 
> Or I really missed something?

You are right, it is better to distinguish the two cases. I suggest the
code below because I think it is clearer, even if it might be slightly
less efficient. I don't feel too strongly about it though.

		resource_size_t start, end;

		/* There is no hole between regions */
		if (regs[i - 1].end + 1 == regs[i].start)
			continue;

		if (regs[i - 1].end + 1 > regs[i].start) {
			rc = -EINVAL;
			goto err;
		}

		start = regs[i - 1].end + 1;
		end = regs[i].start - 1;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
  2021-11-20  2:36             ` Stefano Stabellini
@ 2021-11-20 13:38               ` Oleksandr
  -1 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-20 13:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-arm-kernel, linux-kernel, Oleksandr Tyshchenko,
	Russell King, Boris Ostrovsky, Juergen Gross, Julien Grall


On 20.11.21 04:36, Stefano Stabellini wrote:


Hi Stefano

> On Fri, 19 Nov 2021, Oleksandr wrote:
>> On 19.11.21 03:19, Stefano Stabellini wrote:
>>> On Wed, 10 Nov 2021, Oleksandr wrote:
>>>> On 28.10.21 04:40, Stefano Stabellini wrote:
>>>>
>>>> Hi Stefano
>>>>
>>>> I am sorry for the late response.
>>>>
>>>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> This patch implements arch_xen_unpopulated_init() on Arm where
>>>>>> the extended regions (if any) are gathered from DT and inserted
>>>>>> into passed Xen resource to be used as unused address space
>>>>>> for Xen scratch pages by unpopulated-alloc code.
>>>>>>
>>>>>> The extended region (safe range) is a region of guest physical
>>>>>> address space which is unused and could be safely used to create
>>>>>> grant/foreign mappings instead of wasting real RAM pages from
>>>>>> the domain memory for establishing these mappings.
>>>>>>
>>>>>> The extended regions are chosen by the hypervisor at the domain
>>>>>> creation time and advertised to it via "reg" property under
>>>>>> hypervisor node in the guest device-tree. As region 0 is reserved
>>>>>> for grant table space (always present), the indexes for extended
>>>>>> regions are 1...N.
>>>>>>
>>>>>> If arch_xen_unpopulated_init() fails for some reason the default
>>>>>> behaviour will be restored (allocate xenballooned pages).
>>>>>>
>>>>>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>>>>>
>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> ---
>>>>>> Changes RFC -> V2:
>>>>>>       - new patch, instead of
>>>>>>        "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
>>>>>>        provide unallocated space"
>>>>>> ---
>>>>>>     arch/arm/xen/enlighten.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>     drivers/xen/Kconfig      |   2 +-
>>>>>>     2 files changed, 113 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>>>> index dea46ec..1a1e0d3 100644
>>>>>> --- a/arch/arm/xen/enlighten.c
>>>>>> +++ b/arch/arm/xen/enlighten.c
>>>>>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>>>>>     static phys_addr_t xen_grant_frames;
>>>>>>       #define GRANT_TABLE_INDEX   0
>>>>>> +#define EXT_REGION_INDEX    1
>>>>>>       uint32_t xen_start_flags;
>>>>>>     EXPORT_SYMBOL(xen_start_flags);
>>>>>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>>>>>     #endif
>>>>>>     }
>>>>>>     +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>>>> +int arch_xen_unpopulated_init(struct resource *res)
>>>>>> +{
>>>>>> +	struct device_node *np;
>>>>>> +	struct resource *regs, *tmp_res;
>>>>>> +	uint64_t min_gpaddr = -1, max_gpaddr = 0;
>>>>>> +	unsigned int i, nr_reg = 0;
>>>>>> +	struct range mhp_range;
>>>>>> +	int rc;
>>>>>> +
>>>>>> +	if (!xen_domain())
>>>>>> +		return -ENODEV;
>>>>>> +
>>>>>> +	np = of_find_compatible_node(NULL, NULL, "xen,xen");
>>>>>> +	if (WARN_ON(!np))
>>>>>> +		return -ENODEV;
>>>>>> +
>>>>>> +	/* Skip region 0 which is reserved for grant table space */
>>>>>> +	while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>>>>>> +		nr_reg++;
>>>>>> +	if (!nr_reg) {
>>>>>> +		pr_err("No extended regions are found\n");
>>>>>> +		return -EINVAL;
>>>>>> +	}
>>>>>> +
>>>>>> +	regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>>>>>> +	if (!regs)
>>>>>> +		return -ENOMEM;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * Create resource from extended regions provided by the hypervisor to be
>>>>>> +	 * used as unused address space for Xen scratch pages.
>>>>>> +	 */
>>>>>> +	for (i = 0; i < nr_reg; i++) {
>>>>>> +		rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]);
>>>>>> +		if (rc)
>>>>>> +			goto err;
>>>>>> +
>>>>>> +		if (max_gpaddr < regs[i].end)
>>>>>> +			max_gpaddr = regs[i].end;
>>>>>> +		if (min_gpaddr > regs[i].start)
>>>>>> +			min_gpaddr = regs[i].start;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* Check whether the resource range is within the hotpluggable range */
>>>>>> +	mhp_range = mhp_get_pluggable_range(true);
>>>>>> +	if (min_gpaddr < mhp_range.start)
>>>>>> +		min_gpaddr = mhp_range.start;
>>>>>> +	if (max_gpaddr > mhp_range.end)
>>>>>> +		max_gpaddr = mhp_range.end;
>>>>>> +
>>>>>> +	res->start = min_gpaddr;
>>>>>> +	res->end = max_gpaddr;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * Mark holes between extended regions as unavailable. The rest of that
>>>>>> +	 * address space will be available for the allocation.
>>>>>> +	 */
>>>>>> +	for (i = 1; i < nr_reg; i++) {
>>>>>> +		resource_size_t start, end;
>>>>>> +
>>>>>> +		start = regs[i - 1].end + 1;
>>>>>> +		end = regs[i].start - 1;
>>>>>> +
>>>>>> +		if (start > (end + 1)) {
>>>>> Should this be:
>>>>>
>>>>> if (start >= end)
>>>>>
>>>>> ?
>>>> Yes, we can do this here (since the checks are equivalent) but ...
>>>>
>>>>>> +			rc = -EINVAL;
>>>>>> +			goto err;
>>>>>> +		}
>>>>>> +
>>>>>> +		/* There is no hole between regions */
>>>>>> +		if (start == (end + 1))
>>>>> Also here, shouldn't it be:
>>>>>
>>>>> if (start == end)
>>>>>
>>>>> ?
>>>>      ... not here.
>>>>
>>>> As
>>>>
>>>> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 ==
>>>> regs[i].start)"
>>>>
>>>> but
>>>>
>>>> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
>>>    OK. But the check:
>>>
>>>     if (start >= end)
>>>
>>> Actually covers both cases so that's the only check we need?
>> Sorry, I don't entirely understand the question.
>> Is the question to use only a single check in that loop?
>>
>> Paste the updated code which I have locally for the convenience.
>>
>>   [snip]
>>
>>      /*
>>       * Mark holes between extended regions as unavailable. The rest of that
>>       * address space will be available for the allocation.
>>       */
>>      for (i = 1; i < nr_reg; i++) {
>>          resource_size_t start, end;
>>
>>          start = regs[i - 1].end + 1;
>>          end = regs[i].start - 1;
>>
>>          if (start > (end + 1)) {
>>              rc = -EINVAL;
>>              goto err;
>>          }
>>
>>          /* There is no hole between regions */
>>          if (start == (end + 1))
>>              continue;
>>
>>          tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>          if (!tmp_res) {
>>              rc = -ENOMEM;
>>              goto err;
>>          }
>>
>>          tmp_res->name = "Unavailable space";
>>          tmp_res->start = start;
>>          tmp_res->end = end;
>>
>>          rc = insert_resource(&xen_resource, tmp_res);
>>          if (rc) {
>>              pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc);
>>              kfree(tmp_res);
>>              goto err;
>>          }
>>      }
>>
>> [snip]
>>
>>
>> 1. The first check is to detect an overlap (which is a wrong configuration,
>> correct?) and bail out if true (for example, regX: 0x81000000...0x82FFFFFF and
>> regY: 0x82000000...0x83FFFFFF).
>> 2. The second check is just to skip current iteration as there is no
>> space/hole between regions (for example, regX: 0x81000000...0x82FFFFFF and
>> regY: 0x83000000...0x83FFFFFF).
>> Therefore I think they should be distinguished.
>>
>> Yes, both check could be transformed to a single one, but this way the
>> overlaps will be ignored:
>> if (start >= (end + 1))
>>      continue;
>>
>> Or I really missed something?
> You are right, it is better to distinguish the two cases. I suggest the
> code below because I think it is clearer, even if it might be slightly
> less efficient. I don't feel too strongly about it though.
>
> 		resource_size_t start, end;
>
> 		/* There is no hole between regions */
> 		if (regs[i - 1].end + 1 == regs[i].start)
> 			continue;
>
> 		if (regs[i - 1].end + 1 > regs[i].start) {
> 			rc = -EINVAL;
> 			goto err;
> 		}
>
> 		start = regs[i - 1].end + 1;
> 		end = regs[i].start - 1;

OK, let's make the code clearer, will do.
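
For reference, the three cases discussed above (overlap, no hole, hole) can be
sketched as a standalone userspace C snippet. This is only an illustration
under stated assumptions, not the patch code: struct region, enum gap_kind,
classify_gap() and classify_gap_examples() are hypothetical names introduced
here, the regions are assumed to be sorted by start address, and prev->end is
assumed to be below UINT64_MAX so that "end + 1" does not wrap. The first two
example pairs are the regX/regY values from this thread; the third pair is
made up to show a real hole.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the inclusive start/end of a struct resource. */
struct region {
	uint64_t start;
	uint64_t end;
};

enum gap_kind {
	GAP_NONE,	/* regions are adjacent, nothing to mark */
	GAP_HOLE,	/* unused space between the regions */
	GAP_OVERLAP	/* invalid layout, the caller should bail out */
};

/*
 * Classify the space between two regions sorted by start address.
 * Assumes prev->end < UINT64_MAX. On GAP_HOLE, *hole receives the
 * inclusive bounds of the space to mark as unavailable.
 */
enum gap_kind classify_gap(const struct region *prev,
			   const struct region *next,
			   struct region *hole)
{
	uint64_t next_free = prev->end + 1;

	/* There is no hole between regions */
	if (next_free == next->start)
		return GAP_NONE;

	/* The previous region runs into the next one */
	if (next_free > next->start)
		return GAP_OVERLAP;

	hole->start = next_free;
	hole->end = next->start - 1;
	return GAP_HOLE;
}

/* Usage example: the first two pairs come from the thread, the third is illustrative. */
void classify_gap_examples(void)
{
	const struct region reg_x = { 0x81000000, 0x82FFFFFF };
	const struct region overlapping = { 0x82000000, 0x83FFFFFF };
	const struct region adjacent = { 0x83000000, 0x83FFFFFF };
	const struct region distant = { 0x84000000, 0x84FFFFFF };
	struct region hole = { 0, 0 };

	assert(classify_gap(&reg_x, &overlapping, &hole) == GAP_OVERLAP);
	assert(classify_gap(&reg_x, &adjacent, &hole) == GAP_NONE);
	assert(classify_gap(&reg_x, &distant, &hole) == GAP_HOLE);
	assert(hole.start == 0x83000000 && hole.end == 0x83FFFFFF);
}
```

Testing the adjacency case first keeps each condition readable on its own, at
the cost of evaluating "prev->end + 1" against next->start twice in the
overlap path.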


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-20  2:19           ` Stefano Stabellini
@ 2021-11-23 16:46             ` Oleksandr
  2021-11-23 21:25               ` Stefano Stabellini
  2021-11-24  5:16               ` Juergen Gross
  0 siblings, 2 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-23 16:46 UTC (permalink / raw)
  To: Stefano Stabellini, jgross
  Cc: xen-devel, linux-kernel, Oleksandr Tyshchenko, Boris Ostrovsky,
	Julien Grall


On 20.11.21 04:19, Stefano Stabellini wrote:

Hi Stefano, Juergen, all


> Juergen please see the bottom of the email
>
> On Fri, 19 Nov 2021, Oleksandr wrote:
>> On 19.11.21 02:59, Stefano Stabellini wrote:
>>> On Tue, 9 Nov 2021, Oleksandr wrote:
>>>> On 28.10.21 19:37, Stefano Stabellini wrote:
>>>>
>>>> Hi Stefano
>>>>
>>>> I am sorry for the late response.
>>>>
>>>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> The main reason of this change is that unpopulated-alloc
>>>>>> code cannot be used in its current form on Arm, but there
>>>>>> is a desire to reuse it to avoid wasting real RAM pages
>>>>>> for the grant/foreign mappings.
>>>>>>
>>>>>> The problem is that system "iomem_resource" is used for
>>>>>> the address space allocation, but the really unallocated
>>>>>> space can't be figured out precisely by the domain on Arm
>>>>>> without hypervisor involvement. For example, not all device
>>>>>> I/O regions are known by the time domain starts creating
>>>>>> grant/foreign mappings. And following the advice from
>>>>>> "iomem_resource" we might end up reusing these regions by
>>>>>> a mistake. So, the hypervisor which maintains the P2M for
>>>>>> the domain is in the best position to provide unused regions
>>>>>> of guest physical address space which could be safely used
>>>>>> to create grant/foreign mappings.
>>>>>>
>>>>>> Introduce new helper arch_xen_unpopulated_init() which purpose
>>>>>> is to create specific Xen resource based on the memory regions
>>>>>> provided by the hypervisor to be used as unused space for Xen
>>>>>> scratch pages.
>>>>>>
>>>>>> If arch doesn't implement arch_xen_unpopulated_init() to
>>>>>> initialize Xen resource the default "iomem_resource" will be used.
>>>>>> So the behavior on x86 won't be changed.
>>>>>>
>>>>>> Also fall back to allocate xenballooned pages (steal real RAM
>>>>>> pages) if we do not have any suitable resource to work with and
>>>>>> as the result we won't be able to provide unpopulated pages.
>>>>>>
>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> ---
>>>>>> Changes RFC -> V2:
>>>>>>       - new patch, instead of
>>>>>>        "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
>>>>>>        provide unallocated space"
>>>>>> ---
>>>>>>     drivers/xen/unpopulated-alloc.c | 89 +++++++++++++++++++++++++++++++++++++++--
>>>>>>     include/xen/xen.h               |  2 +
>>>>>>     2 files changed, 88 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>>>>>> index a03dc5b..1f1d8d8 100644
>>>>>> --- a/drivers/xen/unpopulated-alloc.c
>>>>>> +++ b/drivers/xen/unpopulated-alloc.c
>>>>>> @@ -8,6 +8,7 @@
>>>>>>       #include <asm/page.h>
>>>>>>     +#include <xen/balloon.h>
>>>>>>     #include <xen/page.h>
>>>>>>     #include <xen/xen.h>
>>>>>>     @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
>>>>>>     static struct page *page_list;
>>>>>>     static unsigned int list_count;
>>>>>>     +static struct resource *target_resource;
>>>>>> +static struct resource xen_resource = {
>>>>>> +	.name = "Xen unused space",
>>>>>> +};
>>>>>> +
>>>>>> +/*
>>>>>> + * If arch is not happy with system "iomem_resource" being used for
>>>>>> + * the region allocation it can provide its own view by initializing
>>>>>> + * "xen_resource" with unused regions of guest physical address space
>>>>>> + * provided by the hypervisor.
>>>>>> + */
>>>>>> +int __weak arch_xen_unpopulated_init(struct resource *res)
>>>>>> +{
>>>>>> +	return -ENOSYS;
>>>>>> +}
>>>>>> +
>>>>>>     static int fill_list(unsigned int nr_pages)
>>>>>>     {
>>>>>>     	struct dev_pagemap *pgmap;
>>>>>> -	struct resource *res;
>>>>>> +	struct resource *res, *tmp_res = NULL;
>>>>>>     	void *vaddr;
>>>>>>     	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>>>>>> -	int ret = -ENOMEM;
>>>>>> +	int ret;
>>>>>>       	res = kzalloc(sizeof(*res), GFP_KERNEL);
>>>>>>     	if (!res)
>>>>>> @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
>>>>>>     	res->name = "Xen scratch";
>>>>>>     	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>>>>>     -	ret = allocate_resource(&iomem_resource, res,
>>>>>> +	ret = allocate_resource(target_resource, res,
>>>>>>     				alloc_pages * PAGE_SIZE, 0, -1,
>>>>>>     				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>>>>>     	if (ret < 0) {
>>>>>> @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
>>>>>>     		goto err_resource;
>>>>>>     	}
>>>>>>     +	/*
>>>>>> +	 * Reserve the region previously allocated from Xen resource to avoid
>>>>>> +	 * re-using it by someone else.
>>>>>> +	 */
>>>>>> +	if (target_resource != &iomem_resource) {
>>>>>> +		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>>>>> +		if (!tmp_res) {
>>>>>> +			ret = -ENOMEM;
>>>>>> +			goto err_insert;
>>>>>> +		}
>>>>>> +
>>>>>> +		tmp_res->name = res->name;
>>>>>> +		tmp_res->start = res->start;
>>>>>> +		tmp_res->end = res->end;
>>>>>> +		tmp_res->flags = res->flags;
>>>>>> +
>>>>>> +		ret = insert_resource(&iomem_resource, tmp_res);
>>>>>> +		if (ret < 0) {
>>>>>> +			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
>>>>>> +			       tmp_res->start, tmp_res->end);
>>>>>> +			kfree(tmp_res);
>>>>>> +			goto err_insert;
>>>>>> +		}
>>>>>> +	}
>>>>> I am a bit confused.. why do we need to do this? Who could be
>>>>> erroneously re-using the region? Are you saying that the next time
>>>>> allocate_resource is called it could find the same region again? It
>>>>> doesn't seem possible?
>>>> No, as I understand it, allocate_resource() called for the same root
>>>> resource won't provide the same region... We only need to do this (insert
>>>> the region into "iomem_resource") if we allocated it from our *internal*
>>>> "xen_resource", as the *global* "iomem_resource" (which is used everywhere)
>>>> is not aware that the region has already been allocated. So by inserting
>>>> the region here we reserve it; otherwise it could be reused elsewhere.
>>> But elsewhere where?
>> I think, theoretically everywhere where allocate_resource(&iomem_resource,
>> ...) is called.
>>
>>
>>> Let's say that allocate_resource allocates a range from xen_resource.
>>>   From reading the code, it doesn't look like iomem_resource would have
>>> that range because the extended regions described under /hypervisor are
>>> not added automatically to iomem_resource.
>>>
>>> So what if we don't call insert_resource? Nothing could allocate the
>>> same range because iomem_resource doesn't have it at all and
>>> xen_resource is not used anywhere if not here.
>>>
>>> What am I missing?
>>
>> Below my understanding which, of course, might be wrong.
>>
>> If we don't claim resource by calling insert_resource (or even
>> request_resource) here then the same range could be allocated everywhere where
>> allocate_resource(&iomem_resource, ...) is called.
>> I don't see what prevents the same range from being allocated. Why actually
>> allocate_resource(&iomem_resource, ...) can't provide the same range if it is
>> free (not-reserved-yet) from its PoV? The comment above allocate_resource()
>> says "allocate empty slot in the resource tree given range & alignment". So
>> this "empty slot" could be exactly the same range.
>>
>> I experimented with that a bit trying to call
>> allocate_resource(&iomem_resource, ...) several times in another place to see
>> what ranges it returns in both cases (w/ and w/o calling insert_resource
>> here). So an experiment confirmed (of course, if I made it correctly) that the
>> same range could be allocated if we didn't call insert_resource() here. And as
>> I understand there is nothing strange here, as iomem_resource covers all
>> address space initially (0, -1) and everything *not* inserted/requested (in
>> other words, reserved) yet is considered as free and could be provided if fits
>> constraints. Or I really missed something?
> Thanks for the explanation! It was me that didn't know that
> iomem_resource covers all the address space initially. I thought it was
> populated only with actual iomem ranges. Now it makes sense, thanks!
>
>
>> It feels to me that it would be better to call request_resource() instead of
>> insert_resource(). It seems, that if no conflict happens both functions will
>> behave in same way, but in case of conflict if the conflicting resource
>> entirely fit the new resource the former will return an error. I think, this
>> way we will be able to detect that a range we are trying to reserve is already
>> present and bail out early.
>>
>>
>>> Or maybe it is the other way around: core Linux code assumes everything
>>> is described in iomem_resource so something under kernel/ or mm/ would
>>> crash if we start using a page pointing to an address missing from
>>> iomem_resource?
>>>      
>>>>>>     	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>>>>>>     	if (!pgmap) {
>>>>>>     		ret = -ENOMEM;
>>>>>> @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
>>>>>>     err_memremap:
>>>>>>     	kfree(pgmap);
>>>>>>     err_pgmap:
>>>>>> +	if (tmp_res) {
>>>>>> +		release_resource(tmp_res);
>>>>>> +		kfree(tmp_res);
>>>>>> +	}
>>>>>> +err_insert:
>>>>>>     	release_resource(res);
>>>>>>     err_resource:
>>>>>>     	kfree(res);
>>>>>>     	return ret;
>>>>>>     }
>>>>>>     +static void unpopulated_init(void)
>>>>>> +{
>>>>>> +	static bool inited = false;
>>>>> initialized = false
>>>> ok.
>>>>
>>>>
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	if (inited)
>>>>>> +		return;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * Try to initialize Xen resource the first and fall back to default
>>>>>> +	 * resource if arch doesn't offer one.
>>>>>> +	 */
>>>>>> +	ret = arch_xen_unpopulated_init(&xen_resource);
>>>>>> +	if (!ret)
>>>>>> +		target_resource = &xen_resource;
>>>>>> +	else if (ret == -ENOSYS)
>>>>>> +		target_resource = &iomem_resource;
>>>>>> +	else
>>>>>> +		pr_err("Cannot initialize Xen resource\n");
>>>>>> +
>>>>>> +	inited = true;
>>>>>> +}
>>>>> Would it make sense to call unpopulated_init from an init function,
>>>>> rather than every time xen_alloc_unpopulated_pages is called?
>>>> Good point, thank you. Will do. To be honest, I also don't like the current
>>>> approach much.
>>>>
>>>>
>>>>>>     /**
>>>>>>      * xen_alloc_unpopulated_pages - alloc unpopulated pages
>>>>>>      * @nr_pages: Number of pages
>>>>>> @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>>>>>     	unsigned int i;
>>>>>>     	int ret = 0;
>>>>>>     +	unpopulated_init();
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * Fall back to default behavior if we do not have any suitable resource
>>>>>> +	 * to allocate required region from and as the result we won't be able to
>>>>>> +	 * construct pages.
>>>>>> +	 */
>>>>>> +	if (!target_resource)
>>>>>> +		return alloc_xenballooned_pages(nr_pages, pages);
>>>>> The commit message says that the behavior on x86 doesn't change but this
>>>>> seems to be a change that could impact x86?
>>>> I don't think so, although I haven't tested on x86 and might be wrong.
>>>> According to the current patch, on x86 the "target_resource" is always valid
>>>> and points to the "iomem_resource", as arch_xen_unpopulated_init() is not
>>>> implemented. So there won't be any fallback to
>>>> alloc_(free)_xenballooned_pages() here and fill_list() will behave as usual.
>>>    If target_resource is always valid, then we don't need this special
>>> check. In fact, the condition should never be true.
>>
>> The target_resource is always valid and points to the "iomem_resource" on x86
>> (this is equivalent to the behavior before this patch).
>> On Arm target_resource might be NULL if arch_xen_unpopulated_init() failed,
>> for example, if no extended regions reported by the hypervisor.
>> We cannot use "iomem_resource" on Arm, only a resource constructed from
>> extended regions. This is why I added that check (and fallback to xenballooned
>> pages).
>> What I was thinking is that in case of using old Xen (although we would need
>> to balloon out RAM pages) we still would be able to keep working, so no need
>> to disable CONFIG_XEN_UNPOPULATED_ALLOC on such setups.
>>   
>>     
>>>> You raised a really good question. On Arm we need a fallback to ballooning out
>>>> RAM pages again if the hypervisor doesn't provide extended regions (we run on
>>>> an old version, there are no unused regions of reasonable size, etc.), so I
>>>> decided to put the fallback code here; the indicator of the failure is an
>>>> invalid "target_resource".
>>> I think it is unnecessary as we already assume today that
>>> &iomem_resource is always available.
>>>> I noticed a patch that is about to be upstreamed that removes the
>>>> alloc_(free)xenballooned_pages API [1]. Right now I have no idea how/where
>>>> this fallback could be implemented, as this is under build option control
>>>> (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same name is used either
>>>> for unpopulated pages (if set) or for ballooned pages (if not set). I would
>>>> appreciate suggestions regarding that. I am wondering whether it would be
>>>> possible and correct to have both mechanisms (unpopulated and ballooned)
>>>> enabled by default, with some init code deciding which one to use at runtime?
>>> I would keep it simple and remove the fallback from this patch. So:
>>>
>>> - if not CONFIG_XEN_UNPOPULATED_ALLOC, then balloon
>>> - if CONFIG_XEN_UNPOPULATED_ALLOC, then
>>>       - xen_resource if present
>>>       - otherwise iomem_resource
>> Unfortunately, we cannot use iomem_resource on Arm safely, either xen_resource
>> or fail (if no fallback exists).
>>
>>
>>> The xen_resource/iomem_resource config can be done at init time using
>>> target_resource. At runtime, target_resource is always != NULL so we
>>> just go ahead and use it.
>>
>> Thank you for the suggestion. OK, let's keep it simple and drop fallback
>> attempts for now. With one remark:
>> We will make CONFIG_XEN_UNPOPULATED_ALLOC disabled by default on Arm in next
>> patch. So by default everything will behave as usual on Arm (balloon out RAM
>> pages),
>> if user knows for sure that Xen reports extended regions, he/she can enable
>> the config. This way we won't break anything. What do you think?
> Actually after reading your replies and explanation I changed opinion: I
> think we do need the fallback because Linux cannot really assume that
> it is running on "new Xen" so it definitely needs to keep working if
> CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not
> advertised.
>
> I think we'll have to roll back some of the changes introduced by
> 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
> enabled we cannot know if we can use unpopulated-alloc or whether we
> have to use alloc_xenballooned_pages until we parse the /hypervisor node
> in device tree at runtime.

Exactly!


>
> In short, we cannot switch between unpopulated-alloc and
> alloc_xenballooned_pages at build time, we have to do it at runtime
> (boot time).

+1
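
Just to spell out what "decide at runtime" could look like, below is a
minimal user-space sketch. It is not the code I am going to send: the toy_*
names and the enum are made up for illustration, and the three-way decision
simply mirrors what unpopulated_init() in the patch under review does with
the return value of arch_xen_unpopulated_init().

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the two candidate resource roots. */
struct toy_resource {
	const char *name;
};

static struct toy_resource toy_iomem = { "iomem" };
static struct toy_resource toy_xen = { "Xen unused space" };

/* NULL means "no suitable resource, use ballooned pages". */
static struct toy_resource *toy_target;

enum toy_backend {
	TOY_UNPOPULATED,	/* allocate from toy_target */
	TOY_BALLOONED,		/* balloon out real RAM pages */
};

/*
 * arch_ret models the return value of the arch hook:
 *   0        extended regions were found, use the Xen resource
 *   -ENOSYS  the arch has no hook (x86 today), use the default resource
 *   other    the hook failed (e.g. "old Xen"), no resource at all
 */
static void toy_unpopulated_init(int arch_ret)
{
	if (!arch_ret)
		toy_target = &toy_xen;
	else if (arch_ret == -ENOSYS)
		toy_target = &toy_iomem;
	else
		toy_target = NULL;
}

/* Decision taken once at boot; build-time config alone cannot make it. */
static enum toy_backend toy_pick_backend(void)
{
	return toy_target ? TOY_UNPOPULATED : TOY_BALLOONED;
}
```

The point of the sketch is only that the ballooned helpers must stay
compiled in even with CONFIG_XEN_UNPOPULATED_ALLOC=y, otherwise the last
branch has nothing to fall back to.
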


I created a patch to partially revert 121f2faca2c0a "xen/balloon: rename 
alloc/free_xenballooned_pages".

If there are no objections I will add it to V3 (which is almost ready,
except the fallback bits). Could you please tell me what you think?


 From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Date: Tue, 23 Nov 2021 18:14:41 +0200
Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers
  back

This patch rolls back some of the changes introduced by commit
121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages"
in order to make it possible to still allocate xenballooned pages
if CONFIG_XEN_UNPOPULATED_ALLOC is enabled.

On Arm the unpopulated pages will be allocated on top of extended
regions provided by Xen via device-tree (the subsequent patches
will add required bits to support unpopulated-alloc feature on Arm).
The problem is that the extended regions feature was introduced
into Xen quite recently (during the 4.16 release cycle). So this
effectively means that Linux must only use unpopulated-alloc on Arm
if it is running on "new Xen" which advertises these regions.
But this will only be known after parsing the "hypervisor" node
at boot time, so before doing that we cannot assume anything.

In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled
and the extended regions are not advertised (Linux is running on
"old Xen", etc) we need the fallback to alloc_xenballooned_pages().

This way, with "new Xen", we no longer reduce the amount of usable
memory (wasting RAM pages) for any of the external mappings (and
eliminate XSA-300), while with "old Xen" we stay functional by
ballooning out RAM pages.

Also rename alloc(free)_xenballooned_pages to 
xen_alloc(free)_ballooned_pages.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
  drivers/xen/balloon.c | 20 +++++++++-----------
  include/xen/balloon.h |  3 +++
  include/xen/xen.h     |  6 ++++++
  3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ba2ea11..a2c4fc49 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target)
  }
  EXPORT_SYMBOL_GPL(balloon_set_new_target);

-#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
  static int add_ballooned_pages(unsigned int nr_pages)
  {
      enum bp_state st;
@@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages)
  }

  /**
- * xen_alloc_unpopulated_pages - get pages that have been ballooned out
+ * xen_alloc_ballooned_pages - get pages that have been ballooned out
   * @nr_pages: Number of pages to get
   * @pages: pages returned
   * @return 0 on success, error otherwise
   */
-int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
+int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
  {
      unsigned int pgno = 0;
      struct page *page;
@@ -652,23 +651,23 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
      return 0;
   out_undo:
      mutex_unlock(&balloon_mutex);
-    xen_free_unpopulated_pages(pgno, pages);
+    xen_free_ballooned_pages(pgno, pages);
      /*
-     * NB: free_xenballooned_pages will only subtract pgno pages, but since
+     * NB: xen_free_ballooned_pages will only subtract pgno pages, but since
       * target_unpopulated is incremented with nr_pages at the start we need
       * to remove the remaining ones also, or accounting will be screwed.
       */
      balloon_stats.target_unpopulated -= nr_pages - pgno;
      return ret;
  }
-EXPORT_SYMBOL(xen_alloc_unpopulated_pages);
+EXPORT_SYMBOL(xen_alloc_ballooned_pages);

  /**
- * xen_free_unpopulated_pages - return pages retrieved with get_ballooned_pages
+ * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages
   * @nr_pages: Number of pages
   * @pages: pages to return
   */
-void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
+void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
  {
      unsigned int i;

@@ -687,9 +686,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)

      mutex_unlock(&balloon_mutex);
  }
-EXPORT_SYMBOL(xen_free_unpopulated_pages);
+EXPORT_SYMBOL(xen_free_ballooned_pages);

-#if defined(CONFIG_XEN_PV)
+#if defined(CONFIG_XEN_PV) && !defined(CONFIG_XEN_UNPOPULATED_ALLOC)
  static void __init balloon_add_region(unsigned long start_pfn,
                        unsigned long pages)
  {
@@ -712,7 +711,6 @@ static void __init balloon_add_region(unsigned long start_pfn,
      balloon_stats.total_pages += extra_pfn_end - start_pfn;
  }
  #endif
-#endif

  static int __init balloon_init(void)
  {
diff --git a/include/xen/balloon.h b/include/xen/balloon.h
index e93d4f0..f78a6cc 100644
--- a/include/xen/balloon.h
+++ b/include/xen/balloon.h
@@ -26,6 +26,9 @@ extern struct balloon_stats balloon_stats;

  void balloon_set_new_target(unsigned long target);

+int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages);
+void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages);
+
  #ifdef CONFIG_XEN_BALLOON
  void xen_balloon_init(void);
  #else
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 9f031b5..410e3e4 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -52,7 +52,13 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
  extern u64 xen_saved_max_mem_size;
  #endif

+#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
  int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
  void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
+#else
+#define xen_alloc_unpopulated_pages xen_alloc_ballooned_pages
+#define xen_free_unpopulated_pages xen_free_ballooned_pages
+#include <xen/balloon.h>
+#endif

  #endif    /* _XEN_XEN_H */
-- 
2.7.4
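
A note on the include/xen/xen.h hunk, since the aliasing may not be obvious
at first glance: with CONFIG_XEN_UNPOPULATED_ALLOC unset, callers of
xen_alloc_unpopulated_pages() are redirected to the ballooned helper by the
preprocessor. The snippet below is a user-space sketch of that mechanism
only. The function body is a stub written for illustration (it is not the
balloon.c implementation), and toy_caller/toy_ballooned_calls are
hypothetical names.

```c
#include <stddef.h>

struct page;			/* opaque here, as in a header */

/* Counts how often the stub below was reached. */
static int toy_ballooned_calls;

/* User-space stub standing in for the helper declared in <xen/balloon.h>. */
static int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
{
	(void)nr_pages;
	(void)pages;
	toy_ballooned_calls++;
	return 0;
}

#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
/* Real implementation would live in unpopulated-alloc.c. */
int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
#else
/* No separate symbol: the old name is just an alias for the ballooned helper. */
#define xen_alloc_unpopulated_pages xen_alloc_ballooned_pages
#endif

/* A caller that only knows the "unpopulated" name. */
static int toy_caller(void)
{
	struct page *pages[4] = { NULL };

	return xen_alloc_unpopulated_pages(4, pages);
}
```

Built without -DCONFIG_XEN_UNPOPULATED_ALLOC, toy_caller() ends up in the
ballooned stub, which is the pre-121f2faca2c0a behavior for such configs.
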



>
> Juergen, what do you think?


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-23 16:46             ` Oleksandr
@ 2021-11-23 21:25               ` Stefano Stabellini
  2021-11-24  9:33                 ` Oleksandr
  2021-11-24  5:16               ` Juergen Gross
  1 sibling, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2021-11-23 21:25 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, jgross, xen-devel, linux-kernel,
	Oleksandr Tyshchenko, Boris Ostrovsky, Julien Grall

[-- Attachment #1: Type: text/plain, Size: 7276 bytes --]

On Tue, 23 Nov 2021, Oleksandr wrote:
> > Actually after reading your replies and explanation I changed opinion: I
> > think we do need the fallback because Linux cannot really assume that
> > it is running on "new Xen" so it definitely needs to keep working if
> > CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not
> > advertised.
> > 
> > I think we'll have to roll back some of the changes introduced by
> > 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
> > enabled we cannot know if we can use unpopulated-alloc or whether we
> > have to use alloc_xenballooned_pages until we parse the /hypervisor node
> > in device tree at runtime.
> 
> Exactly!
> 
> 
> > 
> > In short, we cannot switch between unpopulated-alloc and
> > alloc_xenballooned_pages at build time, we have to do it at runtime
> > (boot time).
> 
> +1
> 
> 
> I created a patch to partially revert 121f2faca2c0a "xen/balloon: rename
> alloc/free_xenballooned_pages".
> 
> If there is no objections I will add it to V3 (which is almost ready, except
> the fallback bits). Could you please tell me what do you think?
 
It makes sense to me. You can add my Reviewed-by.

 
> From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Date: Tue, 23 Nov 2021 18:14:41 +0200
> Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers
>  back
> 
> This patch rolls back some of the changes introduced by commit
> 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages"
> in order to make possible to still allocate xenballooned pages
> if CONFIG_XEN_UNPOPULATED_ALLOC is enabled.
> 
> On Arm the unpopulated pages will be allocated on top of extended
> regions provided by Xen via device-tree (the subsequent patches
> will add required bits to support unpopulated-alloc feature on Arm).
> The problem is that extended regions feature has been introduced
> into Xen quite recently (during 4.16 release cycle). So this
> effectively means that Linux must only use unpopulated-alloc on Arm
> if it is running on "new Xen" which advertises these regions.
> But, it will only be known after parsing the "hypervisor" node
> at boot time, so before doing that we cannot assume anything.
> 
> In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled
> and the extended regions are not advertised (Linux is running on
> "old Xen", etc) we need the fallback to alloc_xenballooned_pages().
> 
> This way we wouldn't reduce the amount of memory usable (wasting
> RAM pages) for any of the external mappings anymore (and eliminate
> XSA-300) with "new Xen", but would be still functional ballooning
> out RAM pages with "old Xen".
> 
> Also rename alloc(free)_xenballooned_pages to xen_alloc(free)_ballooned_pages.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  drivers/xen/balloon.c | 20 +++++++++-----------
>  include/xen/balloon.h |  3 +++
>  include/xen/xen.h     |  6 ++++++
>  3 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index ba2ea11..a2c4fc49 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target)
>  }
>  EXPORT_SYMBOL_GPL(balloon_set_new_target);
> 
> -#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
>  static int add_ballooned_pages(unsigned int nr_pages)
>  {
>      enum bp_state st;
> @@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages)
>  }
> 
>  /**
> - * xen_alloc_unpopulated_pages - get pages that have been ballooned out
> + * xen_alloc_ballooned_pages - get pages that have been ballooned out
>   * @nr_pages: Number of pages to get
>   * @pages: pages returned
>   * @return 0 on success, error otherwise
>   */
> -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
>  {
>      unsigned int pgno = 0;
>      struct page *page;
> @@ -652,23 +651,23 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>      return 0;
>   out_undo:
>      mutex_unlock(&balloon_mutex);
> -    xen_free_unpopulated_pages(pgno, pages);
> +    xen_free_ballooned_pages(pgno, pages);
>      /*
> -     * NB: free_xenballooned_pages will only subtract pgno pages, but since
> +     * NB: xen_free_ballooned_pages will only subtract pgno pages, but since
>       * target_unpopulated is incremented with nr_pages at the start we need
>       * to remove the remaining ones also, or accounting will be screwed.
>       */
>      balloon_stats.target_unpopulated -= nr_pages - pgno;
>      return ret;
>  }
> -EXPORT_SYMBOL(xen_alloc_unpopulated_pages);
> +EXPORT_SYMBOL(xen_alloc_ballooned_pages);
> 
>  /**
> - * xen_free_unpopulated_pages - return pages retrieved with get_ballooned_pages
> + * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages
>   * @nr_pages: Number of pages
>   * @pages: pages to return
>   */
> -void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
>  {
>      unsigned int i;
> 
> @@ -687,9 +686,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> 
>      mutex_unlock(&balloon_mutex);
>  }
> -EXPORT_SYMBOL(xen_free_unpopulated_pages);
> +EXPORT_SYMBOL(xen_free_ballooned_pages);
> 
> -#if defined(CONFIG_XEN_PV)
> +#if defined(CONFIG_XEN_PV) && !defined(CONFIG_XEN_UNPOPULATED_ALLOC)
>  static void __init balloon_add_region(unsigned long start_pfn,
>                        unsigned long pages)
>  {
> @@ -712,7 +711,6 @@ static void __init balloon_add_region(unsigned long start_pfn,
>      balloon_stats.total_pages += extra_pfn_end - start_pfn;
>  }
>  #endif
> -#endif
> 
>  static int __init balloon_init(void)
>  {
> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
> index e93d4f0..f78a6cc 100644
> --- a/include/xen/balloon.h
> +++ b/include/xen/balloon.h
> @@ -26,6 +26,9 @@ extern struct balloon_stats balloon_stats;
> 
>  void balloon_set_new_target(unsigned long target);
> 
> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages);
> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages);
> +
>  #ifdef CONFIG_XEN_BALLOON
>  void xen_balloon_init(void);
>  #else
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 9f031b5..410e3e4 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -52,7 +52,13 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>  extern u64 xen_saved_max_mem_size;
>  #endif
> 
> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>  int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>  void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
> +#else
> +#define xen_alloc_unpopulated_pages xen_alloc_ballooned_pages
> +#define xen_free_unpopulated_pages xen_free_ballooned_pages
> +#include <xen/balloon.h>
> +#endif
> 
>  #endif    /* _XEN_XEN_H */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-23 16:46             ` Oleksandr
  2021-11-23 21:25               ` Stefano Stabellini
@ 2021-11-24  5:16               ` Juergen Gross
  2021-11-24  9:37                 ` Oleksandr
  1 sibling, 1 reply; 41+ messages in thread
From: Juergen Gross @ 2021-11-24  5:16 UTC (permalink / raw)
  To: Oleksandr, Stefano Stabellini
  Cc: xen-devel, linux-kernel, Oleksandr Tyshchenko, Boris Ostrovsky,
	Julien Grall


[-- Attachment #1.1.1: Type: text/plain, Size: 23925 bytes --]

On 23.11.21 17:46, Oleksandr wrote:
> 
> On 20.11.21 04:19, Stefano Stabellini wrote:
> 
> Hi Stefano, Juergen, all
> 
> 
>> Juergen please see the bottom of the email
>>
>> On Fri, 19 Nov 2021, Oleksandr wrote:
>>> On 19.11.21 02:59, Stefano Stabellini wrote:
>>>> On Tue, 9 Nov 2021, Oleksandr wrote:
>>>>> On 28.10.21 19:37, Stefano Stabellini wrote:
>>>>>
>>>>> Hi Stefano
>>>>>
>>>>> I am sorry for the late response.
>>>>>
>>>>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>
>>>>>>> The main reason of this change is that unpopulated-alloc
>>>>>>> code cannot be used in its current form on Arm, but there
>>>>>>> is a desire to reuse it to avoid wasting real RAM pages
>>>>>>> for the grant/foreign mappings.
>>>>>>>
>>>>>>> The problem is that system "iomem_resource" is used for
>>>>>>> the address space allocation, but the really unallocated
>>>>>>> space can't be figured out precisely by the domain on Arm
>>>>>>> without hypervisor involvement. For example, not all device
>>>>>>> I/O regions are known by the time domain starts creating
>>>>>>> grant/foreign mappings. And following the advise from
>>>>>>> "iomem_resource" we might end up reusing these regions by
>>>>>>> a mistake. So, the hypervisor which maintains the P2M for
>>>>>>> the domain is in the best position to provide unused regions
>>>>>>> of guest physical address space which could be safely used
>>>>>>> to create grant/foreign mappings.
>>>>>>>
>>>>>>> Introduce new helper arch_xen_unpopulated_init() which purpose
>>>>>>> is to create specific Xen resource based on the memory regions
>>>>>>> provided by the hypervisor to be used as unused space for Xen
>>>>>>> scratch pages.
>>>>>>>
>>>>>>> If arch doesn't implement arch_xen_unpopulated_init() to
>>>>>>> initialize Xen resource the default "iomem_resource" will be used.
>>>>>>> So the behavior on x86 won't be changed.
>>>>>>>
>>>>>>> Also fall back to allocate xenballooned pages (steal real RAM
>>>>>>> pages) if we do not have any suitable resource to work with and
>>>>>>> as the result we won't be able to provide unpopulated pages.
>>>>>>>
>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>> ---
>>>>>>> Changes RFC -> V2:
>>>>>>>       - new patch, instead of
>>>>>>>        "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>>>>>>> ---
>>>>>>>     drivers/xen/unpopulated-alloc.c | 89 +++++++++++++++++++++++++++++++++++++++--
>>>>>>>     include/xen/xen.h               |  2 +
>>>>>>>     2 files changed, 88 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>>>>>>> index a03dc5b..1f1d8d8 100644
>>>>>>> --- a/drivers/xen/unpopulated-alloc.c
>>>>>>> +++ b/drivers/xen/unpopulated-alloc.c
>>>>>>> @@ -8,6 +8,7 @@
>>>>>>>       #include <asm/page.h>
>>>>>>>     +#include <xen/balloon.h>
>>>>>>>     #include <xen/page.h>
>>>>>>>     #include <xen/xen.h>
>>>>>>>     @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
>>>>>>>     static struct page *page_list;
>>>>>>>     static unsigned int list_count;
>>>>>>>     +static struct resource *target_resource;
>>>>>>> +static struct resource xen_resource = {
>>>>>>> +    .name = "Xen unused space",
>>>>>>> +};
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * If arch is not happy with system "iomem_resource" being used for
>>>>>>> + * the region allocation it can provide its own view by initializing
>>>>>>> + * "xen_resource" with unused regions of guest physical address space
>>>>>>> + * provided by the hypervisor.
>>>>>>> + */
>>>>>>> +int __weak arch_xen_unpopulated_init(struct resource *res)
>>>>>>> +{
>>>>>>> +    return -ENOSYS;
>>>>>>> +}
>>>>>>> +
>>>>>>>     static int fill_list(unsigned int nr_pages)
>>>>>>>     {
>>>>>>>         struct dev_pagemap *pgmap;
>>>>>>> -    struct resource *res;
>>>>>>> +    struct resource *res, *tmp_res = NULL;
>>>>>>>         void *vaddr;
>>>>>>>         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>>>>>>> -    int ret = -ENOMEM;
>>>>>>> +    int ret;
>>>>>>>           res = kzalloc(sizeof(*res), GFP_KERNEL);
>>>>>>>         if (!res)
>>>>>>> @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
>>>>>>>         res->name = "Xen scratch";
>>>>>>>         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>>>>>>     -    ret = allocate_resource(&iomem_resource, res,
>>>>>>> +    ret = allocate_resource(target_resource, res,
>>>>>>>                     alloc_pages * PAGE_SIZE, 0, -1,
>>>>>>>                     PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>>>>>>         if (ret < 0) {
>>>>>>> @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
>>>>>>>             goto err_resource;
>>>>>>>         }
>>>>>>>     +    /*
>>>>>>> +     * Reserve the region previously allocated from Xen resource to avoid
>>>>>>> +     * re-using it by someone else.
>>>>>>> +     */
>>>>>>> +    if (target_resource != &iomem_resource) {
>>>>>>> +        tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>>>>>> +        if (!tmp_res) {
>>>>>>> +            ret = -ENOMEM;
>>>>>>> +            goto err_insert;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        tmp_res->name = res->name;
>>>>>>> +        tmp_res->start = res->start;
>>>>>>> +        tmp_res->end = res->end;
>>>>>>> +        tmp_res->flags = res->flags;
>>>>>>> +
>>>>>>> +        ret = insert_resource(&iomem_resource, tmp_res);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
>>>>>>> +                   tmp_res->start, tmp_res->end);
>>>>>>> +            kfree(tmp_res);
>>>>>>> +            goto err_insert;
>>>>>>> +        }
>>>>>>> +    }
>>>>>> I am a bit confused.. why do we need to do this? Who could be
>>>>>> erroneously re-using the region? Are you saying that the next time
>>>>>> allocate_resource is called it could find the same region again? It
>>>>>> doesn't seem possible?
>>>>> No, as I understand it, allocate_resource() being called for the same root
>>>>> resource won't provide the same region... We only need to do this (insert the
>>>>> region into "iomem_resource") if we allocated it from our *internal*
>>>>> "xen_resource", as the *global* "iomem_resource" (which is used everywhere) is
>>>>> not aware that the region has already been allocated. So by inserting the
>>>>> region here we reserve it, otherwise it could be reused elsewhere.
>>>> But elsewhere where?
>>> I think, theoretically everywhere where allocate_resource(&iomem_resource,
>>> ...) is called.
>>>
>>>
>>>> Let's say that allocate_resource allocates a range from xen_resource.
>>>>   From reading the code, it doesn't look like iomem_resource would have
>>>> that range because the extended regions described under /hypervisor are
>>>> not added automatically to iomem_resource.
>>>>
>>>> So what if we don't call insert_resource? Nothing could allocate the
>>>> same range because iomem_resource doesn't have it at all and
>>>> xen_resource is not used anywhere if not here.
>>>>
>>>> What am I missing?
>>>
>>> Below my understanding which, of course, might be wrong.
>>>
>>> If we don't claim resource by calling insert_resource (or even
>>> request_resource) here then the same range could be allocated everywhere where
>>> allocate_resource(&iomem_resource, ...) is called.
>>> I don't see what prevents the same range from being allocated. Why actually
>>> allocate_resource(&iomem_resource, ...) can't provide the same range if it is
>>> free (not-reserved-yet) from its PoV? The comment above allocate_resource()
>>> says "allocate empty slot in the resource tree given range & alignment". So
>>> this "empty slot" could be exactly the same range.
>>>
>>> I experimented with that a bit trying to call
>>> allocate_resource(&iomem_resource, ...) several times in another place to see
>>> what ranges it returns in both cases (w/ and w/o calling insert_resource
>>> here). So an experiment confirmed (of course, if I made it correctly) that the
>>> same range could be allocated if we didn't call insert_resource() here. And as
>>> I understand there is nothing strange here, as iomem_resource covers all
>>> address space initially (0, -1) and everything *not* inserted/requested (in
>>> other words, reserved) yet is considered as free and could be provided if fits
>>> constraints. Or I really missed something?
>> Thanks for the explanation! It was me that didn't know that
>> iomem_resource covers all the address space initially. I thought it was
>> populated only with actual iomem ranges. Now it makes sense, thanks!
>>
>>
>>> It feels to me that it would be better to call request_resource() instead of
>>> insert_resource(). It seems that, if no conflict happens, both functions
>>> behave the same way, but if there is a conflict and the conflicting resource
>>> fits entirely within the new resource, request_resource() will return an
>>> error. I think this way we will be able to detect that a range we are trying
>>> to reserve is already present and bail out early.
>>>
>>>
>>>> Or maybe it is the other way around: core Linux code assumes everything
>>>> is described in iomem_resource so something under kernel/ or mm/ would
>>>> crash if we start using a page pointing to an address missing from
>>>> iomem_resource?
>>>>>>>         pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>>>>>>>         if (!pgmap) {
>>>>>>>             ret = -ENOMEM;
>>>>>>> @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
>>>>>>>     err_memremap:
>>>>>>>         kfree(pgmap);
>>>>>>>     err_pgmap:
>>>>>>> +    if (tmp_res) {
>>>>>>> +        release_resource(tmp_res);
>>>>>>> +        kfree(tmp_res);
>>>>>>> +    }
>>>>>>> +err_insert:
>>>>>>>         release_resource(res);
>>>>>>>     err_resource:
>>>>>>>         kfree(res);
>>>>>>>         return ret;
>>>>>>>     }
>>>>>>>     +static void unpopulated_init(void)
>>>>>>> +{
>>>>>>> +    static bool inited = false;
>>>>>> initialized = false
>>>>> ok.
>>>>>
>>>>>
>>>>>>> +    int ret;
>>>>>>> +
>>>>>>> +    if (inited)
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Try to initialize Xen resource the first and fall back to default
>>>>>>> +     * resource if arch doesn't offer one.
>>>>>>> +     */
>>>>>>> +    ret = arch_xen_unpopulated_init(&xen_resource);
>>>>>>> +    if (!ret)
>>>>>>> +        target_resource = &xen_resource;
>>>>>>> +    else if (ret == -ENOSYS)
>>>>>>> +        target_resource = &iomem_resource;
>>>>>>> +    else
>>>>>>> +        pr_err("Cannot initialize Xen resource\n");
>>>>>>> +
>>>>>>> +    inited = true;
>>>>>>> +}
>>>>>> Would it make sense to call unpopulated_init from an init function,
>>>>>> rather than every time xen_alloc_unpopulated_pages is called?
>>>>> Good point, thank you. Will do. To be honest, I also don't like the current
>>>>> approach much.
>>>>>
>>>>>
>>>>>>>     /**
>>>>>>>      * xen_alloc_unpopulated_pages - alloc unpopulated pages
>>>>>>>      * @nr_pages: Number of pages
>>>>>>> @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>>>>>>         unsigned int i;
>>>>>>>         int ret = 0;
>>>>>>>     +    unpopulated_init();
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Fall back to default behavior if we do not have any suitable resource
>>>>>>> +     * to allocate required region from and as the result we won't be able to
>>>>>>> +     * construct pages.
>>>>>>> +     */
>>>>>>> +    if (!target_resource)
>>>>>>> +        return alloc_xenballooned_pages(nr_pages, pages);
>>>>>> The commit message says that the behavior on x86 doesn't change but
>>>>>> this seems to be a change that could impact x86?
>>>>> I don't think so; however, I haven't tested on x86 and might be wrong.
>>>>> According to the current patch, on x86 the "target_resource" is always
>>>>> valid and points to the "iomem_resource" as arch_xen_unpopulated_init()
>>>>> is not implemented. So there won't be any fallback to use
>>>>> alloc_(free)_xenballooned_pages() here and fill_list() will behave as
>>>>> usual.
>>>>    If target_resource is always valid, then we don't need this special
>>>> check. In fact, the condition should never be true.
>>>
>>> The target_resource is always valid and points to the "iomem_resource"
>>> on x86 (this is equivalent to the behavior before this patch).
>>> On Arm target_resource might be NULL if arch_xen_unpopulated_init()
>>> failed, for example, if no extended regions are reported by the
>>> hypervisor.
>>> We cannot use "iomem_resource" on Arm, only a resource constructed from
>>> extended regions. This is why I added that check (and fallback to
>>> xenballooned pages).
>>> What I was thinking is that in case of using old Xen (although we would
>>> need to balloon out RAM pages) we still would be able to keep working,
>>> so no need to disable CONFIG_XEN_UNPOPULATED_ALLOC on such setups.
>>>>> You raised a really good question, on Arm we need a fallback to balloon
>>>>> out RAM pages again if hypervisor doesn't provide extended regions (we
>>>>> run on old version, no unused regions with reasonable size, etc), so I
>>>>> decided to put a fallback code here, an indicator of the failure is
>>>>> invalid "target_resource".
>>>> I think it is unnecessary as we already assume today that
>>>> &iomem_resource is always available.
>>>>> I noticed the patch which is about to be upstreamed that removes
>>>>> alloc_(free)xenballooned_pages API [1]. Right now I have no idea
>>>>> how/where this fallback could be implemented as this is under build
>>>>> option control (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same
>>>>> name is either used for unpopulated pages (if set) or ballooned pages
>>>>> (if not set). I would appreciate suggestions regarding that. I am
>>>>> wondering whether it would be possible and correct to have both
>>>>> mechanisms (unpopulated and ballooned) enabled by default and some init
>>>>> code to decide which one to use at runtime, or something of that sort?
>>>> I would keep it simple and remove the fallback from this patch. So:
>>>>
>>>> - if not CONFIG_XEN_UNPOPULATED_ALLOC, then balloon
>>>> - if CONFIG_XEN_UNPOPULATED_ALLOC, then
>>>>       - xen_resource if present
>>>>       - otherwise iomem_resource
>>> Unfortunately, we cannot use iomem_resource on Arm safely, either
>>> xen_resource or fail (if no fallback exists).
>>>
>>>
>>>> The xen_resource/iomem_resource config can be done at init time using
>>>> target_resource. At runtime, target_resource is always != NULL so we
>>>> just go ahead and use it.
>>>
>>> Thank you for the suggestion. OK, let's keep it simple and drop fallback
>>> attempts for now. With one remark:
>>> We will make CONFIG_XEN_UNPOPULATED_ALLOC disabled by default on Arm in
>>> next patch. So by default everything will behave as usual on Arm (balloon
>>> out RAM pages), if user knows for sure that Xen reports extended regions,
>>> he/she can enable the config. This way we won't break anything. What do
>>> you think?
>> Actually after reading your replies and explanation I changed opinion: I
>> think we do need the fallback because Linux cannot really assume that
>> it is running on "new Xen" so it definitely needs to keep working if
>> CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not
>> advertised.
>>
>> I think we'll have to roll back some of the changes introduced by
>> 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
>> enabled we cannot know if we can use unpopulated-alloc or whether we
>> have to use alloc_xenballooned_pages until we parse the /hypervisor node
>> in device tree at runtime.
> 
> Exactly!
> 
> 
>>
>> In short, we cannot switch between unpopulated-alloc and
>> alloc_xenballooned_pages at build time, we have to do it at runtime
>> (boot time).
> 
> +1
> 
> 
> I created a patch to partially revert 121f2faca2c0a "xen/balloon: rename 
> alloc/free_xenballooned_pages".
> 
> If there are no objections I will add it to V3 (which is almost ready,
> except the fallback bits). Could you please tell me what you think?
> 
> 
>  From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Date: Tue, 23 Nov 2021 18:14:41 +0200
> Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers
>   back
> 
> This patch rolls back some of the changes introduced by commit
> 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages"
> in order to make possible to still allocate xenballooned pages
> if CONFIG_XEN_UNPOPULATED_ALLOC is enabled.
> 
> On Arm the unpopulated pages will be allocated on top of extended
> regions provided by Xen via device-tree (the subsequent patches
> will add required bits to support unpopulated-alloc feature on Arm).
> The problem is that extended regions feature has been introduced
> into Xen quite recently (during 4.16 release cycle). So this
> effectively means that Linux must only use unpopulated-alloc on Arm
> if it is running on "new Xen" which advertises these regions.
> But, it will only be known after parsing the "hypervisor" node
> at boot time, so before doing that we cannot assume anything.
> 
> In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled
> and the extended regions are not advertised (Linux is running on
> "old Xen", etc) we need the fallback to alloc_xenballooned_pages().
> 
> This way we wouldn't reduce the amount of memory usable (wasting
> RAM pages) for any of the external mappings anymore (and eliminate
> XSA-300) with "new Xen", but would be still functional ballooning
> out RAM pages with "old Xen".
> 
> Also rename alloc(free)_xenballooned_pages to xen_alloc(free)_ballooned_pages.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>   drivers/xen/balloon.c | 20 +++++++++-----------
>   include/xen/balloon.h |  3 +++
>   include/xen/xen.h     |  6 ++++++
>   3 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index ba2ea11..a2c4fc49 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target)
>   }
>   EXPORT_SYMBOL_GPL(balloon_set_new_target);
> 
> -#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
>   static int add_ballooned_pages(unsigned int nr_pages)
>   {
>       enum bp_state st;
> @@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages)
>   }
> 
>   /**
> - * xen_alloc_unpopulated_pages - get pages that have been ballooned out
> + * xen_alloc_ballooned_pages - get pages that have been ballooned out
>    * @nr_pages: Number of pages to get
>    * @pages: pages returned
>    * @return 0 on success, error otherwise
>    */
> -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
>   {
>       unsigned int pgno = 0;
>       struct page *page;
> @@ -652,23 +651,23 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>       return 0;
>    out_undo:
>       mutex_unlock(&balloon_mutex);
> -    xen_free_unpopulated_pages(pgno, pages);
> +    xen_free_ballooned_pages(pgno, pages);
>       /*
> -     * NB: free_xenballooned_pages will only subtract pgno pages, but since
> +     * NB: xen_free_ballooned_pages will only subtract pgno pages, but since
>        * target_unpopulated is incremented with nr_pages at the start we need
>        * to remove the remaining ones also, or accounting will be screwed.
>        */
>       balloon_stats.target_unpopulated -= nr_pages - pgno;
>       return ret;
>   }
> -EXPORT_SYMBOL(xen_alloc_unpopulated_pages);
> +EXPORT_SYMBOL(xen_alloc_ballooned_pages);
> 
>   /**
> - * xen_free_unpopulated_pages - return pages retrieved with get_ballooned_pages
> + * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages
>    * @nr_pages: Number of pages
>    * @pages: pages to return
>    */
> -void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
>   {
>       unsigned int i;
> 
> @@ -687,9 +686,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> 
>       mutex_unlock(&balloon_mutex);
>   }
> -EXPORT_SYMBOL(xen_free_unpopulated_pages);
> +EXPORT_SYMBOL(xen_free_ballooned_pages);
> 
> -#if defined(CONFIG_XEN_PV)
> +#if defined(CONFIG_XEN_PV) && !defined(CONFIG_XEN_UNPOPULATED_ALLOC)
>   static void __init balloon_add_region(unsigned long start_pfn,
>                         unsigned long pages)
>   {
> @@ -712,7 +711,6 @@ static void __init balloon_add_region(unsigned long start_pfn,
>       balloon_stats.total_pages += extra_pfn_end - start_pfn;
>   }
>   #endif
> -#endif
> 
>   static int __init balloon_init(void)
>   {
> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
> index e93d4f0..f78a6cc 100644
> --- a/include/xen/balloon.h
> +++ b/include/xen/balloon.h
> @@ -26,6 +26,9 @@ extern struct balloon_stats balloon_stats;
> 
>   void balloon_set_new_target(unsigned long target);
> 
> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages);
> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages);
> +
>   #ifdef CONFIG_XEN_BALLOON
>   void xen_balloon_init(void);
>   #else
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 9f031b5..410e3e4 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -52,7 +52,13 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>   extern u64 xen_saved_max_mem_size;
>   #endif
> 
> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>   int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>   void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
> +#else
> +#define xen_alloc_unpopulated_pages xen_alloc_ballooned_pages
> +#define xen_free_unpopulated_pages xen_free_ballooned_pages

Could you please make those inline functions instead?

Other than that I'm fine with the approach.
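
For illustration, the macro-to-inline change requested here could look
roughly like the sketch below. It is a userspace mock, not kernel code:
struct page, the page pool and the bodies of xen_alloc_ballooned_pages()
and xen_free_ballooned_pages() are stand-ins invented for the example.
Only the two static inline wrappers reflect the intent of the #else
branch in include/xen/xen.h.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct page (assumption, illustration only). */
struct page {
	int dummy;
};

#define MOCK_POOL_SIZE 8

static struct page mock_pool[MOCK_POOL_SIZE];
static unsigned int mock_in_use;

/* Mock of the balloon helper; the real one lives in drivers/xen/balloon.c. */
int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
{
	unsigned int i;

	assert(pages != NULL);
	if (nr_pages > MOCK_POOL_SIZE - mock_in_use)
		return -1;	/* mock failure; the kernel helper returns an error code */

	for (i = 0; i < nr_pages; i++)
		pages[i] = &mock_pool[mock_in_use + i];
	mock_in_use += nr_pages;
	return 0;
}

/* Mock counterpart; only supports freeing in stack order, which is enough here. */
void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
{
	unsigned int i;

	assert(pages != NULL && nr_pages <= mock_in_use);
	for (i = 0; i < nr_pages; i++)
		pages[i] = NULL;
	mock_in_use -= nr_pages;
}

#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
/* Inline wrappers instead of "#define xen_alloc_unpopulated_pages ..." aliases. */
static inline int xen_alloc_unpopulated_pages(unsigned int nr_pages,
					      struct page **pages)
{
	return xen_alloc_ballooned_pages(nr_pages, pages);
}

static inline void xen_free_unpopulated_pages(unsigned int nr_pages,
					      struct page **pages)
{
	xen_free_ballooned_pages(nr_pages, pages);
}
#endif
```

Compared with the #define aliases, the wrappers give callers real
prototypes to be type-checked against and do not rewrite unrelated uses
of the identifier.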


Juergen


* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-23 21:25               ` Stefano Stabellini
@ 2021-11-24  9:33                 ` Oleksandr
  0 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-24  9:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: jgross, xen-devel, linux-kernel, Oleksandr Tyshchenko,
	Boris Ostrovsky, Julien Grall


On 23.11.21 23:25, Stefano Stabellini wrote:

Hi Stefano

> On Tue, 23 Nov 2021, Oleksandr wrote:
>>> Actually after reading your replies and explanation I changed opinion: I
>>> think we do need the fallback because Linux cannot really assume that
>>> it is running on "new Xen" so it definitely needs to keep working if
>>> CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not
>>> advertised.
>>>
>>> I think we'll have to roll back some of the changes introduced by
>>> 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
>>> enabled we cannot know if we can use unpopulated-alloc or whether we
>>> have to use alloc_xenballooned_pages until we parse the /hypervisor node
>>> in device tree at runtime.
>> Exactly!
>>
>>
>>> In short, we cannot switch between unpopulated-alloc and
>>> alloc_xenballooned_pages at build time, we have to do it at runtime
>>> (boot time).
>> +1
>>
>>
>> I created a patch to partially revert 121f2faca2c0a "xen/balloon: rename
>> alloc/free_xenballooned_pages".
>>
>> If there are no objections I will add it to V3 (which is almost ready, except
>> the fallback bits). Could you please tell me what you think?
>   
> It makes sense to me. You can add my Reviewed-by.

Great, thank you!
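
The runtime (boot-time) selection agreed on above can be sketched in
plain C as follows. This is only a userspace illustration under stated
assumptions: have_extended_regions, sketch_unpopulated_init(),
sketch_alloc_pages() and the two mock allocators are hypothetical names
made up for the example and are not part of the patch. The point is
just that the choice is made from what the "hypervisor" node reports at
boot and not from CONFIG_XEN_UNPOPULATED_ALLOC alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct page (assumption). */
struct page {
	int dummy;
};

enum alloc_path { PATH_NONE, PATH_UNPOPULATED, PATH_BALLOONED };

/* Records which mock allocator ran last, so the choice can be observed. */
static enum alloc_path last_path = PATH_NONE;

/* Hypothetical flag, set once at init from the number of advertised regions. */
static bool have_extended_regions;

/* Mock for the unpopulated-alloc path (pages carved out of extended regions). */
static int mock_unpopulated_alloc(unsigned int nr_pages, struct page **pages)
{
	(void)nr_pages;
	(void)pages;
	last_path = PATH_UNPOPULATED;
	return 0;
}

/* Mock for the fallback path (ballooning out real RAM pages). */
static int mock_ballooned_alloc(unsigned int nr_pages, struct page **pages)
{
	(void)nr_pages;
	(void)pages;
	last_path = PATH_BALLOONED;
	return 0;
}

/* Called once at init: an "old Xen" advertises no extended regions. */
void sketch_unpopulated_init(unsigned int nr_extended_regions)
{
	have_extended_regions = nr_extended_regions > 0;
}

/* Picks the allocator at runtime instead of at build time. */
int sketch_alloc_pages(unsigned int nr_pages, struct page **pages)
{
	assert(pages != NULL);

	if (!have_extended_regions)
		return mock_ballooned_alloc(nr_pages, pages);

	return mock_unpopulated_alloc(nr_pages, pages);
}
```

With zero regions reported the call ends up in the ballooned path, and
with one or more it takes the unpopulated path.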


>
>   
>>  From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> Date: Tue, 23 Nov 2021 18:14:41 +0200
>> Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers
>>   back
>>
>> This patch rolls back some of the changes introduced by commit
>> 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages"
>> in order to make possible to still allocate xenballooned pages
>> if CONFIG_XEN_UNPOPULATED_ALLOC is enabled.
>>
>> On Arm the unpopulated pages will be allocated on top of extended
>> regions provided by Xen via device-tree (the subsequent patches
>> will add required bits to support unpopulated-alloc feature on Arm).
>> The problem is that extended regions feature has been introduced
>> into Xen quite recently (during 4.16 release cycle). So this
>> effectively means that Linux must only use unpopulated-alloc on Arm
>> if it is running on "new Xen" which advertises these regions.
>> But, it will only be known after parsing the "hypervisor" node
>> at boot time, so before doing that we cannot assume anything.
>>
>> In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled
>> and the extended regions are not advertised (Linux is running on
>> "old Xen", etc) we need the fallback to alloc_xenballooned_pages().
>>
>> This way we wouldn't reduce the amount of memory usable (wasting
>> RAM pages) for any of the external mappings anymore (and eliminate
>> XSA-300) with "new Xen", but would be still functional ballooning
>> out RAM pages with "old Xen".
>>
>> Also rename alloc(free)_xenballooned_pages to xen_alloc(free)_ballooned_pages.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   drivers/xen/balloon.c | 20 +++++++++-----------
>>   include/xen/balloon.h |  3 +++
>>   include/xen/xen.h     |  6 ++++++
>>   3 files changed, 18 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index ba2ea11..a2c4fc49 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target)
>>   }
>>   EXPORT_SYMBOL_GPL(balloon_set_new_target);
>>
>> -#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
>>   static int add_ballooned_pages(unsigned int nr_pages)
>>   {
>>       enum bp_state st;
>> @@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages)
>>   }
>>
>>   /**
>> - * xen_alloc_unpopulated_pages - get pages that have been ballooned out
>> + * xen_alloc_ballooned_pages - get pages that have been ballooned out
>>    * @nr_pages: Number of pages to get
>>    * @pages: pages returned
>>    * @return 0 on success, error otherwise
>>    */
>> -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
>>   {
>>       unsigned int pgno = 0;
>>       struct page *page;
>> @@ -652,23 +651,23 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>       return 0;
>>    out_undo:
>>       mutex_unlock(&balloon_mutex);
>> -    xen_free_unpopulated_pages(pgno, pages);
>> +    xen_free_ballooned_pages(pgno, pages);
>>       /*
>> -     * NB: free_xenballooned_pages will only subtract pgno pages, but since
>> +     * NB: xen_free_ballooned_pages will only subtract pgno pages, but since
>>        * target_unpopulated is incremented with nr_pages at the start we need
>>        * to remove the remaining ones also, or accounting will be screwed.
>>        */
>>       balloon_stats.target_unpopulated -= nr_pages - pgno;
>>       return ret;
>>   }
>> -EXPORT_SYMBOL(xen_alloc_unpopulated_pages);
>> +EXPORT_SYMBOL(xen_alloc_ballooned_pages);
>>
>>   /**
>> - * xen_free_unpopulated_pages - return pages retrieved with get_ballooned_pages
>> + * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages
>>    * @nr_pages: Number of pages
>>    * @pages: pages to return
>>    */
>> -void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
>>   {
>>       unsigned int i;
>>
>> @@ -687,9 +686,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>
>>       mutex_unlock(&balloon_mutex);
>>   }
>> -EXPORT_SYMBOL(xen_free_unpopulated_pages);
>> +EXPORT_SYMBOL(xen_free_ballooned_pages);
>>
>> -#if defined(CONFIG_XEN_PV)
>> +#if defined(CONFIG_XEN_PV) && !defined(CONFIG_XEN_UNPOPULATED_ALLOC)
>>   static void __init balloon_add_region(unsigned long start_pfn,
>>                         unsigned long pages)
>>   {
>> @@ -712,7 +711,6 @@ static void __init balloon_add_region(unsigned long start_pfn,
>>       balloon_stats.total_pages += extra_pfn_end - start_pfn;
>>   }
>>   #endif
>> -#endif
>>
>>   static int __init balloon_init(void)
>>   {
>> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
>> index e93d4f0..f78a6cc 100644
>> --- a/include/xen/balloon.h
>> +++ b/include/xen/balloon.h
>> @@ -26,6 +26,9 @@ extern struct balloon_stats balloon_stats;
>>
>>   void balloon_set_new_target(unsigned long target);
>>
>> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages);
>> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages);
>> +
>>   #ifdef CONFIG_XEN_BALLOON
>>   void xen_balloon_init(void);
>>   #else
>> diff --git a/include/xen/xen.h b/include/xen/xen.h
>> index 9f031b5..410e3e4 100644
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -52,7 +52,13 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>>   extern u64 xen_saved_max_mem_size;
>>   #endif
>>
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>   int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>>   void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>> +#else
>> +#define xen_alloc_unpopulated_pages xen_alloc_ballooned_pages
>> +#define xen_free_unpopulated_pages xen_free_ballooned_pages
>> +#include <xen/balloon.h>
>> +#endif
>>
>>   #endif    /* _XEN_XEN_H */

-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
  2021-11-24  5:16               ` Juergen Gross
@ 2021-11-24  9:37                 ` Oleksandr
  0 siblings, 0 replies; 41+ messages in thread
From: Oleksandr @ 2021-11-24  9:37 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, xen-devel, linux-kernel,
	Oleksandr Tyshchenko, Boris Ostrovsky, Julien Grall


On 24.11.21 07:16, Juergen Gross wrote:

Hi Juergen

> On 23.11.21 17:46, Oleksandr wrote:
>>
>> On 20.11.21 04:19, Stefano Stabellini wrote:
>>
>> Hi Stefano, Juergen, all
>>
>>
>>> Juergen please see the bottom of the email
>>>
>>> On Fri, 19 Nov 2021, Oleksandr wrote:
>>>> On 19.11.21 02:59, Stefano Stabellini wrote:
>>>>> On Tue, 9 Nov 2021, Oleksandr wrote:
>>>>>> On 28.10.21 19:37, Stefano Stabellini wrote:
>>>>>>
>>>>>> Hi Stefano
>>>>>>
>>>>>> I am sorry for the late response.
>>>>>>
>>>>>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>
>>>>>>>> The main reason of this change is that unpopulated-alloc
>>>>>>>> code cannot be used in its current form on Arm, but there
>>>>>>>> is a desire to reuse it to avoid wasting real RAM pages
>>>>>>>> for the grant/foreign mappings.
>>>>>>>>
>>>>>>>> The problem is that system "iomem_resource" is used for
>>>>>>>> the address space allocation, but the really unallocated
>>>>>>>> space can't be figured out precisely by the domain on Arm
>>>>>>>> without hypervisor involvement. For example, not all device
>>>>>>>> I/O regions are known by the time domain starts creating
>>>>>>>> grant/foreign mappings. And following the advise from
>>>>>>>> "iomem_resource" we might end up reusing these regions by
>>>>>>>> a mistake. So, the hypervisor which maintains the P2M for
>>>>>>>> the domain is in the best position to provide unused regions
>>>>>>>> of guest physical address space which could be safely used
>>>>>>>> to create grant/foreign mappings.
>>>>>>>>
>>>>>>>> Introduce new helper arch_xen_unpopulated_init() which purpose
>>>>>>>> is to create specific Xen resource based on the memory regions
>>>>>>>> provided by the hypervisor to be used as unused space for Xen
>>>>>>>> scratch pages.
>>>>>>>>
>>>>>>>> If arch doesn't implement arch_xen_unpopulated_init() to
>>>>>>>> initialize Xen resource the default "iomem_resource" will be used.
>>>>>>>> So the behavior on x86 won't be changed.
>>>>>>>>
>>>>>>>> Also fall back to allocate xenballooned pages (steal real RAM
>>>>>>>> pages) if we do not have any suitable resource to work with and
>>>>>>>> as the result we won't be able to provide unpopulated pages.
>>>>>>>>
>>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>> ---
>>>>>>>> Changes RFC -> V2:
>>>>>>>>       - new patch, instead of
>>>>>>>>        "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
>>>>>>>> ---
>>>>>>>>     drivers/xen/unpopulated-alloc.c | 89 +++++++++++++++++++++++++++++++++++++++--
>>>>>>>>     include/xen/xen.h               |  2 +
>>>>>>>>     2 files changed, 88 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>>>>>>>> index a03dc5b..1f1d8d8 100644
>>>>>>>> --- a/drivers/xen/unpopulated-alloc.c
>>>>>>>> +++ b/drivers/xen/unpopulated-alloc.c
>>>>>>>> @@ -8,6 +8,7 @@
>>>>>>>>       #include <asm/page.h>
>>>>>>>>     +#include <xen/balloon.h>
>>>>>>>>     #include <xen/page.h>
>>>>>>>>     #include <xen/xen.h>
>>>>>>>>     @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
>>>>>>>>     static struct page *page_list;
>>>>>>>>     static unsigned int list_count;
>>>>>>>>     +static struct resource *target_resource;
>>>>>>>> +static struct resource xen_resource = {
>>>>>>>> +    .name = "Xen unused space",
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * If arch is not happy with system "iomem_resource" being used for
>>>>>>>> + * the region allocation it can provide its own view by initializing
>>>>>>>> + * "xen_resource" with unused regions of guest physical address space
>>>>>>>> + * provided by the hypervisor.
>>>>>>>> + */
>>>>>>>> +int __weak arch_xen_unpopulated_init(struct resource *res)
>>>>>>>> +{
>>>>>>>> +    return -ENOSYS;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static int fill_list(unsigned int nr_pages)
>>>>>>>>     {
>>>>>>>>         struct dev_pagemap *pgmap;
>>>>>>>> -    struct resource *res;
>>>>>>>> +    struct resource *res, *tmp_res = NULL;
>>>>>>>>         void *vaddr;
>>>>>>>>         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>>>>>>>> -    int ret = -ENOMEM;
>>>>>>>> +    int ret;
>>>>>>>>           res = kzalloc(sizeof(*res), GFP_KERNEL);
>>>>>>>>         if (!res)
>>>>>>>> @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
>>>>>>>>         res->name = "Xen scratch";
>>>>>>>>         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>>>>>>>     -    ret = allocate_resource(&iomem_resource, res,
>>>>>>>> +    ret = allocate_resource(target_resource, res,
>>>>>>>>                     alloc_pages * PAGE_SIZE, 0, -1,
>>>>>>>>                     PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>>>>>>>>         if (ret < 0) {
>>>>>>>> @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
>>>>>>>>             goto err_resource;
>>>>>>>>         }
>>>>>>>>     +    /*
>>>>>>>> +     * Reserve the region previously allocated from Xen resource to avoid
>>>>>>>> +     * re-using it by someone else.
>>>>>>>> +     */
>>>>>>>> +    if (target_resource != &iomem_resource) {
>>>>>>>> +        tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>>>>>>>> +        if (!tmp_res) {
>>>>>>>> +            ret = -ENOMEM;
>>>>>>>> +            goto err_insert;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        tmp_res->name = res->name;
>>>>>>>> +        tmp_res->start = res->start;
>>>>>>>> +        tmp_res->end = res->end;
>>>>>>>> +        tmp_res->flags = res->flags;
>>>>>>>> +
>>>>>>>> +        ret = insert_resource(&iomem_resource, tmp_res);
>>>>>>>> +        if (ret < 0) {
>>>>>>>> +            pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
>>>>>>>> +                   tmp_res->start, tmp_res->end);
>>>>>>>> +            kfree(tmp_res);
>>>>>>>> +            goto err_insert;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>> I am a bit confused.. why do we need to do this? Who could be
>>>>>>> erroneously re-using the region? Are you saying that the next time
>>>>>>> allocate_resource is called it could find the same region again? It
>>>>>>> doesn't seem possible?
>>>>>> No, as I understand the allocate_resource() being called for the same
>>>>>> root resource won't provide the same region... We only need to do this
>>>>>> (insert the region into "iomem_resource") if we allocated it from our
>>>>>> *internal* "xen_resource", as *global* "iomem_resource" (which is used
>>>>>> everywhere) is not aware that the region has already been allocated.
>>>>>> So by inserting the region here we reserve it; otherwise it could be
>>>>>> reused elsewhere.
>>>>> But elsewhere where?
>>>> I think, theoretically everywhere where
>>>> allocate_resource(&iomem_resource, ...) is called.
>>>>
>>>>
>>>>> Let's say that allocate_resource allocates a range from xen_resource.
>>>>> From reading the code, it doesn't look like iomem_resource would have
>>>>> that range because the extended regions described under /hypervisor are
>>>>> not added automatically to iomem_resource.
>>>>>
>>>>> So what if we don't call insert_resource? Nothing could allocate the
>>>>> same range because iomem_resource doesn't have it at all and
>>>>> xen_resource is not used anywhere if not here.
>>>>>
>>>>> What am I missing?
>>>>
>>>> Below my understanding which, of course, might be wrong.
>>>>
>>>> If we don't claim resource by calling insert_resource (or even
>>>> request_resource) here then the same range could be allocated everywhere
>>>> where allocate_resource(&iomem_resource, ...) is called.
>>>> I don't see what prevents the same range from being allocated. Why
>>>> actually allocate_resource(&iomem_resource, ...) can't provide the same
>>>> range if it is free (not-reserved-yet) from its PoV? The comment above
>>>> allocate_resource() says "allocate empty slot in the resource tree given
>>>> range & alignment". So this "empty slot" could be exactly the same range.
>>>>
>>>> I experimented with that a bit trying to call
>>>> allocate_resource(&iomem_resource, ...) several times in another place to
>>>> see what ranges it returns in both cases (w/ and w/o calling
>>>> insert_resource here). So an experiment confirmed (of course, if I made
>>>> it correctly) that the same range could be allocated if we didn't call
>>>> insert_resource() here. And as I understand there is nothing strange
>>>> here, as iomem_resource covers all address space initially (0, -1) and
>>>> everything *not* inserted/requested (in other words, reserved) yet is
>>>> considered as free and could be provided if fits constraints. Or I really
>>>> missed something?
>>> Thanks for the explanation! It was me that didn't know that
>>> iomem_resource covers all the address space initially. I thought it was
>>> populated only with actual iomem ranges. Now it makes sense, thanks!
>>>
>>>
>>>> It feels to me that it would be better to call request_resource() instead
>>>> of insert_resource(). It seems that if no conflict happens both
>>>> functions will behave in the same way, but in case of conflict, if the
>>>> conflicting resource entirely fits the new resource, the former will
>>>> return an error. I think this way we will be able to detect that a range
>>>> we are trying to reserve is already present and bail out early.
>>>>
>>>>
>>>>> Or maybe it is the other way around: core Linux code assumes everything
>>>>> is described in iomem_resource so something under kernel/ or mm/ would
>>>>> crash if we start using a page pointing to an address missing from
>>>>> iomem_resource?
>>>>>>>>         pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>>>>>>>>         if (!pgmap) {
>>>>>>>>             ret = -ENOMEM;
>>>>>>>> @@ -95,12 +137,40 @@ static int fill_list(unsigned int nr_pages)
>>>>>>>>     err_memremap:
>>>>>>>>         kfree(pgmap);
>>>>>>>>     err_pgmap:
>>>>>>>> +    if (tmp_res) {
>>>>>>>> +        release_resource(tmp_res);
>>>>>>>> +        kfree(tmp_res);
>>>>>>>> +    }
>>>>>>>> +err_insert:
>>>>>>>>         release_resource(res);
>>>>>>>>     err_resource:
>>>>>>>>         kfree(res);
>>>>>>>>         return ret;
>>>>>>>>     }
>>>>>>>>     +static void unpopulated_init(void)
>>>>>>>> +{
>>>>>>>> +    static bool inited = false;
>>>>>>> initialized = false
>>>>>> ok.
>>>>>>
>>>>>>
>>>>>>>> +    int ret;
>>>>>>>> +
>>>>>>>> +    if (inited)
>>>>>>>> +        return;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * Try to initialize Xen resource the first and fall back to default
>>>>>>>> +     * resource if arch doesn't offer one.
>>>>>>>> +     */
>>>>>>>> +    ret = arch_xen_unpopulated_init(&xen_resource);
>>>>>>>> +    if (!ret)
>>>>>>>> +        target_resource = &xen_resource;
>>>>>>>> +    else if (ret == -ENOSYS)
>>>>>>>> +        target_resource = &iomem_resource;
>>>>>>>> +    else
>>>>>>>> +        pr_err("Cannot initialize Xen resource\n");
>>>>>>>> +
>>>>>>>> +    inited = true;
>>>>>>>> +}
>>>>>>> Would it make sense to call unpopulated_init from an init function,
>>>>>>> rather than every time xen_alloc_unpopulated_pages is called?
>>>>>> Good point, thank you. Will do. To be honest, I also don't like the
>>>>>> current approach much.
>>>>>>
>>>>>>
>>>>>>>>     /**
>>>>>>>>      * xen_alloc_unpopulated_pages - alloc unpopulated pages
>>>>>>>>      * @nr_pages: Number of pages
>>>>>>>> @@ -112,6 +182,16 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>>>>>>>         unsigned int i;
>>>>>>>>         int ret = 0;
>>>>>>>>     +    unpopulated_init();
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * Fall back to default behavior if we do not have any suitable resource
>>>>>>>> +     * to allocate required region from and as the result we won't be able to
>>>>>>>> +     * construct pages.
>>>>>>>> +     */
>>>>>>>> +    if (!target_resource)
>>>>>>>> +        return alloc_xenballooned_pages(nr_pages, pages);
>>>>>>> The commit message says that the behavior on x86 doesn't change but this
>>>>>>> seems to be a change that could impact x86?
>>>>>> I don't think so; however, I haven't tested on x86 and might be wrong.
>>>>>> According to the current patch, on x86 the "target_resource" is always
>>>>>> valid and points to the "iomem_resource" as arch_xen_unpopulated_init()
>>>>>> is not implemented. So there won't be any fallback to
>>>>>> alloc_(free)_xenballooned_pages() here and fill_list() will behave as
>>>>>> usual.
>>>>> If target_resource is always valid, then we don't need this special
>>>>> check. In fact, the condition should never be true.
>>>>
>>>> The target_resource is always valid and points to the "iomem_resource"
>>>> on x86 (this is equivalent to the behavior before this patch).
>>>> On Arm target_resource might be NULL if arch_xen_unpopulated_init()
>>>> failed, for example, if no extended regions are reported by the
>>>> hypervisor. We cannot use "iomem_resource" on Arm, only a resource
>>>> constructed from extended regions. This is why I added that check (and
>>>> the fallback to xenballooned pages).
>>>> What I was thinking is that when running on old Xen (although we would
>>>> need to balloon out RAM pages) we would still be able to keep working,
>>>> so there is no need to disable CONFIG_XEN_UNPOPULATED_ALLOC on such
>>>> setups.
>>>>>> You raised a really good question. On Arm we need a fallback to balloon
>>>>>> out RAM pages again if the hypervisor doesn't provide extended regions
>>>>>> (we run on an old version, there are no unused regions of reasonable
>>>>>> size, etc.), so I decided to put the fallback code here; the indicator
>>>>>> of the failure is an invalid "target_resource".
>>>>> I think it is unnecessary as we already assume today that
>>>>> &iomem_resource is always available.
>>>>>> I noticed the patch which is about to be upstreamed that removes the
>>>>>> alloc_(free)xenballooned_pages API [1]. Right now I have no idea
>>>>>> how/where this fallback could be implemented as this is under build
>>>>>> option control (CONFIG_XEN_UNPOPULATED_ALLOC). So the API with the same
>>>>>> name is used either for unpopulated pages (if set) or for ballooned
>>>>>> pages (if not set). I would appreciate suggestions regarding that. I am
>>>>>> wondering whether it would be possible and correct to have both
>>>>>> mechanisms (unpopulated and ballooned) enabled by default and some init
>>>>>> code to decide which one to use at runtime, or something of that sort?
>>>>> I would keep it simple and remove the fallback from this patch. So:
>>>>>
>>>>> - if not CONFIG_XEN_UNPOPULATED_ALLOC, then balloon
>>>>> - if CONFIG_XEN_UNPOPULATED_ALLOC, then
>>>>>       - xen_resource if present
>>>>>       - otherwise iomem_resource
>>>> Unfortunately, we cannot use iomem_resource on Arm safely: it is either
>>>> xen_resource or failure (if no fallback exists).
>>>>
>>>>
>>>>> The xen_resource/iomem_resource config can be done at init time using
>>>>> target_resource. At runtime, target_resource is always != NULL so we
>>>>> just go ahead and use it.
>>>>
>>>> Thank you for the suggestion. OK, let's keep it simple and drop the
>>>> fallback attempts for now. With one remark:
>>>> We will make CONFIG_XEN_UNPOPULATED_ALLOC disabled by default on Arm in
>>>> the next patch. So by default everything will behave as usual on Arm
>>>> (balloon out RAM pages); if the user knows for sure that Xen reports
>>>> extended regions, they can enable the config. This way we won't break
>>>> anything. What do you think?
>>> Actually, after reading your replies and explanation I changed my
>>> opinion: I think we do need the fallback because Linux cannot really
>>> assume that it is running on "new Xen", so it definitely needs to keep
>>> working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended
>>> regions are not advertised.
>>>
>>> I think we'll have to roll back some of the changes introduced by
>>> 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
>>> enabled we cannot know if we can use unpopulated-alloc or whether we
>>> have to use alloc_xenballooned_pages until we parse the /hypervisor
>>> node in device tree at runtime.
>>
>> Exactly!
>>
>>
>>>
>>> In short, we cannot switch between unpopulated-alloc and
>>> alloc_xenballooned_pages at build time, we have to do it at runtime
>>> (boot time).
>>
>> +1
>>
>>
>> I created a patch to partially revert 121f2faca2c0a "xen/balloon:
>> rename alloc/free_xenballooned_pages".
>>
>> If there are no objections I will add it to V3 (which is almost ready,
>> except the fallback bits). Could you please tell me what you think?
>>
>>
>>  From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> Date: Tue, 23 Nov 2021 18:14:41 +0200
>> Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers back
>>
>> This patch rolls back some of the changes introduced by commit
>> 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages"
>> in order to make it possible to still allocate xenballooned pages
>> if CONFIG_XEN_UNPOPULATED_ALLOC is enabled.
>>
>> On Arm the unpopulated pages will be allocated on top of extended
>> regions provided by Xen via device-tree (the subsequent patches
>> will add required bits to support unpopulated-alloc feature on Arm).
>> The problem is that extended regions feature has been introduced
>> into Xen quite recently (during 4.16 release cycle). So this
>> effectively means that Linux must only use unpopulated-alloc on Arm
>> if it is running on "new Xen" which advertises these regions.
>> But, it will only be known after parsing the "hypervisor" node
>> at boot time, so before doing that we cannot assume anything.
>>
>> In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled
>> and the extended regions are not advertised (Linux is running on
>> "old Xen", etc) we need the fallback to alloc_xenballooned_pages().
>>
>> This way we would no longer reduce the amount of usable memory (wasting
>> RAM pages) for any of the external mappings (and would eliminate
>> XSA-300) with "new Xen", but would still remain functional by ballooning
>> out RAM pages with "old Xen".
>>
>> Also rename alloc(free)_xenballooned_pages to
>> xen_alloc(free)_ballooned_pages.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   drivers/xen/balloon.c | 20 +++++++++-----------
>>   include/xen/balloon.h |  3 +++
>>   include/xen/xen.h     |  6 ++++++
>>   3 files changed, 18 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index ba2ea11..a2c4fc49 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target)
>>   }
>>   EXPORT_SYMBOL_GPL(balloon_set_new_target);
>>
>> -#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
>>   static int add_ballooned_pages(unsigned int nr_pages)
>>   {
>>       enum bp_state st;
>> @@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages)
>>   }
>>
>>   /**
>> - * xen_alloc_unpopulated_pages - get pages that have been ballooned out
>> + * xen_alloc_ballooned_pages - get pages that have been ballooned out
>>    * @nr_pages: Number of pages to get
>>    * @pages: pages returned
>>    * @return 0 on success, error otherwise
>>    */
>> -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
>>   {
>>       unsigned int pgno = 0;
>>       struct page *page;
>> @@ -652,23 +651,23 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>       return 0;
>>    out_undo:
>>       mutex_unlock(&balloon_mutex);
>> -    xen_free_unpopulated_pages(pgno, pages);
>> +    xen_free_ballooned_pages(pgno, pages);
>>       /*
>> -     * NB: free_xenballooned_pages will only subtract pgno pages, but since
>> +     * NB: xen_free_ballooned_pages will only subtract pgno pages, but since
>>        * target_unpopulated is incremented with nr_pages at the start we need
>>        * to remove the remaining ones also, or accounting will be screwed.
>>        */
>>       balloon_stats.target_unpopulated -= nr_pages - pgno;
>>       return ret;
>>   }
>> -EXPORT_SYMBOL(xen_alloc_unpopulated_pages);
>> +EXPORT_SYMBOL(xen_alloc_ballooned_pages);
>>
>>   /**
>> - * xen_free_unpopulated_pages - return pages retrieved with get_ballooned_pages
>> + * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages
>>    * @nr_pages: Number of pages
>>    * @pages: pages to return
>>    */
>> -void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
>>   {
>>       unsigned int i;
>>
>> @@ -687,9 +686,9 @@ void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages)
>>
>>       mutex_unlock(&balloon_mutex);
>>   }
>> -EXPORT_SYMBOL(xen_free_unpopulated_pages);
>> +EXPORT_SYMBOL(xen_free_ballooned_pages);
>>
>> -#if defined(CONFIG_XEN_PV)
>> +#if defined(CONFIG_XEN_PV) && !defined(CONFIG_XEN_UNPOPULATED_ALLOC)
>>   static void __init balloon_add_region(unsigned long start_pfn,
>>                         unsigned long pages)
>>   {
>> @@ -712,7 +711,6 @@ static void __init balloon_add_region(unsigned long start_pfn,
>>       balloon_stats.total_pages += extra_pfn_end - start_pfn;
>>   }
>>   #endif
>> -#endif
>>
>>   static int __init balloon_init(void)
>>   {
>> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
>> index e93d4f0..f78a6cc 100644
>> --- a/include/xen/balloon.h
>> +++ b/include/xen/balloon.h
>> @@ -26,6 +26,9 @@ extern struct balloon_stats balloon_stats;
>>
>>   void balloon_set_new_target(unsigned long target);
>>
>> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages);
>> +void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages);
>> +
>>   #ifdef CONFIG_XEN_BALLOON
>>   void xen_balloon_init(void);
>>   #else
>> diff --git a/include/xen/xen.h b/include/xen/xen.h
>> index 9f031b5..410e3e4 100644
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -52,7 +52,13 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>>   extern u64 xen_saved_max_mem_size;
>>   #endif
>>
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>   int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>>   void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>> +#else
>> +#define xen_alloc_unpopulated_pages xen_alloc_ballooned_pages
>> +#define xen_free_unpopulated_pages xen_free_ballooned_pages
>
> Could you please make those inline functions instead?

Sure, will do.
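
For illustration, the two #define aliases in the #else branch could be
replaced with inline wrappers roughly as follows. This is only a userspace
sketch, not the final patch: struct page is left opaque, and the
xen_*_ballooned_pages() bodies and the stub_*_calls counters are
hypothetical stand-ins for the real helpers in drivers/xen/balloon.c.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct page; opaque in this userspace sketch. */
struct page;

/* Hypothetical counters, only here so the mock can be observed. */
static int stub_alloc_calls;
static int stub_free_calls;

/* Mock of the ballooned-pages allocator (assumption: real one lives in balloon.c). */
int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
{
	unsigned int i;

	stub_alloc_calls++;
	for (i = 0; i < nr_pages; i++)
		pages[i] = NULL;	/* no real pages in this mock */
	return 0;
}

/* Mock of the matching free helper. */
void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
{
	(void)nr_pages;
	(void)pages;
	stub_free_calls++;
}

/*
 * Inline wrappers for the !CONFIG_XEN_UNPOPULATED_ALLOC case. Unlike the
 * macros, they are type-checked and do not rewrite every occurrence of
 * the identifier in the translation unit.
 */
static inline int xen_alloc_unpopulated_pages(unsigned int nr_pages,
					      struct page **pages)
{
	return xen_alloc_ballooned_pages(nr_pages, pages);
}

static inline void xen_free_unpopulated_pages(unsigned int nr_pages,
					      struct page **pages)
{
	xen_free_ballooned_pages(nr_pages, pages);
}
```

In the real header the wrappers would presumably need the prototypes from
xen/balloon.h to be visible first.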


>
>
> Other than that I'm fine with the approach.

Great, thank you!
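
To summarize the approach for anyone following the thread, the boot-time
choice between unpopulated-alloc and ballooned pages could be sketched in
userspace roughly as below. All names other than
xen_alloc_unpopulated_pages() and xen_alloc_ballooned_pages() are
hypothetical (xen_select_allocator(), alloc_from_extended_regions(), the
flag and the counters), and the allocator bodies are mocks; the actual V3
code may look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct page; opaque in this userspace sketch. */
struct page;

/* Hypothetical flag: set once at boot after parsing the hypervisor node. */
static bool have_extended_regions;

/* Hypothetical counters, only here so the mock can be observed. */
static int unpopulated_calls;
static int ballooned_calls;

/* Mock: allocate on top of extended regions advertised by "new Xen". */
static int alloc_from_extended_regions(unsigned int nr_pages,
				       struct page **pages)
{
	(void)nr_pages;
	(void)pages;
	unpopulated_calls++;
	return 0;
}

/* Mock: balloon out real RAM pages, which also works on "old Xen". */
static int xen_alloc_ballooned_pages(unsigned int nr_pages,
				     struct page **pages)
{
	(void)nr_pages;
	(void)pages;
	ballooned_calls++;
	return 0;
}

/* Hypothetical init hook: record whether extended regions were advertised. */
static void xen_select_allocator(bool regions_advertised)
{
	have_extended_regions = regions_advertised;
}

/* Decide at runtime, not at build time, which mechanism serves the request. */
int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
{
	if (!have_extended_regions)
		return xen_alloc_ballooned_pages(nr_pages, pages);

	return alloc_from_extended_regions(nr_pages, pages);
}
```

The point of the sketch is only that the decision depends on what is found
in the device tree at boot, so both code paths have to be built in when
CONFIG_XEN_UNPOPULATED_ALLOC is enabled.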


>
>
> Juergen

-- 
Regards,

Oleksandr Tyshchenko



Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
2021-10-26 16:05 [PATCH V2 0/4] xen: Add support of extended regions (safe ranges) on Arm Oleksandr Tyshchenko
2021-10-26 16:05 ` Oleksandr Tyshchenko
2021-10-26 16:05 ` [PATCH V2 1/4] xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list() Oleksandr Tyshchenko
2021-10-28 18:57   ` Boris Ostrovsky
2021-10-26 16:05 ` [PATCH V2 2/4] arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT Oleksandr Tyshchenko
2021-10-26 16:05   ` Oleksandr Tyshchenko
2021-10-28  1:28   ` Stefano Stabellini
2021-10-28  1:28     ` Stefano Stabellini
2021-11-10 22:14     ` Oleksandr
2021-11-10 22:14       ` Oleksandr
2021-11-19  0:32       ` Stefano Stabellini
2021-11-19  0:32         ` Stefano Stabellini
2021-11-19 18:25         ` Oleksandr
2021-11-19 18:25           ` Oleksandr
2021-10-26 16:05 ` [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource Oleksandr Tyshchenko
2021-10-28 16:37   ` Stefano Stabellini
2021-11-09 18:34     ` Oleksandr
2021-11-19  0:59       ` Stefano Stabellini
2021-11-19 18:18         ` Oleksandr
2021-11-20  2:19           ` Stefano Stabellini
2021-11-23 16:46             ` Oleksandr
2021-11-23 21:25               ` Stefano Stabellini
2021-11-24  9:33                 ` Oleksandr
2021-11-24  5:16               ` Juergen Gross
2021-11-24  9:37                 ` Oleksandr
2021-10-28 19:08   ` Boris Ostrovsky
2021-11-09 18:51     ` Oleksandr
2021-10-26 16:05 ` [PATCH V2 4/4] arm/xen: Read extended regions from DT and init " Oleksandr Tyshchenko
2021-10-26 16:05   ` Oleksandr Tyshchenko
2021-10-28  1:40   ` Stefano Stabellini
2021-10-28  1:40     ` Stefano Stabellini
2021-11-10 20:21     ` Oleksandr
2021-11-10 20:21       ` Oleksandr
2021-11-19  1:19       ` Stefano Stabellini
2021-11-19  1:19         ` Stefano Stabellini
2021-11-19 20:23         ` Oleksandr
2021-11-19 20:23           ` Oleksandr
2021-11-20  2:36           ` Stefano Stabellini
2021-11-20  2:36             ` Stefano Stabellini
2021-11-20 13:38             ` Oleksandr
2021-11-20 13:38               ` Oleksandr
