* [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer
@ 2022-05-07 18:19 Oleksandr Tyshchenko
  2022-05-07 18:19 ` [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops() Oleksandr Tyshchenko
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-arm-kernel, virtualization, x86
  Cc: Oleksandr Tyshchenko, Michael S. Tsirkin, Christoph Hellwig,
	Stefano Stabellini, Boris Ostrovsky, Juergen Gross, Julien Grall,
	Bertrand Marquis, Wei Chen, Henry Wang, Kaly Xin, Jiamei Xie,
	Alex Bennée

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hello all.

The purpose of this patch series is to add support for restricting memory access under Xen using a specific
grant table [1] based DMA-mapping layer. The patch series is based on Juergen Gross’ initial work [2] which implies
using grant references instead of raw guest physical addresses (GPAs) for the virtio communications (a kind
of software IOMMU).

You can find the RFC and V1 patch series (and the previous discussions) at [3].

The high level idea is to create a new Xen grant table based DMA-mapping layer for the guest Linux whose main
purpose is to provide a special 64-bit DMA address. Such an address is formed by using the grant reference (for a page
to be shared with the backend) with an offset and setting the highest address bit (this is for the backend to
be able to distinguish a grant ref based DMA address from a normal GPA). For this to work we need the ability
to allocate contiguous (consecutive) grant references for multi-page allocations. The backend then needs
to offer the VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 feature bits (it must support the virtio-mmio modern
transport for 64-bit addresses in the virtqueue).
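
To illustrate, the resulting DMA address layout boils down to the helpers below (a minimal
sketch mirroring what patch 3/7 of this series adds in grant-dma-ops.c):

/* Bit 63 set means "grant reference"; the remaining bits carry the grant
 * reference as a frame number, the in-page offset is added by the caller.
 */
#define XEN_GRANT_DMA_ADDR_OFF  (1ULL << 63)

static inline dma_addr_t grant_to_dma(grant_ref_t grant)
{
        return XEN_GRANT_DMA_ADDR_OFF | ((dma_addr_t)grant << PAGE_SHIFT);
}

static inline grant_ref_t dma_to_grant(dma_addr_t dma)
{
        return (grant_ref_t)((dma & ~XEN_GRANT_DMA_ADDR_OFF) >> PAGE_SHIFT);
}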

Xen's grant mapping mechanism is a secure and safe solution for sharing pages between domains which has been
proven to work for years (in the context of traditional Xen PV drivers, for example). So far, foreign
mapping has been used for the virtio backend to map and access guest memory. With foreign mapping, the backend
is able to map arbitrary pages from the guest memory (or even from Dom0 memory), and as a result a malicious
backend running in a non-trusted domain can take advantage of this. With grant mapping, in contrast,
the backend is only allowed to map pages which were explicitly granted by the guest beforehand and nothing else.
According to the discussions in various mainline threads this solution would likely be welcome because it
fits perfectly in the security model Xen provides.

What is more, the grant table based solution requires zero changes to the Xen hypervisor itself, at least
with virtio-mmio and DT (in comparison, for example, with a "foreign mapping + virtio-iommu" solution, which would
require a whole new complex emulator in the hypervisor in addition to new functionality/hypercalls to pass the IOVA
from the virtio backend running elsewhere to the hypervisor and translate it to the GPA before mapping into
the P2M, or to deny the foreign mapping request if no corresponding IOVA-GPA mapping is present in the IOMMU page table
for that particular device). We only need to update the toolstack to insert a new "xen,dev-domid" property into
the virtio-mmio device node when creating a guest device-tree (this is an indicator for the guest to use grants
and the ID of the Xen domain where the corresponding backend resides; it is used as an argument to the grant mapping
APIs). It is worth mentioning that the toolstack patch is based on the not yet upstreamed “Virtio support for toolstack
on Arm” series which is currently under review [4].

Please note the following:
- The patch series only covers Arm and virtio-mmio (device-tree) for now. To enable the restricted memory access
  feature on Arm the following option should be set:
  CONFIG_XEN_VIRTIO = y
- The patch series is based on the "platform_has()" patch series which is currently under review [5]
- Xen should be built with the following options:
  CONFIG_IOREQ_SERVER=y
  CONFIG_EXPERT=y

The patch series is rebased on the Linux 5.18-rc4 tag with the "platform_has()" series applied and tested on a Renesas
Salvator-X board + H3 ES3.0 SoC (Arm64) with a standalone userspace (non-Qemu) virtio-mmio based virtio-disk
backend running in the Driver domain and a Linux guest running the existing virtio-blk driver (frontend).
No issues were observed. Guest domain 'reboot/destroy' use-cases work properly. I have also tested other
use-cases such as assigning several virtio block devices or a mix of virtio and Xen PV block devices
to the guest. The patch series was build-tested on Arm32 and x86.

1. Xen changes are located at (last patch):
https://github.com/otyshchenko1/xen/commits/libxl_virtio_next1
2. Linux changes are located at (last 7 patches):
https://github.com/otyshchenko1/linux/commits/virtio_grant7
3. virtio-disk changes are located at:
https://github.com/otyshchenko1/virtio-disk/commits/virtio_grant

Any feedback/help would be highly appreciated.

[1] https://xenbits.xenproject.org/docs/4.16-testing/misc/grant-tables.txt
[2] https://www.youtube.com/watch?v=IrlEdaIUDPk
[3] https://lore.kernel.org/xen-devel/1649963973-22879-1-git-send-email-olekstysh@gmail.com/
    https://lore.kernel.org/xen-devel/1650646263-22047-1-git-send-email-olekstysh@gmail.com/
[4] https://lore.kernel.org/xen-devel/1651598763-12162-1-git-send-email-olekstysh@gmail.com/
[5] https://lore.kernel.org/xen-devel/20220504155703.13336-1-jgross@suse.com/

Juergen Gross (3):
  xen/grants: support allocating consecutive grants
  xen/grant-dma-ops: Add option to restrict memory access under Xen
  xen/virtio: Enable restricted memory access using Xen grant mappings

Oleksandr Tyshchenko (4):
  arm/xen: Introduce xen_setup_dma_ops()
  dt-bindings: Add xen,dev-domid property description for xen-grant DMA
    ops
  xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
  arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices

 .../devicetree/bindings/arm/xen,dev-domid.yaml     |  37 +++
 Documentation/devicetree/bindings/virtio/mmio.yaml |   7 +
 arch/arm/include/asm/xen/xen-ops.h                 |   2 +
 arch/arm/mm/dma-mapping.c                          |   7 +-
 arch/arm/xen/enlighten.c                           |   2 +
 arch/arm64/include/asm/xen/xen-ops.h               |   2 +
 arch/arm64/mm/dma-mapping.c                        |   7 +-
 arch/x86/xen/enlighten_hvm.c                       |   2 +
 arch/x86/xen/enlighten_pv.c                        |   2 +
 drivers/xen/Kconfig                                |  15 +
 drivers/xen/Makefile                               |   1 +
 drivers/xen/grant-dma-ops.c                        | 324 +++++++++++++++++++++
 drivers/xen/grant-table.c                          | 238 +++++++++++++--
 include/xen/arm/xen-ops.h                          |  18 ++
 include/xen/grant_table.h                          |   4 +
 include/xen/xen-ops.h                              |  13 +
 include/xen/xen.h                                  |   8 +
 17 files changed, 647 insertions(+), 42 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
 create mode 100644 arch/arm/include/asm/xen/xen-ops.h
 create mode 100644 arch/arm64/include/asm/xen/xen-ops.h
 create mode 100644 drivers/xen/grant-dma-ops.c
 create mode 100644 include/xen/arm/xen-ops.h

-- 
2.7.4



* [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops()
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  2022-05-07 18:52   ` Catalin Marinas
  2022-05-07 18:19 ` [PATCH V2 2/7] xen/grants: support allocating consecutive grants Oleksandr Tyshchenko
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-arm-kernel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Russell King,
	Catalin Marinas, Will Deacon, Boris Ostrovsky, Juergen Gross,
	Julien Grall, Michael S. Tsirkin, Christoph Hellwig

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch introduces a new helper and places it in a new header.
The helper's purpose is to assign any Xen specific DMA ops in
a single place. For now, we deal with xen-swiotlb DMA ops only.
One of the subsequent commits in the current series will add
the xen-grant DMA ops case.

Also re-use the xen_swiotlb_detect() check on Arm32.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
Changes RFC -> V1:
   - update commit description
   - move commit to the beginning of the series
   - move #ifdef CONFIG_XEN from dma-mapping.c to xen-ops.h

Changes V1 -> V2:
   - add Stefano's R-b
   - add missing SPDX-License-Identifier to xen-ops.h
---
 arch/arm/include/asm/xen/xen-ops.h   |  2 ++
 arch/arm/mm/dma-mapping.c            |  7 ++-----
 arch/arm64/include/asm/xen/xen-ops.h |  2 ++
 arch/arm64/mm/dma-mapping.c          |  7 ++-----
 include/xen/arm/xen-ops.h            | 15 +++++++++++++++
 5 files changed, 23 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm/include/asm/xen/xen-ops.h
 create mode 100644 arch/arm64/include/asm/xen/xen-ops.h
 create mode 100644 include/xen/arm/xen-ops.h

diff --git a/arch/arm/include/asm/xen/xen-ops.h b/arch/arm/include/asm/xen/xen-ops.h
new file mode 100644
index 00000000..7ebb7eb
--- /dev/null
+++ b/arch/arm/include/asm/xen/xen-ops.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/arm/xen-ops.h>
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82ffac6..059cce0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -33,7 +33,7 @@
 #include <asm/dma-iommu.h>
 #include <asm/mach/map.h>
 #include <asm/system_info.h>
-#include <xen/swiotlb-xen.h>
+#include <asm/xen/xen-ops.h>
 
 #include "dma.h"
 #include "mm.h"
@@ -2287,10 +2287,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 	set_dma_ops(dev, dma_ops);
 
-#ifdef CONFIG_XEN
-	if (xen_initial_domain())
-		dev->dma_ops = &xen_swiotlb_dma_ops;
-#endif
+	xen_setup_dma_ops(dev);
 	dev->archdata.dma_ops_setup = true;
 }
 
diff --git a/arch/arm64/include/asm/xen/xen-ops.h b/arch/arm64/include/asm/xen/xen-ops.h
new file mode 100644
index 00000000..7ebb7eb
--- /dev/null
+++ b/arch/arm64/include/asm/xen/xen-ops.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/arm/xen-ops.h>
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 6719f9e..6099c81 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -9,9 +9,9 @@
 #include <linux/dma-map-ops.h>
 #include <linux/dma-iommu.h>
 #include <xen/xen.h>
-#include <xen/swiotlb-xen.h>
 
 #include <asm/cacheflush.h>
+#include <asm/xen/xen-ops.h>
 
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
@@ -52,8 +52,5 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (iommu)
 		iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
 
-#ifdef CONFIG_XEN
-	if (xen_swiotlb_detect())
-		dev->dma_ops = &xen_swiotlb_dma_ops;
-#endif
+	xen_setup_dma_ops(dev);
 }
diff --git a/include/xen/arm/xen-ops.h b/include/xen/arm/xen-ops.h
new file mode 100644
index 00000000..288deb1
--- /dev/null
+++ b/include/xen/arm/xen-ops.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_XEN_OPS_H
+#define _ASM_ARM_XEN_OPS_H
+
+#include <xen/swiotlb-xen.h>
+
+static inline void xen_setup_dma_ops(struct device *dev)
+{
+#ifdef CONFIG_XEN
+	if (xen_swiotlb_detect())
+		dev->dma_ops = &xen_swiotlb_dma_ops;
+#endif
+}
+
+#endif /* _ASM_ARM_XEN_OPS_H */
-- 
2.7.4



* [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
  2022-05-07 18:19 ` [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops() Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  2022-05-11 18:00   ` Oleksandr
  2022-05-12 20:01   ` Boris Ostrovsky
  2022-05-07 18:19 ` [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen Oleksandr Tyshchenko
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-arm-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Julien Grall,
	Oleksandr Tyshchenko, Michael S. Tsirkin, Christoph Hellwig

From: Juergen Gross <jgross@suse.com>

To support virtio via grant mappings, larger mappings using consecutive
grants are needed in rare cases. Support those by adding a bitmap
of free grants.

As consecutive grants will be needed only in very rare cases (e.g. when
configuring a virtio device with a multi-page ring), optimize for the
normal case of non-consecutive allocations.
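
As a usage illustration only (a hedged sketch, not part of this patch; the helper names
grant_buffer()/ungrant_buffer() are made up and the includes are abbreviated), a multi-page
buffer would be granted to a backend domain roughly like this:

#include <xen/grant_table.h>
#include <xen/page.h>

/* Grant n_pages consecutive frames starting at pfn to the backend domain.
 * On success *first holds the first grant reference of the consecutive
 * range, so the whole buffer is described by a single base reference.
 */
static int grant_buffer(domid_t domid, unsigned long pfn,
                        unsigned int n_pages, grant_ref_t *first)
{
        unsigned int i;

        if (gnttab_alloc_grant_reference_seq(n_pages, first))
                return -ENOSPC;

        for (i = 0; i < n_pages; i++)
                gnttab_grant_foreign_access_ref(*first + i, domid,
                                                pfn_to_gfn(pfn + i), 0);

        return 0;
}

/* Tear-down: end foreign access and return the range to the free list. */
static void ungrant_buffer(grant_ref_t first, unsigned int n_pages)
{
        unsigned int i;

        for (i = 0; i < n_pages; i++)
                if (!gnttab_end_foreign_access_ref(first + i))
                        return; /* still in use by the backend, don't reuse */

        gnttab_free_grant_reference_seq(first, n_pages);
}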

Signed-off-by: Juergen Gross <jgross@suse.com>
---
Changes RFC -> V1:
   - no changes
   
Changes V1 -> V2:
   - no changes
---
 drivers/xen/grant-table.c | 238 +++++++++++++++++++++++++++++++++++++++-------
 include/xen/grant_table.h |   4 +
 2 files changed, 210 insertions(+), 32 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 8ccccac..1b458c0 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -33,6 +33,7 @@
 
 #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt
 
+#include <linux/bitmap.h>
 #include <linux/memblock.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
@@ -72,9 +73,32 @@
 
 static grant_ref_t **gnttab_list;
 static unsigned int nr_grant_frames;
+
+/*
+ * Handling of free grants:
+ *
+ * Free grants are in a simple list anchored in gnttab_free_head. They are
+ * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
+ * of free entries is stored in gnttab_free_count.
+ * Additionally there is a bitmap of free entries anchored in
+ * gnttab_free_bitmap. This is being used for simplifying allocation of
+ * multiple consecutive grants, which is needed e.g. for support of virtio.
+ * gnttab_last_free is used to add free entries of new frames at the end of
+ * the free list.
+ * gnttab_free_tail_ptr specifies the variable which references the start
+ * of consecutive free grants ending with gnttab_last_free. This pointer is
+ * updated in a rather defensive way, in order to avoid performance hits in
+ * hot paths.
+ * All those variables are protected by gnttab_list_lock.
+ */
 static int gnttab_free_count;
-static grant_ref_t gnttab_free_head;
+static unsigned int gnttab_size;
+static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
+static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
+static grant_ref_t *gnttab_free_tail_ptr;
+static unsigned long *gnttab_free_bitmap;
 static DEFINE_SPINLOCK(gnttab_list_lock);
+
 struct grant_frames xen_auto_xlat_grant_frames;
 static unsigned int xen_gnttab_version;
 module_param_named(version, xen_gnttab_version, uint, 0);
@@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
 
 	ref = head = gnttab_free_head;
 	gnttab_free_count -= count;
-	while (count-- > 1)
-		head = gnttab_entry(head);
+	while (count--) {
+		bitmap_clear(gnttab_free_bitmap, head, 1);
+		if (gnttab_free_tail_ptr == __gnttab_entry(head))
+			gnttab_free_tail_ptr = &gnttab_free_head;
+		if (count)
+			head = gnttab_entry(head);
+	}
 	gnttab_free_head = gnttab_entry(head);
 	gnttab_entry(head) = GNTTAB_LIST_END;
 
+	if (!gnttab_free_count) {
+		gnttab_last_free = GNTTAB_LIST_END;
+		gnttab_free_tail_ptr = NULL;
+	}
+
 	spin_unlock_irqrestore(&gnttab_list_lock, flags);
 
 	return ref;
 }
 
+static int get_seq_entry_count(void)
+{
+	if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
+	    *gnttab_free_tail_ptr == GNTTAB_LIST_END)
+		return 0;
+
+	return gnttab_last_free - *gnttab_free_tail_ptr + 1;
+}
+
+/* Rebuilds the free grant list and tries to find count consecutive entries. */
+static int get_free_seq(unsigned int count)
+{
+	int ret = -ENOSPC;
+	unsigned int from, to;
+	grant_ref_t *last;
+
+	gnttab_free_tail_ptr = &gnttab_free_head;
+	last = &gnttab_free_head;
+
+	for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
+	     from < gnttab_size;
+	     from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
+		to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
+					from + 1);
+		if (ret < 0 && to - from >= count) {
+			ret = from;
+			bitmap_clear(gnttab_free_bitmap, ret, count);
+			from += count;
+			gnttab_free_count -= count;
+			if (from == to)
+				continue;
+		}
+
+		while (from < to) {
+			*last = from;
+			last = __gnttab_entry(from);
+			gnttab_last_free = from;
+			from++;
+		}
+		if (to < gnttab_size)
+			gnttab_free_tail_ptr = __gnttab_entry(to - 1);
+	}
+
+	*last = GNTTAB_LIST_END;
+	if (gnttab_last_free != gnttab_size - 1)
+		gnttab_free_tail_ptr = NULL;
+
+	return ret;
+}
+
+static int get_free_entries_seq(unsigned int count)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&gnttab_list_lock, flags);
+
+	if (gnttab_free_count < count) {
+		ret = gnttab_expand(count - gnttab_free_count);
+		if (ret < 0)
+			goto out;
+	}
+
+	if (get_seq_entry_count() < count) {
+		ret = get_free_seq(count);
+		if (ret >= 0)
+			goto out;
+		ret = gnttab_expand(count - get_seq_entry_count());
+		if (ret < 0)
+			goto out;
+	}
+
+	ret = *gnttab_free_tail_ptr;
+	*gnttab_free_tail_ptr = gnttab_entry(ret + count - 1);
+	gnttab_free_count -= count;
+	if (!gnttab_free_count)
+		gnttab_free_tail_ptr = NULL;
+	bitmap_clear(gnttab_free_bitmap, ret, count);
+
+ out:
+	spin_unlock_irqrestore(&gnttab_list_lock, flags);
+
+	return ret;
+}
+
 static void do_free_callbacks(void)
 {
 	struct gnttab_free_callback *callback, *next;
@@ -206,17 +325,48 @@ static inline void check_free_callbacks(void)
 		do_free_callbacks();
 }
 
-static void put_free_entry(grant_ref_t ref)
+static void put_free_entry_locked(grant_ref_t ref)
 {
-	unsigned long flags;
-	spin_lock_irqsave(&gnttab_list_lock, flags);
 	gnttab_entry(ref) = gnttab_free_head;
 	gnttab_free_head = ref;
+	if (!gnttab_free_count)
+		gnttab_last_free = ref;
+	if (gnttab_free_tail_ptr == &gnttab_free_head)
+		gnttab_free_tail_ptr = __gnttab_entry(ref);
 	gnttab_free_count++;
+	bitmap_set(gnttab_free_bitmap, ref, 1);
+}
+
+static void put_free_entry(grant_ref_t ref)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&gnttab_list_lock, flags);
+	put_free_entry_locked(ref);
 	check_free_callbacks();
 	spin_unlock_irqrestore(&gnttab_list_lock, flags);
 }
 
+static void gnttab_set_free(unsigned int start, unsigned int n)
+{
+	unsigned int i;
+
+	for (i = start; i < start + n - 1; i++)
+		gnttab_entry(i) = i + 1;
+
+	gnttab_entry(i) = GNTTAB_LIST_END;
+	if (!gnttab_free_count) {
+		gnttab_free_head = start;
+		gnttab_free_tail_ptr = &gnttab_free_head;
+	} else {
+		gnttab_entry(gnttab_last_free) = start;
+	}
+	gnttab_free_count += n;
+	gnttab_last_free = i;
+
+	bitmap_set(gnttab_free_bitmap, start, n);
+}
+
 /*
  * Following applies to gnttab_update_entry_v1 and gnttab_update_entry_v2.
  * Introducing a valid entry into the grant table:
@@ -448,23 +598,31 @@ void gnttab_free_grant_references(grant_ref_t head)
 {
 	grant_ref_t ref;
 	unsigned long flags;
-	int count = 1;
-	if (head == GNTTAB_LIST_END)
-		return;
+
 	spin_lock_irqsave(&gnttab_list_lock, flags);
-	ref = head;
-	while (gnttab_entry(ref) != GNTTAB_LIST_END) {
-		ref = gnttab_entry(ref);
-		count++;
+	while (head != GNTTAB_LIST_END) {
+		ref = gnttab_entry(head);
+		put_free_entry_locked(head);
+		head = ref;
 	}
-	gnttab_entry(ref) = gnttab_free_head;
-	gnttab_free_head = head;
-	gnttab_free_count += count;
 	check_free_callbacks();
 	spin_unlock_irqrestore(&gnttab_list_lock, flags);
 }
 EXPORT_SYMBOL_GPL(gnttab_free_grant_references);
 
+void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count)
+{
+	unsigned long flags;
+	unsigned int i;
+
+	spin_lock_irqsave(&gnttab_list_lock, flags);
+	for (i = count; i > 0; i--)
+		put_free_entry_locked(head + i - 1);
+	check_free_callbacks();
+	spin_unlock_irqrestore(&gnttab_list_lock, flags);
+}
+EXPORT_SYMBOL_GPL(gnttab_free_grant_reference_seq);
+
 int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
 {
 	int h = get_free_entries(count);
@@ -478,6 +636,24 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
 }
 EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);
 
+int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first)
+{
+	int h;
+
+	if (count == 1)
+		h = get_free_entries(1);
+	else
+		h = get_free_entries_seq(count);
+
+	if (h < 0)
+		return -ENOSPC;
+
+	*first = h;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gnttab_alloc_grant_reference_seq);
+
 int gnttab_empty_grant_references(const grant_ref_t *private_head)
 {
 	return (*private_head == GNTTAB_LIST_END);
@@ -570,16 +746,13 @@ static int grow_gnttab_list(unsigned int more_frames)
 			goto grow_nomem;
 	}
 
+	gnttab_set_free(gnttab_size, extra_entries);
 
-	for (i = grefs_per_frame * nr_grant_frames;
-	     i < grefs_per_frame * new_nr_grant_frames - 1; i++)
-		gnttab_entry(i) = i + 1;
-
-	gnttab_entry(i) = gnttab_free_head;
-	gnttab_free_head = grefs_per_frame * nr_grant_frames;
-	gnttab_free_count += extra_entries;
+	if (!gnttab_free_tail_ptr)
+		gnttab_free_tail_ptr = __gnttab_entry(gnttab_size);
 
 	nr_grant_frames = new_nr_grant_frames;
+	gnttab_size += extra_entries;
 
 	check_free_callbacks();
 
@@ -1424,7 +1597,6 @@ int gnttab_init(void)
 	int i;
 	unsigned long max_nr_grant_frames;
 	unsigned int max_nr_glist_frames, nr_glist_frames;
-	unsigned int nr_init_grefs;
 	int ret;
 
 	gnttab_request_version();
@@ -1452,6 +1624,13 @@ int gnttab_init(void)
 		}
 	}
 
+	i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
+	gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
+	if (!gnttab_free_bitmap) {
+		ret = -ENOMEM;
+		goto ini_nomem;
+	}
+
 	ret = arch_gnttab_init(max_nr_grant_frames,
 			       nr_status_frames(max_nr_grant_frames));
 	if (ret < 0)
@@ -1462,15 +1641,9 @@ int gnttab_init(void)
 		goto ini_nomem;
 	}
 
-	nr_init_grefs = nr_grant_frames *
-			gnttab_interface->grefs_per_grant_frame;
-
-	for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
-		gnttab_entry(i) = i + 1;
+	gnttab_size = nr_grant_frames * gnttab_interface->grefs_per_grant_frame;
 
-	gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
-	gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
-	gnttab_free_head  = NR_RESERVED_ENTRIES;
+	gnttab_set_free(NR_RESERVED_ENTRIES, gnttab_size - NR_RESERVED_ENTRIES);
 
 	printk("Grant table initialized\n");
 	return 0;
@@ -1479,6 +1652,7 @@ int gnttab_init(void)
 	for (i--; i >= 0; i--)
 		free_page((unsigned long)gnttab_list[i]);
 	kfree(gnttab_list);
+	bitmap_free(gnttab_free_bitmap);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(gnttab_init);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index dfd5bf3..d815e1d 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -129,10 +129,14 @@ int gnttab_try_end_foreign_access(grant_ref_t ref);
  */
 int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
 
+int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first);
+
 void gnttab_free_grant_reference(grant_ref_t ref);
 
 void gnttab_free_grant_references(grant_ref_t head);
 
+void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count);
+
 int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
 
 int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);
-- 
2.7.4



* [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
  2022-05-07 18:19 ` [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops() Oleksandr Tyshchenko
  2022-05-07 18:19 ` [PATCH V2 2/7] xen/grants: support allocating consecutive grants Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  2022-05-09 21:39   ` Stefano Stabellini
  2022-05-07 18:19 ` [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings Oleksandr Tyshchenko
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-arm-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Julien Grall,
	Oleksandr Tyshchenko, Michael S. Tsirkin, Christoph Hellwig

From: Juergen Gross <jgross@suse.com>

Introduce the Xen grant DMA-mapping layer, which contains special DMA-mapping
routines for providing grant references as DMA addresses to be used by
frontends (e.g. virtio) in Xen guests.

Add the needed functionality by providing a special set of DMA ops
handling the needed grant operations for the I/O pages.

The subsequent commit will introduce the use case for the xen-grant DMA ops
layer to enable using virtio devices in Xen guests in a safe manner.
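
For context, the backend side (not part of this series, e.g. a standalone userspace virtio
backend) is expected to recognize and decode such addresses before mapping guest memory.
The sketch below is purely illustrative and all names in it are assumptions:

#include <stdint.h>

#define GRANT_DMA_ADDR_OFF      (UINT64_C(1) << 63)     /* same bit as in the guest */
#define GRANT_PAGE_SHIFT        12
#define GRANT_PAGE_SIZE         (UINT64_C(1) << GRANT_PAGE_SHIFT)

/* Split a virtqueue address provided by the guest into a grant reference
 * plus in-page offset, so the backend can map it via the grant-mapping
 * interface instead of mapping arbitrary guest memory.
 */
static int decode_guest_addr(uint64_t addr, uint32_t *grant, uint32_t *offset)
{
        if (!(addr & GRANT_DMA_ADDR_OFF))
                return -1;      /* plain GPA: reject when grants are required */

        *grant  = (uint32_t)((addr & ~GRANT_DMA_ADDR_OFF) >> GRANT_PAGE_SHIFT);
        *offset = (uint32_t)(addr & (GRANT_PAGE_SIZE - 1));

        return 0;
}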

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V1:
   - squash with almost all changes from commit (except handling "xen,dev-domid"
     property):
     "[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
   - update commit subject/description and comments in code
   - leave only single Kconfig option XEN_VIRTIO and remove architectural
     dependencies
   - introduce common xen_has_restricted_virtio_memory_access() in xen.h
     and update arch_has_restricted_virtio_memory_access() for both
     Arm and x86 to call new helper
   - use (1ULL << 63) instead of 0x8000000000000000ULL for XEN_GRANT_ADDR_OFF
   - implement xen_virtio_dma_map(unmap)_sg() using example in swiotlb-xen.c
   - optimize padding by moving "broken" field in struct xen_virtio_data
   - remove unneeded per-device spinlock
   - remove the inclusion of virtio_config.h
   - rename everything according to the new naming scheme:
     s/virtio/grant_dma
   - add new hidden config option XEN_GRANT_DMA_OPS

Changes V1 -> V2:
   - fix checkpatch.pl warnings
   - remove the inclusion of linux/pci.h
   - rework to use xarray for data context
   - remove EXPORT_SYMBOL_GPL(xen_grant_setup_dma_ops);
   - remove the line of * after SPDX-License-Identifier
   - split changes into grant-dma-ops.c and arch_has_restricted_virtio_memory_access()
     and update commit subject/description accordingly
   - remove "default n" for config XEN_VIRTIO
   - implement xen_grant_dma_alloc(free)_pages()
---
 drivers/xen/Kconfig         |   4 +
 drivers/xen/Makefile        |   1 +
 drivers/xen/grant-dma-ops.c | 314 ++++++++++++++++++++++++++++++++++++++++++++
 include/xen/xen-ops.h       |   8 ++
 4 files changed, 327 insertions(+)
 create mode 100644 drivers/xen/grant-dma-ops.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 120d32f..313a9127 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -335,4 +335,8 @@ config XEN_UNPOPULATED_ALLOC
 	  having to balloon out RAM regions in order to obtain physical memory
 	  space to create such mappings.
 
+config XEN_GRANT_DMA_OPS
+	bool
+	select DMA_OPS
+
 endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 5aae66e..1a23cb0 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -39,3 +39,4 @@ xen-gntalloc-y				:= gntalloc.o
 xen-privcmd-y				:= privcmd.o privcmd-buf.o
 obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)	+= xen-front-pgdir-shbuf.o
 obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)	+= unpopulated-alloc.o
+obj-$(CONFIG_XEN_GRANT_DMA_OPS)		+= grant-dma-ops.o
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
new file mode 100644
index 00000000..29ad7bf
--- /dev/null
+++ b/drivers/xen/grant-dma-ops.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Xen grant DMA-mapping layer - contains special DMA-mapping routines
+ * for providing grant references as DMA addresses to be used by frontends
+ * (e.g. virtio) in Xen guests
+ *
+ * Copyright (c) 2021, Juergen Gross <jgross@suse.com>
+ */
+
+#include <linux/module.h>
+#include <linux/dma-map-ops.h>
+#include <linux/of.h>
+#include <linux/pfn.h>
+#include <linux/xarray.h>
+#include <xen/xen.h>
+#include <xen/grant_table.h>
+
+struct xen_grant_dma_data {
+	/* The ID of backend domain */
+	domid_t dev_domid;
+	/* Is device behaving sane? */
+	bool broken;
+};
+
+static DEFINE_XARRAY(xen_grant_dma_devices);
+
+#define XEN_GRANT_DMA_ADDR_OFF	(1ULL << 63)
+
+static inline dma_addr_t grant_to_dma(grant_ref_t grant)
+{
+	return XEN_GRANT_DMA_ADDR_OFF | ((dma_addr_t)grant << PAGE_SHIFT);
+}
+
+static inline grant_ref_t dma_to_grant(dma_addr_t dma)
+{
+	return (grant_ref_t)((dma & ~XEN_GRANT_DMA_ADDR_OFF) >> PAGE_SHIFT);
+}
+
+static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
+{
+	struct xen_grant_dma_data *data;
+
+	xa_lock(&xen_grant_dma_devices);
+	data = xa_load(&xen_grant_dma_devices, (unsigned long)dev);
+	xa_unlock(&xen_grant_dma_devices);
+
+	return data;
+}
+
+/*
+ * DMA ops for Xen frontends (e.g. virtio).
+ *
+ * Used to act as a kind of software IOMMU for Xen guests by using grants as
+ * DMA addresses.
+ * Such a DMA address is formed by using the grant reference as a frame
+ * number and setting the highest address bit (this bit is for the backend
+ * to be able to distinguish it from e.g. a mmio address).
+ *
+ * Note that for now we hard wire dom0 to be the backend domain. In order
+ * to support any domain as backend we'd need to add a way to communicate
+ * the domid of this backend, e.g. via Xenstore, via the PCI-device's
+ * config space or DT/ACPI.
+ */
+static void *xen_grant_dma_alloc(struct device *dev, size_t size,
+				 dma_addr_t *dma_handle, gfp_t gfp,
+				 unsigned long attrs)
+{
+	struct xen_grant_dma_data *data;
+	unsigned int i, n_pages = PFN_UP(size);
+	unsigned long pfn;
+	grant_ref_t grant;
+	void *ret;
+
+	data = find_xen_grant_dma_data(dev);
+	if (!data)
+		return NULL;
+
+	if (unlikely(data->broken))
+		return NULL;
+
+	ret = alloc_pages_exact(n_pages * PAGE_SIZE, gfp);
+	if (!ret)
+		return NULL;
+
+	pfn = virt_to_pfn(ret);
+
+	if (gnttab_alloc_grant_reference_seq(n_pages, &grant)) {
+		free_pages_exact(ret, n_pages * PAGE_SIZE);
+		return NULL;
+	}
+
+	for (i = 0; i < n_pages; i++) {
+		gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
+				pfn_to_gfn(pfn + i), 0);
+	}
+
+	*dma_handle = grant_to_dma(grant);
+
+	return ret;
+}
+
+static void xen_grant_dma_free(struct device *dev, size_t size, void *vaddr,
+			       dma_addr_t dma_handle, unsigned long attrs)
+{
+	struct xen_grant_dma_data *data;
+	unsigned int i, n_pages = PFN_UP(size);
+	grant_ref_t grant;
+
+	data = find_xen_grant_dma_data(dev);
+	if (!data)
+		return;
+
+	if (unlikely(data->broken))
+		return;
+
+	grant = dma_to_grant(dma_handle);
+
+	for (i = 0; i < n_pages; i++) {
+		if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
+			dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
+			data->broken = true;
+			return;
+		}
+	}
+
+	gnttab_free_grant_reference_seq(grant, n_pages);
+
+	free_pages_exact(vaddr, n_pages * PAGE_SIZE);
+}
+
+static struct page *xen_grant_dma_alloc_pages(struct device *dev, size_t size,
+					      dma_addr_t *dma_handle,
+					      enum dma_data_direction dir,
+					      gfp_t gfp)
+{
+	void *vaddr;
+
+	vaddr = xen_grant_dma_alloc(dev, size, dma_handle, gfp, 0);
+	if (!vaddr)
+		return NULL;
+
+	return virt_to_page(vaddr);
+}
+
+static void xen_grant_dma_free_pages(struct device *dev, size_t size,
+				     struct page *vaddr, dma_addr_t dma_handle,
+				     enum dma_data_direction dir)
+{
+	xen_grant_dma_free(dev, size, page_to_virt(vaddr), dma_handle, 0);
+}
+
+static dma_addr_t xen_grant_dma_map_page(struct device *dev, struct page *page,
+					 unsigned long offset, size_t size,
+					 enum dma_data_direction dir,
+					 unsigned long attrs)
+{
+	struct xen_grant_dma_data *data;
+	unsigned int i, n_pages = PFN_UP(size);
+	grant_ref_t grant;
+	dma_addr_t dma_handle;
+
+	if (WARN_ON(dir == DMA_NONE))
+		return DMA_MAPPING_ERROR;
+
+	data = find_xen_grant_dma_data(dev);
+	if (!data)
+		return DMA_MAPPING_ERROR;
+
+	if (unlikely(data->broken))
+		return DMA_MAPPING_ERROR;
+
+	if (gnttab_alloc_grant_reference_seq(n_pages, &grant))
+		return DMA_MAPPING_ERROR;
+
+	for (i = 0; i < n_pages; i++) {
+		gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
+				xen_page_to_gfn(page) + i, dir == DMA_TO_DEVICE);
+	}
+
+	dma_handle = grant_to_dma(grant) + offset;
+
+	return dma_handle;
+}
+
+static void xen_grant_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+				     size_t size, enum dma_data_direction dir,
+				     unsigned long attrs)
+{
+	struct xen_grant_dma_data *data;
+	unsigned int i, n_pages = PFN_UP(size);
+	grant_ref_t grant;
+
+	if (WARN_ON(dir == DMA_NONE))
+		return;
+
+	data = find_xen_grant_dma_data(dev);
+	if (!data)
+		return;
+
+	if (unlikely(data->broken))
+		return;
+
+	grant = dma_to_grant(dma_handle);
+
+	for (i = 0; i < n_pages; i++) {
+		if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
+			dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
+			data->broken = true;
+			return;
+		}
+	}
+
+	gnttab_free_grant_reference_seq(grant, n_pages);
+}
+
+static void xen_grant_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+				   int nents, enum dma_data_direction dir,
+				   unsigned long attrs)
+{
+	struct scatterlist *s;
+	unsigned int i;
+
+	if (WARN_ON(dir == DMA_NONE))
+		return;
+
+	for_each_sg(sg, s, nents, i)
+		xen_grant_dma_unmap_page(dev, s->dma_address, sg_dma_len(s), dir,
+				attrs);
+}
+
+static int xen_grant_dma_map_sg(struct device *dev, struct scatterlist *sg,
+				int nents, enum dma_data_direction dir,
+				unsigned long attrs)
+{
+	struct scatterlist *s;
+	unsigned int i;
+
+	if (WARN_ON(dir == DMA_NONE))
+		return -EINVAL;
+
+	for_each_sg(sg, s, nents, i) {
+		s->dma_address = xen_grant_dma_map_page(dev, sg_page(s), s->offset,
+				s->length, dir, attrs);
+		if (s->dma_address == DMA_MAPPING_ERROR)
+			goto out;
+
+		sg_dma_len(s) = s->length;
+	}
+
+	return nents;
+
+out:
+	xen_grant_dma_unmap_sg(dev, sg, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
+	sg_dma_len(sg) = 0;
+
+	return -EIO;
+}
+
+static int xen_grant_dma_supported(struct device *dev, u64 mask)
+{
+	return mask == DMA_BIT_MASK(64);
+}
+
+static const struct dma_map_ops xen_grant_dma_ops = {
+	.alloc = xen_grant_dma_alloc,
+	.free = xen_grant_dma_free,
+	.alloc_pages = xen_grant_dma_alloc_pages,
+	.free_pages = xen_grant_dma_free_pages,
+	.mmap = dma_common_mmap,
+	.get_sgtable = dma_common_get_sgtable,
+	.map_page = xen_grant_dma_map_page,
+	.unmap_page = xen_grant_dma_unmap_page,
+	.map_sg = xen_grant_dma_map_sg,
+	.unmap_sg = xen_grant_dma_unmap_sg,
+	.dma_supported = xen_grant_dma_supported,
+};
+
+void xen_grant_setup_dma_ops(struct device *dev)
+{
+	struct xen_grant_dma_data *data;
+	uint32_t dev_domid;
+
+	data = find_xen_grant_dma_data(dev);
+	if (data) {
+		dev_err(dev, "Xen grant DMA data is already created\n");
+		return;
+	}
+
+	/* XXX The dom0 is hardcoded as the backend domain for now */
+	dev_domid = 0;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto err;
+
+	data->dev_domid = dev_domid;
+
+	if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
+			GFP_KERNEL))) {
+		dev_err(dev, "Cannot store Xen grant DMA data\n");
+		goto err;
+	}
+
+	dev->dma_ops = &xen_grant_dma_ops;
+
+	return;
+
+err:
+	dev_err(dev, "Cannot set up Xen grant DMA ops, retain platform DMA ops\n");
+}
+
+MODULE_DESCRIPTION("Xen grant DMA-mapping layer");
+MODULE_AUTHOR("Juergen Gross <jgross@suse.com>");
+MODULE_LICENSE("GPL");
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index a3584a3..4f9fad5 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -221,4 +221,12 @@ static inline void xen_preemptible_hcall_end(void) { }
 
 #endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */
 
+#ifdef CONFIG_XEN_GRANT_DMA_OPS
+void xen_grant_setup_dma_ops(struct device *dev);
+#else
+static inline void xen_grant_setup_dma_ops(struct device *dev)
+{
+}
+#endif /* CONFIG_XEN_GRANT_DMA_OPS */
+
 #endif /* INCLUDE_XEN_OPS_H */
-- 
2.7.4



* [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
                   ` (2 preceding siblings ...)
  2022-05-07 18:19 ` [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  2022-05-09 21:39   ` Stefano Stabellini
  2022-05-07 18:19 ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops Oleksandr Tyshchenko
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, x86, linux-arm-kernel, linux-kernel
  Cc: Juergen Gross, Oleksandr Tyshchenko, Stefano Stabellini,
	Russell King, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Julien Grall,
	Michael S. Tsirkin, Christoph Hellwig

From: Juergen Gross <jgross@suse.com>

In order to support virtio in Xen guests, add a config option XEN_VIRTIO
enabling the user to specify whether, in all Xen guests, virtio should
only be able to access memory via Xen grant mappings on the host side.

Also set the PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS feature from the guest
initialization code on Arm and x86 if CONFIG_XEN_VIRTIO is enabled.
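
For reference, with the "platform_has()" series applied the virtio core consumes this flag
during feature validation roughly as follows (a simplified sketch, not a verbatim copy of
that series):

#include <linux/platform-feature.h>
#include <linux/virtio_config.h>

/* Simplified sketch: with restricted memory access set, the transport must
 * offer VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM, otherwise the
 * device is rejected.
 */
static int virtio_features_ok(struct virtio_device *dev)
{
        if (platform_has(PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS)) {
                if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
                        dev_warn(&dev->dev,
                                 "device must provide VIRTIO_F_VERSION_1\n");
                        return -ENODEV;
                }
                if (!virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
                        dev_warn(&dev->dev,
                                 "device must provide VIRTIO_F_ACCESS_PLATFORM\n");
                        return -ENODEV;
                }
        }

        return 0;
}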

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes V1 -> V2:
   - new patch, split required changes from commit:
    "[PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen"
   - rework according to new platform_has() infrastructure
---
 arch/arm/xen/enlighten.c     |  2 ++
 arch/x86/xen/enlighten_hvm.c |  2 ++
 arch/x86/xen/enlighten_pv.c  |  2 ++
 drivers/xen/Kconfig          | 11 +++++++++++
 include/xen/xen.h            |  8 ++++++++
 5 files changed, 25 insertions(+)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 07eb69f..1f9c3ba 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -443,6 +443,8 @@ static int __init xen_guest_init(void)
 	if (!xen_domain())
 		return 0;
 
+	xen_set_restricted_virtio_memory_access();
+
 	if (!acpi_disabled)
 		xen_acpi_guest_init();
 	else
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 517a9d8..8b71b1d 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -195,6 +195,8 @@ static void __init xen_hvm_guest_init(void)
 	if (xen_pv_domain())
 		return;
 
+	xen_set_restricted_virtio_memory_access();
+
 	init_hvm_pv_info();
 
 	reserve_shared_info();
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5038edb..fcd5d5d 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -109,6 +109,8 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
 
 static void __init xen_pv_init_platform(void)
 {
+	xen_set_restricted_virtio_memory_access();
+
 	populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));
 
 	set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 313a9127..a7bd8ce 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -339,4 +339,15 @@ config XEN_GRANT_DMA_OPS
 	bool
 	select DMA_OPS
 
+config XEN_VIRTIO
+	bool "Xen virtio support"
+	depends on VIRTIO
+	select XEN_GRANT_DMA_OPS
+	help
+	  Enable virtio support for running as Xen guest. Depending on the
+	  guest type this will require special support on the backend side
+	  (qemu or kernel, depending on the virtio device types used).
+
+	  If in doubt, say n.
+
 endmenu
diff --git a/include/xen/xen.h b/include/xen/xen.h
index a99bab8..0780a81 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -52,6 +52,14 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 extern u64 xen_saved_max_mem_size;
 #endif
 
+#include <linux/platform-feature.h>
+
+static inline void xen_set_restricted_virtio_memory_access(void)
+{
+	if (IS_ENABLED(CONFIG_XEN_VIRTIO) && xen_domain())
+		platform_set(PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS);
+}
+
 #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
 int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
 void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
-- 
2.7.4



* [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
                   ` (3 preceding siblings ...)
  2022-05-07 18:19 ` [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  2022-05-09 21:39   ` [PATCH V2 5/7] dt-bindings: Add xen, dev-domid " Stefano Stabellini
                     ` (2 more replies)
  2022-05-07 18:19 ` [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices Oleksandr Tyshchenko
  2022-05-07 18:19 ` [PATCH V2 7/7] arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices Oleksandr Tyshchenko
  6 siblings, 3 replies; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, virtualization, devicetree, linux-kernel, linux-arm-kernel
  Cc: Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Stefano Stabellini, Michael S. Tsirkin, Christoph Hellwig,
	Jean-Philippe Brucker

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Introduce a Xen specific binding for virtualized devices (e.g. virtio)
to be used by the Xen grant DMA-mapping layer in the subsequent commit.

This binding indicates that the Xen grant mappings scheme needs to be
enabled for the device whose DT node contains this property, and it
specifies the ID of the Xen domain where the corresponding backend
resides. The ID (domid) is used as an argument to the grant mapping APIs.

This is needed for the option to restrict memory access using Xen grant
mappings, whose primary goal is to enable using virtio devices
in Xen guests.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V1:
   - update commit subject/description and text in description
   - move to devicetree/bindings/arm/

Changes V1 -> V2:
   - update text in description
   - change the maintainer of the binding
   - fix validation issue
   - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
---
 .../devicetree/bindings/arm/xen,dev-domid.yaml     | 37 ++++++++++++++++++++++
 Documentation/devicetree/bindings/virtio/mmio.yaml |  7 ++++
 2 files changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml

diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
new file mode 100644
index 00000000..750e89e
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
@@ -0,0 +1,37 @@
+# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xen specific binding for virtualized devices (e.g. virtio)
+
+maintainers:
+  - Stefano Stabellini <sstabellini@kernel.org>
+
+select: true
+
+description:
+  This binding indicates that Xen grant mappings need to be enabled for
+  the device, and it specifies the ID of the domain where the corresponding
+  device (backend) resides. The property is required to restrict memory
+  access using Xen grant mappings.
+
+properties:
+  xen,dev-domid:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description:
+      The domid (domain ID) of the domain where the device (backend) is running.
+
+additionalProperties: true
+
+examples:
+  - |
+    virtio@3000 {
+            compatible = "virtio,mmio";
+            reg = <0x3000 0x100>;
+            interrupts = <41>;
+
+            /* The device is located in Xen domain with ID 1 */
+            xen,dev-domid = <1>;
+    };
diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
index 10c22b5..29a0932 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.yaml
+++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
@@ -13,6 +13,9 @@ description:
   See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
   more details.
 
+allOf:
+  - $ref: /schemas/arm/xen,dev-domid.yaml#
+
 properties:
   compatible:
     const: virtio,mmio
@@ -33,6 +36,10 @@ properties:
     description: Required for devices making accesses thru an IOMMU.
     maxItems: 1
 
+  xen,dev-domid:
+    description: Required when Xen grant mappings need to be enabled for device.
+    $ref: /schemas/types.yaml#/definitions/uint32
+
 required:
   - compatible
   - reg
-- 
2.7.4



* [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
                   ` (4 preceding siblings ...)
  2022-05-07 18:19 ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  2022-05-09 21:39   ` Stefano Stabellini
  2022-05-07 18:19 ` [PATCH V2 7/7] arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices Oleksandr Tyshchenko
  6 siblings, 1 reply; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-arm-kernel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Boris Ostrovsky,
	Juergen Gross, Julien Grall, Michael S. Tsirkin,
	Christoph Hellwig

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Use the presence of the recently introduced "xen,dev-domid" property
in the device node as a clear indicator of enabling the Xen grant
mappings scheme for that device, and read the ID of the Xen domain where
the corresponding backend resides. The ID (domid) is used as
an argument to the Xen grant mapping APIs.

Also introduce xen_is_grant_dma_device() to check whether the xen-grant
DMA ops need to be set up for a passed device.

Remove the hardcoded domid 0 in xen_grant_setup_dma_ops().

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V1:
   - new patch, split required changes from commit:
    "[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
   - update checks in xen_virtio_setup_dma_ops() to only support
     DT devices for now
   - remove the "virtio,mmio" check from xen_is_virtio_device()
   - rename everything according to the new naming scheme:
     s/virtio/grant_dma

Changes V1 -> V2:
   - remove dev_is_pci() check in xen_grant_setup_dma_ops()
   - remove EXPORT_SYMBOL_GPL(xen_is_grant_dma_device);
---
 drivers/xen/grant-dma-ops.c | 24 +++++++++++++++++-------
 include/xen/xen-ops.h       |  5 +++++
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 29ad7bf..8924178 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -55,11 +55,6 @@ static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
  * Such a DMA address is formed by using the grant reference as a frame
  * number and setting the highest address bit (this bit is for the backend
  * to be able to distinguish it from e.g. a mmio address).
- *
- * Note that for now we hard wire dom0 to be the backend domain. In order
- * to support any domain as backend we'd need to add a way to communicate
- * the domid of this backend, e.g. via Xenstore, via the PCI-device's
- * config space or DT/ACPI.
  */
 static void *xen_grant_dma_alloc(struct device *dev, size_t size,
 				 dma_addr_t *dma_handle, gfp_t gfp,
@@ -275,6 +270,15 @@ static const struct dma_map_ops xen_grant_dma_ops = {
 	.dma_supported = xen_grant_dma_supported,
 };
 
+bool xen_is_grant_dma_device(struct device *dev)
+{
+	/* XXX Handle only DT devices for now */
+	if (!dev->of_node)
+		return false;
+
+	return of_property_read_bool(dev->of_node, "xen,dev-domid");
+}
+
 void xen_grant_setup_dma_ops(struct device *dev)
 {
 	struct xen_grant_dma_data *data;
@@ -286,8 +290,14 @@ void xen_grant_setup_dma_ops(struct device *dev)
 		return;
 	}
 
-	/* XXX The dom0 is hardcoded as the backend domain for now */
-	dev_domid = 0;
+	/* XXX ACPI device unsupported for now */
+	if (!dev->of_node)
+		goto err;
+
+	if (of_property_read_u32(dev->of_node, "xen,dev-domid", &dev_domid)) {
+		dev_err(dev, "xen,dev-domid property is not present\n");
+		goto err;
+	}
 
 	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
 	if (!data)
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 4f9fad5..62be9dc 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -223,10 +223,15 @@ static inline void xen_preemptible_hcall_end(void) { }
 
 #ifdef CONFIG_XEN_GRANT_DMA_OPS
 void xen_grant_setup_dma_ops(struct device *dev);
+bool xen_is_grant_dma_device(struct device *dev);
 #else
 static inline void xen_grant_setup_dma_ops(struct device *dev)
 {
 }
+static inline bool xen_is_grant_dma_device(struct device *dev)
+{
+	return false;
+}
 #endif /* CONFIG_XEN_GRANT_DMA_OPS */
 
 #endif /* INCLUDE_XEN_OPS_H */
-- 
2.7.4



* [PATCH V2 7/7] arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices
  2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
                   ` (5 preceding siblings ...)
  2022-05-07 18:19 ` [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices Oleksandr Tyshchenko
@ 2022-05-07 18:19 ` Oleksandr Tyshchenko
  6 siblings, 0 replies; 42+ messages in thread
From: Oleksandr Tyshchenko @ 2022-05-07 18:19 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-arm-kernel
  Cc: Oleksandr Tyshchenko, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Julien Grall, Michael S. Tsirkin,
	Christoph Hellwig

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

By assigning the xen-grant DMA ops we will restrict memory access for
the passed device using Xen grant mappings. This is needed for using any
virtualized device (e.g. virtio) in Xen guests in a safe manner.

Please note that for virtio devices the XEN_VIRTIO config option should
be enabled (it selects XEN_GRANT_DMA_OPS and enables the restricted
virtio memory access feature).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
Changes RFC -> V1:
   - update commit subject/description
   - remove #ifdef CONFIG_XEN_VIRTIO
   - re-organize the check taking into the account that
     swiotlb and virtio cases are mutually exclusive
   - update according to the new naming scheme:
     s/virtio/grant_dma

Changes V1 -> V2:
   - add Stefano's R-b
   - remove arch_has_restricted_virtio_memory_access() check
   - update commit description
   - remove the inclusion of virtio_config.h
---
 include/xen/arm/xen-ops.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/xen/arm/xen-ops.h b/include/xen/arm/xen-ops.h
index 288deb1..b0766a6 100644
--- a/include/xen/arm/xen-ops.h
+++ b/include/xen/arm/xen-ops.h
@@ -3,11 +3,14 @@
 #define _ASM_ARM_XEN_OPS_H
 
 #include <xen/swiotlb-xen.h>
+#include <xen/xen-ops.h>
 
 static inline void xen_setup_dma_ops(struct device *dev)
 {
 #ifdef CONFIG_XEN
-	if (xen_swiotlb_detect())
+	if (xen_is_grant_dma_device(dev))
+		xen_grant_setup_dma_ops(dev);
+	else if (xen_swiotlb_detect())
 		dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 }
-- 
2.7.4



* Re: [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops()
  2022-05-07 18:19 ` [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops() Oleksandr Tyshchenko
@ 2022-05-07 18:52   ` Catalin Marinas
  0 siblings, 0 replies; 42+ messages in thread
From: Catalin Marinas @ 2022-05-07 18:52 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, linux-kernel, linux-arm-kernel, Oleksandr Tyshchenko,
	Stefano Stabellini, Russell King, Will Deacon, Boris Ostrovsky,
	Juergen Gross, Julien Grall, Michael S. Tsirkin,
	Christoph Hellwig

On Sat, May 07, 2022 at 09:19:02PM +0300, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch introduces new helper and places it in new header.
> The helper's purpose is to assign any Xen specific DMA ops in
> a single place. For now, we deal with xen-swiotlb DMA ops only.
> The one of the subsequent commits in current series will add
> xen-grant DMA ops case.
> 
> Also re-use the xen_swiotlb_detect() check on Arm32.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

For arm64:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen
  2022-05-07 18:19 ` [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen Oleksandr Tyshchenko
@ 2022-05-09 21:39   ` Stefano Stabellini
  0 siblings, 0 replies; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-09 21:39 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, linux-kernel, linux-arm-kernel, Juergen Gross,
	Boris Ostrovsky, Stefano Stabellini, Julien Grall,
	Oleksandr Tyshchenko, Michael S. Tsirkin, Christoph Hellwig


On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Juergen Gross <jgross@suse.com>
> 
> Introduce Xen grant DMA-mapping layer which contains special DMA-mapping
> routines for providing grant references as DMA addresses to be used by
> frontends (e.g. virtio) in Xen guests.
> 
> Add the needed functionality by providing a special set of DMA ops
> handling the needed grant operations for the I/O pages.
> 
> The subsequent commit will introduce the use case for xen-grant DMA ops
> layer to enable using virtio devices in Xen guests in a safe manner.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Changes RFC -> V1:
>    - squash with almost all changes from commit (except handling "xen,dev-domid"
>      property):
>      "[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
>    - update commit subject/description and comments in code
>    - leave only single Kconfig option XEN_VIRTIO and remove architectural
>      dependencies
>    - introduce common xen_has_restricted_virtio_memory_access() in xen.h
>      and update arch_has_restricted_virtio_memory_access() for both
>      Arm and x86 to call new helper
>    - use (1ULL << 63) instead of 0x8000000000000000ULL for XEN_GRANT_ADDR_OFF
>    - implement xen_virtio_dma_map(unmap)_sg() using example in swiotlb-xen.c
>    - optimize padding by moving "broken" field in struct xen_virtio_data
>    - remove unneeded per-device spinlock
>    - remove the inclusion of virtio_config.h
>    - rename everything according to the new naming scheme:
>      s/virtio/grant_dma
>    - add new hidden config option XEN_GRANT_DMA_OPS
> 
> Changes V1 -> V2:
>    - fix checkpatch.pl warnings
>    - remove the inclusion of linux/pci.h
>    - rework to use xarray for data context
>    - remove EXPORT_SYMBOL_GPL(xen_grant_setup_dma_ops);
>    - remove the line of * after SPDX-License-Identifier
>    - split changes into grant-dma-ops.c and arch_has_restricted_virtio_memory_access()
>      and update commit subject/description accordingly
>    - remove "default n" for config XEN_VIRTIO
>    - implement xen_grant_dma_alloc(free)_pages()
> ---
>  drivers/xen/Kconfig         |   4 +
>  drivers/xen/Makefile        |   1 +
>  drivers/xen/grant-dma-ops.c | 314 ++++++++++++++++++++++++++++++++++++++++++++
>  include/xen/xen-ops.h       |   8 ++
>  4 files changed, 327 insertions(+)
>  create mode 100644 drivers/xen/grant-dma-ops.c
> 
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 120d32f..313a9127 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -335,4 +335,8 @@ config XEN_UNPOPULATED_ALLOC
>  	  having to balloon out RAM regions in order to obtain physical memory
>  	  space to create such mappings.
>  
> +config XEN_GRANT_DMA_OPS
> +	bool
> +	select DMA_OPS
> +
>  endmenu
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 5aae66e..1a23cb0 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -39,3 +39,4 @@ xen-gntalloc-y				:= gntalloc.o
>  xen-privcmd-y				:= privcmd.o privcmd-buf.o
>  obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)	+= xen-front-pgdir-shbuf.o
>  obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)	+= unpopulated-alloc.o
> +obj-$(CONFIG_XEN_GRANT_DMA_OPS)		+= grant-dma-ops.o
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> new file mode 100644
> index 00000000..29ad7bf
> --- /dev/null
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -0,0 +1,314 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Xen grant DMA-mapping layer - contains special DMA-mapping routines
> + * for providing grant references as DMA addresses to be used by frontends
> + * (e.g. virtio) in Xen guests
> + *
> + * Copyright (c) 2021, Juergen Gross <jgross@suse.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/of.h>
> +#include <linux/pfn.h>
> +#include <linux/xarray.h>
> +#include <xen/xen.h>
> +#include <xen/grant_table.h>
> +
> +struct xen_grant_dma_data {
> +	/* The ID of backend domain */
> +	domid_t dev_domid;
> +	/* Is device behaving sane? */
> +	bool broken;
> +};
> +
> +static DEFINE_XARRAY(xen_grant_dma_devices);
> +
> +#define XEN_GRANT_DMA_ADDR_OFF	(1ULL << 63)
> +
> +static inline dma_addr_t grant_to_dma(grant_ref_t grant)
> +{
> +	return XEN_GRANT_DMA_ADDR_OFF | ((dma_addr_t)grant << PAGE_SHIFT);
> +}
> +
> +static inline grant_ref_t dma_to_grant(dma_addr_t dma)
> +{
> +	return (grant_ref_t)((dma & ~XEN_GRANT_DMA_ADDR_OFF) >> PAGE_SHIFT);
> +}
> +
> +static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
> +{
> +	struct xen_grant_dma_data *data;
> +
> +	xa_lock(&xen_grant_dma_devices);
> +	data = xa_load(&xen_grant_dma_devices, (unsigned long)dev);
> +	xa_unlock(&xen_grant_dma_devices);
> +
> +	return data;
> +}
> +
> +/*
> + * DMA ops for Xen frontends (e.g. virtio).
> + *
> + * Used to act as a kind of software IOMMU for Xen guests by using grants as
> + * DMA addresses.
> + * Such a DMA address is formed by using the grant reference as a frame
> + * number and setting the highest address bit (this bit is for the backend
> + * to be able to distinguish it from e.g. a mmio address).
> + *
> + * Note that for now we hard wire dom0 to be the backend domain. In order
> + * to support any domain as backend we'd need to add a way to communicate
> + * the domid of this backend, e.g. via Xenstore, via the PCI-device's
> + * config space or DT/ACPI.
> + */
> +static void *xen_grant_dma_alloc(struct device *dev, size_t size,
> +				 dma_addr_t *dma_handle, gfp_t gfp,
> +				 unsigned long attrs)
> +{
> +	struct xen_grant_dma_data *data;
> +	unsigned int i, n_pages = PFN_UP(size);
> +	unsigned long pfn;
> +	grant_ref_t grant;
> +	void *ret;
> +
> +	data = find_xen_grant_dma_data(dev);
> +	if (!data)
> +		return NULL;
> +
> +	if (unlikely(data->broken))
> +		return NULL;
> +
> +	ret = alloc_pages_exact(n_pages * PAGE_SIZE, gfp);
> +	if (!ret)
> +		return NULL;
> +
> +	pfn = virt_to_pfn(ret);
> +
> +	if (gnttab_alloc_grant_reference_seq(n_pages, &grant)) {
> +		free_pages_exact(ret, n_pages * PAGE_SIZE);
> +		return NULL;
> +	}
> +
> +	for (i = 0; i < n_pages; i++) {
> +		gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
> +				pfn_to_gfn(pfn + i), 0);
> +	}
> +
> +	*dma_handle = grant_to_dma(grant);
> +
> +	return ret;
> +}
> +
> +static void xen_grant_dma_free(struct device *dev, size_t size, void *vaddr,
> +			       dma_addr_t dma_handle, unsigned long attrs)
> +{
> +	struct xen_grant_dma_data *data;
> +	unsigned int i, n_pages = PFN_UP(size);
> +	grant_ref_t grant;
> +
> +	data = find_xen_grant_dma_data(dev);
> +	if (!data)
> +		return;
> +
> +	if (unlikely(data->broken))
> +		return;
> +
> +	grant = dma_to_grant(dma_handle);
> +
> +	for (i = 0; i < n_pages; i++) {
> +		if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
> +			dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
> +			data->broken = true;
> +			return;
> +		}
> +	}
> +
> +	gnttab_free_grant_reference_seq(grant, n_pages);
> +
> +	free_pages_exact(vaddr, n_pages * PAGE_SIZE);
> +}
> +
> +static struct page *xen_grant_dma_alloc_pages(struct device *dev, size_t size,
> +					      dma_addr_t *dma_handle,
> +					      enum dma_data_direction dir,
> +					      gfp_t gfp)
> +{
> +	void *vaddr;
> +
> +	vaddr = xen_grant_dma_alloc(dev, size, dma_handle, gfp, 0);
> +	if (!vaddr)
> +		return NULL;
> +
> +	return virt_to_page(vaddr);
> +}
> +
> +static void xen_grant_dma_free_pages(struct device *dev, size_t size,
> +				     struct page *vaddr, dma_addr_t dma_handle,
> +				     enum dma_data_direction dir)
> +{
> +	xen_grant_dma_free(dev, size, page_to_virt(vaddr), dma_handle, 0);
> +}
> +
> +static dma_addr_t xen_grant_dma_map_page(struct device *dev, struct page *page,
> +					 unsigned long offset, size_t size,
> +					 enum dma_data_direction dir,
> +					 unsigned long attrs)
> +{
> +	struct xen_grant_dma_data *data;
> +	unsigned int i, n_pages = PFN_UP(size);
> +	grant_ref_t grant;
> +	dma_addr_t dma_handle;
> +
> +	if (WARN_ON(dir == DMA_NONE))
> +		return DMA_MAPPING_ERROR;
> +
> +	data = find_xen_grant_dma_data(dev);
> +	if (!data)
> +		return DMA_MAPPING_ERROR;
> +
> +	if (unlikely(data->broken))
> +		return DMA_MAPPING_ERROR;
> +
> +	if (gnttab_alloc_grant_reference_seq(n_pages, &grant))
> +		return DMA_MAPPING_ERROR;
> +
> +	for (i = 0; i < n_pages; i++) {
> +		gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
> +				xen_page_to_gfn(page) + i, dir == DMA_TO_DEVICE);
> +	}
> +
> +	dma_handle = grant_to_dma(grant) + offset;
> +
> +	return dma_handle;
> +}
> +
> +static void xen_grant_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> +				     size_t size, enum dma_data_direction dir,
> +				     unsigned long attrs)
> +{
> +	struct xen_grant_dma_data *data;
> +	unsigned int i, n_pages = PFN_UP(size);
> +	grant_ref_t grant;
> +
> +	if (WARN_ON(dir == DMA_NONE))
> +		return;
> +
> +	data = find_xen_grant_dma_data(dev);
> +	if (!data)
> +		return;
> +
> +	if (unlikely(data->broken))
> +		return;
> +
> +	grant = dma_to_grant(dma_handle);
> +
> +	for (i = 0; i < n_pages; i++) {
> +		if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
> +			dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
> +			data->broken = true;
> +			return;
> +		}
> +	}
> +
> +	gnttab_free_grant_reference_seq(grant, n_pages);
> +}
> +
> +static void xen_grant_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
> +				   int nents, enum dma_data_direction dir,
> +				   unsigned long attrs)
> +{
> +	struct scatterlist *s;
> +	unsigned int i;
> +
> +	if (WARN_ON(dir == DMA_NONE))
> +		return;
> +
> +	for_each_sg(sg, s, nents, i)
> +		xen_grant_dma_unmap_page(dev, s->dma_address, sg_dma_len(s), dir,
> +				attrs);
> +}
> +
> +static int xen_grant_dma_map_sg(struct device *dev, struct scatterlist *sg,
> +				int nents, enum dma_data_direction dir,
> +				unsigned long attrs)
> +{
> +	struct scatterlist *s;
> +	unsigned int i;
> +
> +	if (WARN_ON(dir == DMA_NONE))
> +		return -EINVAL;
> +
> +	for_each_sg(sg, s, nents, i) {
> +		s->dma_address = xen_grant_dma_map_page(dev, sg_page(s), s->offset,
> +				s->length, dir, attrs);
> +		if (s->dma_address == DMA_MAPPING_ERROR)
> +			goto out;
> +
> +		sg_dma_len(s) = s->length;
> +	}
> +
> +	return nents;
> +
> +out:
> +	xen_grant_dma_unmap_sg(dev, sg, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
> +	sg_dma_len(sg) = 0;
> +
> +	return -EIO;
> +}
> +
> +static int xen_grant_dma_supported(struct device *dev, u64 mask)
> +{
> +	return mask == DMA_BIT_MASK(64);
> +}
> +
> +static const struct dma_map_ops xen_grant_dma_ops = {
> +	.alloc = xen_grant_dma_alloc,
> +	.free = xen_grant_dma_free,
> +	.alloc_pages = xen_grant_dma_alloc_pages,
> +	.free_pages = xen_grant_dma_free_pages,
> +	.mmap = dma_common_mmap,
> +	.get_sgtable = dma_common_get_sgtable,
> +	.map_page = xen_grant_dma_map_page,
> +	.unmap_page = xen_grant_dma_unmap_page,
> +	.map_sg = xen_grant_dma_map_sg,
> +	.unmap_sg = xen_grant_dma_unmap_sg,
> +	.dma_supported = xen_grant_dma_supported,
> +};
> +
> +void xen_grant_setup_dma_ops(struct device *dev)
> +{
> +	struct xen_grant_dma_data *data;
> +	uint32_t dev_domid;
> +
> +	data = find_xen_grant_dma_data(dev);
> +	if (data) {
> +		dev_err(dev, "Xen grant DMA data is already created\n");
> +		return;
> +	}
> +
> +	/* XXX The dom0 is hardcoded as the backend domain for now */
> +	dev_domid = 0;
> +
> +	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		goto err;
> +
> +	data->dev_domid = dev_domid;
> +
> +	if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
> +			GFP_KERNEL))) {
> +		dev_err(dev, "Cannot store Xen grant DMA data\n");
> +		goto err;
> +	}
> +
> +	dev->dma_ops = &xen_grant_dma_ops;
> +
> +	return;
> +
> +err:
> +	dev_err(dev, "Cannot set up Xen grant DMA ops, retain platform DMA ops\n");
> +}
> +
> +MODULE_DESCRIPTION("Xen grant DMA-mapping layer");
> +MODULE_AUTHOR("Juergen Gross <jgross@suse.com>");
> +MODULE_LICENSE("GPL");
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index a3584a3..4f9fad5 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -221,4 +221,12 @@ static inline void xen_preemptible_hcall_end(void) { }
>  
>  #endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */
>  
> +#ifdef CONFIG_XEN_GRANT_DMA_OPS
> +void xen_grant_setup_dma_ops(struct device *dev);
> +#else
> +static inline void xen_grant_setup_dma_ops(struct device *dev)
> +{
> +}
> +#endif /* CONFIG_XEN_GRANT_DMA_OPS */
> +
>  #endif /* INCLUDE_XEN_OPS_H */
> -- 
> 2.7.4
> 
> 
> 
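
As a worked example of the DMA address encoding above (illustrative
numbers only, assuming 4 KiB pages, i.e. PAGE_SHIFT == 12):

  grant_to_dma(5)                  = (1ULL << 63) | (5 << 12)
                                   = 0x8000000000005000
  dma_handle for page offset 0x10  = 0x8000000000005010
  dma_to_grant(0x8000000000005010) = (0x8000000000005010 & ~(1ULL << 63)) >> 12
                                   = 5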

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings
  2022-05-07 18:19 ` [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings Oleksandr Tyshchenko
@ 2022-05-09 21:39   ` Stefano Stabellini
  0 siblings, 0 replies; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-09 21:39 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, x86, linux-arm-kernel, linux-kernel, Juergen Gross,
	Oleksandr Tyshchenko, Stefano Stabellini, Russell King,
	Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Julien Grall, Michael S. Tsirkin,
	Christoph Hellwig

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Juergen Gross <jgross@suse.com>
> 
> In order to support virtio in Xen guests, add a config option XEN_VIRTIO
> enabling the user to specify whether, in all Xen guests, virtio should
> be able to access memory only via Xen grant mappings on the host side.
> 
> Also set the PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS feature from the guest
> initialization code on Arm and x86 if CONFIG_XEN_VIRTIO is enabled.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The patch looks OK to me

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Changes V1 -> V2:
>    - new patch, split required changes from commit:
>     "[PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen"
>    - rework according to new platform_has() infrastructure
> ---
>  arch/arm/xen/enlighten.c     |  2 ++
>  arch/x86/xen/enlighten_hvm.c |  2 ++
>  arch/x86/xen/enlighten_pv.c  |  2 ++
>  drivers/xen/Kconfig          | 11 +++++++++++
>  include/xen/xen.h            |  8 ++++++++
>  5 files changed, 25 insertions(+)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 07eb69f..1f9c3ba 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -443,6 +443,8 @@ static int __init xen_guest_init(void)
>  	if (!xen_domain())
>  		return 0;
>  
> +	xen_set_restricted_virtio_memory_access();
> +
>  	if (!acpi_disabled)
>  		xen_acpi_guest_init();
>  	else
> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
> index 517a9d8..8b71b1d 100644
> --- a/arch/x86/xen/enlighten_hvm.c
> +++ b/arch/x86/xen/enlighten_hvm.c
> @@ -195,6 +195,8 @@ static void __init xen_hvm_guest_init(void)
>  	if (xen_pv_domain())
>  		return;
>  
> +	xen_set_restricted_virtio_memory_access();
> +
>  	init_hvm_pv_info();
>  
>  	reserve_shared_info();
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 5038edb..fcd5d5d 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -109,6 +109,8 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>  
>  static void __init xen_pv_init_platform(void)
>  {
> +	xen_set_restricted_virtio_memory_access();
> +
>  	populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));
>  
>  	set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 313a9127..a7bd8ce 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -339,4 +339,15 @@ config XEN_GRANT_DMA_OPS
>  	bool
>  	select DMA_OPS
>  
> +config XEN_VIRTIO
> +	bool "Xen virtio support"
> +	depends on VIRTIO
> +	select XEN_GRANT_DMA_OPS
> +	help
> +	  Enable virtio support for running as Xen guest. Depending on the
> +	  guest type this will require special support on the backend side
> +	  (qemu or kernel, depending on the virtio device types used).
> +
> +	  If in doubt, say n.
> +
>  endmenu
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index a99bab8..0780a81 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -52,6 +52,14 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>  extern u64 xen_saved_max_mem_size;
>  #endif
>  
> +#include <linux/platform-feature.h>
> +
> +static inline void xen_set_restricted_virtio_memory_access(void)
> +{
> +	if (IS_ENABLED(CONFIG_XEN_VIRTIO) && xen_domain())
> +		platform_set(PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS);
> +}
> +
>  #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>  int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
>  void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-07 18:19 ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops Oleksandr Tyshchenko
@ 2022-05-09 21:39   ` Stefano Stabellini
  2022-05-17  0:27   ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid " Rob Herring
  2022-05-18 14:32   ` Arnd Bergmann
  2 siblings, 0 replies; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-09 21:39 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, virtualization, devicetree, linux-kernel,
	linux-arm-kernel, Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Stefano Stabellini, Michael S. Tsirkin, Christoph Hellwig,
	Jean-Philippe Brucker

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Introduce Xen specific binding for the virtualized device (e.g. virtio)
> to be used by Xen grant DMA-mapping layer in the subsequent commit.
> 
> This binding indicates that the Xen grant mappings scheme needs to be
> enabled for the device whose DT node contains that property, and it
> specifies the ID of the Xen domain where the corresponding backend
> resides. The ID (domid) is used as an argument to the grant mapping APIs.
> 
> This is needed for the option to restrict memory access using Xen grant
> mappings to work, whose primary goal is to enable using virtio devices
> in Xen guests.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The binding is OK and the wording is OK too.

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

I am not an expert on the details of writing a good schema, I'll defer
to Rob if he has any comments on that.


> ---
> Changes RFC -> V1:
>    - update commit subject/description and text in description
>    - move to devicetree/bindings/arm/
> 
> Changes V1 -> V2:
>    - update text in description
>    - change the maintainer of the binding
>    - fix validation issue
>    - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
> ---
>  .../devicetree/bindings/arm/xen,dev-domid.yaml     | 37 ++++++++++++++++++++++
>  Documentation/devicetree/bindings/virtio/mmio.yaml |  7 ++++
>  2 files changed, 44 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> 
> diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> new file mode 100644
> index 00000000..750e89e
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> @@ -0,0 +1,37 @@
> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Xen specific binding for virtualized devices (e.g. virtio)
> +
> +maintainers:
> +  - Stefano Stabellini <sstabellini@kernel.org>
> +
> +select: true
> +
> +description:
> +  This binding indicates that Xen grant mappings need to be enabled for
> +  the device, and it specifies the ID of the domain where the corresponding
> +  device (backend) resides. The property is required to restrict memory
> +  access using Xen grant mappings.
> +
> +properties:
> +  xen,dev-domid:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description:
> +      The domid (domain ID) of the domain where the device (backend) is running.
> +
> +additionalProperties: true
> +
> +examples:
> +  - |
> +    virtio@3000 {
> +            compatible = "virtio,mmio";
> +            reg = <0x3000 0x100>;
> +            interrupts = <41>;
> +
> +            /* The device is located in Xen domain with ID 1 */
> +            xen,dev-domid = <1>;
> +    };
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 10c22b5..29a0932 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -13,6 +13,9 @@ description:
>    See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>    more details.
>  
> +allOf:
> +  - $ref: /schemas/arm/xen,dev-domid.yaml#
> +
>  properties:
>    compatible:
>      const: virtio,mmio
> @@ -33,6 +36,10 @@ properties:
>      description: Required for devices making accesses thru an IOMMU.
>      maxItems: 1
>  
> +  xen,dev-domid:
> +    description: Required when Xen grant mappings need to be enabled for device.
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +
>  required:
>    - compatible
>    - reg
> -- 
> 2.7.4
> 
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
  2022-05-07 18:19 ` [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices Oleksandr Tyshchenko
@ 2022-05-09 21:39   ` Stefano Stabellini
  0 siblings, 0 replies; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-09 21:39 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, linux-kernel, linux-arm-kernel, Oleksandr Tyshchenko,
	Stefano Stabellini, Boris Ostrovsky, Juergen Gross, Julien Grall,
	Michael S. Tsirkin, Christoph Hellwig

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Use the presence of the recently introduced "xen,dev-domid" property
> in the device node as a clear indicator that the Xen grant mappings
> scheme is enabled for that device, and read the ID of the Xen domain
> where the corresponding backend resides. The ID (domid) is used as
> an argument to the Xen grant mapping APIs.
> 
> Also introduce xen_is_grant_dma_device() to check whether xen-grant
> DMA ops need to be set for a passed device.
> 
> Remove the hardcoded domid 0 in xen_grant_setup_dma_ops().
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Changes RFC -> V1:
>    - new patch, split required changes from commit:
>     "[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
>    - update checks in xen_virtio_setup_dma_ops() to only support
>      DT devices for now
>    - remove the "virtio,mmio" check from xen_is_virtio_device()
>    - rename everything according to the new naming scheme:
>      s/virtio/grant_dma
> 
> Changes V1 -> V2:
>    - remove dev_is_pci() check in xen_grant_setup_dma_ops()
>    - remove EXPORT_SYMBOL_GPL(xen_is_grant_dma_device);
> ---
>  drivers/xen/grant-dma-ops.c | 24 +++++++++++++++++-------
>  include/xen/xen-ops.h       |  5 +++++
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index 29ad7bf..8924178 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -55,11 +55,6 @@ static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
>   * Such a DMA address is formed by using the grant reference as a frame
>   * number and setting the highest address bit (this bit is for the backend
>   * to be able to distinguish it from e.g. a mmio address).
> - *
> - * Note that for now we hard wire dom0 to be the backend domain. In order
> - * to support any domain as backend we'd need to add a way to communicate
> - * the domid of this backend, e.g. via Xenstore, via the PCI-device's
> - * config space or DT/ACPI.
>   */
>  static void *xen_grant_dma_alloc(struct device *dev, size_t size,
>  				 dma_addr_t *dma_handle, gfp_t gfp,
> @@ -275,6 +270,15 @@ static const struct dma_map_ops xen_grant_dma_ops = {
>  	.dma_supported = xen_grant_dma_supported,
>  };
>  
> +bool xen_is_grant_dma_device(struct device *dev)
> +{
> +	/* XXX Handle only DT devices for now */
> +	if (!dev->of_node)
> +		return false;
> +
> +	return of_property_read_bool(dev->of_node, "xen,dev-domid");
> +}
> +
>  void xen_grant_setup_dma_ops(struct device *dev)
>  {
>  	struct xen_grant_dma_data *data;
> @@ -286,8 +290,14 @@ void xen_grant_setup_dma_ops(struct device *dev)
>  		return;
>  	}
>  
> -	/* XXX The dom0 is hardcoded as the backend domain for now */
> -	dev_domid = 0;
> +	/* XXX ACPI device unsupported for now */
> +	if (!dev->of_node)
> +		goto err;
> +
> +	if (of_property_read_u32(dev->of_node, "xen,dev-domid", &dev_domid)) {
> +		dev_err(dev, "xen,dev-domid property is not present\n");
> +		goto err;
> +	}
>  
>  	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>  	if (!data)
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index 4f9fad5..62be9dc 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -223,10 +223,15 @@ static inline void xen_preemptible_hcall_end(void) { }
>  
>  #ifdef CONFIG_XEN_GRANT_DMA_OPS
>  void xen_grant_setup_dma_ops(struct device *dev);
> +bool xen_is_grant_dma_device(struct device *dev);
>  #else
>  static inline void xen_grant_setup_dma_ops(struct device *dev)
>  {
>  }
> +static inline bool xen_is_grant_dma_device(struct device *dev)
> +{
> +	return false;
> +}
>  #endif /* CONFIG_XEN_GRANT_DMA_OPS */
>  
>  #endif /* INCLUDE_XEN_OPS_H */
> -- 
> 2.7.4
> 
> 
> 
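
Taken together with xen_is_grant_dma_device() above, the arch-level
helper introduced in patch 1/7 could then dispatch roughly as follows.
This builds on the sketch shown earlier for patch 1/7 and assumes the
existing swiotlb path is kept as the fallback; it is not the literal
code of patch 7/7:

static inline void xen_setup_dma_ops(struct device *dev)
{
	if (xen_is_grant_dma_device(dev))
		xen_grant_setup_dma_ops(dev);
	else if (xen_swiotlb_detect())
		dev->dma_ops = &xen_swiotlb_dma_ops;
}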

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-07 18:19 ` [PATCH V2 2/7] xen/grants: support allocating consecutive grants Oleksandr Tyshchenko
@ 2022-05-11 18:00   ` Oleksandr
  2022-05-11 21:09     ` Boris Ostrovsky
  2022-05-12 20:01   ` Boris Ostrovsky
  1 sibling, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-11 18:00 UTC (permalink / raw)
  To: Boris Ostrovsky, Stefano Stabellini
  Cc: xen-devel, linux-kernel, linux-arm-kernel, Juergen Gross,
	Julien Grall, Oleksandr Tyshchenko, Michael S. Tsirkin,
	Christoph Hellwig


On 07.05.22 21:19, Oleksandr Tyshchenko wrote:

Hello Boris, Stefano


> From: Juergen Gross <jgross@suse.com>
>
> For support of virtio via grant mappings in rare cases larger mappings
> using consecutive grants are needed. Support those by adding a bitmap
> of free grants.
>
> As consecutive grants will be needed only in very rare cases (e.g. when
> configuring a virtio device with a multi-page ring), optimize for the
> normal case of non-consecutive allocations.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> Changes RFC -> V1:
>     - no changes
>     
> Changes V1 -> V2:
>     - no changes


May I please ask for the review here?


This patch has been tested in various scenarios:

1. Guest with Xen PV drivers only (gnttab_alloc(free)_grant_reference() 
usage only)

2. Guest with Virtio drivers only 
(gnttab_alloc(free)_grant_reference_seq() usage only)

3. Guest with Virtio and Xen PV drivers (combined 
gnttab_alloc(free)_grant_reference() and 
gnttab_alloc(free)_grant_reference_seq() usage)
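
For reference, a minimal usage sketch of the sequential API exercised in
scenarios 2 and 3 above, mirroring what grant-dma-ops.c does earlier in
the thread (dev_domid and pfn are placeholders and error handling is
trimmed):

#include <xen/grant_table.h>

/* Sketch: grant n_pages consecutive pages to the backend domain. */
static int example_grant_ring(domid_t dev_domid, unsigned long pfn,
			      unsigned int n_pages, grant_ref_t *first)
{
	unsigned int i;

	if (gnttab_alloc_grant_reference_seq(n_pages, first))
		return -ENOSPC;

	for (i = 0; i < n_pages; i++)
		gnttab_grant_foreign_access_ref(*first + i, dev_domid,
						pfn_to_gfn(pfn + i), 0);

	return 0;
}

/* Later, after revoking access with gnttab_end_foreign_access_ref()
 * on each entry: */
static void example_ungrant_ring(grant_ref_t first, unsigned int n_pages)
{
	gnttab_free_grant_reference_seq(first, n_pages);
}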


> ---
>   drivers/xen/grant-table.c | 238 +++++++++++++++++++++++++++++++++++++++-------
>   include/xen/grant_table.h |   4 +
>   2 files changed, 210 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 8ccccac..1b458c0 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -33,6 +33,7 @@
>   
>   #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt
>   
> +#include <linux/bitmap.h>
>   #include <linux/memblock.h>
>   #include <linux/sched.h>
>   #include <linux/mm.h>
> @@ -72,9 +73,32 @@
>   
>   static grant_ref_t **gnttab_list;
>   static unsigned int nr_grant_frames;
> +
> +/*
> + * Handling of free grants:
> + *
> + * Free grants are in a simple list anchored in gnttab_free_head. They are
> + * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
> + * of free entries is stored in gnttab_free_count.
> + * Additionally there is a bitmap of free entries anchored in
> + * gnttab_free_bitmap. This is being used for simplifying allocation of
> + * multiple consecutive grants, which is needed e.g. for support of virtio.
> + * gnttab_last_free is used to add free entries of new frames at the end of
> + * the free list.
> + * gnttab_free_tail_ptr specifies the variable which references the start
> + * of consecutive free grants ending with gnttab_last_free. This pointer is
> + * updated in a rather defensive way, in order to avoid performance hits in
> + * hot paths.
> + * All those variables are protected by gnttab_list_lock.
> + */
>   static int gnttab_free_count;
> -static grant_ref_t gnttab_free_head;
> +static unsigned int gnttab_size;
> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
> +static grant_ref_t *gnttab_free_tail_ptr;
> +static unsigned long *gnttab_free_bitmap;
>   static DEFINE_SPINLOCK(gnttab_list_lock);
> +
>   struct grant_frames xen_auto_xlat_grant_frames;
>   static unsigned int xen_gnttab_version;
>   module_param_named(version, xen_gnttab_version, uint, 0);
> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>   
>   	ref = head = gnttab_free_head;
>   	gnttab_free_count -= count;
> -	while (count-- > 1)
> -		head = gnttab_entry(head);
> +	while (count--) {
> +		bitmap_clear(gnttab_free_bitmap, head, 1);
> +		if (gnttab_free_tail_ptr == __gnttab_entry(head))
> +			gnttab_free_tail_ptr = &gnttab_free_head;
> +		if (count)
> +			head = gnttab_entry(head);
> +	}
>   	gnttab_free_head = gnttab_entry(head);
>   	gnttab_entry(head) = GNTTAB_LIST_END;
>   
> +	if (!gnttab_free_count) {
> +		gnttab_last_free = GNTTAB_LIST_END;
> +		gnttab_free_tail_ptr = NULL;
> +	}
> +
>   	spin_unlock_irqrestore(&gnttab_list_lock, flags);
>   
>   	return ref;
>   }
>   
> +static int get_seq_entry_count(void)
> +{
> +	if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
> +	    *gnttab_free_tail_ptr == GNTTAB_LIST_END)
> +		return 0;
> +
> +	return gnttab_last_free - *gnttab_free_tail_ptr + 1;
> +}
> +
> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
> +static int get_free_seq(unsigned int count)
> +{
> +	int ret = -ENOSPC;
> +	unsigned int from, to;
> +	grant_ref_t *last;
> +
> +	gnttab_free_tail_ptr = &gnttab_free_head;
> +	last = &gnttab_free_head;
> +
> +	for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
> +	     from < gnttab_size;
> +	     from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
> +		to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
> +					from + 1);
> +		if (ret < 0 && to - from >= count) {
> +			ret = from;
> +			bitmap_clear(gnttab_free_bitmap, ret, count);
> +			from += count;
> +			gnttab_free_count -= count;
> +			if (from == to)
> +				continue;
> +		}
> +
> +		while (from < to) {
> +			*last = from;
> +			last = __gnttab_entry(from);
> +			gnttab_last_free = from;
> +			from++;
> +		}
> +		if (to < gnttab_size)
> +			gnttab_free_tail_ptr = __gnttab_entry(to - 1);
> +	}
> +
> +	*last = GNTTAB_LIST_END;
> +	if (gnttab_last_free != gnttab_size - 1)
> +		gnttab_free_tail_ptr = NULL;
> +
> +	return ret;
> +}
> +
> +static int get_free_entries_seq(unsigned int count)
> +{
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	spin_lock_irqsave(&gnttab_list_lock, flags);
> +
> +	if (gnttab_free_count < count) {
> +		ret = gnttab_expand(count - gnttab_free_count);
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	if (get_seq_entry_count() < count) {
> +		ret = get_free_seq(count);
> +		if (ret >= 0)
> +			goto out;
> +		ret = gnttab_expand(count - get_seq_entry_count());
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	ret = *gnttab_free_tail_ptr;
> +	*gnttab_free_tail_ptr = gnttab_entry(ret + count - 1);
> +	gnttab_free_count -= count;
> +	if (!gnttab_free_count)
> +		gnttab_free_tail_ptr = NULL;
> +	bitmap_clear(gnttab_free_bitmap, ret, count);
> +
> + out:
> +	spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +
> +	return ret;
> +}
> +
>   static void do_free_callbacks(void)
>   {
>   	struct gnttab_free_callback *callback, *next;
> @@ -206,17 +325,48 @@ static inline void check_free_callbacks(void)
>   		do_free_callbacks();
>   }
>   
> -static void put_free_entry(grant_ref_t ref)
> +static void put_free_entry_locked(grant_ref_t ref)
>   {
> -	unsigned long flags;
> -	spin_lock_irqsave(&gnttab_list_lock, flags);
>   	gnttab_entry(ref) = gnttab_free_head;
>   	gnttab_free_head = ref;
> +	if (!gnttab_free_count)
> +		gnttab_last_free = ref;
> +	if (gnttab_free_tail_ptr == &gnttab_free_head)
> +		gnttab_free_tail_ptr = __gnttab_entry(ref);
>   	gnttab_free_count++;
> +	bitmap_set(gnttab_free_bitmap, ref, 1);
> +}
> +
> +static void put_free_entry(grant_ref_t ref)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&gnttab_list_lock, flags);
> +	put_free_entry_locked(ref);
>   	check_free_callbacks();
>   	spin_unlock_irqrestore(&gnttab_list_lock, flags);
>   }
>   
> +static void gnttab_set_free(unsigned int start, unsigned int n)
> +{
> +	unsigned int i;
> +
> +	for (i = start; i < start + n - 1; i++)
> +		gnttab_entry(i) = i + 1;
> +
> +	gnttab_entry(i) = GNTTAB_LIST_END;
> +	if (!gnttab_free_count) {
> +		gnttab_free_head = start;
> +		gnttab_free_tail_ptr = &gnttab_free_head;
> +	} else {
> +		gnttab_entry(gnttab_last_free) = start;
> +	}
> +	gnttab_free_count += n;
> +	gnttab_last_free = i;
> +
> +	bitmap_set(gnttab_free_bitmap, start, n);
> +}
> +
>   /*
>    * Following applies to gnttab_update_entry_v1 and gnttab_update_entry_v2.
>    * Introducing a valid entry into the grant table:
> @@ -448,23 +598,31 @@ void gnttab_free_grant_references(grant_ref_t head)
>   {
>   	grant_ref_t ref;
>   	unsigned long flags;
> -	int count = 1;
> -	if (head == GNTTAB_LIST_END)
> -		return;
> +
>   	spin_lock_irqsave(&gnttab_list_lock, flags);
> -	ref = head;
> -	while (gnttab_entry(ref) != GNTTAB_LIST_END) {
> -		ref = gnttab_entry(ref);
> -		count++;
> +	while (head != GNTTAB_LIST_END) {
> +		ref = gnttab_entry(head);
> +		put_free_entry_locked(head);
> +		head = ref;
>   	}
> -	gnttab_entry(ref) = gnttab_free_head;
> -	gnttab_free_head = head;
> -	gnttab_free_count += count;
>   	check_free_callbacks();
>   	spin_unlock_irqrestore(&gnttab_list_lock, flags);
>   }
>   EXPORT_SYMBOL_GPL(gnttab_free_grant_references);
>   
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count)
> +{
> +	unsigned long flags;
> +	unsigned int i;
> +
> +	spin_lock_irqsave(&gnttab_list_lock, flags);
> +	for (i = count; i > 0; i--)
> +		put_free_entry_locked(head + i - 1);
> +	check_free_callbacks();
> +	spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_free_grant_reference_seq);
> +
>   int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
>   {
>   	int h = get_free_entries(count);
> @@ -478,6 +636,24 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
>   }
>   EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);
>   
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first)
> +{
> +	int h;
> +
> +	if (count == 1)
> +		h = get_free_entries(1);
> +	else
> +		h = get_free_entries_seq(count);
> +
> +	if (h < 0)
> +		return -ENOSPC;
> +
> +	*first = h;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_alloc_grant_reference_seq);
> +
>   int gnttab_empty_grant_references(const grant_ref_t *private_head)
>   {
>   	return (*private_head == GNTTAB_LIST_END);
> @@ -570,16 +746,13 @@ static int grow_gnttab_list(unsigned int more_frames)
>   			goto grow_nomem;
>   	}
>   
> +	gnttab_set_free(gnttab_size, extra_entries);
>   
> -	for (i = grefs_per_frame * nr_grant_frames;
> -	     i < grefs_per_frame * new_nr_grant_frames - 1; i++)
> -		gnttab_entry(i) = i + 1;
> -
> -	gnttab_entry(i) = gnttab_free_head;
> -	gnttab_free_head = grefs_per_frame * nr_grant_frames;
> -	gnttab_free_count += extra_entries;
> +	if (!gnttab_free_tail_ptr)
> +		gnttab_free_tail_ptr = __gnttab_entry(gnttab_size);
>   
>   	nr_grant_frames = new_nr_grant_frames;
> +	gnttab_size += extra_entries;
>   
>   	check_free_callbacks();
>   
> @@ -1424,7 +1597,6 @@ int gnttab_init(void)
>   	int i;
>   	unsigned long max_nr_grant_frames;
>   	unsigned int max_nr_glist_frames, nr_glist_frames;
> -	unsigned int nr_init_grefs;
>   	int ret;
>   
>   	gnttab_request_version();
> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
>   		}
>   	}
>   
> +	i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
> +	gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
> +	if (!gnttab_free_bitmap) {
> +		ret = -ENOMEM;
> +		goto ini_nomem;
> +	}
> +
>   	ret = arch_gnttab_init(max_nr_grant_frames,
>   			       nr_status_frames(max_nr_grant_frames));
>   	if (ret < 0)
> @@ -1462,15 +1641,9 @@ int gnttab_init(void)
>   		goto ini_nomem;
>   	}
>   
> -	nr_init_grefs = nr_grant_frames *
> -			gnttab_interface->grefs_per_grant_frame;
> -
> -	for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
> -		gnttab_entry(i) = i + 1;
> +	gnttab_size = nr_grant_frames * gnttab_interface->grefs_per_grant_frame;
>   
> -	gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
> -	gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
> -	gnttab_free_head  = NR_RESERVED_ENTRIES;
> +	gnttab_set_free(NR_RESERVED_ENTRIES, gnttab_size - NR_RESERVED_ENTRIES);
>   
>   	printk("Grant table initialized\n");
>   	return 0;
> @@ -1479,6 +1652,7 @@ int gnttab_init(void)
>   	for (i--; i >= 0; i--)
>   		free_page((unsigned long)gnttab_list[i]);
>   	kfree(gnttab_list);
> +	bitmap_free(gnttab_free_bitmap);
>   	return ret;
>   }
>   EXPORT_SYMBOL_GPL(gnttab_init);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index dfd5bf3..d815e1d 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -129,10 +129,14 @@ int gnttab_try_end_foreign_access(grant_ref_t ref);
>    */
>   int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
>   
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first);
> +
>   void gnttab_free_grant_reference(grant_ref_t ref);
>   
>   void gnttab_free_grant_references(grant_ref_t head);
>   
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count);
> +
>   int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
>   
>   int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);

-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-11 18:00   ` Oleksandr
@ 2022-05-11 21:09     ` Boris Ostrovsky
  2022-05-12  6:11       ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Ostrovsky @ 2022-05-11 21:09 UTC (permalink / raw)
  To: Oleksandr, Stefano Stabellini
  Cc: xen-devel, linux-kernel, linux-arm-kernel, Juergen Gross,
	Julien Grall, Oleksandr Tyshchenko, Michael S. Tsirkin,
	Christoph Hellwig


On 5/11/22 2:00 PM, Oleksandr wrote:
>
> On 07.05.22 21:19, Oleksandr Tyshchenko wrote:
>
> Hello Boris, Stefano
>
>
>> From: Juergen Gross <jgross@suse.com>
>>
>> For support of virtio via grant mappings in rare cases larger mappings
>> using consecutive grants are needed. Support those by adding a bitmap
>> of free grants.
>>
>> As consecutive grants will be needed only in very rare cases (e.g. when
>> configuring a virtio device with a multi-page ring), optimize for the
>> normal case of non-consecutive allocations.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> Changes RFC -> V1:
>>     - no changes
>>     Changes V1 -> V2:
>>     - no changes
>
>
> May I please ask for the review here?



I had a quick look but I am stuck on get_free_seq(), I need to stare at it some more. Unless someone else reviews this, I will try to get to this in the next couple of days.


One thing I did notice is


>
>> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
>>           }
>>       }
>>   +    i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
>> +    gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
>> +    if (!gnttab_free_bitmap) {
>> +        ret = -ENOMEM;
>> +        goto ini_nomem;
>> +    }


This overwrites 'i' and will break error handling at ini_nomem.


-boris



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-11 21:09     ` Boris Ostrovsky
@ 2022-05-12  6:11       ` Oleksandr
  0 siblings, 0 replies; 42+ messages in thread
From: Oleksandr @ 2022-05-12  6:11 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Stefano Stabellini, xen-devel, linux-kernel, linux-arm-kernel,
	Juergen Gross, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig


On 12.05.22 00:09, Boris Ostrovsky wrote:


Hello Boris


>
> On 5/11/22 2:00 PM, Oleksandr wrote:
>>
>> On 07.05.22 21:19, Oleksandr Tyshchenko wrote:
>>
>> Hello Boris, Stefano
>>
>>
>>> From: Juergen Gross <jgross@suse.com>
>>>
>>> For support of virtio via grant mappings in rare cases larger mappings
>>> using consecutive grants are needed. Support those by adding a bitmap
>>> of free grants.
>>>
>>> As consecutive grants will be needed only in very rare cases (e.g. when
>>> configuring a virtio device with a multi-page ring), optimize for the
>>> normal case of non-consecutive allocations.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> Changes RFC -> V1:
>>>     - no changes
>>>     Changes V1 -> V2:
>>>     - no changes
>>
>>
>> May I please ask for the review here?
>
>
>
> I had a quick look but I am stuck on get_free_seq(), I need to stare 
> at it some more. Unless someone else reviews this, I will try to get 
> to this in the next couple of days.


Thank you!


>
>
>
> One thing I did notice is
>
>
>>
>>> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
>>>           }
>>>       }
>>>   +    i = gnttab_interface->grefs_per_grant_frame * 
>>> max_nr_grant_frames;
>>> +    gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
>>> +    if (!gnttab_free_bitmap) {
>>> +        ret = -ENOMEM;
>>> +        goto ini_nomem;
>>> +    }
>
>
> This overwrites 'i' and will break error handling at ini_nomem.


Indeed, will fix. Thank you for pointing this out.
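
For illustration only, one way to avoid clobbering the unwind index
(a sketch, not the actual fix; the local variable name is hypothetical):

	unsigned int nr_grefs = gnttab_interface->grefs_per_grant_frame *
				max_nr_grant_frames;

	gnttab_free_bitmap = bitmap_zalloc(nr_grefs, GFP_KERNEL);
	if (!gnttab_free_bitmap) {
		ret = -ENOMEM;
		goto ini_nomem;	/* 'i' still holds the glist frame count */
	}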


>
>
> -boris
>
>
-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-07 18:19 ` [PATCH V2 2/7] xen/grants: support allocating consecutive grants Oleksandr Tyshchenko
  2022-05-11 18:00   ` Oleksandr
@ 2022-05-12 20:01   ` Boris Ostrovsky
  2022-05-13  5:33     ` Juergen Gross
  1 sibling, 1 reply; 42+ messages in thread
From: Boris Ostrovsky @ 2022-05-12 20:01 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel, linux-kernel, linux-arm-kernel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall,
	Oleksandr Tyshchenko, Michael S. Tsirkin, Christoph Hellwig


On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
> +
> +/*
> + * Handling of free grants:
> + *
> + * Free grants are in a simple list anchored in gnttab_free_head. They are
> + * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
> + * of free entries is stored in gnttab_free_count.
> + * Additionally there is a bitmap of free entries anchored in
> + * gnttab_free_bitmap. This is being used for simplifying allocation of
> + * multiple consecutive grants, which is needed e.g. for support of virtio.
> + * gnttab_last_free is used to add free entries of new frames at the end of
> + * the free list.
> + * gnttab_free_tail_ptr specifies the variable which references the start


If this references the start of the free interval, why is it called gnttab_free_*tail*_ptr?


> + * of consecutive free grants ending with gnttab_last_free. This pointer is
> + * updated in a rather defensive way, in order to avoid performance hits in
> + * hot paths.
> + * All those variables are protected by gnttab_list_lock.
> + */
>   static int gnttab_free_count;
> -static grant_ref_t gnttab_free_head;
> +static unsigned int gnttab_size;
> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
> +static grant_ref_t *gnttab_free_tail_ptr;
> +static unsigned long *gnttab_free_bitmap;
>   static DEFINE_SPINLOCK(gnttab_list_lock);
> +
>   struct grant_frames xen_auto_xlat_grant_frames;
>   static unsigned int xen_gnttab_version;
>   module_param_named(version, xen_gnttab_version, uint, 0);
> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>   
>   	ref = head = gnttab_free_head;
>   	gnttab_free_count -= count;
> -	while (count-- > 1)
> -		head = gnttab_entry(head);
> +	while (count--) {
> +		bitmap_clear(gnttab_free_bitmap, head, 1);
> +		if (gnttab_free_tail_ptr == __gnttab_entry(head))
> +			gnttab_free_tail_ptr = &gnttab_free_head;
> +		if (count)
> +			head = gnttab_entry(head);
> +	}
>   	gnttab_free_head = gnttab_entry(head);
>   	gnttab_entry(head) = GNTTAB_LIST_END;
>   
> +	if (!gnttab_free_count) {
> +		gnttab_last_free = GNTTAB_LIST_END;
> +		gnttab_free_tail_ptr = NULL;
> +	}
> +
>   	spin_unlock_irqrestore(&gnttab_list_lock, flags);
>   
>   	return ref;
>   }
>   
> +static int get_seq_entry_count(void)
> +{
> +	if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
> +	    *gnttab_free_tail_ptr == GNTTAB_LIST_END)
> +		return 0;
> +
> +	return gnttab_last_free - *gnttab_free_tail_ptr + 1;
> +}
> +
> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
> +static int get_free_seq(unsigned int count)
> +{
> +	int ret = -ENOSPC;
> +	unsigned int from, to;
> +	grant_ref_t *last;
> +
> +	gnttab_free_tail_ptr = &gnttab_free_head;
> +	last = &gnttab_free_head;
> +
> +	for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
> +	     from < gnttab_size;
> +	     from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
> +		to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
> +					from + 1);
> +		if (ret < 0 && to - from >= count) {
> +			ret = from;
> +			bitmap_clear(gnttab_free_bitmap, ret, count);
> +			from += count;
> +			gnttab_free_count -= count;


IIUIC we can have multiple passes over this, meaning that the gnttab_free_count may be decremented more than once. Is that intentional?


> +			if (from == to)
> +				continue;
> +		}
> +
> +		while (from < to) {
> +			*last = from;
> +			last = __gnttab_entry(from);
> +			gnttab_last_free = from;
> +			from++;
> +		}


I have been looking at this loop and I can't understand what it is doing ;-( Can you enlighten me?



-boris



> +		if (to < gnttab_size)
> +			gnttab_free_tail_ptr = __gnttab_entry(to - 1);
> +	}
> +
> +	*last = GNTTAB_LIST_END;
> +	if (gnttab_last_free != gnttab_size - 1)
> +		gnttab_free_tail_ptr = NULL;
> +
> +	return ret;
> +}
> +
> +static int get_free_entries_seq(unsigned int count)
> +{
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	spin_lock_irqsave(&gnttab_list_lock, flags);
> +
> +	if (gnttab_free_count < count) {
> +		ret = gnttab_expand(count - gnttab_free_count);
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	if (get_seq_entry_count() < count) {
> +		ret = get_free_seq(count);
> +		if (ret >= 0)
> +			goto out;
> +		ret = gnttab_expand(count - get_seq_entry_count());
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	ret = *gnttab_free_tail_ptr;
> +	*gnttab_free_tail_ptr = gnttab_entry(ret + count - 1);
> +	gnttab_free_count -= count;
> +	if (!gnttab_free_count)
> +		gnttab_free_tail_ptr = NULL;
> +	bitmap_clear(gnttab_free_bitmap, ret, count);
> +
> + out:
> +	spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +
> +	return ret;
> +}
> +
>   static void do_free_callbacks(void)
>   {
>   	struct gnttab_free_callback *callback, *next;
> @@ -206,17 +325,48 @@ static inline void check_free_callbacks(void)
>   		do_free_callbacks();
>   }
>   
> -static void put_free_entry(grant_ref_t ref)
> +static void put_free_entry_locked(grant_ref_t ref)
>   {
> -	unsigned long flags;
> -	spin_lock_irqsave(&gnttab_list_lock, flags);
>   	gnttab_entry(ref) = gnttab_free_head;
>   	gnttab_free_head = ref;
> +	if (!gnttab_free_count)
> +		gnttab_last_free = ref;
> +	if (gnttab_free_tail_ptr == &gnttab_free_head)
> +		gnttab_free_tail_ptr = __gnttab_entry(ref);
>   	gnttab_free_count++;
> +	bitmap_set(gnttab_free_bitmap, ref, 1);
> +}
> +
> +static void put_free_entry(grant_ref_t ref)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&gnttab_list_lock, flags);
> +	put_free_entry_locked(ref);
>   	check_free_callbacks();
>   	spin_unlock_irqrestore(&gnttab_list_lock, flags);
>   }
>   
> +static void gnttab_set_free(unsigned int start, unsigned int n)
> +{
> +	unsigned int i;
> +
> +	for (i = start; i < start + n - 1; i++)
> +		gnttab_entry(i) = i + 1;
> +
> +	gnttab_entry(i) = GNTTAB_LIST_END;
> +	if (!gnttab_free_count) {
> +		gnttab_free_head = start;
> +		gnttab_free_tail_ptr = &gnttab_free_head;
> +	} else {
> +		gnttab_entry(gnttab_last_free) = start;
> +	}
> +	gnttab_free_count += n;
> +	gnttab_last_free = i;
> +
> +	bitmap_set(gnttab_free_bitmap, start, n);
> +}
> +
>   /*
>    * Following applies to gnttab_update_entry_v1 and gnttab_update_entry_v2.
>    * Introducing a valid entry into the grant table:
> @@ -448,23 +598,31 @@ void gnttab_free_grant_references(grant_ref_t head)
>   {
>   	grant_ref_t ref;
>   	unsigned long flags;
> -	int count = 1;
> -	if (head == GNTTAB_LIST_END)
> -		return;
> +
>   	spin_lock_irqsave(&gnttab_list_lock, flags);
> -	ref = head;
> -	while (gnttab_entry(ref) != GNTTAB_LIST_END) {
> -		ref = gnttab_entry(ref);
> -		count++;
> +	while (head != GNTTAB_LIST_END) {
> +		ref = gnttab_entry(head);
> +		put_free_entry_locked(head);
> +		head = ref;
>   	}
> -	gnttab_entry(ref) = gnttab_free_head;
> -	gnttab_free_head = head;
> -	gnttab_free_count += count;
>   	check_free_callbacks();
>   	spin_unlock_irqrestore(&gnttab_list_lock, flags);
>   }
>   EXPORT_SYMBOL_GPL(gnttab_free_grant_references);
>   
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count)
> +{
> +	unsigned long flags;
> +	unsigned int i;
> +
> +	spin_lock_irqsave(&gnttab_list_lock, flags);
> +	for (i = count; i > 0; i--)
> +		put_free_entry_locked(head + i - 1);
> +	check_free_callbacks();
> +	spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_free_grant_reference_seq);
> +
>   int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
>   {
>   	int h = get_free_entries(count);
> @@ -478,6 +636,24 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
>   }
>   EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);
>   
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first)
> +{
> +	int h;
> +
> +	if (count == 1)
> +		h = get_free_entries(1);
> +	else
> +		h = get_free_entries_seq(count);
> +
> +	if (h < 0)
> +		return -ENOSPC;
> +
> +	*first = h;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_alloc_grant_reference_seq);
> +
>   int gnttab_empty_grant_references(const grant_ref_t *private_head)
>   {
>   	return (*private_head == GNTTAB_LIST_END);
> @@ -570,16 +746,13 @@ static int grow_gnttab_list(unsigned int more_frames)
>   			goto grow_nomem;
>   	}
>   
> +	gnttab_set_free(gnttab_size, extra_entries);
>   
> -	for (i = grefs_per_frame * nr_grant_frames;
> -	     i < grefs_per_frame * new_nr_grant_frames - 1; i++)
> -		gnttab_entry(i) = i + 1;
> -
> -	gnttab_entry(i) = gnttab_free_head;
> -	gnttab_free_head = grefs_per_frame * nr_grant_frames;
> -	gnttab_free_count += extra_entries;
> +	if (!gnttab_free_tail_ptr)
> +		gnttab_free_tail_ptr = __gnttab_entry(gnttab_size);
>   
>   	nr_grant_frames = new_nr_grant_frames;
> +	gnttab_size += extra_entries;
>   
>   	check_free_callbacks();
>   
> @@ -1424,7 +1597,6 @@ int gnttab_init(void)
>   	int i;
>   	unsigned long max_nr_grant_frames;
>   	unsigned int max_nr_glist_frames, nr_glist_frames;
> -	unsigned int nr_init_grefs;
>   	int ret;
>   
>   	gnttab_request_version();
> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
>   		}
>   	}
>   
> +	i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
> +	gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
> +	if (!gnttab_free_bitmap) {
> +		ret = -ENOMEM;
> +		goto ini_nomem;
> +	}
> +
>   	ret = arch_gnttab_init(max_nr_grant_frames,
>   			       nr_status_frames(max_nr_grant_frames));
>   	if (ret < 0)
> @@ -1462,15 +1641,9 @@ int gnttab_init(void)
>   		goto ini_nomem;
>   	}
>   
> -	nr_init_grefs = nr_grant_frames *
> -			gnttab_interface->grefs_per_grant_frame;
> -
> -	for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
> -		gnttab_entry(i) = i + 1;
> +	gnttab_size = nr_grant_frames * gnttab_interface->grefs_per_grant_frame;
>   
> -	gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
> -	gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
> -	gnttab_free_head  = NR_RESERVED_ENTRIES;
> +	gnttab_set_free(NR_RESERVED_ENTRIES, gnttab_size - NR_RESERVED_ENTRIES);
>   
>   	printk("Grant table initialized\n");
>   	return 0;
> @@ -1479,6 +1652,7 @@ int gnttab_init(void)
>   	for (i--; i >= 0; i--)
>   		free_page((unsigned long)gnttab_list[i]);
>   	kfree(gnttab_list);
> +	bitmap_free(gnttab_free_bitmap);
>   	return ret;
>   }
>   EXPORT_SYMBOL_GPL(gnttab_init);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index dfd5bf3..d815e1d 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -129,10 +129,14 @@ int gnttab_try_end_foreign_access(grant_ref_t ref);
>    */
>   int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
>   
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first);
> +
>   void gnttab_free_grant_reference(grant_ref_t ref);
>   
>   void gnttab_free_grant_references(grant_ref_t head);
>   
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count);
> +
>   int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
>   
>   int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-12 20:01   ` Boris Ostrovsky
@ 2022-05-13  5:33     ` Juergen Gross
  2022-05-13 10:43       ` Oleksandr
  2022-05-14  2:34       ` Boris Ostrovsky
  0 siblings, 2 replies; 42+ messages in thread
From: Juergen Gross @ 2022-05-13  5:33 UTC (permalink / raw)
  To: Boris Ostrovsky, Oleksandr Tyshchenko, xen-devel, linux-kernel,
	linux-arm-kernel
  Cc: Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig


[-- Attachment #1.1.1: Type: text/plain, Size: 5075 bytes --]

On 12.05.22 22:01, Boris Ostrovsky wrote:
> 
> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>> +
>> +/*
>> + * Handling of free grants:
>> + *
>> + * Free grants are in a simple list anchored in gnttab_free_head. They are
>> + * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
>> + * of free entries is stored in gnttab_free_count.
>> + * Additionally there is a bitmap of free entries anchored in
>> + * gnttab_free_bitmap. This is being used for simplifying allocation of
>> + * multiple consecutive grants, which is needed e.g. for support of virtio.
>> + * gnttab_last_free is used to add free entries of new frames at the end of
>> + * the free list.
>> + * gnttab_free_tail_ptr specifies the variable which references the start
> 
> 
> If this references the start of the free interval, why is it called 
> gnttab_free_*tail*_ptr?

Because this is the tail of the whole area which is free.

It could be renamed to gnttab_free_tail_start_ptr. :-)

> 
> 
>> + * of consecutive free grants ending with gnttab_last_free. This pointer is
>> + * updated in a rather defensive way, in order to avoid performance hits in
>> + * hot paths.
>> + * All those variables are protected by gnttab_list_lock.
>> + */
>>   static int gnttab_free_count;
>> -static grant_ref_t gnttab_free_head;
>> +static unsigned int gnttab_size;
>> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
>> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
>> +static grant_ref_t *gnttab_free_tail_ptr;
>> +static unsigned long *gnttab_free_bitmap;
>>   static DEFINE_SPINLOCK(gnttab_list_lock);
>> +
>>   struct grant_frames xen_auto_xlat_grant_frames;
>>   static unsigned int xen_gnttab_version;
>>   module_param_named(version, xen_gnttab_version, uint, 0);
>> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>>       ref = head = gnttab_free_head;
>>       gnttab_free_count -= count;
>> -    while (count-- > 1)
>> -        head = gnttab_entry(head);
>> +    while (count--) {
>> +        bitmap_clear(gnttab_free_bitmap, head, 1);
>> +        if (gnttab_free_tail_ptr == __gnttab_entry(head))
>> +            gnttab_free_tail_ptr = &gnttab_free_head;
>> +        if (count)
>> +            head = gnttab_entry(head);
>> +    }
>>       gnttab_free_head = gnttab_entry(head);
>>       gnttab_entry(head) = GNTTAB_LIST_END;
>> +    if (!gnttab_free_count) {
>> +        gnttab_last_free = GNTTAB_LIST_END;
>> +        gnttab_free_tail_ptr = NULL;
>> +    }
>> +
>>       spin_unlock_irqrestore(&gnttab_list_lock, flags);
>>       return ref;
>>   }
>> +static int get_seq_entry_count(void)
>> +{
>> +    if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
>> +        *gnttab_free_tail_ptr == GNTTAB_LIST_END)
>> +        return 0;
>> +
>> +    return gnttab_last_free - *gnttab_free_tail_ptr + 1;
>> +}
>> +
>> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
>> +static int get_free_seq(unsigned int count)
>> +{
>> +    int ret = -ENOSPC;
>> +    unsigned int from, to;
>> +    grant_ref_t *last;
>> +
>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>> +    last = &gnttab_free_head;
>> +
>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>> +         from < gnttab_size;
>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>> +                    from + 1);
>> +        if (ret < 0 && to - from >= count) {
>> +            ret = from;
>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>> +            from += count;
>> +            gnttab_free_count -= count;
> 
> 
> IIUIC we can have multiple passes over this, meaning that the gnttab_free_count 
> may be decremented more than once. Is that intentional?

After the first pass decrementing gnttab_free_count, ret will no
longer be less than zero, so this can be hit only once.

> 
> 
>> +            if (from == to)
>> +                continue;
>> +        }
>> +
>> +        while (from < to) {
>> +            *last = from;
>> +            last = __gnttab_entry(from);
>> +            gnttab_last_free = from;
>> +            from++;
>> +        }
> 
> 
> I have been looking at this loop and I can't understand what it is doing ;-( Can 
> you enlighten me?

It is recreating the free list in order to have it properly sorted.
This is needed to make sure that the free tail has the maximum
possible size (you can take the tail off the list without having
to worry about breaking the linked list because of references into
the tail).
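
As a stand-alone illustration of that point (a toy model only, nothing
from the actual patch; the array "next" plays the role of gnttab_entry()
and END of GNTTAB_LIST_END):

#include <stdio.h>

#define SIZE 13
#define END  (-1)

static int next[SIZE];
static int head = END;

int main(void)
{
	const int free_refs[] = { 2, 3, 10, 11, 12 };	/* sorted free entries */
	int i, *last = &head;

	/* Rebuild the sorted singly linked free list: 2 -> 3 -> 10 -> 11 -> 12. */
	for (i = 0; i < 5; i++) {
		*last = free_refs[i];
		last = &next[free_refs[i]];
	}
	*last = END;

	/*
	 * Hand out the consecutive tail 10..12: no earlier entry points into
	 * the tail, so terminating the list right after 3 is all that is
	 * needed - no other link has to be touched.
	 */
	next[3] = END;

	for (i = head; i != END; i = next[i])
		printf("still free: %d\n", i);	/* prints 2 and 3 */

	return 0;
}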


Juergen



* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-13  5:33     ` Juergen Gross
@ 2022-05-13 10:43       ` Oleksandr
  2022-05-14  2:34       ` Boris Ostrovsky
  1 sibling, 0 replies; 42+ messages in thread
From: Oleksandr @ 2022-05-13 10:43 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky
  Cc: xen-devel, linux-kernel, linux-arm-kernel, Stefano Stabellini,
	Julien Grall, Oleksandr Tyshchenko, Michael S. Tsirkin,
	Christoph Hellwig


On 13.05.22 08:33, Juergen Gross wrote:
> On 12.05.22 22:01, Boris Ostrovsky wrote:


Hello Juergen, Boris


[Juergen, thank you for your explanation]


>
>>
>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>>> +
>>> +/*
>>> + * Handling of free grants:
>>> + *
>>> + * Free grants are in a simple list anchored in gnttab_free_head. 
>>> They are
>>> + * linked by grant ref, the last element contains GNTTAB_LIST_END. 
>>> The number
>>> + * of free entries is stored in gnttab_free_count.
>>> + * Additionally there is a bitmap of free entries anchored in
>>> + * gnttab_free_bitmap. This is being used for simplifying 
>>> allocation of
>>> + * multiple consecutive grants, which is needed e.g. for support of 
>>> virtio.
>>> + * gnttab_last_free is used to add free entries of new frames at 
>>> the end of
>>> + * the free list.
>>> + * gnttab_free_tail_ptr specifies the variable which references the 
>>> start
>>
>>
>> If this references the start of the free interval, why is it called 
>> gnttab_free_*tail*_ptr?
>
> Because this is the tail of the whole area which is free.
>
> It could be renamed to gnttab_free_tail_start_ptr. :-)
>
>>
>>
>>> + * of consecutive free grants ending with gnttab_last_free. This 
>>> pointer is
>>> + * updated in a rather defensive way, in order to avoid performance 
>>> hits in
>>> + * hot paths.
>>> + * All those variables are protected by gnttab_list_lock.
>>> + */
>>>   static int gnttab_free_count;
>>> -static grant_ref_t gnttab_free_head;
>>> +static unsigned int gnttab_size;
>>> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
>>> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
>>> +static grant_ref_t *gnttab_free_tail_ptr;
>>> +static unsigned long *gnttab_free_bitmap;
>>>   static DEFINE_SPINLOCK(gnttab_list_lock);
>>> +
>>>   struct grant_frames xen_auto_xlat_grant_frames;
>>>   static unsigned int xen_gnttab_version;
>>>   module_param_named(version, xen_gnttab_version, uint, 0);
>>> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>>>       ref = head = gnttab_free_head;
>>>       gnttab_free_count -= count;
>>> -    while (count-- > 1)
>>> -        head = gnttab_entry(head);
>>> +    while (count--) {
>>> +        bitmap_clear(gnttab_free_bitmap, head, 1);
>>> +        if (gnttab_free_tail_ptr == __gnttab_entry(head))
>>> +            gnttab_free_tail_ptr = &gnttab_free_head;
>>> +        if (count)
>>> +            head = gnttab_entry(head);
>>> +    }
>>>       gnttab_free_head = gnttab_entry(head);
>>>       gnttab_entry(head) = GNTTAB_LIST_END;
>>> +    if (!gnttab_free_count) {
>>> +        gnttab_last_free = GNTTAB_LIST_END;
>>> +        gnttab_free_tail_ptr = NULL;
>>> +    }
>>> +
>>>       spin_unlock_irqrestore(&gnttab_list_lock, flags);
>>>       return ref;
>>>   }
>>> +static int get_seq_entry_count(void)
>>> +{
>>> +    if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
>>> +        *gnttab_free_tail_ptr == GNTTAB_LIST_END)
>>> +        return 0;
>>> +
>>> +    return gnttab_last_free - *gnttab_free_tail_ptr + 1;
>>> +}
>>> +
>>> +/* Rebuilds the free grant list and tries to find count consecutive 
>>> entries. */
>>> +static int get_free_seq(unsigned int count)
>>> +{
>>> +    int ret = -ENOSPC;
>>> +    unsigned int from, to;
>>> +    grant_ref_t *last;
>>> +
>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>> +    last = &gnttab_free_head;
>>> +
>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>> +         from < gnttab_size;
>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>> +                    from + 1);
>>> +        if (ret < 0 && to - from >= count) {
>>> +            ret = from;
>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>> +            from += count;
>>> +            gnttab_free_count -= count;
>>
>>
>> IIUIC we can have multiple passes over this, meaning that the 
>> gnttab_free_count may be decremented more than once. Is that 
>> intentional?
>
> After the first pass decrementing gnttab_free_count, ret will no
> longer be less than zero, so this can be hit only once.
>
>>
>>
>>> +            if (from == to)
>>> +                continue;
>>> +        }
>>> +
>>> +        while (from < to) {
>>> +            *last = from;
>>> +            last = __gnttab_entry(from);
>>> +            gnttab_last_free = from;
>>> +            from++;
>>> +        }
>>
>>
>> I have been looking at this loop and I can't understand what it is 
>> doing ;-( Can you enlighten me?
>
> It is recreating the free list in order to have it properly sorted.
> This is needed to make sure that the free tail has the maximum
> possible size (you can take the tail off the list without having
> to worry about breaking the linked list because of references into
> the tail).

I think the loop deserves a comment on top; I will add it.
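
Something along these lines, for example (just a sketch; the wording that
actually gets added in the next version may differ):

		/*
		 * Re-link this range of free entries to the list in ascending
		 * order, so that the free list ends up sorted and its tail
		 * (the run ending at gnttab_last_free) can later be taken off
		 * without breaking links from earlier entries.
		 */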



>
>
>
> Juergen

-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-13  5:33     ` Juergen Gross
  2022-05-13 10:43       ` Oleksandr
@ 2022-05-14  2:34       ` Boris Ostrovsky
  2022-05-16  5:59         ` Juergen Gross
  1 sibling, 1 reply; 42+ messages in thread
From: Boris Ostrovsky @ 2022-05-14  2:34 UTC (permalink / raw)
  To: Juergen Gross, Oleksandr Tyshchenko, xen-devel, linux-kernel,
	linux-arm-kernel
  Cc: Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig



On 5/13/22 1:33 AM, Juergen Gross wrote:
> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>
>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:

>>> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
>>> +static int get_free_seq(unsigned int count)
>>> +{
>>> +    int ret = -ENOSPC;
>>> +    unsigned int from, to;
>>> +    grant_ref_t *last;
>>> +
>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>> +    last = &gnttab_free_head;
>>> +
>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>> +         from < gnttab_size;
>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>> +                    from + 1);
>>> +        if (ret < 0 && to - from >= count) {
>>> +            ret = from;
>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>> +            from += count;
>>> +            gnttab_free_count -= count;
>>
>>
>> IIUIC we can have multiple passes over this, meaning that the gnttab_free_count may be decremented more than once. Is that intentional?
> 
> After the first pass decrementing gnttab_free_count, ret will no
> longer be less than zero, so this can be hit only once.

Oh, yes, of course.

> 
>>
>>
>>> +            if (from == to)
>>> +                continue;
>>> +        }
>>> +
>>> +        while (from < to) {
>>> +            *last = from;
>>> +            last = __gnttab_entry(from);
>>> +            gnttab_last_free = from;
>>> +            from++;
>>> +        }
>>
>>
>> I have been looking at this loop and I can't understand what it is doing ;-( Can you enlighten me?
> 
> It is recreating the free list in order to have it properly sorted.
> This is needed to make sure that the free tail has the maximum
> possible size (you can take the tail off the list without having
> to worry about breaking the linked list because of references into
> the tail).


So let's say we have the (one-dimensional) table of length 13

idx    ..    2    3  ...  10  11  12

grant       12   11        2  -1   3


and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.

What will this look like after the 2 iterations of the outer loop?

(I am really having a mental block on this).



-boris



* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-14  2:34       ` Boris Ostrovsky
@ 2022-05-16  5:59         ` Juergen Gross
  2022-05-16 16:00           ` Boris Ostrovsky
  0 siblings, 1 reply; 42+ messages in thread
From: Juergen Gross @ 2022-05-16  5:59 UTC (permalink / raw)
  To: Boris Ostrovsky, Oleksandr Tyshchenko, xen-devel, linux-kernel,
	linux-arm-kernel
  Cc: Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig



On 14.05.22 04:34, Boris Ostrovsky wrote:
> 
> 
> On 5/13/22 1:33 AM, Juergen Gross wrote:
>> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>>
>>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
> 
>>>> +/* Rebuilds the free grant list and tries to find count consecutive 
>>>> entries. */
>>>> +static int get_free_seq(unsigned int count)
>>>> +{
>>>> +    int ret = -ENOSPC;
>>>> +    unsigned int from, to;
>>>> +    grant_ref_t *last;
>>>> +
>>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>>> +    last = &gnttab_free_head;
>>>> +
>>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>>> +         from < gnttab_size;
>>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>>> +                    from + 1);
>>>> +        if (ret < 0 && to - from >= count) {
>>>> +            ret = from;
>>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>>> +            from += count;
>>>> +            gnttab_free_count -= count;
>>>
>>>
>>> IIUIC we can have multiple passes over this, meaning that the 
>>> gnttab_free_count may be decremented more than once. Is that intentional?
>>
>> After the first pass decrementing gnttab_free_count, ret will no
>> longer be less than zero, so this can be hit only once.
> 
> Oh, yes, of course.
> 
>>
>>>
>>>
>>>> +            if (from == to)
>>>> +                continue;
>>>> +        }
>>>> +
>>>> +        while (from < to) {
>>>> +            *last = from;
>>>> +            last = __gnttab_entry(from);
>>>> +            gnttab_last_free = from;
>>>> +            from++;
>>>> +        }
>>>
>>>
>>> I have been looking at this loop and I can't understand what it is doing ;-( 
>>> Can you enlighten me?
>>
>> It is recreating the free list in order to have it properly sorted.
>> This is needed to make sure that the free tail has the maximum
>> possible size (you can take the tail off the list without having
>> to worry about breaking the linked list because of references into
>> the tail).
> 
> 
> So let's say we have the (one-dimensional) table of length 13
> 
> idx    ..    2    3  ...  10  11  12
> 
> grant       12   11        2  -1   3
> 
> 
> and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.

You meant 10, 2, 12, 3, 11, I guess?

> 
> What will this look like after the 2 iterations of the outer loop?

idx    ..    2    3  ...  10  11  12

grant        3   10       11  12  -1

with gnttab_free_head being 2, i.e the free list is now 2, 3, 10, 11, 12.
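
Step by step (assuming the requested count is larger than any free run
here, so the allocating branch above is never taken, which matches the
result shown):

  1st pass: from = 2,  to = 4   links gnttab_free_head -> 2 -> 3   (gnttab_last_free = 3)
  2nd pass: from = 10, to = 13  links 3 -> 10 -> 11 -> 12          (gnttab_last_free = 12)

The list is then terminated after the loop (outside the quoted hunk),
which gives grant[12] = GNTTAB_LIST_END (-1).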


Juergen



* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-16  5:59         ` Juergen Gross
@ 2022-05-16 16:00           ` Boris Ostrovsky
  2022-05-16 18:30             ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Ostrovsky @ 2022-05-16 16:00 UTC (permalink / raw)
  To: Juergen Gross, Oleksandr Tyshchenko, xen-devel, linux-kernel,
	linux-arm-kernel
  Cc: Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig



On 5/16/22 1:59 AM, Juergen Gross wrote:
> On 14.05.22 04:34, Boris Ostrovsky wrote:
>>
>>
>> On 5/13/22 1:33 AM, Juergen Gross wrote:
>>> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>>>
>>>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>>
>>>>> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
>>>>> +static int get_free_seq(unsigned int count)
>>>>> +{
>>>>> +    int ret = -ENOSPC;
>>>>> +    unsigned int from, to;
>>>>> +    grant_ref_t *last;
>>>>> +
>>>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>>>> +    last = &gnttab_free_head;
>>>>> +
>>>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>>>> +         from < gnttab_size;
>>>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>>>> +                    from + 1);
>>>>> +        if (ret < 0 && to - from >= count) {
>>>>> +            ret = from;
>>>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>>>> +            from += count;
>>>>> +            gnttab_free_count -= count;
>>>>
>>>>
>>>> IIUIC we can have multiple passes over this, meaning that the gnttab_free_count may be decremented more than once. Is that intentional?
>>>
>>> After the first pass decrementing gnttab_free_count, ret will no
>>> longer be less than zero, so this can be hit only once.
>>
>> Oh, yes, of course.
>>
>>>
>>>>
>>>>
>>>>> +            if (from == to)
>>>>> +                continue;
>>>>> +        }
>>>>> +
>>>>> +        while (from < to) {
>>>>> +            *last = from;
>>>>> +            last = __gnttab_entry(from);
>>>>> +            gnttab_last_free = from;
>>>>> +            from++;
>>>>> +        }
>>>>
>>>>
>>>> I have been looking at this loop and I can't understand what it is doing ;-( Can you enlighten me?
>>>
>>> It is recreating the free list in order to have it properly sorted.
>>> This is needed to make sure that the free tail has the maximum
>>> possible size (you can take the tail off the list without having
>>> to worry about breaking the linked list because of references into
>>> the tail).
>>
>>
>> So let's say we have the (one-dimensional) table of length 13
>>
>> idx    ..    2    3  ...  10  11  12
>>
>> grant       12   11        2  -1   3
>>
>>
>> and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.
> 
> You meant 10, 2, 12, 3, 11, I guess?
> 
>>
>> What will this look like after the 2 iterations of the outer loop?
> 
> idx    ..    2    3  ...  10  11  12
> 
> grant        3   10       11  12  -1
> 
> with gnttab_free_head being 2, i.e the free list is now 2, 3, 10, 11, 12.



OK, thanks, that helped. I couldn't link the free chunks in my head


With the error handling in gnttab_init() fixed

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



-boris


* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-16 16:00           ` Boris Ostrovsky
@ 2022-05-16 18:30             ` Oleksandr
  2022-05-16 18:57               ` Boris Ostrovsky
  0 siblings, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-16 18:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Juergen Gross, xen-devel, linux-kernel, linux-arm-kernel,
	Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig


On 16.05.22 19:00, Boris Ostrovsky wrote:


Hello Boris


>
>
> On 5/16/22 1:59 AM, Juergen Gross wrote:
>> On 14.05.22 04:34, Boris Ostrovsky wrote:
>>>
>>>
>>> On 5/13/22 1:33 AM, Juergen Gross wrote:
>>>> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>>>>
>>>>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>>>
>>>>>> +/* Rebuilds the free grant list and tries to find count 
>>>>>> consecutive entries. */
>>>>>> +static int get_free_seq(unsigned int count)
>>>>>> +{
>>>>>> +    int ret = -ENOSPC;
>>>>>> +    unsigned int from, to;
>>>>>> +    grant_ref_t *last;
>>>>>> +
>>>>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>>>>> +    last = &gnttab_free_head;
>>>>>> +
>>>>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>>>>> +         from < gnttab_size;
>>>>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>>>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>>>>> +                    from + 1);
>>>>>> +        if (ret < 0 && to - from >= count) {
>>>>>> +            ret = from;
>>>>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>>>>> +            from += count;
>>>>>> +            gnttab_free_count -= count;
>>>>>
>>>>>
>>>>> IIUIC we can have multiple passes over this, meaning that the 
>>>>> gnttab_free_count may be decremented more than once. Is that 
>>>>> intentional?
>>>>
>>>> After the first pass decrementing gnttab_free_count, ret will no
>>>> longer be less than zero, so this can be hit only once.
>>>
>>> Oh, yes, of course.
>>>
>>>>
>>>>>
>>>>>
>>>>>> +            if (from == to)
>>>>>> +                continue;
>>>>>> +        }
>>>>>> +
>>>>>> +        while (from < to) {
>>>>>> +            *last = from;
>>>>>> +            last = __gnttab_entry(from);
>>>>>> +            gnttab_last_free = from;
>>>>>> +            from++;
>>>>>> +        }
>>>>>
>>>>>
>>>>> I have been looking at this loop and I can't understand what it is 
>>>>> doing ;-( Can you enlighten me?
>>>>
>>>> It is recreating the free list in order to have it properly sorted.
>>>> This is needed to make sure that the free tail has the maximum
>>>> possible size (you can take the tail off the list without having
>>>> to worry about breaking the linked list because of references into
>>>> the tail).
>>>
>>>
>>> So let's say we have the (one-dimensional) table of length 13
>>>
>>> idx    ..    2    3  ...  10  11  12
>>>
>>> grant       12   11        2  -1   3
>>>
>>>
>>> and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.
>>
>> You meant 10, 2, 12, 3, 11, I guess?
>>
>>>
>>> What will this look like after the 2 iterations of the outer loop?
>>
>> idx    ..    2    3  ...  10  11  12
>>
>> grant        3   10       11  12  -1
>>
>> with gnttab_free_head being 2, i.e the free list is now 2, 3, 10, 11, 
>> 12.
>
>
>
> OK, thanks, that helped. I couldn't link the free chunks in my head
>
>
> With the error handling in gnttab_init() fixed

yes, this is a diff that I am going to apply for the next version:


[snip]

@@ -1596,19 +1601,20 @@ static int gnttab_expand(unsigned int req_entries)
  int gnttab_init(void)
  {
         int i;
-       unsigned long max_nr_grant_frames;
+       unsigned long max_nr_grant_frames, max_nr_grefs;
         unsigned int max_nr_glist_frames, nr_glist_frames;
         int ret;

         gnttab_request_version();
         max_nr_grant_frames = gnttab_max_grant_frames();
+       max_nr_grefs = max_nr_grant_frames *
+                      gnttab_interface->grefs_per_grant_frame;
         nr_grant_frames = 1;

         /* Determine the maximum number of frames required for the
          * grant reference free list on the current hypervisor.
          */
-       max_nr_glist_frames = (max_nr_grant_frames *
-                              gnttab_interface->grefs_per_grant_frame / RPP);
+       max_nr_glist_frames = max_nr_grefs / RPP;

         gnttab_list = kmalloc_array(max_nr_glist_frames,
                                     sizeof(grant_ref_t *),
@@ -1625,8 +1631,7 @@ int gnttab_init(void)
                 }
         }

-       i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
-       gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
+       gnttab_free_bitmap = bitmap_zalloc(max_nr_grefs, GFP_KERNEL);
         if (!gnttab_free_bitmap) {
                 ret = -ENOMEM;
                 goto ini_nomem;


>
>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Thanks!



>
>
>
> -boris

-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants
  2022-05-16 18:30             ` Oleksandr
@ 2022-05-16 18:57               ` Boris Ostrovsky
  0 siblings, 0 replies; 42+ messages in thread
From: Boris Ostrovsky @ 2022-05-16 18:57 UTC (permalink / raw)
  To: Oleksandr
  Cc: Juergen Gross, xen-devel, linux-kernel, linux-arm-kernel,
	Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	Michael S. Tsirkin, Christoph Hellwig


On 5/16/22 2:30 PM, Oleksandr wrote:
>
> On 16.05.22 19:00, Boris Ostrovsky wrote:
>
>
>>
>>
>> With the error handling in gnttab_init() fixed
>
> yes, this is a diff that I am going to apply for the next version:
>
>
> [snip]
>
> @@ -1596,19 +1601,20 @@ static int gnttab_expand(unsigned int req_entries)
>  int gnttab_init(void)
>  {
>         int i;
> -       unsigned long max_nr_grant_frames;
> +       unsigned long max_nr_grant_frames, max_nr_grefs;
>         unsigned int max_nr_glist_frames, nr_glist_frames;
>         int ret;
>
>         gnttab_request_version();
>         max_nr_grant_frames = gnttab_max_grant_frames();
> +       max_nr_grefs = max_nr_grant_frames *
> +                      gnttab_interface->grefs_per_grant_frame;
>         nr_grant_frames = 1;
>
>         /* Determine the maximum number of frames required for the
>          * grant reference free list on the current hypervisor.
>          */
> -       max_nr_glist_frames = (max_nr_grant_frames *
> -                              gnttab_interface->grefs_per_grant_frame / RPP);
> +       max_nr_glist_frames = max_nr_grefs / RPP;
>
>         gnttab_list = kmalloc_array(max_nr_glist_frames,
>                                     sizeof(grant_ref_t *),
> @@ -1625,8 +1631,7 @@ int gnttab_init(void)
>                 }
>         }
>
> -       i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
> -       gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
> +       gnttab_free_bitmap = bitmap_zalloc(max_nr_grefs, GFP_KERNEL);
>         if (!gnttab_free_bitmap) {
>                 ret = -ENOMEM;
>                 goto ini_nomem;
>


Looks good, thanks.


-boris



* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-07 18:19 ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops Oleksandr Tyshchenko
  2022-05-09 21:39   ` [PATCH V2 5/7] dt-bindings: Add xen, dev-domid " Stefano Stabellini
@ 2022-05-17  0:27   ` Rob Herring
  2022-05-18 14:12     ` Oleksandr
  2022-05-18 14:32   ` Arnd Bergmann
  2 siblings, 1 reply; 42+ messages in thread
From: Rob Herring @ 2022-05-17  0:27 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, virtualization, devicetree, linux-kernel,
	linux-arm-kernel, Oleksandr Tyshchenko, Jason Wang,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Stefano Stabellini, Michael S. Tsirkin, Christoph Hellwig,
	Jean-Philippe Brucker

On Sat, May 07, 2022 at 09:19:06PM +0300, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Introduce Xen specific binding for the virtualized device (e.g. virtio)
> to be used by Xen grant DMA-mapping layer in the subsequent commit.
> 
> This binding indicates that the Xen grant mappings scheme needs to be
> enabled for the device whose DT node contains that property, and specifies
> the ID of the Xen domain where the corresponding backend resides. The ID
> (domid) is used as an argument to the grant mapping APIs.
> 
> This is needed for the option to restrict memory access using Xen grant
> mappings to work, whose primary goal is to enable using virtio devices
> in Xen guests.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Changes RFC -> V1:
>    - update commit subject/description and text in description
>    - move to devicetree/bindings/arm/
> 
> Changes V1 -> V2:
>    - update text in description
>    - change the maintainer of the binding
>    - fix validation issue
>    - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
> ---
>  .../devicetree/bindings/arm/xen,dev-domid.yaml     | 37 ++++++++++++++++++++++
>  Documentation/devicetree/bindings/virtio/mmio.yaml |  7 ++++
>  2 files changed, 44 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> 
> diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> new file mode 100644
> index 00000000..750e89e
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> @@ -0,0 +1,37 @@
> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Xen specific binding for virtualized devices (e.g. virtio)
> +
> +maintainers:
> +  - Stefano Stabellini <sstabellini@kernel.org>
> +
> +select: true

Omit. No need to apply this on every single node.

> +
> +description:
> +  This binding indicates that Xen grant mappings need to be enabled for
> +  the device, and it specifies the ID of the domain where the corresponding
> +  device (backend) resides. The property is required to restrict memory
> +  access using Xen grant mappings.
> +
> +properties:
> +  xen,dev-domid:

I kind of think 'dev' is redundant. Is there another kind of domid 
possible? Maybe xen,backend-domid or just xen,domid? I don't know Xen 
too well, so ultimately up to you all.

> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description:
> +      The domid (domain ID) of the domain where the device (backend) is running.
> +
> +additionalProperties: true
> +
> +examples:
> +  - |
> +    virtio@3000 {
> +            compatible = "virtio,mmio";
> +            reg = <0x3000 0x100>;
> +            interrupts = <41>;
> +
> +            /* The device is located in Xen domain with ID 1 */
> +            xen,dev-domid = <1>;
> +    };
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 10c22b5..29a0932 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -13,6 +13,9 @@ description:
>    See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>    more details.
>  
> +allOf:
> +  - $ref: /schemas/arm/xen,dev-domid.yaml#
> +
>  properties:
>    compatible:
>      const: virtio,mmio
> @@ -33,6 +36,10 @@ properties:
>      description: Required for devices making accesses thru an IOMMU.
>      maxItems: 1
>  
> +  xen,dev-domid:
> +    description: Required when Xen grant mappings need to be enabled for device.
> +    $ref: /schemas/types.yaml#/definitions/uint32

No need to define the type again nor describe it again.

Instead, just change additionalProperties to unevaluatedProperties in
this doc. The difference is that the latter takes $ref's into account.

> +
>  required:
>    - compatible
>    - reg
> -- 
> 2.7.4
> 
> 


* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-17  0:27   ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid " Rob Herring
@ 2022-05-18 14:12     ` Oleksandr
  0 siblings, 0 replies; 42+ messages in thread
From: Oleksandr @ 2022-05-18 14:12 UTC (permalink / raw)
  To: Rob Herring, Juergen Gross, Stefano Stabellini
  Cc: xen-devel, virtualization, devicetree, linux-kernel,
	linux-arm-kernel, Oleksandr Tyshchenko, Jason Wang,
	Krzysztof Kozlowski, Julien Grall, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker


On 17.05.22 03:27, Rob Herring wrote:

Hello Rob, all


> On Sat, May 07, 2022 at 09:19:06PM +0300, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Introduce Xen specific binding for the virtualized device (e.g. virtio)
>> to be used by Xen grant DMA-mapping layer in the subsequent commit.
>>
>> This binding indicates that the Xen grant mappings scheme needs to be
>> enabled for the device whose DT node contains that property, and specifies
>> the ID of the Xen domain where the corresponding backend resides. The ID
>> (domid) is used as an argument to the grant mapping APIs.
>>
>> This is needed for the option to restrict memory access using Xen grant
>> mappings to work, whose primary goal is to enable using virtio devices
>> in Xen guests.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Changes RFC -> V1:
>>     - update commit subject/description and text in description
>>     - move to devicetree/bindings/arm/
>>
>> Changes V1 -> V2:
>>     - update text in description
>>     - change the maintainer of the binding
>>     - fix validation issue
>>     - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
>> ---
>>   .../devicetree/bindings/arm/xen,dev-domid.yaml     | 37 ++++++++++++++++++++++
>>   Documentation/devicetree/bindings/virtio/mmio.yaml |  7 ++++
>>   2 files changed, 44 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>>
>> diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>> new file mode 100644
>> index 00000000..750e89e
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>> @@ -0,0 +1,37 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Xen specific binding for virtualized devices (e.g. virtio)
>> +
>> +maintainers:
>> +  - Stefano Stabellini <sstabellini@kernel.org>
>> +
>> +select: true
> Omit. No need to apply this on every single node.

ok, will do


>
>> +
>> +description:
>> +  This binding indicates that Xen grant mappings need to be enabled for
>> +  the device, and it specifies the ID of the domain where the corresponding
>> +  device (backend) resides. The property is required to restrict memory
>> +  access using Xen grant mappings.
>> +
>> +properties:
>> +  xen,dev-domid:
> I kind of think 'dev' is redundant. Is there another kind of domid
> possible?


In general, yes. It is the driver (frontend) domid. But, at least for
now, I don't see why we would need an additional property for that.


>   Maybe xen,backend-domid or just xen,domid? I don't know Xen
> too well, so ultimately up to you all.

xen,domid sounds ambiguous to me.

xen,backend-domid sounds perfectly fine to me, I even think it fits better.



Stefano, Juergen, would you be happy with new xen,backend-domid name?

If yes, Stefano, could you please clarify whether you would be OK with me
retaining your R-b tags (for all patches in the current series which touch
that property) after doing such a renaming?




>
>> +    $ref: /schemas/types.yaml#/definitions/uint32
>> +    description:
>> +      The domid (domain ID) of the domain where the device (backend) is running.
>> +
>> +additionalProperties: true
>> +
>> +examples:
>> +  - |
>> +    virtio@3000 {
>> +            compatible = "virtio,mmio";
>> +            reg = <0x3000 0x100>;
>> +            interrupts = <41>;
>> +
>> +            /* The device is located in Xen domain with ID 1 */
>> +            xen,dev-domid = <1>;
>> +    };
>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> index 10c22b5..29a0932 100644
>> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
>> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> @@ -13,6 +13,9 @@ description:
>>     See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>>     more details.
>>   
>> +allOf:
>> +  - $ref: /schemas/arm/xen,dev-domid.yaml#
>> +
>>   properties:
>>     compatible:
>>       const: virtio,mmio
>> @@ -33,6 +36,10 @@ properties:
>>       description: Required for devices making accesses thru an IOMMU.
>>       maxItems: 1
>>   
>> +  xen,dev-domid:
>> +    description: Required when Xen grant mappings need to be enabled for device.
>> +    $ref: /schemas/types.yaml#/definitions/uint32
> No need to define the type again nor describe it again.
>
> Instead, just change additionalProperties to unevaluateProperties in
> this doc. The diff is the latter takes $ref's into account.

ok, will do. Could you please clarify which of the two I should use:

unevaluatedProperties: false

or

unevaluatedProperties:
  type: object

I am not too familiar with this stuff. Both variants seem to pass
validation.


Thank you.


>
>> +
>>   required:
>>     - compatible
>>     - reg
>> -- 
>> 2.7.4
>>
>>
-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-07 18:19 ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops Oleksandr Tyshchenko
  2022-05-09 21:39   ` [PATCH V2 5/7] dt-bindings: Add xen, dev-domid " Stefano Stabellini
  2022-05-17  0:27   ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid " Rob Herring
@ 2022-05-18 14:32   ` Arnd Bergmann
  2022-05-18 16:06     ` Oleksandr
  2022-05-18 18:59     ` Rob Herring
  2 siblings, 2 replies; 42+ messages in thread
From: Arnd Bergmann @ 2022-05-18 14:32 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Rob Herring, Krzysztof Kozlowski, Julien Grall,
	Juergen Gross, Stefano Stabellini, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker

On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 10c22b5..29a0932 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -13,6 +13,9 @@ description:
>    See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>    more details.
>
> +allOf:
> +  - $ref: /schemas/arm/xen,dev-domid.yaml#
> +
>  properties:
>    compatible:
>      const: virtio,mmio
> @@ -33,6 +36,10 @@ properties:
>      description: Required for devices making accesses thru an IOMMU.
>      maxItems: 1
>
> +  xen,dev-domid:
> +    description: Required when Xen grant mappings need to be enabled for device.
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +
>  required:
>    - compatible
>    - reg

Sorry for joining the discussion late. Have you considered using the
generic iommu
binding here instead of a custom property? This would mean having a device
node for the grant-table mechanism that can be referred to using the 'iommus'
phandle property, with the domid as an additional argument.

It does not quite fit the model that Linux currently uses for iommus,
as that has an allocator for dma_addr_t space, but I would think it's
conceptually close enough that it makes sense for the binding.

         Arnd


* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-18 14:32   ` Arnd Bergmann
@ 2022-05-18 16:06     ` Oleksandr
  2022-05-18 16:39       ` Arnd Bergmann
  2022-05-18 18:59     ` Rob Herring
  1 sibling, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-18 16:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Rob Herring, Krzysztof Kozlowski, Julien Grall,
	Juergen Gross, Stefano Stabellini, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker


On 18.05.22 17:32, Arnd Bergmann wrote:


Hello Arnd


> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> index 10c22b5..29a0932 100644
>> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
>> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> @@ -13,6 +13,9 @@ description:
>>     See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>>     more details.
>>
>> +allOf:
>> +  - $ref: /schemas/arm/xen,dev-domid.yaml#
>> +
>>   properties:
>>     compatible:
>>       const: virtio,mmio
>> @@ -33,6 +36,10 @@ properties:
>>       description: Required for devices making accesses thru an IOMMU.
>>       maxItems: 1
>>
>> +  xen,dev-domid:
>> +    description: Required when Xen grant mappings need to be enabled for device.
>> +    $ref: /schemas/types.yaml#/definitions/uint32
>> +
>>   required:
>>     - compatible
>>     - reg
> Sorry for joining the discussion late. Have you considered using the
> generic iommu
> binding here instead of a custom property?

I have to admit - no, I haven't. I was thinking that a Xen specific
feature should be communicated using a Xen specific DT property.


>   This would mean having a device
> node for the grant-table mechanism that can be referred to using the 'iommus'
> phandle property, with the domid as an additional argument.

I assume, you are speaking about something like the following?


xen_dummy_iommu {
    compatible = "xen,dummy-iommu";
    #iommu-cells = <1>;
};

virtio@3000 {
    compatible = "virtio,mmio";
    reg = <0x3000 0x100>;
    interrupts = <41>;

    /* The device is located in Xen domain with ID 1 */
    iommus = <&xen_dummy_iommu 1>;
};


>
> It does not quite fit the model that Linux currently uses for iommus,
> as that has an allocator for dma_addr_t space

yes (# 3/7 adds grant-table based allocator)


> , but I would think it's
> conceptually close enough that it makes sense for the binding.

Interesting idea. I am wondering, do we need any extra actions for this
to work in the Linux guest (dummy IOMMU driver, etc)?


>
>           Arnd

-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-18 16:06     ` Oleksandr
@ 2022-05-18 16:39       ` Arnd Bergmann
  2022-05-18 23:32         ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Arnd Bergmann @ 2022-05-18 16:39 UTC (permalink / raw)
  To: Oleksandr
  Cc: Arnd Bergmann, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Rob Herring, Krzysztof Kozlowski, Julien Grall,
	Juergen Gross, Stefano Stabellini, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker

On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
> On 18.05.22 17:32, Arnd Bergmann wrote:
> > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
> >   This would mean having a device
> > node for the grant-table mechanism that can be referred to using the 'iommus'
> > phandle property, with the domid as an additional argument.
>
> I assume, you are speaking about something like the following?
>
>
> xen_dummy_iommu {
>     compatible = "xen,dummy-iommu";
>     #iommu-cells = <1>;
> };
>
> virtio@3000 {
>     compatible = "virtio,mmio";
>     reg = <0x3000 0x100>;
>     interrupts = <41>;
>
>     /* The device is located in Xen domain with ID 1 */
>     iommus = <&xen_dummy_iommu 1>;
> };

Right, that's the idea, except I would not call it a 'dummy'.
From the perspective of the DT, this behaves just like an IOMMU,
even if the exact mechanism is different from most hardware IOMMU
implementations.

> > It does not quite fit the model that Linux currently uses for iommus,
> > as that has an allocator for dma_addr_t space
>
> yes (# 3/7 adds grant-table based allocator)
>
>
> > , but I would think it's
> > conceptually close enough that it makes sense for the binding.
>
> > Interesting idea. I am wondering, do we need any extra actions for this
> > to work in the Linux guest (dummy IOMMU driver, etc)?

It depends on how closely the guest implementation can be made to
resemble a normal iommu. If you do allocate dma_addr_t addresses,
it may actually be close enough that you can just turn the grant-table
code into a normal iommu driver and change nothing else.

        Arnd


* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-18 14:32   ` Arnd Bergmann
  2022-05-18 16:06     ` Oleksandr
@ 2022-05-18 18:59     ` Rob Herring
  2022-05-18 23:48       ` Oleksandr
  1 sibling, 1 reply; 42+ messages in thread
From: Rob Herring @ 2022-05-18 18:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Oleksandr Tyshchenko, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Stefano Stabellini, Michael S. Tsirkin, Christoph Hellwig,
	Jean-Philippe Brucker

On Wed, May 18, 2022 at 03:32:27PM +0100, Arnd Bergmann wrote:
> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> >
> > diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> > index 10c22b5..29a0932 100644
> > --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> > +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> > @@ -13,6 +13,9 @@ description:
> >    See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
> >    more details.
> >
> > +allOf:
> > +  - $ref: /schemas/arm/xen,dev-domid.yaml#
> > +
> >  properties:
> >    compatible:
> >      const: virtio,mmio
> > @@ -33,6 +36,10 @@ properties:
> >      description: Required for devices making accesses thru an IOMMU.
> >      maxItems: 1
> >
> > +  xen,dev-domid:
> > +    description: Required when Xen grant mappings need to be enabled for device.
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +
> >  required:
> >    - compatible
> >    - reg
> 
> Sorry for joining the discussion late. Have you considered using the
> generic iommu
> binding here instead of a custom property? This would mean having a device
> node for the grant-table mechanism that can be referred to using the 'iommus'
> phandle property, with the domid as an additional argument.
> 
> It does not quite fit the model that Linux currently uses for iommus,
> as that has an allocator for dma_addr_t space, but I would think it's
> conceptually close enough that it makes sense for the binding.

Something common is almost always better.

That may also have the issue that fw_devlink will make the 'iommu' 
driver a dependency to probe.

Rob


* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-18 16:39       ` Arnd Bergmann
@ 2022-05-18 23:32         ` Oleksandr
  2022-05-19  1:06           ` Stefano Stabellini
  0 siblings, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-18 23:32 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Rob Herring, Krzysztof Kozlowski, Julien Grall,
	Juergen Gross, Stefano Stabellini, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker


On 18.05.22 19:39, Arnd Bergmann wrote:


Hello Arnd


> On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>>    This would mean having a device
>>> node for the grant-table mechanism that can be referred to using the 'iommus'
>>> phandle property, with the domid as an additional argument.
>> I assume, you are speaking about something like the following?
>>
>>
>> xen_dummy_iommu {
>>      compatible = "xen,dummy-iommu";
>>      #iommu-cells = <1>;
>> };
>>
>> virtio@3000 {
>>      compatible = "virtio,mmio";
>>      reg = <0x3000 0x100>;
>>      interrupts = <41>;
>>
>>      /* The device is located in Xen domain with ID 1 */
>>      iommus = <&xen_dummy_iommu 1>;
>> };
> Right, that's the idea,

thank you for the confirmation



>   except I would not call it a 'dummy'.
>  From the perspective of the DT, this behaves just like an IOMMU,
> even if the exact mechanism is different from most hardware IOMMU
> implementations.

well, agree


>
>>> It does not quite fit the model that Linux currently uses for iommus,
>>> as that has an allocator for dma_addr_t space
>> yes (# 3/7 adds grant-table based allocator)
>>
>>
>>> , but it would think it's
>>> conceptually close enough that it makes sense for the binding.
>> Interesting idea. I am wondering, do we need an extra actions for this
>> to work in Linux guest (dummy IOMMU driver, etc)?
> It depends on how closely the guest implementation can be made to
> resemble a normal iommu. If you do allocate dma_addr_t addresses,
> it may actually be close enough that you can just turn the grant-table
> code into a normal iommu driver and change nothing else.

Unfortunately, I failed to find a way to use grant references at the
iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
am not too familiar with that, so what is written below might be wrong
or at least not precise.

The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
itself, it just maps (IOVA -> PA) what was requested to be mapped by the
upper layer. The DMA address allocation is done by the upper layer
(DMA-IOMMU, which is the glue layer between the DMA API and the IOMMU
API, allocates an IOVA for the PA?). But all we need here is just to
allocate our specific grant-table based DMA addresses (DMA address =
grant reference + offset in the page), so let’s say we need an entity
that takes a physical address as a parameter and returns a DMA address
(which is what commit #3/7 actually does), and that’s all. So working at
the dma_ops layer we get exactly what we need, with minimal changes to
the guest infrastructure. In our case Xen itself acts as an IOMMU.
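
To illustrate, a minimal sketch of such an entity (the names here are
made up, not the actual helper from patch #3/7; it just follows the
scheme from the cover letter, where the grant reference selects the
page, the offset stays in the low bits and the top address bit marks the
result as grant-based rather than a GPA):

#include <linux/dma-mapping.h>
#include <xen/grant_table.h>

#define GRANT_DMA_ADDR_BIT	(1ULL << 63)	/* illustrative name */

static inline dma_addr_t grant_to_dma_sketch(grant_ref_t gref,
					     unsigned long offset)
{
	/* Grant ref picks the page, low bits keep the in-page offset, the
	 * top bit tells the backend this is not a guest physical address. */
	return GRANT_DMA_ADDR_BIT | ((dma_addr_t)gref << PAGE_SHIFT) | offset;
}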

Assuming that we want to reuse the IOMMU infrastructure somehow for our
needs, I think in that case we will likely need to introduce a new
specific IOVA allocator (alongside the generic one) to be hooked up by
the DMA-IOMMU layer if we run on top of Xen. But even having the
specific IOVA allocator return what we indeed need (DMA address = grant
reference + offset in the page), we will still need a specific minimal
IOMMU driver to be present in the system anyway in order to track the
mappings(?) and do nothing with them, returning success (this specific
IOMMU driver should have all mandatory callbacks implemented).

I completely agree, it would be really nice to reuse the generic IOMMU
bindings rather than introduce a Xen specific property, if what we are
trying to implement in the current patch series more or less fits the
usage of "iommus" in Linux. But if we have to add more complexity/more
components to the code for the sake of reusing the device tree binding,
this raises the question of whether that’s worthwhile.

Or have I really missed something?


>
>          Arnd

-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-18 18:59     ` Rob Herring
@ 2022-05-18 23:48       ` Oleksandr
  0 siblings, 0 replies; 42+ messages in thread
From: Oleksandr @ 2022-05-18 23:48 UTC (permalink / raw)
  To: Rob Herring, Arnd Bergmann
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Stefano Stabellini, Michael S. Tsirkin, Christoph Hellwig,
	Jean-Philippe Brucker


On 18.05.22 21:59, Rob Herring wrote:

Hello Rob, Arnd

> On Wed, May 18, 2022 at 03:32:27PM +0100, Arnd Bergmann wrote:
>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
>>> index 10c22b5..29a0932 100644
>>> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
>>> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
>>> @@ -13,6 +13,9 @@ description:
>>>     See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>>>     more details.
>>>
>>> +allOf:
>>> +  - $ref: /schemas/arm/xen,dev-domid.yaml#
>>> +
>>>   properties:
>>>     compatible:
>>>       const: virtio,mmio
>>> @@ -33,6 +36,10 @@ properties:
>>>       description: Required for devices making accesses thru an IOMMU.
>>>       maxItems: 1
>>>
>>> +  xen,dev-domid:
>>> +    description: Required when Xen grant mappings need to be enabled for device.
>>> +    $ref: /schemas/types.yaml#/definitions/uint32
>>> +
>>>   required:
>>>     - compatible
>>>     - reg
>> Sorry for joining the discussion late. Have you considered using the
>> generic iommu
>> binding here instead of a custom property? This would mean having a device
>> node for the grant-table mechanism that can be referred to using the 'iommus'
>> phandle property, with the domid as an additional argument.
>>
>> It does not quite fit the model that Linux currently uses for iommus,
>> as that has an allocator for dma_addr_t space, but I would think it's
>> conceptually close enough that it makes sense for the binding.
> Something common is almost always better.

agree


>
> That may also have the issue that fw_devlink will make the 'iommu'
> driver a dependency to probe.

Looks like I ran into it while experimenting. I generated the following
nodes in the guest DT using the Xen toolstack:

[snip]

         xen_dummy_iommu {
                 compatible = "xen,dummy-iommu";
                 #iommu-cells = <0x01>;
                 phandle = <0xfde9>;
         };
         virtio@2000000 {
                 compatible = "virtio,mmio";
                 reg = <0x00 0x2000000 0x00 0x200>;
                 interrupts = <0x00 0x01 0xf01>;
                 interrupt-parent = <0xfde8>;
                 dma-coherent;
                 iommus = <0xfde9 0x01>;
         };

[snip]


And got:

virtio-mmio 2000000.virtio: deferred probe timeout, ignoring dependency


>
> Rob

-- 
Regards,

Oleksandr Tyshchenko



* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-18 23:32         ` Oleksandr
@ 2022-05-19  1:06           ` Stefano Stabellini
  2022-05-19  6:03             ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-19  1:06 UTC (permalink / raw)
  To: Oleksandr
  Cc: Arnd Bergmann, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Rob Herring, Krzysztof Kozlowski, Julien Grall,
	Juergen Gross, Stefano Stabellini, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker


On Thu, 19 May 2022, Oleksandr wrote:
> > On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
> > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > <olekstysh@gmail.com> wrote:
> > > >    This would mean having a device
> > > > node for the grant-table mechanism that can be referred to using the
> > > > 'iommus'
> > > > phandle property, with the domid as an additional argument.
> > > I assume, you are speaking about something like the following?
> > > 
> > > 
> > > xen_dummy_iommu {
> > >      compatible = "xen,dummy-iommu";
> > >      #iommu-cells = <1>;
> > > };
> > > 
> > > virtio@3000 {
> > >      compatible = "virtio,mmio";
> > >      reg = <0x3000 0x100>;
> > >      interrupts = <41>;
> > > 
> > >      /* The device is located in Xen domain with ID 1 */
> > >      iommus = <&xen_dummy_iommu 1>;
> > > };
> > Right, that's the idea,
> 
> thank you for the confirmation
> 
> 
> 
> >   except I would not call it a 'dummy'.
> >  From the perspective of the DT, this behaves just like an IOMMU,
> > even if the exact mechanism is different from most hardware IOMMU
> > implementations.
> 
> well, agree
> 
> 
> > 
> > > > It does not quite fit the model that Linux currently uses for iommus,
> > > > as that has an allocator for dma_addr_t space
> > > yes (# 3/7 adds grant-table based allocator)
> > > 
> > > 
> > > > , but I would think it's
> > > > conceptually close enough that it makes sense for the binding.
> > > Interesting idea. I am wondering, do we need any extra actions for this
> > > to work in the Linux guest (dummy IOMMU driver, etc)?
> > It depends on how closely the guest implementation can be made to
> > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > it may actually be close enough that you can just turn the grant-table
> > code into a normal iommu driver and change nothing else.
> 
> Unfortunately, I failed to find a way to use grant references at the
> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I am
> not too familiar with that, so what is written below might be wrong or at
> least not precise.
>
> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by itself, it
> just maps (IOVA -> PA) what was requested to be mapped by the upper layer.
> The DMA address allocation is done by the upper layer (DMA-IOMMU, which is
> the glue layer between the DMA API and the IOMMU API, allocates an IOVA for
> the PA?). But all we need here is just to allocate our specific grant-table
> based DMA addresses (DMA address = grant reference + offset in the page), so
> let’s say we need an entity that takes a physical address as a parameter and
> returns a DMA address (which is what commit #3/7 actually does), and that’s
> all. So working at the dma_ops layer we get exactly what we need, with
> minimal changes to the guest infrastructure. In our case Xen itself acts as
> an IOMMU.
>
> Assuming that we want to reuse the IOMMU infrastructure somehow for our
> needs, I think in that case we will likely need to introduce a new specific
> IOVA allocator (alongside the generic one) to be hooked up by the DMA-IOMMU
> layer if we run on top of Xen. But even having the specific IOVA allocator
> return what we indeed need (DMA address = grant reference + offset in the
> page), we will still need a specific minimal IOMMU driver to be present in
> the system anyway in order to track the mappings(?) and do nothing with
> them, returning success (this specific IOMMU driver should have all
> mandatory callbacks implemented).
>
> I completely agree, it would be really nice to reuse the generic IOMMU
> bindings rather than introduce a Xen specific property, if what we are
> trying to implement in the current patch series more or less fits the usage
> of "iommus" in Linux. But if we have to add more complexity/more components
> to the code for the sake of reusing the device tree binding, this raises the
> question of whether that’s worthwhile.
>
> Or have I really missed something?

I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
bindings, not necessarily the IOMMU drivers framework in Linux (although
that would be an added bonus.)

I know from previous discussions with you that making the grant table
fit in the existing IOMMU drivers model is difficult, but just reusing
the Device Tree bindings seems feasible?
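
As an aside, a minimal sketch of the address encoding the quoted text above
describes ("DMA address = grant reference + offset in the page"). All names
here are illustrative, a 4K page size is assumed, and any extra marker bits
the series may apply on top of this are omitted; this is not the code from
the series:

#include <linux/types.h>

/* Illustrative only: pack a grant reference and an in-page offset into a
 * 64-bit DMA address, assuming 4K pages.  Not the code from this series. */
#define EXAMPLE_GRANT_SHIFT	12

static inline dma_addr_t example_grant_to_dma(u32 gref, unsigned int offset)
{
	return ((dma_addr_t)gref << EXAMPLE_GRANT_SHIFT) |
	       (offset & ((1u << EXAMPLE_GRANT_SHIFT) - 1));
}

static inline u32 example_dma_to_grant(dma_addr_t dma)
{
	return (u32)(dma >> EXAMPLE_GRANT_SHIFT);
}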

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-19  1:06           ` Stefano Stabellini
@ 2022-05-19  6:03             ` Oleksandr
  2022-05-23 17:30               ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-19  6:03 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Arnd Bergmann, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Oleksandr Tyshchenko,
	Jason Wang, Rob Herring, Krzysztof Kozlowski, Julien Grall,
	Juergen Gross, Michael S. Tsirkin, Christoph Hellwig,
	Jean-Philippe Brucker


On 19.05.22 04:06, Stefano Stabellini wrote:


Hello Stefano

> On Thu, 19 May 2022, Oleksandr wrote:
>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>> <olekstysh@gmail.com> wrote:
>>>>>     This would mean having a device
>>>>> node for the grant-table mechanism that can be referred to using the
>>>>> 'iommus'
>>>>> phandle property, with the domid as an additional argument.
>>>> I assume, you are speaking about something like the following?
>>>>
>>>>
>>>> xen_dummy_iommu {
>>>>       compatible = "xen,dummy-iommu";
>>>>       #iommu-cells = <1>;
>>>> };
>>>>
>>>> virtio@3000 {
>>>>       compatible = "virtio,mmio";
>>>>       reg = <0x3000 0x100>;
>>>>       interrupts = <41>;
>>>>
>>>>       /* The device is located in Xen domain with ID 1 */
>>>>       iommus = <&xen_dummy_iommu 1>;
>>>> };
>>> Right, that's that's the idea,
>> thank you for the confirmation
>>
>>
>>
>>>    except I would not call it a 'dummy'.
>>>   From the perspective of the DT, this behaves just like an IOMMU,
>>> even if the exact mechanism is different from most hardware IOMMU
>>> implementations.
>> well, agree
>>
>>
>>>>> It does not quite fit the model that Linux currently uses for iommus,
>>>>> as that has an allocator for dma_addr_t space
>>>> yes (# 3/7 adds grant-table based allocator)
>>>>
>>>>
>>>>> , but it would think it's
>>>>> conceptually close enough that it makes sense for the binding.
>>>> Interesting idea. I am wondering, do we need an extra actions for this
>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>> It depends on how closely the guest implementation can be made to
>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>> it may actually be close enough that you can just turn the grant-table
>>> code into a normal iommu driver and change nothing else.
>> Unfortunately, I failed to find a way how use grant references at the
>> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I am
>> not too familiar with that, so what is written below might be wrong or at
>> least not precise.
>>
>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by itself, it
>> just maps (IOVA-PA) what was requested to be mapped by the upper layer. The
>> DMA address allocation is done by the upper layer (DMA-IOMMU which is the glue
>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all what we
>> need here is just to allocate our specific grant-table based DMA addresses
>> (DMA address = grant reference + offset in the page), so let’s say we need an
>> entity to take a physical address as parameter and return a DMA address (what
>> actually commit #3/7 is doing), and that’s all. So working at the dma_ops
>> layer we get exactly what we need, with the minimal changes to guest
>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>
>> Assuming that we want to reuse the IOMMU infrastructure somehow for our needs.
>> I think, in that case we will likely need to introduce a new specific IOVA
>> allocator (alongside with a generic one) to be hooked up by the DMA-IOMMU
>> layer if we run on top of Xen. But, even having the specific IOVA allocator to
>> return what we indeed need (DMA address = grant reference + offset in the
>> page) we will still need the specific minimal required IOMMU driver to be
>> present in the system anyway in order to track the mappings(?) and do nothing
>> with them, returning a success (this specific IOMMU driver should have all
>> mandatory callbacks implemented).
>>
>> I completely agree, it would be really nice to reuse generic IOMMU bindings
>> rather than introducing Xen specific property if what we are trying to
>> implement in current patch series fits in the usage of "iommus" in Linux
>> more-less. But, if we will have to add more complexity/more components to the
>> code for the sake of reusing device tree binding, this raises a question
>> whether that’s worthwhile.
>>
>> Or I really missed something?
> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> bindings, not necessarily the IOMMU drivers framework in Linux (although
> that would be an added bonus.)
>
> I know from previous discussions with you that making the grant table
> fit in the existing IOMMU drivers model is difficult, but just reusing
> the Device Tree bindings seems feasible?

I started experimenting with that. As I wrote in a separate email, I got a
deferred probe timeout after inserting the required nodes into the guest
device tree, which seems to be a consequence of the IOMMU being unavailable.
I will continue to investigate this.



-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-19  6:03             ` Oleksandr
@ 2022-05-23 17:30               ` Oleksandr
  2022-05-24  1:58                 ` Stefano Stabellini
  0 siblings, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-23 17:30 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM
  Cc: Arnd Bergmann, Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Michael S. Tsirkin, Christoph Hellwig, Jean-Philippe Brucker


On 19.05.22 09:03, Oleksandr wrote:

Hello Stefano, all


>
> On 19.05.22 04:06, Stefano Stabellini wrote:
>
>
> Hello Stefano, all
>
>> On Thu, 19 May 2022, Oleksandr wrote:
>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>> <olekstysh@gmail.com> wrote:
>>>>>>     This would mean having a device
>>>>>> node for the grant-table mechanism that can be referred to using the
>>>>>> 'iommus'
>>>>>> phandle property, with the domid as an additional argument.
>>>>> I assume, you are speaking about something like the following?
>>>>>
>>>>>
>>>>> xen_dummy_iommu {
>>>>>       compatible = "xen,dummy-iommu";
>>>>>       #iommu-cells = <1>;
>>>>> };
>>>>>
>>>>> virtio@3000 {
>>>>>       compatible = "virtio,mmio";
>>>>>       reg = <0x3000 0x100>;
>>>>>       interrupts = <41>;
>>>>>
>>>>>       /* The device is located in Xen domain with ID 1 */
>>>>>       iommus = <&xen_dummy_iommu 1>;
>>>>> };
>>>> Right, that's that's the idea,
>>> thank you for the confirmation
>>>
>>>
>>>
>>>>    except I would not call it a 'dummy'.
>>>>   From the perspective of the DT, this behaves just like an IOMMU,
>>>> even if the exact mechanism is different from most hardware IOMMU
>>>> implementations.
>>> well, agree
>>>
>>>
>>>>>> It does not quite fit the model that Linux currently uses for 
>>>>>> iommus,
>>>>>> as that has an allocator for dma_addr_t space
>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>
>>>>>
>>>>>> , but it would think it's
>>>>>> conceptually close enough that it makes sense for the binding.
>>>>> Interesting idea. I am wondering, do we need an extra actions for 
>>>>> this
>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>> It depends on how closely the guest implementation can be made to
>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>> it may actually be close enough that you can just turn the grant-table
>>>> code into a normal iommu driver and change nothing else.
>>> Unfortunately, I failed to find a way how use grant references at the
>>> iommu_ops level (I mean to fully pretend that we are an IOMMU 
>>> driver). I am
>>> not too familiar with that, so what is written below might be wrong 
>>> or at
>>> least not precise.
>>>
>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by 
>>> itself, it
>>> just maps (IOVA-PA) what was requested to be mapped by the upper 
>>> layer. The
>>> DMA address allocation is done by the upper layer (DMA-IOMMU which 
>>> is the glue
>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, 
>>> all what we
>>> need here is just to allocate our specific grant-table based DMA 
>>> addresses
>>> (DMA address = grant reference + offset in the page), so let’s say 
>>> we need an
>>> entity to take a physical address as parameter and return a DMA 
>>> address (what
>>> actually commit #3/7 is doing), and that’s all. So working at the 
>>> dma_ops
>>> layer we get exactly what we need, with the minimal changes to guest
>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>
>>> Assuming that we want to reuse the IOMMU infrastructure somehow for 
>>> our needs.
>>> I think, in that case we will likely need to introduce a new 
>>> specific IOVA
>>> allocator (alongside with a generic one) to be hooked up by the 
>>> DMA-IOMMU
>>> layer if we run on top of Xen. But, even having the specific IOVA 
>>> allocator to
>>> return what we indeed need (DMA address = grant reference + offset 
>>> in the
>>> page) we will still need the specific minimal required IOMMU driver 
>>> to be
>>> present in the system anyway in order to track the mappings(?) and 
>>> do nothing
>>> with them, returning a success (this specific IOMMU driver should 
>>> have all
>>> mandatory callbacks implemented).
>>>
>>> I completely agree, it would be really nice to reuse generic IOMMU 
>>> bindings
>>> rather than introducing Xen specific property if what we are trying to
>>> implement in current patch series fits in the usage of "iommus" in 
>>> Linux
>>> more-less. But, if we will have to add more complexity/more 
>>> components to the
>>> code for the sake of reusing device tree binding, this raises a 
>>> question
>>> whether that’s worthwhile.
>>>
>>> Or I really missed something?
>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>> bindings, not necessarily the IOMMU drivers framework in Linux (although
>> that would be an added bonus.)
>>
>> I know from previous discussions with you that making the grant table
>> fit in the existing IOMMU drivers model is difficult, but just reusing
>> the Device Tree bindings seems feasible?
>
> I started experimenting with that. As wrote in a separate email, I got 
> a deferred probe timeout,
>
> after inserting required nodes into guest device tree, which seems to 
> be a consequence of the unavailability of IOMMU, I will continue to 
> investigate this question.


I have experimented with that. Yes, just reusing the Device Tree bindings
is technically feasible (and we are able to do this by only touching
grant-dma-ops.c), although the deferred probe timeout still occurs (as there
is no IOMMU driver actually present).

[    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring dependency
[    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10 GB/1.95 GiB)


Below is the working diff (on top of the current series):

diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index da9c7ff..6586152 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {

  bool xen_is_grant_dma_device(struct device *dev)
  {
+       struct device_node *iommu_np;
+       bool has_iommu;
+
         /* XXX Handle only DT devices for now */
         if (!dev->of_node)
                 return false;

-       return of_property_read_bool(dev->of_node, "xen,backend-domid");
+       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
+       has_iommu = iommu_np && of_device_is_compatible(iommu_np, "xen,grant-dma");
+       of_node_put(iommu_np);
+
+       return has_iommu;
  }

  void xen_grant_setup_dma_ops(struct device *dev)
  {
         struct xen_grant_dma_data *data;
-       uint32_t domid;
+       struct of_phandle_args iommu_spec;

         data = find_xen_grant_dma_data(dev);
         if (data) {
@@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
         if (!dev->of_node)
                 goto err;

-       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
-               dev_err(dev, "xen,backend-domid property is not present\n");
+       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
+                       0, &iommu_spec)) {
+               dev_err(dev, "Cannot parse iommus property\n");
+               goto err;
+       }
+
+       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
+                       iommu_spec.args_count != 1) {
+               dev_err(dev, "Incompatible IOMMU node\n");
+               of_node_put(iommu_spec.np);
                 goto err;
         }

+       of_node_put(iommu_spec.np);
+
         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
         if (!data)
                 goto err;

-       data->backend_domid = domid;
+       /*
+        * The endpoint ID here means the ID of the domain where the corresponding
+        * backend is running
+        */
+       data->backend_domid = iommu_spec.args[0];

         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
                         GFP_KERNEL))) {



Below are the nodes generated by the Xen toolstack:

         xen_grant_dma {
                 compatible = "xen,grant-dma";
                 #iommu-cells = <0x01>;
                 phandle = <0xfde9>;
         };

         virtio@2000000 {
                 compatible = "virtio,mmio";
                 reg = <0x00 0x2000000 0x00 0x200>;
                 interrupts = <0x00 0x01 0xf01>;
                 interrupt-parent = <0xfde8>;
                 dma-coherent;
                 iommus = <0xfde9 0x01>;
         };
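
To make the binding above concrete, here is a small illustrative sketch (not
part of the series; the helper name is made up) of how the backend domid can
be read back from such a node, mirroring the parsing done in the diff. For
the virtio@2000000 node above it would return 1:

#include <linux/errno.h>
#include <linux/of.h>

/* Illustrative sketch only: resolve the backend domid for a device node
 * wired up via the "xen,grant-dma" iommus phandle, as in the nodes above. */
static int example_backend_domid(struct device_node *np, u32 *domid)
{
	struct of_phandle_args spec;
	int ret;

	ret = of_parse_phandle_with_args(np, "iommus", "#iommu-cells", 0, &spec);
	if (ret)
		return ret;

	if (!of_device_is_compatible(spec.np, "xen,grant-dma") ||
	    spec.args_count != 1) {
		of_node_put(spec.np);
		return -EINVAL;
	}

	*domid = spec.args[0];	/* 1 for virtio@2000000 above */
	of_node_put(spec.np);
	return 0;
}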



I am wondering, what would be the proper solution to eliminate the deferred
probe timeout issue in our particular case (without introducing an extra
IOMMU driver)?




>
>
>
>
-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-23 17:30               ` Oleksandr
@ 2022-05-24  1:58                 ` Stefano Stabellini
  2022-05-24 16:01                   ` Rob Herring
  2022-05-24 16:11                   ` Oleksandr
  0 siblings, 2 replies; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-24  1:58 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Michael S. Tsirkin, Christoph Hellwig, Jean-Philippe Brucker

[-- Attachment #1: Type: text/plain, Size: 9846 bytes --]

On Mon, 23 May 2022, Oleksandr wrote:
> > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
> > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > <olekstysh@gmail.com> wrote:
> > > > > > >     This would mean having a device
> > > > > > > node for the grant-table mechanism that can be referred to using
> > > > > > > the
> > > > > > > 'iommus'
> > > > > > > phandle property, with the domid as an additional argument.
> > > > > > I assume, you are speaking about something like the following?
> > > > > > 
> > > > > > 
> > > > > > xen_dummy_iommu {
> > > > > >       compatible = "xen,dummy-iommu";
> > > > > >       #iommu-cells = <1>;
> > > > > > };
> > > > > > 
> > > > > > virtio@3000 {
> > > > > >       compatible = "virtio,mmio";
> > > > > >       reg = <0x3000 0x100>;
> > > > > >       interrupts = <41>;
> > > > > > 
> > > > > >       /* The device is located in Xen domain with ID 1 */
> > > > > >       iommus = <&xen_dummy_iommu 1>;
> > > > > > };
> > > > > Right, that's that's the idea,
> > > > thank you for the confirmation
> > > > 
> > > > 
> > > > 
> > > > >    except I would not call it a 'dummy'.
> > > > >   From the perspective of the DT, this behaves just like an IOMMU,
> > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > implementations.
> > > > well, agree
> > > > 
> > > > 
> > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > iommus,
> > > > > > > as that has an allocator for dma_addr_t space
> > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > 
> > > > > > 
> > > > > > > , but it would think it's
> > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > Interesting idea. I am wondering, do we need an extra actions for
> > > > > > this
> > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > It depends on how closely the guest implementation can be made to
> > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > it may actually be close enough that you can just turn the grant-table
> > > > > code into a normal iommu driver and change nothing else.
> > > > Unfortunately, I failed to find a way how use grant references at the
> > > > iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
> > > > am
> > > > not too familiar with that, so what is written below might be wrong or
> > > > at
> > > > least not precise.
> > > > 
> > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > itself, it
> > > > just maps (IOVA-PA) what was requested to be mapped by the upper layer.
> > > > The
> > > > DMA address allocation is done by the upper layer (DMA-IOMMU which is
> > > > the glue
> > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
> > > > what we
> > > > need here is just to allocate our specific grant-table based DMA
> > > > addresses
> > > > (DMA address = grant reference + offset in the page), so let’s say we
> > > > need an
> > > > entity to take a physical address as parameter and return a DMA address
> > > > (what
> > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > dma_ops
> > > > layer we get exactly what we need, with the minimal changes to guest
> > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > 
> > > > Assuming that we want to reuse the IOMMU infrastructure somehow for our
> > > > needs.
> > > > I think, in that case we will likely need to introduce a new specific
> > > > IOVA
> > > > allocator (alongside with a generic one) to be hooked up by the
> > > > DMA-IOMMU
> > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > allocator to
> > > > return what we indeed need (DMA address = grant reference + offset in
> > > > the
> > > > page) we will still need the specific minimal required IOMMU driver to
> > > > be
> > > > present in the system anyway in order to track the mappings(?) and do
> > > > nothing
> > > > with them, returning a success (this specific IOMMU driver should have
> > > > all
> > > > mandatory callbacks implemented).
> > > > 
> > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > bindings
> > > > rather than introducing Xen specific property if what we are trying to
> > > > implement in current patch series fits in the usage of "iommus" in Linux
> > > > more-less. But, if we will have to add more complexity/more components
> > > > to the
> > > > code for the sake of reusing device tree binding, this raises a question
> > > > whether that’s worthwhile.
> > > > 
> > > > Or I really missed something?
> > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > bindings, not necessarily the IOMMU drivers framework in Linux (although
> > > that would be an added bonus.)
> > > 
> > > I know from previous discussions with you that making the grant table
> > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > the Device Tree bindings seems feasible?
> > 
> > I started experimenting with that. As wrote in a separate email, I got a
> > deferred probe timeout,
> > 
> > after inserting required nodes into guest device tree, which seems to be a
> > consequence of the unavailability of IOMMU, I will continue to investigate
> > this question.
> 
> 
> I have experimented with that. Yes, just reusing the Device Tree bindings is
> technically feasible (and we are able to do this by only touching
> grant-dma-ops.c), although deferred probe timeout still stands (as there is no
> IOMMU driver being present actually).
> 
> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
> dependency
> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
> GB/1.95 GiB)
> 
> 
> Below the working diff (on top of current series):
> 
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index da9c7ff..6586152 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
> 
>  bool xen_is_grant_dma_device(struct device *dev)
>  {
> +       struct device_node *iommu_np;
> +       bool has_iommu;
> +
>         /* XXX Handle only DT devices for now */
>         if (!dev->of_node)
>                 return false;
> 
> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> "xen,grant-dma");
> +       of_node_put(iommu_np);
> +
> +       return has_iommu;
>  }
> 
>  void xen_grant_setup_dma_ops(struct device *dev)
>  {
>         struct xen_grant_dma_data *data;
> -       uint32_t domid;
> +       struct of_phandle_args iommu_spec;
> 
>         data = find_xen_grant_dma_data(dev);
>         if (data) {
> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>         if (!dev->of_node)
>                 goto err;
> 
> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
> -               dev_err(dev, "xen,backend-domid property is not present\n");
> +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
> +                       0, &iommu_spec)) {
> +               dev_err(dev, "Cannot parse iommus property\n");
> +               goto err;
> +       }
> +
> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> +                       iommu_spec.args_count != 1) {
> +               dev_err(dev, "Incompatible IOMMU node\n");
> +               of_node_put(iommu_spec.np);
>                 goto err;
>         }
> 
> +       of_node_put(iommu_spec.np);
> +
>         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>         if (!data)
>                 goto err;
> 
> -       data->backend_domid = domid;
> +       /*
> +        * The endpoint ID here means the ID of the domain where the
> corresponding
> +        * backend is running
> +        */
> +       data->backend_domid = iommu_spec.args[0];
> 
>         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
>                         GFP_KERNEL))) {
> (END)
> 
> 
> 
> Below, the nodes generated by Xen toolstack:
> 
>         xen_grant_dma {
>                 compatible = "xen,grant-dma";
>                 #iommu-cells = <0x01>;
>                 phandle = <0xfde9>;
>         };
> 
>         virtio@2000000 {
>                 compatible = "virtio,mmio";
>                 reg = <0x00 0x2000000 0x00 0x200>;
>                 interrupts = <0x00 0x01 0xf01>;
>                 interrupt-parent = <0xfde8>;
>                 dma-coherent;
>                 iommus = <0xfde9 0x01>;
>         };
 
Not bad! I like it.
 
 
> I am wondering, would be the proper solution to eliminate deferred probe
> timeout issue in our particular case (without introducing an extra IOMMU
> driver)?

In reality I don't think there is a way to do that. I would create an
empty skeleton IOMMU driver for xen,grant-dma.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-24  1:58                 ` Stefano Stabellini
@ 2022-05-24 16:01                   ` Rob Herring
  2022-05-24 18:34                     ` Saravana Kannan
  2022-05-24 16:11                   ` Oleksandr
  1 sibling, 1 reply; 42+ messages in thread
From: Rob Herring @ 2022-05-24 16:01 UTC (permalink / raw)
  To: Stefano Stabellini, Saravana Kannan
  Cc: Oleksandr, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Krzysztof Kozlowski,
	Julien Grall, Juergen Gross, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker

+Saravana

On Mon, May 23, 2022 at 06:58:13PM -0700, Stefano Stabellini wrote:
> On Mon, 23 May 2022, Oleksandr wrote:
> > > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
> > > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > > <olekstysh@gmail.com> wrote:
> > > > > > > >     This would mean having a device
> > > > > > > > node for the grant-table mechanism that can be referred to using
> > > > > > > > the
> > > > > > > > 'iommus'
> > > > > > > > phandle property, with the domid as an additional argument.
> > > > > > > I assume, you are speaking about something like the following?
> > > > > > > 
> > > > > > > 
> > > > > > > xen_dummy_iommu {
> > > > > > >       compatible = "xen,dummy-iommu";
> > > > > > >       #iommu-cells = <1>;
> > > > > > > };
> > > > > > > 
> > > > > > > virtio@3000 {
> > > > > > >       compatible = "virtio,mmio";
> > > > > > >       reg = <0x3000 0x100>;
> > > > > > >       interrupts = <41>;
> > > > > > > 
> > > > > > >       /* The device is located in Xen domain with ID 1 */
> > > > > > >       iommus = <&xen_dummy_iommu 1>;
> > > > > > > };
> > > > > > Right, that's that's the idea,
> > > > > thank you for the confirmation
> > > > > 
> > > > > 
> > > > > 
> > > > > >    except I would not call it a 'dummy'.
> > > > > >   From the perspective of the DT, this behaves just like an IOMMU,
> > > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > > implementations.
> > > > > well, agree
> > > > > 
> > > > > 
> > > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > > iommus,
> > > > > > > > as that has an allocator for dma_addr_t space
> > > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > > 
> > > > > > > 
> > > > > > > > , but it would think it's
> > > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > > Interesting idea. I am wondering, do we need an extra actions for
> > > > > > > this
> > > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > > It depends on how closely the guest implementation can be made to
> > > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > > it may actually be close enough that you can just turn the grant-table
> > > > > > code into a normal iommu driver and change nothing else.
> > > > > Unfortunately, I failed to find a way how use grant references at the
> > > > > iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
> > > > > am
> > > > > not too familiar with that, so what is written below might be wrong or
> > > > > at
> > > > > least not precise.
> > > > > 
> > > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > > itself, it
> > > > > just maps (IOVA-PA) what was requested to be mapped by the upper layer.
> > > > > The
> > > > > DMA address allocation is done by the upper layer (DMA-IOMMU which is
> > > > > the glue
> > > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
> > > > > what we
> > > > > need here is just to allocate our specific grant-table based DMA
> > > > > addresses
> > > > > (DMA address = grant reference + offset in the page), so let’s say we
> > > > > need an
> > > > > entity to take a physical address as parameter and return a DMA address
> > > > > (what
> > > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > > dma_ops
> > > > > layer we get exactly what we need, with the minimal changes to guest
> > > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > > 
> > > > > Assuming that we want to reuse the IOMMU infrastructure somehow for our
> > > > > needs.
> > > > > I think, in that case we will likely need to introduce a new specific
> > > > > IOVA
> > > > > allocator (alongside with a generic one) to be hooked up by the
> > > > > DMA-IOMMU
> > > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > > allocator to
> > > > > return what we indeed need (DMA address = grant reference + offset in
> > > > > the
> > > > > page) we will still need the specific minimal required IOMMU driver to
> > > > > be
> > > > > present in the system anyway in order to track the mappings(?) and do
> > > > > nothing
> > > > > with them, returning a success (this specific IOMMU driver should have
> > > > > all
> > > > > mandatory callbacks implemented).
> > > > > 
> > > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > > bindings
> > > > > rather than introducing Xen specific property if what we are trying to
> > > > > implement in current patch series fits in the usage of "iommus" in Linux
> > > > > more-less. But, if we will have to add more complexity/more components
> > > > > to the
> > > > > code for the sake of reusing device tree binding, this raises a question
> > > > > whether that’s worthwhile.
> > > > > 
> > > > > Or I really missed something?
> > > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > > bindings, not necessarily the IOMMU drivers framework in Linux (although
> > > > that would be an added bonus.)
> > > > 
> > > > I know from previous discussions with you that making the grant table
> > > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > > the Device Tree bindings seems feasible?
> > > 
> > > I started experimenting with that. As wrote in a separate email, I got a
> > > deferred probe timeout,
> > > 
> > > after inserting required nodes into guest device tree, which seems to be a
> > > consequence of the unavailability of IOMMU, I will continue to investigate
> > > this question.
> > 
> > 
> > I have experimented with that. Yes, just reusing the Device Tree bindings is
> > technically feasible (and we are able to do this by only touching
> > grant-dma-ops.c), although deferred probe timeout still stands (as there is no
> > IOMMU driver being present actually).
> > 
> > [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
> > dependency
> > [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
> > GB/1.95 GiB)
> > 
> > 
> > Below the working diff (on top of current series):
> > 
> > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > index da9c7ff..6586152 100644
> > --- a/drivers/xen/grant-dma-ops.c
> > +++ b/drivers/xen/grant-dma-ops.c
> > @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
> > 
> >  bool xen_is_grant_dma_device(struct device *dev)
> >  {
> > +       struct device_node *iommu_np;
> > +       bool has_iommu;
> > +
> >         /* XXX Handle only DT devices for now */
> >         if (!dev->of_node)
> >                 return false;
> > 
> > -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> > +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> > +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> > "xen,grant-dma");
> > +       of_node_put(iommu_np);
> > +
> > +       return has_iommu;
> >  }
> > 
> >  void xen_grant_setup_dma_ops(struct device *dev)
> >  {
> >         struct xen_grant_dma_data *data;
> > -       uint32_t domid;
> > +       struct of_phandle_args iommu_spec;
> > 
> >         data = find_xen_grant_dma_data(dev);
> >         if (data) {
> > @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
> >         if (!dev->of_node)
> >                 goto err;
> > 
> > -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
> > -               dev_err(dev, "xen,backend-domid property is not present\n");
> > +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
> > +                       0, &iommu_spec)) {
> > +               dev_err(dev, "Cannot parse iommus property\n");
> > +               goto err;
> > +       }
> > +
> > +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> > +                       iommu_spec.args_count != 1) {
> > +               dev_err(dev, "Incompatible IOMMU node\n");
> > +               of_node_put(iommu_spec.np);
> >                 goto err;
> >         }
> > 
> > +       of_node_put(iommu_spec.np);
> > +
> >         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> >         if (!data)
> >                 goto err;
> > 
> > -       data->backend_domid = domid;
> > +       /*
> > +        * The endpoint ID here means the ID of the domain where the
> > corresponding
> > +        * backend is running
> > +        */
> > +       data->backend_domid = iommu_spec.args[0];
> > 
> >         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
> >                         GFP_KERNEL))) {
> > (END)
> > 
> > 
> > 
> > Below, the nodes generated by Xen toolstack:
> > 
> >         xen_grant_dma {

Nit: iommu {

> >                 compatible = "xen,grant-dma";
> >                 #iommu-cells = <0x01>;
> >                 phandle = <0xfde9>;
> >         };
> > 
> >         virtio@2000000 {
> >                 compatible = "virtio,mmio";
> >                 reg = <0x00 0x2000000 0x00 0x200>;
> >                 interrupts = <0x00 0x01 0xf01>;
> >                 interrupt-parent = <0xfde8>;
> >                 dma-coherent;
> >                 iommus = <0xfde9 0x01>;
> >         };
>  
> Not bad! I like it.
>  
>  
> > I am wondering, would be the proper solution to eliminate deferred probe
> > timeout issue in our particular case (without introducing an extra IOMMU
> > driver)?
> 
> In reality I don't think there is a way to do that. I would create an
> empty skelethon IOMMU driver for xen,grant-dma.

Does it have to be an empty driver? Originally, IOMMU 'drivers' were not 
drivers, but they've been getting converted. Can that be done here?

Short of that, I think we could have some sort of skip probe list for 
deferred probe. Not sure if that would be easiest as IOMMU specific or 
global.
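
(Purely as an illustration of the second idea: such a skip list could be
little more than a compatible-string allowlist consulted before deferring,
along these lines. Nothing like this exists today and all names here are
invented.)

#include <linux/kernel.h>
#include <linux/of.h>

/* Invented sketch of a "skip probe list": IOMMU compatibles whose missing
 * driver should not defer the consumer's probe. */
static const char * const iommu_defer_skip_list[] = {
	"xen,grant-dma",
};

static bool iommu_defer_can_skip(const struct device_node *np)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(iommu_defer_skip_list); i++)
		if (of_device_is_compatible(np, iommu_defer_skip_list[i]))
			return true;

	return false;
}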

Rob

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-24  1:58                 ` Stefano Stabellini
  2022-05-24 16:01                   ` Rob Herring
@ 2022-05-24 16:11                   ` Oleksandr
  2022-05-24 17:59                     ` Stefano Stabellini
  1 sibling, 1 reply; 42+ messages in thread
From: Oleksandr @ 2022-05-24 16:11 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Michael S. Tsirkin, Christoph Hellwig, Jean-Philippe Brucker


On 24.05.22 04:58, Stefano Stabellini wrote:

Hello Stefano, all

> On Mon, 23 May 2022, Oleksandr wrote:
>>>> On Thu, 19 May 2022, Oleksandr wrote:
>>>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
>>>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>>>> <olekstysh@gmail.com> wrote:
>>>>>>>>      This would mean having a device
>>>>>>>> node for the grant-table mechanism that can be referred to using
>>>>>>>> the
>>>>>>>> 'iommus'
>>>>>>>> phandle property, with the domid as an additional argument.
>>>>>>> I assume, you are speaking about something like the following?
>>>>>>>
>>>>>>>
>>>>>>> xen_dummy_iommu {
>>>>>>>        compatible = "xen,dummy-iommu";
>>>>>>>        #iommu-cells = <1>;
>>>>>>> };
>>>>>>>
>>>>>>> virtio@3000 {
>>>>>>>        compatible = "virtio,mmio";
>>>>>>>        reg = <0x3000 0x100>;
>>>>>>>        interrupts = <41>;
>>>>>>>
>>>>>>>        /* The device is located in Xen domain with ID 1 */
>>>>>>>        iommus = <&xen_dummy_iommu 1>;
>>>>>>> };
>>>>>> Right, that's that's the idea,
>>>>> thank you for the confirmation
>>>>>
>>>>>
>>>>>
>>>>>>     except I would not call it a 'dummy'.
>>>>>>    From the perspective of the DT, this behaves just like an IOMMU,
>>>>>> even if the exact mechanism is different from most hardware IOMMU
>>>>>> implementations.
>>>>> well, agree
>>>>>
>>>>>
>>>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>>>> iommus,
>>>>>>>> as that has an allocator for dma_addr_t space
>>>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>>>
>>>>>>>
>>>>>>>> , but it would think it's
>>>>>>>> conceptually close enough that it makes sense for the binding.
>>>>>>> Interesting idea. I am wondering, do we need an extra actions for
>>>>>>> this
>>>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>>>> It depends on how closely the guest implementation can be made to
>>>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>>>> it may actually be close enough that you can just turn the grant-table
>>>>>> code into a normal iommu driver and change nothing else.
>>>>> Unfortunately, I failed to find a way how use grant references at the
>>>>> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
>>>>> am
>>>>> not too familiar with that, so what is written below might be wrong or
>>>>> at
>>>>> least not precise.
>>>>>
>>>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>>>> itself, it
>>>>> just maps (IOVA-PA) what was requested to be mapped by the upper layer.
>>>>> The
>>>>> DMA address allocation is done by the upper layer (DMA-IOMMU which is
>>>>> the glue
>>>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
>>>>> what we
>>>>> need here is just to allocate our specific grant-table based DMA
>>>>> addresses
>>>>> (DMA address = grant reference + offset in the page), so let’s say we
>>>>> need an
>>>>> entity to take a physical address as parameter and return a DMA address
>>>>> (what
>>>>> actually commit #3/7 is doing), and that’s all. So working at the
>>>>> dma_ops
>>>>> layer we get exactly what we need, with the minimal changes to guest
>>>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>>>
>>>>> Assuming that we want to reuse the IOMMU infrastructure somehow for our
>>>>> needs.
>>>>> I think, in that case we will likely need to introduce a new specific
>>>>> IOVA
>>>>> allocator (alongside with a generic one) to be hooked up by the
>>>>> DMA-IOMMU
>>>>> layer if we run on top of Xen. But, even having the specific IOVA
>>>>> allocator to
>>>>> return what we indeed need (DMA address = grant reference + offset in
>>>>> the
>>>>> page) we will still need the specific minimal required IOMMU driver to
>>>>> be
>>>>> present in the system anyway in order to track the mappings(?) and do
>>>>> nothing
>>>>> with them, returning a success (this specific IOMMU driver should have
>>>>> all
>>>>> mandatory callbacks implemented).
>>>>>
>>>>> I completely agree, it would be really nice to reuse generic IOMMU
>>>>> bindings
>>>>> rather than introducing Xen specific property if what we are trying to
>>>>> implement in current patch series fits in the usage of "iommus" in Linux
>>>>> more-less. But, if we will have to add more complexity/more components
>>>>> to the
>>>>> code for the sake of reusing device tree binding, this raises a question
>>>>> whether that’s worthwhile.
>>>>>
>>>>> Or I really missed something?
>>>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>>>> bindings, not necessarily the IOMMU drivers framework in Linux (although
>>>> that would be an added bonus.)
>>>>
>>>> I know from previous discussions with you that making the grant table
>>>> fit in the existing IOMMU drivers model is difficult, but just reusing
>>>> the Device Tree bindings seems feasible?
>>> I started experimenting with that. As wrote in a separate email, I got a
>>> deferred probe timeout,
>>>
>>> after inserting required nodes into guest device tree, which seems to be a
>>> consequence of the unavailability of IOMMU, I will continue to investigate
>>> this question.
>>
>> I have experimented with that. Yes, just reusing the Device Tree bindings is
>> technically feasible (and we are able to do this by only touching
>> grant-dma-ops.c), although deferred probe timeout still stands (as there is no
>> IOMMU driver being present actually).
>>
>> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
>> dependency
>> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
>> GB/1.95 GiB)
>>
>>
>> Below the working diff (on top of current series):
>>
>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>> index da9c7ff..6586152 100644
>> --- a/drivers/xen/grant-dma-ops.c
>> +++ b/drivers/xen/grant-dma-ops.c
>> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
>>
>>   bool xen_is_grant_dma_device(struct device *dev)
>>   {
>> +       struct device_node *iommu_np;
>> +       bool has_iommu;
>> +
>>          /* XXX Handle only DT devices for now */
>>          if (!dev->of_node)
>>                  return false;
>>
>> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
>> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
>> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
>> "xen,grant-dma");
>> +       of_node_put(iommu_np);
>> +
>> +       return has_iommu;
>>   }
>>
>>   void xen_grant_setup_dma_ops(struct device *dev)
>>   {
>>          struct xen_grant_dma_data *data;
>> -       uint32_t domid;
>> +       struct of_phandle_args iommu_spec;
>>
>>          data = find_xen_grant_dma_data(dev);
>>          if (data) {
>> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>>          if (!dev->of_node)
>>                  goto err;
>>
>> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
>> -               dev_err(dev, "xen,backend-domid property is not present\n");
>> +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
>> +                       0, &iommu_spec)) {
>> +               dev_err(dev, "Cannot parse iommus property\n");
>> +               goto err;
>> +       }
>> +
>> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
>> +                       iommu_spec.args_count != 1) {
>> +               dev_err(dev, "Incompatible IOMMU node\n");
>> +               of_node_put(iommu_spec.np);
>>                  goto err;
>>          }
>>
>> +       of_node_put(iommu_spec.np);
>> +
>>          data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>          if (!data)
>>                  goto err;
>>
>> -       data->backend_domid = domid;
>> +       /*
>> +        * The endpoint ID here means the ID of the domain where the
>> corresponding
>> +        * backend is running
>> +        */
>> +       data->backend_domid = iommu_spec.args[0];
>>
>>          if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
>>                          GFP_KERNEL))) {
>> (END)
>>
>>
>>
>> Below, the nodes generated by Xen toolstack:
>>
>>          xen_grant_dma {
>>                  compatible = "xen,grant-dma";
>>                  #iommu-cells = <0x01>;
>>                  phandle = <0xfde9>;
>>          };
>>
>>          virtio@2000000 {
>>                  compatible = "virtio,mmio";
>>                  reg = <0x00 0x2000000 0x00 0x200>;
>>                  interrupts = <0x00 0x01 0xf01>;
>>                  interrupt-parent = <0xfde8>;
>>                  dma-coherent;
>>                  iommus = <0xfde9 0x01>;
>>          };
>   
> Not bad! I like it.


Good.



>   
>   
>> I am wondering, would be the proper solution to eliminate deferred probe
>> timeout issue in our particular case (without introducing an extra IOMMU
>> driver)?
> In reality I don't think there is a way to do that. I would create an
> empty skelethon IOMMU driver for xen,grant-dma.

Ok, I found yet another option for avoiding the deferred probe timeout
issue. I am not sure whether it will be welcome, but it doesn't really
require introducing a stub IOMMU driver or other changes in the guest. The
idea is to make the IOMMU device unavailable (status = "disabled"); this way
of_iommu_configure() treats that as a success condition as well.

https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L31
https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L149

         xen_grant_dma {
                 compatible = "xen,grant-dma";
                 #iommu-cells = <0x01>;
                 phandle = <0xfde9>;
                 status = "disabled";
         };
         virtio@2000000 {
                 compatible = "virtio,mmio";
                 reg = <0x00 0x2000000 0x00 0x200>;
                 interrupts = <0x00 0x01 0xf01>;
                 interrupt-parent = <0xfde8>;
                 dma-coherent;
                 iommus = <0xfde9 0x01>;
         };

I have checked that this "fixes" the deferred probe timeout issue.
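
For reference, a paraphrased (and simplified, not verbatim) sketch of the
check behind the two links above: an unavailable IOMMU node takes the
NO_IOMMU "no translation needed" exit, while an available node whose ops are
not registered ends up on the probe-deferral path that produced the timeout
message earlier:

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/of.h>

#define NO_IOMMU	1	/* local "no IOMMU, not an error" value, as in of_iommu.c */

static int sketch_of_iommu_xlate(struct device *dev,
				 struct of_phandle_args *iommu_spec)
{
	const struct iommu_ops *ops =
		iommu_ops_from_fwnode(&iommu_spec->np->fwnode);

	/* status = "disabled" lands here: success without any driver */
	if ((ops && !ops->of_xlate) || !of_device_is_available(iommu_spec->np))
		return NO_IOMMU;

	/* available node, but no ops registered yet: defer the consumer,
	 * which is what eventually prints "deferred probe timeout" */
	if (!ops)
		return driver_deferred_probe_check_state(dev);

	return iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
}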


Or we indeed need to introduce a stub IOMMU driver (I placed it in
drivers/xen instead of drivers/iommu; we could even squash it into
grant-dma-ops.c?).
This stub driver also results in the NO_IOMMU condition (as the "of_xlate"
callback is not implemented).

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index a7bd8ce..35b91b9 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -335,6 +335,10 @@ config XEN_UNPOPULATED_ALLOC
           having to balloon out RAM regions in order to obtain physical memory
           space to create such mappings.

+config XEN_GRANT_DMA_IOMMU
+       bool
+       select IOMMU_API
+
  config XEN_GRANT_DMA_OPS
         bool
         select DMA_OPS
@@ -343,6 +347,7 @@ config XEN_VIRTIO
         bool "Xen virtio support"
         depends on VIRTIO
         select XEN_GRANT_DMA_OPS
+       select XEN_GRANT_DMA_IOMMU
         help
           Enable virtio support for running as Xen guest. Depending on the
           guest type this will require special support on the backend side
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 1a23cb0..c0503f1 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -40,3 +40,4 @@ xen-privcmd-y                         := privcmd.o privcmd-buf.o
  obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)    += xen-front-pgdir-shbuf.o
  obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)    += unpopulated-alloc.o
  obj-$(CONFIG_XEN_GRANT_DMA_OPS)                += grant-dma-ops.o
+obj-$(CONFIG_XEN_GRANT_DMA_IOMMU)      += grant-dma-iommu.o
diff --git a/drivers/xen/grant-dma-iommu.c b/drivers/xen/grant-dma-iommu.c
new file mode 100644
index 00000000..b8aad8a
--- /dev/null
+++ b/drivers/xen/grant-dma-iommu.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Stub IOMMU driver which does nothing.
+ * The main purpose of it being present is to reuse generic device-tree IOMMU
+ * bindings by Xen grant DMA-mapping layer.
+ */
+
+#include <linux/iommu.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+
+struct grant_dma_iommu_device {
+       struct device *dev;
+       struct iommu_device iommu;
+};
+
+/* Nothing is really needed here */
+static const struct iommu_ops grant_dma_iommu_ops;
+
+static const struct of_device_id grant_dma_iommu_of_match[] = {
+       { .compatible = "xen,grant-dma" },
+       { },
+};
+
+static int grant_dma_iommu_probe(struct platform_device *pdev)
+{
+       struct grant_dma_iommu_device *mmu;
+       int ret;
+
+       mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
+       if (!mmu)
+               return -ENOMEM;
+
+       mmu->dev = &pdev->dev;
+
+       ret = iommu_device_register(&mmu->iommu, &grant_dma_iommu_ops, &pdev->dev);
+       if (ret)
+               return ret;
+
+       platform_set_drvdata(pdev, mmu);
+
+       return 0;
+}
+
+static int grant_dma_iommu_remove(struct platform_device *pdev)
+{
+       struct grant_dma_iommu_device *mmu = platform_get_drvdata(pdev);
+
+       platform_set_drvdata(pdev, NULL);
+       iommu_device_unregister(&mmu->iommu);
+
+       return 0;
+}
+
+static struct platform_driver grant_dma_iommu_driver = {
+       .driver = {
+               .name = "grant-dma-iommu",
+               .of_match_table = grant_dma_iommu_of_match,
+       },
+       .probe = grant_dma_iommu_probe,
+       .remove = grant_dma_iommu_remove,
+};
+
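+/* Only register the stub when a "xen,grant-dma" node is present in the DT. */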
+static int __init grant_dma_iommu_init(void)
+{
+       struct device_node *iommu_np;
+
+       iommu_np = of_find_matching_node(NULL, grant_dma_iommu_of_match);
+       if (!iommu_np)
+               return 0;
+
+       of_node_put(iommu_np);
+
+       return platform_driver_register(&grant_dma_iommu_driver);
+}
+subsys_initcall(grant_dma_iommu_init);

I have checked that this also "fixes" the deferred probe timeout issue.

Personally I would prefer the first option, but I would also be happy to
use the second option in order to unblock the series.

What do the maintainers think?


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-24 16:11                   ` Oleksandr
@ 2022-05-24 17:59                     ` Stefano Stabellini
  2022-05-25 11:15                       ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Stefano Stabellini @ 2022-05-24 17:59 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Michael S. Tsirkin, Christoph Hellwig, Jean-Philippe Brucker

[-- Attachment #1: Type: text/plain, Size: 17380 bytes --]

On Tue, 24 May 2022, Oleksandr wrote:
> > On Mon, 23 May 2022, Oleksandr wrote:
> > > > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com>
> > > > > > > wrote:
> > > > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > > > <olekstysh@gmail.com> wrote:
> > > > > > > > >      This would mean having a device
> > > > > > > > > node for the grant-table mechanism that can be referred to
> > > > > > > > > using
> > > > > > > > > the
> > > > > > > > > 'iommus'
> > > > > > > > > phandle property, with the domid as an additional argument.
> > > > > > > > I assume, you are speaking about something like the following?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > xen_dummy_iommu {
> > > > > > > >        compatible = "xen,dummy-iommu";
> > > > > > > >        #iommu-cells = <1>;
> > > > > > > > };
> > > > > > > > 
> > > > > > > > virtio@3000 {
> > > > > > > >        compatible = "virtio,mmio";
> > > > > > > >        reg = <0x3000 0x100>;
> > > > > > > >        interrupts = <41>;
> > > > > > > > 
> > > > > > > >        /* The device is located in Xen domain with ID 1 */
> > > > > > > >        iommus = <&xen_dummy_iommu 1>;
> > > > > > > > };
> > > > > > > Right, that's that's the idea,
> > > > > > thank you for the confirmation
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > >     except I would not call it a 'dummy'.
> > > > > > >    From the perspective of the DT, this behaves just like an
> > > > > > > IOMMU,
> > > > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > > > implementations.
> > > > > > well, agree
> > > > > > 
> > > > > > 
> > > > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > > > iommus,
> > > > > > > > > as that has an allocator for dma_addr_t space
> > > > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > , but it would think it's
> > > > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > > > Interesting idea. I am wondering, do we need an extra actions
> > > > > > > > for
> > > > > > > > this
> > > > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > > > It depends on how closely the guest implementation can be made to
> > > > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > > > it may actually be close enough that you can just turn the
> > > > > > > grant-table
> > > > > > > code into a normal iommu driver and change nothing else.
> > > > > > Unfortunately, I failed to find a way how use grant references at
> > > > > > the
> > > > > > iommu_ops level (I mean to fully pretend that we are an IOMMU
> > > > > > driver). I
> > > > > > am
> > > > > > not too familiar with that, so what is written below might be wrong
> > > > > > or
> > > > > > at
> > > > > > least not precise.
> > > > > > 
> > > > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > > > itself, it
> > > > > > just maps (IOVA-PA) what was requested to be mapped by the upper
> > > > > > layer.
> > > > > > The
> > > > > > DMA address allocation is done by the upper layer (DMA-IOMMU which
> > > > > > is
> > > > > > the glue
> > > > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But,
> > > > > > all
> > > > > > what we
> > > > > > need here is just to allocate our specific grant-table based DMA
> > > > > > addresses
> > > > > > (DMA address = grant reference + offset in the page), so let’s say
> > > > > > we
> > > > > > need an
> > > > > > entity to take a physical address as parameter and return a DMA
> > > > > > address
> > > > > > (what
> > > > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > > > dma_ops
> > > > > > layer we get exactly what we need, with the minimal changes to guest
> > > > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > > > 
> > > > > > Assuming that we want to reuse the IOMMU infrastructure somehow for
> > > > > > our
> > > > > > needs.
> > > > > > I think, in that case we will likely need to introduce a new
> > > > > > specific
> > > > > > IOVA
> > > > > > allocator (alongside with a generic one) to be hooked up by the
> > > > > > DMA-IOMMU
> > > > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > > > allocator to
> > > > > > return what we indeed need (DMA address = grant reference + offset
> > > > > > in
> > > > > > the
> > > > > > page) we will still need the specific minimal required IOMMU driver
> > > > > > to
> > > > > > be
> > > > > > present in the system anyway in order to track the mappings(?) and
> > > > > > do
> > > > > > nothing
> > > > > > with them, returning a success (this specific IOMMU driver should
> > > > > > have
> > > > > > all
> > > > > > mandatory callbacks implemented).
> > > > > > 
> > > > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > > > bindings
> > > > > > rather than introducing Xen specific property if what we are trying
> > > > > > to
> > > > > > implement in current patch series fits in the usage of "iommus" in
> > > > > > Linux
> > > > > > more-less. But, if we will have to add more complexity/more
> > > > > > components
> > > > > > to the
> > > > > > code for the sake of reusing device tree binding, this raises a
> > > > > > question
> > > > > > whether that’s worthwhile.
> > > > > > 
> > > > > > Or I really missed something?
> > > > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > > > bindings, not necessarily the IOMMU drivers framework in Linux
> > > > > (although
> > > > > that would be an added bonus.)
> > > > > 
> > > > > I know from previous discussions with you that making the grant table
> > > > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > > > the Device Tree bindings seems feasible?
> > > > I started experimenting with that. As wrote in a separate email, I got a
> > > > deferred probe timeout,
> > > > 
> > > > after inserting required nodes into guest device tree, which seems to be
> > > > a
> > > > consequence of the unavailability of IOMMU, I will continue to
> > > > investigate
> > > > this question.
> > > 
> > > I have experimented with that. Yes, just reusing the Device Tree bindings
> > > is
> > > technically feasible (and we are able to do this by only touching
> > > grant-dma-ops.c), although deferred probe timeout still stands (as there
> > > is no
> > > IOMMU driver being present actually).
> > > 
> > > [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout,
> > > ignoring
> > > dependency
> > > [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks
> > > (2.10
> > > GB/1.95 GiB)
> > > 
> > > 
> > > Below the working diff (on top of current series):
> > > 
> > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > index da9c7ff..6586152 100644
> > > --- a/drivers/xen/grant-dma-ops.c
> > > +++ b/drivers/xen/grant-dma-ops.c
> > > @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops =
> > > {
> > > 
> > >   bool xen_is_grant_dma_device(struct device *dev)
> > >   {
> > > +       struct device_node *iommu_np;
> > > +       bool has_iommu;
> > > +
> > >          /* XXX Handle only DT devices for now */
> > >          if (!dev->of_node)
> > >                  return false;
> > > 
> > > -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> > > +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> > > +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> > > "xen,grant-dma");
> > > +       of_node_put(iommu_np);
> > > +
> > > +       return has_iommu;
> > >   }
> > > 
> > >   void xen_grant_setup_dma_ops(struct device *dev)
> > >   {
> > >          struct xen_grant_dma_data *data;
> > > -       uint32_t domid;
> > > +       struct of_phandle_args iommu_spec;
> > > 
> > >          data = find_xen_grant_dma_data(dev);
> > >          if (data) {
> > > @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
> > >          if (!dev->of_node)
> > >                  goto err;
> > > 
> > > -       if (of_property_read_u32(dev->of_node, "xen,backend-domid",
> > > &domid)) {
> > > -               dev_err(dev, "xen,backend-domid property is not
> > > present\n");
> > > +       if (of_parse_phandle_with_args(dev->of_node, "iommus",
> > > "#iommu-cells",
> > > +                       0, &iommu_spec)) {
> > > +               dev_err(dev, "Cannot parse iommus property\n");
> > > +               goto err;
> > > +       }
> > > +
> > > +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> > > +                       iommu_spec.args_count != 1) {
> > > +               dev_err(dev, "Incompatible IOMMU node\n");
> > > +               of_node_put(iommu_spec.np);
> > >                  goto err;
> > >          }
> > > 
> > > +       of_node_put(iommu_spec.np);
> > > +
> > >          data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > >          if (!data)
> > >                  goto err;
> > > 
> > > -       data->backend_domid = domid;
> > > +       /*
> > > +        * The endpoint ID here means the ID of the domain where the
> > > corresponding
> > > +        * backend is running
> > > +        */
> > > +       data->backend_domid = iommu_spec.args[0];
> > > 
> > >          if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev,
> > > data,
> > >                          GFP_KERNEL))) {
> > > (END)
> > > 
> > > 
> > > 
> > > Below, the nodes generated by Xen toolstack:
> > > 
> > >          xen_grant_dma {
> > >                  compatible = "xen,grant-dma";
> > >                  #iommu-cells = <0x01>;
> > >                  phandle = <0xfde9>;
> > >          };
> > > 
> > >          virtio@2000000 {
> > >                  compatible = "virtio,mmio";
> > >                  reg = <0x00 0x2000000 0x00 0x200>;
> > >                  interrupts = <0x00 0x01 0xf01>;
> > >                  interrupt-parent = <0xfde8>;
> > >                  dma-coherent;
> > >                  iommus = <0xfde9 0x01>;
> > >          };
> >   Not bad! I like it.
> 
> 
> Good.
> 
> 
> 
> >     
> > > I am wondering, would be the proper solution to eliminate deferred probe
> > > timeout issue in our particular case (without introducing an extra IOMMU
> > > driver)?
> > In reality I don't think there is a way to do that. I would create an
> > empty skeleton IOMMU driver for xen,grant-dma.
> 
> Ok, I found yet another option how we can avoid deferred probe timeout issue.
> I am not sure whether it will be welcome. But it doesn't really require
> introducing stub IOMMU driver or other changes in the guest. The idea is to
> make IOMMU device unavailable (status = "disabled"), this way
> of_iommu_configure() will treat that as success condition also.
> 
> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L31
> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L149
> 
>         xen_grant_dma {
>                 compatible = "xen,grant-dma";
>                 #iommu-cells = <0x01>;
>                 phandle = <0xfde9>;
>                 status = "disabled";
>         };
>         virtio@2000000 {
>                 compatible = "virtio,mmio";
>                 reg = <0x00 0x2000000 0x00 0x200>;
>                 interrupts = <0x00 0x01 0xf01>;
>                 interrupt-parent = <0xfde8>;
>                 dma-coherent;
>                 iommus = <0xfde9 0x01>;
>         };
> 
> I have checked, this "fixes" deferred probe timeout issue.
> 
> 
> Or we indeed need to introduce stub IOMMU driver (I placed it to driver/xen
> instead of driver/iommu, also we can even squash it with grant-dma-ops.c?).
> This stub driver also results in NO_IOMMU condition (as "of_xlate" callback is
> not implemented).
> 
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index a7bd8ce..35b91b9 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -335,6 +335,10 @@ config XEN_UNPOPULATED_ALLOC
>           having to balloon out RAM regions in order to obtain physical memory
>           space to create such mappings.
> 
> +config XEN_GRANT_DMA_IOMMU
> +       bool
> +       select IOMMU_API
> +
>  config XEN_GRANT_DMA_OPS
>         bool
>         select DMA_OPS
> @@ -343,6 +347,7 @@ config XEN_VIRTIO
>         bool "Xen virtio support"
>         depends on VIRTIO
>         select XEN_GRANT_DMA_OPS
> +       select XEN_GRANT_DMA_IOMMU
>         help
>           Enable virtio support for running as Xen guest. Depending on the
>           guest type this will require special support on the backend side
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 1a23cb0..c0503f1 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -40,3 +40,4 @@ xen-privcmd-y                         := privcmd.o
> privcmd-buf.o
>  obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)    += xen-front-pgdir-shbuf.o
>  obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)    += unpopulated-alloc.o
>  obj-$(CONFIG_XEN_GRANT_DMA_OPS)                += grant-dma-ops.o
> +obj-$(CONFIG_XEN_GRANT_DMA_IOMMU)      += grant-dma-iommu.o
> diff --git a/drivers/xen/grant-dma-iommu.c b/drivers/xen/grant-dma-iommu.c
> new file mode 100644
> index 00000000..b8aad8a
> --- /dev/null
> +++ b/drivers/xen/grant-dma-iommu.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Stub IOMMU driver which does nothing.
> + * The main purpose of it being present is to reuse generic device-tree IOMMU
> + * bindings by Xen grant DMA-mapping layer.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +
> +struct grant_dma_iommu_device {
> +       struct device *dev;
> +       struct iommu_device iommu;
> +};
> +
> +/* Nothing is really needed here */
> +static const struct iommu_ops grant_dma_iommu_ops;
> +
> +static const struct of_device_id grant_dma_iommu_of_match[] = {
> +       { .compatible = "xen,grant-dma" },
> +       { },
> +};
> +
> +static int grant_dma_iommu_probe(struct platform_device *pdev)
> +{
> +       struct grant_dma_iommu_device *mmu;
> +       int ret;
> +
> +       mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
> +       if (!mmu)
> +               return -ENOMEM;
> +
> +       mmu->dev = &pdev->dev;
> +
> +       ret = iommu_device_register(&mmu->iommu, &grant_dma_iommu_ops,
> &pdev->dev);
> +       if (ret)
> +               return ret;
> +
> +       platform_set_drvdata(pdev, mmu);
> +
> +       return 0;
> +}
> +
> +static int grant_dma_iommu_remove(struct platform_device *pdev)
> +{
> +       struct grant_dma_iommu_device *mmu = platform_get_drvdata(pdev);
> +
> +       platform_set_drvdata(pdev, NULL);
> +       iommu_device_unregister(&mmu->iommu);
> +
> +       return 0;
> +}
> +
> +static struct platform_driver grant_dma_iommu_driver = {
> +       .driver = {
> +               .name = "grant-dma-iommu",
> +               .of_match_table = grant_dma_iommu_of_match,
> +       },
> +       .probe = grant_dma_iommu_probe,
> +       .remove = grant_dma_iommu_remove,
> +};
> +
> +static int __init grant_dma_iommu_init(void)
> +{
> +       struct device_node *iommu_np;
> +
> +       iommu_np = of_find_matching_node(NULL, grant_dma_iommu_of_match);
> +       if (!iommu_np)
> +               return 0;
> +
> +       of_node_put(iommu_np);
> +
> +       return platform_driver_register(&grant_dma_iommu_driver);
> +}
> +subsys_initcall(grant_dma_iommu_init);
> 
> I have checked, this also "fixes" deferred probe timeout issue.
> 
> Personally I would prefer the first option, but I would be also happy to use
> second option in order to unblock the series.
> 
> What do the maintainers think?
 

I don't think it is a good idea to mark the fake IOMMU as disabled
because it implies that there is no need to use it (no need to use
dma_ops) which is a problem.

If we don't want the skeleton driver, then Rob's suggestion of having a
skip list for deferred probe is better.

I think the skeleton driver is also totally fine.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-24 16:01                   ` Rob Herring
@ 2022-05-24 18:34                     ` Saravana Kannan
  2022-05-25 16:30                       ` Oleksandr
  0 siblings, 1 reply; 42+ messages in thread
From: Saravana Kannan @ 2022-05-24 18:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: Stefano Stabellini, Oleksandr, xen-devel,
	open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE, DTML,
	Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Krzysztof Kozlowski,
	Julien Grall, Juergen Gross, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker

"

On Tue, May 24, 2022 at 9:01 AM Rob Herring <robh@kernel.org> wrote:
>
> +Saravana
>
> On Mon, May 23, 2022 at 06:58:13PM -0700, Stefano Stabellini wrote:
> > On Mon, 23 May 2022, Oleksandr wrote:
> > > > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
> > > > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > > > <olekstysh@gmail.com> wrote:
> > > > > > > > >     This would mean having a device
> > > > > > > > > node for the grant-table mechanism that can be referred to using
> > > > > > > > > the
> > > > > > > > > 'iommus'
> > > > > > > > > phandle property, with the domid as an additional argument.
> > > > > > > > I assume, you are speaking about something like the following?
> > > > > > > >
> > > > > > > >
> > > > > > > > xen_dummy_iommu {
> > > > > > > >       compatible = "xen,dummy-iommu";
> > > > > > > >       #iommu-cells = <1>;
> > > > > > > > };
> > > > > > > >
> > > > > > > > virtio@3000 {
> > > > > > > >       compatible = "virtio,mmio";
> > > > > > > >       reg = <0x3000 0x100>;
> > > > > > > >       interrupts = <41>;
> > > > > > > >
> > > > > > > >       /* The device is located in Xen domain with ID 1 */
> > > > > > > >       iommus = <&xen_dummy_iommu 1>;
> > > > > > > > };
> > > > > > > Right, that's the idea,
> > > > > > thank you for the confirmation
> > > > > >
> > > > > >
> > > > > >
> > > > > > >    except I would not call it a 'dummy'.
> > > > > > >   From the perspective of the DT, this behaves just like an IOMMU,
> > > > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > > > implementations.
> > > > > > well, agree
> > > > > >
> > > > > >
> > > > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > > > iommus,
> > > > > > > > > as that has an allocator for dma_addr_t space
> > > > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > > >
> > > > > > > >
> > > > > > > > > , but it would think it's
> > > > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > > > Interesting idea. I am wondering, do we need an extra actions for
> > > > > > > > this
> > > > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > > > It depends on how closely the guest implementation can be made to
> > > > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > > > it may actually be close enough that you can just turn the grant-table
> > > > > > > code into a normal iommu driver and change nothing else.
> > > > > > Unfortunately, I failed to find a way how use grant references at the
> > > > > > iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
> > > > > > am
> > > > > > not too familiar with that, so what is written below might be wrong or
> > > > > > at
> > > > > > least not precise.
> > > > > >
> > > > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > > > itself, it
> > > > > > just maps (IOVA-PA) what was requested to be mapped by the upper layer.
> > > > > > The
> > > > > > DMA address allocation is done by the upper layer (DMA-IOMMU which is
> > > > > > the glue
> > > > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
> > > > > > what we
> > > > > > need here is just to allocate our specific grant-table based DMA
> > > > > > addresses
> > > > > > (DMA address = grant reference + offset in the page), so let’s say we
> > > > > > need an
> > > > > > entity to take a physical address as parameter and return a DMA address
> > > > > > (what
> > > > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > > > dma_ops
> > > > > > layer we get exactly what we need, with the minimal changes to guest
> > > > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > > >
> > > > > > Assuming that we want to reuse the IOMMU infrastructure somehow for our
> > > > > > needs.
> > > > > > I think, in that case we will likely need to introduce a new specific
> > > > > > IOVA
> > > > > > allocator (alongside with a generic one) to be hooked up by the
> > > > > > DMA-IOMMU
> > > > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > > > allocator to
> > > > > > return what we indeed need (DMA address = grant reference + offset in
> > > > > > the
> > > > > > page) we will still need the specific minimal required IOMMU driver to
> > > > > > be
> > > > > > present in the system anyway in order to track the mappings(?) and do
> > > > > > nothing
> > > > > > with them, returning a success (this specific IOMMU driver should have
> > > > > > all
> > > > > > mandatory callbacks implemented).
> > > > > >
> > > > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > > > bindings
> > > > > > rather than introducing Xen specific property if what we are trying to
> > > > > > implement in current patch series fits in the usage of "iommus" in Linux
> > > > > > more-less. But, if we will have to add more complexity/more components
> > > > > > to the
> > > > > > code for the sake of reusing device tree binding, this raises a question
> > > > > > whether that’s worthwhile.
> > > > > >
> > > > > > Or I really missed something?
> > > > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > > > bindings, not necessarily the IOMMU drivers framework in Linux (although
> > > > > that would be an added bonus.)
> > > > >
> > > > > I know from previous discussions with you that making the grant table
> > > > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > > > the Device Tree bindings seems feasible?
> > > >
> > > > I started experimenting with that. As wrote in a separate email, I got a
> > > > deferred probe timeout,
> > > >
> > > > after inserting required nodes into guest device tree, which seems to be a
> > > > consequence of the unavailability of IOMMU, I will continue to investigate
> > > > this question.
> > >
> > >
> > > I have experimented with that. Yes, just reusing the Device Tree bindings is
> > > technically feasible (and we are able to do this by only touching
> > > grant-dma-ops.c), although deferred probe timeout still stands (as there is no
> > > IOMMU driver being present actually).
> > >
> > > [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
> > > dependency
> > > [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
> > > GB/1.95 GiB)
> > >
> > >
> > > Below the working diff (on top of current series):
> > >
> > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > index da9c7ff..6586152 100644
> > > --- a/drivers/xen/grant-dma-ops.c
> > > +++ b/drivers/xen/grant-dma-ops.c
> > > @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
> > >
> > >  bool xen_is_grant_dma_device(struct device *dev)
> > >  {
> > > +       struct device_node *iommu_np;
> > > +       bool has_iommu;
> > > +
> > >         /* XXX Handle only DT devices for now */
> > >         if (!dev->of_node)
> > >                 return false;
> > >
> > > -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> > > +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> > > +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> > > "xen,grant-dma");
> > > +       of_node_put(iommu_np);
> > > +
> > > +       return has_iommu;
> > >  }
> > >
> > >  void xen_grant_setup_dma_ops(struct device *dev)
> > >  {
> > >         struct xen_grant_dma_data *data;
> > > -       uint32_t domid;
> > > +       struct of_phandle_args iommu_spec;
> > >
> > >         data = find_xen_grant_dma_data(dev);
> > >         if (data) {
> > > @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
> > >         if (!dev->of_node)
> > >                 goto err;
> > >
> > > -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
> > > -               dev_err(dev, "xen,backend-domid property is not present\n");
> > > +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
> > > +                       0, &iommu_spec)) {
> > > +               dev_err(dev, "Cannot parse iommus property\n");
> > > +               goto err;
> > > +       }
> > > +
> > > +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> > > +                       iommu_spec.args_count != 1) {
> > > +               dev_err(dev, "Incompatible IOMMU node\n");
> > > +               of_node_put(iommu_spec.np);
> > >                 goto err;
> > >         }
> > >
> > > +       of_node_put(iommu_spec.np);
> > > +
> > >         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > >         if (!data)
> > >                 goto err;
> > >
> > > -       data->backend_domid = domid;
> > > +       /*
> > > +        * The endpoint ID here means the ID of the domain where the
> > > corresponding
> > > +        * backend is running
> > > +        */
> > > +       data->backend_domid = iommu_spec.args[0];
> > >
> > >         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
> > >                         GFP_KERNEL))) {
> > > (END)
> > >
> > >
> > >
> > > Below, the nodes generated by Xen toolstack:
> > >
> > >         xen_grant_dma {
>
> Nit: iommu {
>
> > >                 compatible = "xen,grant-dma";
> > >                 #iommu-cells = <0x01>;
> > >                 phandle = <0xfde9>;
> > >         };
> > >
> > >         virtio@2000000 {
> > >                 compatible = "virtio,mmio";
> > >                 reg = <0x00 0x2000000 0x00 0x200>;
> > >                 interrupts = <0x00 0x01 0xf01>;
> > >                 interrupt-parent = <0xfde8>;
> > >                 dma-coherent;
> > >                 iommus = <0xfde9 0x01>;
> > >         };
> >
> > Not bad! I like it.
> >
> >
> > > I am wondering, would be the proper solution to eliminate deferred probe
> > > timeout issue in our particular case (without introducing an extra IOMMU
> > > driver)?
> >
> > In reality I don't think there is a way to do that. I would create an
> > empty skeleton IOMMU driver for xen,grant-dma.
>
> Does it have to be an empty driver? Originally, IOMMU 'drivers' were not
> drivers, but they've been getting converted. Can that be done here?
>
> Short of that, I think we could have some sort of skip probe list for
> deferred probe. Not sure if that would be easiest as IOMMU specific or
> global.

Hi Oleksandr,

If you do fw_devlink.strict=1, you'll notice that the consumers of
this "iommu" won't probe at all or will delay the boot by some number
of seconds. The eventual goal is to go towards fw_devlink.strict=1
being the default.

From a fw_devlink perspective, please implement a driver. Ideally a
real one, but at least an empty one. The empty one doesn't need to be
an IOMMU driver, but at least just do a return 0 in the probe
function. Also, if it's not a device, why even have a "compatible"
property (removing it won't necessarily remove the deferred probe
timeout issue you see)? Will any code be using "xen,grant-dma" to look
up the node? If so, that driver could be the one that probes this
device. At least from a fw_devlink perspective, it just needs to have
a driver that binds to this device.
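
To make this concrete, a minimal sketch of such an "empty" driver is
below. This is only an illustration of the idea (the driver and symbol
names are placeholders, not something from this series); from a
fw_devlink perspective all that matters is that something binds to the
"xen,grant-dma" node and probes successfully:

// SPDX-License-Identifier: GPL-2.0
/*
 * Placeholder sketch: a no-op platform driver that simply binds to the
 * "xen,grant-dma" node so that fw_devlink sees a probed supplier.
 */
#include <linux/mod_devicetable.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int xen_grant_stub_probe(struct platform_device *pdev)
{
	/* Nothing to set up; probing successfully is the whole point. */
	return 0;
}

static const struct of_device_id xen_grant_stub_of_match[] = {
	{ .compatible = "xen,grant-dma" },
	{ /* sentinel */ },
};

static struct platform_driver xen_grant_stub_driver = {
	.driver = {
		.name = "xen-grant-stub",
		.of_match_table = xen_grant_stub_of_match,
	},
	.probe = xen_grant_stub_probe,
};
builtin_platform_driver(xen_grant_stub_driver);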

Also, if we aren't going to implement a driver and have the supplier
("xen,grant-dma") behave like a device (as in, have a driver that
probes), I'd rather that the iommu binding not be used at all as this
would be an exception to how every other iommu device behaves.

-Saravana

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-24 17:59                     ` Stefano Stabellini
@ 2022-05-25 11:15                       ` Oleksandr
  0 siblings, 0 replies; 42+ messages in thread
From: Oleksandr @ 2022-05-25 11:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Rob Herring,
	Krzysztof Kozlowski, Julien Grall, Juergen Gross,
	Michael S. Tsirkin, Christoph Hellwig, Jean-Philippe Brucker


On 24.05.22 20:59, Stefano Stabellini wrote:

Hello Stefano

> On Tue, 24 May 2022, Oleksandr wrote:
>>> On Mon, 23 May 2022, Oleksandr wrote:
>>>>>> On Thu, 19 May 2022, Oleksandr wrote:
>>>>>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>>>>>> <olekstysh@gmail.com> wrote:
>>>>>>>>>>       This would mean having a device
>>>>>>>>>> node for the grant-table mechanism that can be referred to
>>>>>>>>>> using
>>>>>>>>>> the
>>>>>>>>>> 'iommus'
>>>>>>>>>> phandle property, with the domid as an additional argument.
>>>>>>>>> I assume, you are speaking about something like the following?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> xen_dummy_iommu {
>>>>>>>>>         compatible = "xen,dummy-iommu";
>>>>>>>>>         #iommu-cells = <1>;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> virtio@3000 {
>>>>>>>>>         compatible = "virtio,mmio";
>>>>>>>>>         reg = <0x3000 0x100>;
>>>>>>>>>         interrupts = <41>;
>>>>>>>>>
>>>>>>>>>         /* The device is located in Xen domain with ID 1 */
>>>>>>>>>         iommus = <&xen_dummy_iommu 1>;
>>>>>>>>> };
>>>>>>>> Right, that's the idea,
>>>>>>> thank you for the confirmation
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>      except I would not call it a 'dummy'.
>>>>>>>>     From the perspective of the DT, this behaves just like an
>>>>>>>> IOMMU,
>>>>>>>> even if the exact mechanism is different from most hardware IOMMU
>>>>>>>> implementations.
>>>>>>> well, agree
>>>>>>>
>>>>>>>
>>>>>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>>>>>> iommus,
>>>>>>>>>> as that has an allocator for dma_addr_t space
>>>>>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> , but it would think it's
>>>>>>>>>> conceptually close enough that it makes sense for the binding.
>>>>>>>>> Interesting idea. I am wondering, do we need an extra actions
>>>>>>>>> for
>>>>>>>>> this
>>>>>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>>>>>> It depends on how closely the guest implementation can be made to
>>>>>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>>>>>> it may actually be close enough that you can just turn the
>>>>>>>> grant-table
>>>>>>>> code into a normal iommu driver and change nothing else.
>>>>>>> Unfortunately, I failed to find a way how use grant references at
>>>>>>> the
>>>>>>> iommu_ops level (I mean to fully pretend that we are an IOMMU
>>>>>>> driver). I
>>>>>>> am
>>>>>>> not too familiar with that, so what is written below might be wrong
>>>>>>> or
>>>>>>> at
>>>>>>> least not precise.
>>>>>>>
>>>>>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>>>>>> itself, it
>>>>>>> just maps (IOVA-PA) what was requested to be mapped by the upper
>>>>>>> layer.
>>>>>>> The
>>>>>>> DMA address allocation is done by the upper layer (DMA-IOMMU which
>>>>>>> is
>>>>>>> the glue
>>>>>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But,
>>>>>>> all
>>>>>>> what we
>>>>>>> need here is just to allocate our specific grant-table based DMA
>>>>>>> addresses
>>>>>>> (DMA address = grant reference + offset in the page), so let’s say
>>>>>>> we
>>>>>>> need an
>>>>>>> entity to take a physical address as parameter and return a DMA
>>>>>>> address
>>>>>>> (what
>>>>>>> actually commit #3/7 is doing), and that’s all. So working at the
>>>>>>> dma_ops
>>>>>>> layer we get exactly what we need, with the minimal changes to guest
>>>>>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>>>>>
>>>>>>> Assuming that we want to reuse the IOMMU infrastructure somehow for
>>>>>>> our
>>>>>>> needs.
>>>>>>> I think, in that case we will likely need to introduce a new
>>>>>>> specific
>>>>>>> IOVA
>>>>>>> allocator (alongside with a generic one) to be hooked up by the
>>>>>>> DMA-IOMMU
>>>>>>> layer if we run on top of Xen. But, even having the specific IOVA
>>>>>>> allocator to
>>>>>>> return what we indeed need (DMA address = grant reference + offset
>>>>>>> in
>>>>>>> the
>>>>>>> page) we will still need the specific minimal required IOMMU driver
>>>>>>> to
>>>>>>> be
>>>>>>> present in the system anyway in order to track the mappings(?) and
>>>>>>> do
>>>>>>> nothing
>>>>>>> with them, returning a success (this specific IOMMU driver should
>>>>>>> have
>>>>>>> all
>>>>>>> mandatory callbacks implemented).
>>>>>>>
>>>>>>> I completely agree, it would be really nice to reuse generic IOMMU
>>>>>>> bindings
>>>>>>> rather than introducing Xen specific property if what we are trying
>>>>>>> to
>>>>>>> implement in current patch series fits in the usage of "iommus" in
>>>>>>> Linux
>>>>>>> more-less. But, if we will have to add more complexity/more
>>>>>>> components
>>>>>>> to the
>>>>>>> code for the sake of reusing device tree binding, this raises a
>>>>>>> question
>>>>>>> whether that’s worthwhile.
>>>>>>>
>>>>>>> Or I really missed something?
>>>>>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>>>>>> bindings, not necessarily the IOMMU drivers framework in Linux
>>>>>> (although
>>>>>> that would be an added bonus.)
>>>>>>
>>>>>> I know from previous discussions with you that making the grant table
>>>>>> fit in the existing IOMMU drivers model is difficult, but just reusing
>>>>>> the Device Tree bindings seems feasible?
>>>>> I started experimenting with that. As wrote in a separate email, I got a
>>>>> deferred probe timeout,
>>>>>
>>>>> after inserting required nodes into guest device tree, which seems to be
>>>>> a
>>>>> consequence of the unavailability of IOMMU, I will continue to
>>>>> investigate
>>>>> this question.
>>>> I have experimented with that. Yes, just reusing the Device Tree bindings
>>>> is
>>>> technically feasible (and we are able to do this by only touching
>>>> grant-dma-ops.c), although deferred probe timeout still stands (as there
>>>> is no
>>>> IOMMU driver being present actually).
>>>>
>>>> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout,
>>>> ignoring
>>>> dependency
>>>> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks
>>>> (2.10
>>>> GB/1.95 GiB)
>>>>
>>>>
>>>> Below the working diff (on top of current series):
>>>>
>>>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>>>> index da9c7ff..6586152 100644
>>>> --- a/drivers/xen/grant-dma-ops.c
>>>> +++ b/drivers/xen/grant-dma-ops.c
>>>> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops =
>>>> {
>>>>
>>>>    bool xen_is_grant_dma_device(struct device *dev)
>>>>    {
>>>> +       struct device_node *iommu_np;
>>>> +       bool has_iommu;
>>>> +
>>>>           /* XXX Handle only DT devices for now */
>>>>           if (!dev->of_node)
>>>>                   return false;
>>>>
>>>> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
>>>> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
>>>> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
>>>> "xen,grant-dma");
>>>> +       of_node_put(iommu_np);
>>>> +
>>>> +       return has_iommu;
>>>>    }
>>>>
>>>>    void xen_grant_setup_dma_ops(struct device *dev)
>>>>    {
>>>>           struct xen_grant_dma_data *data;
>>>> -       uint32_t domid;
>>>> +       struct of_phandle_args iommu_spec;
>>>>
>>>>           data = find_xen_grant_dma_data(dev);
>>>>           if (data) {
>>>> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>>>>           if (!dev->of_node)
>>>>                   goto err;
>>>>
>>>> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid",
>>>> &domid)) {
>>>> -               dev_err(dev, "xen,backend-domid property is not
>>>> present\n");
>>>> +       if (of_parse_phandle_with_args(dev->of_node, "iommus",
>>>> "#iommu-cells",
>>>> +                       0, &iommu_spec)) {
>>>> +               dev_err(dev, "Cannot parse iommus property\n");
>>>> +               goto err;
>>>> +       }
>>>> +
>>>> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
>>>> +                       iommu_spec.args_count != 1) {
>>>> +               dev_err(dev, "Incompatible IOMMU node\n");
>>>> +               of_node_put(iommu_spec.np);
>>>>                   goto err;
>>>>           }
>>>>
>>>> +       of_node_put(iommu_spec.np);
>>>> +
>>>>           data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>>>           if (!data)
>>>>                   goto err;
>>>>
>>>> -       data->backend_domid = domid;
>>>> +       /*
>>>> +        * The endpoint ID here means the ID of the domain where the
>>>> corresponding
>>>> +        * backend is running
>>>> +        */
>>>> +       data->backend_domid = iommu_spec.args[0];
>>>>
>>>>           if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev,
>>>> data,
>>>>                           GFP_KERNEL))) {
>>>> (END)
>>>>
>>>>
>>>>
>>>> Below, the nodes generated by Xen toolstack:
>>>>
>>>>           xen_grant_dma {
>>>>                   compatible = "xen,grant-dma";
>>>>                   #iommu-cells = <0x01>;
>>>>                   phandle = <0xfde9>;
>>>>           };
>>>>
>>>>           virtio@2000000 {
>>>>                   compatible = "virtio,mmio";
>>>>                   reg = <0x00 0x2000000 0x00 0x200>;
>>>>                   interrupts = <0x00 0x01 0xf01>;
>>>>                   interrupt-parent = <0xfde8>;
>>>>                   dma-coherent;
>>>>                   iommus = <0xfde9 0x01>;
>>>>           };
>>>    Not bad! I like it.
>>
>> Good.
>>
>>
>>
>>>      
>>>> I am wondering, would be the proper solution to eliminate deferred probe
>>>> timeout issue in our particular case (without introducing an extra IOMMU
>>>> driver)?
>>> In reality I don't think there is a way to do that. I would create an
>>> empty skeleton IOMMU driver for xen,grant-dma.
>> Ok, I found yet another option how we can avoid deferred probe timeout issue.
>> I am not sure whether it will be welcome. But it doesn't really require
>> introducing stub IOMMU driver or other changes in the guest. The idea is to
>> make IOMMU device unavailable (status = "disabled"), this way
>> of_iommu_configure() will treat that as success condition also.
>>
>> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L31
>> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L149
>>
>>          xen_grant_dma {
>>                  compatible = "xen,grant-dma";
>>                  #iommu-cells = <0x01>;
>>                  phandle = <0xfde9>;
>>                  status = "disabled";
>>          };
>>          virtio@2000000 {
>>                  compatible = "virtio,mmio";
>>                  reg = <0x00 0x2000000 0x00 0x200>;
>>                  interrupts = <0x00 0x01 0xf01>;
>>                  interrupt-parent = <0xfde8>;
>>                  dma-coherent;
>>                  iommus = <0xfde9 0x01>;
>>          };
>>
>> I have checked, this "fixes" deferred probe timeout issue.
>>
>>
>> Or we indeed need to introduce stub IOMMU driver (I placed it to driver/xen
>> instead of driver/iommu, also we can even squash it with grant-dma-ops.c?).
>> This stub driver also results in NO_IOMMU condition (as "of_xlate" callback is
>> not implemented).
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index a7bd8ce..35b91b9 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -335,6 +335,10 @@ config XEN_UNPOPULATED_ALLOC
>>            having to balloon out RAM regions in order to obtain physical memory
>>            space to create such mappings.
>>
>> +config XEN_GRANT_DMA_IOMMU
>> +       bool
>> +       select IOMMU_API
>> +
>>   config XEN_GRANT_DMA_OPS
>>          bool
>>          select DMA_OPS
>> @@ -343,6 +347,7 @@ config XEN_VIRTIO
>>          bool "Xen virtio support"
>>          depends on VIRTIO
>>          select XEN_GRANT_DMA_OPS
>> +       select XEN_GRANT_DMA_IOMMU
>>          help
>>            Enable virtio support for running as Xen guest. Depending on the
>>            guest type this will require special support on the backend side
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 1a23cb0..c0503f1 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -40,3 +40,4 @@ xen-privcmd-y                         := privcmd.o
>> privcmd-buf.o
>>   obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)    += xen-front-pgdir-shbuf.o
>>   obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)    += unpopulated-alloc.o
>>   obj-$(CONFIG_XEN_GRANT_DMA_OPS)                += grant-dma-ops.o
>> +obj-$(CONFIG_XEN_GRANT_DMA_IOMMU)      += grant-dma-iommu.o
>> diff --git a/drivers/xen/grant-dma-iommu.c b/drivers/xen/grant-dma-iommu.c
>> new file mode 100644
>> index 00000000..b8aad8a
>> --- /dev/null
>> +++ b/drivers/xen/grant-dma-iommu.c
>> @@ -0,0 +1,76 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Stub IOMMU driver which does nothing.
>> + * The main purpose of it being present is to reuse generic device-tree IOMMU
>> + * bindings by Xen grant DMA-mapping layer.
>> + */
>> +
>> +#include <linux/iommu.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +
>> +struct grant_dma_iommu_device {
>> +       struct device *dev;
>> +       struct iommu_device iommu;
>> +};
>> +
>> +/* Nothing is really needed here */
>> +static const struct iommu_ops grant_dma_iommu_ops;
>> +
>> +static const struct of_device_id grant_dma_iommu_of_match[] = {
>> +       { .compatible = "xen,grant-dma" },
>> +       { },
>> +};
>> +
>> +static int grant_dma_iommu_probe(struct platform_device *pdev)
>> +{
>> +       struct grant_dma_iommu_device *mmu;
>> +       int ret;
>> +
>> +       mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
>> +       if (!mmu)
>> +               return -ENOMEM;
>> +
>> +       mmu->dev = &pdev->dev;
>> +
>> +       ret = iommu_device_register(&mmu->iommu, &grant_dma_iommu_ops,
>> &pdev->dev);
>> +       if (ret)
>> +               return ret;
>> +
>> +       platform_set_drvdata(pdev, mmu);
>> +
>> +       return 0;
>> +}
>> +
>> +static int grant_dma_iommu_remove(struct platform_device *pdev)
>> +{
>> +       struct grant_dma_iommu_device *mmu = platform_get_drvdata(pdev);
>> +
>> +       platform_set_drvdata(pdev, NULL);
>> +       iommu_device_unregister(&mmu->iommu);
>> +
>> +       return 0;
>> +}
>> +
>> +static struct platform_driver grant_dma_iommu_driver = {
>> +       .driver = {
>> +               .name = "grant-dma-iommu",
>> +               .of_match_table = grant_dma_iommu_of_match,
>> +       },
>> +       .probe = grant_dma_iommu_probe,
>> +       .remove = grant_dma_iommu_remove,
>> +};
>> +
>> +static int __init grant_dma_iommu_init(void)
>> +{
>> +       struct device_node *iommu_np;
>> +
>> +       iommu_np = of_find_matching_node(NULL, grant_dma_iommu_of_match);
>> +       if (!iommu_np)
>> +               return 0;
>> +
>> +       of_node_put(iommu_np);
>> +
>> +       return platform_driver_register(&grant_dma_iommu_driver);
>> +}
>> +subsys_initcall(grant_dma_iommu_init);
>>
>> I have checked, this also "fixes" deferred probe timeout issue.
>>
>> Personally I would prefer the first option, but I would be also happy to use
>> second option in order to unblock the series.
>>
>> What do the maintainers think?
>   
>
> I don't think it is a good idea to mark the fake IOMMU as disabled
> because it implies that there is no need to use it (no need to use
> dma_ops) which is a problem.

I got your point. You are right, this indeed sounds weird. I expected 
this simple solution wouldn't be welcome.


>
> If we don't want the skeleton driver, then Rob's suggestion of having a
> skip list for deferred probe is better.


I am not sure I understand the idea completely.

Does it mean that we would need a new command line option (?) to pass
some string (I assume, the "compatible" of the IOMMU device) so that
the probe is not deferred if the corresponding driver is missing, but
NO_IOMMU is returned right away?
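
Just to illustrate one possible shape of such a skip list (purely
hypothetical, nothing like this exists in mainline today, and all of
the names below are made up for the sake of the example; a
command-line variant would presumably feed the same kind of table):

#include <linux/kernel.h>	/* ARRAY_SIZE */
#include <linux/of.h>

/*
 * Hypothetical sketch only: a compatible-based skip list which the
 * OF/IOMMU code could consult before deferring the consumer's probe.
 */
static const char * const iommu_deferred_probe_skip[] = {
	/* handled via dma_ops, no IOMMU driver will ever bind */
	"xen,grant-dma",
};

static bool of_iommu_skip_deferred_probe(struct device_node *np)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(iommu_deferred_probe_skip); i++)
		if (of_device_is_compatible(np, iommu_deferred_probe_skip[i]))
			return true;	/* behave as NO_IOMMU instead of -EPROBE_DEFER */

	return false;
}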


>
> I think the skeleton driver is also totally fine.

ok, thank you for the feedback.


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops
  2022-05-24 18:34                     ` Saravana Kannan
@ 2022-05-25 16:30                       ` Oleksandr
  0 siblings, 0 replies; 42+ messages in thread
From: Oleksandr @ 2022-05-25 16:30 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Stefano Stabellini
  Cc: xen-devel, open list:DRM DRIVER FOR QEMU'S CIRRUS DEVICE,
	DTML, Linux Kernel Mailing List, Linux ARM, Arnd Bergmann,
	Oleksandr Tyshchenko, Jason Wang, Krzysztof Kozlowski,
	Julien Grall, Juergen Gross, Michael S. Tsirkin,
	Christoph Hellwig, Jean-Philippe Brucker


On 24.05.22 21:34, Saravana Kannan wrote:

Hello all

> "
>
> On Tue, May 24, 2022 at 9:01 AM Rob Herring <robh@kernel.org> wrote:
>> +Saravana
>>
>> On Mon, May 23, 2022 at 06:58:13PM -0700, Stefano Stabellini wrote:
>>> On Mon, 23 May 2022, Oleksandr wrote:
>>>>>> On Thu, 19 May 2022, Oleksandr wrote:
>>>>>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <olekstysh@gmail.com> wrote:
>>>>>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>>>>>> <olekstysh@gmail.com> wrote:
>>>>>>>>>>      This would mean having a device
>>>>>>>>>> node for the grant-table mechanism that can be referred to using
>>>>>>>>>> the
>>>>>>>>>> 'iommus'
>>>>>>>>>> phandle property, with the domid as an additional argument.
>>>>>>>>> I assume, you are speaking about something like the following?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> xen_dummy_iommu {
>>>>>>>>>        compatible = "xen,dummy-iommu";
>>>>>>>>>        #iommu-cells = <1>;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> virtio@3000 {
>>>>>>>>>        compatible = "virtio,mmio";
>>>>>>>>>        reg = <0x3000 0x100>;
>>>>>>>>>        interrupts = <41>;
>>>>>>>>>
>>>>>>>>>        /* The device is located in Xen domain with ID 1 */
>>>>>>>>>        iommus = <&xen_dummy_iommu 1>;
>>>>>>>>> };
>>>>>>>> Right, that's the idea,
>>>>>>> thank you for the confirmation
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>     except I would not call it a 'dummy'.
>>>>>>>>    From the perspective of the DT, this behaves just like an IOMMU,
>>>>>>>> even if the exact mechanism is different from most hardware IOMMU
>>>>>>>> implementations.
>>>>>>> well, agree
>>>>>>>
>>>>>>>
>>>>>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>>>>>> iommus,
>>>>>>>>>> as that has an allocator for dma_addr_t space
>>>>>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> , but it would think it's
>>>>>>>>>> conceptually close enough that it makes sense for the binding.
>>>>>>>>> Interesting idea. I am wondering, do we need an extra actions for
>>>>>>>>> this
>>>>>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>>>>>> It depends on how closely the guest implementation can be made to
>>>>>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>>>>>> it may actually be close enough that you can just turn the grant-table
>>>>>>>> code into a normal iommu driver and change nothing else.
>>>>>>> Unfortunately, I failed to find a way how use grant references at the
>>>>>>> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
>>>>>>> am
>>>>>>> not too familiar with that, so what is written below might be wrong or
>>>>>>> at
>>>>>>> least not precise.
>>>>>>>
>>>>>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>>>>>> itself, it
>>>>>>> just maps (IOVA-PA) what was requested to be mapped by the upper layer.
>>>>>>> The
>>>>>>> DMA address allocation is done by the upper layer (DMA-IOMMU which is
>>>>>>> the glue
>>>>>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
>>>>>>> what we
>>>>>>> need here is just to allocate our specific grant-table based DMA
>>>>>>> addresses
>>>>>>> (DMA address = grant reference + offset in the page), so let’s say we
>>>>>>> need an
>>>>>>> entity to take a physical address as parameter and return a DMA address
>>>>>>> (what
>>>>>>> actually commit #3/7 is doing), and that’s all. So working at the
>>>>>>> dma_ops
>>>>>>> layer we get exactly what we need, with the minimal changes to guest
>>>>>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>>>>>
>>>>>>> Assuming that we want to reuse the IOMMU infrastructure somehow for our
>>>>>>> needs.
>>>>>>> I think, in that case we will likely need to introduce a new specific
>>>>>>> IOVA
>>>>>>> allocator (alongside with a generic one) to be hooked up by the
>>>>>>> DMA-IOMMU
>>>>>>> layer if we run on top of Xen. But, even having the specific IOVA
>>>>>>> allocator to
>>>>>>> return what we indeed need (DMA address = grant reference + offset in
>>>>>>> the
>>>>>>> page) we will still need the specific minimal required IOMMU driver to
>>>>>>> be
>>>>>>> present in the system anyway in order to track the mappings(?) and do
>>>>>>> nothing
>>>>>>> with them, returning a success (this specific IOMMU driver should have
>>>>>>> all
>>>>>>> mandatory callbacks implemented).
>>>>>>>
>>>>>>> I completely agree, it would be really nice to reuse generic IOMMU
>>>>>>> bindings
>>>>>>> rather than introducing Xen specific property if what we are trying to
>>>>>>> implement in current patch series fits in the usage of "iommus" in Linux
>>>>>>> more-less. But, if we will have to add more complexity/more components
>>>>>>> to the
>>>>>>> code for the sake of reusing device tree binding, this raises a question
>>>>>>> whether that’s worthwhile.
>>>>>>>
>>>>>>> Or I really missed something?
>>>>>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>>>>>> bindings, not necessarily the IOMMU drivers framework in Linux (although
>>>>>> that would be an added bonus.)
>>>>>>
>>>>>> I know from previous discussions with you that making the grant table
>>>>>> fit in the existing IOMMU drivers model is difficult, but just reusing
>>>>>> the Device Tree bindings seems feasible?
>>>>> I started experimenting with that. As wrote in a separate email, I got a
>>>>> deferred probe timeout,
>>>>>
>>>>> after inserting required nodes into guest device tree, which seems to be a
>>>>> consequence of the unavailability of IOMMU, I will continue to investigate
>>>>> this question.
>>>>
>>>> I have experimented with that. Yes, just reusing the Device Tree bindings is
>>>> technically feasible (and we are able to do this by only touching
>>>> grant-dma-ops.c), although deferred probe timeout still stands (as there is no
>>>> IOMMU driver being present actually).
>>>>
>>>> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
>>>> dependency
>>>> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
>>>> GB/1.95 GiB)
>>>>
>>>>
>>>> Below the working diff (on top of current series):
>>>>
>>>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>>>> index da9c7ff..6586152 100644
>>>> --- a/drivers/xen/grant-dma-ops.c
>>>> +++ b/drivers/xen/grant-dma-ops.c
>>>> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
>>>>
>>>>   bool xen_is_grant_dma_device(struct device *dev)
>>>>   {
>>>> +       struct device_node *iommu_np;
>>>> +       bool has_iommu;
>>>> +
>>>>          /* XXX Handle only DT devices for now */
>>>>          if (!dev->of_node)
>>>>                  return false;
>>>>
>>>> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
>>>> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
>>>> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
>>>> "xen,grant-dma");
>>>> +       of_node_put(iommu_np);
>>>> +
>>>> +       return has_iommu;
>>>>   }
>>>>
>>>>   void xen_grant_setup_dma_ops(struct device *dev)
>>>>   {
>>>>          struct xen_grant_dma_data *data;
>>>> -       uint32_t domid;
>>>> +       struct of_phandle_args iommu_spec;
>>>>
>>>>          data = find_xen_grant_dma_data(dev);
>>>>          if (data) {
>>>> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>>>>          if (!dev->of_node)
>>>>                  goto err;
>>>>
>>>> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
>>>> -               dev_err(dev, "xen,backend-domid property is not present\n");
>>>> +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
>>>> +                       0, &iommu_spec)) {
>>>> +               dev_err(dev, "Cannot parse iommus property\n");
>>>> +               goto err;
>>>> +       }
>>>> +
>>>> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
>>>> +                       iommu_spec.args_count != 1) {
>>>> +               dev_err(dev, "Incompatible IOMMU node\n");
>>>> +               of_node_put(iommu_spec.np);
>>>>                  goto err;
>>>>          }
>>>>
>>>> +       of_node_put(iommu_spec.np);
>>>> +
>>>>          data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>>>          if (!data)
>>>>                  goto err;
>>>>
>>>> -       data->backend_domid = domid;
>>>> +       /*
>>>> +        * The endpoint ID here means the ID of the domain where the
>>>> corresponding
>>>> +        * backend is running
>>>> +        */
>>>> +       data->backend_domid = iommu_spec.args[0];
>>>>
>>>>          if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
>>>>                          GFP_KERNEL))) {
>>>> (END)
>>>>
>>>>
>>>>
>>>> Below, the nodes generated by Xen toolstack:
>>>>
>>>>          xen_grant_dma {
>> Nit: iommu {
>>
>>>>                  compatible = "xen,grant-dma";
>>>>                  #iommu-cells = <0x01>;
>>>>                  phandle = <0xfde9>;
>>>>          };
>>>>
>>>>          virtio@2000000 {
>>>>                  compatible = "virtio,mmio";
>>>>                  reg = <0x00 0x2000000 0x00 0x200>;
>>>>                  interrupts = <0x00 0x01 0xf01>;
>>>>                  interrupt-parent = <0xfde8>;
>>>>                  dma-coherent;
>>>>                  iommus = <0xfde9 0x01>;
>>>>          };
>>> Not bad! I like it.
>>>
>>>
>>>> I am wondering, would be the proper solution to eliminate deferred probe
>>>> timeout issue in our particular case (without introducing an extra IOMMU
>>>> driver)?
>>> In reality I don't think there is a way to do that. I would create an
>>> empty skeleton IOMMU driver for xen,grant-dma.
>> Does it have to be an empty driver? Originally, IOMMU 'drivers' were not
>> drivers, but they've been getting converted. Can that be done here?
>>
>> Short of that, I think we could have some sort of skip probe list for
>> deferred probe. Not sure if that would be easiest as IOMMU specific or
>> global.
> Hi Oleksandr,
>
> If you do fw_devlink.strict=1, you'll notice that the consumers of
> this "iommu" won't probe at all or will delay the boot by some number
> of seconds. The eventual goal is to go towards fw_devlink.strict=1
> being the default.

ok, I got it.

Let me please explain our particular case in detail. Sorry, I may
repeat some information which I have already mentioned elsewhere, but
it may be better to keep the whole context here.

We have the Xen grant DMA-mapping layer added by the previous commit
[1]. For it to operate properly we need a way to communicate some
per-device information via the device tree, and this information is
Xen specific. This is what the current commit does by introducing a
new binding to describe that. The next commit [2] will use that new
binding to retrieve the required information. There was a suggestion
to consider reusing the generic device-tree IOMMU bindings to
communicate this specific information instead of introducing a custom
property.

Although it requires more effort from the Xen toolstack (instead of
adding code to insert a single "xen,backend-domid" property, we need
to generate a fake IOMMU node, reserve a phandle for it, etc.), from
the device tree PoV it indeed looks good (we reuse the endpoint ID to
pass the ID of the domain where the corresponding backend is running),
and the resulting code to retrieve this information in our DMA-mapping
layer also looks simple enough [3].

Using generic device-tree IOMMU bindings:

          iommu {
                  compatible = "xen,grant-dma";
                  #iommu-cells = <0x01>;
                  phandle = <0xfde9>;
          };
          virtio@2000000 {
                  compatible = "virtio,mmio";
                  reg = <0x00 0x2000000 0x00 0x200>;
                  interrupts = <0x00 0x01 0xf01>;
                  interrupt-parent = <0xfde8>;
                  dma-coherent;
                  iommus = <0xfde9 0x01>;
          };

Using Xen specific property:

          virtio@2000000 {
                  compatible = "virtio,mmio";
                  reg = <0x00 0x2000000 0x00 0x200>;
                  interrupts = <0x00 0x01 0xf01>;
                  interrupt-parent = <0xfde8>;
                  dma-coherent;
                  xen,backend-domid = <0x01>;
          };


The main problem is that this idea doesn't quite fit into how Linux
currently behaves for the "iommus" property. Of course, just reusing
the IOMMU bindings (without having a corresponding driver) leads to
the deferred probe timeout issue afterwards, because the IOMMU device
never becomes available. From my understanding, the DMA-mapping layer
we are considering reusing the IOMMU bindings for *cannot* be
converted into a proper IOMMU driver.

Sure, we will need to find a way to deal with this if we really want
to reuse the IOMMU bindings. And yes, one idea was to just implement a
stub IOMMU driver for that purpose; I have rechecked, and it works
fine with that stub driver [4].

>
>  From a fw_devlink perspective, please implement a driver. Ideally a
> real one, but at least an empty one. The empty one doesn't need to be
> an IOMMU driver, but at least just do a return 0 in the probe
> function.


If I got things right, I am afraid that for the "of_iommu" case an
empty driver is not enough. The driver should at least register
iommu_ops, but the "of_xlate" callback should *not* be implemented.
In that case, we will get NO_IOMMU (">0: there is no IOMMU, or one was
unavailable for non-fatal reasons"), which is also treated as a
success condition, so -EPROBE_DEFER won't be returned.

https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L32

Otherwise, of_iommu_xlate() will call driver_deferred_probe_check_state().

https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L43
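
In other words, the decision made per "iommus" entry boils down to
roughly the following (this is only my paraphrase of the v5.18 code
behind the links above, so please treat it as approximate and rely on
the links for the exact logic):

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/of.h>

/*
 * Approximate paraphrase of the of_iommu_xlate() decision in v5.18,
 * not the verbatim kernel source.
 */
static int of_iommu_decision_sketch(struct device *dev,
				    struct of_phandle_args *iommu_spec)
{
	const struct iommu_ops *ops =
		iommu_ops_from_fwnode(&iommu_spec->np->fwnode);

	/*
	 * iommu_ops registered but no ->of_xlate (our stub driver), or the
	 * IOMMU node not available: report NO_IOMMU, a positive non-fatal
	 * result.
	 */
	if ((ops && !ops->of_xlate) || !of_device_is_available(iommu_spec->np))
		return 1; /* NO_IOMMU */

	/* No iommu_ops registered at all: the consumer may get -EPROBE_DEFER */
	if (!ops)
		return driver_deferred_probe_check_state(dev);

	return ops->of_xlate(dev, iommu_spec);
}

So with the stub driver registering (empty) iommu_ops but no ->of_xlate
we take the first branch and never reach the deferral path.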

>   Also, if it's not a device, why even have a "compatible"
> property (removing it won't necessarily remove the deferred probe
> timeout issue you see)? Will any code be using "xen,grant-dma" to look
> up the node?

Yes


>   If so, that driver could be the one that probes this
> device. At least from a fw_devlink perspective, it just needs to have
> a driver that binds to this device.

Agree


>
> Also, if we aren't going to implement a driver and have the supplier
> ("xen,grant-dma") behave like a device (as in, have a driver that
> probes), I'd rather that the iommu binding not be used at all as this
> would be an exception to how every other iommu device behaves.

Agree


Saravana, thank you for the explanation.


To summarize, as I understand it, we have three options (the first two 
are clear enough, the third is not yet clear):

1. Do not try to reuse the IOMMU bindings for the current xen-virtio 
enabling work; use the "xen,backend-domid" property instead.
2. Reuse the IOMMU bindings and introduce a stub IOMMU driver for that 
purpose. It is a standalone entity in my example, but it could be made 
part of grant-dma-ops.c, which actually uses the "xen,grant-dma" 
compatible to look up the node.
3. Try to find another way to reuse the IOMMU bindings *without* 
introducing a stub IOMMU driver, such as a skip list for deferred 
probe, etc.


Which of these options do the maintainers think we should go forward with?


[1] https://lore.kernel.org/xen-devel/1651947548-4055-4-git-send-email-olekstysh@gmail.com/
[2] https://lore.kernel.org/xen-devel/1651947548-4055-7-git-send-email-olekstysh@gmail.com/
[3] https://lore.kernel.org/xen-devel/390ba7bb-ee9e-b7b7-5f08-71a7245fa4ec@gmail.com/
[4] https://lore.kernel.org/xen-devel/606dfdcc-ec10-0c4a-04e9-72cd73ee6676@gmail.com/

>
> -Saravana

-- 
Regards,

Oleksandr Tyshchenko



Thread overview: 42+ messages
2022-05-07 18:19 [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer Oleksandr Tyshchenko
2022-05-07 18:19 ` [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops() Oleksandr Tyshchenko
2022-05-07 18:52   ` Catalin Marinas
2022-05-07 18:19 ` [PATCH V2 2/7] xen/grants: support allocating consecutive grants Oleksandr Tyshchenko
2022-05-11 18:00   ` Oleksandr
2022-05-11 21:09     ` Boris Ostrovsky
2022-05-12  6:11       ` Oleksandr
2022-05-12 20:01   ` Boris Ostrovsky
2022-05-13  5:33     ` Juergen Gross
2022-05-13 10:43       ` Oleksandr
2022-05-14  2:34       ` Boris Ostrovsky
2022-05-16  5:59         ` Juergen Gross
2022-05-16 16:00           ` Boris Ostrovsky
2022-05-16 18:30             ` Oleksandr
2022-05-16 18:57               ` Boris Ostrovsky
2022-05-07 18:19 ` [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen Oleksandr Tyshchenko
2022-05-09 21:39   ` Stefano Stabellini
2022-05-07 18:19 ` [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings Oleksandr Tyshchenko
2022-05-09 21:39   ` Stefano Stabellini
2022-05-07 18:19 ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops Oleksandr Tyshchenko
2022-05-09 21:39   ` [PATCH V2 5/7] dt-bindings: Add xen, dev-domid " Stefano Stabellini
2022-05-17  0:27   ` [PATCH V2 5/7] dt-bindings: Add xen,dev-domid " Rob Herring
2022-05-18 14:12     ` Oleksandr
2022-05-18 14:32   ` Arnd Bergmann
2022-05-18 16:06     ` Oleksandr
2022-05-18 16:39       ` Arnd Bergmann
2022-05-18 23:32         ` Oleksandr
2022-05-19  1:06           ` Stefano Stabellini
2022-05-19  6:03             ` Oleksandr
2022-05-23 17:30               ` Oleksandr
2022-05-24  1:58                 ` Stefano Stabellini
2022-05-24 16:01                   ` Rob Herring
2022-05-24 18:34                     ` Saravana Kannan
2022-05-25 16:30                       ` Oleksandr
2022-05-24 16:11                   ` Oleksandr
2022-05-24 17:59                     ` Stefano Stabellini
2022-05-25 11:15                       ` Oleksandr
2022-05-18 18:59     ` Rob Herring
2022-05-18 23:48       ` Oleksandr
2022-05-07 18:19 ` [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices Oleksandr Tyshchenko
2022-05-09 21:39   ` Stefano Stabellini
2022-05-07 18:19 ` [PATCH V2 7/7] arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices Oleksandr Tyshchenko
