iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [RFC 00/13] virtio-iommu on non-devicetree platforms
@ 2019-11-22 10:49 Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder Jean-Philippe Brucker
                   ` (14 more replies)
  0 siblings, 15 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

I'm seeking feedback on multi-platform support for virtio-iommu. At the
moment only devicetree (DT) is supported and we don't have a pleasant
solution for other platforms. Once we figure out the topology
description, x86 support is trivial.

Since the IOMMU manages memory accesses from other devices, the guest
kernel needs to initialize the IOMMU before endpoints start issuing DMA.
It's a solved problem: firmware or hypervisor describes through DT or
ACPI tables the device dependencies, and probe of endpoints is deferred
until the IOMMU is probed. But:

(1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
    for Arm). From my point of view IORT is easier to extend, since we
    just need to introduce a new node type. There are no dependencies to
    Arm in the Linux IORT driver, so it works well with CONFIG_X86.

    However, there are concerns about other OS vendors feeling obligated
    to implement this new node, so Arm proposed introducing another ACPI
    table, that can wrap any of DMAR, IVRS and IORT to extend it with
    new virtual nodes. A draft of this VIOT table specification is
    available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf

    I'm afraid this could increase fragmentation as guests would need to
    implement or modify their support for all of DMAR, IVRS and IORT. If
    we end up doing VIOT, I suggest limiting it to IORT.

(2) In addition, there are some concerns about having virtio depend on
    ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
    [1]) don't currently implement those methods.

    It was suggested to embed the topology description into the device.
    It can work, as demonstrated at the end of this RFC, with the
    following limitations:

    - The topology description must be read before any endpoint managed
      by the IOMMU is probed, and even before the virtio module is
      loaded. This RFC uses a PCI quirk to manually parse the virtio
      configuration. It assumes that all endpoints managed by the IOMMU
      are under this same PCI host.

    - I don't have a solution for the virtio-mmio transport at the
      moment, because I haven't had time to modify a host to test it. I
      think it could either use a notifier on the platform bus, or
      better, a new 'iommu' command-line argument to the virtio-mmio
      driver. So the current prototype doesn't work for firecracker and
      microvm, which rely on virtio-mmio.

    - For Arm, if the platform has an ITS, the hypervisor needs IORT or
      DT to describe it anyway. More generally, not using either ACPI or
      DT might prevent from supporting other features as well. I suspect
      the above users will have to implement a standard method sooner or
      later.

    - Even when reusing as much existing code as possible, guest support
      is still going to be around a few hundred lines since we can't
      rely on the normal virtio infrastructure to be loaded at that
      point. As you can see below, the diffstat for the incomplete
      topology implementation is already bigger than the exhaustive IORT
      support, even when jumping through the VIOT hoop.

    So it's a lightweight solution for very specific use-cases, and we
    should still support ACPI for the general case. Multi-platform
    guests such as Linux will then need to support three topology
    descriptions instead of two.

In this RFC I present both solutions, but I'd rather not keep all of it.
Please see the individual patches for details:

(1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
    driver and patches 2, 11 add the VIOT glue.

(2) Patch 12 adds the built-in topology description to the virtio-iommu
    specification. Patch 13 is a partial implementation for the Linux
    virtio-iommu driver. It only supports PCI, not platform devices.

You can find Linux and QEMU code on my virtio-iommu/devel branches at
http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu


I split the diffstat since there are two independent features. The first
one is for patches 1-11, and the second one for patch 13.

Jean-Philippe Brucker (11):
  ACPI/IORT: Move IORT to the ACPI folder
  ACPI: Add VIOT definitions
  ACPI/IORT: Allow registration of external tables
  ACPI/IORT: Add node categories
  ACPI/IORT: Support VIOT virtio-mmio node
  ACPI/IORT: Support VIOT virtio-pci node
  ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
  ACPI/IORT: Add callback to update a device's fwnode
  iommu/virtio: Create fwnode if necessary
  iommu/virtio: Update IORT fwnode
  ACPI: Add VIOT table

 MAINTAINERS                     |   9 +
 drivers/acpi/Kconfig            |   7 +
 drivers/acpi/Makefile           |   2 +
 drivers/acpi/arm64/Kconfig      |   3 -
 drivers/acpi/arm64/Makefile     |   1 -
 drivers/acpi/bus.c              |   2 +
 drivers/acpi/{arm64 => }/iort.c | 317 ++++++++++++++++++++++++++------
 drivers/acpi/tables.c           |   2 +-
 drivers/acpi/viot.c             |  44 +++++
 drivers/iommu/Kconfig           |   1 +
 drivers/iommu/virtio-iommu.c    |  61 +++++-
 include/acpi/actbl2.h           |  31 ++++
 include/linux/acpi_iort.h       |  14 ++
 include/linux/acpi_viot.h       |  20 ++
 14 files changed, 448 insertions(+), 66 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (86%)
 create mode 100644 drivers/acpi/viot.c
 create mode 100644 include/linux/acpi_viot.h

Jean-Philippe Brucker (1):
  iommu/virtio: Add topology description to virtio-iommu config space

 drivers/base/platform.c               |   3 +
 drivers/iommu/Kconfig                 |   9 +
 drivers/iommu/Makefile                |   1 +
 drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
 drivers/iommu/virtio-iommu.c          |   3 +
 drivers/pci/pci-driver.c              |   3 +
 include/linux/virtio_iommu.h          |  18 ++
 include/uapi/linux/virtio_iommu.h     |  26 ++
 8 files changed, 473 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu-topology.c
 create mode 100644 include/linux/virtio_iommu.h


[1] firecracker: https://github.com/firecracker-microvm/firecracker
    microvm: https://github.com/qemu/qemu/blob/master/docs/microvm.rst
    kvmtool: https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 02/13] ACPI: Add VIOT definitions Jean-Philippe Brucker
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

IORT can be used (by QEMU) to describe a virtual topology containing an
architecture-agnostic paravirtualized device.

In order to build IORT for x86 systems, the driver has to be moved outside
of arm64/. Since there is nothing specific to arm64 in the driver, it
simply requires moving Makefile and Kconfig entries.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 MAINTAINERS                     | 9 +++++++++
 drivers/acpi/Kconfig            | 3 +++
 drivers/acpi/Makefile           | 1 +
 drivers/acpi/arm64/Kconfig      | 3 ---
 drivers/acpi/arm64/Makefile     | 1 -
 drivers/acpi/{arm64 => }/iort.c | 0
 6 files changed, 13 insertions(+), 4 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index eb19fad370d7..9153d278f67e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -377,6 +377,15 @@ L:	platform-driver-x86@vger.kernel.org
 S:	Maintained
 F:	drivers/platform/x86/i2c-multi-instantiate.c
 
+ACPI IORT DRIVER
+M:	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
+M:	Hanjun Guo <guohanjun@huawei.com>
+M:	Sudeep Holla <sudeep.holla@arm.com>
+L:	linux-acpi@vger.kernel.org
+L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:	Maintained
+F:	drivers/acpi/iort.c
+
 ACPI PMIC DRIVERS
 M:	"Rafael J. Wysocki" <rjw@rjwysocki.net>
 M:	Len Brown <lenb@kernel.org>
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ebe1e9e5fd81..548976c8b2b0 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -576,6 +576,9 @@ config TPS68470_PMIC_OPREGION
 	  region, which must be available before any of the devices
 	  using this, are probed.
 
+config ACPI_IORT
+	bool
+
 endif	# ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 5d361e4e3405..9d1792165713 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -123,3 +123,4 @@ video-objs			+= acpi_video.o video_detect.o
 obj-y				+= dptf/
 
 obj-$(CONFIG_ARM64)		+= arm64/
+obj-$(CONFIG_ACPI_IORT) 	+= iort.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 6dba187f4f2e..d0902c85d46e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -3,8 +3,5 @@
 # ACPI Configuration for ARM64
 #
 
-config ACPI_IORT
-	bool
-
 config ACPI_GTDT
 	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..38771a816caf 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_ACPI_IORT) 	+= iort.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/iort.c
similarity index 100%
rename from drivers/acpi/arm64/iort.c
rename to drivers/acpi/iort.c
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 02/13] ACPI: Add VIOT definitions
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 03/13] ACPI/IORT: Allow registration of external tables Jean-Philippe Brucker
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

This is temporary, until the VIOT table is published and these
definitions added to ACPICA.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/acpi/actbl2.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index e45ced27f4c3..99c1d747e9d8 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -25,6 +25,7 @@
  * the wrong signature.
  */
 #define ACPI_SIG_IORT           "IORT"	/* IO Remapping Table */
+#define ACPI_SIG_VIOT           "VIOT"	/* Virtual I/O Table */
 #define ACPI_SIG_IVRS           "IVRS"	/* I/O Virtualization Reporting Structure */
 #define ACPI_SIG_LPIT           "LPIT"	/* Low Power Idle Table */
 #define ACPI_SIG_MADT           "APIC"	/* Multiple APIC Description Table */
@@ -412,6 +413,36 @@ struct acpi_ivrs_memory {
 	u64 memory_length;
 };
 
+/*******************************************************************************
+ *
+ * VIOT - Virtual I/O Table
+ *        Version 1
+ *
+ ******************************************************************************/
+
+struct acpi_table_viot {
+	struct acpi_table_header header;
+	u8 reserved[12];
+	struct acpi_table_header base_table;
+};
+
+#define ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU    0x80
+#define ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU   0x81
+
+struct acpi_viot_iort_virtio_pci_iommu {
+	u32 devid;
+};
+
+struct acpi_viot_iort_virtio_mmio_iommu {
+	u64 base_address;
+	u64 span;
+	u64 flags;
+	u64 interrupt;
+};
+
+/* FIXME: rename this monstrosity. */
+#define ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT (1<<0)
+
 /*******************************************************************************
  *
  * LPIT - Low Power Idle Table
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 03/13] ACPI/IORT: Allow registration of external tables
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 02/13] ACPI: Add VIOT definitions Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 04/13] ACPI/IORT: Add node categories Jean-Philippe Brucker
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

Add a function to register an IORT table from an external source.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/acpi/iort.c       | 22 ++++++++++++++++++++--
 include/linux/acpi_iort.h | 10 ++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index d62a9ea26fae..9c6c91e06f8f 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -144,6 +144,7 @@ typedef acpi_status (*iort_find_node_callback)
 
 /* Root pointer to the mapped IORT table */
 static struct acpi_table_header *iort_table;
+static enum iort_table_source iort_table_source;
 
 static LIST_HEAD(iort_msi_chip_list);
 static DEFINE_SPINLOCK(iort_msi_chip_lock);
@@ -1617,11 +1618,28 @@ static void __init iort_init_platform_devices(void)
 	}
 }
 
+void __init acpi_iort_register_table(struct acpi_table_header *table,
+				     enum iort_table_source source)
+{
+	/*
+	 * Firmware or hypervisor should know better than give us two IORT
+	 * tables.
+	 */
+	if (WARN_ON(iort_table))
+		return;
+
+	iort_table = table;
+	iort_table_source = source;
+
+	iort_init_platform_devices();
+}
+
 void __init acpi_iort_init(void)
 {
 	acpi_status status;
+	static struct acpi_table_header *table;
 
-	status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table);
+	status = acpi_get_table(ACPI_SIG_IORT, 0, &table);
 	if (ACPI_FAILURE(status)) {
 		if (status != AE_NOT_FOUND) {
 			const char *msg = acpi_format_exception(status);
@@ -1632,5 +1650,5 @@ void __init acpi_iort_init(void)
 		return;
 	}
 
-	iort_init_platform_devices();
+	acpi_iort_register_table(table, IORT_SOURCE_IORT);
 }
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 8e7e2ec37f1b..f4db5fff07cf 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -11,6 +11,11 @@
 #include <linux/fwnode.h>
 #include <linux/irqdomain.h>
 
+enum iort_table_source {
+	IORT_SOURCE_IORT,	/* The Real Thing */
+	IORT_SOURCE_VIOT,	/* Paravirtual extensions */
+};
+
 #define IORT_IRQ_MASK(irq)		(irq & 0xffffffffULL)
 #define IORT_IRQ_TRIGGER_MASK(irq)	((irq >> 32) & 0xffffffffULL)
 
@@ -27,6 +32,8 @@ int iort_register_domain_token(int trans_id, phys_addr_t base,
 void iort_deregister_domain_token(int trans_id);
 struct fwnode_handle *iort_find_domain_token(int trans_id);
 #ifdef CONFIG_ACPI_IORT
+void acpi_iort_register_table(struct acpi_table_header *table,
+			      enum iort_table_source source);
 void acpi_iort_init(void);
 u32 iort_msi_map_rid(struct device *dev, u32 req_id);
 struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id);
@@ -37,6 +44,9 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
 const struct iommu_ops *iort_iommu_configure(struct device *dev);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
 #else
+static void acpi_iort_register_table(struct acpi_table_header *table,
+				     enum iort_table_source source)
+{ }
 static inline void acpi_iort_init(void) { }
 static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id)
 { return req_id; }
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 04/13] ACPI/IORT: Add node categories
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (2 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 03/13] ACPI/IORT: Allow registration of external tables Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node Jean-Philippe Brucker
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

The current node filtering won't work when introducing node types
greater than 63 (such as the virtio-iommu nodes). Add
node_type_matches() to filter nodes by category.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/acpi/iort.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 9c6c91e06f8f..1d43fbc0001f 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -18,10 +18,10 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 
-#define IORT_TYPE_MASK(type)	(1 << (type))
-#define IORT_MSI_TYPE		(1 << ACPI_IORT_NODE_ITS_GROUP)
-#define IORT_IOMMU_TYPE		((1 << ACPI_IORT_NODE_SMMU) |	\
-				(1 << ACPI_IORT_NODE_SMMU_V3))
+enum iort_node_category {
+	IORT_MSI_TYPE,
+	IORT_IOMMU_TYPE,
+};
 
 struct iort_its_msi_chip {
 	struct list_head	list;
@@ -38,6 +38,20 @@ struct iort_fwnode {
 static LIST_HEAD(iort_fwnode_list);
 static DEFINE_SPINLOCK(iort_fwnode_lock);
 
+static bool iort_type_matches(u8 type, enum iort_node_category category)
+{
+	switch (category) {
+	case IORT_IOMMU_TYPE:
+		return type == ACPI_IORT_NODE_SMMU ||
+		       type == ACPI_IORT_NODE_SMMU_V3;
+	case IORT_MSI_TYPE:
+		return type == ACPI_IORT_NODE_ITS_GROUP;
+	default:
+		WARN_ON(1);
+		return false;
+	}
+}
+
 /**
  * iort_set_fwnode() - Create iort_fwnode and use it to register
  *		       iommu data in the iort_fwnode_list
@@ -397,7 +411,7 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node)
 
 static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 					       u32 id_in, u32 *id_out,
-					       u8 type_mask)
+					       enum iort_node_category category)
 {
 	u32 id = id_in;
 
@@ -406,7 +420,7 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 		struct acpi_iort_id_mapping *map;
 		int i, index;
 
-		if (IORT_TYPE_MASK(node->type) & type_mask) {
+		if (iort_type_matches(node->type, category)) {
 			if (id_out)
 				*id_out = id;
 			return node;
@@ -458,8 +472,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 }
 
 static struct acpi_iort_node *iort_node_map_platform_id(
-		struct acpi_iort_node *node, u32 *id_out, u8 type_mask,
-		int index)
+		struct acpi_iort_node *node, u32 *id_out,
+		enum iort_node_category category, int index)
 {
 	struct acpi_iort_node *parent;
 	u32 id;
@@ -475,8 +489,8 @@ static struct acpi_iort_node *iort_node_map_platform_id(
 	 * as NC (named component) -> SMMU -> ITS. If the type is matched,
 	 * return the initial dev id and its parent pointer directly.
 	 */
-	if (!(IORT_TYPE_MASK(parent->type) & type_mask))
-		parent = iort_node_map_id(parent, id, id_out, type_mask);
+	if (!iort_type_matches(parent->type, category))
+		parent = iort_node_map_id(parent, id, id_out, category);
 	else
 		if (id_out)
 			*id_out = id;
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (3 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 04/13] ACPI/IORT: Add node categories Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node Jean-Philippe Brucker
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

Add a new type of node to the IORT driver, that describes a virtio-iommu
device based on the virtio-mmio transport. The node is only available
when the IORT is a sub-table of the VIOT.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/acpi/iort.c | 66 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 1d43fbc0001f..adc5953fffa5 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -43,7 +43,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
 	switch (category) {
 	case IORT_IOMMU_TYPE:
 		return type == ACPI_IORT_NODE_SMMU ||
-		       type == ACPI_IORT_NODE_SMMU_V3;
+		       type == ACPI_IORT_NODE_SMMU_V3 ||
+		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU;
 	case IORT_MSI_TYPE:
 		return type == ACPI_IORT_NODE_ITS_GROUP;
 	default:
@@ -868,8 +869,10 @@ static inline bool iort_iommu_driver_enabled(u8 type)
 		return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
 	case ACPI_IORT_NODE_SMMU:
 		return IS_BUILTIN(CONFIG_ARM_SMMU);
+	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
 	default:
-		pr_warn("IORT node type %u does not describe an SMMU\n", type);
+		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
 		return false;
 	}
 }
@@ -1408,6 +1411,46 @@ static int __init arm_smmu_v3_pmcg_add_platdata(struct platform_device *pdev)
 	return platform_device_add_data(pdev, &model, sizeof(model));
 }
 
+static int __init viommu_mmio_count_resources(struct acpi_iort_node *node)
+{
+	/* Mem + IRQ */
+	return 2;
+}
+
+static void __init viommu_mmio_init_resources(struct resource *res,
+					   struct acpi_iort_node *node)
+{
+	int hw_irq, trigger;
+	struct acpi_viot_iort_virtio_mmio_iommu *viommu;
+
+	viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data;
+
+	res[0].start = viommu->base_address;
+	res[0].end = viommu->base_address + viommu->span - 1;
+	res[0].flags = IORESOURCE_MEM;
+
+	hw_irq = IORT_IRQ_MASK(viommu->interrupt);
+	trigger = IORT_IRQ_TRIGGER_MASK(viommu->interrupt);
+	acpi_iort_register_irq(hw_irq, "viommu", trigger, res + 1);
+}
+
+static void __init viommu_mmio_dma_configure(struct device *dev,
+					  struct acpi_iort_node *node)
+{
+	enum dev_dma_attr attr;
+	struct acpi_viot_iort_virtio_mmio_iommu *viommu;
+
+	viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data;
+
+	attr = (viommu->flags & ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT) ?
+		DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT;
+
+	dev->dma_mask = &dev->coherent_dma_mask;
+
+	/* Configure DMA for the page table walker */
+	acpi_dma_configure(dev, attr);
+}
+
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
@@ -1443,6 +1486,14 @@ static const struct iort_dev_config iort_arm_smmu_v3_pmcg_cfg __initconst = {
 	.dev_add_platdata = arm_smmu_v3_pmcg_add_platdata,
 };
 
+static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = {
+	/* Probe with the generic virtio-mmio driver */
+	.name = "virtio-mmio",
+	.dev_dma_configure = viommu_mmio_dma_configure,
+	.dev_count_resources = viommu_mmio_count_resources,
+	.dev_init_resources = viommu_mmio_init_resources,
+};
+
 static __init const struct iort_dev_config *iort_get_dev_cfg(
 			struct acpi_iort_node *node)
 {
@@ -1453,9 +1504,16 @@ static __init const struct iort_dev_config *iort_get_dev_cfg(
 		return &iort_arm_smmu_cfg;
 	case ACPI_IORT_NODE_PMCG:
 		return &iort_arm_smmu_v3_pmcg_cfg;
-	default:
-		return NULL;
 	}
+
+	if (iort_table_source == IORT_SOURCE_VIOT) {
+		switch (node->type) {
+		case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+			return &iort_viommu_mmio_cfg;
+		}
+	}
+
+	return NULL;
 }
 
 /**
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (4 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode Jean-Philippe Brucker
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

When virtio-iommu uses the PCI transport, IORT doesn't instantiate the
device and doesn't create a fwnode. They will be created later by the
PCI subsystem. Store the information needed to identify the IOMMU in
iort_fwnode_list.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/acpi/iort.c | 117 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index adc5953fffa5..b517aa4e83ba 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -30,10 +30,17 @@ struct iort_its_msi_chip {
 	u32			translation_id;
 };
 
+struct iort_pci_devid {
+	u16 segment;
+	u8 bus;
+	u8 devfn;
+};
+
 struct iort_fwnode {
 	struct list_head list;
 	struct acpi_iort_node *iort_node;
 	struct fwnode_handle *fwnode;
+	struct iort_pci_devid *pci_devid;
 };
 static LIST_HEAD(iort_fwnode_list);
 static DEFINE_SPINLOCK(iort_fwnode_lock);
@@ -44,7 +51,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
 	case IORT_IOMMU_TYPE:
 		return type == ACPI_IORT_NODE_SMMU ||
 		       type == ACPI_IORT_NODE_SMMU_V3 ||
-		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU;
+		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU ||
+		       type == ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU;
 	case IORT_MSI_TYPE:
 		return type == ACPI_IORT_NODE_ITS_GROUP;
 	default:
@@ -59,12 +67,14 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
  *
  * @node: IORT table node associated with the IOMMU
  * @fwnode: fwnode associated with the IORT node
+ * @pci_devid: pci device ID associated with the IORT node, may be NULL
  *
  * Returns: 0 on success
  *          <0 on failure
  */
 static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
-				  struct fwnode_handle *fwnode)
+				  struct fwnode_handle *fwnode,
+				  struct iort_pci_devid *pci_devid)
 {
 	struct iort_fwnode *np;
 
@@ -76,6 +86,7 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
 	INIT_LIST_HEAD(&np->list);
 	np->iort_node = iort_node;
 	np->fwnode = fwnode;
+	np->pci_devid = pci_devid;
 
 	spin_lock(&iort_fwnode_lock);
 	list_add_tail(&np->list, &iort_fwnode_list);
@@ -121,6 +132,7 @@ static inline void iort_delete_fwnode(struct acpi_iort_node *node)
 	spin_lock(&iort_fwnode_lock);
 	list_for_each_entry_safe(curr, tmp, &iort_fwnode_list, list) {
 		if (curr->iort_node == node) {
+			kfree(curr->pci_devid);
 			list_del(&curr->list);
 			kfree(curr);
 			break;
@@ -870,6 +882,7 @@ static inline bool iort_iommu_driver_enabled(u8 type)
 	case ACPI_IORT_NODE_SMMU:
 		return IS_BUILTIN(CONFIG_ARM_SMMU);
 	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+	case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
 		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
 	default:
 		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
@@ -1451,6 +1464,28 @@ static void __init viommu_mmio_dma_configure(struct device *dev,
 	acpi_dma_configure(dev, attr);
 }
 
+static __init struct iort_pci_devid *
+viommu_pci_get_devid(struct acpi_iort_node *node)
+{
+	unsigned int val;
+	struct iort_pci_devid *devid;
+	struct acpi_viot_iort_virtio_pci_iommu *viommu;
+
+	viommu = (struct acpi_viot_iort_virtio_pci_iommu *)node->node_data;
+
+	val = le32_to_cpu(viommu->devid);
+
+	devid = kzalloc(sizeof(*devid), GFP_KERNEL);
+	if (!devid)
+		return ERR_PTR(-ENOMEM);
+
+	devid->segment = val >> 16;
+	devid->bus = PCI_BUS_NUM(val);
+	devid->devfn = val & 0xff;
+
+	return devid;
+}
+
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
@@ -1462,6 +1497,7 @@ struct iort_dev_config {
 	int (*dev_set_proximity)(struct device *dev,
 				    struct acpi_iort_node *node);
 	int (*dev_add_platdata)(struct platform_device *pdev);
+	struct iort_pci_devid *(*dev_get_pci_devid)(struct acpi_iort_node *node);
 };
 
 static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
@@ -1494,6 +1530,10 @@ static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = {
 	.dev_init_resources = viommu_mmio_init_resources,
 };
 
+static const struct iort_dev_config iort_viommu_pci_cfg __initconst = {
+	.dev_get_pci_devid = viommu_pci_get_devid,
+};
+
 static __init const struct iort_dev_config *iort_get_dev_cfg(
 			struct acpi_iort_node *node)
 {
@@ -1510,6 +1550,8 @@ static __init const struct iort_dev_config *iort_get_dev_cfg(
 		switch (node->type) {
 		case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
 			return &iort_viommu_mmio_cfg;
+		case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
+			return &iort_viommu_pci_cfg;
 		}
 	}
 
@@ -1641,13 +1683,55 @@ static void __init iort_enable_acs(struct acpi_iort_node *iort_node)
 static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { }
 #endif
 
-static void __init iort_init_platform_devices(void)
+static int __init iort_init_node(struct acpi_iort_node *iort_node)
+{
+	int ret;
+	const struct iort_dev_config *ops;
+	struct fwnode_handle *fwnode;
+
+	iort_enable_acs(iort_node);
+
+	ops = iort_get_dev_cfg(iort_node);
+	if (!ops)
+		return 0;
+
+	if (ops->dev_get_pci_devid) {
+		struct iort_pci_devid *pci_devid =
+			ops->dev_get_pci_devid(iort_node);
+
+		if (IS_ERR(pci_devid))
+			return PTR_ERR(pci_devid);
+		/*
+		 * For a PCI-based IOMMU, set the pci_devid handle now, but
+		 * leave the fwnode empty. It will be completed later when the
+		 * PCI device gets probed.
+		 */
+		iort_set_fwnode(iort_node, NULL, pci_devid);
+
+		return 0;
+	}
+
+	fwnode = acpi_alloc_fwnode_static();
+	if (!fwnode)
+		return -ENOMEM;
+
+	iort_set_fwnode(iort_node, fwnode, NULL);
+
+	ret = iort_add_platform_device(iort_node, ops);
+	if (ret) {
+		iort_delete_fwnode(iort_node);
+		acpi_free_fwnode_static(fwnode);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __init iort_init_devices(void)
 {
 	struct acpi_iort_node *iort_node, *iort_end;
 	struct acpi_table_iort *iort;
-	struct fwnode_handle *fwnode;
-	int i, ret;
-	const struct iort_dev_config *ops;
+	int i;
 
 	/*
 	 * iort_table and iort both point to the start of IORT table, but
@@ -1667,23 +1751,8 @@ static void __init iort_init_platform_devices(void)
 			return;
 		}
 
-		iort_enable_acs(iort_node);
-
-		ops = iort_get_dev_cfg(iort_node);
-		if (ops) {
-			fwnode = acpi_alloc_fwnode_static();
-			if (!fwnode)
-				return;
-
-			iort_set_fwnode(iort_node, fwnode);
-
-			ret = iort_add_platform_device(iort_node, ops);
-			if (ret) {
-				iort_delete_fwnode(iort_node);
-				acpi_free_fwnode_static(fwnode);
-				return;
-			}
-		}
+		if (iort_init_node(iort_node))
+			return;
 
 		iort_node = ACPI_ADD_PTR(struct acpi_iort_node, iort_node,
 					 iort_node->length);
@@ -1703,7 +1772,7 @@ void __init acpi_iort_register_table(struct acpi_table_header *table,
 	iort_table = table;
 	iort_table_source = source;
 
-	iort_init_platform_devices();
+	iort_init_devices();
 }
 
 void __init acpi_iort_init(void)
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (5 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode Jean-Philippe Brucker
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

When the IOMMU is PCI-based, IORT doesn't know the fwnode until the
driver has had a chance to register it. In addition to deferring the
probe until the IOMMU ops are set, also defer the probe until the fwspec
is available.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/acpi/iort.c | 54 ++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index b517aa4e83ba..f08f72d8af78 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -61,6 +61,22 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
 	}
 }
 
+static inline bool iort_iommu_driver_enabled(u8 type)
+{
+	switch (type) {
+	case ACPI_IORT_NODE_SMMU_V3:
+		return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+	case ACPI_IORT_NODE_SMMU:
+		return IS_BUILTIN(CONFIG_ARM_SMMU);
+	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+	case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
+		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
+	default:
+		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
+		return false;
+	}
+}
+
 /**
  * iort_set_fwnode() - Create iort_fwnode and use it to register
  *		       iommu data in the iort_fwnode_list
@@ -102,9 +118,9 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
  *
  * Returns: fwnode_handle pointer on success, NULL on failure
  */
-static inline struct fwnode_handle *iort_get_fwnode(
-			struct acpi_iort_node *node)
+static inline struct fwnode_handle *iort_get_fwnode(struct acpi_iort_node *node)
 {
+	int err = -ENODEV;
 	struct iort_fwnode *curr;
 	struct fwnode_handle *fwnode = NULL;
 
@@ -112,12 +128,20 @@ static inline struct fwnode_handle *iort_get_fwnode(
 	list_for_each_entry(curr, &iort_fwnode_list, list) {
 		if (curr->iort_node == node) {
 			fwnode = curr->fwnode;
+			if (!fwnode && curr->pci_devid) {
+				/*
+				 * Postpone probe until virtio-iommu has
+				 * registered its fwnode.
+				 */
+				err = iort_iommu_driver_enabled(node->type) ?
+					-EPROBE_DEFER : -ENODEV;
+			}
 			break;
 		}
 	}
 	spin_unlock(&iort_fwnode_lock);
 
-	return fwnode;
+	return fwnode ?: ERR_PTR(err);
 }
 
 /**
@@ -874,22 +898,6 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 	return (resv == its->its_count) ? resv : -ENODEV;
 }
 
-static inline bool iort_iommu_driver_enabled(u8 type)
-{
-	switch (type) {
-	case ACPI_IORT_NODE_SMMU_V3:
-		return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
-	case ACPI_IORT_NODE_SMMU:
-		return IS_BUILTIN(CONFIG_ARM_SMMU);
-	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
-	case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
-		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
-	default:
-		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
-		return false;
-	}
-}
-
 static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
 			       struct fwnode_handle *fwnode,
 			       const struct iommu_ops *ops)
@@ -920,8 +928,8 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
 		return -ENODEV;
 
 	iort_fwnode = iort_get_fwnode(node);
-	if (!iort_fwnode)
-		return -ENODEV;
+	if (IS_ERR(iort_fwnode))
+		return PTR_ERR(iort_fwnode);
 
 	/*
 	 * If the ops look-up fails, this means that either
@@ -1618,8 +1626,8 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 
 	fwnode = iort_get_fwnode(node);
 
-	if (!fwnode) {
-		ret = -ENODEV;
+	if (IS_ERR(fwnode)) {
+		ret = PTR_ERR(fwnode);
 		goto dev_put;
 	}
 
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (6 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 09/13] iommu/virtio: Create fwnode if necessary Jean-Philippe Brucker
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

For a PCI-based IOMMU, IORT isn't in charge of allocating a fwnode. Let
the IOMMU driver update the fwnode associated to an IORT node when
available.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/acpi/iort.c       | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/acpi_iort.h |  4 ++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index f08f72d8af78..8263ab275b2b 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -1038,11 +1038,49 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
 
 	return ops;
 }
+
+/**
+ * iort_iommu_update_fwnode - update fwnode of a PCI IOMMU
+ * @dev: the IOMMU device
+ * @fwnode: the fwnode, or NULL to remove an existing fwnode
+ *
+ * A PCI device isn't instantiated by the IORT driver. The IOMMU driver sets or
+ * removes its fwnode using this function.
+ */
+void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode)
+{
+	struct pci_dev *pdev;
+	struct iort_fwnode *curr;
+	struct iort_pci_devid *devid;
+
+	if (!dev_is_pci(dev))
+		return;
+
+	pdev = to_pci_dev(dev);
+
+	spin_lock(&iort_fwnode_lock);
+	list_for_each_entry(curr, &iort_fwnode_list, list) {
+		devid = curr->pci_devid;
+		if (devid &&
+		    pci_domain_nr(pdev->bus) == devid->segment &&
+		    pdev->bus->number == devid->bus &&
+		    pdev->devfn == devid->devfn) {
+			WARN_ON(fwnode && curr->fwnode);
+			curr->fwnode = fwnode;
+			break;
+		}
+	}
+	spin_unlock(&iort_fwnode_lock);
+}
+EXPORT_SYMBOL_GPL(iort_iommu_update_fwnode);
 #else
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
 const struct iommu_ops *iort_iommu_configure(struct device *dev)
 { return NULL; }
+static void iort_iommu_update_fwnode(struct device *dev,
+				     struct fwnode_handle *fwnode)
+{ }
 #endif
 
 static int nc_dma_get_range(struct device *dev, u64 *size)
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f4db5fff07cf..840635e40d9d 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -43,6 +43,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
 const struct iommu_ops *iort_iommu_configure(struct device *dev);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
+void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode);
 #else
 static void acpi_iort_register_table(struct acpi_table_header *table,
 				     enum iort_table_source source)
@@ -63,6 +64,9 @@ static inline const struct iommu_ops *iort_iommu_configure(
 static inline
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
+static void iort_iommu_update_fwnode(struct device *dev,
+				     struct fwnode_handle *fwnode)
+{ }
 #endif
 
 #endif /* __ACPI_IORT_H__ */
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 09/13] iommu/virtio: Create fwnode if necessary
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (7 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 10/13] iommu/virtio: Update IORT fwnode Jean-Philippe Brucker
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

The presence of a fwnode on a PCI device depends on the platform. QEMU
q35, for example, creates an ACPI description for each PCI slot, but
QEMU virt (aarch64) doesn't. Since the IOMMU subsystem relies heavily on
fwnode to discover the DMA topology, create a fwnode for the
virtio-iommu if necessary, using the software_node framework.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/virtio-iommu.c | 56 ++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 3ea9d7682999..8efa368134c0 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -966,6 +966,48 @@ static struct iommu_ops viommu_ops = {
 	.of_xlate		= viommu_of_xlate,
 };
 
+static int viommu_set_fwnode(struct viommu_dev *viommu)
+{
+	/*
+	 * viommu->dev is the virtio device, its parent is the associated
+	 * transport device.
+	 */
+	struct device *dev = viommu->dev->parent;
+
+	/*
+	 * With device tree a fwnode is always present. With ACPI, on some
+	 * platforms a PCI device has a DSDT node describing the slot. On other
+	 * platforms, no fwnode is created and we have to do it ourselves.
+	 */
+	if (!dev->fwnode) {
+		struct fwnode_handle *fwnode;
+
+		fwnode = fwnode_create_software_node(NULL, NULL);
+		if (IS_ERR(fwnode))
+			return PTR_ERR(fwnode);
+
+		set_primary_fwnode(dev, fwnode);
+	}
+
+	iommu_device_set_fwnode(&viommu->iommu, dev->fwnode);
+	return 0;
+}
+
+static void viommu_clear_fwnode(struct viommu_dev *viommu)
+{
+	struct device *dev = viommu->dev->parent;
+
+	if (!dev->fwnode)
+		return;
+
+	if (is_software_node(dev->fwnode)) {
+		struct fwnode_handle *fwnode = dev->fwnode;
+
+		set_primary_fwnode(dev, NULL);
+		fwnode_remove_software_node(fwnode);
+	}
+}
+
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
 	struct virtio_device *vdev = dev_to_virtio(viommu->dev);
@@ -1004,7 +1046,6 @@ static int viommu_fill_evtq(struct viommu_dev *viommu)
 
 static int viommu_probe(struct virtio_device *vdev)
 {
-	struct device *parent_dev = vdev->dev.parent;
 	struct viommu_dev *viommu = NULL;
 	struct device *dev = &vdev->dev;
 	u64 input_start = 0;
@@ -1084,9 +1125,11 @@ static int viommu_probe(struct virtio_device *vdev)
 	if (ret)
 		goto err_free_vqs;
 
-	iommu_device_set_ops(&viommu->iommu, &viommu_ops);
-	iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode);
+	ret = viommu_set_fwnode(viommu);
+	if (ret)
+		goto err_sysfs_remove;
 
+	iommu_device_set_ops(&viommu->iommu, &viommu_ops);
 	iommu_device_register(&viommu->iommu);
 
 #ifdef CONFIG_PCI
@@ -1119,8 +1162,10 @@ static int viommu_probe(struct virtio_device *vdev)
 	return 0;
 
 err_unregister:
-	iommu_device_sysfs_remove(&viommu->iommu);
 	iommu_device_unregister(&viommu->iommu);
+	viommu_clear_fwnode(viommu);
+err_sysfs_remove:
+	iommu_device_sysfs_remove(&viommu->iommu);
 err_free_vqs:
 	vdev->config->del_vqs(vdev);
 
@@ -1131,8 +1176,9 @@ static void viommu_remove(struct virtio_device *vdev)
 {
 	struct viommu_dev *viommu = vdev->priv;
 
-	iommu_device_sysfs_remove(&viommu->iommu);
 	iommu_device_unregister(&viommu->iommu);
+	viommu_clear_fwnode(viommu);
+	iommu_device_sysfs_remove(&viommu->iommu);
 
 	/* Stop all virtqueues */
 	vdev->config->reset(vdev);
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 10/13] iommu/virtio: Update IORT fwnode
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (8 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 09/13] iommu/virtio: Create fwnode if necessary Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC 11/13] ACPI: Add VIOT table Jean-Philippe Brucker
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

When the virtio-iommu uses the PCI transport and the topology is
described with IORT, register the PCI fwnode with IORT.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/virtio-iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 8efa368134c0..9847552faecc 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -7,6 +7,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/acpi_iort.h>
 #include <linux/amba/bus.h>
 #include <linux/delay.h>
 #include <linux/dma-iommu.h>
@@ -989,6 +990,8 @@ static int viommu_set_fwnode(struct viommu_dev *viommu)
 		set_primary_fwnode(dev, fwnode);
 	}
 
+	/* Tell IORT about a PCI device's fwnode */
+	iort_iommu_update_fwnode(dev, dev->fwnode);
 	iommu_device_set_fwnode(&viommu->iommu, dev->fwnode);
 	return 0;
 }
@@ -1000,6 +1003,8 @@ static void viommu_clear_fwnode(struct viommu_dev *viommu)
 	if (!dev->fwnode)
 		return;
 
+	iort_iommu_update_fwnode(dev, NULL);
+
 	if (is_software_node(dev->fwnode)) {
 		struct fwnode_handle *fwnode = dev->fwnode;
 
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 11/13] ACPI: Add VIOT table
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (9 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 10/13] iommu/virtio: Update IORT fwnode Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:49 ` [RFC virtio 12/13] virtio-iommu: Add built-in topology description Jean-Philippe Brucker
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

Add support for a new ACPI table that embeds other tables describing a
platform's IOMMU topology. Currently the only supported base table is
IORT. The VIOT contains an IORT with additional node types, that
describe a virtio-iommu.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/acpi/Kconfig      |  4 ++++
 drivers/acpi/Makefile     |  1 +
 drivers/acpi/bus.c        |  2 ++
 drivers/acpi/tables.c     |  2 +-
 drivers/acpi/viot.c       | 44 +++++++++++++++++++++++++++++++++++++++
 drivers/iommu/Kconfig     |  1 +
 include/linux/acpi_viot.h | 20 ++++++++++++++++++
 7 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/viot.c
 create mode 100644 include/linux/acpi_viot.h

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 548976c8b2b0..513a5e4d3526 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -579,6 +579,10 @@ config TPS68470_PMIC_OPREGION
 config ACPI_IORT
 	bool
 
+config ACPI_VIOT
+	bool
+	select ACPI_IORT
+
 endif	# ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 9d1792165713..6abdc6cc32c7 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -124,3 +124,4 @@ obj-y				+= dptf/
 
 obj-$(CONFIG_ARM64)		+= arm64/
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
+obj-$(CONFIG_ACPI_VIOT)		+= viot.o
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 48bc96d45bab..6f364e0c9240 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -25,6 +25,7 @@
 #include <linux/dmi.h>
 #endif
 #include <linux/acpi_iort.h>
+#include <linux/acpi_viot.h>
 #include <linux/pci.h>
 #include <acpi/apei.h>
 #include <linux/suspend.h>
@@ -1246,6 +1247,7 @@ static int __init acpi_init(void)
 
 	pci_mmcfg_late_init();
 	acpi_iort_init();
+	acpi_viot_init();
 	acpi_scan_init();
 	acpi_ec_init();
 	acpi_debugfs_init();
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 180ac4329763..9662ea5e1064 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -501,7 +501,7 @@ static const char * const table_sigs[] = {
 	ACPI_SIG_WDDT, ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT,
 	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
 	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
-	NULL };
+	ACPI_SIG_VIOT, NULL };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
new file mode 100644
index 000000000000..ab9a6e43ad9b
--- /dev/null
+++ b/drivers/acpi/viot.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2019 Linaro
+ *
+ * Virtual IOMMU table
+ */
+#define pr_fmt(fmt)	"ACPI: VIOT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/acpi_iort.h>
+#include <linux/acpi_viot.h>
+
+int __init acpi_viot_init(void)
+{
+	struct acpi_table_viot *viot;
+	struct acpi_table_header *acpi_header;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_VIOT, 0, &acpi_header);
+	if (ACPI_FAILURE(status)) {
+		if (status != AE_NOT_FOUND) {
+			const char *msg = acpi_format_exception(status);
+
+			pr_err("Failed to get table, %s\n", msg);
+			return -EINVAL;
+		}
+
+		return 0;
+	}
+
+	if (acpi_header->length < sizeof(*viot)) {
+		pr_err("VIOT table overflow, bad table!\n");
+		return -EINVAL;
+	}
+
+	viot = (struct acpi_table_viot *)acpi_header;
+	if (ACPI_COMPARE_NAMESEG(viot->base_table.signature, ACPI_SIG_IORT)) {
+		acpi_iort_register_table(&viot->base_table, IORT_SOURCE_VIOT);
+		return 0;
+	}
+
+	pr_err("Unknown base table header\n");
+	return -EINVAL;
+}
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..e6eb4f238d1a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,7 @@ config VIRTIO_IOMMU
 	depends on ARM64
 	select IOMMU_API
 	select INTERVAL_TREE
+	select ACPI_VIOT if ACPI
 	help
 	  Para-virtualised IOMMU driver with virtio.
 
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
new file mode 100644
index 000000000000..6c282d5eb793
--- /dev/null
+++ b/include/linux/acpi_viot.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Linaro
+ */
+
+#ifndef __ACPI_VIOT_H__
+#define __ACPI_VIOT_H__
+
+#ifdef CONFIG_ACPI_VIOT
+
+int acpi_viot_init(void);
+
+#else /* !CONFIG_ACPI_VIOT */
+
+static inline int acpi_viot_init(void)
+{}
+
+#endif /* !CONFIG_ACPI_VIOT */
+
+#endif /* __ACPI_VIOT_H__ */
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC virtio 12/13] virtio-iommu: Add built-in topology description
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (10 preceding siblings ...)
  2019-11-22 10:49 ` [RFC 11/13] ACPI: Add VIOT table Jean-Philippe Brucker
@ 2019-11-22 10:49 ` Jean-Philippe Brucker
  2019-11-22 10:50 ` [RFC 13/13] iommu/virtio: Add topology description to Jean-Philippe Brucker
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:49 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

Add a lightweight method to describe the IOMMU topology in the config
space, guarded by a new feature bit. A list of capabilities in the
config space describes the devices managed by the IOMMU and their
endpoint IDs.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 virtio-iommu.tex | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/virtio-iommu.tex b/virtio-iommu.tex
index 28c562b..2b29873 100644
--- a/virtio-iommu.tex
+++ b/virtio-iommu.tex
@@ -67,6 +67,9 @@ \subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits}
 
 \item[VIRTIO_IOMMU_F_MMIO (5)]
   The VIRTIO_IOMMU_MAP_F_MMIO flag is available.
+
+\item[VIRTIO_IOMMU_F_TOPOLOGY (6)]
+  Topology description is available at \field{topo_offset}.
 \end{description}
 
 \drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits}
@@ -97,6 +100,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device /
     le32 end;
   } domain_range;
   le32 probe_size;
+  le16 topo_offset;
 };
 \end{lstlisting}
 
@@ -141,6 +145,90 @@ \subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Devic
 If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
 device SHOULD NOT let endpoints access the guest-physical address space.
 
+\subsubsection{Built-in topology description}\label{sec:Device Types / IOMMU Device / Device initialization / topology}
+
+The device manages memory accesses from endpoints, identified by endpoint
+IDs. The driver can discover which endpoint ID corresponds to an endpoint
+using several methods, depending on the platform. Platforms described
+with device tree use the \texttt{iommus} and \texttt{iommu-map} properties
+embedded into device nodes for this purpose. Platforms described with
+ACPI use a table such as the Virtual I/O Table. Platforms that do not
+support either device tree or ACPI may embed a minimalistic description
+in the device configuration space.
+
+An important disadvantage of describing the topology from within the
+device is the lack of initialization ordering information. Out-of-band
+descriptions such as device tree and ACPI let the operating system know
+about device dependencies so that it can initialize supplier devices
+(IOMMUs) before their consumers (endpoints). Platforms using the
+VIRTIO_IOMMU_F_TOPOLOGY feature have to communicate the device dependency
+in another way.
+
+If the VIRTIO_IOMMU_F_TOPOLOGY feature is negotiated, \field{topo_offset}
+is the offset between the beginning of the device-specific configuration
+space (virtio_iommu_config) and the first topology structure header. A
+topology structures defines the endpoint ID of one or more endpoints
+managed by the virtio-iommu device.
+
+\begin{lstlisting}
+struct virtio_iommu_topo_head {
+  le16 type;
+  le16 next;
+};
+\end{lstlisting}
+
+\field{next} is the offset between the beginning of the device-specific
+configuration space and the next topology structure header. When
+\field{next} is zero, this is the last structure.
+
+\field{type} describes the type of structure:
+\begin{description}
+  \item[VIRTIO_IOMMU_TOPO_PCI_RANGE (0)] struct virtio_iommu_topo_pci_range
+  \item[VIRTIO_IOMMU_TOPO_ENDPOINT (1)] struct virtio_iommu_topo_endpoint
+\end{description}
+
+\paragraph{PCI range}\label{sec:Device Types / IOMMU Device / Device initialization / topology / PCI range}
+
+\begin{lstlisting}
+struct virtio_iommu_topo_pci_range {
+  struct virtio_iommu_topo_head head;
+  le32 endpoint_start;
+  le16 hierarchy;
+  le16 requester_start;
+  le16 requester_end;
+  le16 reserved;
+};
+\end{lstlisting}
+
+The PCI range structure describes the endpoint IDs of a series of PCI
+devices.
+
+\begin{description}
+  \item[\field{hierarchy}] Identifier of the PCI hierarchy. Sometimes
+    called PCI segment or domain number.
+  \item[\field{requester_start}] First requester ID in the range.
+  \item[\field{requester_end}] Last requester ID in the range.
+  \item[\field{endpoint_start}] First endpoint ID.
+\end{description}
+
+The correspondence between a PCI requester ID in the range
+[ requester_start; requester_end ] and its endpoint IDs is a linear
+transformation: endpoint_id = requester_id - requester_start +
+endpoint_start.
+
+\paragraph{Single endpoint}\label{sec:Device Types / IOMMU Device / Device initialization / topology / Single endpoint}
+
+\begin{lstlisting}
+struct virtio_iommu_topo_endpoint {
+  struct virtio_iommu_topo_head head;
+  le32 endpoint;
+  le64 address;
+};
+\end{lstlisting}
+
+\field{endpoint} is the ID of a single endpoint, identified by its first
+MMIO address in the physical address space.
+
 \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations}
 
 Driver send requests on the request virtqueue, notifies the device and
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 13/13] iommu/virtio: Add topology description to
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (11 preceding siblings ...)
  2019-11-22 10:49 ` [RFC virtio 12/13] virtio-iommu: Add built-in topology description Jean-Philippe Brucker
@ 2019-11-22 10:50 ` Jean-Philippe Brucker
  2019-11-22 12:53   ` Michael S. Tsirkin
  2019-11-22 13:00 ` [RFC 00/13] virtio-iommu on non-devicetree platforms Michael S. Tsirkin
  2019-11-23  0:01 ` Jacob Pan (Jun)
  14 siblings, 1 reply; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-22 10:50 UTC (permalink / raw)
  To: linux-acpi, linux-arm-kernel, iommu, virtualization, linux-pci,
	virtio-dev
  Cc: kevin.tian, mst, gregkh, sudeep.holla, rjw, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, lenb

Some hypervisors don't implement either device-tree or ACPI, but still
need a method to describe the IOMMU topology. Read the virtio-iommu
config early and parse the topology description. Hook into the
dma_setup() callbacks to initialize the IOMMU before probing endpoints.

If the virtio-iommu uses the virtio-pci transport, this will only work
if the PCI root complex is the first device probed. We don't currently
support virtio-mmio.

Initially I tried to generate a fake IORT table and feed it to the IORT
driver, in order to avoid rewriting the whole DMA code, but it wouldn't
work with platform endpoints, which are references to items in the ACPI
table on IORT.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

---
Note that we only call virt_dma_configure() if the host didn't provide
either DT or ACPI method. If you want to test this with QEMU, you'll
need to manually disable the acpi_dma_configure() part in pci-driver.c
---
 drivers/base/platform.c               |   3 +
 drivers/iommu/Kconfig                 |   9 +
 drivers/iommu/Makefile                |   1 +
 drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
 drivers/iommu/virtio-iommu.c          |   3 +
 drivers/pci/pci-driver.c              |   3 +
 include/linux/virtio_iommu.h          |  18 ++
 include/uapi/linux/virtio_iommu.h     |  26 ++
 8 files changed, 473 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu-topology.c
 create mode 100644 include/linux/virtio_iommu.h

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b230beb6ccb4..70b12c8ef2fb 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -27,6 +27,7 @@
 #include <linux/limits.h>
 #include <linux/property.h>
 #include <linux/kmemleak.h>
+#include <linux/virtio_iommu.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
 	} else if (has_acpi_companion(dev)) {
 		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
 		ret = acpi_dma_configure(dev, attr);
+	} else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
+		ret = virt_dma_configure(dev);
 	}
 
 	return ret;
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e6eb4f238d1a..d02c0d36019d 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -486,4 +486,13 @@ config VIRTIO_IOMMU
 
 	  Say Y here if you intend to run this kernel as a guest.
 
+config VIRTIO_IOMMU_TOPOLOGY
+	bool "Topology properties for the virtio-iommu"
+	depends on VIRTIO_IOMMU
+	help
+	  Enable early probing of the virtio-iommu device, to detect the
+	  topology description.
+
+	  Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f926e73..6b51c4186ebc 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c
new file mode 100644
index 000000000000..ec22510ace3d
--- /dev/null
+++ b/drivers/iommu/virtio-iommu-topology.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/acpi.h>
+#include <linux/acpi_iort.h>
+#include <linux/dma-iommu.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/printk.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_iommu.h>
+#include <linux/virtio_pci.h>
+#include <uapi/linux/virtio_iommu.h>
+
+struct viommu_cap_config {
+	u8 pos; /* PCI capability position */
+	u8 bar;
+	u32 length; /* structure size */
+	u32 offset; /* structure offset within the bar */
+};
+
+struct viommu_spec {
+	struct device		*dev; /* transport device */
+	struct fwnode_handle	*fwnode;
+	struct iommu_ops	*ops;
+	struct list_head	topology;
+	struct list_head	list;
+};
+
+struct viommu_topology {
+	union {
+		struct virtio_iommu_topo_head head;
+		struct virtio_iommu_topo_pci_range pci;
+		struct virtio_iommu_topo_endpoint ep;
+	};
+	/* Index into viommu_spec->topology */
+	struct list_head list;
+};
+
+static LIST_HEAD(viommus);
+static DEFINE_MUTEX(viommus_lock);
+
+#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
+
+static inline int viommu_find_capability(struct pci_dev *dev, u8 cfg_type,
+					 struct viommu_cap_config *cap)
+{
+	int pos;
+	u8 bar;
+
+	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+	     pos > 0;
+	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+		u8 type;
+
+		pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
+		if (type != cfg_type)
+			continue;
+
+		pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar);
+
+		/* Ignore structures with reserved BAR values */
+		if (type != VIRTIO_PCI_CAP_PCI_CFG && bar > 0x5)
+			continue;
+
+		cap->bar = bar;
+		cap->pos = pos;
+		pci_read_config_dword(dev, pos + VPCI_FIELD(length),
+				      &cap->length);
+		pci_read_config_dword(dev, pos + VPCI_FIELD(offset),
+				      &cap->offset);
+
+		return pos;
+	}
+	return 0;
+}
+
+/*
+ * Setup the special virtio PCI capability to read one of the config registers
+ */
+static int viommu_switch_pci_cfg(struct pci_dev *dev, int cfg,
+				 struct viommu_cap_config *cap, u32 length,
+				 u32 offset)
+{
+	offset += cap->offset;
+
+	if (offset + length > cap->offset + cap->length) {
+		dev_warn(&dev->dev,
+			 "read of %d bytes at offset 0x%x overflows cap of size %d\n",
+			 length, offset, cap->length);
+		return -EOVERFLOW;
+	}
+
+	pci_write_config_byte(dev, cfg + VPCI_FIELD(bar), cap->bar);
+	pci_write_config_dword(dev, cfg + VPCI_FIELD(length), length);
+	pci_write_config_dword(dev, cfg + VPCI_FIELD(offset), offset);
+	return 0;
+}
+
+static u32 viommu_cread(struct pci_dev *dev, int cfg,
+			struct viommu_cap_config *cap, u32 length, u32 offset)
+{
+	u8 val8;
+	u16 val16;
+	u32 val32;
+	int out = cfg + sizeof(struct virtio_pci_cap);
+
+	if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset))
+		return 0;
+
+	switch (length) {
+	case 1:
+		pci_read_config_byte(dev, out, &val8);
+		return val8;
+	case 2:
+		pci_read_config_word(dev, out, &val16);
+		return val16;
+	case 4:
+		pci_read_config_dword(dev, out, &val32);
+		return val32;
+	default:
+		WARN_ON(1);
+		return 0;
+	}
+}
+
+static void viommu_cwrite(struct pci_dev *dev, int cfg,
+			  struct viommu_cap_config *cap, u32 length, u32 offset,
+			  u32 val)
+{
+	int out = cfg + sizeof(struct virtio_pci_cap);
+
+	if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset))
+		return;
+
+	switch (length) {
+	case 1:
+		pci_write_config_byte(dev, out, (u8)val);
+		break;
+	case 2:
+		pci_write_config_word(dev, out, (u16)val);
+		break;
+	case 4:
+		pci_write_config_dword(dev, out, val);
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
+static int viommu_add_topology(struct viommu_spec *viommu_spec,
+			       struct viommu_topology *cap)
+{
+	struct viommu_topology *new = kmemdup(cap, sizeof(*cap), GFP_KERNEL);
+
+	if (!new)
+		return -ENOMEM;
+
+	mutex_lock(&viommus_lock);
+	list_add(&new->list, &viommu_spec->topology);
+	mutex_unlock(&viommus_lock);
+	return 0;
+}
+
+static int viommu_parse_topology(struct pci_dev *dev, int pci_cfg,
+				 struct viommu_cap_config *dev_cfg)
+{
+	u32 offset;
+	struct viommu_topology cap;
+	struct viommu_spec *viommu_spec;
+	int iter = 0; /* Protects against config loop */
+
+	offset = viommu_cread(dev, pci_cfg, dev_cfg, 2,
+			      offsetof(struct virtio_iommu_config,
+				       topo_offset));
+	if (!offset)
+		return 0;
+
+	viommu_spec = kzalloc(sizeof(*viommu_spec), GFP_KERNEL);
+	if (!viommu_spec)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&viommu_spec->topology);
+	viommu_spec->dev = &dev->dev;
+
+	while (offset >= sizeof(struct virtio_iommu_config) && ++iter < 0x10000) {
+		memset(&cap, 0, sizeof(cap));
+
+		cap.head.type = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset);
+		cap.head.next = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset + 2);
+
+		switch (cap.head.type) {
+		case VIRTIO_IOMMU_TOPO_PCI_RANGE:
+			cap.pci.endpoint_start = viommu_cread(dev, pci_cfg,
+							      dev_cfg, 2, offset
+							      + 4);
+			cap.pci.hierarchy = viommu_cread(dev, pci_cfg, dev_cfg,
+							 2, offset + 8);
+			cap.pci.requester_start = viommu_cread(dev, pci_cfg,
+							       dev_cfg, 2,
+							       offset + 10);
+			cap.pci.requester_end = viommu_cread(dev, pci_cfg,
+							     dev_cfg, 2, offset +
+							     12);
+			dev_info(&dev->dev,
+				 "topology: adding PCI range 0x%x [0x%x:0x%x] -> 0x%x\n",
+				 cap.pci.hierarchy, cap.pci.requester_start,
+				 cap.pci.requester_end, cap.pci.endpoint_start);
+			if (viommu_add_topology(viommu_spec, &cap))
+				return -ENOMEM;
+			break;
+		case VIRTIO_IOMMU_TOPO_ENDPOINT:
+			cap.ep.endpoint = viommu_cread(dev, pci_cfg, dev_cfg, 2,
+						       offset + 4);
+			cap.ep.address = viommu_cread(dev, pci_cfg, dev_cfg, 2,
+						      offset + 8);
+			dev_info(&dev->dev,
+				 "topology: adding endpoint 0x%llx -> 0x%x\n",
+				 cap.ep.address, cap.ep.endpoint);
+			if (viommu_add_topology(viommu_spec, &cap))
+				return -ENOMEM;
+			break;
+		default:
+			dev_warn(&dev->dev, "Unknown topo structure 0x%x\n",
+				 cap.head.type);
+			break;
+		}
+
+		offset = cap.head.next;
+	}
+
+	/* TODO: handle device removal */
+	mutex_lock(&viommus_lock);
+	list_add(&viommu_spec->list, &viommus);
+	mutex_unlock(&viommus_lock);
+
+	return 0;
+}
+
+static void viommu_pci_parse_topology(struct pci_dev *dev)
+{
+	int pos;
+	u32 features;
+	struct viommu_cap_config common = {0};
+	struct viommu_cap_config pci_cfg = {0};
+	struct viommu_cap_config dev_cfg = {0};
+
+	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, &common);
+	if (!pos) {
+		dev_warn(&dev->dev, "common capability not found\n");
+		return;
+	}
+	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &dev_cfg);
+	if (!pos) {
+		dev_warn(&dev->dev, "device config capability not found\n");
+		return;
+	}
+	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_PCI_CFG, &pci_cfg);
+	if (!pos) {
+		dev_warn(&dev->dev, "PCI config capability not found\n");
+		return;
+	}
+
+	/* Find out if the device supports topology description */
+	viommu_cwrite(dev, pos, &common, 4,
+		      offsetof(struct virtio_pci_common_cfg,
+			       device_feature_select),
+		      0);
+	features = viommu_cread(dev, pos, &common, 4,
+				offsetof(struct virtio_pci_common_cfg,
+					 device_feature));
+	if (!(features & VIRTIO_IOMMU_F_TOPOLOGY)) {
+		dev_dbg(&dev->dev, "device doesn't have topology description");
+		return;
+	}
+
+	viommu_parse_topology(dev, pos, &dev_cfg);
+}
+
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1014,
+			viommu_pci_parse_topology);
+
+static const struct iommu_ops *virt_iommu_setup(struct device *dev)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	const struct iommu_ops *viommu_ops = NULL;
+	struct fwnode_handle *viommu_fwnode;
+	struct viommu_spec *viommu_spec;
+	struct viommu_topology *topo;
+	struct pci_dev *pdev = NULL;
+	struct device *viommu_dev;
+	bool found = false;
+	u16 devid;
+	u32 eid;
+	int ret;
+
+	/* Already translated? */
+	if (fwspec && fwspec->ops)
+		return fwspec->ops;
+
+	if (dev_is_pci(dev)) {
+		pdev = to_pci_dev(dev);
+		devid = pci_dev_id(pdev);
+	} else {
+		/* TODO: Do something with devres */
+		return NULL;
+	}
+
+	mutex_lock(&viommus_lock);
+	list_for_each_entry(viommu_spec, &viommus, list) {
+		list_for_each_entry(topo, &viommu_spec->topology, list) {
+			if (pdev &&
+			    topo->head.type == VIRTIO_IOMMU_TOPO_PCI_RANGE &&
+			    pci_domain_nr(pdev->bus) == topo->pci.hierarchy &&
+			    devid >= topo->pci.requester_start &&
+			    devid <= topo->pci.requester_end) {
+				found = true;
+				eid = devid - topo->pci.requester_start +
+					topo->pci.endpoint_start;
+				break;
+			} else if (!pdev) {
+				/* TODO: compare address with devres */
+			}
+		}
+		if (found) {
+			viommu_ops = viommu_spec->ops;
+			viommu_fwnode = viommu_spec->fwnode;
+			viommu_dev = viommu_spec->dev;
+			break;
+		}
+	}
+	mutex_unlock(&viommus_lock);
+	if (!found)
+		return NULL;
+
+	/* We're not translating ourselves, that would be silly. */
+	if (viommu_dev == dev)
+		return NULL;
+
+	if (!viommu_ops)
+		return ERR_PTR(-EPROBE_DEFER);
+
+	ret = iommu_fwspec_init(dev, viommu_fwnode, viommu_ops);
+	if (ret)
+		return ERR_PTR(ret);
+
+	iommu_fwspec_add_ids(dev, &eid, 1);
+
+	return viommu_ops;
+}
+
+/**
+ * virt_dma_configure - Configure DMA of virtualized devices
+ * @dev: the endpoint
+ *
+ * An alternative to the ACPI and DT methods to setup DMA and the IOMMU ops of a
+ * virtual device.
+ *
+ * Return: -EPROBE_DEFER if the IOMMU hasn't been loaded yet, 0 otherwise
+ */
+int virt_dma_configure(struct device *dev)
+{
+	const struct iommu_ops *iommu_ops;
+
+	/* TODO: do we need to mess about with the dma_mask as well? */
+	WARN_ON(!dev->dma_mask);
+
+	iommu_ops = virt_iommu_setup(dev);
+	if (IS_ERR(iommu_ops)) {
+		if (PTR_ERR(iommu_ops) == -EPROBE_DEFER)
+			return -EPROBE_DEFER;
+		iommu_ops = NULL;
+	}
+
+	/*
+	 * If we have reason to believe the IOMMU driver missed the initial
+	 * add_device callback for dev, replay it to get things in order.
+	 */
+	if (iommu_ops && dev->bus && !device_iommu_mapped(dev))
+		iommu_probe_device(dev);
+
+#ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
+	/* Assume coherent, as well as full 64-bit addresses. */
+	arch_setup_dma_ops(dev, 0, ~0UL, iommu_ops, true);
+#else
+	if (iommu_ops)
+		iommu_setup_dma_ops(dev, 0, ~0UL);
+#endif
+	return 0;
+}
+
+/**
+ * virt_set_iommu_ops - Set the IOMMU ops of a virtual IOMMU device
+ *
+ * Setup the iommu_ops associated to a viommu_spec, once the driver is loaded
+ * and the device probed.
+ */
+void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
+{
+	struct viommu_spec *viommu_spec;
+
+	mutex_lock(&viommus_lock);
+	list_for_each_entry(viommu_spec, &viommus, list) {
+		if (viommu_spec->dev == dev) {
+			viommu_spec->ops = ops;
+			viommu_spec->fwnode = ops ? dev->fwnode : NULL;
+			break;
+		}
+	}
+	mutex_unlock(&viommus_lock);
+}
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 9847552faecc..f68ee9615b38 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -22,6 +22,7 @@
 #include <linux/virtio.h>
 #include <linux/virtio_config.h>
 #include <linux/virtio_ids.h>
+#include <linux/virtio_iommu.h>
 #include <linux/wait.h>
 
 #include <uapi/linux/virtio_iommu.h>
@@ -1134,6 +1135,7 @@ static int viommu_probe(struct virtio_device *vdev)
 	if (ret)
 		goto err_sysfs_remove;
 
+	virt_set_iommu_ops(dev->parent, &viommu_ops);
 	iommu_device_set_ops(&viommu->iommu, &viommu_ops);
 	iommu_device_register(&viommu->iommu);
 
@@ -1182,6 +1184,7 @@ static void viommu_remove(struct virtio_device *vdev)
 	struct viommu_dev *viommu = vdev->priv;
 
 	iommu_device_unregister(&viommu->iommu);
+	virt_set_iommu_ops(vdev->dev.parent, NULL);
 	viommu_clear_fwnode(viommu);
 	iommu_device_sysfs_remove(&viommu->iommu);
 
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a8124e47bf6e..d9b5e902ad18 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -17,6 +17,7 @@
 #include <linux/suspend.h>
 #include <linux/kexec.h>
 #include <linux/of_device.h>
+#include <linux/virtio_iommu.h>
 #include <linux/acpi.h>
 #include "pci.h"
 #include "pcie/portdrv.h"
@@ -1633,6 +1634,8 @@ static int pci_dma_configure(struct device *dev)
 		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
 
 		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
+	} else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
+		ret = virt_dma_configure(dev);
 	}
 
 	pci_put_host_bridge_device(bridge);
diff --git a/include/linux/virtio_iommu.h b/include/linux/virtio_iommu.h
new file mode 100644
index 000000000000..b700256f1063
--- /dev/null
+++ b/include/linux/virtio_iommu.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef VIRTIO_IOMMU_H_
+#define VIRTIO_IOMMU_H_
+
+#if IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)
+int virt_dma_configure(struct device *dev);
+void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
+#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */
+static inline int virt_dma_configure(struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
+{ }
+#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */
+
+#endif /* VIRTIO_IOMMU_H_ */
diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..d3b7cd2a076f 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS			3
 #define VIRTIO_IOMMU_F_PROBE			4
 #define VIRTIO_IOMMU_F_MMIO			5
+#define VIRTIO_IOMMU_F_TOPOLOGY			6
 
 struct virtio_iommu_range_64 {
 	__le64					start;
@@ -36,6 +37,31 @@ struct virtio_iommu_config {
 	struct virtio_iommu_range_32		domain_range;
 	/* Probe buffer size */
 	__le32					probe_size;
+	/* Offset to the beginning of the topology table */
+	__le16					topo_offset;
+};
+
+struct virtio_iommu_topo_head {
+	__le16					type;
+	__le16					next;
+};
+
+#define VIRTIO_IOMMU_TOPO_PCI_RANGE		0x0
+#define VIRTIO_IOMMU_TOPO_ENDPOINT		0x1
+
+struct virtio_iommu_topo_pci_range {
+	struct virtio_iommu_topo_head		head;
+	__le32					endpoint_start;
+	__le16					hierarchy;
+	__le16					requester_start;
+	__le16					requester_end;
+	__le16					reserved;
+};
+
+struct virtio_iommu_topo_endpoint {
+	struct virtio_iommu_topo_head		head;
+	__le32					endpoint;
+	__le64					address;
 };
 
 /* Request types */
-- 
2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC 13/13] iommu/virtio: Add topology description to
  2019-11-22 10:50 ` [RFC 13/13] iommu/virtio: Add topology description to Jean-Philippe Brucker
@ 2019-11-22 12:53   ` Michael S. Tsirkin
  2019-11-25 17:48     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2019-11-22 12:53 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

On Fri, Nov 22, 2019 at 11:50:00AM +0100, Jean-Philippe Brucker wrote:
> Some hypervisors don't implement either device-tree or ACPI, but still
> need a method to describe the IOMMU topology. Read the virtio-iommu
> config early and parse the topology description. Hook into the
> dma_setup() callbacks to initialize the IOMMU before probing endpoints.
> 
> If the virtio-iommu uses the virtio-pci transport, this will only work
> if the PCI root complex is the first device probed. We don't currently
> support virtio-mmio.
> 
> Initially I tried to generate a fake IORT table and feed it to the IORT
> driver, in order to avoid rewriting the whole DMA code, but it wouldn't
> work with platform endpoints, which are references to items in the ACPI
> table on IORT.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

Overall this looks good to me. The only point is that
I think the way the interface is designed makes writing
the driver a bit too difficult. Idea: if instead we just
have a length field and then an array of records
(preferably unions so we don't need to work hard),
we can shadow that into memory, then iterate over
the unions.

Maybe add a uniform record length + number of records field.
Then just skip types you do not know how to handle.
This will also help make sure it's within bounds.

What do you think?


You will need to do something to address the TODO I think.

> ---
> Note that we only call virt_dma_configure() if the host didn't provide
> either DT or ACPI method. If you want to test this with QEMU, you'll
> need to manually disable the acpi_dma_configure() part in pci-driver.c
> ---
>  drivers/base/platform.c               |   3 +
>  drivers/iommu/Kconfig                 |   9 +
>  drivers/iommu/Makefile                |   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
>  drivers/iommu/virtio-iommu.c          |   3 +
>  drivers/pci/pci-driver.c              |   3 +
>  include/linux/virtio_iommu.h          |  18 ++
>  include/uapi/linux/virtio_iommu.h     |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b230beb6ccb4..70b12c8ef2fb 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -27,6 +27,7 @@
>  #include <linux/limits.h>
>  #include <linux/property.h>
>  #include <linux/kmemleak.h>
> +#include <linux/virtio_iommu.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
>  	} else if (has_acpi_companion(dev)) {
>  		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
>  		ret = acpi_dma_configure(dev, attr);
> +	} else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> +		ret = virt_dma_configure(dev);
>  	}
>  
>  	return ret;
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e6eb4f238d1a..d02c0d36019d 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -486,4 +486,13 @@ config VIRTIO_IOMMU
>  
>  	  Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY
> +	bool "Topology properties for the virtio-iommu"
> +	depends on VIRTIO_IOMMU
> +	help
> +	  Enable early probing of the virtio-iommu device, to detect the
> +	  topology description.
> +
> +	  Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 4f405f926e73..6b51c4186ebc 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index 000000000000..ec22510ace3d
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/acpi.h>
> +#include <linux/acpi_iort.h>
> +#include <linux/dma-iommu.h>
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/pci.h>
> +#include <linux/printk.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_iommu.h>
> +#include <linux/virtio_pci.h>
> +#include <uapi/linux/virtio_iommu.h>
> +
> +struct viommu_cap_config {
> +	u8 pos; /* PCI capability position */
> +	u8 bar;
> +	u32 length; /* structure size */
> +	u32 offset; /* structure offset within the bar */
> +};
> +
> +struct viommu_spec {
> +	struct device		*dev; /* transport device */
> +	struct fwnode_handle	*fwnode;
> +	struct iommu_ops	*ops;
> +	struct list_head	topology;
> +	struct list_head	list;
> +};
> +
> +struct viommu_topology {
> +	union {
> +		struct virtio_iommu_topo_head head;
> +		struct virtio_iommu_topo_pci_range pci;
> +		struct virtio_iommu_topo_endpoint ep;
> +	};
> +	/* Index into viommu_spec->topology */
> +	struct list_head list;
> +};
> +
> +static LIST_HEAD(viommus);
> +static DEFINE_MUTEX(viommus_lock);
> +
> +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
> +
> +static inline int viommu_find_capability(struct pci_dev *dev, u8 cfg_type,
> +					 struct viommu_cap_config *cap)
> +{
> +	int pos;
> +	u8 bar;
> +
> +	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +	     pos > 0;
> +	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> +		u8 type;
> +
> +		pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
> +		if (type != cfg_type)
> +			continue;
> +
> +		pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar);
> +
> +		/* Ignore structures with reserved BAR values */
> +		if (type != VIRTIO_PCI_CAP_PCI_CFG && bar > 0x5)
> +			continue;
> +
> +		cap->bar = bar;
> +		cap->pos = pos;
> +		pci_read_config_dword(dev, pos + VPCI_FIELD(length),
> +				      &cap->length);
> +		pci_read_config_dword(dev, pos + VPCI_FIELD(offset),
> +				      &cap->offset);
> +
> +		return pos;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Setup the special virtio PCI capability to read one of the config registers
> + */
> +static int viommu_switch_pci_cfg(struct pci_dev *dev, int cfg,
> +				 struct viommu_cap_config *cap, u32 length,
> +				 u32 offset)
> +{
> +	offset += cap->offset;
> +
> +	if (offset + length > cap->offset + cap->length) {
> +		dev_warn(&dev->dev,
> +			 "read of %d bytes at offset 0x%x overflows cap of size %d\n",
> +			 length, offset, cap->length);
> +		return -EOVERFLOW;
> +	}
> +
> +	pci_write_config_byte(dev, cfg + VPCI_FIELD(bar), cap->bar);
> +	pci_write_config_dword(dev, cfg + VPCI_FIELD(length), length);
> +	pci_write_config_dword(dev, cfg + VPCI_FIELD(offset), offset);
> +	return 0;
> +}
> +
> +static u32 viommu_cread(struct pci_dev *dev, int cfg,
> +			struct viommu_cap_config *cap, u32 length, u32 offset)
> +{
> +	u8 val8;
> +	u16 val16;
> +	u32 val32;
> +	int out = cfg + sizeof(struct virtio_pci_cap);
> +
> +	if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset))
> +		return 0;
> +
> +	switch (length) {
> +	case 1:
> +		pci_read_config_byte(dev, out, &val8);
> +		return val8;
> +	case 2:
> +		pci_read_config_word(dev, out, &val16);
> +		return val16;
> +	case 4:
> +		pci_read_config_dword(dev, out, &val32);
> +		return val32;
> +	default:
> +		WARN_ON(1);
> +		return 0;
> +	}
> +}
> +
> +static void viommu_cwrite(struct pci_dev *dev, int cfg,
> +			  struct viommu_cap_config *cap, u32 length, u32 offset,
> +			  u32 val)

A single user with 4 byte parameter. Just open-code?

> +{
> +	int out = cfg + sizeof(struct virtio_pci_cap);
> +
> +	if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset))
> +		return;
> +
> +	switch (length) {
> +	case 1:
> +		pci_write_config_byte(dev, out, (u8)val);
> +		break;
> +	case 2:
> +		pci_write_config_word(dev, out, (u16)val);
> +		break;
> +	case 4:
> +		pci_write_config_dword(dev, out, val);
> +		break;
> +	default:
> +		WARN_ON(1);
> +	}
> +}
> +
> +static int viommu_add_topology(struct viommu_spec *viommu_spec,
> +			       struct viommu_topology *cap)
> +{
> +	struct viommu_topology *new = kmemdup(cap, sizeof(*cap), GFP_KERNEL);
> +
> +	if (!new)
> +		return -ENOMEM;
> +
> +	mutex_lock(&viommus_lock);
> +	list_add(&new->list, &viommu_spec->topology);
> +	mutex_unlock(&viommus_lock);
> +	return 0;
> +}
> +
> +static int viommu_parse_topology(struct pci_dev *dev, int pci_cfg,
> +				 struct viommu_cap_config *dev_cfg)
> +{
> +	u32 offset;
> +	struct viommu_topology cap;
> +	struct viommu_spec *viommu_spec;
> +	int iter = 0; /* Protects against config loop */
> +
> +	offset = viommu_cread(dev, pci_cfg, dev_cfg, 2,
> +			      offsetof(struct virtio_iommu_config,
> +				       topo_offset));
> +	if (!offset)
> +		return 0;
> +
> +	viommu_spec = kzalloc(sizeof(*viommu_spec), GFP_KERNEL);
> +	if (!viommu_spec)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&viommu_spec->topology);
> +	viommu_spec->dev = &dev->dev;
> +
> +	while (offset >= sizeof(struct virtio_iommu_config) && ++iter < 0x10000) {
> +		memset(&cap, 0, sizeof(cap));
> +
> +		cap.head.type = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset);
> +		cap.head.next = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset + 2);

All of this doesn't seem to be endian-clean. Try running sparse I think
it will complain.

> +
> +		switch (cap.head.type) {
> +		case VIRTIO_IOMMU_TOPO_PCI_RANGE:
> +			cap.pci.endpoint_start = viommu_cread(dev, pci_cfg,
> +							      dev_cfg, 2, offset
> +							      + 4);
> +			cap.pci.hierarchy = viommu_cread(dev, pci_cfg, dev_cfg,
> +							 2, offset + 8);
> +			cap.pci.requester_start = viommu_cread(dev, pci_cfg,
> +							       dev_cfg, 2,
> +							       offset + 10);
> +			cap.pci.requester_end = viommu_cread(dev, pci_cfg,
> +							     dev_cfg, 2, offset +
> +							     12);
> +			dev_info(&dev->dev,
> +				 "topology: adding PCI range 0x%x [0x%x:0x%x] -> 0x%x\n",
> +				 cap.pci.hierarchy, cap.pci.requester_start,
> +				 cap.pci.requester_end, cap.pci.endpoint_start);
> +			if (viommu_add_topology(viommu_spec, &cap))
> +				return -ENOMEM;
> +			break;
> +		case VIRTIO_IOMMU_TOPO_ENDPOINT:
> +			cap.ep.endpoint = viommu_cread(dev, pci_cfg, dev_cfg, 2,
> +						       offset + 4);
> +			cap.ep.address = viommu_cread(dev, pci_cfg, dev_cfg, 2,
> +						      offset + 8);
> +			dev_info(&dev->dev,
> +				 "topology: adding endpoint 0x%llx -> 0x%x\n",
> +				 cap.ep.address, cap.ep.endpoint);
> +			if (viommu_add_topology(viommu_spec, &cap))
> +				return -ENOMEM;
> +			break;
> +		default:
> +			dev_warn(&dev->dev, "Unknown topo structure 0x%x\n",
> +				 cap.head.type);
> +			break;
> +		}
> +
> +		offset = cap.head.next;
> +	}
> +
> +	/* TODO: handle device removal */
> +	mutex_lock(&viommus_lock);
> +	list_add(&viommu_spec->list, &viommus);
> +	mutex_unlock(&viommus_lock);
> +
> +	return 0;
> +}
> +
> +static void viommu_pci_parse_topology(struct pci_dev *dev)
> +{
> +	int pos;
> +	u32 features;
> +	struct viommu_cap_config common = {0};
> +	struct viommu_cap_config pci_cfg = {0};
> +	struct viommu_cap_config dev_cfg = {0};
> +
> +	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, &common);
> +	if (!pos) {
> +		dev_warn(&dev->dev, "common capability not found\n");
> +		return;
> +	}
> +	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &dev_cfg);
> +	if (!pos) {
> +		dev_warn(&dev->dev, "device config capability not found\n");
> +		return;
> +	}
> +	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_PCI_CFG, &pci_cfg);
> +	if (!pos) {
> +		dev_warn(&dev->dev, "PCI config capability not found\n");
> +		return;
> +	}
> +
> +	/* Find out if the device supports topology description */
> +	viommu_cwrite(dev, pos, &common, 4,
> +		      offsetof(struct virtio_pci_common_cfg,
> +			       device_feature_select),
> +		      0);
> +	features = viommu_cread(dev, pos, &common, 4,
> +				offsetof(struct virtio_pci_common_cfg,
> +					 device_feature));
> +	if (!(features & VIRTIO_IOMMU_F_TOPOLOGY)) {
> +		dev_dbg(&dev->dev, "device doesn't have topology description");
> +		return;
> +	}
> +
> +	viommu_parse_topology(dev, pos, &dev_cfg);
> +}
> +
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1014,
> +			viommu_pci_parse_topology);
> +
> +static const struct iommu_ops *virt_iommu_setup(struct device *dev)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	const struct iommu_ops *viommu_ops = NULL;
> +	struct fwnode_handle *viommu_fwnode;
> +	struct viommu_spec *viommu_spec;
> +	struct viommu_topology *topo;
> +	struct pci_dev *pdev = NULL;
> +	struct device *viommu_dev;
> +	bool found = false;
> +	u16 devid;
> +	u32 eid;
> +	int ret;
> +
> +	/* Already translated? */
> +	if (fwspec && fwspec->ops)
> +		return fwspec->ops;
> +
> +	if (dev_is_pci(dev)) {
> +		pdev = to_pci_dev(dev);
> +		devid = pci_dev_id(pdev);
> +	} else {
> +		/* TODO: Do something with devres */
> +		return NULL;
> +	}
> +
> +	mutex_lock(&viommus_lock);
> +	list_for_each_entry(viommu_spec, &viommus, list) {
> +		list_for_each_entry(topo, &viommu_spec->topology, list) {
> +			if (pdev &&
> +			    topo->head.type == VIRTIO_IOMMU_TOPO_PCI_RANGE &&
> +			    pci_domain_nr(pdev->bus) == topo->pci.hierarchy &&
> +			    devid >= topo->pci.requester_start &&
> +			    devid <= topo->pci.requester_end) {
> +				found = true;
> +				eid = devid - topo->pci.requester_start +
> +					topo->pci.endpoint_start;
> +				break;
> +			} else if (!pdev) {
> +				/* TODO: compare address with devres */
> +			}
> +		}
> +		if (found) {
> +			viommu_ops = viommu_spec->ops;
> +			viommu_fwnode = viommu_spec->fwnode;
> +			viommu_dev = viommu_spec->dev;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&viommus_lock);
> +	if (!found)
> +		return NULL;
> +
> +	/* We're not translating ourselves, that would be silly. */
> +	if (viommu_dev == dev)
> +		return NULL;
> +
> +	if (!viommu_ops)
> +		return ERR_PTR(-EPROBE_DEFER);
> +
> +	ret = iommu_fwspec_init(dev, viommu_fwnode, viommu_ops);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	iommu_fwspec_add_ids(dev, &eid, 1);
> +
> +	return viommu_ops;
> +}
> +
> +/**
> + * virt_dma_configure - Configure DMA of virtualized devices
> + * @dev: the endpoint
> + *
> + * An alternative to the ACPI and DT methods to setup DMA and the IOMMU ops of a
> + * virtual device.
> + *
> + * Return: -EPROBE_DEFER if the IOMMU hasn't been loaded yet, 0 otherwise
> + */
> +int virt_dma_configure(struct device *dev)
> +{
> +	const struct iommu_ops *iommu_ops;
> +
> +	/* TODO: do we need to mess about with the dma_mask as well? */
> +	WARN_ON(!dev->dma_mask);
> +
> +	iommu_ops = virt_iommu_setup(dev);
> +	if (IS_ERR(iommu_ops)) {
> +		if (PTR_ERR(iommu_ops) == -EPROBE_DEFER)
> +			return -EPROBE_DEFER;
> +		iommu_ops = NULL;
> +	}
> +
> +	/*
> +	 * If we have reason to believe the IOMMU driver missed the initial
> +	 * add_device callback for dev, replay it to get things in order.
> +	 */
> +	if (iommu_ops && dev->bus && !device_iommu_mapped(dev))
> +		iommu_probe_device(dev);
> +
> +#ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> +	/* Assume coherent, as well as full 64-bit addresses. */
> +	arch_setup_dma_ops(dev, 0, ~0UL, iommu_ops, true);
> +#else
> +	if (iommu_ops)
> +		iommu_setup_dma_ops(dev, 0, ~0UL);
> +#endif
> +	return 0;
> +}
> +
> +/**
> + * virt_set_iommu_ops - Set the IOMMU ops of a virtual IOMMU device
> + *
> + * Setup the iommu_ops associated to a viommu_spec, once the driver is loaded
> + * and the device probed.
> + */
> +void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
> +{
> +	struct viommu_spec *viommu_spec;
> +
> +	mutex_lock(&viommus_lock);
> +	list_for_each_entry(viommu_spec, &viommus, list) {
> +		if (viommu_spec->dev == dev) {
> +			viommu_spec->ops = ops;
> +			viommu_spec->fwnode = ops ? dev->fwnode : NULL;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&viommus_lock);
> +}
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 9847552faecc..f68ee9615b38 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -22,6 +22,7 @@
>  #include <linux/virtio.h>
>  #include <linux/virtio_config.h>
>  #include <linux/virtio_ids.h>
> +#include <linux/virtio_iommu.h>
>  #include <linux/wait.h>
>  
>  #include <uapi/linux/virtio_iommu.h>
> @@ -1134,6 +1135,7 @@ static int viommu_probe(struct virtio_device *vdev)
>  	if (ret)
>  		goto err_sysfs_remove;
>  
> +	virt_set_iommu_ops(dev->parent, &viommu_ops);
>  	iommu_device_set_ops(&viommu->iommu, &viommu_ops);
>  	iommu_device_register(&viommu->iommu);
>  
> @@ -1182,6 +1184,7 @@ static void viommu_remove(struct virtio_device *vdev)
>  	struct viommu_dev *viommu = vdev->priv;
>  
>  	iommu_device_unregister(&viommu->iommu);
> +	virt_set_iommu_ops(vdev->dev.parent, NULL);
>  	viommu_clear_fwnode(viommu);
>  	iommu_device_sysfs_remove(&viommu->iommu);
>  
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d9b5e902ad18 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -17,6 +17,7 @@
>  #include <linux/suspend.h>
>  #include <linux/kexec.h>
>  #include <linux/of_device.h>
> +#include <linux/virtio_iommu.h>
>  #include <linux/acpi.h>
>  #include "pci.h"
>  #include "pcie/portdrv.h"
> @@ -1633,6 +1634,8 @@ static int pci_dma_configure(struct device *dev)
>  		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
>  
>  		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
> +	} else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> +		ret = virt_dma_configure(dev);
>  	}
>  
>  	pci_put_host_bridge_device(bridge);
> diff --git a/include/linux/virtio_iommu.h b/include/linux/virtio_iommu.h
> new file mode 100644
> index 000000000000..b700256f1063
> --- /dev/null
> +++ b/include/linux/virtio_iommu.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef VIRTIO_IOMMU_H_
> +#define VIRTIO_IOMMU_H_
> +
> +#if IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)
> +int virt_dma_configure(struct device *dev);
> +void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
> +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */
> +static inline int virt_dma_configure(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
> +{ }
> +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */
> +
> +#endif /* VIRTIO_IOMMU_H_ */
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..d3b7cd2a076f 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -16,6 +16,7 @@
>  #define VIRTIO_IOMMU_F_BYPASS			3
>  #define VIRTIO_IOMMU_F_PROBE			4
>  #define VIRTIO_IOMMU_F_MMIO			5
> +#define VIRTIO_IOMMU_F_TOPOLOGY			6
>  
>  struct virtio_iommu_range_64 {
>  	__le64					start;
> @@ -36,6 +37,31 @@ struct virtio_iommu_config {
>  	struct virtio_iommu_range_32		domain_range;
>  	/* Probe buffer size */
>  	__le32					probe_size;
> +	/* Offset to the beginning of the topology table */
> +	__le16					topo_offset;

why do we need an offset?

> +};
> +
> +struct virtio_iommu_topo_head {
> +	__le16					type;
> +	__le16					next;
> +};

So this linked list makes things harder than necessary imho.
It will be easier to just have a counter with # of records.
Then make all records the same size.
Then just read each record out into a buffer, and
handle it there.

> +
> +#define VIRTIO_IOMMU_TOPO_PCI_RANGE		0x0
> +#define VIRTIO_IOMMU_TOPO_ENDPOINT		0x1
> +
> +struct virtio_iommu_topo_pci_range {
> +	struct virtio_iommu_topo_head		head;
> +	__le32					endpoint_start;
> +	__le16					hierarchy;
> +	__le16					requester_start;
> +	__le16					requester_end;
> +	__le16					reserved;
> +};
> +
> +struct virtio_iommu_topo_endpoint {
> +	struct virtio_iommu_topo_head		head;
> +	__le32					endpoint;
> +	__le64					address;
>  };
>  
>  /* Request types */
> -- 
> 2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (12 preceding siblings ...)
  2019-11-22 10:50 ` [RFC 13/13] iommu/virtio: Add topology description to Jean-Philippe Brucker
@ 2019-11-22 13:00 ` Michael S. Tsirkin
  2019-11-25 17:53   ` Jean-Philippe Brucker
  2019-11-23  0:01 ` Jacob Pan (Jun)
  14 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2019-11-22 13:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

On Fri, Nov 22, 2019 at 11:49:47AM +0100, Jean-Philippe Brucker wrote:
> I'm seeking feedback on multi-platform support for virtio-iommu. At the
> moment only devicetree (DT) is supported and we don't have a pleasant
> solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
> 
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing DMA.
> It's a solved problem: firmware or hypervisor describes through DT or
> ACPI tables the device dependencies, and probe of endpoints is deferred
> until the IOMMU is probed. But:
> 
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
>     for Arm). From my point of view IORT is easier to extend, since we
>     just need to introduce a new node type. There are no dependencies to
>     Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> 
>     However, there are concerns about other OS vendors feeling obligated
>     to implement this new node, so Arm proposed introducing another ACPI
>     table, that can wrap any of DMAR, IVRS and IORT to extend it with
>     new virtual nodes. A draft of this VIOT table specification is
>     available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
> 
>     I'm afraid this could increase fragmentation as guests would need to
>     implement or modify their support for all of DMAR, IVRS and IORT. If
>     we end up doing VIOT, I suggest limiting it to IORT.
> 
> (2) In addition, there are some concerns about having virtio depend on
>     ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
>     [1])

power?

> don't currently implement those methods.
> 
>     It was suggested to embed the topology description into the device.
>     It can work, as demonstrated at the end of this RFC, with the
>     following limitations:
> 
>     - The topology description must be read before any endpoint managed
>       by the IOMMU is probed, and even before the virtio module is
>       loaded. This RFC uses a PCI quirk to manually parse the virtio
>       configuration. It assumes that all endpoints managed by the IOMMU
>       are under this same PCI host.
> 
>     - I don't have a solution for the virtio-mmio transport at the
>       moment, because I haven't had time to modify a host to test it. I
>       think it could either use a notifier on the platform bus, or
>       better, a new 'iommu' command-line argument to the virtio-mmio
>       driver.

	A notifier seems easier for users. What are the disadvantages of
	that?

>	So the current prototype doesn't work for firecracker and
>       microvm, which rely on virtio-mmio.
> 
>     - For Arm, if the platform has an ITS, the hypervisor needs IORT or
>       DT to describe it anyway. More generally, not using either ACPI or
>       DT might prevent from supporting other features as well. I suspect
>       the above users will have to implement a standard method sooner or
>       later.
> 
>     - Even when reusing as much existing code as possible, guest support
>       is still going to be around a few hundred lines since we can't
>       rely on the normal virtio infrastructure to be loaded at that
>       point. As you can see below, the diffstat for the incomplete
>       topology implementation is already bigger than the exhaustive IORT
>       support, even when jumping through the VIOT hoop.
> 
>     So it's a lightweight solution for very specific use-cases, and we
>     should still support ACPI for the general case. Multi-platform
>     guests such as Linux will then need to support three topology
>     descriptions instead of two.
> 
> In this RFC I present both solutions, but I'd rather not keep all of it.
> Please see the individual patches for details:
> 
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
>     driver and patches 2, 11 add the VIOT glue.
> 
> (2) Patch 12 adds the built-in topology description to the virtio-iommu
>     specification. Patch 13 is a partial implementation for the Linux
>     virtio-iommu driver. It only supports PCI, not platform devices.
> 
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
> 
> 
> I split the diffstat since there are two independent features. The first
> one is for patches 1-11, and the second one for patch 13.
> 
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virtio: Create fwnode if necessary
>   iommu/virtio: Update IORT fwnode
>   ACPI: Add VIOT table
> 
>  MAINTAINERS                     |   9 +
>  drivers/acpi/Kconfig            |   7 +
>  drivers/acpi/Makefile           |   2 +
>  drivers/acpi/arm64/Kconfig      |   3 -
>  drivers/acpi/arm64/Makefile     |   1 -
>  drivers/acpi/bus.c              |   2 +
>  drivers/acpi/{arm64 => }/iort.c | 317 ++++++++++++++++++++++++++------
>  drivers/acpi/tables.c           |   2 +-
>  drivers/acpi/viot.c             |  44 +++++
>  drivers/iommu/Kconfig           |   1 +
>  drivers/iommu/virtio-iommu.c    |  61 +++++-
>  include/acpi/actbl2.h           |  31 ++++
>  include/linux/acpi_iort.h       |  14 ++
>  include/linux/acpi_viot.h       |  20 ++
>  14 files changed, 448 insertions(+), 66 deletions(-)
>  rename drivers/acpi/{arm64 => }/iort.c (86%)
>  create mode 100644 drivers/acpi/viot.c
>  create mode 100644 include/linux/acpi_viot.h
> 
> Jean-Philippe Brucker (1):
>   iommu/virtio: Add topology description to virtio-iommu config space
> 
>  drivers/base/platform.c               |   3 +
>  drivers/iommu/Kconfig                 |   9 +
>  drivers/iommu/Makefile                |   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
>  drivers/iommu/virtio-iommu.c          |   3 +
>  drivers/pci/pci-driver.c              |   3 +
>  include/linux/virtio_iommu.h          |  18 ++
>  include/uapi/linux/virtio_iommu.h     |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
> 
> 
> [1] firecracker: https://github.com/firecracker-microvm/firecracker
>     microvm: https://github.com/qemu/qemu/blob/master/docs/microvm.rst
>     kvmtool: https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/
> -- 
> 2.24.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
                   ` (13 preceding siblings ...)
  2019-11-22 13:00 ` [RFC 00/13] virtio-iommu on non-devicetree platforms Michael S. Tsirkin
@ 2019-11-23  0:01 ` Jacob Pan (Jun)
  2019-11-25 18:02   ` Jean-Philippe Brucker
  14 siblings, 1 reply; 23+ messages in thread
From: Jacob Pan (Jun) @ 2019-11-23  0:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf, mst,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

On Fri, 22 Nov 2019 11:49:47 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> I'm seeking feedback on multi-platform support for virtio-iommu. At
> the moment only devicetree (DT) is supported and we don't have a
> pleasant solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
> 
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing
> DMA. It's a solved problem: firmware or hypervisor describes through
> DT or ACPI tables the device dependencies, and probe of endpoints is
> deferred until the IOMMU is probed. But:
> 
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and
> IORT for Arm). From my point of view IORT is easier to extend, since
> we just need to introduce a new node type. There are no dependencies
> to Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> 
From my limited understanding, IORT and VIOT is to solve device topology
enumeration only? I am not sure how it can be expanded to cover
information beyond device topology. e.g. DMAR has NUMA information and
root port ATS, I guess they are not used today in the guest but might
be additions in the future.

>     However, there are concerns about other OS vendors feeling
> obligated to implement this new node, so Arm proposed introducing
> another ACPI table, that can wrap any of DMAR, IVRS and IORT to
> extend it with new virtual nodes. A draft of this VIOT table
> specification is available at
> http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
> 
>     I'm afraid this could increase fragmentation as guests would need
> to implement or modify their support for all of DMAR, IVRS and IORT.
> If we end up doing VIOT, I suggest limiting it to IORT.
> 
> (2) In addition, there are some concerns about having virtio depend on
>     ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool
> x86 [1]) don't currently implement those methods.
> 
>     It was suggested to embed the topology description into the
> device. It can work, as demonstrated at the end of this RFC, with the
>     following limitations:
> 
>     - The topology description must be read before any endpoint
> managed by the IOMMU is probed, and even before the virtio module is
>       loaded. This RFC uses a PCI quirk to manually parse the virtio
>       configuration. It assumes that all endpoints managed by the
> IOMMU are under this same PCI host.
> 

>     - I don't have a solution for the virtio-mmio transport at the
>       moment, because I haven't had time to modify a host to test it.
> I think it could either use a notifier on the platform bus, or
>       better, a new 'iommu' command-line argument to the virtio-mmio
>       driver. So the current prototype doesn't work for firecracker
> and microvm, which rely on virtio-mmio.
> 
>     - For Arm, if the platform has an ITS, the hypervisor needs IORT
> or DT to describe it anyway. More generally, not using either ACPI or
>       DT might prevent from supporting other features as well. I
> suspect the above users will have to implement a standard method
> sooner or later.
> 
>     - Even when reusing as much existing code as possible, guest
> support is still going to be around a few hundred lines since we can't
>       rely on the normal virtio infrastructure to be loaded at that
>       point. As you can see below, the diffstat for the incomplete
>       topology implementation is already bigger than the exhaustive
> IORT support, even when jumping through the VIOT hoop.
> 
>     So it's a lightweight solution for very specific use-cases, and we
>     should still support ACPI for the general case. Multi-platform
>     guests such as Linux will then need to support three topology
>     descriptions instead of two.
> 
> In this RFC I present both solutions, but I'd rather not keep all of
> it. Please see the individual patches for details:
> 
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
>     driver and patches 2, 11 add the VIOT glue.
> 
> (2) Patch 12 adds the built-in topology description to the
> virtio-iommu specification. Patch 13 is a partial implementation for
> the Linux virtio-iommu driver. It only supports PCI, not platform
> devices.
> 
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
> 
> 
> I split the diffstat since there are two independent features. The
> first one is for patches 1-11, and the second one for patch 13.
> 
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a
> fwnode ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virtio: Create fwnode if necessary
>   iommu/virtio: Update IORT fwnode
>   ACPI: Add VIOT table
> 
>  MAINTAINERS                     |   9 +
>  drivers/acpi/Kconfig            |   7 +
>  drivers/acpi/Makefile           |   2 +
>  drivers/acpi/arm64/Kconfig      |   3 -
>  drivers/acpi/arm64/Makefile     |   1 -
>  drivers/acpi/bus.c              |   2 +
>  drivers/acpi/{arm64 => }/iort.c | 317
> ++++++++++++++++++++++++++------ drivers/acpi/tables.c           |
> 2 +- drivers/acpi/viot.c             |  44 +++++
>  drivers/iommu/Kconfig           |   1 +
>  drivers/iommu/virtio-iommu.c    |  61 +++++-
>  include/acpi/actbl2.h           |  31 ++++
>  include/linux/acpi_iort.h       |  14 ++
>  include/linux/acpi_viot.h       |  20 ++
>  14 files changed, 448 insertions(+), 66 deletions(-)
>  rename drivers/acpi/{arm64 => }/iort.c (86%)
>  create mode 100644 drivers/acpi/viot.c
>  create mode 100644 include/linux/acpi_viot.h
> 
> Jean-Philippe Brucker (1):
>   iommu/virtio: Add topology description to virtio-iommu config space
> 
>  drivers/base/platform.c               |   3 +
>  drivers/iommu/Kconfig                 |   9 +
>  drivers/iommu/Makefile                |   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410
> ++++++++++++++++++++++++++ drivers/iommu/virtio-iommu.c          |
> 3 + drivers/pci/pci-driver.c              |   3 +
>  include/linux/virtio_iommu.h          |  18 ++
>  include/uapi/linux/virtio_iommu.h     |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
> 
> 
> [1] firecracker: https://github.com/firecracker-microvm/firecracker
>     microvm: https://github.com/qemu/qemu/blob/master/docs/microvm.rst
>     kvmtool:
> https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 13/13] iommu/virtio: Add topology description to
  2019-11-22 12:53   ` Michael S. Tsirkin
@ 2019-11-25 17:48     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-25 17:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

On Fri, Nov 22, 2019 at 07:53:19AM -0500, Michael S. Tsirkin wrote:
> Overall this looks good to me. The only point is that
> I think the way the interface is designed makes writing
> the driver a bit too difficult. Idea: if instead we just
> have a length field and then an array of records
> (preferably unions so we don't need to work hard),
> we can shadow that into memory, then iterate over
> the unions.
> 
> Maybe add a uniform record length + number of records field.
> Then just skip types you do not know how to handle.
> This will also help make sure it's within bounds.
> 
> What do you think?

Sounds good, that should simplify the implementation a bit.

> You will need to do something to address the TODO I think.

Yes, I'll try to figure out a way to test platform devices.

> > +static void viommu_cwrite(struct pci_dev *dev, int cfg,
> > +			  struct viommu_cap_config *cap, u32 length, u32 offset,
> > +			  u32 val)
> 
> A single user with 4 byte parameter. Just open-code?

Ok

> > +		cap.head.type = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset);
> > +		cap.head.next = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset + 2);
> 
> All of this doesn't seem to be endian-clean. Try running sparse I think
> it will complain.

It does, I'll fix this

> > @@ -36,6 +37,31 @@ struct virtio_iommu_config {
> >  	struct virtio_iommu_range_32		domain_range;
> >  	/* Probe buffer size */
> >  	__le32					probe_size;
> > +	/* Offset to the beginning of the topology table */
> > +	__le16					topo_offset;
> 
> why do we need an offset?

I find it awkward to put a variable-size array in the middle of the
config. The virtio_iommu_config struct would be easier to extend later if
we keep the array at the end and only define small static fields here.

> 
> > +};
> > +
> > +struct virtio_iommu_topo_head {
> > +	__le16					type;
> > +	__le16					next;
> > +};
> 
> So this linked list makes things harder than necessary imho.
> It will be easier to just have a counter with # of records.
> Then make all records the same size.
> Then just read each record out into a buffer, and
> handle it there.

Yes, that should simplify things.

Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-11-22 13:00 ` [RFC 00/13] virtio-iommu on non-devicetree platforms Michael S. Tsirkin
@ 2019-11-25 17:53   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-25 17:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

On Fri, Nov 22, 2019 at 08:00:46AM -0500, Michael S. Tsirkin wrote:
> > (2) In addition, there are some concerns about having virtio depend on
> >     ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
> >     [1])
> 
> power?

In kvmtool it boot with device tree. It also doesn't need virtio-iommu I
think, since it has its own paravirtualized interface.

> > don't currently implement those methods.
> > 
> >     It was suggested to embed the topology description into the device.
> >     It can work, as demonstrated at the end of this RFC, with the
> >     following limitations:
> > 
> >     - The topology description must be read before any endpoint managed
> >       by the IOMMU is probed, and even before the virtio module is
> >       loaded. This RFC uses a PCI quirk to manually parse the virtio
> >       configuration. It assumes that all endpoints managed by the IOMMU
> >       are under this same PCI host.
> > 
> >     - I don't have a solution for the virtio-mmio transport at the
> >       moment, because I haven't had time to modify a host to test it. I
> >       think it could either use a notifier on the platform bus, or
> >       better, a new 'iommu' command-line argument to the virtio-mmio
> >       driver.
> 
> 	A notifier seems easier for users. What are the disadvantages of
> 	that?

For each device we have to check if it's virtio-mmio, then map the MMIO
resource and check the device type. Having a dedicated command-line
argument would be more efficient.

Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-11-23  0:01 ` Jacob Pan (Jun)
@ 2019-11-25 18:02   ` Jean-Philippe Brucker
  2019-12-04  3:01     ` Jacob Pan (Jun)
  0 siblings, 1 reply; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-25 18:02 UTC (permalink / raw)
  To: Jacob Pan (Jun)
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf, mst,
	guohanjun, bhelgaas, jasowang, linux-arm-kernel, lenb

On Fri, Nov 22, 2019 at 04:01:02PM -0800, Jacob Pan (Jun) wrote:
> > (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and
> > IORT for Arm). From my point of view IORT is easier to extend, since
> > we just need to introduce a new node type. There are no dependencies
> > to Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> > 
> From my limited understanding, IORT and VIOT is to solve device topology
> enumeration only? I am not sure how it can be expanded to cover
> information beyond device topology. e.g. DMAR has NUMA information and
> root port ATS, I guess they are not used today in the guest but might
> be additions in the future.

The PCI root-complex node of IORT has an ATS attribute, which we can
already use. However its scope is the root complex, not individual root
ports like with DMAR.

I'm not very familiar with NUMA, but it looks like we just need to specify
a proximity domain in relation to the SRAT table, for each viommu? The
SMMUv3 node in IORT has a 4-bytes "proximity domain" field for this. We
can add the same to the VIOT virtio-iommu nodes later, since the
structures are extensible.

But it might be better to keep the bare minimum information in the FW
descriptor, and put the rest in the virtio-iommu. So yes topology
enumeration is something the device cannot do itself (not fully that is,
see (2)) but for the rest, virtio-iommu's PROBE request can provide
details about each endpoint in relation to their physical IOMMU.

We could for example add a bit in a PROBE property saying that the whole
path between the IOMMU and the endpoint supports ATS. For NUMA it might
also be more interesting to have a finer granularity, since one viommu
could be managing endpoints that are behind different physical IOMMUs. If
in the future we want to allocate page tables close to the physical IOMMU
for example, we might need to describe multiple NUMA nodes per viommu,
using the PROBE request.

Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-11-25 18:02   ` Jean-Philippe Brucker
@ 2019-12-04  3:01     ` Jacob Pan (Jun)
  2019-12-18 11:20       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 23+ messages in thread
From: Jacob Pan (Jun) @ 2019-12-04  3:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf, mst,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

Hi Jean,

Sorry for the delay, I was out last week. Comments inline below.

On Mon, 25 Nov 2019 19:02:47 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> On Fri, Nov 22, 2019 at 04:01:02PM -0800, Jacob Pan (Jun) wrote:
> > > (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD
> > > and IORT for Arm). From my point of view IORT is easier to
> > > extend, since we just need to introduce a new node type. There
> > > are no dependencies to Arm in the Linux IORT driver, so it works
> > > well with CONFIG_X86. 
> > From my limited understanding, IORT and VIOT is to solve device
> > topology enumeration only? I am not sure how it can be expanded to
> > cover information beyond device topology. e.g. DMAR has NUMA
> > information and root port ATS, I guess they are not used today in
> > the guest but might be additions in the future.  
> 
> The PCI root-complex node of IORT has an ATS attribute, which we can
> already use. However its scope is the root complex, not individual
> root ports like with DMAR.
> 
> I'm not very familiar with NUMA, but it looks like we just need to
> specify a proximity domain in relation to the SRAT table, for each
> viommu? The SMMUv3 node in IORT has a 4-bytes "proximity domain"
> field for this. We can add the same to the VIOT virtio-iommu nodes
> later, since the structures are extensible.
> 
I think there the proximity domain is more for each assigned device
than vIOMMU. vIOMMU in the guest can have assigned devices belong to
different pIOMMU and proximity domains. If the guest owns the first
level page tables (gIOVA or SVA), we want to make sure page tables are
allocated from the close proximity domain.

My understanding is virtio IOMMU supports both virtio devices and
assigned devices. we could care less about the former in terms of NUMA.

In ACPI, we have _PXM method to retrieve device proximity domain. I
don't know if there is something equivalent or a generic way to get
_PXM information. I think VMM also need to make sure when an assigned
device is used with vIOMMU, there are some memory is allocated from the
device's proximity domain.

> But it might be better to keep the bare minimum information in the FW
> descriptor, and put the rest in the virtio-iommu. So yes topology
> enumeration is something the device cannot do itself (not fully that
> is, see (2)) but for the rest, virtio-iommu's PROBE request can
> provide details about each endpoint in relation to their physical
> IOMMU.
> 
> We could for example add a bit in a PROBE property saying that the
> whole path between the IOMMU and the endpoint supports ATS. For NUMA
> it might also be more interesting to have a finer granularity, since
> one viommu could be managing endpoints that are behind different
> physical IOMMUs. If in the future we want to allocate page tables
> close to the physical IOMMU for example, we might need to describe
> multiple NUMA nodes per viommu, using the PROBE request.
> 
Should we reinvent something for NUMA or use ACPI's SRAT, _PXM? I am
not sure how it is handled today in QEMU in terms of guest-host NUMA
proximity domain mapping.

> Thanks,
> Jean

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-12-04  3:01     ` Jacob Pan (Jun)
@ 2019-12-18 11:20       ` Jean-Philippe Brucker
  2019-12-20 18:54         ` Jacob Pan (Jun)
  0 siblings, 1 reply; 23+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-18 11:20 UTC (permalink / raw)
  To: Jacob Pan (Jun)
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf, mst,
	guohanjun, bhelgaas, jasowang, linux-arm-kernel, lenb

On Tue, Dec 03, 2019 at 07:01:36PM -0800, Jacob Pan (Jun) wrote:
> Hi Jean,
> 
> Sorry for the delay, I was out last week. Comments inline below.
> 
> On Mon, 25 Nov 2019 19:02:47 +0100
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> 
> > On Fri, Nov 22, 2019 at 04:01:02PM -0800, Jacob Pan (Jun) wrote:
> > > > (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD
> > > > and IORT for Arm). From my point of view IORT is easier to
> > > > extend, since we just need to introduce a new node type. There
> > > > are no dependencies to Arm in the Linux IORT driver, so it works
> > > > well with CONFIG_X86. 
> > > From my limited understanding, IORT and VIOT is to solve device
> > > topology enumeration only? I am not sure how it can be expanded to
> > > cover information beyond device topology. e.g. DMAR has NUMA
> > > information and root port ATS, I guess they are not used today in
> > > the guest but might be additions in the future.  
> > 
> > The PCI root-complex node of IORT has an ATS attribute, which we can
> > already use. However its scope is the root complex, not individual
> > root ports like with DMAR.
> > 
> > I'm not very familiar with NUMA, but it looks like we just need to
> > specify a proximity domain in relation to the SRAT table, for each
> > viommu? The SMMUv3 node in IORT has a 4-bytes "proximity domain"
> > field for this. We can add the same to the VIOT virtio-iommu nodes
> > later, since the structures are extensible.
> > 
> I think there the proximity domain is more for each assigned device
> than vIOMMU. vIOMMU in the guest can have assigned devices belong to
> different pIOMMU and proximity domains. If the guest owns the first
> level page tables (gIOVA or SVA), we want to make sure page tables are
> allocated from the close proximity domain.
> 
> My understanding is virtio IOMMU supports both virtio devices and
> assigned devices. we could care less about the former in terms of NUMA.
> 
> In ACPI, we have _PXM method to retrieve device proximity domain. I
> don't know if there is something equivalent or a generic way to get
> _PXM information. I think VMM also need to make sure when an assigned
> device is used with vIOMMU, there are some memory is allocated from the
> device's proximity domain.
> 
> > But it might be better to keep the bare minimum information in the FW
> > descriptor, and put the rest in the virtio-iommu. So yes topology
> > enumeration is something the device cannot do itself (not fully that
> > is, see (2)) but for the rest, virtio-iommu's PROBE request can
> > provide details about each endpoint in relation to their physical
> > IOMMU.
> > 
> > We could for example add a bit in a PROBE property saying that the
> > whole path between the IOMMU and the endpoint supports ATS. For NUMA
> > it might also be more interesting to have a finer granularity, since
> > one viommu could be managing endpoints that are behind different
> > physical IOMMUs. If in the future we want to allocate page tables
> > close to the physical IOMMU for example, we might need to describe
> > multiple NUMA nodes per viommu, using the PROBE request.
> > 
> Should we reinvent something for NUMA or use ACPI's SRAT, _PXM? 

Regardless whether we put it in the VIOT or in the virtio-iommu PROBE
request, we necessarily need to reuse the node IDs defined by SRAT (or
numa-node-id on devicetree, also a 32-bit value). A virtio-pci based
virtio-iommu already has the _PXM of its closest bridge and wouldn't need
anything more in the VIOT, while a virtio-mmio based virtio-iommu would
need a proximity domain field in the VIOT. That could be added later since
the table is extensible, but as you pointed out, that information might
not be very useful.

> I am not sure how it is handled today in QEMU in terms of guest-host
> NUMA proximity domain mapping.

It looks like the user can specify this guest-host mapping on the
command-line:

  -object memory-backend-ram,id=mem0,size=4G,host-nodes=3,policy=bind
  -object memory-backend-ram,id=mem1,size=4G,host-nodes=4,policy=bind
  -numa node,memdev=mem0,nodeid=numa0
  -numa node,memdev=mem1,nodeid=numa1
  -numa cpu,node-id=numa0,socket-id=0
  -numa cpu,node-id=numa1,socket-id=1

numa0 and numa1 would get proximity domains 0 and 1, corresponding to host
domains 3 and 4. It is also possible to specify the NUMA node of a PCI bus
(via the PCI expander bridge), and therefore to assign a VFIO PCI device
in the same proximity domain as its physical location.

  -device pxb,id=bridge1,bus=pci.0,numa_node=1 (simplified)
  -device vfio-pci,host=03:01.0,bus=bridge1

Linux can use this information to allocate DMA close to the endpoint (see
for example __iommu_dma_alloc_pages()). For page tables allocation,
io-pgtables currently takes the node ID of the IOMMU device, not the
endpoint. For the scenario you describe (virtio-iommu endpoints managed by
different physical IOMMU), we would need to take for example the node ID
of the first endpoint in the iommu_domain for which we're allocating page
tables.

Is it safe to assume that the pIOMMU is in the same proximity domain as
the physical endpoint?  If that's the case, then the guest already has all
the information it needs. Otherwise it's easy to add a proximity domain
PROBE property for each endpoint. Configuring the host to pass that
information might be more difficult.


Off topic, I've been wondering how to make iommu-sva aware of NUMA
topology as well, so that when handling a page request we allocate memory
on the faulting device's NUMA node, but I think it might require invasive
changes to the mm subsystem, to pass a NUMA node to handle_mm_fault().

Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
  2019-12-18 11:20       ` Jean-Philippe Brucker
@ 2019-12-20 18:54         ` Jacob Pan (Jun)
  0 siblings, 0 replies; 23+ messages in thread
From: Jacob Pan (Jun) @ 2019-12-20 18:54 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, kevin.tian, gregkh, linux-pci, sudeep.holla, rjw,
	virtualization, linux-acpi, iommu, sebastien.boeuf, mst,
	jacob.jun.pan, guohanjun, bhelgaas, jasowang, linux-arm-kernel,
	lenb

On Wed, 18 Dec 2019 12:20:44 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> On Tue, Dec 03, 2019 at 07:01:36PM -0800, Jacob Pan (Jun) wrote:
> > Hi Jean,
> > 
> > Sorry for the delay, I was out last week. Comments inline below.
> > 
> > On Mon, 25 Nov 2019 19:02:47 +0100
> > Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> >   
> > > On Fri, Nov 22, 2019 at 04:01:02PM -0800, Jacob Pan (Jun) wrote:  
> > > > > (1) ACPI has one table per vendor (DMAR for Intel, IVRS for
> > > > > AMD and IORT for Arm). From my point of view IORT is easier to
> > > > > extend, since we just need to introduce a new node type. There
> > > > > are no dependencies to Arm in the Linux IORT driver, so it
> > > > > works well with CONFIG_X86.   
> > > > From my limited understanding, IORT and VIOT is to solve device
> > > > topology enumeration only? I am not sure how it can be expanded
> > > > to cover information beyond device topology. e.g. DMAR has NUMA
> > > > information and root port ATS, I guess they are not used today
> > > > in the guest but might be additions in the future.    
> > > 
> > > The PCI root-complex node of IORT has an ATS attribute, which we
> > > can already use. However its scope is the root complex, not
> > > individual root ports like with DMAR.
> > > 
> > > I'm not very familiar with NUMA, but it looks like we just need to
> > > specify a proximity domain in relation to the SRAT table, for each
> > > viommu? The SMMUv3 node in IORT has a 4-bytes "proximity domain"
> > > field for this. We can add the same to the VIOT virtio-iommu nodes
> > > later, since the structures are extensible.
> > >   
> > I think there the proximity domain is more for each assigned device
> > than vIOMMU. vIOMMU in the guest can have assigned devices belong to
> > different pIOMMU and proximity domains. If the guest owns the first
> > level page tables (gIOVA or SVA), we want to make sure page tables
> > are allocated from the close proximity domain.
> > 
> > My understanding is virtio IOMMU supports both virtio devices and
> > assigned devices. we could care less about the former in terms of
> > NUMA.
> > 
> > In ACPI, we have _PXM method to retrieve device proximity domain. I
> > don't know if there is something equivalent or a generic way to get
> > _PXM information. I think VMM also need to make sure when an
> > assigned device is used with vIOMMU, there are some memory is
> > allocated from the device's proximity domain.
> >   
> > > But it might be better to keep the bare minimum information in
> > > the FW descriptor, and put the rest in the virtio-iommu. So yes
> > > topology enumeration is something the device cannot do itself
> > > (not fully that is, see (2)) but for the rest, virtio-iommu's
> > > PROBE request can provide details about each endpoint in relation
> > > to their physical IOMMU.
> > > 
> > > We could for example add a bit in a PROBE property saying that the
> > > whole path between the IOMMU and the endpoint supports ATS. For
> > > NUMA it might also be more interesting to have a finer
> > > granularity, since one viommu could be managing endpoints that
> > > are behind different physical IOMMUs. If in the future we want to
> > > allocate page tables close to the physical IOMMU for example, we
> > > might need to describe multiple NUMA nodes per viommu, using the
> > > PROBE request. 
> > Should we reinvent something for NUMA or use ACPI's SRAT, _PXM?   
> 
> Regardless whether we put it in the VIOT or in the virtio-iommu PROBE
> request, we necessarily need to reuse the node IDs defined by SRAT (or
> numa-node-id on devicetree, also a 32-bit value). A virtio-pci based
> virtio-iommu already has the _PXM of its closest bridge and wouldn't
> need anything more in the VIOT, while a virtio-mmio based
> virtio-iommu would need a proximity domain field in the VIOT. That
> could be added later since the table is extensible, but as you
> pointed out, that information might not be very useful.
> 
> > I am not sure how it is handled today in QEMU in terms of guest-host
> > NUMA proximity domain mapping.  
> 
> It looks like the user can specify this guest-host mapping on the
> command-line:
> 
>   -object memory-backend-ram,id=mem0,size=4G,host-nodes=3,policy=bind
>   -object memory-backend-ram,id=mem1,size=4G,host-nodes=4,policy=bind
>   -numa node,memdev=mem0,nodeid=numa0
>   -numa node,memdev=mem1,nodeid=numa1
>   -numa cpu,node-id=numa0,socket-id=0
>   -numa cpu,node-id=numa1,socket-id=1
> 
> numa0 and numa1 would get proximity domains 0 and 1, corresponding to
> host domains 3 and 4. It is also possible to specify the NUMA node of
> a PCI bus (via the PCI expander bridge), and therefore to assign a
> VFIO PCI device in the same proximity domain as its physical location.
> 
>   -device pxb,id=bridge1,bus=pci.0,numa_node=1 (simplified)
>   -device vfio-pci,host=03:01.0,bus=bridge1
> 
Thanks a lot for the thorough explanation. I will give that a try on
x86, I assume the ACPI tables also built to match these cmdline options.
> Linux can use this information to allocate DMA close to the endpoint
> (see for example __iommu_dma_alloc_pages()). For page tables
> allocation, io-pgtables currently takes the node ID of the IOMMU
> device, not the endpoint. For the scenario you describe (virtio-iommu
> endpoints managed by different physical IOMMU), we would need to take
> for example the node ID of the first endpoint in the iommu_domain for
> which we're allocating page tables.
> 
If iommu_domain is shared by multiple device from different NUMA node,
I guess taking the first one is as good as anyone. It would not be
an optimal configuration.
> Is it safe to assume that the pIOMMU is in the same proximity domain
> as the physical endpoint?
I think it is a safe assumption.
>  If that's the case, then the guest already
> has all the information it needs. Otherwise it's easy to add a
> proximity domain PROBE property for each endpoint. Configuring the
> host to pass that information might be more difficult.
> 
I agree, guest should always allocate DMA and IOVA page tables basedon
the endpoint. VT-d currently allocates page table pages based on IOMMU
NUMA node, that might have to change.
> 
> Off topic, I've been wondering how to make iommu-sva aware of NUMA
> topology as well, so that when handling a page request we allocate
> memory on the faulting device's NUMA node, but I think it might
> require invasive changes to the mm subsystem, to pass a NUMA node to
> handle_mm_fault().
> 
> Thanks,
> Jean

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-12-20 18:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 10:49 [RFC 00/13] virtio-iommu on non-devicetree platforms Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 02/13] ACPI: Add VIOT definitions Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 03/13] ACPI/IORT: Allow registration of external tables Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 04/13] ACPI/IORT: Add node categories Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 09/13] iommu/virtio: Create fwnode if necessary Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 10/13] iommu/virtio: Update IORT fwnode Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC 11/13] ACPI: Add VIOT table Jean-Philippe Brucker
2019-11-22 10:49 ` [RFC virtio 12/13] virtio-iommu: Add built-in topology description Jean-Philippe Brucker
2019-11-22 10:50 ` [RFC 13/13] iommu/virtio: Add topology description to Jean-Philippe Brucker
2019-11-22 12:53   ` Michael S. Tsirkin
2019-11-25 17:48     ` Jean-Philippe Brucker
2019-11-22 13:00 ` [RFC 00/13] virtio-iommu on non-devicetree platforms Michael S. Tsirkin
2019-11-25 17:53   ` Jean-Philippe Brucker
2019-11-23  0:01 ` Jacob Pan (Jun)
2019-11-25 18:02   ` Jean-Philippe Brucker
2019-12-04  3:01     ` Jacob Pan (Jun)
2019-12-18 11:20       ` Jean-Philippe Brucker
2019-12-20 18:54         ` Jacob Pan (Jun)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).