* [PATCH v3 00/10] Shared Virtual Addressing for the IOMMU
@ 2018-09-20 17:00 ` Jean-Philippe Brucker
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

This is version 3 of the core changes for Shared Virtual Addressing in
the IOMMU. It provides an API for sharing process address spaces with
devices, using for example PCI PASID and PRI.
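
For readers who haven't followed the previous versions, a device driver is
expected to use the API roughly as follows (a minimal sketch, with error
handling omitted; how the PASID is programmed into the device is entirely
implementation specific):

    /* Once per device, before any bind: allocate PASID tables, etc. */
    iommu_sva_init_device(dev, 0, 0, 0);

    /* Per process: share the mm with the device, get a PASID back */
    iommu_sva_bind_device(dev, mm, &pasid, 0, drvdata);

    /* ... program pasid into the device and start DMA ... */

    iommu_sva_unbind_device(dev, pasid);

    /* On device teardown */
    iommu_sva_shutdown_device(dev);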

This time I didn't append the VFIO and SMMUv3 example users. A smaller
series is easier for me to manage and may be less intimidating for
reviewers. If you're adding or updating SVA support to your IOMMU or
device driver, you can find my complete patch stack at [1] for
inspiration.

Changes to patches 1-9 address feedback from v2 [2], mostly small tweaks
and comment updates, which I tried to detail on each patch.

I added patch 10/10 from Jordan Crouse, for managing PASIDs without
sharing process address spaces. This interface might get superseded by
auxiliary domains [3], since they are more suitable than io_mm for Intel
VT-d's mdev virtualization. Auxiliary domains reuse the existing
iommu_map()/iommu_unmap() ops instead of introducing new ones, so even
though that solution is more invasive, I tend to prefer it. But since we
need something for testing right now, I'm appending patch 10 to the series.

The series depends on Jacob Pan's patches for fault reporting [4].
  iommu: introduce device fault data
  driver core: add per device iommu param
  iommu: add a timeout parameter for prq response
  iommu: introduce device fault report API
  iommu: introduce page response function

[1] git://linux-arm.org/linux-jpb.git sva/v3
[2] https://www.spinics.net/lists/kvm/msg168742.html
[3] https://lwn.net/ml/linux-kernel/20180830040922.30426-1-baolu.lu@linux.intel.com/
[4] https://lwn.net/ml/linux-kernel/1526072055-86990-1-git-send-email-jacob.jun.pan%40linux.intel.com/

Jean-Philippe Brucker (10):
  iommu: Introduce Shared Virtual Addressing API
  iommu/sva: Bind process address spaces to devices
  iommu/sva: Manage process address spaces
  iommu/sva: Add a mm_exit callback for device drivers
  iommu/sva: Track mm changes with an MMU notifier
  iommu/sva: Search mm by PASID
  iommu: Add a page fault handler
  iommu/iopf: Handle mm faults
  iommu/sva: Register page fault handler
  iommu/sva: Add support for private PASIDs

 drivers/iommu/Kconfig      |   9 +
 drivers/iommu/Makefile     |   2 +
 drivers/iommu/io-pgfault.c | 464 ++++++++++++++++++
 drivers/iommu/iommu-sva.c  | 967 +++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c      | 143 +++++-
 include/linux/iommu.h      | 289 ++++++++++-
 6 files changed, 1855 insertions(+), 19 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.18.0


* [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Shared Virtual Addressing (SVA) provides a way for device drivers to bind
process address spaces to devices. This requires the IOMMU to support a page
table format and features compatible with the CPU's, and usually requires
the system to support I/O Page Faults (IOPF) and Process Address Space ID
(PASID). When all of these are available, DMA can access virtual addresses
of a process. A PASID is allocated for each process, and the device driver
programs it into the device in an implementation-specific way.

Add a new API for sharing process page tables with devices. Introduce two
IOMMU operations, sva_init_device() and sva_shutdown_device(), that
prepare the IOMMU driver for SVA, for example by allocating PASID tables
and fault queues. Subsequent patches will implement the bind() and unbind()
operations.

Introduce a new mutex sva_lock on the device's IOMMU param to serialize
init(), shutdown(), bind() and unbind() operations. Using the existing
lock isn't possible because the unbind() and shutdown() operations will
have to wait, while holding sva_lock, for concurrent fault queue flushes to
terminate, and those flushes take the existing lock.

Support for I/O Page Faults will be added in a later patch using a new
feature bit (IOMMU_SVA_FEAT_IOPF). With the current API, users must pin
down all shared mappings.
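
As an illustration, an IOMMU driver would implement the two new ops roughly
like this (a hypothetical sketch: the my_iommu_* names and MY_IOMMU_MAX_PASID
limit are made up for the example, and allocation of PASID tables and fault
queues is only hinted at in comments):

    #define MY_IOMMU_MAX_PASID     0xfffff     /* e.g. 20-bit PASIDs */

    static int my_iommu_sva_init_device(struct device *dev,
                                        struct iommu_sva_param *param)
    {
        /* Clamp the limits requested by the device driver to our capacity */
        if (!param->max_pasid || param->max_pasid > MY_IOMMU_MAX_PASID)
            param->max_pasid = MY_IOMMU_MAX_PASID;

        /* Allocate PASID tables, fault queues, etc. */
        return 0;
    }

    static void my_iommu_sva_shutdown_device(struct device *dev)
    {
        /* Free PASID tables and fault queues */
    }

The ops would then be referenced from the driver's iommu_ops as
.sva_init_device and .sva_shutdown_device.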

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
v2->v3:
* Add sva_lock to serialize init/bind/unbind/shutdown
* Rename functions for consistency with the rest of the API
---
 drivers/iommu/Kconfig     |   4 ++
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/iommu-sva.c | 107 ++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c     |   1 +
 include/linux/iommu.h     |  34 ++++++++++++
 5 files changed, 147 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c60395b7470f..884580401919 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -95,6 +95,10 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select NEED_SG_DMA_LENGTH
 
+config IOMMU_SVA
+	bool
+	select IOMMU_API
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index ab5eba6edf82..7d6332be5f0e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
new file mode 100644
index 000000000000..85ef98efede8
--- /dev/null
+++ b/drivers/iommu/iommu-sva.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage PASIDs and bind process address spaces to devices.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include <linux/iommu.h>
+#include <linux/slab.h>
+
+/**
+ * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
+ * @dev: the device
+ * @features: bitmask of features that need to be initialized
+ * @min_pasid: min PASID value supported by the device
+ * @max_pasid: max PASID value supported by the device
+ *
+ * Users of the bind()/unbind() API must call this function to initialize all
+ * features required for SVA.
+ *
+ * The device must support multiple address spaces (e.g. PCI PASID). By default
+ * the PASID allocated during bind() is limited by the IOMMU capacity, and by
+ * the device PASID width defined in the PCI capability or in the firmware
+ * description. Setting @max_pasid to a non-zero value smaller than this limit
+ * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
+ * by the IOMMU.
+ *
+ * The device should not be performing any DMA while this function is running;
+ * otherwise the behavior is undefined.
+ *
+ * Return 0 if initialization succeeded, or an error.
+ */
+int iommu_sva_init_device(struct device *dev, unsigned long features,
+		       unsigned int min_pasid, unsigned int max_pasid)
+{
+	int ret;
+	struct iommu_sva_param *param;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain || !domain->ops->sva_init_device)
+		return -ENODEV;
+
+	if (features)
+		return -EINVAL;
+
+	param = kzalloc(sizeof(*param), GFP_KERNEL);
+	if (!param)
+		return -ENOMEM;
+
+	param->features		= features;
+	param->min_pasid	= min_pasid;
+	param->max_pasid	= max_pasid;
+
+	mutex_lock(&dev->iommu_param->sva_lock);
+	if (dev->iommu_param->sva_param) {
+		ret = -EEXIST;
+		goto err_unlock;
+	}
+
+	/*
+	 * IOMMU driver updates the limits depending on the IOMMU and device
+	 * capabilities.
+	 */
+	ret = domain->ops->sva_init_device(dev, param);
+	if (ret)
+		goto err_unlock;
+
+	dev->iommu_param->sva_param = param;
+	mutex_unlock(&dev->iommu_param->sva_lock);
+	return 0;
+
+err_unlock:
+	mutex_unlock(&dev->iommu_param->sva_lock);
+	kfree(param);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_init_device);
+
+/**
+ * iommu_sva_shutdown_device() - Shutdown Shared Virtual Addressing for a device
+ * @dev: the device
+ *
+ * Disable SVA. The device driver should ensure that the device isn't
+ * performing any DMA while this function is running.
+ */
+void iommu_sva_shutdown_device(struct device *dev)
+{
+	struct iommu_sva_param *param;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain)
+		return;
+
+	mutex_lock(&dev->iommu_param->sva_lock);
+	param = dev->iommu_param->sva_param;
+	if (!param)
+		goto out_unlock;
+
+	if (domain->ops->sva_shutdown_device)
+		domain->ops->sva_shutdown_device(dev);
+
+	kfree(param);
+	dev->iommu_param->sva_param = NULL;
+out_unlock:
+	mutex_unlock(&dev->iommu_param->sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_shutdown_device);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 58f3477f2993..fa0561ed006f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -653,6 +653,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 		goto err_free_name;
 	}
 	mutex_init(&dev->iommu_param->lock);
+	mutex_init(&dev->iommu_param->sva_lock);
 
 	kobject_get(group->devices_kobj);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8177f7736fcd..4c27cb347770 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -197,6 +197,12 @@ struct page_response_msg {
 	u64 private_data;
 };
 
+struct iommu_sva_param {
+	unsigned long features;
+	unsigned int min_pasid;
+	unsigned int max_pasid;
+};
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -204,6 +210,8 @@ struct page_response_msg {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
+ * @sva_init_device: initialize Shared Virtual Addressing for a device
+ * @sva_shutdown_device: shutdown Shared Virtual Addressing for a device
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
@@ -239,6 +247,8 @@ struct iommu_ops {
 
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	int (*sva_init_device)(struct device *dev, struct iommu_sva_param *param);
+	void (*sva_shutdown_device)(struct device *dev);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
@@ -393,6 +403,9 @@ struct iommu_fault_param {
  * struct iommu_param - collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
+ * @lock: serializes accesses to fault_param
+ * @sva_param: SVA parameters
+ * @sva_lock: serializes accesses to sva_param
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  *	struct iommu_group	*iommu_group;
@@ -401,6 +414,8 @@ struct iommu_fault_param {
 struct iommu_param {
 	struct mutex lock;
 	struct iommu_fault_param *fault_param;
+	struct mutex sva_lock;
+	struct iommu_sva_param *sva_param;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -904,4 +919,23 @@ void iommu_debugfs_setup(void);
 static inline void iommu_debugfs_setup(void) {}
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+extern int iommu_sva_init_device(struct device *dev, unsigned long features,
+				 unsigned int min_pasid,
+				 unsigned int max_pasid);
+extern void iommu_sva_shutdown_device(struct device *dev);
+#else /* CONFIG_IOMMU_SVA */
+static inline int iommu_sva_init_device(struct device *dev,
+					unsigned long features,
+					unsigned int min_pasid,
+					unsigned int max_pasid)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_sva_shutdown_device(struct device *dev)
+{
+}
+#endif /* CONFIG_IOMMU_SVA */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.18.0


* [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Add bind() and unbind() operations to the IOMMU API. Bind() returns a
PASID that drivers can program into hardware, to let their devices access an
mm. This patch only adds skeletons for the device driver API; most of the
implementation is still missing.
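
As an example, a device driver could use this pair of functions as follows
(a hypothetical sketch: struct my_dev_ctx and the my_driver_* helpers are
made up for the illustration; the driver must hold a reference to the mm for
the lifetime of the bond and program the PASID into its hardware):

    static int my_driver_bind_current(struct my_dev_ctx *ctx)
    {
        int ret, pasid;

        ret = iommu_sva_bind_device(ctx->dev, current->mm, &pasid, 0, ctx);
        if (ret)
            return ret;

        ctx->pasid = pasid;
        /* Program ctx->pasid into the device's context table, start DMA */
        return 0;
    }

    static void my_driver_unbind(struct my_dev_ctx *ctx)
    {
        /* Stop DMA and flush outstanding page requests for this PASID first */
        iommu_sva_unbind_device(ctx->dev, ctx->pasid);
    }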

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 34 +++++++++++++++
 drivers/iommu/iommu.c     | 90 +++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h     | 37 ++++++++++++++++
 3 files changed, 161 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 85ef98efede8..d60d4f0bb89e 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -8,6 +8,38 @@
 #include <linux/iommu.h>
 #include <linux/slab.h>
 
+int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+			    unsigned long flags, void *drvdata)
+{
+	return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
+
+int __iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
+
+static void __iommu_sva_unbind_device_all(struct device *dev)
+{
+	/* TODO */
+}
+
+/**
+ * iommu_sva_unbind_device_all() - Detach all address spaces from this device
+ * @dev: the device
+ *
+ * When detaching @dev from a domain, IOMMU drivers should use this helper.
+ */
+void iommu_sva_unbind_device_all(struct device *dev)
+{
+	mutex_lock(&dev->iommu_param->sva_lock);
+	__iommu_sva_unbind_device_all(dev);
+	mutex_unlock(&dev->iommu_param->sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
+
 /**
  * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -96,6 +128,8 @@ void iommu_sva_shutdown_device(struct device *dev)
 	if (!param)
 		goto out_unlock;
 
+	__iommu_sva_unbind_device_all(dev);
+
 	if (domain->ops->sva_shutdown_device)
 		domain->ops->sva_shutdown_device(dev);
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fa0561ed006f..aba3bf15d46c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2325,3 +2325,93 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to it
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties
+ * @drvdata: private data passed to the mm exit handler
+ *
+ * Create a bond between device and task, allowing the device to access the mm
+ * using the returned PASID. If unbind() isn't called first, a subsequent bind()
+ * for the same device and mm fails with -EEXIST.
+ *
+ * iommu_sva_init_device() must be called first, to initialize the required SVA
+ * features. @flags must be a subset of these features.
+ *
+ * The caller must pin down, using get_user_pages*(), all mappings shared
+ * with the device. mlock() isn't sufficient, as it doesn't prevent minor
+ * page faults (e.g. copy-on-write).
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+			  unsigned long flags, void *drvdata)
+{
+	int ret = -EINVAL;
+	struct iommu_group *group;
+
+	if (!pasid)
+		return -EINVAL;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	/* Ensure device count and domain don't change while we're binding */
+	mutex_lock(&group->mutex);
+
+	/*
+	 * To keep things simple, SVA currently doesn't support IOMMU groups
+	 * with more than one device. Existing SVA-capable systems are not
+	 * affected by the problems that required IOMMU groups (lack of ACS
+	 * isolation, device ID aliasing and other hardware issues).
+	 */
+	if (iommu_group_device_count(group) != 1)
+		goto out_unlock;
+
+	ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);
+
+out_unlock:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @dev: the device
+ * @pasid: the pasid returned by bind()
+ *
+ * Remove bond between device and address space identified by @pasid. Users
+ * should not call unbind() if the corresponding mm exited (as the PASID might
+ * have been reallocated for another process).
+ *
+ * The device must not be issuing any more transactions for this PASID. All
+ * outstanding page requests for this PASID must have been flushed to the IOMMU.
+ *
+ * Returns 0 on success, or an error value
+ */
+int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	int ret = -EINVAL;
+	struct iommu_group *group;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_sva_unbind_device(dev, pasid);
+	mutex_unlock(&group->mutex);
+
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4c27cb347770..9c49877e37a5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -586,6 +586,10 @@ void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 
+extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+				int *pasid, unsigned long flags, void *drvdata);
+extern int iommu_sva_unbind_device(struct device *dev, int pasid);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -910,6 +914,18 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
 	return -ENODEV;
 }
 
+static inline int iommu_sva_bind_device(struct device *dev,
+					struct mm_struct *mm, int *pasid,
+					unsigned long flags, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
@@ -924,6 +940,11 @@ extern int iommu_sva_init_device(struct device *dev, unsigned long features,
 				 unsigned int min_pasid,
 				 unsigned int max_pasid);
 extern void iommu_sva_shutdown_device(struct device *dev);
+extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+				   int *pasid, unsigned long flags,
+				   void *drvdata);
+extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
+extern void iommu_sva_unbind_device_all(struct device *dev);
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_init_device(struct device *dev,
 					unsigned long features,
@@ -936,6 +957,22 @@ static inline int iommu_sva_init_device(struct device *dev,
 static inline void iommu_sva_shutdown_device(struct device *dev)
 {
 }
+
+static inline int __iommu_sva_bind_device(struct device *dev,
+					  struct mm_struct *mm, int *pasid,
+					  unsigned long flags, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_sva_unbind_device_all(struct device *dev)
+{
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.18.0


* [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Allocate IOMMU mm structures and bind them to devices. Four operations are
added to IOMMU drivers:

* mm_alloc(): to create an io_mm structure and perform architecture-
  specific operations required to grab the process (for instance on ARM,
  pin down the CPU ASID so that the process doesn't get assigned a new
  ASID on rollover).

  There is a single valid io_mm structure per Linux mm. Future extensions
  may also use io_mm for kernel-managed address spaces, populated with
  map()/unmap() calls instead of bound to process address spaces. This
  patch focuses on "shared" io_mm.

* mm_attach(): attach an mm to a device. The IOMMU driver checks that the
  device is capable of sharing an address space, and writes the PASID
  table entry to install the pgd.

  Some IOMMU drivers will have a single PASID table per domain, for
  convenience. Others can implement it differently, but to help these
  drivers, mm_attach and mm_detach take 'attach_domain' and
  'detach_domain' parameters that tell whether they need to set and clear
  the PASID entry or only send the required TLB invalidations.

* mm_detach(): detach an mm from a device. The IOMMU driver removes the
  PASID table entry and invalidates the IOTLBs.

* mm_free(): free a structure allocated by mm_alloc(), and let arch
  release the process.

The mm_attach and mm_detach operations are serialized with a spinlock. When
trying to optimize this code, we should at least prevent concurrent
attach()/detach() on the same domain (so multi-level PASID table code can
allocate tables lazily). mm_alloc() can sleep, but mm_free() must not
(because we'll have to call it from call_srcu() later on).

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Keeping track of address spaces requires the use of MMU notifiers.
Handling process exit with regard to unbind() is tricky, so it is left for
another patch and we explicitly fail mm_alloc() for the moment.
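
To give an idea of what is expected from IOMMU drivers, here is a rough
sketch of mm_alloc() and mm_free() implementations (hypothetical: struct
my_io_mm and the ASID field are illustrative; drivers embed the generic
io_mm in their own structure):

    struct my_io_mm {
        struct io_mm    io_mm;
        u16             asid;   /* e.g. CPU ASID pinned for this mm */
    };

    static struct io_mm *my_iommu_mm_alloc(struct iommu_domain *domain,
                                           struct mm_struct *mm,
                                           unsigned long flags)
    {
        struct my_io_mm *my_mm = kzalloc(sizeof(*my_mm), GFP_KERNEL);

        if (!my_mm)
            return NULL;

        /* Architecture-specific setup, e.g. pin the CPU ASID (may sleep) */
        return &my_mm->io_mm;
    }

    static void my_iommu_mm_free(struct io_mm *io_mm)
    {
        /* Must not sleep: will be called from call_srcu() later on */
        kfree(container_of(io_mm, struct my_io_mm, io_mm));
    }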

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
v2->v3: use sva_lock, comment updates
---
 drivers/iommu/iommu-sva.c | 397 +++++++++++++++++++++++++++++++++++++-
 drivers/iommu/iommu.c     |   1 +
 include/linux/iommu.h     |  29 +++
 3 files changed, 424 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index d60d4f0bb89e..a486bc947335 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -5,25 +5,415 @@
  * Copyright (C) 2018 ARM Ltd.
  */
 
+#include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/**
+ * DOC: io_mm model
+ *
+ * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
+ * The following example illustrates the relation between structures
+ * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
+ * device. A device can have multiple io_mm and an io_mm may be bound to
+ * multiple devices.
+ *              ___________________________
+ *             |  IOMMU domain A           |
+ *             |  ________________         |
+ *             | |  IOMMU group   |        +------- io_pgtables
+ *             | |                |        |
+ *             | |   dev 00:00.0 ----+------- bond --- io_mm X
+ *             | |________________|   \    |
+ *             |                       '----- bond ---.
+ *             |___________________________|           \
+ *              ___________________________             \
+ *             |  IOMMU domain B           |           io_mm Y
+ *             |  ________________         |           / /
+ *             | |  IOMMU group   |        |          / /
+ *             | |                |        |         / /
+ *             | |   dev 00:01.0 ------------ bond -' /
+ *             | |   dev 00:01.1 ------------ bond --'
+ *             | |________________|        |
+ *             |                           +------- io_pgtables
+ *             |___________________________|
+ *
+ * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
+ * B. All devices within the same domain access the same address spaces. Device
+ * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
+ * Devices 00:01.* only access address space Y. In addition each
+ * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
+ * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
+ *
+ * To obtain the above configuration, users would for instance issue the
+ * following calls:
+ *
+ *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
+ *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
+ *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
+ *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
+ *
+ * A single Process Address Space ID (PASID) is allocated for each mm. In the
+ * example, devices use PASID 1 to read/write into address space X and PASID 2
+ * to read/write into address space Y.
+ *
+ * Hardware tables describing this configuration in the IOMMU would typically
+ * look like this:
+ *
+ *                                PASID tables
+ *                                 of domain A
+ *                              .->+--------+
+ *                             / 0 |        |-------> io_pgtable
+ *                            /    +--------+
+ *            Device tables  /   1 |        |-------> pgd X
+ *              +--------+  /      +--------+
+ *      00:00.0 |      A |-'     2 |        |--.
+ *              +--------+         +--------+   \
+ *              :        :       3 |        |    \
+ *              +--------+         +--------+     --> pgd Y
+ *      00:01.0 |      B |--.                    /
+ *              +--------+   \                  |
+ *      00:01.1 |      B |----+   PASID tables  |
+ *              +--------+     \   of domain B  |
+ *                              '->+--------+   |
+ *                               0 |        |-- | --> io_pgtable
+ *                                 +--------+   |
+ *                               1 |        |   |
+ *                                 +--------+   |
+ *                               2 |        |---'
+ *                                 +--------+
+ *                               3 |        |
+ *                                 +--------+
+ *
+ * With this model, a single call binds all devices in a given domain to an
+ * address space. Other devices in the domain will get the same bond implicitly.
+ * However, users must issue one bind() for each device, because IOMMUs may
+ * implement SVA differently. Furthermore, mandating one bind() per device
+ * allows the driver to perform sanity-checks on device capabilities.
+ *
+ * In some IOMMUs, one entry (typically the first one) of the PASID table can be
+ * used to hold non-PASID translations. In this case PASID #0 is reserved and
+ * the first entry points to the io_pgtable pointer. In other IOMMUs the
+ * io_pgtable pointer is held in the device table and PASID #0 is available to
+ * the allocator.
+ */
+
+struct iommu_bond {
+	struct io_mm		*io_mm;
+	struct device		*dev;
+	struct iommu_domain	*domain;
+
+	struct list_head	mm_head;
+	struct list_head	dev_head;
+	struct list_head	domain_head;
+
+	void			*drvdata;
+};
+
+/*
+ * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
+ * used for returning errors). In practice implementations will use at most 20
+ * bits, which is the PCI limit.
+ */
+static DEFINE_IDR(iommu_pasid_idr);
+
+/*
+ * For the moment this is an all-purpose lock. It serializes
+ * access/modifications to bonds, access/modifications to the PASID IDR, and
+ * changes to io_mm refcount as well.
+ */
+static DEFINE_SPINLOCK(iommu_sva_lock);
+
+static struct io_mm *
+io_mm_alloc(struct iommu_domain *domain, struct device *dev,
+	    struct mm_struct *mm, unsigned long flags)
+{
+	int ret;
+	int pasid;
+	struct io_mm *io_mm;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
+		return ERR_PTR(-ENODEV);
+
+	io_mm = domain->ops->mm_alloc(domain, mm, flags);
+	if (IS_ERR(io_mm))
+		return io_mm;
+	if (!io_mm)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * The mm must not be freed until after the driver frees the io_mm
+	 * (which may involve unpinning the CPU ASID for instance, requiring a
+	 * valid mm struct.)
+	 */
+	mmgrab(mm);
+
+	io_mm->flags		= flags;
+	io_mm->mm		= mm;
+	io_mm->release		= domain->ops->mm_free;
+	INIT_LIST_HEAD(&io_mm->devices);
+	/* Leave kref as zero until the io_mm is fully initialized */
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_sva_lock);
+	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
+			  param->max_pasid + 1, GFP_ATOMIC);
+	io_mm->pasid = pasid;
+	spin_unlock(&iommu_sva_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		ret = pasid;
+		goto err_free_mm;
+	}
+
+	/* TODO: keep track of mm. For the moment, abort. */
+	ret = -ENOSYS;
+	spin_lock(&iommu_sva_lock);
+	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+	spin_unlock(&iommu_sva_lock);
+
+err_free_mm:
+	io_mm->release(io_mm);
+	mmdrop(mm);
+
+	return ERR_PTR(ret);
+}
+
+static void io_mm_free(struct io_mm *io_mm)
+{
+	struct mm_struct *mm = io_mm->mm;
+
+	io_mm->release(io_mm);
+	mmdrop(mm);
+}
+
+static void io_mm_release(struct kref *kref)
+{
+	struct io_mm *io_mm;
+
+	io_mm = container_of(kref, struct io_mm, kref);
+	WARN_ON(!list_empty(&io_mm->devices));
+
+	/* The PASID can now be reallocated for another mm... */
+	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+	/* ... but this mm is freed after a grace period (TODO) */
+	io_mm_free(io_mm);
+}
+
+/*
+ * Returns non-zero if a reference to the io_mm was successfully taken.
+ * Returns zero if the io_mm is being freed and should not be used.
+ */
+static int io_mm_get_locked(struct io_mm *io_mm)
+{
+	if (io_mm)
+		return kref_get_unless_zero(&io_mm->kref);
+
+	return 0;
+}
+
+static void io_mm_put_locked(struct io_mm *io_mm)
+{
+	kref_put(&io_mm->kref, io_mm_release);
+}
+
+static void io_mm_put(struct io_mm *io_mm)
+{
+	spin_lock(&iommu_sva_lock);
+	io_mm_put_locked(io_mm);
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
+			struct io_mm *io_mm, void *drvdata)
+{
+	int ret;
+	bool attach_domain = true;
+	int pasid = io_mm->pasid;
+	struct iommu_bond *bond, *tmp;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
+		return -ENODEV;
+
+	if (pasid > param->max_pasid || pasid < param->min_pasid)
+		return -ERANGE;
+
+	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond->domain		= domain;
+	bond->io_mm		= io_mm;
+	bond->dev		= dev;
+	bond->drvdata		= drvdata;
+
+	spin_lock(&iommu_sva_lock);
+	/*
+	 * Check if this io_mm is already bound to the domain, in which case
+	 * the IOMMU driver doesn't have to install the PASID table entry.
+	 */
+	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
+		if (tmp->io_mm == io_mm) {
+			attach_domain = false;
+			break;
+		}
+	}
+
+	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
+	if (ret) {
+		kfree(bond);
+		goto out_unlock;
+	}
+
+	list_add(&bond->mm_head, &io_mm->devices);
+	list_add(&bond->domain_head, &domain->mm_list);
+	list_add(&bond->dev_head, &param->mm_list);
+
+out_unlock:
+	spin_unlock(&iommu_sva_lock);
+	return ret;
+}
+
+static void io_mm_detach_locked(struct iommu_bond *bond)
+{
+	struct iommu_bond *tmp;
+	bool detach_domain = true;
+	struct iommu_domain *domain = bond->domain;
+
+	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
+		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
+			detach_domain = false;
+			break;
+		}
+	}
+
+	list_del(&bond->mm_head);
+	list_del(&bond->domain_head);
+	list_del(&bond->dev_head);
+
+	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
+
+	io_mm_put_locked(bond->io_mm);
+	kfree(bond);
+}
 
 int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 			    unsigned long flags, void *drvdata)
 {
-	return -ENOSYS; /* TODO */
+	int i;
+	int ret = 0;
+	struct iommu_bond *bond;
+	struct io_mm *io_mm = NULL;
+	struct iommu_domain *domain;
+	struct iommu_sva_param *param;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	mutex_lock(&dev->iommu_param->sva_lock);
+	param = dev->iommu_param->sva_param;
+	if (!param || (flags & ~param->features)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* If an io_mm already exists, use it */
+	spin_lock(&iommu_sva_lock);
+	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
+		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
+			/* ... Unless it's already bound to this device */
+			list_for_each_entry(bond, &io_mm->devices, mm_head) {
+				if (bond->dev == dev) {
+					ret = -EEXIST;
+					io_mm_put_locked(io_mm);
+					break;
+				}
+			}
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+	if (ret)
+		goto out_unlock;
+
+	/* Require identical features within an io_mm for now */
+	if (io_mm && (flags != io_mm->flags)) {
+		io_mm_put(io_mm);
+		ret = -EDOM;
+		goto out_unlock;
+	}
+
+	if (!io_mm) {
+		io_mm = io_mm_alloc(domain, dev, mm, flags);
+		if (IS_ERR(io_mm)) {
+			ret = PTR_ERR(io_mm);
+			goto out_unlock;
+		}
+	}
+
+	ret = io_mm_attach(domain, dev, io_mm, drvdata);
+	if (ret)
+		io_mm_put(io_mm);
+	else
+		*pasid = io_mm->pasid;
+
+out_unlock:
+	mutex_unlock(&dev->iommu_param->sva_lock);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
 
 int __iommu_sva_unbind_device(struct device *dev, int pasid)
 {
-	return -ENOSYS; /* TODO */
+	int ret = -ESRCH;
+	struct iommu_domain *domain;
+	struct iommu_bond *bond = NULL;
+	struct iommu_sva_param *param;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	mutex_lock(&dev->iommu_param->sva_lock);
+	param = dev->iommu_param->sva_param;
+	if (!param) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &param->mm_list, dev_head) {
+		if (bond->io_mm->pasid == pasid) {
+			io_mm_detach_locked(bond);
+			ret = 0;
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+
+out_unlock:
+	mutex_unlock(&dev->iommu_param->sva_lock);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
 
 static void __iommu_sva_unbind_device_all(struct device *dev)
 {
-	/* TODO */
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+	struct iommu_bond *bond, *next;
+
+	if (!param)
+		return;
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
+		io_mm_detach_locked(bond);
+	spin_unlock(&iommu_sva_lock);
 }
 
 /**
@@ -82,6 +472,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
 	param->features		= features;
 	param->min_pasid	= min_pasid;
 	param->max_pasid	= max_pasid;
+	INIT_LIST_HEAD(&param->mm_list);
 
 	mutex_lock(&dev->iommu_param->sva_lock);
 	if (dev->iommu_param->sva_param) {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index aba3bf15d46c..7113fe398b70 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1525,6 +1525,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+	INIT_LIST_HEAD(&domain->mm_list);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9c49877e37a5..6a3ced6a5aa1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -99,6 +99,20 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+
+	struct list_head mm_list;
+};
+
+struct io_mm {
+	int			pasid;
+	/* IOMMU_SVA_FEAT_* */
+	unsigned long		flags;
+	struct list_head	devices;
+	struct kref		kref;
+	struct mm_struct	*mm;
+
+	/* Release callback for this mm */
+	void (*release)(struct io_mm *io_mm);
 };
 
 enum iommu_cap {
@@ -201,6 +215,7 @@ struct iommu_sva_param {
 	unsigned long features;
 	unsigned int min_pasid;
 	unsigned int max_pasid;
+	struct list_head mm_list;
 };
 
 /**
@@ -212,6 +227,12 @@ struct iommu_sva_param {
  * @detach_dev: detach device from an iommu domain
  * @sva_init_device: initialize Shared Virtual Addressing for a device
  * @sva_shutdown_device: shutdown Shared Virtual Addressing for a device
+ * @mm_alloc: allocate io_mm
+ * @mm_free: free io_mm
+ * @mm_attach: attach io_mm to a device. Install PASID entry if necessary. Must
+ *             not sleep.
+ * @mm_detach: detach io_mm from a device. Remove PASID entry and
+ *             flush associated TLB entries if necessary. Must not sleep.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
@@ -249,6 +270,14 @@ struct iommu_ops {
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*sva_init_device)(struct device *dev, struct iommu_sva_param *param);
 	void (*sva_shutdown_device)(struct device *dev);
+	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
+				  struct mm_struct *mm,
+				  unsigned long flags);
+	void (*mm_free)(struct io_mm *io_mm);
+	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
+			 struct io_mm *io_mm, bool attach_domain);
+	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
+			  struct io_mm *io_mm, bool detach_domain);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread
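
For illustration, a device driver sitting on top of this interface would
bind a process address space and keep the returned PASID along these
lines. This is only a sketch: it assumes the public
iommu_sva_bind_device()/iommu_sva_unbind_device() wrappers take the same
arguments as the __iommu_sva_bind_device()/__iommu_sva_unbind_device()
functions above, and my_dev_ctx is a hypothetical driver structure.

	#include <linux/iommu.h>
	#include <linux/sched.h>

	/* Hypothetical per-process context kept by the device driver */
	struct my_dev_ctx {
		struct device	*dev;
		int		pasid;
	};

	static int my_driver_bind_current(struct my_dev_ctx *ctx)
	{
		int ret;

		/* Share the current process address space with the device */
		ret = iommu_sva_bind_device(ctx->dev, current->mm, &ctx->pasid,
					    0, ctx);
		if (ret)
			return ret;

		/* DMA tagged with ctx->pasid now uses current->mm page tables */
		return 0;
	}

	static void my_driver_unbind(struct my_dev_ctx *ctx)
	{
		/* The device must have stopped issuing DMA with this PASID */
		iommu_sva_unbind_device(ctx->dev, ctx->pasid);
	}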

* [PATCH v3 04/10] iommu/sva: Add a mm_exit callback for device drivers
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

When an mm exits, devices that were bound to it must stop performing DMA
on its PASID. Let device drivers register a callback to be notified on mm
exit. Add the callback to the sva_param structure attached to struct
device.
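
As an illustration, a driver would typically pass its handler when
enabling SVA, along these lines (a sketch only: my_stop_dma() is a
hypothetical helper, and the feature mask and PASID range are arbitrary
example values):

	static int my_mm_exit(struct device *dev, int pasid, void *drvdata)
	{
		/*
		 * @drvdata is the pointer that was passed to bind(). The
		 * address space is about to go away: stop queueing work for
		 * this PASID and wait for in-flight DMA to drain before
		 * returning.
		 */
		my_stop_dma(dev, pasid, drvdata);
		return 0;
	}

	static int my_driver_enable_sva(struct device *dev)
	{
		return iommu_sva_init_device(dev, 0, 1, 0xfffff, my_mm_exit);
	}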

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 10 +++++++++-
 include/linux/iommu.h     |  8 ++++++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index a486bc947335..08da479dad68 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -436,6 +436,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
  * @features: bitmask of features that need to be initialized
  * @min_pasid: min PASID value supported by the device
  * @max_pasid: max PASID value supported by the device
+ * @mm_exit: callback for process address space release
  *
  * Users of the bind()/unbind() API must call this function to initialize all
  * features required for SVA.
@@ -447,13 +448,19 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
  * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
  * by the IOMMU.
  *
+ * @mm_exit is called when an address space bound to the device is about to be
+ * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
+ * more transactions with the PASID given as argument. The handler gets an opaque
+ * pointer corresponding to the drvdata passed as argument to bind().
+ *
  * The device should not be performing any DMA while this function is running,
  * otherwise the behavior is undefined.
  *
  * Return 0 if initialization succeeded, or an error.
  */
 int iommu_sva_init_device(struct device *dev, unsigned long features,
-		       unsigned int min_pasid, unsigned int max_pasid)
+			  unsigned int min_pasid, unsigned int max_pasid,
+			  iommu_mm_exit_handler_t mm_exit)
 {
 	int ret;
 	struct iommu_sva_param *param;
@@ -472,6 +479,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
 	param->features		= features;
 	param->min_pasid	= min_pasid;
 	param->max_pasid	= max_pasid;
+	param->mm_exit		= mm_exit;
 	INIT_LIST_HEAD(&param->mm_list);
 
 	mutex_lock(&dev->iommu_param->sva_lock);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6a3ced6a5aa1..c95ff714ea66 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,6 +60,7 @@ struct iommu_fault_event;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
+typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -216,6 +217,7 @@ struct iommu_sva_param {
 	unsigned int min_pasid;
 	unsigned int max_pasid;
 	struct list_head mm_list;
+	iommu_mm_exit_handler_t mm_exit;
 };
 
 /**
@@ -967,7 +969,8 @@ static inline void iommu_debugfs_setup(void) {}
 #ifdef CONFIG_IOMMU_SVA
 extern int iommu_sva_init_device(struct device *dev, unsigned long features,
 				 unsigned int min_pasid,
-				 unsigned int max_pasid);
+				 unsigned int max_pasid,
+				 iommu_mm_exit_handler_t mm_exit);
 extern void iommu_sva_shutdown_device(struct device *dev);
 extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 				   int *pasid, unsigned long flags,
@@ -978,7 +981,8 @@ extern void iommu_sva_unbind_device_all(struct device *dev);
 static inline int iommu_sva_init_device(struct device *dev,
 					unsigned long features,
 					unsigned int min_pasid,
-					unsigned int max_pasid)
+					unsigned int max_pasid,
+					iommu_mm_exit_handler_t mm_exit)
 {
 	return -ENODEV;
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 05/10] iommu/sva: Track mm changes with an MMU notifier
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

When creating an io_mm structure, register an MMU notifier that informs
us when the virtual address space changes or disappears.

Add a new operation to the IOMMU driver, mm_invalidate, called when a
range of addresses is unmapped to let the IOMMU driver send ATC
invalidations. mm_invalidate cannot sleep.

Adding the notifier complicates io_mm release. In one case device
drivers free the io_mm explicitly by calling unbind (or detaching the
device from its domain). In the other case the process could crash
before unbind, in which case the release notifier has to do all the
work.
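
To make the new operation concrete, an IOMMU driver implementation could
look roughly like this (a sketch only: my_atc_inv_range() stands in for
whatever invalidation primitive the driver really has):

	static void my_iommu_mm_invalidate(struct iommu_domain *domain,
					   struct device *dev,
					   struct io_mm *io_mm,
					   unsigned long vaddr, size_t size)
	{
		/*
		 * Called from the MMU notifier when [vaddr, vaddr + size) is
		 * unmapped. This path cannot sleep, so only post the ATC
		 * invalidation for this device and PASID here.
		 */
		my_atc_inv_range(dev, io_mm->pasid, vaddr, size);
	}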

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
v2->v3: Add MMU_INVALIDATE_DOES_NOT_BLOCK flag to MMU notifier

In v2 Christian pointed out that letting mm_exit() linger for too long
(some devices could need minutes to flush a PASID context) might force
the OOM killer to kill additional tasks, for example if the victim has
mlocked all its memory, which the reaper thread cannot clean up.

If this turns out to be problematic to users, we might need to add some
complexity in IOMMU drivers in order to disable PASIDs and return to
exit_mmap() while DMA is still running. While invasive on the IOMMU
side, such a change might not require modification of device drivers or
the API, since iommu_notifier_release() could simply schedule a call to
their mm_exit() instead of calling it synchronously. So we can tune this
behavior in a later series.

Note that some steps cannot be skipped: the ATC invalidation, which may
take up to a minute according to the PCI spec, must be done from the MMU
notifier context. The PCI stop PASID mechanism is an implicit ATC
invalidation, but if we postpone it then we'll have to perform an
explicit one.
---
 drivers/iommu/Kconfig     |   1 +
 drivers/iommu/iommu-sva.c | 246 +++++++++++++++++++++++++++++++++++---
 include/linux/iommu.h     |  10 ++
 3 files changed, 240 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 884580401919..88d6c68284f3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -98,6 +98,7 @@ config IOMMU_DMA
 config IOMMU_SVA
 	bool
 	select IOMMU_API
+	select MMU_NOTIFIER
 
 config FSL_PAMU
 	bool "Freescale IOMMU support"
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 08da479dad68..5ff8967cb213 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,6 +7,7 @@
 
 #include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/mmu_notifier.h>
 #include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
@@ -107,6 +108,9 @@ struct iommu_bond {
 	struct list_head	mm_head;
 	struct list_head	dev_head;
 	struct list_head	domain_head;
+	refcount_t		refs;
+	struct wait_queue_head	mm_exit_wq;
+	bool			mm_exit_active;
 
 	void			*drvdata;
 };
@@ -125,6 +129,8 @@ static DEFINE_IDR(iommu_pasid_idr);
  */
 static DEFINE_SPINLOCK(iommu_sva_lock);
 
+static struct mmu_notifier_ops iommu_mmu_notifier;
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	    struct mm_struct *mm, unsigned long flags)
@@ -152,6 +158,7 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 
 	io_mm->flags		= flags;
 	io_mm->mm		= mm;
+	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
 	INIT_LIST_HEAD(&io_mm->devices);
 	/* Leave kref as zero until the io_mm is fully initialized */
@@ -169,8 +176,29 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 		goto err_free_mm;
 	}
 
-	/* TODO: keep track of mm. For the moment, abort. */
-	ret = -ENOSYS;
+	ret = mmu_notifier_register(&io_mm->notifier, mm);
+	if (ret)
+		goto err_free_pasid;
+
+	/*
+	 * Now that the MMU notifier is valid, we can allow users to grab this
+	 * io_mm by setting a valid refcount. Before that it was accessible in
+	 * the IDR but invalid.
+	 *
+	 * The following barrier ensures that users, who obtain the io_mm with
+	 * kref_get_unless_zero, don't read uninitialized fields in the
+	 * structure.
+	 */
+	smp_wmb();
+	kref_init(&io_mm->kref);
+
+	return io_mm;
+
+err_free_pasid:
+	/*
+	 * Even if the io_mm is accessible from the IDR at this point, kref is
+	 * 0 so no user could get a reference to it. Free it manually.
+	 */
 	spin_lock(&iommu_sva_lock);
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 	spin_unlock(&iommu_sva_lock);
@@ -182,9 +210,13 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	return ERR_PTR(ret);
 }
 
-static void io_mm_free(struct io_mm *io_mm)
+static void io_mm_free(struct rcu_head *rcu)
 {
-	struct mm_struct *mm = io_mm->mm;
+	struct io_mm *io_mm;
+	struct mm_struct *mm;
+
+	io_mm = container_of(rcu, struct io_mm, rcu);
+	mm = io_mm->mm;
 
 	io_mm->release(io_mm);
 	mmdrop(mm);
@@ -197,10 +229,24 @@ static void io_mm_release(struct kref *kref)
 	io_mm = container_of(kref, struct io_mm, kref);
 	WARN_ON(!list_empty(&io_mm->devices));
 
-	/* The PASID can now be reallocated for another mm... */
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
-	/* ... but this mm is freed after a grace period (TODO) */
-	io_mm_free(io_mm);
+
+	/*
+	 * If we're being released from mm exit, the notifier callback ->release
+	 * has already been called. Otherwise we don't need ->release, the io_mm
+	 * isn't attached to anything anymore. Hence no_release.
+	 */
+	mmu_notifier_unregister_no_release(&io_mm->notifier, io_mm->mm);
+
+	/*
+	 * We can't free the structure here, because if mm exits during
+	 * unbind(), then ->release might be attempting to grab the io_mm
+	 * concurrently. And in the other case, if ->release is calling
+	 * io_mm_release, then __mmu_notifier_release expects to still have a
+	 * valid mn when returning. So free the structure when it's safe, after
+	 * the RCU grace period elapsed.
+	 */
+	mmu_notifier_call_srcu(&io_mm->rcu, io_mm_free);
 }
 
 /*
@@ -209,8 +255,14 @@ static void io_mm_release(struct kref *kref)
  */
 static int io_mm_get_locked(struct io_mm *io_mm)
 {
-	if (io_mm)
-		return kref_get_unless_zero(&io_mm->kref);
+	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
+		/*
+		 * kref_get_unless_zero doesn't provide ordering for reads. This
+		 * barrier pairs with the one in io_mm_alloc.
+		 */
+		smp_rmb();
+		return 1;
+	}
 
 	return 0;
 }
@@ -236,7 +288,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	struct iommu_bond *bond, *tmp;
 	struct iommu_sva_param *param = dev->iommu_param->sva_param;
 
-	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
+	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
+	    !domain->ops->mm_invalidate)
 		return -ENODEV;
 
 	if (pasid > param->max_pasid || pasid < param->min_pasid)
@@ -250,6 +303,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	bond->io_mm		= io_mm;
 	bond->dev		= dev;
 	bond->drvdata		= drvdata;
+	refcount_set(&bond->refs, 1);
+	init_waitqueue_head(&bond->mm_exit_wq);
 
 	spin_lock(&iommu_sva_lock);
 	/*
@@ -278,12 +333,37 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	return ret;
 }
 
-static void io_mm_detach_locked(struct iommu_bond *bond)
+static void io_mm_detach_locked(struct iommu_bond *bond, bool wait)
 {
 	struct iommu_bond *tmp;
 	bool detach_domain = true;
 	struct iommu_domain *domain = bond->domain;
 
+	if (wait) {
+		bool do_detach = true;
+		/*
+		 * If we're unbind() then we're deleting the bond no matter
+		 * what. Tell the mm_exit thread that we're cleaning up, and
+		 * wait until it finishes using the bond.
+		 *
+		 * refs is guaranteed to be one or more, otherwise it would
+		 * already have been removed from the list. Check if someone is
+		 * already waiting, in which case we wait but do not free.
+		 */
+		if (refcount_read(&bond->refs) > 1)
+			do_detach = false;
+
+		refcount_inc(&bond->refs);
+		wait_event_lock_irq(bond->mm_exit_wq, !bond->mm_exit_active,
+				    iommu_sva_lock);
+		if (!do_detach)
+			return;
+
+	} else if (!refcount_dec_and_test(&bond->refs)) {
+		/* unbind() is waiting to free the bond */
+		return;
+	}
+
 	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
 		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
 			detach_domain = false;
@@ -301,6 +381,130 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
 	kfree(bond);
 }
 
+static int iommu_signal_mm_exit(struct iommu_bond *bond)
+{
+	struct device *dev = bond->dev;
+	struct io_mm *io_mm = bond->io_mm;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	/*
+	 * We can't hold the device's sva_lock. If we did and the device driver
+	 * used a global lock around io_mm, we would risk getting the following
+	 * deadlock:
+	 *
+	 *   exit_mm()                 |  Shutdown SVA
+	 *    mutex_lock(sva_lock)     |   mutex_lock(glob lock)
+	 *     param->mm_exit()        |    sva_shutdown_device()
+	 *      mutex_lock(glob lock)  |     mutex_lock(sva_lock)
+	 *
+	 * Fortunately unbind() waits for us to finish, and sva_shutdown_device
+	 * requires that any bond is removed, so we can safely access mm_exit
+	 * and drvdata without taking the sva_lock.
+	 */
+	if (!param || !param->mm_exit)
+		return 0;
+
+	return param->mm_exit(dev, io_mm->pasid, bond->drvdata);
+}
+
+/* Called when the mm exits. Can race with unbind(). */
+static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct iommu_bond *bond, *next;
+	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
+
+	/*
+	 * If the mm is exiting then devices are still bound to the io_mm.
+	 * A few things need to be done before it is safe to release:
+	 *
+	 * - As the mmu notifier doesn't hold any reference to the io_mm when
+	 *   calling ->release(), try to take a reference.
+	 * - Tell the device driver to stop using this PASID.
+	 * - Clear the PASID table and invalidate TLBs.
+	 * - Drop all references to this io_mm by freeing the bonds.
+	 */
+	spin_lock(&iommu_sva_lock);
+	if (!io_mm_get_locked(io_mm)) {
+		/* Someone's already taking care of it. */
+		spin_unlock(&iommu_sva_lock);
+		return;
+	}
+
+	list_for_each_entry_safe(bond, next, &io_mm->devices, mm_head) {
+		/*
+		 * Release the lock to let the handler sleep. We need to be
+		 * careful about concurrent modifications to the list and to the
+		 * bond. Tell unbind() not to free the bond until we're done.
+		 */
+		bond->mm_exit_active = true;
+		spin_unlock(&iommu_sva_lock);
+
+		if (iommu_signal_mm_exit(bond))
+			dev_WARN(bond->dev, "possible leak of PASID %u",
+				 io_mm->pasid);
+
+		spin_lock(&iommu_sva_lock);
+		next = list_next_entry(bond, mm_head);
+
+		/* If someone is waiting, let them delete the bond now */
+		bond->mm_exit_active = false;
+		wake_up_all(&bond->mm_exit_wq);
+
+		/* Otherwise, do it ourselves */
+		io_mm_detach_locked(bond, false);
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	/*
+	 * We're now reasonably certain that no more faults are being handled for
+	 * this io_mm, since we just flushed them all out of the fault queue.
+	 * Release the last reference to free the io_mm.
+	 */
+	io_mm_put(io_mm);
+}
+
+static void iommu_notifier_invalidate_range(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	struct iommu_bond *bond;
+	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		struct iommu_domain *domain = bond->domain;
+
+		domain->ops->mm_invalidate(domain, bond->dev, io_mm, start,
+					   end - start);
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int iommu_notifier_clear_flush_young(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	iommu_notifier_invalidate_range(mn, mm, start, end);
+	return 0;
+}
+
+static void iommu_notifier_change_pte(struct mmu_notifier *mn,
+				      struct mm_struct *mm,
+				      unsigned long address, pte_t pte)
+{
+	iommu_notifier_invalidate_range(mn, mm, address, address + PAGE_SIZE);
+}
+
+static struct mmu_notifier_ops iommu_mmu_notifier = {
+	.flags			= MMU_INVALIDATE_DOES_NOT_BLOCK,
+	.release		= iommu_notifier_release,
+	.clear_flush_young	= iommu_notifier_clear_flush_young,
+	.change_pte		= iommu_notifier_change_pte,
+	.invalidate_range	= iommu_notifier_invalidate_range,
+};
+
 int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 			    unsigned long flags, void *drvdata)
 {
@@ -386,15 +590,16 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 		goto out_unlock;
 	}
 
-	spin_lock(&iommu_sva_lock);
+	/* spin_lock_irq matches the one in wait_event_lock_irq */
+	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
 		if (bond->io_mm->pasid == pasid) {
-			io_mm_detach_locked(bond);
+			io_mm_detach_locked(bond, true);
 			ret = 0;
 			break;
 		}
 	}
-	spin_unlock(&iommu_sva_lock);
+	spin_unlock_irq(&iommu_sva_lock);
 
 out_unlock:
 	mutex_unlock(&dev->iommu_param->sva_lock);
@@ -410,10 +615,10 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
 	if (!param)
 		return;
 
-	spin_lock(&iommu_sva_lock);
+	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
-		io_mm_detach_locked(bond);
-	spin_unlock(&iommu_sva_lock);
+		io_mm_detach_locked(bond, true);
+	spin_unlock_irq(&iommu_sva_lock);
 }
 
 /**
@@ -421,6 +626,7 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
  * @dev: the device
  *
  * When detaching @dev from a domain, IOMMU drivers should use this helper.
+ * This function may sleep while waiting for bonds to be released.
  */
 void iommu_sva_unbind_device_all(struct device *dev)
 {
@@ -453,6 +659,12 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
  * more transactions with the PASID given as argument. The handler gets an opaque
  * pointer corresponding to the drvdata passed as argument to bind().
  *
+ * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
+ * @mm_exit, because they might lead to deadlocks if they are also held when
+ * dropping references to the mm. Consider the following call chain:
+ *   mutex_lock(A); mmput(mm) -> exit_mm() -> @mm_exit() -> mutex_lock(A)
+ * Using mmput_async() prevents this scenario.
+ *
  * The device should not be performing any DMA while this function is running,
  * otherwise the behavior is undefined.
  *
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c95ff714ea66..429f3dc37a35 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -24,6 +24,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <linux/mmu_notifier.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
 
@@ -110,10 +111,15 @@ struct io_mm {
 	unsigned long		flags;
 	struct list_head	devices;
 	struct kref		kref;
+#if defined(CONFIG_MMU_NOTIFIER)
+	struct mmu_notifier	notifier;
+#endif
 	struct mm_struct	*mm;
 
 	/* Release callback for this mm */
 	void (*release)(struct io_mm *io_mm);
+	/* For postponed release */
+	struct rcu_head		rcu;
 };
 
 enum iommu_cap {
@@ -235,6 +241,7 @@ struct iommu_sva_param {
  *             not sleep.
  * @mm_detach: detach io_mm from a device. Remove PASID entry and
  *             flush associated TLB entries if necessary. Must not sleep.
+ * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
@@ -280,6 +287,9 @@ struct iommu_ops {
 			 struct io_mm *io_mm, bool attach_domain);
 	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
 			  struct io_mm *io_mm, bool detach_domain);
+	void (*mm_invalidate)(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, unsigned long vaddr,
+			      size_t size);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread
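
The mmput_async() advice in the iommu_sva_init_device() comment above can
be illustrated with a hypothetical driver teardown path (my_driver_lock,
struct my_ctx and its fields are made up for the example):

	static void my_driver_put_ctx(struct my_ctx *ctx)
	{
		mutex_lock(&my_driver_lock);	/* also taken by the mm_exit handler */
		list_del(&ctx->list);

		/*
		 * If this dropped the last reference, mmput() would end up in
		 * exit_mmap(), which invokes the MMU notifier release and thus
		 * the mm_exit handler; that handler would try to take
		 * my_driver_lock again and deadlock. mmput_async() defers the
		 * final release to a workqueue, so it is safe here.
		 */
		mmput_async(ctx->mm);
		mutex_unlock(&my_driver_lock);
	}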

+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		struct iommu_domain *domain = bond->domain;
+
+		domain->ops->mm_invalidate(domain, bond->dev, io_mm, start,
+					   end - start);
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int iommu_notifier_clear_flush_young(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	iommu_notifier_invalidate_range(mn, mm, start, end);
+	return 0;
+}
+
+static void iommu_notifier_change_pte(struct mmu_notifier *mn,
+				      struct mm_struct *mm,
+				      unsigned long address, pte_t pte)
+{
+	iommu_notifier_invalidate_range(mn, mm, address, address + PAGE_SIZE);
+}
+
+static struct mmu_notifier_ops iommu_mmu_notifier = {
+	.flags			= MMU_INVALIDATE_DOES_NOT_BLOCK,
+	.release		= iommu_notifier_release,
+	.clear_flush_young	= iommu_notifier_clear_flush_young,
+	.change_pte		= iommu_notifier_change_pte,
+	.invalidate_range	= iommu_notifier_invalidate_range,
+};
+
 int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 			    unsigned long flags, void *drvdata)
 {
@@ -386,15 +590,16 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 		goto out_unlock;
 	}
 
-	spin_lock(&iommu_sva_lock);
+	/* spin_lock_irq matches the one in wait_event_lock_irq */
+	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
 		if (bond->io_mm->pasid == pasid) {
-			io_mm_detach_locked(bond);
+			io_mm_detach_locked(bond, true);
 			ret = 0;
 			break;
 		}
 	}
-	spin_unlock(&iommu_sva_lock);
+	spin_unlock_irq(&iommu_sva_lock);
 
 out_unlock:
 	mutex_unlock(&dev->iommu_param->sva_lock);
@@ -410,10 +615,10 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
 	if (!param)
 		return;
 
-	spin_lock(&iommu_sva_lock);
+	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
-		io_mm_detach_locked(bond);
-	spin_unlock(&iommu_sva_lock);
+		io_mm_detach_locked(bond, true);
+	spin_unlock_irq(&iommu_sva_lock);
 }
 
 /**
@@ -421,6 +626,7 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
  * @dev: the device
  *
  * When detaching @dev from a domain, IOMMU drivers should use this helper.
+ * This function may sleep while waiting for bonds to be released.
  */
 void iommu_sva_unbind_device_all(struct device *dev)
 {
@@ -453,6 +659,12 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
  * more transaction with the PASID given as argument. The handler gets an opaque
  * pointer corresponding to the drvdata passed as argument to bind().
  *
+ * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
+ * @mm_exit, because they might lead to deadlocks if they are also held when
+ * dropping references to the mm. Consider the following call chain:
+ *   mutex_lock(A); mmput(mm) -> exit_mm() -> @mm_exit() -> mutex_lock(A)
+ * Using mmput_async() prevents this scenario.
+ *
  * The device should not be performing any DMA while this function is running,
  * otherwise the behavior is undefined.
  *
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c95ff714ea66..429f3dc37a35 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -24,6 +24,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <linux/mmu_notifier.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
 
@@ -110,10 +111,15 @@ struct io_mm {
 	unsigned long		flags;
 	struct list_head	devices;
 	struct kref		kref;
+#if defined(CONFIG_MMU_NOTIFIER)
+	struct mmu_notifier	notifier;
+#endif
 	struct mm_struct	*mm;
 
 	/* Release callback for this mm */
 	void (*release)(struct io_mm *io_mm);
+	/* For postponed release */
+	struct rcu_head		rcu;
 };
 
 enum iommu_cap {
@@ -235,6 +241,7 @@ struct iommu_sva_param {
  *             not sleep.
  * @mm_detach: detach io_mm from a device. Remove PASID entry and
  *             flush associated TLB entries if necessary. Must not sleep.
+ * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
@@ -280,6 +287,9 @@ struct iommu_ops {
 			 struct io_mm *io_mm, bool attach_domain);
 	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
 			  struct io_mm *io_mm, bool detach_domain);
+	void (*mm_invalidate)(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, unsigned long vaddr,
+			      size_t size);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 06/10] iommu/sva: Search mm by PASID
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

The fault handler will need to find an mm given its PASID. This is the
reason we have an IDR for storing address spaces, so hook it up.
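
For illustration, a rough sketch of how a fault handler might consume this
lookup. handle_fault_for_pasid() is a hypothetical helper, not part of this
patch; the real consumer added later in the series also checks access flags
and uses mmput_async() to avoid deadlocking on the fault queue:

  /* Sketch only: resolve a PASID to an mm and handle one fault on it */
  static int handle_fault_for_pasid(int pasid, unsigned long addr)
  {
          struct mm_struct *mm;

          mm = iommu_sva_find(pasid);
          if (!mm)
                  return -ESRCH;  /* PASID not bound, or mm already exited */

          down_read(&mm->mmap_sem);
          /* ... find the VMA and call handle_mm_fault(vma, addr, flags) ... */
          up_read(&mm->mmap_sem);

          mmput(mm);
          return 0;
  }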

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 26 ++++++++++++++++++++++++++
 include/linux/iommu.h     |  7 +++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 5ff8967cb213..ee86f00ee1b9 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -636,6 +636,32 @@ void iommu_sva_unbind_device_all(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
 
+/**
+ * iommu_sva_find() - Find mm associated to the given PASID
+ * @pasid: Process Address Space ID assigned to the mm
+ *
+ * Returns the mm corresponding to this PASID, or NULL if not found. A reference
+ * to the mm is taken, and must be released with mmput().
+ */
+struct mm_struct *iommu_sva_find(int pasid)
+{
+	struct io_mm *io_mm;
+	struct mm_struct *mm = NULL;
+
+	spin_lock(&iommu_sva_lock);
+	io_mm = idr_find(&iommu_pasid_idr, pasid);
+	if (io_mm && io_mm_get_locked(io_mm)) {
+		if (mmget_not_zero(io_mm->mm))
+			mm = io_mm->mm;
+
+		io_mm_put_locked(io_mm);
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	return mm;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_find);
+
 /**
  * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 429f3dc37a35..a457650b80de 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -987,6 +987,8 @@ extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 				   void *drvdata);
 extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void iommu_sva_unbind_device_all(struct device *dev);
+extern struct mm_struct *iommu_sva_find(int pasid);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_init_device(struct device *dev,
 					unsigned long features,
@@ -1016,6 +1018,11 @@ static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
 static inline void iommu_sva_unbind_device_all(struct device *dev)
 {
 }
+
+static inline struct mm_struct *iommu_sva_find(int pasid)
+{
+	return NULL;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 07/10] iommu: Add a page fault handler
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Some systems allow devices to handle I/O Page Faults in the core mm, for
example systems implementing the PCI PRI extension or the Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
recently added to the IOMMU core for SVA virtualisation. Add a page fault
handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them to
IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.
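
As a non-authoritative sketch, a driver choosing one queue per IOMMU instance
could wire it up as follows. struct my_iommu and the my_iommu_*() helpers are
made-up names, not part of this patch:

  /* Hypothetical flush callback, matching the iopf_queue_flush_t prototype */
  static int my_iommu_flush_pri(void *cookie, struct device *dev, int pasid)
  {
          struct my_iommu *smmu = cookie;

          /* Push pending PRI requests for @dev/@pasid into the IOPF queue,
           * i.e. report them with iommu_report_device_fault() */
          my_iommu_drain_pri_queue(smmu, dev, pasid);
          return 0;
  }

  static int my_iommu_enable_iopf(struct my_iommu *smmu, struct device *dev)
  {
          /* One fault queue per IOMMU instance, created lazily */
          if (!smmu->iopf_queue)
                  smmu->iopf_queue = iopf_queue_alloc("my-iommu",
                                                      my_iommu_flush_pri, smmu);
          if (!smmu->iopf_queue)
                  return -ENOMEM;

          return iopf_queue_add_device(smmu->iopf_queue, dev);
  }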

When it receives a fault event, typically in an IRQ handler, the IOMMU
driver reports the fault using iommu_report_device_fault(), which calls
the registered handler. The page fault handler then calls the mm fault
handler, and reports either success or failure with iommu_page_response().
When the handler succeeds, the IOMMU retries the access.
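
The registration step itself is left to IOMMU drivers; assuming the
iommu_register_device_fault_handler() prototype from the fault reporting
infrastructure mentioned above, it might look like this sketch, where
my_iommu_register_iopf() is an invented name:

  static int my_iommu_register_iopf(struct device *dev)
  {
          int ret;

          /* Route the device's recoverable faults to iommu_queue_iopf(),
           * passing the device itself as cookie */
          ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
          if (ret)
                  iopf_queue_remove_device(dev);

          return ret;
  }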

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
v2->v3:
* queue_flush now removes pending partial faults
* queue_flush now takes an optional PASID argument, allowing IOMMU
  drivers to selectively flush faults if possible
* remove PAGE_RESP_HANDLED/PAGE_RESP_CONTINUE
* rename iopf_context -> iopf_fault
---
 drivers/iommu/Kconfig      |   4 +
 drivers/iommu/Makefile     |   1 +
 drivers/iommu/io-pgfault.c | 382 +++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h      |  56 +++++-
 4 files changed, 442 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 88d6c68284f3..27e9999ad980 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -100,6 +100,10 @@ config IOMMU_SVA
 	select IOMMU_API
 	select MMU_NOTIFIER
 
+config IOMMU_PAGE_FAULT
+	bool
+	select IOMMU_API
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 7d6332be5f0e..1c4b0be5d44b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 000000000000..29aa8c6ba459
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+/**
+ * struct iopf_queue - IO Page Fault queue
+ * @wq: the fault workqueue
+ * @flush: low-level flush callback
+ * @flush_arg: flush() argument
+ * @refs: references to this structure taken by producers
+ */
+struct iopf_queue {
+	struct workqueue_struct		*wq;
+	iopf_queue_flush_t		flush;
+	void				*flush_arg;
+	refcount_t			refs;
+};
+
+/**
+ * struct iopf_device_param - IO Page Fault data attached to a device
+ * @queue: IOPF queue
+ * @partial: faults that are part of a Page Request Group for which the last
+ *           request hasn't been submitted yet.
+ */
+struct iopf_device_param {
+	struct iopf_queue		*queue;
+	struct list_head		partial;
+};
+
+struct iopf_fault {
+	struct iommu_fault_event	evt;
+	struct list_head		head;
+};
+
+struct iopf_group {
+	struct iopf_fault		last_fault;
+	struct list_head		faults;
+	struct work_struct		work;
+	struct device			*dev;
+};
+
+static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
+			 enum page_response_code status)
+{
+	struct page_response_msg resp = {
+		.addr			= evt->addr,
+		.pasid			= evt->pasid,
+		.pasid_present		= evt->pasid_valid,
+		.page_req_group_id	= evt->page_req_group_id,
+		.private_data		= evt->iommu_private,
+		.resp_code		= status,
+	};
+
+	return iommu_page_response(dev, &resp);
+}
+
+static enum page_response_code
+iopf_handle_single(struct iopf_fault *fault)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iopf_handle_group(struct work_struct *work)
+{
+	struct iopf_group *group;
+	struct iopf_fault *fault, *next;
+	enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
+
+	group = container_of(work, struct iopf_group, work);
+
+	list_for_each_entry_safe(fault, next, &group->faults, head) {
+		struct iommu_fault_event *evt = &fault->evt;
+		/*
+		 * For the moment, errors are sticky: don't handle subsequent
+		 * faults in the group if there is an error.
+		 */
+		if (status == IOMMU_PAGE_RESP_SUCCESS)
+			status = iopf_handle_single(fault);
+
+		if (!evt->last_req)
+			kfree(fault);
+	}
+
+	iopf_complete(group->dev, &group->last_fault.evt, status);
+	kfree(group);
+}
+
+/**
+ * iommu_queue_iopf - IO Page Fault handler
+ * @evt: fault event
+ * @cookie: struct device, passed to iommu_register_device_fault_handler.
+ *
+ * Add a fault to the device workqueue, to be handled by mm.
+ */
+int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
+{
+	struct iopf_group *group;
+	struct iopf_fault *fault, *next;
+	struct iopf_device_param *iopf_param;
+
+	struct device *dev = cookie;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (WARN_ON(!mutex_is_locked(&param->lock)))
+		return -EINVAL;
+
+	if (evt->type != IOMMU_FAULT_PAGE_REQ)
+		/* Not a recoverable page fault */
+		return 0;
+
+	/*
+	 * As long as we're holding param->lock, the queue can't be unlinked
+	 * from the device and therefore cannot disappear.
+	 */
+	iopf_param = param->iopf_param;
+	if (!iopf_param)
+		return -ENODEV;
+
+	if (!evt->last_req) {
+		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
+		if (!fault)
+			return -ENOMEM;
+
+		fault->evt = *evt;
+
+		/* Non-last request of a group. Postpone until the last one */
+		list_add(&fault->head, &iopf_param->partial);
+
+		return 0;
+	}
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return -ENOMEM;
+
+	group->dev = dev;
+	group->last_fault.evt = *evt;
+	INIT_LIST_HEAD(&group->faults);
+	list_add(&group->last_fault.head, &group->faults);
+	INIT_WORK(&group->work, iopf_handle_group);
+
+	/* See if we have partial faults for this group */
+	list_for_each_entry_safe(fault, next, &iopf_param->partial, head) {
+		if (fault->evt.page_req_group_id == evt->page_req_group_id)
+			/* Insert *before* the last fault */
+			list_move(&fault->head, &group->faults);
+	}
+
+	queue_work(iopf_param->queue->wq, &group->work);
+
+	/* Postpone the fault completion */
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_queue_iopf);
+
+/**
+ * iopf_queue_flush_dev - Ensure that all queued faults have been processed
+ * @dev: the endpoint whose faults need to be flushed.
+ * @pasid: the PASID affected by this flush
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults for this PASID have been handled, and won't hit the address
+ * space of the next process that uses this PASID.
+ *
+ * This function can also be called before shutting down the device, in which
+ * case @pasid should be IOMMU_PASID_INVALID.
+ *
+ * Return 0 on success.
+ */
+int iopf_queue_flush_dev(struct device *dev, int pasid)
+{
+	int ret = 0;
+	struct iopf_queue *queue;
+	struct iopf_fault *fault, *next;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (!param)
+		return -ENODEV;
+
+	/*
+	 * It is incredibly easy to find ourselves in a deadlock situation if
+	 * we're not careful, because we're taking the opposite path as
+	 * iommu_queue_iopf:
+	 *
+	 *   iopf_queue_flush_dev()   |  PRI queue handler
+	 *    lock(mutex)             |   iommu_queue_iopf()
+	 *     queue->flush()         |    lock(mutex)
+	 *      wait PRI queue empty  |
+	 *
+	 * So we can't hold the device param lock while flushing. We don't have
+	 * to, because the queue or the device won't disappear until all flush
+	 * are finished.
+	 */
+	mutex_lock(&param->lock);
+	if (param->iopf_param)
+		queue = param->iopf_param->queue;
+	else
+		ret = -ENODEV;
+	mutex_unlock(&param->lock);
+	if (ret)
+		return ret;
+
+	/*
+	 * When removing a PASID, the device driver tells the device to stop
+	 * using it, and flush any pending fault to the IOMMU. In this flush
+	 * callback, the IOMMU driver makes sure that there are no such faults
+	 * left in the low-level queue.
+	 */
+	queue->flush(queue->flush_arg, dev, pasid);
+
+	/*
+	 * If at some point the low-level fault queue overflowed and the IOMMU
+	 * device had to auto-respond to a 'last' page fault, other faults from
+	 * the same Page Request Group may still be stuck in the partial list.
+	 * We need to make sure that the next address space using the PASID
+	 * doesn't receive them.
+	 */
+	mutex_lock(&param->lock);
+	list_for_each_entry_safe(fault, next, &param->iopf_param->partial, head) {
+		if (fault->evt.pasid == pasid || pasid == IOMMU_PASID_INVALID) {
+			list_del(&fault->head);
+			kfree(fault);
+		}
+	}
+	mutex_unlock(&param->lock);
+
+	flush_workqueue(queue->wq);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
+
+/**
+ * iopf_queue_add_device - Add producer to the fault queue
+ * @queue: IOPF queue
+ * @dev: device to add
+ */
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
+{
+	int ret = -EINVAL;
+	struct iopf_device_param *iopf_param;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (!param)
+		return -ENODEV;
+
+	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
+	if (!iopf_param)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&iopf_param->partial);
+	iopf_param->queue = queue;
+
+	mutex_lock(&param->lock);
+	if (!param->iopf_param) {
+		refcount_inc(&queue->refs);
+		param->iopf_param = iopf_param;
+		ret = 0;
+	}
+	mutex_unlock(&param->lock);
+
+	if (ret)
+		kfree(iopf_param);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_add_device);
+
+/**
+ * iopf_queue_remove_device - Remove producer from fault queue
+ * @dev: device to remove
+ *
+ * Caller makes sure that no more fault is reported for this device, and no more
+ * flush is scheduled for this device.
+ *
+ * Note: safe to call unconditionally on a cleanup path, even if the device
+ * isn't registered to any IOPF queue.
+ *
+ * Return 0 if the device was attached to the IOPF queue
+ */
+int iopf_queue_remove_device(struct device *dev)
+{
+	struct iopf_fault *fault, *next;
+	struct iopf_device_param *iopf_param;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (!param)
+		return -EINVAL;
+
+	mutex_lock(&param->lock);
+	iopf_param = param->iopf_param;
+	if (iopf_param) {
+		refcount_dec(&iopf_param->queue->refs);
+		param->iopf_param = NULL;
+	}
+	mutex_unlock(&param->lock);
+	if (!iopf_param)
+		return -EINVAL;
+
+	/* Just in case flush_dev() wasn't called */
+	list_for_each_entry_safe(fault, next, &iopf_param->partial, head)
+		kfree(fault);
+
+	/*
+	 * No more flush is scheduled, and the caller removed all bonds from
+	 * this device. unbind() waited until any concurrent mm_exit() finished,
+	 * therefore there is no flush() running anymore and we can free the
+	 * param.
+	 */
+	kfree(iopf_param);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
+
+/**
+ * iopf_queue_alloc - Allocate and initialize a fault queue
+ * @name: a unique string identifying the queue (for workqueue)
+ * @flush: a callback that flushes the low-level queue
+ * @cookie: driver-private data passed to the flush callback
+ *
+ * The callback is called before the workqueue is flushed. The IOMMU driver must
+ * commit all faults that are pending in its low-level queues at the time of the
+ * call, into the IOPF queue (with iommu_report_device_fault). The callback
+ * takes a device pointer as argument, hinting what endpoint is causing the
+ * flush. When the device is NULL, all faults should be committed.
+ */
+struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+	struct iopf_queue *queue;
+
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
+	if (!queue)
+		return NULL;
+
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
+	if (!queue->wq) {
+		kfree(queue);
+		return NULL;
+	}
+
+	queue->flush = flush;
+	queue->flush_arg = cookie;
+	refcount_set(&queue->refs, 1);
+
+	return queue;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_alloc);
+
+/**
+ * iopf_queue_free - Free IOPF queue
+ * @queue: queue to free
+ *
+ * Counterpart to iopf_queue_alloc(). Caller must make sure that all producers
+ * have been removed.
+ */
+void iopf_queue_free(struct iopf_queue *queue)
+{
+	/* Caller should have removed all producers first */
+	if (WARN_ON(!refcount_dec_and_test(&queue->refs)))
+		return;
+
+	destroy_workqueue(queue->wq);
+	kfree(queue);
+}
+EXPORT_SYMBOL_GPL(iopf_queue_free);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a457650b80de..b7cd00ae7358 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -63,6 +63,8 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
+#define IOMMU_PASID_INVALID		(-1)
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
@@ -440,11 +442,20 @@ struct iommu_fault_param {
 	void *data;
 };
 
+/**
+ * iopf_queue_flush_t - Flush low-level page fault queue
+ *
+ * Report all faults currently pending in the low-level page fault queue
+ */
+struct iopf_queue;
+typedef int (*iopf_queue_flush_t)(void *cookie, struct device *dev, int pasid);
+
 /**
  * struct iommu_param - collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
- * @lock: serializes accesses to fault_param
+ * @iopf_param: I/O Page Fault queue and data
+ * @lock: serializes accesses to fault_param and iopf_param
  * @sva_param: SVA parameters
  * @sva_lock: serializes accesses to sva_param
  *
@@ -455,6 +466,7 @@ struct iommu_fault_param {
 struct iommu_param {
 	struct mutex lock;
 	struct iommu_fault_param *fault_param;
+	struct iopf_device_param *iopf_param;
 	struct mutex sva_lock;
 	struct iommu_sva_param *sva_param;
 };
@@ -1025,4 +1037,46 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
 }
 #endif /* CONFIG_IOMMU_SVA */
 
+#ifdef CONFIG_IOMMU_PAGE_FAULT
+extern int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie);
+
+extern int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+extern int iopf_queue_remove_device(struct device *dev);
+extern int iopf_queue_flush_dev(struct device *dev, int pasid);
+extern struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie);
+extern void iopf_queue_free(struct iopf_queue *queue);
+#else /* CONFIG_IOMMU_PAGE_FAULT */
+static inline int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_add_device(struct iopf_queue *queue,
+					struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_remove_device(struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+	return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+#endif /* CONFIG_IOMMU_PAGE_FAULT */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 08/10] iommu/iopf: Handle mm faults
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

When a recoverable page fault is handled by the fault workqueue, find the
associated mm and call handle_mm_fault.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/io-pgfault.c | 86 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 29aa8c6ba459..f6d9f40b879b 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -7,6 +7,7 @@
 
 #include <linux/iommu.h>
 #include <linux/list.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -65,8 +66,65 @@ static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
 static enum page_response_code
 iopf_handle_single(struct iopf_fault *fault)
 {
-	/* TODO */
-	return -ENODEV;
+	vm_fault_t ret;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	unsigned int access_flags = 0;
+	unsigned int fault_flags = FAULT_FLAG_REMOTE;
+	struct iommu_fault_event *evt = &fault->evt;
+	enum page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+	if (!evt->pasid_valid)
+		return status;
+
+	mm = iommu_sva_find(evt->pasid);
+	if (!mm)
+		return status;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, evt->addr);
+	if (!vma)
+		/* Unmapped area */
+		goto out_put_mm;
+
+	if (evt->prot & IOMMU_FAULT_READ)
+		access_flags |= VM_READ;
+
+	if (evt->prot & IOMMU_FAULT_WRITE) {
+		access_flags |= VM_WRITE;
+		fault_flags |= FAULT_FLAG_WRITE;
+	}
+
+	if (evt->prot & IOMMU_FAULT_EXEC) {
+		access_flags |= VM_EXEC;
+		fault_flags |= FAULT_FLAG_INSTRUCTION;
+	}
+
+	if (!(evt->prot & IOMMU_FAULT_PRIV))
+		fault_flags |= FAULT_FLAG_USER;
+
+	if (access_flags & ~vma->vm_flags)
+		/* Access fault */
+		goto out_put_mm;
+
+	ret = handle_mm_fault(vma, evt->addr, fault_flags);
+	status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+		IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+	up_read(&mm->mmap_sem);
+
+	/*
+	 * If the process exits while we're handling the fault on its mm, we
+	 * can't do mmput(). exit_mmap() would release the MMU notifier, calling
+	 * iommu_notifier_release(), which has to flush the fault queue that
+	 * we're executing on... So mmput_async() moves the release of the mm to
+	 * another thread, if we're the last user.
+	 */
+	mmput_async(mm);
+
+	return status;
 }
 
 static void iopf_handle_group(struct work_struct *work)
@@ -100,6 +158,30 @@ static void iopf_handle_group(struct work_struct *work)
  * @cookie: struct device, passed to iommu_register_device_fault_handler.
  *
  * Add a fault to the device workqueue, to be handled by mm.
+ *
+ * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard
+ * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't
+ * expect a response. It may be generated when disabling a PASID (issuing a
+ * PASID stop request) by some PCI devices.
+ *
+ * The PASID stop request is triggered by the mm_exit() callback. When the
+ * callback returns from the device driver, no page request is generated for
+ * this PASID anymore and outstanding ones have been pushed to the IOMMU (as per
+ * PCIe 4.0r1.0 - 6.20.1 and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some
+ * PCI devices will wait for all outstanding page requests to come back with a
+ * response before completing the PASID stop request. Others do not wait for
+ * page responses, and instead issue this Stop Marker that tells us when the
+ * PASID can be reallocated.
+ *
+ * It is safe to discard the Stop Marker because it is an optimization.
+ * a. Page requests, which are posted requests, have been flushed to the IOMMU
+ *    when mm_exit() returns,
+ * b. We flush all fault queues after mm_exit() returns and before freeing the
+ *    PASID.
+ *
+ * So even though the Stop Marker might be issued by the device *after* the stop
+ * request completes, outstanding faults will have been dealt with by the time
+ * we free the PASID.
  */
 int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
 {
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 09/10] iommu/sva: Register page fault handler
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Let users call iommu_sva_init_device() with the IOMMU_SVA_FEAT_IOPF flag,
which enables the I/O Page Fault queue. The IOMMU driver checks whether the
device supports a form of page fault, in which case it adds the device to
a fault queue. If the device doesn't support page faults, the IOMMU driver
aborts iommu_sva_init_device().

The fault queue must be flushed before any io_mm is freed, to make sure
that its PASID is no longer referenced by any pending fault and can safely
be reallocated. Add iopf_queue_flush_dev() calls in a few strategic
locations.
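
As an illustration, enabling the feature from a device driver might look
like the sketch below. This is hypothetical: the PASID-limit arguments and
the position of the mm_exit callback in iommu_sva_init_device() are assumed
from its kerneldoc, and the mydev_* names are made up.

static int mydev_mm_exit(struct device *dev, int pasid, void *drvdata)
{
        /* The device must stop issuing transactions with this PASID */
        return 0;
}

static int mydev_enable_iopf(struct device *dev)
{
        /*
         * Request the I/O Page Fault feature. On success this registers
         * iommu_queue_iopf() as the device fault handler; it returns
         * -EINVAL if unknown feature bits are set, and aborts if the
         * IOMMU driver cannot add the device to a fault queue.
         *
         * The PASID limits (0 here, assumed to mean "IOMMU defaults")
         * and the mm_exit parameter position are assumptions.
         */
        return iommu_sva_init_device(dev, IOMMU_SVA_FEAT_IOPF, 0, 0,
                                     mydev_mm_exit);
}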

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 26 +++++++++++++++++++++++++-
 drivers/iommu/iommu.c     |  6 +++---
 include/linux/iommu.h     |  2 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index ee86f00ee1b9..1588a523a214 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -443,6 +443,8 @@ static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm
 			dev_WARN(bond->dev, "possible leak of PASID %u",
 				 io_mm->pasid);
 
+		iopf_queue_flush_dev(bond->dev, io_mm->pasid);
+
 		spin_lock(&iommu_sva_lock);
 		next = list_next_entry(bond, mm_head);
 
@@ -590,6 +592,12 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 		goto out_unlock;
 	}
 
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iopf_queue_flush_dev(dev, pasid);
+
 	/* spin_lock_irq matches the one in wait_event_lock_irq */
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
@@ -615,6 +623,8 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
 	if (!param)
 		return;
 
+	iopf_queue_flush_dev(dev, IOMMU_PASID_INVALID);
+
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
 		io_mm_detach_locked(bond, true);
@@ -680,6 +690,9 @@ EXPORT_SYMBOL_GPL(iommu_sva_find);
  * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
  * by the IOMMU.
  *
+ * If the device should support recoverable I/O Page Faults (e.g. PCI PRI), the
+ * IOMMU_SVA_FEAT_IOPF feature must be requested.
+ *
  * @mm_exit is called when an address space bound to the device is about to be
  * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
  * more transaction with the PASID given as argument. The handler gets an opaque
@@ -707,7 +720,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
 	if (!domain || !domain->ops->sva_init_device)
 		return -ENODEV;
 
-	if (features)
+	if (features & ~IOMMU_SVA_FEAT_IOPF)
 		return -EINVAL;
 
 	param = kzalloc(sizeof(*param), GFP_KERNEL);
@@ -734,10 +747,20 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
 	if (ret)
 		goto err_unlock;
 
+	if (features & IOMMU_SVA_FEAT_IOPF) {
+		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
+							  dev);
+		if (ret)
+			goto err_shutdown;
+	}
+
 	dev->iommu_param->sva_param = param;
 	mutex_unlock(&dev->iommu_param->sva_lock);
 	return 0;
 
+err_shutdown:
+	if (domain->ops->sva_shutdown_device)
+		domain->ops->sva_shutdown_device(dev);
 err_unlock:
 	mutex_unlock(&dev->iommu_param->sva_lock);
 	kfree(param);
@@ -766,6 +789,7 @@ void iommu_sva_shutdown_device(struct device *dev)
 		goto out_unlock;
 
 	__iommu_sva_unbind_device_all(dev);
+	iommu_unregister_device_fault_handler(dev);
 
 	if (domain->ops->sva_shutdown_device)
 		domain->ops->sva_shutdown_device(dev);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7113fe398b70..b493f5c4fe64 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2342,9 +2342,9 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
  * iommu_sva_init_device() must be called first, to initialize the required SVA
  * features. @flags must be a subset of these features.
  *
- * The caller must pin down using get_user_pages*() all mappings shared with the
- * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
- * (e.g. copy-on-write).
+ * If IOMMU_SVA_FEAT_IOPF isn't requested, the caller must pin down using
+ * get_user_pages*() all mappings shared with the device. mlock() isn't
+ * sufficient, as it doesn't prevent minor page faults (e.g. copy-on-write).
  *
  * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
  * is returned.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b7cd00ae7358..ad2b18883ae2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -65,6 +65,8 @@ typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
 #define IOMMU_PASID_INVALID		(-1)
 
+#define IOMMU_SVA_FEAT_IOPF		(1 << 0)
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 09/10] iommu/sva: Register page fault handler
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Let users call iommu_sva_init_device() with the IOMMU_SVA_FEAT_IOPF flag,
which enables the I/O Page Fault queue. The IOMMU driver checks whether the
device supports a form of page fault, in which case it adds the device to
a fault queue. If the device doesn't support page faults, the IOMMU driver
aborts iommu_sva_init_device().

The fault queue must be flushed before any io_mm is freed, to make sure
that its PASID is no longer referenced by any pending fault and can safely
be reallocated. Add iopf_queue_flush_dev() calls in a few strategic
locations.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/iommu-sva.c | 26 +++++++++++++++++++++++++-
 drivers/iommu/iommu.c     |  6 +++---
 include/linux/iommu.h     |  2 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index ee86f00ee1b9..1588a523a214 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -443,6 +443,8 @@ static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm
 			dev_WARN(bond->dev, "possible leak of PASID %u",
 				 io_mm->pasid);
 
+		iopf_queue_flush_dev(bond->dev, io_mm->pasid);
+
 		spin_lock(&iommu_sva_lock);
 		next = list_next_entry(bond, mm_head);
 
@@ -590,6 +592,12 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 		goto out_unlock;
 	}
 
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iopf_queue_flush_dev(dev, pasid);
+
 	/* spin_lock_irq matches the one in wait_event_lock_irq */
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
@@ -615,6 +623,8 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
 	if (!param)
 		return;
 
+	iopf_queue_flush_dev(dev, IOMMU_PASID_INVALID);
+
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
 		io_mm_detach_locked(bond, true);
@@ -680,6 +690,9 @@ EXPORT_SYMBOL_GPL(iommu_sva_find);
  * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
  * by the IOMMU.
  *
+ * If the device should support recoverable I/O Page Faults (e.g. PCI PRI), the
+ * IOMMU_SVA_FEAT_IOPF feature must be requested.
+ *
  * @mm_exit is called when an address space bound to the device is about to be
  * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
  * more transaction with the PASID given as argument. The handler gets an opaque
@@ -707,7 +720,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
 	if (!domain || !domain->ops->sva_init_device)
 		return -ENODEV;
 
-	if (features)
+	if (features & ~IOMMU_SVA_FEAT_IOPF)
 		return -EINVAL;
 
 	param = kzalloc(sizeof(*param), GFP_KERNEL);
@@ -734,10 +747,20 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
 	if (ret)
 		goto err_unlock;
 
+	if (features & IOMMU_SVA_FEAT_IOPF) {
+		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
+							  dev);
+		if (ret)
+			goto err_shutdown;
+	}
+
 	dev->iommu_param->sva_param = param;
 	mutex_unlock(&dev->iommu_param->sva_lock);
 	return 0;
 
+err_shutdown:
+	if (domain->ops->sva_shutdown_device)
+		domain->ops->sva_shutdown_device(dev);
 err_unlock:
 	mutex_unlock(&dev->iommu_param->sva_lock);
 	kfree(param);
@@ -766,6 +789,7 @@ void iommu_sva_shutdown_device(struct device *dev)
 		goto out_unlock;
 
 	__iommu_sva_unbind_device_all(dev);
+	iommu_unregister_device_fault_handler(dev);
 
 	if (domain->ops->sva_shutdown_device)
 		domain->ops->sva_shutdown_device(dev);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7113fe398b70..b493f5c4fe64 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2342,9 +2342,9 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
  * iommu_sva_init_device() must be called first, to initialize the required SVA
  * features. @flags must be a subset of these features.
  *
- * The caller must pin down using get_user_pages*() all mappings shared with the
- * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
- * (e.g. copy-on-write).
+ * If IOMMU_SVA_FEAT_IOPF isn't requested, the caller must pin down using
+ * get_user_pages*() all mappings shared with the device. mlock() isn't
+ * sufficient, as it doesn't prevent minor page faults (e.g. copy-on-write).
  *
  * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
  * is returned.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b7cd00ae7358..ad2b18883ae2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -65,6 +65,8 @@ typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
 #define IOMMU_PASID_INVALID		(-1)
 
+#define IOMMU_SVA_FEAT_IOPF		(1 << 0)
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Provide an API for allocating PASIDs and populating them manually. To ease
cleanup and factor out allocation code, reuse the io_mm structure for
private PASIDs. A private io_mm has a NULL mm_struct pointer and cannot be
bound to multiple devices. The mm_alloc() IOMMU op must now check whether
the mm argument is NULL, in which case it should allocate io_pgtables
instead of binding to an mm.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
Sadly this probably won't be the final thing. The API in this patch is
used like this:

        iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
        iommu_sva_map(io_mm, ...)
        iommu_sva_unmap(io_mm, ...)
        iommu_sva_free_pasid(dev, io_mm)

The proposed API for auxiliary domains is in an early stage but might
replace this patch and could be used like this:

        iommu_enable_aux_domain(dev)
        d = iommu_domain_alloc()
        iommu_attach_aux(dev, d)
        iommu_aux_id(d) -> PASID
        iommu_map(d, ...)
        iommu_unmap(d, ...)
        iommu_detach_aux(dev, d)
        iommu_domain_free(d)

The advantage is that the driver doesn't have to use a special version of
map/unmap/etc.
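
To illustrate the interface in this patch, a driver could use it roughly as
in the sketch below. The function signatures come from this patch; the
mydev_* helpers, the example IOVA and reading the PASID out of the io_mm
are assumptions for illustration only.

static int mydev_map_private(struct device *dev, phys_addr_t paddr,
                             size_t size, struct io_mm **out_io_mm)
{
        int ret;
        struct io_mm *io_mm;
        unsigned long iova = 0x100000; /* example IOVA chosen by the driver */
        struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

        if (!domain)
                return -ENODEV;

        /* Requires a prior iommu_sva_init_device() call */
        ret = iommu_sva_alloc_pasid(dev, &io_mm);
        if (ret)
                return ret;

        ret = iommu_sva_map(domain, io_mm, iova, paddr, size,
                            IOMMU_READ | IOMMU_WRITE);
        if (ret) {
                iommu_sva_free_pasid(dev, io_mm);
                return ret;
        }

        /* Program the io_mm's PASID into the device, then DMA to iova */
        *out_io_mm = io_mm;
        return 0;
}

static void mydev_unmap_private(struct device *dev, struct io_mm *io_mm,
                                size_t size)
{
        struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

        iommu_sva_unmap(domain, io_mm, 0x100000, size);
        iommu_sva_free_pasid(dev, io_mm);
}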
---
 drivers/iommu/iommu-sva.c | 209 ++++++++++++++++++++++++++++++++++----
 drivers/iommu/iommu.c     |  51 ++++++----
 include/linux/iommu.h     | 112 +++++++++++++++++++-
 3 files changed, 331 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 1588a523a214..029776f64e7d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -15,11 +15,11 @@
 /**
  * DOC: io_mm model
  *
- * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
- * The following example illustrates the relation between structures
- * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
- * device. A device can have multiple io_mm and an io_mm may be bound to
- * multiple devices.
+ * When used with the bind()/unbind() functions, the io_mm keeps track of
+ * process address spaces shared between CPU and IOMMU. The following example
+ * illustrates the relation between structures iommu_domain, io_mm and
+ * iommu_bond. An iommu_bond is a link between io_mm and device. A device can
+ * have multiple io_mm and an io_mm may be bound to multiple devices.
  *              ___________________________
  *             |  IOMMU domain A           |
  *             |  ________________         |
@@ -98,6 +98,12 @@
  * the first entry points to the io_pgtable pointer. In other IOMMUs the
  * io_pgtable pointer is held in the device table and PASID #0 is available to
  * the allocator.
+ *
+ * The io_mm can also represent a private IOMMU address space, which isn't
+ * shared with a process. The device driver calls iommu_sva_alloc_pasid which
+ * returns an io_mm that can be populated with the iommu_sva_map/unmap
+ * functions. The principle is the same as shared io_mm, except that a private
+ * io_mm cannot be bound to multiple devices.
  */
 
 struct iommu_bond {
@@ -131,6 +137,9 @@ static DEFINE_SPINLOCK(iommu_sva_lock);
 
 static struct mmu_notifier_ops iommu_mmu_notifier;
 
+#define io_mm_is_private(io_mm) ((io_mm) != NULL && (io_mm)->mm == NULL)
+#define io_mm_is_shared(io_mm) ((io_mm) != NULL && (io_mm)->mm != NULL)
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	    struct mm_struct *mm, unsigned long flags)
@@ -149,19 +158,10 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	if (!io_mm)
 		return ERR_PTR(-ENOMEM);
 
-	/*
-	 * The mm must not be freed until after the driver frees the io_mm
-	 * (which may involve unpinning the CPU ASID for instance, requiring a
-	 * valid mm struct.)
-	 */
-	mmgrab(mm);
-
 	io_mm->flags		= flags;
 	io_mm->mm		= mm;
-	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
 	INIT_LIST_HEAD(&io_mm->devices);
-	/* Leave kref as zero until the io_mm is fully initialized */
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&iommu_sva_lock);
@@ -176,6 +176,32 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 		goto err_free_mm;
 	}
 
+	return io_mm;
+
+err_free_mm:
+	io_mm->release(io_mm);
+	return ERR_PTR(ret);
+}
+
+static struct io_mm *
+io_mm_alloc_shared(struct iommu_domain *domain, struct device *dev,
+		   struct mm_struct *mm, unsigned long flags)
+{
+	int ret;
+	struct io_mm *io_mm;
+
+	io_mm = io_mm_alloc(domain, dev, mm, flags);
+	if (IS_ERR(io_mm))
+		return io_mm;
+
+	/*
+	 * The mm must not be freed until after the driver frees the io_mm
+	 * (which may involve unpinning the CPU ASID for instance, requiring a
+	 * valid mm struct.)
+	 */
+	mmgrab(mm);
+
+	io_mm->notifier.ops = &iommu_mmu_notifier;
 	ret = mmu_notifier_register(&io_mm->notifier, mm);
 	if (ret)
 		goto err_free_pasid;
@@ -203,7 +229,6 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 	spin_unlock(&iommu_sva_lock);
 
-err_free_mm:
 	io_mm->release(io_mm);
 	mmdrop(mm);
 
@@ -231,6 +256,11 @@ static void io_mm_release(struct kref *kref)
 
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 
+	if (io_mm_is_private(io_mm)) {
+		io_mm->release(io_mm);
+		return;
+	}
+
 	/*
 	 * If we're being released from mm exit, the notifier callback ->release
 	 * has already been called. Otherwise we don't need ->release, the io_mm
@@ -258,7 +288,7 @@ static int io_mm_get_locked(struct io_mm *io_mm)
 	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
 		/*
 		 * kref_get_unless_zero doesn't provide ordering for reads. This
-		 * barrier pairs with the one in io_mm_alloc.
+		 * barrier pairs with the one in io_mm_alloc_shared.
 		 */
 		smp_rmb();
 		return 1;
@@ -289,7 +319,7 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	struct iommu_sva_param *param = dev->iommu_param->sva_param;
 
 	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
-	    !domain->ops->mm_invalidate)
+	    (io_mm_is_shared(io_mm) && !domain->ops->mm_invalidate))
 		return -ENODEV;
 
 	if (pasid > param->max_pasid || pasid < param->min_pasid)
@@ -555,7 +585,7 @@ int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid
 	}
 
 	if (!io_mm) {
-		io_mm = io_mm_alloc(domain, dev, mm, flags);
+		io_mm = io_mm_alloc_shared(domain, dev, mm, flags);
 		if (IS_ERR(io_mm)) {
 			ret = PTR_ERR(io_mm);
 			goto out_unlock;
@@ -601,6 +631,9 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 	/* spin_lock_irq matches the one in wait_event_lock_irq */
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
+		if (io_mm_is_private(bond->io_mm))
+			continue;
+
 		if (bond->io_mm->pasid == pasid) {
 			io_mm_detach_locked(bond, true);
 			ret = 0;
@@ -672,6 +705,136 @@ struct mm_struct *iommu_sva_find(int pasid)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
 
+/*
+ * iommu_sva_alloc_pasid - Allocate a private PASID
+ *
+ * Allocate a PASID for private map/unmap operations. Create a new I/O address
+ * space for this device, that isn't bound to any process.
+ *
+ * iommu_sva_init_device must have been called first.
+ */
+int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **out)
+{
+	int ret;
+	struct io_mm *io_mm;
+	struct iommu_domain *domain;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!out || !param)
+		return -EINVAL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	io_mm = io_mm_alloc(domain, dev, NULL, 0);
+	if (IS_ERR(io_mm))
+		return PTR_ERR(io_mm);
+
+	kref_init(&io_mm->kref);
+
+	ret = io_mm_attach(domain, dev, io_mm, NULL);
+	if (ret) {
+		io_mm_put(io_mm);
+		return ret;
+	}
+
+	*out = io_mm;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
+
+void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm)
+{
+	struct iommu_bond *bond;
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return;
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		if (bond->dev == dev) {
+			io_mm_detach_locked(bond, false);
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
+
+int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		  unsigned long iova, phys_addr_t paddr, size_t size, int prot)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return -ENODEV;
+
+	return __iommu_map(domain, io_mm, iova, paddr, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map);
+
+size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			unsigned long iova, struct scatterlist *sg,
+			unsigned int nents, int prot)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_map_sg(domain, io_mm, iova, sg, nents, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map_sg);
+
+size_t iommu_sva_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, size_t size)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_unmap(domain, io_mm, iova, size, true);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap);
+
+size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_unmap(domain, io_mm, iova, size, false);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap_fast);
+
+phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+				   struct io_mm *io_mm, dma_addr_t iova)
+{
+	if (!io_mm)
+		return iommu_iova_to_phys(domain, iova);
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	if (unlikely(domain->ops->sva_iova_to_phys == NULL))
+		return 0;
+
+	return domain->ops->sva_iova_to_phys(domain, io_mm, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_iova_to_phys);
+
+void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm *io_mm,
+			     unsigned long iova, size_t size)
+{
+	if (!io_mm) {
+		iommu_tlb_range_add(domain, iova, size);
+		return;
+	}
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return;
+
+	if (domain->ops->sva_iotlb_range_add != NULL)
+		domain->ops->sva_iotlb_range_add(domain, io_mm, iova, size);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_tlb_range_add);
+
 /**
  * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -693,10 +856,12 @@ EXPORT_SYMBOL_GPL(iommu_sva_find);
  * If the device should support recoverable I/O Page Faults (e.g. PCI PRI), the
  * IOMMU_SVA_FEAT_IOPF feature must be requested.
  *
- * @mm_exit is called when an address space bound to the device is about to be
- * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
- * more transaction with the PASID given as argument. The handler gets an opaque
- * pointer corresponding to the drvdata passed as argument to bind().
+ * If the driver intends to share process address spaces with the device, it
+ * should pass a valid @mm_exit handler. @mm_exit is called when an address
+ * space bound to the device is about to be torn down by exit_mmap. After
+ * @mm_exit returns, the device must not issue any more transaction with the
+ * PASID given as argument. The handler gets an opaque pointer corresponding to
+ * the drvdata passed as argument to bind().
  *
  * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
  * @mm_exit, because they might lead to deadlocks if they are also held when
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b493f5c4fe64..dd75c0a19c3a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1854,8 +1854,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	return pgsize;
 }
 
-int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, size_t size, int prot)
+int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		unsigned long iova, phys_addr_t paddr, size_t size, int prot)
 {
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
@@ -1863,7 +1863,8 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	phys_addr_t orig_paddr = paddr;
 	int ret = 0;
 
-	if (unlikely(domain->ops->map == NULL ||
+	if (unlikely((!io_mm && domain->ops->map == NULL) ||
+		     (io_mm && domain->ops->sva_map == NULL) ||
 		     domain->pgsize_bitmap == 0UL))
 		return -ENODEV;
 
@@ -1892,7 +1893,12 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 			 iova, &paddr, pgsize);
 
-		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (io_mm)
+			ret = domain->ops->sva_map(domain, io_mm, iova, paddr,
+						   pgsize, prot);
+		else
+			ret = domain->ops->map(domain, iova, paddr, pgsize,
+					       prot);
 		if (ret)
 			break;
 
@@ -1903,24 +1909,30 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 
 	/* unroll mapping in case something went wrong */
 	if (ret)
-		iommu_unmap(domain, orig_iova, orig_size - size);
+		__iommu_unmap(domain, io_mm, orig_iova, orig_size - size, true);
 	else
 		trace_map(orig_iova, orig_paddr, orig_size);
 
 	return ret;
 }
+
+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	return __iommu_map(domain, NULL, iova, paddr, size, prot);
+}
 EXPORT_SYMBOL_GPL(iommu_map);
 
-static size_t __iommu_unmap(struct iommu_domain *domain,
-			    unsigned long iova, size_t size,
-			    bool sync)
+size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+		     unsigned long iova, size_t size, bool sync)
 {
 	const struct iommu_ops *ops = domain->ops;
 	size_t unmapped_page, unmapped = 0;
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
 
-	if (unlikely(ops->unmap == NULL ||
+	if (unlikely((!io_mm && ops->unmap == NULL) ||
+		     (io_mm && ops->sva_unmap == NULL) ||
 		     domain->pgsize_bitmap == 0UL))
 		return 0;
 
@@ -1950,7 +1962,11 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 	while (unmapped < size) {
 		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
 
-		unmapped_page = ops->unmap(domain, iova, pgsize);
+		if (io_mm)
+			unmapped_page = ops->sva_unmap(domain, io_mm, iova,
+						       pgsize);
+		else
+			unmapped_page = ops->unmap(domain, iova, pgsize);
 		if (!unmapped_page)
 			break;
 
@@ -1974,19 +1990,20 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 size_t iommu_unmap(struct iommu_domain *domain,
 		   unsigned long iova, size_t size)
 {
-	return __iommu_unmap(domain, iova, size, true);
+	return __iommu_unmap(domain, NULL, iova, size, true);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
 size_t iommu_unmap_fast(struct iommu_domain *domain,
 			unsigned long iova, size_t size)
 {
-	return __iommu_unmap(domain, iova, size, false);
+	return __iommu_unmap(domain, NULL, iova, size, false);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap_fast);
 
-size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-		    struct scatterlist *sg, unsigned int nents, int prot)
+size_t __iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+		      unsigned long iova, struct scatterlist *sg,
+		      unsigned int nents, int prot)
 {
 	struct scatterlist *s;
 	size_t mapped = 0;
@@ -2010,7 +2027,7 @@ size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 		if (!IS_ALIGNED(s->offset, min_pagesz))
 			goto out_err;
 
-		ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
+		ret = __iommu_map(domain, io_mm, iova + mapped, phys, s->length, prot);
 		if (ret)
 			goto out_err;
 
@@ -2021,12 +2038,12 @@ size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 
 out_err:
 	/* undo mappings already done */
-	iommu_unmap(domain, iova, mapped);
+	__iommu_unmap(domain, io_mm, iova, mapped, true);
 
 	return 0;
 
 }
-EXPORT_SYMBOL_GPL(iommu_map_sg);
+EXPORT_SYMBOL_GPL(__iommu_map_sg);
 
 int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
 			       phys_addr_t paddr, u64 size, int prot)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ad2b18883ae2..0674fd983f81 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,11 +248,15 @@ struct iommu_sva_param {
  * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @sva_map: map a physically contiguous memory region to an address space
+ * @sva_unmap: unmap a physically contiguous memory region from an address space
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
  * @tlb_range_add: Add a given iova range to the flush queue for this domain
+ * @sva_iotlb_range_add: Add a given iova range to the flush queue for this mm
  * @tlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
  * @iova_to_phys: translate iova to physical address
+ * @sva_iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
  * @remove_device: remove device from iommu grouping
  * @device_group: find iommu group for a particular device
@@ -298,11 +302,21 @@ struct iommu_ops {
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size);
+	int (*sva_map)(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
+	size_t (*sva_unmap)(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
 	void (*iotlb_range_add)(struct iommu_domain *domain,
 				unsigned long iova, size_t size);
+	void (*sva_iotlb_range_add)(struct iommu_domain *domain,
+				    struct io_mm *io_mm, unsigned long iova,
+				    size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
+	phys_addr_t (*sva_iova_to_phys)(struct iommu_domain *domain,
+					struct io_mm *io_mm, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);
 	struct iommu_group *(*device_group)(struct device *dev);
@@ -525,14 +539,27 @@ extern int iommu_sva_invalidate(struct iommu_domain *domain,
 		struct device *dev, struct tlb_invalidate_info *inv_info);
 
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
+extern size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size, bool sync);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 			  size_t size);
 extern size_t iommu_unmap_fast(struct iommu_domain *domain,
 			       unsigned long iova, size_t size);
-extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-			   struct scatterlist *sg,unsigned int nents, int prot);
+extern size_t __iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			     unsigned long iova, struct scatterlist *sg,
+			     unsigned int nents, int prot);
+static inline size_t iommu_map_sg(struct iommu_domain *domain,
+				  unsigned long iova,
+				  struct scatterlist *sg, unsigned int nents,
+				  int prot)
+{
+	return __iommu_map_sg(domain, NULL, iova, sg, nents, prot);
+}
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
@@ -693,12 +720,25 @@ static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	return NULL;
 }
 
+static inline int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+			      unsigned long iova, phys_addr_t paddr,
+			      size_t size, int prot)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, size_t size, int prot)
 {
 	return -ENODEV;
 }
 
+static inline size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+				   unsigned long iova, size_t size, bool sync)
+{
+	return 0;
+}
+
 static inline size_t iommu_unmap(struct iommu_domain *domain,
 				 unsigned long iova, size_t size)
 {
@@ -1003,6 +1043,23 @@ extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void iommu_sva_unbind_device_all(struct device *dev);
 extern struct mm_struct *iommu_sva_find(int pasid);
 
+int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **io_mm);
+void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm);
+
+int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		  unsigned long iova, phys_addr_t paddr, size_t size, int prot);
+size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			unsigned long iova, struct scatterlist *sg,
+			unsigned int nents, int prot);
+size_t iommu_sva_unmap(struct iommu_domain *domain,
+		       struct io_mm *io_mm, unsigned long iova, size_t size);
+size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size);
+phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+				   struct io_mm *io_mm, dma_addr_t iova);
+void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm *io_mm,
+			     unsigned long iova, size_t size);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_init_device(struct device *dev,
 					unsigned long features,
@@ -1037,6 +1094,57 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
 {
 	return NULL;
 }
+
+static inline int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **io_mm)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm)
+{
+}
+
+static inline int iommu_sva_map(struct iommu_domain *domain,
+				struct io_mm *io_mm, unsigned long iova,
+				phys_addr_t paddr, size_t size, int prot)
+{
+	return -EINVAL;
+}
+
+static inline size_t iommu_sva_map_sg(struct iommu_domain *domain,
+				      struct io_mm *io_mm, unsigned long iova,
+				      struct scatterlist *sg,
+				      unsigned int nents, int prot)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap(struct iommu_domain *domain,
+				     struct io_mm *io_mm, unsigned long iova,
+				     size_t size)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap_fast(struct iommu_domain *domain,
+					  struct io_mm *io_mm,
+					  unsigned long iova, size_t size)
+{
+	return 0;
+}
+
+static inline phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+						 struct io_mm *io_mm,
+						 dma_addr_t iova)
+{
+	return 0;
+}
+
+static inline void iommu_sva_tlb_range_add(struct iommu_domain *domain,
+					   struct io_mm *io_mm,
+					   unsigned long iova, size_t size)
+{
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #ifdef CONFIG_IOMMU_PAGE_FAULT
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs
@ 2018-09-20 17:00   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-20 17:00 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Provide an API for allocating PASIDs and populating them manually. To ease
cleanup and factor out allocation code, reuse the io_mm structure for
private PASIDs. A private io_mm has a NULL mm_struct pointer and cannot be
bound to multiple devices. The mm_alloc() IOMMU op must now check whether
the mm argument is NULL, in which case it should allocate io_pgtables
instead of binding to an mm.

Signed-off-by: Jordan Crouse <jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
Sadly this probably won't be the final thing. The API in this patch is
used like this:

        iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
        iommu_sva_map(io_mm, ...)
        iommu_sva_unmap(io_mm, ...)
        iommu_sva_free_pasid(dev, io_mm)

The proposed API for auxiliary domains is in an early stage but might
replace this patch and could be used like this:

        iommu_enable_aux_domain(dev)
        d = iommu_domain_alloc()
        iommu_attach_aux(dev, d)
        iommu_aux_id(d) -> PASID
        iommu_map(d, ...)
        iommu_unmap(d, ...)
        iommu_detach_aux(dev, d)
        iommu_domain_free(d)

The advantage is that the driver doesn't have to use a special version of
map/unmap/etc.
---
 drivers/iommu/iommu-sva.c | 209 ++++++++++++++++++++++++++++++++++----
 drivers/iommu/iommu.c     |  51 ++++++----
 include/linux/iommu.h     | 112 +++++++++++++++++++-
 3 files changed, 331 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 1588a523a214..029776f64e7d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -15,11 +15,11 @@
 /**
  * DOC: io_mm model
  *
- * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
- * The following example illustrates the relation between structures
- * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
- * device. A device can have multiple io_mm and an io_mm may be bound to
- * multiple devices.
+ * When used with the bind()/unbind() functions, the io_mm keeps track of
+ * process address spaces shared between CPU and IOMMU. The following example
+ * illustrates the relation between structures iommu_domain, io_mm and
+ * iommu_bond. An iommu_bond is a link between io_mm and device. A device can
+ * have multiple io_mm and an io_mm may be bound to multiple devices.
  *              ___________________________
  *             |  IOMMU domain A           |
  *             |  ________________         |
@@ -98,6 +98,12 @@
  * the first entry points to the io_pgtable pointer. In other IOMMUs the
  * io_pgtable pointer is held in the device table and PASID #0 is available to
  * the allocator.
+ *
+ * The io_mm can also represent a private IOMMU address space, which isn't
+ * shared with a process. The device driver calls iommu_sva_alloc_pasid which
+ * returns an io_mm that can be populated with the iommu_sva_map/unmap
+ * functions. The principle is the same as shared io_mm, except that a private
+ * io_mm cannot be bound to multiple devices.
  */
 
 struct iommu_bond {
@@ -131,6 +137,9 @@ static DEFINE_SPINLOCK(iommu_sva_lock);
 
 static struct mmu_notifier_ops iommu_mmu_notifier;
 
+#define io_mm_is_private(io_mm) ((io_mm) != NULL && (io_mm)->mm == NULL)
+#define io_mm_is_shared(io_mm) ((io_mm) != NULL && (io_mm)->mm != NULL)
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	    struct mm_struct *mm, unsigned long flags)
@@ -149,19 +158,10 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	if (!io_mm)
 		return ERR_PTR(-ENOMEM);
 
-	/*
-	 * The mm must not be freed until after the driver frees the io_mm
-	 * (which may involve unpinning the CPU ASID for instance, requiring a
-	 * valid mm struct.)
-	 */
-	mmgrab(mm);
-
 	io_mm->flags		= flags;
 	io_mm->mm		= mm;
-	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
 	INIT_LIST_HEAD(&io_mm->devices);
-	/* Leave kref as zero until the io_mm is fully initialized */
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&iommu_sva_lock);
@@ -176,6 +176,32 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 		goto err_free_mm;
 	}
 
+	return io_mm;
+
+err_free_mm:
+	io_mm->release(io_mm);
+	return ERR_PTR(ret);
+}
+
+static struct io_mm *
+io_mm_alloc_shared(struct iommu_domain *domain, struct device *dev,
+		   struct mm_struct *mm, unsigned long flags)
+{
+	int ret;
+	struct io_mm *io_mm;
+
+	io_mm = io_mm_alloc(domain, dev, mm, flags);
+	if (IS_ERR(io_mm))
+		return io_mm;
+
+	/*
+	 * The mm must not be freed until after the driver frees the io_mm
+	 * (which may involve unpinning the CPU ASID for instance, requiring a
+	 * valid mm struct.)
+	 */
+	mmgrab(mm);
+
+	io_mm->notifier.ops = &iommu_mmu_notifier;
 	ret = mmu_notifier_register(&io_mm->notifier, mm);
 	if (ret)
 		goto err_free_pasid;
@@ -203,7 +229,6 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 	spin_unlock(&iommu_sva_lock);
 
-err_free_mm:
 	io_mm->release(io_mm);
 	mmdrop(mm);
 
@@ -231,6 +256,11 @@ static void io_mm_release(struct kref *kref)
 
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 
+	if (io_mm_is_private(io_mm)) {
+		io_mm->release(io_mm);
+		return;
+	}
+
 	/*
 	 * If we're being released from mm exit, the notifier callback ->release
 	 * has already been called. Otherwise we don't need ->release, the io_mm
@@ -258,7 +288,7 @@ static int io_mm_get_locked(struct io_mm *io_mm)
 	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
 		/*
 		 * kref_get_unless_zero doesn't provide ordering for reads. This
-		 * barrier pairs with the one in io_mm_alloc.
+		 * barrier pairs with the one in io_mm_alloc_shared.
 		 */
 		smp_rmb();
 		return 1;
@@ -289,7 +319,7 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	struct iommu_sva_param *param = dev->iommu_param->sva_param;
 
 	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
-	    !domain->ops->mm_invalidate)
+	    (io_mm_is_shared(io_mm) && !domain->ops->mm_invalidate))
 		return -ENODEV;
 
 	if (pasid > param->max_pasid || pasid < param->min_pasid)
@@ -555,7 +585,7 @@ int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid
 	}
 
 	if (!io_mm) {
-		io_mm = io_mm_alloc(domain, dev, mm, flags);
+		io_mm = io_mm_alloc_shared(domain, dev, mm, flags);
 		if (IS_ERR(io_mm)) {
 			ret = PTR_ERR(io_mm);
 			goto out_unlock;
@@ -601,6 +631,9 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 	/* spin_lock_irq matches the one in wait_event_lock_irq */
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
+		if (io_mm_is_private(bond->io_mm))
+			continue;
+
 		if (bond->io_mm->pasid == pasid) {
 			io_mm_detach_locked(bond, true);
 			ret = 0;
@@ -672,6 +705,136 @@ struct mm_struct *iommu_sva_find(int pasid)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
 
+/*
+ * iommu_sva_alloc_pasid - Allocate a private PASID
+ *
+ * Allocate a PASID for private map/unmap operations. Create a new I/O address
+ * space for this device, that isn't bound to any process.
+ *
+ * iommu_sva_init_device must have been called first.
+ */
+int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **out)
+{
+	int ret;
+	struct io_mm *io_mm;
+	struct iommu_domain *domain;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!out || !param)
+		return -EINVAL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	io_mm = io_mm_alloc(domain, dev, NULL, 0);
+	if (IS_ERR(io_mm))
+		return PTR_ERR(io_mm);
+
+	kref_init(&io_mm->kref);
+
+	ret = io_mm_attach(domain, dev, io_mm, NULL);
+	if (ret) {
+		io_mm_put(io_mm);
+		return ret;
+	}
+
+	*out = io_mm;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
+
+void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm)
+{
+	struct iommu_bond *bond;
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return;
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		if (bond->dev == dev) {
+			io_mm_detach_locked(bond, false);
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
+
+int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		  unsigned long iova, phys_addr_t paddr, size_t size, int prot)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return -ENODEV;
+
+	return __iommu_map(domain, io_mm, iova, paddr, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map);
+
+size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			unsigned long iova, struct scatterlist *sg,
+			unsigned int nents, int prot)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_map_sg(domain, io_mm, iova, sg, nents, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map_sg);
+
+size_t iommu_sva_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, size_t size)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_unmap(domain, io_mm, iova, size, true);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap);
+
+size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_unmap(domain, io_mm, iova, size, false);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap_fast);
+
+phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+				   struct io_mm *io_mm, dma_addr_t iova)
+{
+	if (!io_mm)
+		return iommu_iova_to_phys(domain, iova);
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	if (unlikely(domain->ops->sva_iova_to_phys == NULL))
+		return 0;
+
+	return domain->ops->sva_iova_to_phys(domain, io_mm, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_iova_to_phys);
+
+void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm *io_mm,
+			     unsigned long iova, size_t size)
+{
+	if (!io_mm) {
+		iommu_tlb_range_add(domain, iova, size);
+		return;
+	}
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return;
+
+	if (domain->ops->sva_iotlb_range_add != NULL)
+		domain->ops->sva_iotlb_range_add(domain, io_mm, iova, size);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_tlb_range_add);
+
 /**
  * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -693,10 +856,12 @@ EXPORT_SYMBOL_GPL(iommu_sva_find);
  * If the device should support recoverable I/O Page Faults (e.g. PCI PRI), the
  * IOMMU_SVA_FEAT_IOPF feature must be requested.
  *
- * @mm_exit is called when an address space bound to the device is about to be
- * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
- * more transaction with the PASID given as argument. The handler gets an opaque
- * pointer corresponding to the drvdata passed as argument to bind().
+ * If the driver intends to share process address spaces with the device, it
+ * should pass a valid @mm_exit handler. @mm_exit is called when an address
+ * space bound to the device is about to be torn down by exit_mmap. After
+ * @mm_exit returns, the device must not issue any more transaction with the
+ * PASID given as argument. The handler gets an opaque pointer corresponding to
+ * the drvdata passed as argument to bind().
  *
  * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
  * @mm_exit, because they might lead to deadlocks if they are also held when
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b493f5c4fe64..dd75c0a19c3a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1854,8 +1854,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	return pgsize;
 }
 
-int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, size_t size, int prot)
+int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		unsigned long iova, phys_addr_t paddr, size_t size, int prot)
 {
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
@@ -1863,7 +1863,8 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	phys_addr_t orig_paddr = paddr;
 	int ret = 0;
 
-	if (unlikely(domain->ops->map == NULL ||
+	if (unlikely((!io_mm && domain->ops->map == NULL) ||
+		     (io_mm && domain->ops->sva_map == NULL) ||
 		     domain->pgsize_bitmap == 0UL))
 		return -ENODEV;
 
@@ -1892,7 +1893,12 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 			 iova, &paddr, pgsize);
 
-		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (io_mm)
+			ret = domain->ops->sva_map(domain, io_mm, iova, paddr,
+						   pgsize, prot);
+		else
+			ret = domain->ops->map(domain, iova, paddr, pgsize,
+					       prot);
 		if (ret)
 			break;
 
@@ -1903,24 +1909,30 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 
 	/* unroll mapping in case something went wrong */
 	if (ret)
-		iommu_unmap(domain, orig_iova, orig_size - size);
+		__iommu_unmap(domain, io_mm, orig_iova, orig_size - size, true);
 	else
 		trace_map(orig_iova, orig_paddr, orig_size);
 
 	return ret;
 }
+
+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	return __iommu_map(domain, NULL, iova, paddr, size, prot);
+}
 EXPORT_SYMBOL_GPL(iommu_map);
 
-static size_t __iommu_unmap(struct iommu_domain *domain,
-			    unsigned long iova, size_t size,
-			    bool sync)
+size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+		     unsigned long iova, size_t size, bool sync)
 {
 	const struct iommu_ops *ops = domain->ops;
 	size_t unmapped_page, unmapped = 0;
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
 
-	if (unlikely(ops->unmap == NULL ||
+	if (unlikely((!io_mm && ops->unmap == NULL) ||
+		     (io_mm && ops->sva_unmap == NULL) ||
 		     domain->pgsize_bitmap == 0UL))
 		return 0;
 
@@ -1950,7 +1962,11 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 	while (unmapped < size) {
 		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
 
-		unmapped_page = ops->unmap(domain, iova, pgsize);
+		if (io_mm)
+			unmapped_page = ops->sva_unmap(domain, io_mm, iova,
+						       pgsize);
+		else
+			unmapped_page = ops->unmap(domain, iova, pgsize);
 		if (!unmapped_page)
 			break;
 
@@ -1974,19 +1990,20 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 size_t iommu_unmap(struct iommu_domain *domain,
 		   unsigned long iova, size_t size)
 {
-	return __iommu_unmap(domain, iova, size, true);
+	return __iommu_unmap(domain, NULL, iova, size, true);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
 size_t iommu_unmap_fast(struct iommu_domain *domain,
 			unsigned long iova, size_t size)
 {
-	return __iommu_unmap(domain, iova, size, false);
+	return __iommu_unmap(domain, NULL, iova, size, false);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap_fast);
 
-size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-		    struct scatterlist *sg, unsigned int nents, int prot)
+size_t __iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+		      unsigned long iova, struct scatterlist *sg,
+		      unsigned int nents, int prot)
 {
 	struct scatterlist *s;
 	size_t mapped = 0;
@@ -2010,7 +2027,7 @@ size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 		if (!IS_ALIGNED(s->offset, min_pagesz))
 			goto out_err;
 
-		ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
+		ret = __iommu_map(domain, io_mm, iova + mapped, phys, s->length, prot);
 		if (ret)
 			goto out_err;
 
@@ -2021,12 +2038,12 @@ size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 
 out_err:
 	/* undo mappings already done */
-	iommu_unmap(domain, iova, mapped);
+	__iommu_unmap(domain, io_mm, iova, mapped, true);
 
 	return 0;
 
 }
-EXPORT_SYMBOL_GPL(iommu_map_sg);
+EXPORT_SYMBOL_GPL(__iommu_map_sg);
 
 int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
 			       phys_addr_t paddr, u64 size, int prot)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ad2b18883ae2..0674fd983f81 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,11 +248,15 @@ struct iommu_sva_param {
  * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @sva_map: map a physically contiguous memory region to an address space
+ * @sva_unmap: unmap a physically contiguous memory region from an address space
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
  * @tlb_range_add: Add a given iova range to the flush queue for this domain
+ * @sva_iotlb_range_add: Add a given iova range to the flush queue for this mm
  * @tlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
  * @iova_to_phys: translate iova to physical address
+ * @sva_iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
  * @remove_device: remove device from iommu grouping
  * @device_group: find iommu group for a particular device
@@ -298,11 +302,21 @@ struct iommu_ops {
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size);
+	int (*sva_map)(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
+	size_t (*sva_unmap)(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
 	void (*iotlb_range_add)(struct iommu_domain *domain,
 				unsigned long iova, size_t size);
+	void (*sva_iotlb_range_add)(struct iommu_domain *domain,
+				    struct io_mm *io_mm, unsigned long iova,
+				    size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
+	phys_addr_t (*sva_iova_to_phys)(struct iommu_domain *domain,
+					struct io_mm *io_mm, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);
 	struct iommu_group *(*device_group)(struct device *dev);
@@ -525,14 +539,27 @@ extern int iommu_sva_invalidate(struct iommu_domain *domain,
 		struct device *dev, struct tlb_invalidate_info *inv_info);
 
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
+extern size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size, bool sync);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 			  size_t size);
 extern size_t iommu_unmap_fast(struct iommu_domain *domain,
 			       unsigned long iova, size_t size);
-extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-			   struct scatterlist *sg,unsigned int nents, int prot);
+extern size_t __iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			     unsigned long iova, struct scatterlist *sg,
+			     unsigned int nents, int prot);
+static inline size_t iommu_map_sg(struct iommu_domain *domain,
+				  unsigned long iova,
+				  struct scatterlist *sg, unsigned int nents,
+				  int prot)
+{
+	return __iommu_map_sg(domain, NULL, iova, sg, nents, prot);
+}
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
@@ -693,12 +720,25 @@ static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	return NULL;
 }
 
+static inline int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+			      unsigned long iova, phys_addr_t paddr,
+			      size_t size, int prot)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, size_t size, int prot)
 {
 	return -ENODEV;
 }
 
+static inline size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+				   unsigned long iova, size_t size, bool sync)
+{
+	return 0;
+}
+
 static inline size_t iommu_unmap(struct iommu_domain *domain,
 				 unsigned long iova, size_t size)
 {
@@ -1003,6 +1043,23 @@ extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void iommu_sva_unbind_device_all(struct device *dev);
 extern struct mm_struct *iommu_sva_find(int pasid);
 
+int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **io_mm);
+void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm);
+
+int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		  unsigned long iova, phys_addr_t paddr, size_t size, int prot);
+size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			unsigned long iova, struct scatterlist *sg,
+			unsigned int nents, int prot);
+size_t iommu_sva_unmap(struct iommu_domain *domain,
+		       struct io_mm *io_mm, unsigned long iova, size_t size);
+size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size);
+phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+				   struct io_mm *io_mm, dma_addr_t iova);
+void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm *io_mm,
+			     unsigned long iova, size_t size);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_init_device(struct device *dev,
 					unsigned long features,
@@ -1037,6 +1094,57 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
 {
 	return NULL;
 }
+
+static inline int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **io_mm)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm)
+{
+}
+
+static inline int iommu_sva_map(struct iommu_domain *domain,
+				struct io_mm *io_mm, unsigned long iova,
+				phys_addr_t paddr, size_t size, int prot)
+{
+	return -EINVAL;
+}
+
+static inline size_t iommu_sva_map_sg(struct iommu_domain *domain,
+				      struct io_mm *io_mm, unsigned long iova,
+				      struct scatterlist *sg,
+				      unsigned int nents, int prot)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap(struct iommu_domain *domain,
+				     struct io_mm *io_mm, unsigned long iova,
+				     size_t size)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap_fast(struct iommu_domain *domain,
+					  struct io_mm *io_mm,
+					  unsigned long iova, size_t size)
+{
+	return 0;
+}
+
+static inline phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+						 struct io_mm *io_mm,
+						 dma_addr_t iova)
+{
+	return 0;
+}
+
+static inline void iommu_sva_tlb_range_add(struct iommu_domain *domain,
+					   struct io_mm *io_mm,
+					   unsigned long iova, size_t size)
+{
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #ifdef CONFIG_IOMMU_PAGE_FAULT
-- 
2.18.0

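As a quick illustration of how a device driver might consume the private
PASID interface declared above, here is a minimal sketch. Only the
iommu_sva_alloc_pasid()/iommu_sva_map()/iommu_sva_unmap()/
iommu_sva_free_pasid() calls come from this patch; the surrounding driver
code (my_dev_setup_queue, the fixed IOVA, the domain handling) is purely
hypothetical:

#include <linux/iommu.h>

static int my_dev_setup_queue(struct device *dev, struct iommu_domain *domain,
			      phys_addr_t queue_base, size_t queue_size,
			      struct io_mm **out_io_mm)
{
	struct io_mm *io_mm;
	int ret;

	/* Allocate a PASID that isn't backed by a process address space */
	ret = iommu_sva_alloc_pasid(dev, &io_mm);
	if (ret)
		return ret;

	/* Populate the private address space, like iommu_map() but per PASID */
	ret = iommu_sva_map(domain, io_mm, 0x100000, queue_base, queue_size,
			    IOMMU_READ | IOMMU_WRITE);
	if (ret) {
		iommu_sva_free_pasid(dev, io_mm);
		return ret;
	}

	/* The driver would now program the PASID of this io_mm into the device */
	*out_io_mm = io_mm;
	return 0;
}

static void my_dev_teardown_queue(struct device *dev, struct iommu_domain *domain,
				  struct io_mm *io_mm, size_t queue_size)
{
	iommu_sva_unmap(domain, io_mm, 0x100000, queue_size);
	iommu_sva_free_pasid(dev, io_mm);
}
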
^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
       [not found]   ` <20180920170046.20154-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-23  2:39     ` Lu Baolu
  2018-09-24 12:07         ` Jean-Philippe Brucker
  2018-09-25 13:16         ` Joerg Roedel
  0 siblings, 2 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-23  2:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi,

On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> Shared Virtual Addressing (SVA) provides a way for device drivers to bind
> process address spaces to devices. This requires the IOMMU to support page
> table format and features compatible with the CPUs, and usually requires
> the system to support I/O Page Faults (IOPF) and Process Address Space ID
> (PASID). When all of these are available, DMA can access virtual addresses
> of a process. A PASID is allocated for each process, and the device driver
> programs it into the device in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce two
> IOMMU operations, sva_init_device() and sva_shutdown_device(), that
> prepare the IOMMU driver for SVA. For example allocate PASID tables and
> fault queues. Subsequent patches will implement the bind() and unbind()
> operations.
> 
> Introduce a new mutex sva_lock on the device's IOMMU param to serialize
> init(), shutdown(), bind() and unbind() operations. Using the existing
> lock isn't possible because the unbind() and shutdown() operations will
> have to wait while holding sva_lock for concurrent fault queue flushes to
> terminate. These flushes will take the existing lock.
> 
> Support for I/O Page Faults will be added in a later patch using a new
> feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
> down all shared mappings.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
> v2->v3:
> * Add sva_lock to serialize init/bind/unbind/shutdown
> * Rename functions for consistency with the rest of the API
> ---
>   drivers/iommu/Kconfig     |   4 ++
>   drivers/iommu/Makefile    |   1 +
>   drivers/iommu/iommu-sva.c | 107 ++++++++++++++++++++++++++++++++++++++
>   drivers/iommu/iommu.c     |   1 +
>   include/linux/iommu.h     |  34 ++++++++++++
>   5 files changed, 147 insertions(+)
>   create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c60395b7470f..884580401919 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -95,6 +95,10 @@ config IOMMU_DMA
>   	select IOMMU_IOVA
>   	select NEED_SG_DMA_LENGTH
>   
> +config IOMMU_SVA
> +	bool
> +	select IOMMU_API
> +
>   config FSL_PAMU
>   	bool "Freescale IOMMU support"
>   	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index ab5eba6edf82..7d6332be5f0e 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>   obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>   obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
>   obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>   obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>   obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>   obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index 000000000000..85ef98efede8
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage PASIDs and bind process address spaces to devices.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/slab.h>
> +
> +/**
> + * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @min_pasid: min PASID value supported by the device
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to initialize all
> + * features required for SVA.
> + *
> + * The device must support multiple address spaces (e.g. PCI PASID). By default
> + * the PASID allocated during bind() is limited by the IOMMU capacity, and by
> + * the device PASID width defined in the PCI capability or in the firmware
> + * description. Setting @max_pasid to a non-zero value smaller than this limit
> + * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
> + * by the IOMMU.
> + *
> + * The device should not be performing any DMA while this function is running,
> + * otherwise the behavior is undefined.
> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_init_device(struct device *dev, unsigned long features,
> +		       unsigned int min_pasid, unsigned int max_pasid)
> +{
> +	int ret;
> +	struct iommu_sva_param *param;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

This doesn't work for vt-d. The domains for host iova are self-managed
by the vt-d driver itself. Hence, iommu_get_domain_for_dev() will always
return NULL unless an UNMANAGED domain is attached to the device.

How about

       const struct iommu_ops *ops = dev->bus->iommu_ops;

instead?
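
For instance, the entry to iommu_sva_init_device() could look roughly like
this (sketch only, not tested; everything apart from the ops lookup is
unchanged from the patch above):

int iommu_sva_init_device(struct device *dev, unsigned long features,
			  unsigned int min_pasid, unsigned int max_pasid)
{
	int ret;
	struct iommu_sva_param *param;
	const struct iommu_ops *ops = dev->bus->iommu_ops;

	if (!ops || !ops->sva_init_device)
		return -ENODEV;

	/* ... rest unchanged, eventually calling ops->sva_init_device(dev, param) ... */
}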

> +
> +	if (!domain || !domain->ops->sva_init_device)
> +		return -ENODEV;
> +
> +	if (features)
> +		return -EINVAL;
> +
> +	param = kzalloc(sizeof(*param), GFP_KERNEL);
> +	if (!param)
> +		return -ENOMEM;
> +
> +	param->features		= features;
> +	param->min_pasid	= min_pasid;
> +	param->max_pasid	= max_pasid;
> +
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	if (dev->iommu_param->sva_param) {
> +		ret = -EEXIST;
> +		goto err_unlock;
> +	}
> +
> +	/*
> +	 * IOMMU driver updates the limits depending on the IOMMU and device
> +	 * capabilities.
> +	 */
> +	ret = domain->ops->sva_init_device(dev, param);
> +	if (ret)
> +		goto err_unlock;
> +
> +	dev->iommu_param->sva_param = param;
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +	kfree(param);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_init_device);
> +
> +/**
> + * iommu_sva_shutdown_device() - Shutdown Shared Virtual Addressing for a device
> + * @dev: the device
> + *
> + * Disable SVA. Device driver should ensure that the device isn't performing any
> + * DMA while this function is running.
> + */
> +void iommu_sva_shutdown_device(struct device *dev)
> +{
> +	struct iommu_sva_param *param;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

Ditto.

Best regards,
Lu Baolu

> +
> +	if (!domain)
> +		return;
> +
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	param = dev->iommu_param->sva_param;
> +	if (!param)
> +		goto out_unlock;
> +
> +	if (domain->ops->sva_shutdown_device)
> +		domain->ops->sva_shutdown_device(dev);
> +
> +	kfree(param);
> +	dev->iommu_param->sva_param = NULL;
> +out_unlock:
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_shutdown_device);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 58f3477f2993..fa0561ed006f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -653,6 +653,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   		goto err_free_name;
>   	}
>   	mutex_init(&dev->iommu_param->lock);
> +	mutex_init(&dev->iommu_param->sva_lock);
>   
>   	kobject_get(group->devices_kobj);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8177f7736fcd..4c27cb347770 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -197,6 +197,12 @@ struct page_response_msg {
>   	u64 private_data;
>   };
>   
> +struct iommu_sva_param {
> +	unsigned long features;
> +	unsigned int min_pasid;
> +	unsigned int max_pasid;
> +};
> +
>   /**
>    * struct iommu_ops - iommu ops and capabilities
>    * @capable: check capability
> @@ -204,6 +210,8 @@ struct page_response_msg {
>    * @domain_free: free iommu domain
>    * @attach_dev: attach device to an iommu domain
>    * @detach_dev: detach device from an iommu domain
> + * @sva_init_device: initialize Shared Virtual Addressing for a device
> + * @sva_shutdown_device: shutdown Shared Virtual Addressing for a device
>    * @map: map a physically contiguous memory region to an iommu domain
>    * @unmap: unmap a physically contiguous memory region from an iommu domain
>    * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
> @@ -239,6 +247,8 @@ struct iommu_ops {
>   
>   	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>   	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> +	int (*sva_init_device)(struct device *dev, struct iommu_sva_param *param);
> +	void (*sva_shutdown_device)(struct device *dev);
>   	int (*map)(struct iommu_domain *domain, unsigned long iova,
>   		   phys_addr_t paddr, size_t size, int prot);
>   	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> @@ -393,6 +403,9 @@ struct iommu_fault_param {
>    * struct iommu_param - collection of per-device IOMMU data
>    *
>    * @fault_param: IOMMU detected device fault reporting data
> + * @lock: serializes accesses to fault_param
> + * @sva_param: SVA parameters
> + * @sva_lock: serializes accesses to sva_param
>    *
>    * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>    *	struct iommu_group	*iommu_group;
> @@ -401,6 +414,8 @@ struct iommu_fault_param {
>   struct iommu_param {
>   	struct mutex lock;
>   	struct iommu_fault_param *fault_param;
> +	struct mutex sva_lock;
> +	struct iommu_sva_param *sva_param;
>   };
>   
>   int  iommu_device_register(struct iommu_device *iommu);
> @@ -904,4 +919,23 @@ void iommu_debugfs_setup(void);
>   static inline void iommu_debugfs_setup(void) {}
>   #endif
>   
> +#ifdef CONFIG_IOMMU_SVA
> +extern int iommu_sva_init_device(struct device *dev, unsigned long features,
> +				 unsigned int min_pasid,
> +				 unsigned int max_pasid);
> +extern void iommu_sva_shutdown_device(struct device *dev);
> +#else /* CONFIG_IOMMU_SVA */
> +static inline int iommu_sva_init_device(struct device *dev,
> +					unsigned long features,
> +					unsigned int min_pasid,
> +					unsigned int max_pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void iommu_sva_shutdown_device(struct device *dev)
> +{
> +}
> +#endif /* CONFIG_IOMMU_SVA */
> +
>   #endif /* __LINUX_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices
@ 2018-09-23  3:05     ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-23  3:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu
  Cc: baolu.lu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, andrew.murray, will.deacon, robin.murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Hi,

On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> Add bind() and unbind() operations to the IOMMU API. Bind() returns a
> PASID that drivers can program in hardware, to let their devices access an
> mm. This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.

Is it possible for a malicious process to unbind a PASID that is
used by another, normal process?

It might happen in the sequence below:


Process A			Process B
=========			=========
iommu_sva_init_device(dev)
iommu_sva_bind_device(dev)
....
device access mm of A with
#PASID returned above
....
				iommu_sva_unbind_device(dev, #PASID)
....
[unrecoverable errors]

I haven't considered this thoroughly. Sorry if this is already
prevented.

Best regards,
Lu Baolu

> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>   drivers/iommu/iommu-sva.c | 34 +++++++++++++++
>   drivers/iommu/iommu.c     | 90 +++++++++++++++++++++++++++++++++++++++
>   include/linux/iommu.h     | 37 ++++++++++++++++
>   3 files changed, 161 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 85ef98efede8..d60d4f0bb89e 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -8,6 +8,38 @@
>   #include <linux/iommu.h>
>   #include <linux/slab.h>
>   
> +int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
> +			    unsigned long flags, void *drvdata)
> +{
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
> +
> +int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
> +
> +static void __iommu_sva_unbind_device_all(struct device *dev)
> +{
> +	/* TODO */
> +}
> +
> +/**
> + * iommu_sva_unbind_device_all() - Detach all address spaces from this device
> + * @dev: the device
> + *
> + * When detaching @dev from a domain, IOMMU drivers should use this helper.
> + */
> +void iommu_sva_unbind_device_all(struct device *dev)
> +{
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	__iommu_sva_unbind_device_all(dev);
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
> +
>   /**
>    * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
>    * @dev: the device
> @@ -96,6 +128,8 @@ void iommu_sva_shutdown_device(struct device *dev)
>   	if (!param)
>   		goto out_unlock;
>   
> +	__iommu_sva_unbind_device_all(dev);
> +
>   	if (domain->ops->sva_shutdown_device)
>   		domain->ops->sva_shutdown_device(dev);
>   
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index fa0561ed006f..aba3bf15d46c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2325,3 +2325,93 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access the mm
> + * using the returned PASID. If unbind() isn't called first, a subsequent bind()
> + * for the same device and mm fails with -EEXIST.
> + *
> + * iommu_sva_init_device() must be called first, to initialize the required SVA
> + * features. @flags must be a subset of these features.
> + *
> + * The caller must pin down using get_user_pages*() all mappings shared with the
> + * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
> + * (e.g. copy-on-write).
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
> +			  unsigned long flags, void *drvdata)
> +{
> +	int ret = -EINVAL;
> +	struct iommu_group *group;
> +
> +	if (!pasid)
> +		return -EINVAL;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	/* Ensure device count and domain don't change while we're binding */
> +	mutex_lock(&group->mutex);
> +
> +	/*
> +	 * To keep things simple, SVA currently doesn't support IOMMU groups
> +	 * with more than one device. Existing SVA-capable systems are not
> +	 * affected by the problems that required IOMMU groups (lack of ACS
> +	 * isolation, device ID aliasing and other hardware issues).
> +	 */
> +	if (iommu_group_device_count(group) != 1)
> +		goto out_unlock;
> +
> +	ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);
> +
> +out_unlock:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
> + * @dev: the device
> + * @pasid: the pasid returned by bind()
> + *
> + * Remove bond between device and address space identified by @pasid. Users
> + * should not call unbind() if the corresponding mm exited (as the PASID might
> + * have been reallocated for another process).
> + *
> + * The device must not be issuing any more transaction for this PASID. All
> + * outstanding page requests for this PASID must have been flushed to the IOMMU.
> + *
> + * Returns 0 on success, or an error value
> + */
> +int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	int ret = -EINVAL;
> +	struct iommu_group *group;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	mutex_lock(&group->mutex);
> +	ret = __iommu_sva_unbind_device(dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 4c27cb347770..9c49877e37a5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -586,6 +586,10 @@ void iommu_fwspec_free(struct device *dev);
>   int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>   const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>   
> +extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +				int *pasid, unsigned long flags, void *drvdata);
> +extern int iommu_sva_unbind_device(struct device *dev, int pasid);
> +
>   #else /* CONFIG_IOMMU_API */
>   
>   struct iommu_ops {};
> @@ -910,6 +914,18 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
>   	return -ENODEV;
>   }
>   
> +static inline int iommu_sva_bind_device(struct device *dev,
> +					struct mm_struct *mm, int *pasid,
> +					unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
>   #endif /* CONFIG_IOMMU_API */
>   
>   #ifdef CONFIG_IOMMU_DEBUGFS
> @@ -924,6 +940,11 @@ extern int iommu_sva_init_device(struct device *dev, unsigned long features,
>   				 unsigned int min_pasid,
>   				 unsigned int max_pasid);
>   extern void iommu_sva_shutdown_device(struct device *dev);
> +extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +				   int *pasid, unsigned long flags,
> +				   void *drvdata);
> +extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
> +extern void iommu_sva_unbind_device_all(struct device *dev);
>   #else /* CONFIG_IOMMU_SVA */
>   static inline int iommu_sva_init_device(struct device *dev,
>   					unsigned long features,
> @@ -936,6 +957,22 @@ static inline int iommu_sva_init_device(struct device *dev,
>   static inline void iommu_sva_shutdown_device(struct device *dev)
>   {
>   }
> +
> +static inline int __iommu_sva_bind_device(struct device *dev,
> +					  struct mm_struct *mm, int *pasid,
> +					  unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void iommu_sva_unbind_device_all(struct device *dev)
> +{
> +}
>   #endif /* CONFIG_IOMMU_SVA */
>   
>   #endif /* __LINUX_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
@ 2018-09-24 12:07         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-24 12:07 UTC (permalink / raw)
  To: Lu Baolu, iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 23/09/2018 03:39, Lu Baolu wrote:
>> +int iommu_sva_init_device(struct device *dev, unsigned long features,
>> +                    unsigned int min_pasid, unsigned int max_pasid)
>> +{
>> +     int ret;
>> +     struct iommu_sva_param *param;
>> +     struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> 
> This doesn't work for vt-d. The domains for host iova are self-managed
> by vt-d driver itself. Hence, iommu_get_domain_for_dev() will always
> return NULL unless an UNMANAGED domain is attached to the device.
> 
> How about
> 
>        const struct iommu_ops *ops = dev->bus->iommu_ops;>
> instead?

Right, this should work. I also needed to change the IOMMU ops
introduced in patch 3 so that they don't take a domain. It's a shame that
iommu-sva can't get the device's current domain; I was hoping we could
manage the bonds per domain in common code. But it's not a big deal and,
on the upside, it simplifies these core patches.

I was previously relying on "if we have a domain, then
iommu_group_add_device() has been called and therefore dev->iommu_param
is set". I now do the same as iommu_register_device_fault_handler() and
check whether iommu_param is NULL. I don't think there is a race with
iommu_group_add/remove_device(), since the device driver cannot call SVA
functions before the core has called its probe() callback or after it has
called its remove() callback, which happen after iommu_group_add_device()
and before iommu_group_remove_device() respectively. Though I don't have
the full picture here, and might be wrong.
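
Roughly along these lines (an illustrative sketch, not necessarily the
exact code in the branch):

	struct iommu_param *param = dev->iommu_param;

	if (!param)
		return -ENODEV;

	mutex_lock(&param->sva_lock);
	/* ... init/bind/unbind/shutdown work ... */
	mutex_unlock(&param->sva_lock);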

I pushed the updated version to my sva/current branch.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices
@ 2018-09-24 12:07       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-24 12:07 UTC (permalink / raw)
  To: Lu Baolu, iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 23/09/2018 04:05, Lu Baolu wrote:
> Hi,
> 
> On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
>> Add bind() and unbind() operations to the IOMMU API. Bind() returns a
>> PASID that drivers can program in hardware, to let their devices access an
>> mm. This patch only adds skeletons for the device driver API, most of the
>> implementation is still missing.
> 
> Is it possible that a malicious process can unbind a pasid which is
> used by another normal process?

Yes, it's up to the device driver that calls unbind() to check that the
caller is allowed to unbind this PASID. We can't do it ourselves, since
unbind() could also be called from a kernel thread, for example from a
cleanup function in some workqueue, outside the context of the process
being unbound.
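
For example, a driver could record which mm owns each PASID at bind()
time and refuse unbind requests coming from anyone else. This is purely
illustrative driver-side bookkeeping, not something this series provides
(my_bond and my_find_bond are hypothetical):

struct my_bond {
	int pasid;
	struct mm_struct *mm;	/* owner, recorded at bind() time */
	struct list_head list;
};

static int my_unbind(struct device *dev, int pasid)
{
	struct my_bond *bond = my_find_bond(dev, pasid);

	/* Only the owning process (or a kernel thread) may tear the bond down */
	if (!bond || (current->mm && bond->mm != current->mm))
		return -EPERM;

	list_del(&bond->list);
	kfree(bond);
	return iommu_sva_unbind_device(dev, pasid);
}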

Thanks,
Jean

> 
> It might happen in below sequence:
> 
> 
> Process A                       Process B
> =========                       =========
> iommu_sva_init_device(dev)
> iommu_sva_bind_device(dev)
> ....
> device access mm of A with
> #PASID returned above
> ....
>                                 iommu_sva_unbind_device(dev, #PASID)
> ....
> [unrecoverable errors]
> 
> I didn't have a thorough consideration of this. Sorry if this has been
> prevented.
> 
> Best regards,
> Lu Baolu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25  3:15     ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-25  3:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu
  Cc: baolu.lu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, andrew.murray, will.deacon, robin.murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Hi,

On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> Allocate IOMMU mm structures and bind them to devices. Four operations are
> added to IOMMU drivers:
> 
> * mm_alloc(): to create an io_mm structure and perform architecture-
>    specific operations required to grab the process (for instance on ARM,
>    pin down the CPU ASID so that the process doesn't get assigned a new
>    ASID on rollover).
> 
>    There is a single valid io_mm structure per Linux mm. Future extensions
>    may also use io_mm for kernel-managed address spaces, populated with
>    map()/unmap() calls instead of bound to process address spaces. This
>    patch focuses on "shared" io_mm.
> 
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>    device is capable of sharing an address space, and writes the PASID
>    table entry to install the pgd.
> 
>    Some IOMMU drivers will have a single PASID table per domain, for
>    convenience. Other can implement it differently but to help these
>    drivers, mm_attach and mm_detach take 'attach_domain' and
>    'detach_domain' parameters, that tell whether they need to set and clear
>    the PASID entry or only send the required TLB invalidations.
> 
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>    PASID table entry and invalidates the IOTLBs.
> 
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>    release the process.
> 
> mm_attach and mm_detach operations are serialized with a spinlock. When
> trying to optimize this code, we should at least prevent concurrent
> attach()/detach() on the same domain (so multi-level PASID table code can
> allocate tables lazily). mm_alloc() can sleep, but mm_free must not
> (because we'll have to call it from call_srcu later on).
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
> 
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
> v2->v3: use sva_lock, comment updates
> ---
>   drivers/iommu/iommu-sva.c | 397 +++++++++++++++++++++++++++++++++++++-
>   drivers/iommu/iommu.c     |   1 +
>   include/linux/iommu.h     |  29 +++
>   3 files changed, 424 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index d60d4f0bb89e..a486bc947335 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -5,25 +5,415 @@
>    * Copyright (C) 2018 ARM Ltd.
>    */
>   
> +#include <linux/idr.h>
>   #include <linux/iommu.h>
> +#include <linux/sched/mm.h>
>   #include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * In some IOMMUs, one entry (typically the first one) of the PASID table can be
> + * used to hold non-PASID translations. In this case PASID #0 is reserved and
> + * the first entry points to the io_pgtable pointer. In other IOMMUs the
> + * io_pgtable pointer is held in the device table and PASID #0 is available to
> + * the allocator.
> + */
> +
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm, unsigned long flags)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm, flags);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->flags		= flags;
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +	/* Leave kref as zero until the io_mm is fully initialized */
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
> +			  param->max_pasid + 1, GFP_ATOMIC);
> +	io_mm->pasid = pasid;
> +	spin_unlock(&iommu_sva_lock);
> +	idr_preload_end();
> +
> +	if (pasid < 0) {
> +		ret = pasid;
> +		goto err_free_mm;
> +	}
> +
> +	/* TODO: keep track of mm. For the moment, abort. */
> +	ret = -ENOSYS;
> +	spin_lock(&iommu_sva_lock);
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	spin_unlock(&iommu_sva_lock);
> +
> +err_free_mm:
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void io_mm_free(struct io_mm *io_mm)
> +{
> +	struct mm_struct *mm = io_mm->mm;
> +
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +}
> +
> +static void io_mm_release(struct kref *kref)
> +{
> +	struct io_mm *io_mm;
> +
> +	io_mm = container_of(kref, struct io_mm, kref);
> +	WARN_ON(!list_empty(&io_mm->devices));
> +
> +	/* The PASID can now be reallocated for another mm... */
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	/* ... but this mm is freed after a grace period (TODO) */
> +	io_mm_free(io_mm);
> +}
> +
> +/*
> + * Returns non-zero if a reference to the io_mm was successfully taken.
> + * Returns zero if the io_mm is being freed and should not be used.
> + */
> +static int io_mm_get_locked(struct io_mm *io_mm)
> +{
> +	if (io_mm)
> +		return kref_get_unless_zero(&io_mm->kref);
> +
> +	return 0;
> +}
> +
> +static void io_mm_put_locked(struct io_mm *io_mm)
> +{
> +	kref_put(&io_mm->kref, io_mm_release);
> +}
> +
> +static void io_mm_put(struct io_mm *io_mm)
> +{
> +	spin_lock(&iommu_sva_lock);
> +	io_mm_put_locked(io_mm);
> +	spin_unlock(&iommu_sva_lock);
> +}
> +
> +static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
> +			struct io_mm *io_mm, void *drvdata)
> +{
> +	int ret;
> +	bool attach_domain = true;
> +	int pasid = io_mm->pasid;
> +	struct iommu_bond *bond, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +		return -ENODEV;
> +
> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> +		return -ERANGE;
> +
> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond->domain		= domain;
> +	bond->io_mm		= io_mm;
> +	bond->dev		= dev;
> +	bond->drvdata		= drvdata;
> +
> +	spin_lock(&iommu_sva_lock);
> +	/*
> +	 * Check if this io_mm is already bound to the domain. In which case the
> +	 * IOMMU driver doesn't have to install the PASID table entry.
> +	 */
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == io_mm) {
> +			attach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
> +	if (ret) {
> +		kfree(bond);
> +		goto out_unlock;
> +	}
> +
> +	list_add(&bond->mm_head, &io_mm->devices);
> +	list_add(&bond->domain_head, &domain->mm_list);
> +	list_add(&bond->dev_head, &param->mm_list);
> +
> +out_unlock:
> +	spin_unlock(&iommu_sva_lock);
> +	return ret;
> +}
> +
> +static void io_mm_detach_locked(struct iommu_bond *bond)
> +{
> +	struct iommu_bond *tmp;
> +	bool detach_domain = true;
> +	struct iommu_domain *domain = bond->domain;
> +
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
> +			detach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	list_del(&bond->mm_head);
> +	list_del(&bond->domain_head);
> +	list_del(&bond->dev_head);
> +
> +	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
> +
> +	io_mm_put_locked(bond->io_mm);
> +	kfree(bond);
> +}
>   
>   int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>   			    unsigned long flags, void *drvdata)
>   {
> -	return -ENOSYS; /* TODO */
> +	int i;
> +	int ret = 0;
> +	struct iommu_bond *bond;
> +	struct io_mm *io_mm = NULL;
> +	struct iommu_domain *domain;
> +	struct iommu_sva_param *param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	param = dev->iommu_param->sva_param;
> +	if (!param || (flags & ~param->features)) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* If an io_mm already exists, use it */
> +	spin_lock(&iommu_sva_lock);
> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {

This might be problematic for vt-d (and other possible architectures
that use PASIDs for purposes other than SVA). When the vt-d IOMMU works
in scalable mode, a PASID might be allocated for:

(1) SVA
(2) Device Assignable Interface (might be an mdev or directly managed
     within a device driver).
(3) SVA in a VM guest
(4) Device Assignable Interface in a VM guest

So we can't expect that an io_mm pointer is associated with every PASID,
and this code might run into problems if the PASID was allocated for a
usage other than SVA.
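
(Purely to illustrate the concern: if the same IDR ever holds entries that
aren't io_mm structures, the SVA lookup would need a way to recognise its
own entries before dereferencing them, for example through a hypothetical
type-tagged wrapper. None of the names below exist in this series:)

	struct iommu_pasid_entry {
		int type;	/* e.g. IOMMU_PASID_SVA, IOMMU_PASID_AUX, ... */
		void *data;	/* struct io_mm * for SVA entries */
	};

	idr_for_each_entry(&iommu_pasid_idr, entry, i) {
		if (entry->type != IOMMU_PASID_SVA)
			continue;
		io_mm = entry->data;
		/* ... existing SVA matching on io_mm->mm ... */
	}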

Best regards,
Lu Baolu

> +		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
> +			/* ... Unless it's already bound to this device */
> +			list_for_each_entry(bond, &io_mm->devices, mm_head) {
> +				if (bond->dev == dev) {
> +					ret = -EEXIST;
> +					io_mm_put_locked(io_mm);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +	if (ret)
> +		goto out_unlock;
> +
> +	/* Require identical features within an io_mm for now */
> +	if (io_mm && (flags != io_mm->flags)) {
> +		io_mm_put(io_mm);
> +		ret = -EDOM;
> +		goto out_unlock;
> +	}
> +
> +	if (!io_mm) {
> +		io_mm = io_mm_alloc(domain, dev, mm, flags);
> +		if (IS_ERR(io_mm)) {
> +			ret = PTR_ERR(io_mm);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	ret = io_mm_attach(domain, dev, io_mm, drvdata);
> +	if (ret)
> +		io_mm_put(io_mm);
> +	else
> +		*pasid = io_mm->pasid;
> +
> +out_unlock:
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
>   
>   int __iommu_sva_unbind_device(struct device *dev, int pasid)
>   {
> -	return -ENOSYS; /* TODO */
> +	int ret = -ESRCH;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL;
> +	struct iommu_sva_param *param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	param = dev->iommu_param->sva_param;
> +	if (!param) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry(bond, &param->mm_list, dev_head) {
> +		if (bond->io_mm->pasid == pasid) {
> +			io_mm_detach_locked(bond);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +out_unlock:
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>   
>   static void __iommu_sva_unbind_device_all(struct device *dev)
>   {
> -	/* TODO */
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +	struct iommu_bond *bond, *next;
> +
> +	if (!param)
> +		return;
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
> +		io_mm_detach_locked(bond);
> +	spin_unlock(&iommu_sva_lock);
>   }
>   
>   /**
> @@ -82,6 +472,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
>   	param->features		= features;
>   	param->min_pasid	= min_pasid;
>   	param->max_pasid	= max_pasid;
> +	INIT_LIST_HEAD(&param->mm_list);
>   
>   	mutex_lock(&dev->iommu_param->sva_lock);
>   	if (dev->iommu_param->sva_param) {
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index aba3bf15d46c..7113fe398b70 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1525,6 +1525,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>   	domain->type = type;
>   	/* Assume all sizes by default; the driver may override this later */
>   	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
> +	INIT_LIST_HEAD(&domain->mm_list);
>   
>   	return domain;
>   }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9c49877e37a5..6a3ced6a5aa1 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -99,6 +99,20 @@ struct iommu_domain {
>   	void *handler_token;
>   	struct iommu_domain_geometry geometry;
>   	void *iova_cookie;
> +
> +	struct list_head mm_list;
> +};
> +
> +struct io_mm {
> +	int			pasid;
> +	/* IOMMU_SVA_FEAT_* */
> +	unsigned long		flags;
> +	struct list_head	devices;
> +	struct kref		kref;
> +	struct mm_struct	*mm;
> +
> +	/* Release callback for this mm */
> +	void (*release)(struct io_mm *io_mm);
>   };
>   
>   enum iommu_cap {
> @@ -201,6 +215,7 @@ struct iommu_sva_param {
>   	unsigned long features;
>   	unsigned int min_pasid;
>   	unsigned int max_pasid;
> +	struct list_head mm_list;
>   };
>   
>   /**
> @@ -212,6 +227,12 @@ struct iommu_sva_param {
>    * @detach_dev: detach device from an iommu domain
>    * @sva_init_device: initialize Shared Virtual Addressing for a device
>    * @sva_shutdown_device: shutdown Shared Virtual Addressing for a device
> + * @mm_alloc: allocate io_mm
> + * @mm_free: free io_mm
> + * @mm_attach: attach io_mm to a device. Install PASID entry if necessary. Must
> + *             not sleep.
> + * @mm_detach: detach io_mm from a device. Remove PASID entry and
> + *             flush associated TLB entries if necessary. Must not sleep.
>    * @map: map a physically contiguous memory region to an iommu domain
>    * @unmap: unmap a physically contiguous memory region from an iommu domain
>    * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
> @@ -249,6 +270,14 @@ struct iommu_ops {
>   	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
>   	int (*sva_init_device)(struct device *dev, struct iommu_sva_param *param);
>   	void (*sva_shutdown_device)(struct device *dev);
> +	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
> +				  struct mm_struct *mm,
> +				  unsigned long flags);
> +	void (*mm_free)(struct io_mm *io_mm);
> +	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
> +			 struct io_mm *io_mm, bool attach_domain);
> +	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
> +			  struct io_mm *io_mm, bool detach_domain);
>   	int (*map)(struct iommu_domain *domain, unsigned long iova,
>   		   phys_addr_t paddr, size_t size, int prot);
>   	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25  3:15     ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-25  3:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi,

On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> Allocate IOMMU mm structures and bind them to devices. Four operations are
> added to IOMMU drivers:
> 
> * mm_alloc(): to create an io_mm structure and perform architecture-
>    specific operations required to grab the process (for instance on ARM,
>    pin down the CPU ASID so that the process doesn't get assigned a new
>    ASID on rollover).
> 
>    There is a single valid io_mm structure per Linux mm. Future extensions
>    may also use io_mm for kernel-managed address spaces, populated with
>    map()/unmap() calls instead of bound to process address spaces. This
>    patch focuses on "shared" io_mm.
> 
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>    device is capable of sharing an address space, and writes the PASID
>    table entry to install the pgd.
> 
>    Some IOMMU drivers will have a single PASID table per domain, for
>    convenience. Other can implement it differently but to help these
>    drivers, mm_attach and mm_detach take 'attach_domain' and
>    'detach_domain' parameters, that tell whether they need to set and clear
>    the PASID entry or only send the required TLB invalidations.
> 
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>    PASID table entry and invalidates the IOTLBs.
> 
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>    release the process.
> 
> mm_attach and mm_detach operations are serialized with a spinlock. When
> trying to optimize this code, we should at least prevent concurrent
> attach()/detach() on the same domain (so multi-level PASID table code can
> allocate tables lazily). mm_alloc() can sleep, but mm_free must not
> (because we'll have to call it from call_srcu later on).
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
> 
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
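
To make the four operations above more concrete, here is a minimal sketch of
how an IOMMU driver might wire them up. The my_iommu_* helpers, the embedded
ASID field and the one-PASID-table-per-domain assumption are placeholders for
illustration, not code from this series:

struct my_io_mm {
	struct io_mm		io_mm;	/* embeds the core io_mm */
	u16			asid;	/* CPU ASID pinned for this mm */
};

static struct io_mm *my_iommu_mm_alloc(struct iommu_domain *domain,
					struct mm_struct *mm,
					unsigned long flags)
{
	struct my_io_mm *smm;

	smm = kzalloc(sizeof(*smm), GFP_KERNEL);	/* may sleep */
	if (!smm)
		return NULL;		/* the core turns this into -ENOMEM */

	smm->asid = my_iommu_pin_asid(mm);	/* arch-specific, placeholder */
	return &smm->io_mm;
}

static void my_iommu_mm_free(struct io_mm *io_mm)
{
	struct my_io_mm *smm = container_of(io_mm, struct my_io_mm, io_mm);

	my_iommu_unpin_asid(smm->asid);		/* must not sleep */
	kfree(smm);
}

static int my_iommu_mm_attach(struct iommu_domain *domain, struct device *dev,
			      struct io_mm *io_mm, bool attach_domain)
{
	/*
	 * With a single PASID table per domain, only the first attach in the
	 * domain installs the entry; later devices reuse it.
	 */
	if (attach_domain)
		return my_iommu_write_pasid_entry(domain, io_mm->pasid,
						  io_mm->mm->pgd);
	return 0;
}

static void my_iommu_mm_detach(struct iommu_domain *domain, struct device *dev,
			       struct io_mm *io_mm, bool detach_domain)
{
	if (detach_domain)
		my_iommu_clear_pasid_entry(domain, io_mm->pasid);
	/* Invalidate IOTLB and ATC entries for this PASID */
	my_iommu_inv_iotlb_pasid(domain, dev, io_mm->pasid);
}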
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
> v2->v3: use sva_lock, comment updates
> ---
>   drivers/iommu/iommu-sva.c | 397 +++++++++++++++++++++++++++++++++++++-
>   drivers/iommu/iommu.c     |   1 +
>   include/linux/iommu.h     |  29 +++
>   3 files changed, 424 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index d60d4f0bb89e..a486bc947335 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -5,25 +5,415 @@
>    * Copyright (C) 2018 ARM Ltd.
>    */
>   
> +#include <linux/idr.h>
>   #include <linux/iommu.h>
> +#include <linux/sched/mm.h>
>   #include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * In some IOMMUs, one entry (typically the first one) of the PASID table can be
> + * used to hold non-PASID translations. In this case PASID #0 is reserved and
> + * the first entry points to the io_pgtable pointer. In other IOMMUs the
> + * io_pgtable pointer is held in the device table and PASID #0 is available to
> + * the allocator.
> + */
> +
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm, unsigned long flags)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm, flags);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->flags		= flags;
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +	/* Leave kref as zero until the io_mm is fully initialized */
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
> +			  param->max_pasid + 1, GFP_ATOMIC);
> +	io_mm->pasid = pasid;
> +	spin_unlock(&iommu_sva_lock);
> +	idr_preload_end();
> +
> +	if (pasid < 0) {
> +		ret = pasid;
> +		goto err_free_mm;
> +	}
> +
> +	/* TODO: keep track of mm. For the moment, abort. */
> +	ret = -ENOSYS;
> +	spin_lock(&iommu_sva_lock);
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	spin_unlock(&iommu_sva_lock);
> +
> +err_free_mm:
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void io_mm_free(struct io_mm *io_mm)
> +{
> +	struct mm_struct *mm = io_mm->mm;
> +
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +}
> +
> +static void io_mm_release(struct kref *kref)
> +{
> +	struct io_mm *io_mm;
> +
> +	io_mm = container_of(kref, struct io_mm, kref);
> +	WARN_ON(!list_empty(&io_mm->devices));
> +
> +	/* The PASID can now be reallocated for another mm... */
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	/* ... but this mm is freed after a grace period (TODO) */
> +	io_mm_free(io_mm);
> +}
> +
> +/*
> + * Returns non-zero if a reference to the io_mm was successfully taken.
> + * Returns zero if the io_mm is being freed and should not be used.
> + */
> +static int io_mm_get_locked(struct io_mm *io_mm)
> +{
> +	if (io_mm)
> +		return kref_get_unless_zero(&io_mm->kref);
> +
> +	return 0;
> +}
> +
> +static void io_mm_put_locked(struct io_mm *io_mm)
> +{
> +	kref_put(&io_mm->kref, io_mm_release);
> +}
> +
> +static void io_mm_put(struct io_mm *io_mm)
> +{
> +	spin_lock(&iommu_sva_lock);
> +	io_mm_put_locked(io_mm);
> +	spin_unlock(&iommu_sva_lock);
> +}
> +
> +static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
> +			struct io_mm *io_mm, void *drvdata)
> +{
> +	int ret;
> +	bool attach_domain = true;
> +	int pasid = io_mm->pasid;
> +	struct iommu_bond *bond, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +		return -ENODEV;
> +
> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> +		return -ERANGE;
> +
> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond->domain		= domain;
> +	bond->io_mm		= io_mm;
> +	bond->dev		= dev;
> +	bond->drvdata		= drvdata;
> +
> +	spin_lock(&iommu_sva_lock);
> +	/*
> +	 * Check if this io_mm is already bound to the domain. In which case the
> +	 * IOMMU driver doesn't have to install the PASID table entry.
> +	 */
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == io_mm) {
> +			attach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
> +	if (ret) {
> +		kfree(bond);
> +		goto out_unlock;
> +	}
> +
> +	list_add(&bond->mm_head, &io_mm->devices);
> +	list_add(&bond->domain_head, &domain->mm_list);
> +	list_add(&bond->dev_head, &param->mm_list);
> +
> +out_unlock:
> +	spin_unlock(&iommu_sva_lock);
> +	return ret;
> +}
> +
> +static void io_mm_detach_locked(struct iommu_bond *bond)
> +{
> +	struct iommu_bond *tmp;
> +	bool detach_domain = true;
> +	struct iommu_domain *domain = bond->domain;
> +
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
> +			detach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	list_del(&bond->mm_head);
> +	list_del(&bond->domain_head);
> +	list_del(&bond->dev_head);
> +
> +	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
> +
> +	io_mm_put_locked(bond->io_mm);
> +	kfree(bond);
> +}
>   
>   int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>   			    unsigned long flags, void *drvdata)
>   {
> -	return -ENOSYS; /* TODO */
> +	int i;
> +	int ret = 0;
> +	struct iommu_bond *bond;
> +	struct io_mm *io_mm = NULL;
> +	struct iommu_domain *domain;
> +	struct iommu_sva_param *param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	param = dev->iommu_param->sva_param;
> +	if (!param || (flags & ~param->features)) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* If an io_mm already exists, use it */
> +	spin_lock(&iommu_sva_lock);
> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {

This might be problematic for vt-d (and other possible architectures that
use PASIDs for more than SVA). When the vt-d iommu works in scalable mode,
a PASID might be allocated for:

(1) SVA
(2) Device Assignable Interface (might be a mdev or directly managed
     within a device driver).
(3) SVA in a VM guest
(4) Device Assignable Interface in a VM guest

So we can't expect that an io_mm pointer is associated with every PASID,
and this code might run into problems if the PASID is allocated for
usages other than SVA.
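
As an illustration of this concern: with a single shared PASID space, one way
to stay safe is to tag each entry with its consumer, so that SVA lookups skip
foreign PASIDs. The names below (pasid_entry, PASID_CONSUMER_*) are purely
hypothetical:

enum pasid_consumer {
	PASID_CONSUMER_SVA,		/* bound to an mm_struct via io_mm */
	PASID_CONSUMER_AUX_DOMAIN,	/* assignable interface / mdev */
	PASID_CONSUMER_GUEST,		/* SVA or assignable interface in a guest */
};

struct pasid_entry {
	enum pasid_consumer	type;
	void			*private;	/* io_mm only if type == SVA */
};

static struct io_mm *pasid_to_io_mm(struct pasid_entry *entry)
{
	if (!entry || entry->type != PASID_CONSUMER_SVA)
		return NULL;
	return entry->private;
}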

Best regards,
Lu Baolu

> +		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
> +			/* ... Unless it's already bound to this device */
> +			list_for_each_entry(bond, &io_mm->devices, mm_head) {
> +				if (bond->dev == dev) {
> +					ret = -EEXIST;
> +					io_mm_put_locked(io_mm);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +	if (ret)
> +		goto out_unlock;
> +
> +	/* Require identical features within an io_mm for now */
> +	if (io_mm && (flags != io_mm->flags)) {
> +		io_mm_put(io_mm);
> +		ret = -EDOM;
> +		goto out_unlock;
> +	}
> +
> +	if (!io_mm) {
> +		io_mm = io_mm_alloc(domain, dev, mm, flags);
> +		if (IS_ERR(io_mm)) {
> +			ret = PTR_ERR(io_mm);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	ret = io_mm_attach(domain, dev, io_mm, drvdata);
> +	if (ret)
> +		io_mm_put(io_mm);
> +	else
> +		*pasid = io_mm->pasid;
> +
> +out_unlock:
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
>   
>   int __iommu_sva_unbind_device(struct device *dev, int pasid)
>   {
> -	return -ENOSYS; /* TODO */
> +	int ret = -ESRCH;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL;
> +	struct iommu_sva_param *param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&dev->iommu_param->sva_lock);
> +	param = dev->iommu_param->sva_param;
> +	if (!param) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry(bond, &param->mm_list, dev_head) {
> +		if (bond->io_mm->pasid == pasid) {
> +			io_mm_detach_locked(bond);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +out_unlock:
> +	mutex_unlock(&dev->iommu_param->sva_lock);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>   
>   static void __iommu_sva_unbind_device_all(struct device *dev)
>   {
> -	/* TODO */
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +	struct iommu_bond *bond, *next;
> +
> +	if (!param)
> +		return;
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
> +		io_mm_detach_locked(bond);
> +	spin_unlock(&iommu_sva_lock);
>   }
>   
>   /**
> @@ -82,6 +472,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
>   	param->features		= features;
>   	param->min_pasid	= min_pasid;
>   	param->max_pasid	= max_pasid;
> +	INIT_LIST_HEAD(&param->mm_list);
>   
>   	mutex_lock(&dev->iommu_param->sva_lock);
>   	if (dev->iommu_param->sva_param) {
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index aba3bf15d46c..7113fe398b70 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1525,6 +1525,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>   	domain->type = type;
>   	/* Assume all sizes by default; the driver may override this later */
>   	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
> +	INIT_LIST_HEAD(&domain->mm_list);
>   
>   	return domain;
>   }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9c49877e37a5..6a3ced6a5aa1 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -99,6 +99,20 @@ struct iommu_domain {
>   	void *handler_token;
>   	struct iommu_domain_geometry geometry;
>   	void *iova_cookie;
> +
> +	struct list_head mm_list;
> +};
> +
> +struct io_mm {
> +	int			pasid;
> +	/* IOMMU_SVA_FEAT_* */
> +	unsigned long		flags;
> +	struct list_head	devices;
> +	struct kref		kref;
> +	struct mm_struct	*mm;
> +
> +	/* Release callback for this mm */
> +	void (*release)(struct io_mm *io_mm);
>   };
>   
>   enum iommu_cap {
> @@ -201,6 +215,7 @@ struct iommu_sva_param {
>   	unsigned long features;
>   	unsigned int min_pasid;
>   	unsigned int max_pasid;
> +	struct list_head mm_list;
>   };
>   
>   /**
> @@ -212,6 +227,12 @@ struct iommu_sva_param {
>    * @detach_dev: detach device from an iommu domain
>    * @sva_init_device: initialize Shared Virtual Addressing for a device
>    * @sva_shutdown_device: shutdown Shared Virtual Addressing for a device
> + * @mm_alloc: allocate io_mm
> + * @mm_free: free io_mm
> + * @mm_attach: attach io_mm to a device. Install PASID entry if necessary. Must
> + *             not sleep.
> + * @mm_detach: detach io_mm from a device. Remove PASID entry and
> + *             flush associated TLB entries if necessary. Must not sleep.
>    * @map: map a physically contiguous memory region to an iommu domain
>    * @unmap: unmap a physically contiguous memory region from an iommu domain
>    * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
> @@ -249,6 +270,14 @@ struct iommu_ops {
>   	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
>   	int (*sva_init_device)(struct device *dev, struct iommu_sva_param *param);
>   	void (*sva_shutdown_device)(struct device *dev);
> +	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
> +				  struct mm_struct *mm,
> +				  unsigned long flags);
> +	void (*mm_free)(struct io_mm *io_mm);
> +	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
> +			 struct io_mm *io_mm, bool attach_domain);
> +	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
> +			  struct io_mm *io_mm, bool detach_domain);
>   	int (*map)(struct iommu_domain *domain, unsigned long iova,
>   		   phys_addr_t paddr, size_t size, int prot);
>   	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 06/10] iommu/sva: Search mm by PASID
@ 2018-09-25  4:59     ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-25  4:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu
  Cc: baolu.lu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, andrew.murray, will.deacon, robin.murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Hi,

On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> The fault handler will need to find an mm given its PASID. This is the
> reason we have an IDR for storing address spaces, so hook it up.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>   drivers/iommu/iommu-sva.c | 26 ++++++++++++++++++++++++++
>   include/linux/iommu.h     |  7 +++++++
>   2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 5ff8967cb213..ee86f00ee1b9 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -636,6 +636,32 @@ void iommu_sva_unbind_device_all(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
>   
> +/**
> + * iommu_sva_find() - Find mm associated to the given PASID
> + * @pasid: Process Address Space ID assigned to the mm
> + *
> + * Returns the mm corresponding to this PASID, or NULL if not found. A reference
> + * to the mm is taken, and must be released with mmput().
> + */
> +struct mm_struct *iommu_sva_find(int pasid)
> +{
> +	struct io_mm *io_mm;
> +	struct mm_struct *mm = NULL;
> +
> +	spin_lock(&iommu_sva_lock);
> +	io_mm = idr_find(&iommu_pasid_idr, pasid);

Same issue here: we can't guarantee that an mm_struct pointer is
associated with a PASID value when PASIDs are also used for other
purposes. If the hardware reports a bad PASID, this function might run
into problems.
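
For reference, a rough sketch of the intended consumer: the page fault handler
looks up the mm by PASID and must treat a NULL return (unknown, freed, or
foreign PASID) as an invalid fault. The function name and the response codes
are assumptions based on the fault-reporting series, not code from this patch:

static int iopf_handle_mm_fault_sketch(struct device *dev, int pasid,
				       unsigned long addr, unsigned int flags)
{
	struct vm_area_struct *vma;
	struct mm_struct *mm;
	int status = IOMMU_PAGE_RESP_INVALID;	/* assumed response code */

	mm = iommu_sva_find(pasid);
	if (!mm)
		return status;	/* bad PASID, or not an SVA PASID */

	down_read(&mm->mmap_sem);
	vma = find_extend_vma(mm, addr);
	if (vma && !(handle_mm_fault(vma, addr, flags) & VM_FAULT_ERROR))
		status = IOMMU_PAGE_RESP_SUCCESS;
	up_read(&mm->mmap_sem);

	mmput(mm);	/* pairs with the reference taken by iommu_sva_find() */
	return status;
}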

Best regards,
Lu Baolu

> +	if (io_mm && io_mm_get_locked(io_mm)) {
> +		if (mmget_not_zero(io_mm->mm))
> +			mm = io_mm->mm;
> +
> +		io_mm_put_locked(io_mm);
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return mm;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_find);
> +
>   /**
>    * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
>    * @dev: the device
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 429f3dc37a35..a457650b80de 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -987,6 +987,8 @@ extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>   				   void *drvdata);
>   extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
>   extern void iommu_sva_unbind_device_all(struct device *dev);
> +extern struct mm_struct *iommu_sva_find(int pasid);
> +
>   #else /* CONFIG_IOMMU_SVA */
>   static inline int iommu_sva_init_device(struct device *dev,
>   					unsigned long features,
> @@ -1016,6 +1018,11 @@ static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
>   static inline void iommu_sva_unbind_device_all(struct device *dev)
>   {
>   }
> +
> +static inline struct mm_struct *iommu_sva_find(int pasid)
> +{
> +	return NULL;
> +}
>   #endif /* CONFIG_IOMMU_SVA */
>   
>   #endif /* __LINUX_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 06/10] iommu/sva: Search mm by PASID
@ 2018-09-25  4:59     ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-25  4:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi,

On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> The fault handler will need to find an mm given its PASID. This is the
> reason we have an IDR for storing address spaces, so hook it up.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>   drivers/iommu/iommu-sva.c | 26 ++++++++++++++++++++++++++
>   include/linux/iommu.h     |  7 +++++++
>   2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 5ff8967cb213..ee86f00ee1b9 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -636,6 +636,32 @@ void iommu_sva_unbind_device_all(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
>   
> +/**
> + * iommu_sva_find() - Find mm associated to the given PASID
> + * @pasid: Process Address Space ID assigned to the mm
> + *
> + * Returns the mm corresponding to this PASID, or NULL if not found. A reference
> + * to the mm is taken, and must be released with mmput().
> + */
> +struct mm_struct *iommu_sva_find(int pasid)
> +{
> +	struct io_mm *io_mm;
> +	struct mm_struct *mm = NULL;
> +
> +	spin_lock(&iommu_sva_lock);
> +	io_mm = idr_find(&iommu_pasid_idr, pasid);

Same issue here: we can't guarantee that an mm_struct pointer is
associated with a PASID value when PASIDs are also used for other
purposes. If the hardware reports a bad PASID, this function might run
into problems.

Best regards,
Lu Baolu

> +	if (io_mm && io_mm_get_locked(io_mm)) {
> +		if (mmget_not_zero(io_mm->mm))
> +			mm = io_mm->mm;
> +
> +		io_mm_put_locked(io_mm);
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return mm;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_find);
> +
>   /**
>    * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
>    * @dev: the device
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 429f3dc37a35..a457650b80de 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -987,6 +987,8 @@ extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>   				   void *drvdata);
>   extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
>   extern void iommu_sva_unbind_device_all(struct device *dev);
> +extern struct mm_struct *iommu_sva_find(int pasid);
> +
>   #else /* CONFIG_IOMMU_SVA */
>   static inline int iommu_sva_init_device(struct device *dev,
>   					unsigned long features,
> @@ -1016,6 +1018,11 @@ static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
>   static inline void iommu_sva_unbind_device_all(struct device *dev)
>   {
>   }
> +
> +static inline struct mm_struct *iommu_sva_find(int pasid)
> +{
> +	return NULL;
> +}
>   #endif /* CONFIG_IOMMU_SVA */
>   
>   #endif /* __LINUX_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25 10:32       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-25 10:32 UTC (permalink / raw)
  To: Lu Baolu, iommu
  Cc: joro, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 25/09/2018 04:15, Lu Baolu wrote:
>> +     /* If an io_mm already exists, use it */
>> +     spin_lock(&iommu_sva_lock);
>> +     idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
> 
> This might be problematic for vt-d (and other possible arch's which use
> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
> might be allocated for:
> 
> (1) SVA
> (2) Device Assignable Interface (might be a mdev or directly managed
>      within a device driver).
> (3) SVA in VM guest
> (4) Device Assignable Interface in VM guest
> 
> So we can't expect that an io_mm pointer was associated with each PASID.

Yes, as discussed on the previous series, we'll need to move the PASID
allocator outside of iommu-sva at some point, and even outside of
drivers/iommu, since device drivers might want to use the PASID allocator
without an IOMMU.

To be usable by all consumers of PASIDs, that allocator will need the
same interface as the IDR, so I don't think we have a problem here. I
haven't had the time or need to write such an allocator yet (and don't
plan to do it as part of this series), but I drafted an interface that at
least fulfills the needs of SVA.

* Single system-wide PASID space
* Multiple consumers, each associating their own structure to PASIDs.
  Each consumer gets a token.
* Device drivers might want to use both SVA and private PASIDs for a
  device at the same time.
* In my opinion "pasid" isn't the right name, "ioasid" would be better
  but that's not important.

typedef unsigned int pasid_t;

/* Returns consumer token */
void *pasid_get_consumer();
void pasid_put_consumer(void *consumer);

/* Returns pasid or invalid (pasid_t)(-1) */
pasid_t pasid_alloc(void *consumer, pasid_t min, pasid_t max,
                    void *private);
void pasid_remove(pasid_t pasid);

/* Iterate over PASIDs for this consumer. Func returns non-zero to stop
iterating */
int pasid_for_each(void *consumer, void *iter_data,
		   int (*func)(void *iter_data, pasid_t pasid,
			       void *private));
/* Returns priv data or NULL */
void *pasid_find(void *consumer, pasid_t pasid);
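
For illustration, iommu-sva could sit on top of that draft roughly like this
(a sketch against the interface above, nothing more):

static void *iommu_sva_consumer;	/* from pasid_get_consumer() */

static int iommu_sva_setup_pasid(struct io_mm *io_mm,
				 struct iommu_sva_param *param)
{
	pasid_t pasid;

	pasid = pasid_alloc(iommu_sva_consumer, param->min_pasid,
			    param->max_pasid, io_mm);
	if (pasid == (pasid_t)(-1))
		return -ENOSPC;

	io_mm->pasid = pasid;
	return 0;
}

static struct io_mm *iommu_sva_lookup_pasid(pasid_t pasid)
{
	/* Only returns entries that the SVA consumer allocated */
	return pasid_find(iommu_sva_consumer, pasid);
}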

Thanks,
Jean

> And this code might run into problem if the pasid is allocated for
> usages other than SVA.
> 
> Best regards,
> Lu Baolu


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25 10:32       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-25 10:32 UTC (permalink / raw)
  To: Lu Baolu, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, Robin Murphy,
	christian.koenig-5C7GfCeVMHo

On 25/09/2018 04:15, Lu Baolu wrote:
>> +     /* If an io_mm already exists, use it */
>> +     spin_lock(&iommu_sva_lock);
>> +     idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
> 
> This might be problematic for vt-d (and other possible arch's which use
> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
> might be allocated for:
> 
> (1) SVA
> (2) Device Assignable Interface (might be a mdev or directly managed
>      within a device driver).
> (3) SVA in VM guest
> (4) Device Assignable Interface in VM guest
> 
> So we can't expect that an io_mm pointer was associated with each PASID.

Yes, as discussed on the previous series, we'll need to move the PASID
allocator outside of iommu-sva at some point, and even outside of
drivers/iommu, since device drivers might want to use the PASID allocator
without an IOMMU.

To be usable by all consumers of PASIDs, that allocator will need the
same interface as the IDR, so I don't think we have a problem here. I
haven't had the time or need to write such an allocator yet (and don't
plan to do it as part of this series), but I drafted an interface that at
least fulfills the needs of SVA.

* Single system-wide PASID space
* Multiple consumers, each associating their own structure to PASIDs.
  Each consumer gets a token.
* Device drivers might want to use both SVA and private PASIDs for a
  device at the same time.
* In my opinion "pasid" isn't the right name, "ioasid" would be better
  but that's not important.

typedef unsigned int pasid_t;

/* Returns consumer token */
void *pasid_get_consumer();
void pasid_put_consumer(void *consumer);

/* Returns pasid or invalid (pasid_t)(-1) */
pasid_t pasid_alloc(void *consumer, pasid_t min, pasid_t max,
                    void *private);
void pasid_remove(pasid_t pasid);

/* Iterate over PASIDs for this consumer. Func returns non-zero to stop
iterating */
int pasid_for_each(void *consumer, void *iter_data,
		   int (*func)(void *iter_data, pasid_t pasid,
			       void *private));
/* Returns priv data or NULL */
void *pasid_find(void *consumer, pasid_t pasid);

Thanks,
Jean

> And this code might run into problem if the pasid is allocated for
> usages other than SVA.
> 
> Best regards,
> Lu Baolu


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
@ 2018-09-25 13:16         ` Joerg Roedel
  0 siblings, 0 replies; 87+ messages in thread
From: Joerg Roedel @ 2018-09-25 13:16 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jean-Philippe Brucker, iommu, linux-pci, jcrouse,
	alex.williamson, Jonathan.Cameron, jacob.jun.pan,
	christian.koenig, eric.auger, kevin.tian, yi.l.liu,
	andrew.murray, will.deacon, robin.murphy, ashok.raj, xuzaibo,
	liguozhu, okaya, bharatku, ilias.apalodimas, shunyong.yang

On Sun, Sep 23, 2018 at 10:39:25AM +0800, Lu Baolu wrote:
> > +int iommu_sva_init_device(struct device *dev, unsigned long features,
> > +		       unsigned int min_pasid, unsigned int max_pasid)
> > +{
> > +	int ret;
> > +	struct iommu_sva_param *param;
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> 
> This doesn't work for vt-d. The domains for host iova are self-managed
> by vt-d driver itself. Hence, iommu_get_domain_for_dev() will always
> return NULL unless an UNMANAGED domain is attached to the device.
> 
> How about
> 
>       const struct iommu_ops *ops = dev->bus->iommu_ops;
> 
> instead?

The per-bus iommu-ops might go away sooner or later as we move to
per-device iommu-ops. How about fixing the VT-d driver to not keep that
domain internal to itself?
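
As a sketch of the lookup being discussed (not code from this series), the SVA
core could prefer the ops of the attached domain and only fall back to the
per-bus ops while per-device ops don't exist yet:

static const struct iommu_ops *sva_device_ops(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	if (domain)
		return domain->ops;

	/* Interim fallback until per-device iommu-ops are available */
	return dev->bus ? dev->bus->iommu_ops : NULL;
}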

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
@ 2018-09-25 13:16         ` Joerg Roedel
  0 siblings, 0 replies; 87+ messages in thread
From: Joerg Roedel @ 2018-09-25 13:16 UTC (permalink / raw)
  To: Lu Baolu
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, Jean-Philippe Brucker,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, christian.koenig-5C7GfCeVMHo,
	robin.murphy-5wv7dgnIgG8

On Sun, Sep 23, 2018 at 10:39:25AM +0800, Lu Baolu wrote:
> > +int iommu_sva_init_device(struct device *dev, unsigned long features,
> > +		       unsigned int min_pasid, unsigned int max_pasid)
> > +{
> > +	int ret;
> > +	struct iommu_sva_param *param;
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> 
> This doesn't work for vt-d. The domains for host iova are self-managed
> by vt-d driver itself. Hence, iommu_get_domain_for_dev() will always
> return NULL unless an UNMANAGED domain is attached to the device.
> 
> How about
> 
>       const struct iommu_ops *ops = dev->bus->iommu_ops;
> 
> instead?

The per-bus iommu-ops might go away sooner or later as we move to
per-device iommu-ops. How about fixing the VT-d driver to not keep that
domain internal to itself?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25 13:26       ` Joerg Roedel
  0 siblings, 0 replies; 87+ messages in thread
From: Joerg Roedel @ 2018-09-25 13:26 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jean-Philippe Brucker, iommu, linux-pci, jcrouse,
	alex.williamson, Jonathan.Cameron, jacob.jun.pan,
	christian.koenig, eric.auger, kevin.tian, yi.l.liu,
	andrew.murray, will.deacon, robin.murphy, ashok.raj, xuzaibo,
	liguozhu, okaya, bharatku, ilias.apalodimas, shunyong.yang

On Tue, Sep 25, 2018 at 11:15:40AM +0800, Lu Baolu wrote:
> This might be problematic for vt-d (and other possible arch's which use
> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
> might be allocated for:
> 
> (1) SVA
> (2) Device Assignable Interface (might be a mdev or directly managed
>     within a device driver).
> (3) SVA in VM guest
> (4) Device Assignable Interface in VM guest
> 
> So we can't expect that an io_mm pointer was associated with each PASID.
> And this code might run into problem if the pasid is allocated for
> usages other than SVA.

So all of these use-cases above should work in parallel on the same
device, just with different PASIDs? Or is a device always using only one
of the above modes at the same time?

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25 13:26       ` Joerg Roedel
  0 siblings, 0 replies; 87+ messages in thread
From: Joerg Roedel @ 2018-09-25 13:26 UTC (permalink / raw)
  To: Lu Baolu
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, Jean-Philippe Brucker,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, christian.koenig-5C7GfCeVMHo,
	robin.murphy-5wv7dgnIgG8

On Tue, Sep 25, 2018 at 11:15:40AM +0800, Lu Baolu wrote:
> This might be problematic for vt-d (and other possible arch's which use
> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
> might be allocated for:
> 
> (1) SVA
> (2) Device Assignable Interface (might be a mdev or directly managed
>     within a device driver).
> (3) SVA in VM guest
> (4) Device Assignable Interface in VM guest
> 
> So we can't expect that an io_mm pointer was associated with each PASID.
> And this code might run into problem if the pasid is allocated for
> usages other than SVA.

So all of these use-cases above should work in parallel on the same
device, just with different PASIDs? Or is a device always using only one
of the above modes at the same time?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
@ 2018-09-25 22:46           ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-09-25 22:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lu Baolu, Jean-Philippe Brucker, iommu, linux-pci, jcrouse,
	alex.williamson, Jonathan.Cameron, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, andrew.murray, will.deacon, robin.murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang, jacob.jun.pan

On Tue, 25 Sep 2018 15:16:47 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> On Sun, Sep 23, 2018 at 10:39:25AM +0800, Lu Baolu wrote:
> > > +int iommu_sva_init_device(struct device *dev, unsigned long
> > > features,
> > > +		       unsigned int min_pasid, unsigned int
> > > max_pasid) +{
> > > +	int ret;
> > > +	struct iommu_sva_param *param;
> > > +	struct iommu_domain *domain =
> > > iommu_get_domain_for_dev(dev);  
> > 
> > This doesn't work for vt-d. The domains for host iova are
> > self-managed by vt-d driver itself. Hence,
> > iommu_get_domain_for_dev() will always return NULL unless an
> > UNMANAGED domain is attached to the device.
> > 
> > How about
> > 
> >       const struct iommu_ops *ops = dev->bus->iommu_ops;
> > 
> > instead?  
> 
> The per-bus iommu-ops might go away sooner or later as we move to
> per-device iommu-ops. How about fixing the VT-d driver to not keep
> that domain internal to itself?
> 
Just to understand more specifically: do you mean letting the VT-d driver
also support IOMMU_DOMAIN_DMA as the default domain?

But I think the ordering issue is still there, in that the DOMAIN_DMA
domain will not be created until a DMA map call is invoked. I think
sva_init_device should not depend on the default domain.

> Regards,
> 
> 	Joerg

[Jacob Pan]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
@ 2018-09-25 22:46           ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-09-25 22:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	Jean-Philippe Brucker, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, christian.koenig-5C7GfCeVMHo

On Tue, 25 Sep 2018 15:16:47 +0200
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:

> On Sun, Sep 23, 2018 at 10:39:25AM +0800, Lu Baolu wrote:
> > > +int iommu_sva_init_device(struct device *dev, unsigned long
> > > features,
> > > +		       unsigned int min_pasid, unsigned int
> > > max_pasid) +{
> > > +	int ret;
> > > +	struct iommu_sva_param *param;
> > > +	struct iommu_domain *domain =
> > > iommu_get_domain_for_dev(dev);  
> > 
> > This doesn't work for vt-d. The domains for host iova are
> > self-managed by vt-d driver itself. Hence,
> > iommu_get_domain_for_dev() will always return NULL unless an
> > UNMANAGED domain is attached to the device.
> > 
> > How about
> > 
> >       const struct iommu_ops *ops = dev->bus->iommu_ops;
> > 
> > instead?  
> 
> The per-bus iommu-ops might go away sooner or later as we move to
> per-device iommu-ops. How about fixing the VT-d driver to not keep
> that domain internal to itself?
> 
Just to understand more specifically: do you mean letting the VT-d driver
also support IOMMU_DOMAIN_DMA as the default domain?

But I think the ordering issue is still there, in that the DOMAIN_DMA
domain will not be created until a DMA map call is invoked. I think
sva_init_device should not depend on the default domain.

> Regards,
> 
> 	Joerg

[Jacob Pan]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25 23:33         ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-25 23:33 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: baolu.lu, Jean-Philippe Brucker, iommu, linux-pci, jcrouse,
	alex.williamson, Jonathan.Cameron, jacob.jun.pan,
	christian.koenig, eric.auger, kevin.tian, yi.l.liu,
	andrew.murray, will.deacon, robin.murphy, ashok.raj, xuzaibo,
	liguozhu, okaya, bharatku, ilias.apalodimas, shunyong.yang

Hi Joerg,

On 09/25/2018 09:26 PM, Joerg Roedel wrote:
> On Tue, Sep 25, 2018 at 11:15:40AM +0800, Lu Baolu wrote:
>> This might be problematic for vt-d (and other possible arch's which use
>> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
>> might be allocated for:
>>
>> (1) SVA
>> (2) Device Assignable Interface (might be a mdev or directly managed
>>      within a device driver).
>> (3) SVA in VM guest
>> (4) Device Assignable Interface in VM guest
>>
>> So we can't expect that an io_mm pointer was associated with each PASID.
>> And this code might run into problem if the pasid is allocated for
>> usages other than SVA.
> 
> So all of these use-cases above should work in parallel on the same
> device, just with different PASIDs?

No. It's not required.

> Or is a device always using only one
> of the above modes at the same time?

A device might use one or more of the modes described above at the same
time.

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-25 23:33         ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-25 23:33 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	Jean-Philippe Brucker, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, christian.koenig-5C7GfCeVMHo

Hi Joerg,

On 09/25/2018 09:26 PM, Joerg Roedel wrote:
> On Tue, Sep 25, 2018 at 11:15:40AM +0800, Lu Baolu wrote:
>> This might be problematic for vt-d (and other possible arch's which use
>> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
>> might be allocated for:
>>
>> (1) SVA
>> (2) Device Assignable Interface (might be a mdev or directly managed
>>      within a device driver).
>> (3) SVA in VM guest
>> (4) Device Assignable Interface in VM guest
>>
>> So we can't expect that an io_mm pointer was associated with each PASID.
>> And this code might run into problem if the pasid is allocated for
>> usages other than SVA.
> 
> So all of these use-cases above should work in parallel on the same
> device, just with different PASIDs?

No. It's not required.

> Or is a device always using only one
> of the above modes at the same time?

A device might use one or more of the modes described above at the same
time.

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26  3:12         ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-26  3:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu
  Cc: baolu.lu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Hi,

On 09/25/2018 06:32 PM, Jean-Philippe Brucker wrote:
> On 25/09/2018 04:15, Lu Baolu wrote:
>>> +     /* If an io_mm already exists, use it */
>>> +     spin_lock(&iommu_sva_lock);
>>> +     idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
>>
>> This might be problematic for vt-d (and other possible arch's which use
>> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
>> might be allocated for:
>>
>> (1) SVA
>> (2) Device Assignable Interface (might be a mdev or directly managed
>>       within a device driver).
>> (3) SVA in VM guest
>> (4) Device Assignable Interface in VM guest
>>
>> So we can't expect that an io_mm pointer was associated with each PASID.
> 
> Yes as discussed on the previous series, we'll need to move the PASID
> allocator outside of iommu-sva at some point, and even outside of
> drivers/iommu since device driver might want to use the PASID allocator
> without an IOMMU.
> 
> To be usable by all consumers of PASIDs, that allocator will need the
> same interface as IDR, so I don't think we have a problem here. I
> haven't had time or need to write such allocator yet (and don't plan to
> do it as part of this series), but I drafted an interface, that at least
> fulfills the needs of SVA.

I have a patch set for the global pasid allocator. It mostly matches
your idea. I can send it out for comments later.

Best regards,
Lu Baolu

> 
> * Single system-wide PASID space
> * Multiple consumers, each associating their own structure to PASIDs.
>    Each consumer gets a token.
> * Device drivers might want to use both SVA and private PASIDs for a
>    device at the same time.
> * In my opinion "pasid" isn't the right name, "ioasid" would be better
>    but that's not important.
> 
> typedef unsigned int pasid_t;
> 
> /* Returns consumer token */
> void *pasid_get_consumer();
> void pasid_put_consumer(void *consumer);
> 
> /* Returns pasid or invalid (pasid_t)(-1) */
> pasid_t pasid_alloc(void *consumer, pasid_t min, pasid_t max,
>                      void *private);
> void pasid_remove(pasid_t pasid);
> 
> /* Iterate over PASIDs for this consumer. Func returns non-zero to stop
> iterating */
> int pasid_for_each(void *consumer, void *iter_data,
> 		   int (*func)(void *iter_data, pasid_t pasid,
> 			       void *private));
> /* Returns priv data or NULL */
> void *pasid_find(void *consumer, pasid_t pasid);
> 
> Thanks,
> Jean
> 
>> And this code might run into problem if the pasid is allocated for
>> usages other than SVA.
>>
>> Best regards,
>> Lu Baolu
> 
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26  3:12         ` Lu Baolu
  0 siblings, 0 replies; 87+ messages in thread
From: Lu Baolu @ 2018-09-26  3:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, Robin Murphy,
	christian.koenig-5C7GfCeVMHo

Hi,

On 09/25/2018 06:32 PM, Jean-Philippe Brucker wrote:
> On 25/09/2018 04:15, Lu Baolu wrote:
>>> +     /* If an io_mm already exists, use it */
>>> +     spin_lock(&iommu_sva_lock);
>>> +     idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
>>
>> This might be problematic for vt-d (and other possible arch's which use
>> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
>> might be allocated for:
>>
>> (1) SVA
>> (2) Device Assignable Interface (might be a mdev or directly managed
>>       within a device driver).
>> (3) SVA in VM guest
>> (4) Device Assignable Interface in VM guest
>>
>> So we can't expect that an io_mm pointer was associated with each PASID.
> 
> Yes as discussed on the previous series, we'll need to move the PASID
> allocator outside of iommu-sva at some point, and even outside of
> drivers/iommu since device driver might want to use the PASID allocator
> without an IOMMU.
> 
> To be usable by all consumers of PASIDs, that allocator will need the
> same interface as IDR, so I don't think we have a problem here. I
> haven't had time or need to write such allocator yet (and don't plan to
> do it as part of this series), but I drafted an interface, that at least
> fulfills the needs of SVA.

I have a patch set for the global pasid allocator. It mostly matches
your idea. I can send it out for comments later.

Best regards,
Lu Baolu

> 
> * Single system-wide PASID space
> * Multiple consumers, each associating their own structure to PASIDs.
>    Each consumer gets a token.
> * Device drivers might want to use both SVA and private PASIDs for a
>    device at the same time.
> * In my opinion "pasid" isn't the right name, "ioasid" would be better
>    but that's not important.
> 
> typedef unsigned int pasid_t;
> 
> /* Returns consumer token */
> void *pasid_get_consumer();
> void pasid_put_consumer(void *consumer);
> 
> /* Returns pasid or invalid (pasid_t)(-1) */
> pasid_t pasid_alloc(void *consumer, pasid_t min, pasid_t max,
>                      void *private);
> void pasid_remove(pasid_t pasid);
> 
> /* Iterate over PASIDs for this consumer. Func returns non-zero to stop
> iterating */
> int pasid_for_each(void *consumer, void *iter_data,
> 		   int (*func)(void *iter_data, pasid_t pasid,
> 			       void *private));
> /* Returns priv data or NULL */
> void *pasid_find(void *consumer, pasid_t pasid);
> 
> Thanks,
> Jean
> 
>> And this code might run into problem if the pasid is allocated for
>> usages other than SVA.
>>
>> Best regards,
>> Lu Baolu
> 
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
  2018-09-25 22:46           ` Jacob Pan
@ 2018-09-26 10:14             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-26 10:14 UTC (permalink / raw)
  To: Jacob Pan, Joerg Roedel
  Cc: Lu Baolu, iommu, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 25/09/2018 23:46, Jacob Pan wrote:
> On Tue, 25 Sep 2018 15:16:47 +0200
> Joerg Roedel <joro@8bytes.org> wrote:
> 
>> On Sun, Sep 23, 2018 at 10:39:25AM +0800, Lu Baolu wrote:
>> > > +int iommu_sva_init_device(struct device *dev, unsigned long
>> > > features,
>> > > +                unsigned int min_pasid, unsigned int
>> > > max_pasid) +{
>> > > + int ret;
>> > > + struct iommu_sva_param *param;
>> > > + struct iommu_domain *domain =
>> > > iommu_get_domain_for_dev(dev);  
>> > 
>> > This doesn't work for vt-d. The domains for host iova are
>> > self-managed by vt-d driver itself. Hence,
>> > iommu_get_domain_for_dev() will always return NULL unless an
>> > UNMANAGED domain is attached to the device.
>> > 
>> > How about
>> > 
>> >       const struct iommu_ops *ops = dev->bus->iommu_ops;
>> > 
>> > instead?  
>> 
>> The per-bus iommu-ops might go away sooner or later as we move to
>> per-device iommu-ops. How about fixing the VT-d driver to not keep
>> that domain internal to itself?
>> 
> Just to understand more specifically, you mean let VT-d driver also
> support IOMMU_DOMAIN_DMA as default domain?
> 
> But I think the ordering issue is still there in that the DOMAIN_DMA
> domain will not be created until DMA map call is invoked. I think
> sva_init_device should not depend on the default domain.

Normally the default domain is created when the .add_device() IOMMU op
calls iommu_group_get_for_dev(). That should happen before the driver
probe, so before it can call sva_init_device().
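As a simplified sketch of that ordering (not taken from any actual
driver): the default domain comes into existence in the add_device()
path, which runs when the device is registered on its bus, hence before
the device driver's probe() and before any iommu_sva_init_device() call.

static int my_iommu_add_device(struct device *dev)
{
	struct iommu_group *group;

	/* Allocates the group and the default DMA domain */
	group = iommu_group_get_for_dev(dev);
	if (IS_ERR(group))
		return PTR_ERR(group);

	iommu_group_put(group);
	return 0;
}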

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26 10:20           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-26 10:20 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: iommu, linux-pci, jcrouse, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 26/09/2018 00:33, Lu Baolu wrote:
> Hi Joerg,
> 
> On 09/25/2018 09:26 PM, Joerg Roedel wrote:
>> On Tue, Sep 25, 2018 at 11:15:40AM +0800, Lu Baolu wrote:
>>> This might be problematic for vt-d (and other possible arch's which use
>>> PASID other than SVA). When vt-d iommu works in scalable mode, a PASID
>>> might be allocated for:
>>>
>>> (1) SVA
>>> (2) Device Assignable Interface (might be a mdev or directly managed
>>>      within a device driver).
>>> (3) SVA in VM guest
>>> (4) Device Assignable Interface in VM guest
>>>
>>> So we can't expect that an io_mm pointer was associated with each PASID.
>>> And this code might run into problem if the pasid is allocated for
>>> usages other than SVA.
>> 
>> So all of these use-cases above should work in parallel on the same
>> device, just with different PASIDs?
> 
> No. It's not required.
> 
>> Or is a device always using only one
>> of the above modes at the same time?
> 
> A device might use one or multiple modes described above at the same
> time.

Yes, at the moment it's difficult to guess what device drivers will
want, but I can imagine some driver offering SVA to userspace, while
keeping a few PASIDs for themselves to map kernel memory. Or create mdev
devices for virtualization while also allowing bare-metal SVA. So I
think we should aim at enabling these use-cases in parallel, even if it
doesn't necessarily need to be possible right now.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26 12:45             ` Joerg Roedel
  0 siblings, 0 replies; 87+ messages in thread
From: Joerg Roedel @ 2018-09-26 12:45 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Lu Baolu, iommu, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On Wed, Sep 26, 2018 at 11:20:34AM +0100, Jean-Philippe Brucker wrote:
> Yes, at the moment it's difficult to guess what device drivers will
> want, but I can imagine some driver offering SVA to userspace, while
> keeping a few PASIDs for themselves to map kernel memory. Or create mdev
> devices for virtualization while also allowing bare-metal SVA. So I
> think we should aim at enabling these use-cases in parallel, even if it
> doesn't necessarily need to be possible right now.

Yeah okay, but allowing these use-cases in parallel basically disallows
giving any guest control over a device's pasid-table, no?

I am just asking because I want to make up my mind about the necessary
extensions to the IOMMU-API.


Regards,

	Joerg


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API
  2018-09-25 22:46           ` Jacob Pan
@ 2018-09-26 12:48             ` Joerg Roedel
  -1 siblings, 0 replies; 87+ messages in thread
From: Joerg Roedel @ 2018-09-26 12:48 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Lu Baolu, Jean-Philippe Brucker, iommu, linux-pci, jcrouse,
	alex.williamson, Jonathan.Cameron, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, andrew.murray, will.deacon, robin.murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On Tue, Sep 25, 2018 at 03:46:42PM -0700, Jacob Pan wrote:
> On Tue, 25 Sep 2018 15:16:47 +0200
> Joerg Roedel <joro@8bytes.org> wrote:
> 
> > On Sun, Sep 23, 2018 at 10:39:25AM +0800, Lu Baolu wrote:
> > > > +int iommu_sva_init_device(struct device *dev, unsigned long
> > > > features,
> > > > +		       unsigned int min_pasid, unsigned int
> > > > max_pasid) +{
> > > > +	int ret;
> > > > +	struct iommu_sva_param *param;
> > > > +	struct iommu_domain *domain =
> > > > iommu_get_domain_for_dev(dev);  
> > > 
> > > This doesn't work for vt-d. The domains for host iova are
> > > self-managed by vt-d driver itself. Hence,
> > > iommu_get_domain_for_dev() will always return NULL unless an
> > > UNMANAGED domain is attached to the device.
> > > 
> > > How about
> > > 
> > >       const struct iommu_ops *ops = dev->bus->iommu_ops;
> > > 
> > > instead?  
> > 
> > The per-bus iommu-ops might go away sooner or later as we move to
> > per-device iommu-ops. How about fixing the VT-d driver to not keep
> > that domain internal to itself?
> > 
> Just to understand more specifically, you mean let VT-d driver also
> support IOMMU_DOMAIN_DMA as default domain?

Yes, bringing it on-par with other IOMMU drivers in this regard.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26 13:50               ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-26 13:50 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: kevin.tian, ashok.raj, linux-pci, ilias.apalodimas, Will Deacon,
	alex.williamson, okaya, iommu, liguozhu, Robin Murphy,
	christian.koenig

On 26/09/2018 13:45, Joerg Roedel wrote:
> On Wed, Sep 26, 2018 at 11:20:34AM +0100, Jean-Philippe Brucker wrote:
>> Yes, at the moment it's difficult to guess what device drivers will
>> want, but I can imagine some driver offering SVA to userspace, while
>> keeping a few PASIDs for themselves to map kernel memory. Or create mdev
>> devices for virtualization while also allowing bare-metal SVA. So I
>> think we should aim at enabling these use-cases in parallel, even if it
>> doesn't necessarily need to be possible right now.
> 
> Yeah okay, but allowing these use-cases in parallel basically disallows
> giving any guest control over a device's pasid-table, no?

All of these use-cases require the host to manage the PASID tables, so
while any one of them is enabled, we can't give a guest control over the
PASID tables. But allowing these use-cases in parallel doesn't change that.

There is an ambiguity: I understand "(3) SVA in VM guest" as SVA for a
device-assignable interface assigned to a guest, using vfio-mdev and the
new Intel vt-d architecture (right?). That case does require the host to
allocate and manage PASIDs (because the PCI device is shared between
multiple VMs).

For the "classic" vfio-pci case, "SVA in guest" still means giving the
guest control over the whole PASID table.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices
@ 2018-09-26 18:01         ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-09-26 18:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Lu Baolu, iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang, jacob.jun.pan

On Mon, 24 Sep 2018 13:07:47 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 23/09/2018 04:05, Lu Baolu wrote:
> > Hi,
> > 
> > On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:  
> >> Add bind() and unbind() operations to the IOMMU API. Bind()
> >> returns a PASID that drivers can program in hardware, to let their
> >> devices access an mm. This patch only adds skeletons for the
> >> device driver API, most of the implementation is still missing.  
> > 
> > Is it possible that a malicious process can unbind a pasid which is
> > used by another normal process?  
> 
> Yes, it's up to the device driver that calls unbind() to check that
> the caller is allowed to unbind this PASID. We can't do it ourselves
> since unbind() could also be called from a kernel thread for example
> from a cleanup function in some workqueue, outside the context of the
> process to unbind.
> 
I am wondering if we can avoid the complexity around permission
checking by simply _only_ allow bind/unbind() on current mm? what would
be the missing use cases if we bind current only?
It can also avoid other race such as unbind and mmu_notifier release
call.

> Jean
> 
> > 
> > It might happen in below sequence:
> > 
> > 
> > Process A                       Process B
> > =========                       =========
> > iommu_sva_init_device(dev)
> > iommu_sva_bind_device(dev)
> > ....
> > device access mm of A with
> > #PASID returned above
> > ....
> >                                 iommu_sva_unbind_device(dev, #PASID)
> > ....
> > [unrecoverable errors]
> > 
> > I didn't have a thorough consideration of this. Sorry if this has
> > been prevented.
> > 
> > Best regards,
> > Lu Baolu  

[Jacob Pan]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26 22:35     ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-09-26 22:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang, jacob.jun.pan

On Thu, 20 Sep 2018 18:00:39 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> +
> +static int io_mm_attach(struct iommu_domain *domain, struct device
> *dev,
> +			struct io_mm *io_mm, void *drvdata)
> +{
> +	int ret;
> +	bool attach_domain = true;
> +	int pasid = io_mm->pasid;
> +	struct iommu_bond *bond, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +		return -ENODEV;
> +
> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> +		return -ERANGE;
> +
> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond->domain		= domain;
> +	bond->io_mm		= io_mm;
> +	bond->dev		= dev;
> +	bond->drvdata		= drvdata;
> +
> +	spin_lock(&iommu_sva_lock);
> +	/*
> +	 * Check if this io_mm is already bound to the domain. In
> which case the
> +	 * IOMMU driver doesn't have to install the PASID table
> entry.
> +	 */
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == io_mm) {
> +			attach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	ret = domain->ops->mm_attach(domain, dev, io_mm,
> attach_domain);
> +	if (ret) {
> +		kfree(bond);
> +		goto out_unlock;
> +	}
> +
> +	list_add(&bond->mm_head, &io_mm->devices);
> +	list_add(&bond->domain_head, &domain->mm_list);
> +	list_add(&bond->dev_head, &param->mm_list);
> +

I am trying to understand whether mm_list is needed both per device and
per domain. Do you always unbind and detach the domain? It seems the
device could use domain->mm_list to track all mm's, true?
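For reference, the bond implied by the hunk above looks roughly like
this (reconstructed from the usage shown here, not copied from the
patch): one bond ties one io_mm to one device within one domain, and
sits on the three lists in question at the same time.

struct iommu_bond {
	struct iommu_domain	*domain;
	struct io_mm		*io_mm;
	struct device		*dev;
	void			*drvdata;

	struct list_head	mm_head;	/* entry in io_mm->devices */
	struct list_head	domain_head;	/* entry in domain->mm_list */
	struct list_head	dev_head;	/* entry in sva_param->mm_list */
};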




^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-26 22:58               ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-09-26 22:58 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Jean-Philippe Brucker, Lu Baolu, iommu, linux-pci, jcrouse,
	alex.williamson, Jonathan.Cameron, christian.koenig, eric.auger,
	kevin.tian, yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy,
	ashok.raj, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang, jacob.jun.pan

On Wed, 26 Sep 2018 14:45:27 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> On Wed, Sep 26, 2018 at 11:20:34AM +0100, Jean-Philippe Brucker wrote:
> > Yes, at the moment it's difficult to guess what device drivers will
> > want, but I can imagine some driver offering SVA to userspace, while
> > keeping a few PASIDs for themselves to map kernel memory. Or create
> > mdev devices for virtualization while also allowing bare-metal SVA.
> > So I think we should aim at enabling these use-cases in parallel,
> > even if it doesn't necessarily need to be possible right now.  
> 
> Yeah okay, but allowing these use-cases in parallel basically
> disallows giving any guest control over a device's pasid-table, no?
> 
For VT-d 3 (which is the only revision to support PASID), the PASID
table is always controlled by the host driver. Guest SVA usage would
bind a PASID with a gCR3.
But I thought ARM (https://lkml.org/lkml/2018/9/18/1082) is using the
bind-PASID-table approach, which gives the guest control of the device's
PASID table. I don't know whether that is intended to allow any parallel
use of PASIDs on the same device.
> I am just asking because I want to make up my mind about the necessary
> extensions to the IOMMU-API.
> 
One extension we will need, and which is being developed, is
bind_guest_pasid() for guest SVA usage.
Usage:
1. The guest allocates a system-wide PASID for SVA.
2. The guest writes the PASID to its PASID table.
3. A PASID cache flush results in binding the PASID (from the guest) to
   the device.
4. The host IOMMU driver installs the gCR3 of that PASID into the device
   PASID table (ops.bind_guest_pasid).
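A very rough host-side sketch of steps 3 and 4, purely hypothetical: the
handler name, the gcr3 argument and the bind_guest_pasid op below are
assumptions, since that interface is still being defined.

static int handle_guest_pasid_cache_flush(struct iommu_domain *domain,
					  struct device *dev, int pasid,
					  u64 gcr3)
{
	const struct iommu_ops *ops = domain->ops;

	/* Assumed future op, not part of this series */
	if (!ops->bind_guest_pasid)
		return -ENODEV;

	/* Install the guest CR3 in the host-managed PASID table entry */
	return ops->bind_guest_pasid(domain, dev, pasid, gcr3);
}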

Thanks,

Jacob
> 
> Regards,
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-27  3:22                 ` Liu, Yi L
  0 siblings, 0 replies; 87+ messages in thread
From: Liu, Yi L @ 2018-09-27  3:22 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Joerg Roedel
  Cc: Tian, Kevin, Raj, Ashok, linux-pci, ilias.apalodimas,
	Will Deacon, iommu, okaya, alex.williamson, liguozhu,
	christian.koenig, Robin Murphy, Liu, Yi L

> From: iommu-bounces@lists.linux-foundation.org [mailto:iommu-
> bounces@lists.linux-foundation.org] On Behalf Of Jean-Philippe Brucker
> Sent: Wednesday, September 26, 2018 9:50 PM
> Subject: Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
> 
> On 26/09/2018 13:45, Joerg Roedel wrote:
> > On Wed, Sep 26, 2018 at 11:20:34AM +0100, Jean-Philippe Brucker wrote:
> >> Yes, at the moment it's difficult to guess what device drivers will
> >> want, but I can imagine some driver offering SVA to userspace, while
> >> keeping a few PASIDs for themselves to map kernel memory. Or create mdev
> >> devices for virtualization while also allowing bare-metal SVA. So I
> >> think we should aim at enabling these use-cases in parallel, even if it
> >> doesn't necessarily need to be possible right now.
> >
> > Yeah okay, but allowing these use-cases in parallel basically disallows
> > giving any guest control over a device's pasid-table, no?
> All of these use-cases require the host to manage the PASID tables, so
> while any one of them is enabled, we can't give a guest control over the
> PASID tables. But allowing these use-cases in parallel doesn't change that.
> 
> There is an ambiguity: I understand "(3) SVA in VM guest" as SVA for a
> device-assignable interface assigned to a guest, using vfio-mdev and the
> new Intel vt-d architecture (right?). That case does require the host to
> allocate and manage PASIDs (because the PCI device is shared between
> multiple VMs).

Correct. For such a case, we put the host in charge of allocating and
managing PASIDs, and the reason you give is the right one.

> 
> For the "classic" vfio-pci case, "SVA in guest" still means giving the
> guest control over the whole PASID table.

No, if giving guest control over the whole PASID table, it means guest may have
its own PASID namespace. right? And for vfio-mdev case, it gets PASID from host.
So there would be multiple PASID namespaces. Thinking about the following scenario:

A PF/VF assigned to a VM via "classic" vfio-pci. And an assignable-device-interface
assigned to this VM via vfio-mdev. If an application in this VM tries to manipulate
these two "devices", it should have the same PASID programmed to them. right?
But as the above comments mentioned, for vfio-pci case, it would get a PASID from
its own PASID namespace. While the vfio-mdev case would get a PASID from host.
This would result in conflict.

So I would like the host to allocate and manage the whole PASID table,
so as to cover the possible combinations of vfio-pci passthrough and
vfio-mdev passthrough.

> 
> Thanks,
> Jean

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-09-27 13:37                   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-27 13:37 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel
  Cc: Tian, Kevin, Raj, Ashok, linux-pci, ilias.apalodimas,
	Will Deacon, alex.williamson, okaya, iommu, liguozhu,
	christian.koenig, Robin Murphy

On 27/09/2018 04:22, Liu, Yi L wrote:
>> For the "classic" vfio-pci case, "SVA in guest" still means giving the
>> guest control over the whole PASID table.
> 
> No, if giving guest control over the whole PASID table, it means guest may have
> its own PASID namespace. right? And for vfio-mdev case, it gets PASID from host.
> So there would be multiple PASID namespaces. Thinking about the following scenario:
> 
> A PF/VF assigned to a VM via "classic" vfio-pci. And an assignable-device-interface
> assigned to this VM via vfio-mdev. If an application in this VM tries to manipulate
> these two "devices", it should have the same PASID programmed to them. right?
> But as the above comments mentioned, for vfio-pci case, it would get a PASID from
> its own PASID namespace. While the vfio-mdev case would get a PASID from host.
> This would result in conflict.

Ah I see, if the host assigns an ADI via vfio-mdev and a PCI function
via vfio-pci to the same VM, the guest needs to use the paravirtualized
PASID allocator for the PCI device as well, not just the ADI. In fact
all guest PASIDs need to be allocated through one PV channel, even if
the VM has other vIOMMUs that don't support PV. But I suppose that kind
of VM is unrealistic. However for SMMUv3 we'll still use the
bind_pasid_table for vfio-pci and let the guest allocate PASIDs, since
the PASID table lives in guest-physical space.

In any case, for the patch series at hand, it means that iommu-sva will
need assistance from the vt-d driver to allocate PASIDs: host uses the
generic allocator, guest uses the PV one. I guess the mm_alloc() op
could do that?

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices
  2018-09-26 18:01         ` Jacob Pan
@ 2018-09-27 15:06           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-27 15:06 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Lu Baolu, iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 26/09/2018 19:01, Jacob Pan wrote:
> On Mon, 24 Sep 2018 13:07:47 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> On 23/09/2018 04:05, Lu Baolu wrote:
>> > Hi,
>> > 
>> > On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:  
>> >> Add bind() and unbind() operations to the IOMMU API. Bind()
>> >> returns a PASID that drivers can program in hardware, to let their
>> >> devices access an mm. This patch only adds skeletons for the
>> >> device driver API, most of the implementation is still missing.  
>> > 
>> > Is it possible that a malicious process can unbind a pasid which is
>> > used by another normal process?  
>> 
>> Yes, it's up to the device driver that calls unbind() to check that
>> the caller is allowed to unbind this PASID. We can't do it ourselves
>> since unbind() could also be called from a kernel thread for example
>> from a cleanup function in some workqueue, outside the context of the
>> process to unbind.

Actually I'm not too concerned about a process unbinding another one,
since in general only the kernel will hold the PASID values. Userspace
shouldn't even need to see them, so issuing unbind() with the wrong
PASID isn't an easy mistake.

> I am wondering if we can avoid the complexity around permission
> checking by simply _only_ allow bind/unbind() on current mm? what would
> be the missing use cases if we bind current only?
> It can also avoid other race such as unbind and mmu_notifier release
> call.

That's tempting but may be too restrictive. I just tried to copy what
the current AMD and Intel drivers do in their SVA implementation, but I
don't know if users will need all of it. At the moment the amdkfd driver
does unbind() from a workqueue, although moving to the generic API might
simplify things there.

Callers can easily enforce that only current->mm is passed to bind(). I
don't know if allowing a process to bind another one is a real use-case,
but the permission check on the device driver side is fairly easy, and
disallowing it wouldn't simplify iommu-sva.

Even if we allow bind() only on current, forcing unbind() to be done on
current means that the driver can't clean things up from a workqueue.
But you're right that this restriction would make things *much* simpler
for the exit()/unbind() race.
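As a sketch of the driver-side policy discussed here, assuming the bind
prototype from this series (the exact signature may differ):

/*
 * Sketch only: called from the driver's file_operations/ioctl path, so
 * "current" is the process requesting SVA. Nothing other than
 * current->mm is ever passed to bind().
 */
static int my_driver_enable_sva(struct device *dev, int *pasid)
{
	if (!current->mm)	/* e.g. a kernel thread: nothing to bind */
		return -EINVAL;

	return iommu_sva_bind_device(dev, current->mm, pasid, 0, NULL);
}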

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 07/10] iommu: Add a page fault handler
@ 2018-09-27 20:37     ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-09-27 20:37 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang, jacob.jun.pan

On Thu, 20 Sep 2018 18:00:43 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> +	/*
> +	 * When removing a PASID, the device driver tells the device
> to stop
> +	 * using it, and flush any pending fault to the IOMMU. In
> this flush
> +	 * callback, the IOMMU driver makes sure that there are no
> such faults
> +	 * left in the low-level queue.
> +	 */
> +	queue->flush(queue->flush_arg, dev, pasid);
> +
> +	/*
> +	 * If at some point the low-level fault queue overflowed and
> the IOMMU
> +	 * device had to auto-respond to a 'last' page fault, other
> faults from
> +	 * the same Page Request Group may still be stuck in the
> partial list.
> +	 * We need to make sure that the next address space using
> the PASID
> +	 * doesn't receive them.
> +	 */
Trying to understand the intended use case under a queue-full condition:

1. The model-specific IOMMU driver registers a flush callback to handle
   internal PRQ draining.

2. The IOMMU HW detects queue full, auto-responds with a 'SUCCESS' code
   to all devices and PASIDs, and raises an interrupt.

3. The model-specific IOMMU driver detects queue full and calls
   iopf_queue_flush_dev().

4. iopf_queue_flush_dev() calls the queue->flush() callback to drain the
   PRQs in flight inside the IOMMU HW.
5. Shoot down the partial list for all PASIDs.

If the above understanding is correct, don't we need to shoot down all
partial groups instead of just one PASID? At least for VT-d, we need to
do that.
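For illustration, assuming iopf_queue_flush_dev() takes the device and a
PASID (it uses both in the hunk), flushing everything would just be the
IOMMU_PASID_INVALID case of the loop quoted below:

static void my_drain_all_partial_faults(struct device *dev)
{
	/* IOMMU_PASID_INVALID matches every partial fault on this device */
	iopf_queue_flush_dev(dev, IOMMU_PASID_INVALID);
}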


> +	mutex_lock(&param->lock);
> +	list_for_each_entry_safe(fault, next,
> &param->iopf_param->partial, head) {
> +		if (fault->evt.pasid == pasid || pasid ==
> IOMMU_PASID_INVALID) {
> +			list_del(&fault->head);
> +			kfree(fault);
> +		}
> +	}
> +	mutex_unlock(&param->lock);
> +
> +	flush_workqueue(queue->wq);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
> +
[Jacob Pan]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices
@ 2018-09-28  1:14             ` Tian, Kevin
  0 siblings, 0 replies; 87+ messages in thread
From: Tian, Kevin @ 2018-09-28  1:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan
  Cc: Lu Baolu, iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, Liu, Yi L,
	Andrew Murray, Will Deacon, Robin Murphy, Raj, Ashok, xuzaibo,
	liguozhu, okaya, bharatku, ilias.apalodimas, shunyong.yang

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Thursday, September 27, 2018 11:06 PM
> 
> On 26/09/2018 19:01, Jacob Pan wrote:
> > On Mon, 24 Sep 2018 13:07:47 +0100
> > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >
> >> On 23/09/2018 04:05, Lu Baolu wrote:
> >> > Hi,
> >> >
> >> > On 09/21/2018 01:00 AM, Jean-Philippe Brucker wrote:
> >> >> Add bind() and unbind() operations to the IOMMU API. Bind()
> >> >> returns a PASID that drivers can program in hardware, to let their
> >> >> devices access an mm. This patch only adds skeletons for the
> >> >> device driver API, most of the implementation is still missing.
> >> >
> >> > Is it possible that a malicious process can unbind a pasid which is
> >> > used by another normal process?
> >>
> >> Yes, it's up to the device driver that calls unbind() to check that
> >> the caller is allowed to unbind this PASID. We can't do it ourselves
> >> since unbind() could also be called from a kernel thread for example
> >> from a cleanup function in some workqueue, outside the context of the
> >> process to unbind.
> 
> Actually I'm not too concerned about a process unbinding another one,
> since in general only the kernel will hold the PASID values. Userspace
> shouldn't even need to see them, so issuing unbind() with the wrong
> PASID isn't an easy mistake.
> 

Well, it depends on which scenario is being discussed here.

For native SVA with the device driver in the kernel, your description is
correct.

For native SVA with the device driver in user space, user space needs to
see/hold PASIDs and program them into device-specific registers.

For virtual SVA (the vt-d case), Qemu needs to see/hold PASIDs and pass
them to the guest upon any PASID allocation request through a PV
channel, as you just saw in another thread. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 07/10] iommu: Add a page fault handler
  2018-09-27 20:37     ` Jacob Pan
@ 2018-10-03 17:46       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-10-03 17:46 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 27/09/2018 21:37, Jacob Pan wrote:
> On Thu, 20 Sep 2018 18:00:43 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> +	/*
>> +	 * When removing a PASID, the device driver tells the device
>> to stop
>> +	 * using it, and flush any pending fault to the IOMMU. In
>> this flush
>> +	 * callback, the IOMMU driver makes sure that there are no
>> such faults
>> +	 * left in the low-level queue.
>> +	 */
>> +	queue->flush(queue->flush_arg, dev, pasid);
>> +
>> +	/*
>> +	 * If at some point the low-level fault queue overflowed and
>> the IOMMU
>> +	 * device had to auto-respond to a 'last' page fault, other
>> faults from
>> +	 * the same Page Request Group may still be stuck in the
>> partial list.
>> +	 * We need to make sure that the next address space using
>> the PASID
>> +	 * doesn't receive them.
>> +	 */
> Trying to understand the intended use case under queue full condition.
> 1 model specific iommu driver register a flush callback to handle
>   internal PRQ drain
> 
> 2 IOMMU HW detects queue full and auto respond with 'SUCCESS' code to
>   all device and PASID, raise interrupt
> 
> 3 model specific iommu driver detects queue full and call
> iopf_queue_flush_dev()

I didn't intend for iopf_queue_flush_dev to be called by the IOMMU driver
in this situation; at the moment it's only intended for the SVA code to
clean up before removing a PASID (in which case we have to wipe partial
faults). This version doesn't provide anything to the IOMMU driver for
handling the overflow condition cleanly; partial faults are kept until the
PASID is unbound or SVA is disabled.
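
For illustration, a rough sketch of that intended call site. Apart from
iopf_queue_flush_dev() itself, the names below are invented, and the real
unbind path in this series differs in detail:

static void my_sva_unbind_cleanup(struct device *dev, struct io_mm *io_mm)
{
	/* the device driver has already quiesced the PASID on the device */

	/* wipe pending and partial faults before the PASID can be reused */
	iopf_queue_flush_dev(dev, io_mm->pasid);

	/* ... detach from the domain and release the PASID ... */
}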

> 4 call queue->flush() callback to drain PRQ in-flight inside IOMMU HW

Could we avoid this step in this scenario? If it's the PRI IRQ thread that
detects queue full in step 3, then it could drain the HW queue before
calling iopf_flush_partial() (or something like that). I'm a bit worried
about possible locking problems if we go back to the IOMMU driver here
while it is calling us.

> 5.Shoot down partial list for all PASIDs
> 
> If the above understanding is correct, don't we need to shoot down all
> partial groups? instead of just one PASID. At least for VT-d, we need
> to do that.

Passing IOMMU_PASID_INVALID will do that. But it also needs to be done for
all devices that use this IOPF queue, and we don't need to flush the
workqueue, so iopf_queue_flush_dev isn't really suited to this.
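
A loose sketch of the iopf_flush_partial() idea mentioned above, assuming
the queue keeps a list of attached devices. The for_each_iopf_dev()
iterator is an assumption for the sketch; the per-device types and the
partial list are mirrored from the snippet quoted below:

static void iopf_flush_partial(struct iopf_queue *queue)
{
	struct device *dev;
	struct iopf_fault *fault, *next;

	/* walk every device attached to this IOPF queue (assumed iterator) */
	for_each_iopf_dev(queue, dev) {
		struct iommu_param *param = dev->iommu_param;

		mutex_lock(&param->lock);
		/* discard partial faults left over after the auto-response */
		list_for_each_entry_safe(fault, next,
					 &param->iopf_param->partial, head) {
			list_del(&fault->head);
			kfree(fault);
		}
		mutex_unlock(&param->lock);
	}
	/* unlike iopf_queue_flush_dev(), no flush_workqueue() is needed */
}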

Thanks,
Jean

> 
> 
>> +	mutex_lock(&param->lock);
>> +	list_for_each_entry_safe(fault, next,
>> &param->iopf_param->partial, head) {
>> +		if (fault->evt.pasid == pasid || pasid ==
>> IOMMU_PASID_INVALID) {
>> +			list_del(&fault->head);
>> +			kfree(fault);
>> +		}
>> +	}
>> +	mutex_unlock(&param->lock);
>> +
>> +	flush_workqueue(queue->wq);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
>> +
> [Jacob Pan]
> 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
  2018-09-26 22:35     ` Jacob Pan
@ 2018-10-03 17:52       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-10-03 17:52 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 26/09/2018 23:35, Jacob Pan wrote:
> On Thu, 20 Sep 2018 18:00:39 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> +
>> +static int io_mm_attach(struct iommu_domain *domain, struct device
>> *dev,
>> +			struct io_mm *io_mm, void *drvdata)
>> +{
>> +	int ret;
>> +	bool attach_domain = true;
>> +	int pasid = io_mm->pasid;
>> +	struct iommu_bond *bond, *tmp;
>> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
>> +
>> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
>> +		return -ENODEV;
>> +
>> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
>> +		return -ERANGE;
>> +
>> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
>> +	if (!bond)
>> +		return -ENOMEM;
>> +
>> +	bond->domain		= domain;
>> +	bond->io_mm		= io_mm;
>> +	bond->dev		= dev;
>> +	bond->drvdata		= drvdata;
>> +
>> +	spin_lock(&iommu_sva_lock);
>> +	/*
>> +	 * Check if this io_mm is already bound to the domain. In
>> which case the
>> +	 * IOMMU driver doesn't have to install the PASID table
>> entry.
>> +	 */
>> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
>> +		if (tmp->io_mm == io_mm) {
>> +			attach_domain = false;
>> +			break;
>> +		}
>> +	}
>> +
>> +	ret = domain->ops->mm_attach(domain, dev, io_mm,
>> attach_domain);
>> +	if (ret) {
>> +		kfree(bond);
>> +		goto out_unlock;
>> +	}
>> +
>> +	list_add(&bond->mm_head, &io_mm->devices);
>> +	list_add(&bond->domain_head, &domain->mm_list);
>> +	list_add(&bond->dev_head, &param->mm_list);
>> +
> 
> I am trying to understand if mm_list is needed for both per device and
> per domain. Do you always unbind and detach domain? Seems device could
> use the domain->mm_list to track all mm's, true?

We need to track bonds per device, since the bind/unbind() user interface
is on devices. Tracking per domain is just a helper, so IOMMU drivers that
have a single PASID table per domain know when they need to install a new
entry (the attach_domain parameter above) and remove it. I think my code
is wrong here: when binding two devices that are in the same domain to the
same process, we shouldn't add the io_mm to domain->mm_list twice.
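
For reference, one possible shape of that fix in the io_mm_attach() snippet
quoted above (attach side only; the matching mm_detach path would need the
same care, which is omitted here):

	list_add(&bond->mm_head, &io_mm->devices);
	/* only link the io_mm into the domain list once per domain */
	if (attach_domain)
		list_add(&bond->domain_head, &domain->mm_list);
	list_add(&bond->dev_head, &param->mm_list);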

I'm still not sure whether I should remove domain handling here, though.
Could you confirm whether you're planning to support iommu_get_domain_for_dev
for VT-d?

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-10-08  8:29                     ` Liu, Yi L
  0 siblings, 0 replies; 87+ messages in thread
From: Liu, Yi L @ 2018-10-08  8:29 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Joerg Roedel
  Cc: Tian, Kevin, Raj, Ashok, linux-pci, ilias.apalodimas,
	Will Deacon, alex.williamson, okaya, iommu, liguozhu,
	christian.koenig, Robin Murphy

Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Thursday, September 27, 2018 9:38 PM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>
> Subject: Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
> 
> On 27/09/2018 04:22, Liu, Yi L wrote:
> >> For the "classic" vfio-pci case, "SVA in guest" still means giving the
> >> guest control over the whole PASID table.
> >
> > No, if giving guest control over the whole PASID table, it means guest may have
> > its own PASID namespace. right? And for vfio-mdev case, it gets PASID from host.
> > So there would be multiple PASID namespaces. Thinking about the following
> scenario:
> >
> > A PF/VF assigned to a VM via "classic" vfio-pci. And an assignable-device-interface
> > assigned to this VM via vfio-mdev. If an application in this VM tries to manipulate
> > these two "devices", it should have the same PASID programmed to them. right?
> > But as the above comments mentioned, for vfio-pci case, it would get a PASID
> from
> > its own PASID namespace. While the vfio-mdev case would get a PASID from host.
> > This would result in conflict.
> 
> Ah I see, if the host assigns an ADI via vfio-mdev and a PCI function
> via vfio-pci to the same VM, the guest needs to use the paravirtualized
> PASID allocator for the PCI device as well, not just the ADI. In fact
> all guest PASIDs need to be allocated through one PV channel, even if
> the VM has other vIOMMUs that don't support PV. But I suppose that kind
> of VM is unrealistic.

Yes, such a VM is unrealistic. :)

> However for SMMUv3 we'll still use the
> bind_pasid_table for vfio-pci and let the guest allocate PASIDs, since
> the PASID table lives in guest-physical space.

I think it's ok. This doesn’t result in any conflict.

> 
> In any case, for the patch series at hand, it means that iommu-sva will
> need assistance from the vt-d driver to allocate PASIDs: host uses the
> generic allocator, guest uses the PV one.

Exactly.

> I guess the mm_alloc() op could do that?

Do you mean the io_mm_alloc in your SVA patch series? We've got
a patch for the PV one. Allen (Baolu Lu) is preparing to send it out
for review. I guess we can align further during that patch review.

Thanks,
Yi Liu
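
To make the idea concrete, a very rough sketch of an mm_alloc() op choosing
between a native and a paravirtual PASID source. The signature is only
approximate, and every identifier except io_mm is invented here; this is
not VT-d code and not part of either patch series:

static struct io_mm *vtd_mm_alloc(struct iommu_domain *domain,
				  struct mm_struct *mm)
{
	int pasid;
	struct io_mm *io_mm;

	io_mm = kzalloc(sizeof(*io_mm), GFP_KERNEL);
	if (!io_mm)
		return ERR_PTR(-ENOMEM);

	if (vtd_runs_as_guest())		/* invented predicate */
		pasid = vtd_pv_alloc_pasid();	/* ask the host over the PV channel */
	else
		pasid = vtd_native_alloc_pasid(); /* host-side generic allocator */

	if (pasid < 0) {
		kfree(io_mm);
		return ERR_PTR(pasid);
	}

	io_mm->pasid = pasid;
	return io_mm;
}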

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs
@ 2018-10-12 14:32     ` Jordan Crouse
  0 siblings, 0 replies; 87+ messages in thread
From: Jordan Crouse @ 2018-10-12 14:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, joro, linux-pci, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On Thu, Sep 20, 2018 at 06:00:46PM +0100, Jean-Philippe Brucker wrote:
> Provide an API for allocating PASIDs and populating them manually. To ease
> cleanup and factor allocation code, reuse the io_mm structure for private
> PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to
> multiple devices. The mm_alloc() IOMMU op must now check if the mm
> argument is NULL, in which case it should allocate io_pgtables instead of
> binding to an mm.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
> Sadly this probably won't be the final thing. The API in this patch is
> used like this:
> 
>         iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
>         iommu_sva_map(io_mm, ...)
>         iommu_sva_unmap(io_mm, ...)
>         iommu_sva_free_pasid(dev, io_mm)
> 
> The proposed API for auxiliary domains is in an early stage but might
> replace this patch and could be used like this:
> 
>         iommu_enable_aux_domain(dev)
>         d = iommu_domain_alloc()
>         iommu_attach_aux(dev, d)
>         iommu_aux_id(d) -> PASID
>         iommu_map(d, ...)
>         iommu_unmap(d, ...)
>         iommu_detach_aux(dev, d)
>         iommu_domain_free(d)
> 
> The advantage being that the driver doesn't have to use a special
> version of map/unmap/etc.

Hi Jean-Phillippe -

Have you thought about this any more? I want to send out a
refresh for the per-context pagetables for arm-smmu, so if we want to
change the underlying assumptions this would be a great time.

For my part I'm okay with either model. In fact the second one is closer
to the original implementation that I sent out, so I have a clear
development path in mind for either option, depending on what the
community decides.
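
For comparison, here is the second (auxiliary domain) model from the listing
above expanded into a driver-side sketch. The iommu_*aux* calls are written
exactly as listed in the proposal and are not a settled API yet, and the
my_gpu_ctx fields are invented:

static int my_gpu_create_context(struct device *dev, struct my_gpu_ctx *ctx)
{
	int ret;

	ret = iommu_enable_aux_domain(dev);
	if (ret)
		return ret;

	ctx->domain = iommu_domain_alloc(dev->bus);
	if (!ctx->domain)
		return -ENOMEM;

	ret = iommu_attach_aux(dev, ctx->domain);
	if (ret) {
		iommu_domain_free(ctx->domain);
		return ret;
	}

	/* the PASID to program into the per-context pagetable setup */
	ctx->pasid = iommu_aux_id(ctx->domain);

	/* the regular map/unmap ops work on the auxiliary domain */
	return iommu_map(ctx->domain, ctx->gpuva, ctx->paddr, ctx->size,
			 IOMMU_READ | IOMMU_WRITE);
}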

Thanks,
Jordan

<snip the rest of the patch>

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] iommu/sva: Manage process address spaces
@ 2018-10-15 20:53         ` Jacob Pan
  0 siblings, 0 replies; 87+ messages in thread
From: Jacob Pan @ 2018-10-15 20:53 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, joro, linux-pci, jcrouse, alex.williamson,
	Jonathan.Cameron, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, Andrew Murray, Will Deacon, Robin Murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang, jacob.jun.pan

On Wed, 3 Oct 2018 18:52:16 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 26/09/2018 23:35, Jacob Pan wrote:
> > On Thu, 20 Sep 2018 18:00:39 +0100
> > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >   
> >> +
> >> +static int io_mm_attach(struct iommu_domain *domain, struct device
> >> *dev,
> >> +			struct io_mm *io_mm, void *drvdata)
> >> +{
> >> +	int ret;
> >> +	bool attach_domain = true;
> >> +	int pasid = io_mm->pasid;
> >> +	struct iommu_bond *bond, *tmp;
> >> +	struct iommu_sva_param *param =
> >> dev->iommu_param->sva_param; +
> >> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> >> +		return -ENODEV;
> >> +
> >> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> >> +		return -ERANGE;
> >> +
> >> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> >> +	if (!bond)
> >> +		return -ENOMEM;
> >> +
> >> +	bond->domain		= domain;
> >> +	bond->io_mm		= io_mm;
> >> +	bond->dev		= dev;
> >> +	bond->drvdata		= drvdata;
> >> +
> >> +	spin_lock(&iommu_sva_lock);
> >> +	/*
> >> +	 * Check if this io_mm is already bound to the domain. In
> >> which case the
> >> +	 * IOMMU driver doesn't have to install the PASID table
> >> entry.
> >> +	 */
> >> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> >> +		if (tmp->io_mm == io_mm) {
> >> +			attach_domain = false;
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	ret = domain->ops->mm_attach(domain, dev, io_mm,
> >> attach_domain);
> >> +	if (ret) {
> >> +		kfree(bond);
> >> +		goto out_unlock;
> >> +	}
> >> +
> >> +	list_add(&bond->mm_head, &io_mm->devices);
> >> +	list_add(&bond->domain_head, &domain->mm_list);
> >> +	list_add(&bond->dev_head, &param->mm_list);
> >> +  
> > 
> > I am trying to understand if mm_list is needed for both per device
> > and per domain. Do you always unbind and detach domain? Seems
> > device could use the domain->mm_list to track all mm's, true?  
> 
> We need to track bonds per devices, since the bind/unbind() user
> interface in on devices. Tracking per domain is just a helper, so
> IOMMU drivers that have a single PASID table per domain know when
> they need to install a new entry (the attach_domain parameter above)
> and remove it. I think my code is wrong here: if binding two devices
> that are in the same domain to the same process we shouldn't add the
> io_mm to domain->mm_list twice.
> 
> I'm still not sure if I should remove domains handling here though,
> could you confirm if you're planning to support
> iommu_get_domain_for_dev for vt-d?
> 
Yes. I am working on getting VT-d onto the same behavior in terms of the
default domain. I have a patch being tested; we need to respect RMRRs
(reserved regions) that are set up before iommu_get_domain_for_dev().
> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs
@ 2018-10-17 14:21       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-10-17 14:21 UTC (permalink / raw)
  To: iommu, joro, linux-pci, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

Hi Jordan,

On 12/10/2018 15:32, Jordan Crouse wrote:
> On Thu, Sep 20, 2018 at 06:00:46PM +0100, Jean-Philippe Brucker wrote:
>> Provide an API for allocating PASIDs and populating them manually. To ease
>> cleanup and factor allocation code, reuse the io_mm structure for private
>> PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to
>> multiple devices. The mm_alloc() IOMMU op must now check if the mm
>> argument is NULL, in which case it should allocate io_pgtables instead of
>> binding to an mm.
>>
>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> ---
>> Sadly this probably won't be the final thing. The API in this patch is
>> used like this:
>>
>>         iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
>>         iommu_sva_map(io_mm, ...)
>>         iommu_sva_unmap(io_mm, ...)
>>         iommu_sva_free_pasid(dev, io_mm)
>>
>> The proposed API for auxiliary domains is in an early stage but might
>> replace this patch and could be used like this:
>>
>>         iommu_enable_aux_domain(dev)
>>         d = iommu_domain_alloc()
>>         iommu_attach_aux(dev, d)
>>         iommu_aux_id(d) -> PASID
>>         iommu_map(d, ...)
>>         iommu_unmap(d, ...)
>>         iommu_detach_aux(dev, d)
>>         iommu_domain_free(d)
>>
>> The advantage being that the driver doesn't have to use a special
>> version of map/unmap/etc.
>
> Hi Jean-Phillippe -
>
> Have you thought about this any more? I want to send out a
> refresh for the per-context pagetables for arm-smmu so if we want to change
> the underlying assumptions this would be a great time.
>
> For my part I'm okay with either model. In fact the second one is closer
> to the original implementation that I sent out so I have a clear development
> path in mind for either option depending on what the community decides.

We'll probably go with the second model. I'm trying to make the latest
version work with SMMUv3
(https://lwn.net/ml/linux-kernel/20181012051632.26064-1-baolu.lu@linux.intel.com/)
and I'd like to send an RFC soon

Thanks,
Jean
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs
@ 2018-10-17 14:24         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 87+ messages in thread
From: Jean-Philippe Brucker @ 2018-10-17 14:24 UTC (permalink / raw)
  To: iommu, joro, linux-pci, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On 17/10/2018 15:21, Jean-Philippe Brucker wrote:
> Hi Jordan,
> 
> On 12/10/2018 15:32, Jordan Crouse wrote:
>> On Thu, Sep 20, 2018 at 06:00:46PM +0100, Jean-Philippe Brucker wrote:
>>> Provide an API for allocating PASIDs and populating them manually. To ease
>>> cleanup and factor allocation code, reuse the io_mm structure for private
>>> PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to
>>> multiple devices. The mm_alloc() IOMMU op must now check if the mm
>>> argument is NULL, in which case it should allocate io_pgtables instead of
>>> binding to an mm.
>>>
>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>> ---
>>> Sadly this probably won't be the final thing. The API in this patch is
>>> used like this:
>>>
>>>         iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
>>>         iommu_sva_map(io_mm, ...)
>>>         iommu_sva_unmap(io_mm, ...)
>>>         iommu_sva_free_pasid(dev, io_mm)
>>>
>>> The proposed API for auxiliary domains is in an early stage but might
>>> replace this patch and could be used like this:
>>>
>>>         iommu_enable_aux_domain(dev)
>>>         d = iommu_domain_alloc()
>>>         iommu_attach_aux(dev, d)
>>>         iommu_aux_id(d) -> PASID
>>>         iommu_map(d, ...)
>>>         iommu_unmap(d, ...)
>>>         iommu_detach_aux(dev, d)
>>>         iommu_domain_free(d)
>>>
>>> The advantage being that the driver doesn't have to use a special
>>> version of map/unmap/etc.
>>
>> Hi Jean-Phillippe -
>>
>> Have you thought about this any more? I want to send out a
>> refresh for the per-context pagetables for arm-smmu so if we want to change
>> the underlying assumptions this would be a great time.
>>
>> For my part I'm okay with either model. In fact the second one is closer
>> to the original implementation that I sent out so I have a clear development
>> path in mind for either option depending on what the community decides.
> 
> We'll probably go with the second model. I'm trying to make the latest
> version work with SMMUv3
> (https://lwn.net/ml/linux-kernel/20181012051632.26064-1-baolu.lu@linux.intel.com/)
> and I'd like to send an RFC soon
> 
> Thanks,
> Jean
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Ugh. Please disregard this notice.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs
@ 2018-10-17 15:07         ` Jordan Crouse
  0 siblings, 0 replies; 87+ messages in thread
From: Jordan Crouse @ 2018-10-17 15:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, joro, linux-pci, alex.williamson, Jonathan.Cameron,
	jacob.jun.pan, christian.koenig, eric.auger, kevin.tian,
	yi.l.liu, andrew.murray, will.deacon, robin.murphy, ashok.raj,
	baolu.lu, xuzaibo, liguozhu, okaya, bharatku, ilias.apalodimas,
	shunyong.yang

On Wed, Oct 17, 2018 at 03:21:43PM +0100, Jean-Philippe Brucker wrote:
> Hi Jordan,
> 
> On 12/10/2018 15:32, Jordan Crouse wrote:
> > On Thu, Sep 20, 2018 at 06:00:46PM +0100, Jean-Philippe Brucker wrote:
> >> Provide an API for allocating PASIDs and populating them manually. To ease
> >> cleanup and factor allocation code, reuse the io_mm structure for private
> >> PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to
> >> multiple devices. The mm_alloc() IOMMU op must now check if the mm
> >> argument is NULL, in which case it should allocate io_pgtables instead of
> >> binding to an mm.
> >>
> >> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> >> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> >> ---
> >> Sadly this probably won't be the final thing. The API in this patch is
> >> used like this:
> >>
> >>         iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
> >>         iommu_sva_map(io_mm, ...)
> >>         iommu_sva_unmap(io_mm, ...)
> >>         iommu_sva_free_pasid(dev, io_mm)
> >>
> >> The proposed API for auxiliary domains is in an early stage but might
> >> replace this patch and could be used like this:
> >>
> >>         iommu_enable_aux_domain(dev)
> >>         d = iommu_domain_alloc()
> >>         iommu_attach_aux(dev, d)
> >>         iommu_aux_id(d) -> PASID
> >>         iommu_map(d, ...)
> >>         iommu_unmap(d, ...)
> >>         iommu_detach_aux(dev, d)
> >>         iommu_domain_free(d)
> >>
> >> The advantage being that the driver doesn't have to use a special
> >> version of map/unmap/etc.
> >
> > Hi Jean-Phillippe -
> >
> > Have you thought about this any more? I want to send out a
> > refresh for the per-context pagetables for arm-smmu so if we want to change
> > the underlying assumptions this would be a great time.
> >
> > For my part I'm okay with either model. In fact the second one is closer
> > to the original implementation that I sent out so I have a clear development
> > path in mind for either option depending on what the community decides.
> 
> We'll probably go with the second model. I'm trying to make the latest
> version work with SMMUv3
> (https://lwn.net/ml/linux-kernel/20181012051632.26064-1-baolu.lu@linux.intel.com/)
> and I'd like to send an RFC soon

Okay. When you do, I'll try to add the v2 code and make it work with the Adreno
GPU.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2018-10-17 15:08 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-20 17:00 [PATCH v3 00/10] Shared Virtual Addressing for the IOMMU Jean-Philippe Brucker
2018-09-20 17:00 ` Jean-Philippe Brucker
2018-09-20 17:00 ` [PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
     [not found]   ` <20180920170046.20154-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-09-23  2:39     ` Lu Baolu
2018-09-24 12:07       ` Jean-Philippe Brucker
2018-09-24 12:07         ` Jean-Philippe Brucker
2018-09-25 13:16       ` Joerg Roedel
2018-09-25 13:16         ` Joerg Roedel
2018-09-25 22:46         ` Jacob Pan
2018-09-25 22:46           ` Jacob Pan
2018-09-26 10:14           ` Jean-Philippe Brucker
2018-09-26 10:14             ` Jean-Philippe Brucker
2018-09-26 12:48           ` Joerg Roedel
2018-09-26 12:48             ` Joerg Roedel
2018-09-20 17:00 ` [PATCH v3 02/10] iommu/sva: Bind process address spaces to devices Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-23  3:05   ` Lu Baolu
2018-09-23  3:05     ` Lu Baolu
2018-09-24 12:07     ` Jean-Philippe Brucker
2018-09-24 12:07       ` Jean-Philippe Brucker
2018-09-26 18:01       ` Jacob Pan
2018-09-26 18:01         ` Jacob Pan
2018-09-27 15:06         ` Jean-Philippe Brucker
2018-09-27 15:06           ` Jean-Philippe Brucker
2018-09-28  1:14           ` Tian, Kevin
2018-09-28  1:14             ` Tian, Kevin
2018-09-20 17:00 ` [PATCH v3 03/10] iommu/sva: Manage process address spaces Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-25  3:15   ` Lu Baolu
2018-09-25  3:15     ` Lu Baolu
2018-09-25 10:32     ` Jean-Philippe Brucker
2018-09-25 10:32       ` Jean-Philippe Brucker
2018-09-26  3:12       ` Lu Baolu
2018-09-26  3:12         ` Lu Baolu
2018-09-25 13:26     ` Joerg Roedel
2018-09-25 13:26       ` Joerg Roedel
2018-09-25 23:33       ` Lu Baolu
2018-09-25 23:33         ` Lu Baolu
2018-09-26 10:20         ` Jean-Philippe Brucker
2018-09-26 10:20           ` Jean-Philippe Brucker
2018-09-26 12:45           ` Joerg Roedel
2018-09-26 12:45             ` Joerg Roedel
2018-09-26 13:50             ` Jean-Philippe Brucker
2018-09-26 13:50               ` Jean-Philippe Brucker
2018-09-27  3:22               ` Liu, Yi L
2018-09-27  3:22                 ` Liu, Yi L
2018-09-27 13:37                 ` Jean-Philippe Brucker
2018-09-27 13:37                   ` Jean-Philippe Brucker
2018-10-08  8:29                   ` Liu, Yi L
2018-10-08  8:29                     ` Liu, Yi L
2018-09-26 22:58             ` Jacob Pan
2018-09-26 22:58               ` Jacob Pan
2018-09-26 22:35   ` Jacob Pan
2018-09-26 22:35     ` Jacob Pan
2018-10-03 17:52     ` Jean-Philippe Brucker
2018-10-03 17:52       ` Jean-Philippe Brucker
2018-10-15 20:53       ` Jacob Pan
2018-10-15 20:53         ` Jacob Pan
2018-09-20 17:00 ` [PATCH v3 04/10] iommu/sva: Add a mm_exit callback for device drivers Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-20 17:00 ` [PATCH v3 05/10] iommu/sva: Track mm changes with an MMU notifier Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-20 17:00 ` [PATCH v3 06/10] iommu/sva: Search mm by PASID Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-25  4:59   ` Lu Baolu
2018-09-25  4:59     ` Lu Baolu
2018-09-20 17:00 ` [PATCH v3 07/10] iommu: Add a page fault handler Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-27 20:37   ` Jacob Pan
2018-09-27 20:37     ` Jacob Pan
2018-10-03 17:46     ` Jean-Philippe Brucker
2018-10-03 17:46       ` Jean-Philippe Brucker
2018-09-20 17:00 ` [PATCH v3 08/10] iommu/iopf: Handle mm faults Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-20 17:00 ` [PATCH v3 09/10] iommu/sva: Register page fault handler Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-09-20 17:00 ` [RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs Jean-Philippe Brucker
2018-09-20 17:00   ` Jean-Philippe Brucker
2018-10-12 14:32   ` Jordan Crouse
2018-10-12 14:32     ` Jordan Crouse
2018-10-17 14:21     ` Jean-Philippe Brucker
2018-10-17 14:21       ` Jean-Philippe Brucker
2018-10-17 14:24       ` Jean-Philippe Brucker
2018-10-17 14:24         ` Jean-Philippe Brucker
2018-10-17 15:07       ` Jordan Crouse
2018-10-17 15:07         ` Jordan Crouse
