* [PATCH v2 00/40] Shared Virtual Addressing for the IOMMU
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

This is version 2 of the Shared Virtual Addressing (SVA) series, which
adds process address space management (1-6) and I/O page faults (7-9) to
the IOMMU core. It also includes two example users of the API: VFIO as a
device driver (10-13) and Arm SMMUv3 as an IOMMU driver (14-40).

The series is getting bulky, and although splitting it will remove lots
of context, I'll probably split the next version into IOMMU, VFIO and
SMMU changes. This time around I also Cc'd the mm list even though
get_maintainers doesn't pick it up, because I'd like to export a few
symbols in patches 10-12, and because of the recent interest in sharing
mm with devices [1], which seems somewhat related.

Major changes since v1 [2]:

* Removed bind_group(). Only bind_device() is supported. Supporting
  multi-device groups is difficult and unnecessary at the moment.

* Reworked the I/O Page Fault code to support multiple queues. The IOMMU
  driver now registers IOPF queues as needed (for example one per IOMMU
  device), then attaches devices that require IOPF (for example, in
  sva_device_init).

* mm_exit callback is allowed to sleep. It is now registered with
  iommu_sva_device_init() instead of a separate function.

* Removed IOMMU_SVA_FEAT_PASID, making PASID support mandatory for now.
  If a future series implements !PASID, it will have to introduce an
  IOMMU_SVA_FEAT_NO_PASID flag.

* Removed the atomic/blocking handler distinction for now. It isn't
  essential here and can be added back later.

* Tried to address all comments. Please let me know if I missed
  something.

The series relies on Jacob Pan's fault reporting work, currently under
discussion [3].

You can pull everything from:
     git://linux-arm.org/linux-jpb.git sva/v2
Branch sva/debug contains tracepoints and other tools that I found
useful for validation.

[1] https://lwn.net/Articles/753481/
[2] https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg21813.html
[3] IOMMU and VT-d driver support for Shared Virtual Address (SVA)
    https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg22640.html
                                ---

If you're unfamiliar with SVM/SVA, a wordy description copied from the
previous version follows.

Shared Virtual Addressing (SVA) is the ability to share process address
spaces with devices. It is called "SVM" (Shared Virtual Memory) by OpenCL
and some IOMMU architectures, but since that abbreviation is already used
for AMD virtualisation in Linux (Secure Virtual Machine), we prefer the
less ambiguous "SVA".

Sharing process address spaces with devices allows drivers to rely on
core kernel memory management for DMA, removing some complexity from
applications and device drivers. After binding to a device, applications
can instruct it to perform DMA on buffers obtained with malloc.

The device, bus and IOMMU must support the following features:

* Multiple address spaces per device, for example using the PCI PASID
  (Process Address Space ID) extension. The IOMMU driver allocates a
  PASID and the device uses it in DMA transactions.

* I/O Page Faults (IOPF), for example PCI PRI (Page Request Interface) or
  Arm SMMU stall. The core mm handles translation faults from the IOMMU.

* MMU and IOMMU implement compatible page table formats.

This series requires all three features to be supported. Upcoming patches
will enable private PASID management, which doesn't share page tables and
augments the map/unmap API with PASIDs.

Although we don't have any performance measurements at the moment, SVA will
likely be slower than classical DMA since it relies on page faults,
whereas classical DMA pins all pages in memory. SVA mostly aims at
simplifying DMA management, but also improves security by isolating
address spaces in devices.
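
To make the expected flow concrete, here is a rough sketch of how a
device driver might use this API. Everything prefixed my_dev_, plus the
PASID register, is invented for the example; only the iommu_sva_* calls
come from this series (with the mm_exit argument added in patch 4).

#include <linux/io.h>
#include <linux/iommu.h>

struct my_dev {
	struct device	*dev;
	void __iomem	*base;
};

#define MY_DEV_PASID_REG	0x20	/* hypothetical register offset */

static void my_dev_stop_dma(struct my_dev *mdev, int pasid); /* hypothetical */

/* Called by the IOMMU core when the bound address space goes away */
static int my_dev_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	struct my_dev *mdev = drvdata;

	/* Stop issuing DMA with this PASID before returning */
	my_dev_stop_dma(mdev, pasid);
	return 0;
}

static int my_dev_share_mm(struct my_dev *mdev, struct mm_struct *mm)
{
	int ret, pasid;

	/* Let the IOMMU driver allocate PASID tables, fault queues... */
	ret = iommu_sva_device_init(mdev->dev, 0, 0, my_dev_mm_exit);
	if (ret)
		return ret;

	/* Bind the process address space and get a PASID back */
	ret = iommu_sva_bind_device(mdev->dev, mm, &pasid, 0, mdev);
	if (ret) {
		iommu_sva_device_shutdown(mdev->dev);
		return ret;
	}

	/*
	 * Program the PASID into the device in an implementation-specific
	 * way. The device can then issue DMA to any address in the bound
	 * mm, for example buffers obtained with malloc().
	 */
	writel(pasid, mdev->base + MY_DEV_PASID_REG);
	return 0;
}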

Jean-Philippe Brucker (40):
  iommu: Introduce Shared Virtual Addressing API
  iommu/sva: Bind process address spaces to devices
  iommu/sva: Manage process address spaces
  iommu/sva: Add a mm_exit callback for device drivers
  iommu/sva: Track mm changes with an MMU notifier
  iommu/sva: Search mm by PASID
  iommu: Add a page fault handler
  iommu/iopf: Handle mm faults
  iommu/sva: Register page fault handler
  mm: export symbol mm_access
  mm: export symbol find_get_task_by_vpid
  mm: export symbol mmput_async
  vfio: Add support for Shared Virtual Addressing
  dt-bindings: document stall and PASID properties for IOMMU masters
  iommu/of: Add stall and pasid properties to iommu_fwspec
  arm64: mm: Pin down ASIDs for sharing mm with devices
  iommu/arm-smmu-v3: Link domains and devices
  iommu/io-pgtable-arm: Factor out ARM LPAE register defines
  iommu: Add generic PASID table library
  iommu/arm-smmu-v3: Move context descriptor code
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Share process page tables
  iommu/arm-smmu-v3: Seize private ASID
  iommu/arm-smmu-v3: Add support for VHE
  iommu/arm-smmu-v3: Enable broadcast TLB maintenance
  iommu/arm-smmu-v3: Add SVA feature checking
  iommu/arm-smmu-v3: Implement mm operations
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  iommu/arm-smmu-v3: Register I/O Page Fault queue
  iommu/arm-smmu-v3: Improve add_device error handling
  iommu/arm-smmu-v3: Maintain a SID->device structure
  iommu/arm-smmu-v3: Add stall support for platform devices
  ACPI/IORT: Check ATS capability in root complex nodes
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
  iommu/arm-smmu-v3: Disable tagged pointers
  PCI: Make "PRG Response PASID Required" handling common
  iommu/arm-smmu-v3: Add support for PRI
  iommu/arm-smmu-v3: Add support for PCI PASID

 .../devicetree/bindings/iommu/iommu.txt       |   24 +
 MAINTAINERS                                   |    6 +-
 arch/arm64/include/asm/mmu.h                  |    1 +
 arch/arm64/include/asm/mmu_context.h          |   11 +-
 arch/arm64/mm/context.c                       |   92 +-
 drivers/acpi/arm64/iort.c                     |   11 +
 drivers/iommu/Kconfig                         |   29 +
 drivers/iommu/Makefile                        |    4 +
 drivers/iommu/amd_iommu.c                     |   19 +-
 drivers/iommu/arm-smmu-v3-context.c           |  700 ++++++++
 drivers/iommu/arm-smmu-v3.c                   | 1438 ++++++++++++++---
 drivers/iommu/io-pgfault.c                    |  445 +++++
 drivers/iommu/io-pgtable-arm.c                |   49 +-
 drivers/iommu/io-pgtable-arm.h                |   54 +
 drivers/iommu/iommu-pasid-table.c             |   52 +
 drivers/iommu/iommu-pasid-table.h             |  177 ++
 drivers/iommu/iommu-sva.c                     |  792 +++++++++
 drivers/iommu/iommu.c                         |   84 +
 drivers/iommu/of_iommu.c                      |   12 +
 drivers/pci/ats.c                             |   17 +
 drivers/vfio/vfio_iommu_type1.c               |  449 ++++-
 include/linux/iommu.h                         |  184 +++
 include/linux/pci-ats.h                       |    8 +
 include/uapi/linux/pci_regs.h                 |    1 +
 include/uapi/linux/vfio.h                     |   76 +
 kernel/fork.c                                 |   15 +
 kernel/pid.c                                  |    1 +
 27 files changed, 4469 insertions(+), 282 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3-context.c
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 drivers/iommu/iommu-pasid-table.c
 create mode 100644 drivers/iommu/iommu-pasid-table.h
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.17.0


* [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC

Shared Virtual Addressing (SVA) provides a way for device drivers to bind
process address spaces to devices. This requires the IOMMU to support a
page table format and features compatible with the CPU's, and usually
requires the system to support I/O Page Faults (IOPF) and Process Address
Space ID (PASID). When all of these are available, DMA can access the
virtual addresses of a process. A PASID is allocated for each process, and
the device driver programs it into the device in an
implementation-specific way.

Add a new API for sharing process page tables with devices. Introduce two
IOMMU operations, sva_device_init() and sva_device_shutdown(), that
prepare the IOMMU driver for SVA, for example by allocating PASID tables
and fault queues. Subsequent patches will implement the bind() and
unbind() operations.

Support for I/O Page Faults will be added in a later patch using a new
feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
down all shared mappings. Other feature bits that may be added in the
future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
space to a process.
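
For the IOMMU driver side, the two new operations might look roughly
like the sketch below. This is not code from the series (the real
SMMUv3 implementation comes in later patches), and MY_IOMMU_MAX_PASID
is an arbitrary example limit.

#define MY_IOMMU_MAX_PASID	(1 << 20)	/* PCI PASIDs are 20 bits */

static int my_iommu_sva_device_init(struct device *dev,
				    struct iommu_sva_param *param)
{
	/*
	 * Clamp the PASID range to what both the IOMMU and the device
	 * support. The core passed the device limit (or zero) in
	 * max_pasid. Keep PASID 0 for non-PASID traffic.
	 */
	param->min_pasid = 1;
	if (!param->max_pasid || param->max_pasid > MY_IOMMU_MAX_PASID)
		param->max_pasid = MY_IOMMU_MAX_PASID;

	/* Allocate per-device PASID tables and fault queues here */
	return 0;
}

static void my_iommu_sva_device_shutdown(struct device *dev,
					 struct iommu_sva_param *param)
{
	/* Free whatever sva_device_init allocated for this device */
}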

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2:
* Add sva_param structure to iommu_param
* CONFIG option is only selectable by IOMMU drivers
---
 drivers/iommu/Kconfig     |   4 ++
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/iommu-sva.c | 110 ++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h     |  32 +++++++++++
 4 files changed, 147 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7564237f788d..cca8e06903c7 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,6 +74,10 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select NEED_SG_DMA_LENGTH
 
+config IOMMU_SVA
+	bool
+	select IOMMU_API
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..1dbcc89ebe4c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
new file mode 100644
index 000000000000..8b4afb7c63ae
--- /dev/null
+++ b/drivers/iommu/iommu-sva.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage PASIDs and bind process address spaces to devices.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include <linux/iommu.h>
+#include <linux/slab.h>
+
+/**
+ * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
+ * @dev: the device
+ * @features: bitmask of features that need to be initialized
+ * @max_pasid: max PASID value supported by the device
+ *
+ * Users of the bind()/unbind() API must call this function to initialize all
+ * features required for SVA.
+ *
+ * The device must support multiple address spaces (e.g. PCI PASID). By default
+ * the PASID allocated during bind() is limited by the IOMMU capacity, and by
+ * the device PASID width defined in the PCI capability or in the firmware
+ * description. Setting @max_pasid to a non-zero value smaller than this limit
+ * overrides it.
+ *
+ * The device should not be performing any DMA while this function is running,
+ * otherwise the behavior is undefined.
+ *
+ * Return 0 if initialization succeeded, or an error.
+ */
+int iommu_sva_device_init(struct device *dev, unsigned long features,
+			  unsigned int max_pasid)
+{
+	int ret;
+	struct iommu_sva_param *param;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain || !domain->ops->sva_device_init)
+		return -ENODEV;
+
+	if (features)
+		return -EINVAL;
+
+	param = kzalloc(sizeof(*param), GFP_KERNEL);
+	if (!param)
+		return -ENOMEM;
+
+	param->features		= features;
+	param->max_pasid	= max_pasid;
+
+	/*
+	 * IOMMU driver updates the limits depending on the IOMMU and device
+	 * capabilities.
+	 */
+	ret = domain->ops->sva_device_init(dev, param);
+	if (ret)
+		goto err_free_param;
+
+	mutex_lock(&dev->iommu_param->lock);
+	if (dev->iommu_param->sva_param)
+		ret = -EEXIST;
+	else
+		dev->iommu_param->sva_param = param;
+	mutex_unlock(&dev->iommu_param->lock);
+	if (ret)
+		goto err_device_shutdown;
+
+	return 0;
+
+err_device_shutdown:
+	if (domain->ops->sva_device_shutdown)
+		domain->ops->sva_device_shutdown(dev, param);
+
+err_free_param:
+	kfree(param);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_device_init);
+
+/**
+ * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
+ * @dev: the device
+ *
+ * Disable SVA. Device driver should ensure that the device isn't performing any
+ * DMA while this function is running.
+ */
+int iommu_sva_device_shutdown(struct device *dev)
+{
+	struct iommu_sva_param *param;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain)
+		return -ENODEV;
+
+	mutex_lock(&dev->iommu_param->lock);
+	param = dev->iommu_param->sva_param;
+	dev->iommu_param->sva_param = NULL;
+	mutex_unlock(&dev->iommu_param->lock);
+	if (!param)
+		return -ENODEV;
+
+	if (domain->ops->sva_device_shutdown)
+		domain->ops->sva_device_shutdown(dev, param);
+
+	kfree(param);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0933f726d2e6..2efe7738bedb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -212,6 +212,12 @@ struct page_response_msg {
 	u64 private_data;
 };
 
+struct iommu_sva_param {
+	unsigned long features;
+	unsigned int min_pasid;
+	unsigned int max_pasid;
+};
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -219,6 +225,8 @@ struct page_response_msg {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
  * @sva_device_init: initialize Shared Virtual Addressing for a device
  * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -256,6 +264,10 @@ struct iommu_ops {
 
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	int (*sva_device_init)(struct device *dev,
+			       struct iommu_sva_param *param);
+	void (*sva_device_shutdown)(struct device *dev,
+				    struct iommu_sva_param *param);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
@@ -413,6 +425,7 @@ struct iommu_fault_param {
  * struct iommu_param - collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
+ * @sva_param: SVA parameters
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  *	struct iommu_group	*iommu_group;
@@ -421,6 +434,7 @@ struct iommu_fault_param {
 struct iommu_param {
 	struct mutex lock;
 	struct iommu_fault_param *fault_param;
+	struct iommu_sva_param *sva_param;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -920,4 +934,22 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
 
 #endif /* CONFIG_IOMMU_API */
 
+#ifdef CONFIG_IOMMU_SVA
+extern int iommu_sva_device_init(struct device *dev, unsigned long features,
+				 unsigned int max_pasid);
+extern int iommu_sva_device_shutdown(struct device *dev);
+#else /* CONFIG_IOMMU_SVA */
+static inline int iommu_sva_device_init(struct device *dev,
+					unsigned long features,
+					unsigned int max_pasid)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_device_shutdown(struct device *dev)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.17.0


* [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC

Add bind() and unbind() operations to the IOMMU API. bind() returns a
PASID that drivers can program in hardware, to let their devices access an
mm. This patch only adds skeletons for the device driver API; most of the
implementation is still missing.

IOMMU groups with more than one device aren't supported for SVA at the
moment. There may be P2P traffic between devices within a group, which
cannot be seen by an IOMMU (note that supporting PASID doesn't add any
form of isolation with regard to P2P). Supporting groups would require
calling bind() for all bound processes every time a device is added to a
group, to perform sanity checks (e.g. ensure that new devices support
PASIDs at least as large as those already allocated in the group). It also
means making sure that reserved ranges (IOMMU_RESV_*) of all devices are
carved out of processes. This is already tricky with single devices, but
becomes very difficult with groups. Since SVA-capable devices are expected
to be cleanly isolated, and since we don't have any way to test groups or
hot-plug, we only allow single-device groups for now.
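
As a usage note, the bond semantics could be exercised as follows
(an untested sketch; my_dev_quiesce is an invented driver helper):

#include <linux/iommu.h>

static void my_dev_quiesce(void *drvdata, int pasid); /* hypothetical */

static void my_dev_bind_example(struct device *dev, struct mm_struct *mm,
				void *drvdata)
{
	int pasid, again, ret;

	/* The first bind creates the bond and allocates a PASID */
	ret = iommu_sva_bind_device(dev, mm, &pasid, 0, drvdata);
	if (ret)
		return;

	/* A second bind for the same device and mm fails */
	ret = iommu_sva_bind_device(dev, mm, &again, 0, drvdata);
	WARN_ON(ret != -EEXIST);

	/*
	 * Before unbinding, stop DMA for this PASID and flush its
	 * outstanding page requests to the IOMMU.
	 */
	my_dev_quiesce(drvdata, pasid);
	iommu_sva_unbind_device(dev, pasid);
}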

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: remove iommu_sva_bind/unbind_group
---
 drivers/iommu/iommu-sva.c | 27 +++++++++++++
 drivers/iommu/iommu.c     | 83 +++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h     | 37 +++++++++++++++++
 3 files changed, 147 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 8b4afb7c63ae..8d98f9c09864 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -93,6 +93,8 @@ int iommu_sva_device_shutdown(struct device *dev)
 	if (!domain)
 		return -ENODEV;
 
+	__iommu_sva_unbind_dev_all(dev);
+
 	mutex_lock(&dev->iommu_param->lock);
 	param = dev->iommu_param->sva_param;
 	dev->iommu_param->sva_param = NULL;
@@ -108,3 +110,28 @@ int iommu_sva_device_shutdown(struct device *dev)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
+
+int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+			    int *pasid, unsigned long flags, void *drvdata)
+{
+	return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
+
+int __iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
+
+/**
+ * __iommu_sva_unbind_dev_all() - Detach all address spaces from this device
+ * @dev: the device
+ *
+ * When detaching @dev from a domain, IOMMU drivers should use this helper.
+ */
+void __iommu_sva_unbind_dev_all(struct device *dev)
+{
+	/* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9e28d88c8074..bd2819deae5b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2261,3 +2261,86 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to it
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties
+ * @drvdata: private data passed to the mm exit handler
+ *
+ * Create a bond between device and task, allowing the device to access the mm
+ * using the returned PASID. If unbind() isn't called first, a subsequent bind()
+ * for the same device and mm fails with -EEXIST.
+ *
+ * iommu_sva_device_init() must be called first, to initialize the required SVA
+ * features. @flags is a subset of these features.
+ *
+ * The caller must pin down, using get_user_pages*(), all mappings shared
+ * with the device. mlock() isn't sufficient, as it doesn't prevent minor
+ * page faults (e.g. copy-on-write).
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+			  unsigned long flags, void *drvdata)
+{
+	int ret = -EINVAL;
+	struct iommu_group *group;
+
+	if (!pasid)
+		return -EINVAL;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	/* Ensure device count and domain don't change while we're binding */
+	mutex_lock(&group->mutex);
+	if (iommu_group_device_count(group) != 1)
+		goto out_unlock;
+
+	ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);
+
+out_unlock:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @dev: the device
+ * @pasid: the pasid returned by bind()
+ *
+ * Remove bond between device and address space identified by @pasid. Users
+ * should not call unbind() if the corresponding mm exited (as the PASID might
+ * have been reallocated for another process).
+ *
+ * The device must not be issuing any more transactions for this PASID. All
+ * outstanding page requests for this PASID must have been flushed to the IOMMU.
+ *
+ * Returns 0 on success, or an error value
+ */
+int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	int ret = -EINVAL;
+	struct iommu_group *group;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_sva_unbind_device(dev, pasid);
+	mutex_unlock(&group->mutex);
+
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2efe7738bedb..da59c20c4f12 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -613,6 +613,10 @@ void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 
+extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+				int *pasid, unsigned long flags, void *drvdata);
+extern int iommu_sva_unbind_device(struct device *dev, int pasid);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -932,12 +936,29 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
 	return -EINVAL;
 }
 
+static inline int iommu_sva_bind_device(struct device *dev,
+					struct mm_struct *mm, int *pasid,
+					unsigned long flags, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_SVA
 extern int iommu_sva_device_init(struct device *dev, unsigned long features,
 				 unsigned int max_pasid);
 extern int iommu_sva_device_shutdown(struct device *dev);
+extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+				   int *pasid, unsigned long flags,
+				   void *drvdata);
+extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
+extern void __iommu_sva_unbind_dev_all(struct device *dev);
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -950,6 +971,22 @@ static inline int iommu_sva_device_shutdown(struct device *dev)
 {
 	return -ENODEV;
 }
+
+static inline int __iommu_sva_bind_device(struct device *dev,
+					  struct mm_struct *mm, int *pasid,
+					  unsigned long flags, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline void __iommu_sva_unbind_dev_all(struct device *dev)
+{
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.17.0


* [PATCH v2 03/40] iommu/sva: Manage process address spaces
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC

Allocate IOMMU mm structures and bind them to devices. Four operations
are added to IOMMU drivers:

* mm_alloc(): to create an io_mm structure and perform architecture-
  specific operations required to grab the process (for instance on ARM,
  pin down the CPU ASID so that the process doesn't get assigned a new
  ASID on rollover).

  There is a single valid io_mm structure per Linux mm. Future extensions
  may also use io_mm for kernel-managed address spaces, populated with
  map()/unmap() calls instead of bound to process address spaces. This
  patch focuses on "shared" io_mm.

* mm_attach(): attach an mm to a device. The IOMMU driver checks that the
  device is capable of sharing an address space, and writes the PASID
  table entry to install the pgd.

  Some IOMMU drivers will have a single PASID table per domain, for
  convenience. Others can implement it differently, but to help these
  drivers, mm_attach and mm_detach take 'attach_domain' and
  'detach_domain' parameters, which tell whether they need to set and
  clear the PASID entry or only send the required TLB invalidations.

* mm_detach(): detach an mm from a device. The IOMMU driver removes the
  PASID table entry and invalidates the IOTLBs.

* mm_free(): free a structure allocated by mm_alloc(), and let arch
  release the process.

mm_attach and mm_detach operations are serialized with a spinlock. When
trying to optimize this code, we should at least prevent concurrent
attach()/detach() on the same domain (so multi-level PASID table code can
allocate tables lazily). mm_alloc() can sleep, but mm_free() must not
(because we'll have to call it from call_srcu later on).

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Keeping track of address spaces requires the use of MMU notifiers.
Handling process exit with regard to unbind() is tricky, so it is left for
another patch and we explicitly fail mm_alloc() for the moment.
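
To illustrate, an IOMMU driver could embed io_mm in its own structure
along these lines (a speculative sketch: the my_* names and the ASID
helpers are invented, and the real SMMUv3 version comes later in the
series):

#include <linux/slab.h>

struct my_iommu_mm {
	struct io_mm		io_mm;	/* handed back to the IOMMU core */
	u16			asid;	/* e.g. pinned CPU ASID on Arm */
};

static u16 my_arch_pin_asid(struct mm_struct *mm);	/* hypothetical */
static void my_arch_unpin_asid(u16 asid);		/* hypothetical */

static struct io_mm *my_iommu_mm_alloc(struct iommu_domain *domain,
				       struct mm_struct *mm,
				       unsigned long flags)
{
	struct my_iommu_mm *smm;

	/* mm_alloc() may sleep */
	smm = kzalloc(sizeof(*smm), GFP_KERNEL);
	if (!smm)
		return NULL;	/* the core turns this into -ENOMEM */

	/* Arch-specific: keep the ASID stable across rollover */
	smm->asid = my_arch_pin_asid(mm);
	return &smm->io_mm;
}

static void my_iommu_mm_free(struct io_mm *io_mm)
{
	struct my_iommu_mm *smm = container_of(io_mm, struct my_iommu_mm,
					       io_mm);

	/* mm_free() must not sleep, it will be called from call_srcu */
	my_arch_unpin_asid(smm->asid);
	kfree(smm);
}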

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: sanity-check of flags
---
 drivers/iommu/iommu-sva.c | 380 +++++++++++++++++++++++++++++++++++++-
 drivers/iommu/iommu.c     |   1 +
 include/linux/iommu.h     |  28 +++
 3 files changed, 406 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 8d98f9c09864..6ac679c48f3c 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -5,8 +5,298 @@
  * Copyright (C) 2018 ARM Ltd.
  */
 
+#include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/**
+ * DOC: io_mm model
+ *
+ * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
+ * The following example illustrates the relation between structures
+ * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
+ * device. A device can have multiple io_mm and an io_mm may be bound to
+ * multiple devices.
+ *              ___________________________
+ *             |  IOMMU domain A           |
+ *             |  ________________         |
+ *             | |  IOMMU group   |        +------- io_pgtables
+ *             | |                |        |
+ *             | |   dev 00:00.0 ----+------- bond --- io_mm X
+ *             | |________________|   \    |
+ *             |                       '----- bond ---.
+ *             |___________________________|           \
+ *              ___________________________             \
+ *             |  IOMMU domain B           |           io_mm Y
+ *             |  ________________         |           / /
+ *             | |  IOMMU group   |        |          / /
+ *             | |                |        |         / /
+ *             | |   dev 00:01.0 ------------ bond -' /
+ *             | |   dev 00:01.1 ------------ bond --'
+ *             | |________________|        |
+ *             |                           +------- io_pgtables
+ *             |___________________________|
+ *
+ * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
+ * B. All devices within the same domain access the same address spaces. Device
+ * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
+ * Devices 00:01.* only access address space Y. In addition each
+ * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
+ * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
+ *
+ * To obtain the above configuration, users would for instance issue the
+ * following calls:
+ *
+ *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
+ *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
+ *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
+ *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
+ *
+ * A single Process Address Space ID (PASID) is allocated for each mm. In the
+ * example, devices use PASID 1 to read/write into address space X and PASID 2
+ * to read/write into address space Y.
+ *
+ * Hardware tables describing this configuration in the IOMMU would typically
+ * look like this:
+ *
+ *                                PASID tables
+ *                                 of domain A
+ *                              .->+--------+
+ *                             / 0 |        |-------> io_pgtable
+ *                            /    +--------+
+ *            Device tables  /   1 |        |-------> pgd X
+ *              +--------+  /      +--------+
+ *      00:00.0 |      A |-'     2 |        |--.
+ *              +--------+         +--------+   \
+ *              :        :       3 |        |    \
+ *              +--------+         +--------+     --> pgd Y
+ *      00:01.0 |      B |--.                    /
+ *              +--------+   \                  |
+ *      00:01.1 |      B |----+   PASID tables  |
+ *              +--------+     \   of domain B  |
+ *                              '->+--------+   |
+ *                               0 |        |-- | --> io_pgtable
+ *                                 +--------+   |
+ *                               1 |        |   |
+ *                                 +--------+   |
+ *                               2 |        |---'
+ *                                 +--------+
+ *                               3 |        |
+ *                                 +--------+
+ *
+ * With this model, a single call binds all devices in a given domain to an
+ * address space. Other devices in the domain will get the same bond implicitly.
+ * However, users must issue one bind() for each device, because IOMMUs may
+ * implement SVA differently. Furthermore, mandating one bind() per device
+ * allows the driver to perform sanity-checks on device capabilities.
+ *
+ * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
+ * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
+ * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
+ * the device table and PASID 0 would be available to the allocator.
+ */
+
+struct iommu_bond {
+	struct io_mm		*io_mm;
+	struct device		*dev;
+	struct iommu_domain	*domain;
+
+	struct list_head	mm_head;
+	struct list_head	dev_head;
+	struct list_head	domain_head;
+
+	void			*drvdata;
+};
+
+/*
+ * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
+ * used for returning errors). In practice implementations will use at most 20
+ * bits, which is the PCI limit.
+ */
+static DEFINE_IDR(iommu_pasid_idr);
+
+/*
+ * For the moment this is an all-purpose lock. It serializes
+ * access/modifications to bonds, access/modifications to the PASID IDR, and
+ * changes to io_mm refcount as well.
+ */
+static DEFINE_SPINLOCK(iommu_sva_lock);
+
+static struct io_mm *
+io_mm_alloc(struct iommu_domain *domain, struct device *dev,
+	    struct mm_struct *mm, unsigned long flags)
+{
+	int ret;
+	int pasid;
+	struct io_mm *io_mm;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
+		return ERR_PTR(-ENODEV);
+
+	io_mm = domain->ops->mm_alloc(domain, mm, flags);
+	if (IS_ERR(io_mm))
+		return io_mm;
+	if (!io_mm)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * The mm must not be freed until after the driver frees the io_mm
+	 * (which may involve unpinning the CPU ASID for instance, requiring a
+	 * valid mm struct.)
+	 */
+	mmgrab(mm);
+
+	io_mm->flags		= flags;
+	io_mm->mm		= mm;
+	io_mm->release		= domain->ops->mm_free;
+	INIT_LIST_HEAD(&io_mm->devices);
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_sva_lock);
+	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
+			  param->max_pasid + 1, GFP_ATOMIC);
+	io_mm->pasid = pasid;
+	spin_unlock(&iommu_sva_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		ret = pasid;
+		goto err_free_mm;
+	}
+
+	/* TODO: keep track of mm. For the moment, abort. */
+	ret = -ENOSYS;
+	spin_lock(&iommu_sva_lock);
+	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+	spin_unlock(&iommu_sva_lock);
+
+err_free_mm:
+	domain->ops->mm_free(io_mm);
+	mmdrop(mm);
+
+	return ERR_PTR(ret);
+}
+
+static void io_mm_free(struct io_mm *io_mm)
+{
+	struct mm_struct *mm = io_mm->mm;
+
+	io_mm->release(io_mm);
+	mmdrop(mm);
+}
+
+static void io_mm_release(struct kref *kref)
+{
+	struct io_mm *io_mm;
+
+	io_mm = container_of(kref, struct io_mm, kref);
+	WARN_ON(!list_empty(&io_mm->devices));
+
+	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+
+	io_mm_free(io_mm);
+}
+
+/*
+ * Returns non-zero if a reference to the io_mm was successfully taken.
+ * Returns zero if the io_mm is being freed and should not be used.
+ */
+static int io_mm_get_locked(struct io_mm *io_mm)
+{
+	if (io_mm)
+		return kref_get_unless_zero(&io_mm->kref);
+
+	return 0;
+}
+
+static void io_mm_put_locked(struct io_mm *io_mm)
+{
+	kref_put(&io_mm->kref, io_mm_release);
+}
+
+static void io_mm_put(struct io_mm *io_mm)
+{
+	spin_lock(&iommu_sva_lock);
+	io_mm_put_locked(io_mm);
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
+			struct io_mm *io_mm, void *drvdata)
+{
+	int ret;
+	bool attach_domain = true;
+	int pasid = io_mm->pasid;
+	struct iommu_bond *bond, *tmp;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
+		return -ENODEV;
+
+	if (pasid > param->max_pasid || pasid < param->min_pasid)
+		return -ERANGE;
+
+	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond->domain		= domain;
+	bond->io_mm		= io_mm;
+	bond->dev		= dev;
+	bond->drvdata		= drvdata;
+
+	spin_lock(&iommu_sva_lock);
+	/*
+	 * Check if this io_mm is already bound to the domain, in which case the
+	 * IOMMU driver doesn't have to install the PASID table entry.
+	 */
+	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
+		if (tmp->io_mm == io_mm) {
+			attach_domain = false;
+			break;
+		}
+	}
+
+	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
+	if (ret) {
+		kfree(bond);
+		spin_unlock(&iommu_sva_lock);
+		return ret;
+	}
+
+	list_add(&bond->mm_head, &io_mm->devices);
+	list_add(&bond->domain_head, &domain->mm_list);
+	list_add(&bond->dev_head, &param->mm_list);
+	spin_unlock(&iommu_sva_lock);
+
+	return 0;
+}
+
+static void io_mm_detach_locked(struct iommu_bond *bond)
+{
+	struct iommu_bond *tmp;
+	bool detach_domain = true;
+	struct iommu_domain *domain = bond->domain;
+
+	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
+		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
+			detach_domain = false;
+			break;
+		}
+	}
+
+	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
+
+	list_del(&bond->mm_head);
+	list_del(&bond->domain_head);
+	list_del(&bond->dev_head);
+	io_mm_put_locked(bond->io_mm);
+
+	kfree(bond);
+}
 
 /**
  * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
@@ -47,6 +337,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
 
 	param->features		= features;
 	param->max_pasid	= max_pasid;
+	INIT_LIST_HEAD(&param->mm_list);
 
 	/*
 	 * IOMMU driver updates the limits depending on the IOMMU and device
@@ -114,13 +405,87 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
 int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 			    int *pasid, unsigned long flags, void *drvdata)
 {
-	return -ENOSYS; /* TODO */
+	int i, ret = 0;
+	struct io_mm *io_mm = NULL;
+	struct iommu_domain *domain;
+	struct iommu_bond *bond = NULL, *tmp;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	/*
+	 * The device driver does not call sva_device_init/shutdown and
+	 * bind/unbind concurrently, so no need to take the param lock.
+	 */
+	if (WARN_ON_ONCE(!param) || (flags & ~param->features))
+		return -EINVAL;
+
+	/* If an io_mm already exists, use it */
+	spin_lock(&iommu_sva_lock);
+	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
+		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
+			/* ... Unless it's already bound to this device */
+			list_for_each_entry(tmp, &io_mm->devices, mm_head) {
+				if (tmp->dev == dev) {
+					bond = tmp;
+					io_mm_put_locked(io_mm);
+					break;
+				}
+			}
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	if (bond)
+		return -EEXIST;
+
+	/* Require identical features within an io_mm for now */
+	if (io_mm && (flags != io_mm->flags)) {
+		io_mm_put(io_mm);
+		return -EDOM;
+	}
+
+	if (!io_mm) {
+		io_mm = io_mm_alloc(domain, dev, mm, flags);
+		if (IS_ERR(io_mm))
+			return PTR_ERR(io_mm);
+	}
+
+	ret = io_mm_attach(domain, dev, io_mm, drvdata);
+	if (ret)
+		io_mm_put(io_mm);
+	else
+		*pasid = io_mm->pasid;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
 
 int __iommu_sva_unbind_device(struct device *dev, int pasid)
 {
-	return -ENOSYS; /* TODO */
+	int ret = -ESRCH;
+	struct iommu_domain *domain;
+	struct iommu_bond *bond = NULL;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!param || WARN_ON(!domain))
+		return -EINVAL;
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &param->mm_list, dev_head) {
+		if (bond->io_mm->pasid == pasid) {
+			io_mm_detach_locked(bond);
+			ret = 0;
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
 
@@ -132,6 +497,15 @@ EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
  */
 void __iommu_sva_unbind_dev_all(struct device *dev)
 {
-	/* TODO */
+	struct iommu_sva_param *param;
+	struct iommu_bond *bond, *next;
+
+	param = dev->iommu_param->sva_param;
+	if (param) {
+		spin_lock(&iommu_sva_lock);
+		list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
+			io_mm_detach_locked(bond);
+		spin_unlock(&iommu_sva_lock);
+	}
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bd2819deae5b..333801e1519c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1463,6 +1463,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+	INIT_LIST_HEAD(&domain->mm_list);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index da59c20c4f12..d5f21719a5a0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -100,6 +100,20 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+
+	struct list_head mm_list;
+};
+
+struct io_mm {
+	int			pasid;
+	/* IOMMU_SVA_FEAT_* */
+	unsigned long		flags;
+	struct list_head	devices;
+	struct kref		kref;
+	struct mm_struct	*mm;
+
+	/* Release callback for this mm */
+	void (*release)(struct io_mm *io_mm);
 };
 
 enum iommu_cap {
@@ -216,6 +230,7 @@ struct iommu_sva_param {
 	unsigned long features;
 	unsigned int min_pasid;
 	unsigned int max_pasid;
+	struct list_head mm_list;
 };
 
 /**
@@ -227,6 +242,11 @@ struct iommu_sva_param {
  * @detach_dev: detach device from an iommu domain
  * @sva_device_init: initialize Shared Virtual Addressing for a device
  * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
+ * @mm_alloc: allocate io_mm
+ * @mm_free: free io_mm
+ * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
+ * @mm_detach: detach io_mm from a device. Remove PASID entry and
+ *             flush associated TLB entries.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -268,6 +288,14 @@ struct iommu_ops {
 			       struct iommu_sva_param *param);
 	void (*sva_device_shutdown)(struct device *dev,
 				    struct iommu_sva_param *param);
+	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
+				  struct mm_struct *mm,
+				  unsigned long flags);
+	void (*mm_free)(struct io_mm *io_mm);
+	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
+			 struct io_mm *io_mm, bool attach_domain);
+	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
+			  struct io_mm *io_mm, bool detach_domain);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.17.0


* [PATCH v2 04/40] iommu/sva: Add a mm_exit callback for device drivers
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC

When an mm exits, devices that were bound to it must stop performing DMA
on its PASID. Let device drivers register a callback to be notified on mm
exit. Add the callback to the sva_param structure attached to struct
device.
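
A device driver would then pass its handler to iommu_sva_device_init(),
roughly as below (a sketch with invented my_dev_* names; the handler
type is the one added by this patch):

static void my_dev_drain_queues(void *drvdata, int pasid); /* hypothetical */

static int my_dev_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	/*
	 * The mm is going away. After this handler returns, the device
	 * must not issue any more transactions with this PASID.
	 */
	my_dev_drain_queues(drvdata, pasid);
	return 0;
}

static int my_dev_init_sva(struct device *dev)
{
	/* Register the exit handler along with the other parameters */
	return iommu_sva_device_init(dev, 0, 0, my_dev_mm_exit);
}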

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: use iommu_sva_device_init instead of a separate function
---
 drivers/iommu/iommu-sva.c | 11 ++++++++++-
 include/linux/iommu.h     |  9 +++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 6ac679c48f3c..0700893c679d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -303,6 +303,7 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
  * @dev: the device
  * @features: bitmask of features that need to be initialized
  * @max_pasid: max PASID value supported by the device
+ * @mm_exit: callback to notify the device driver of an mm exiting
  *
  * Users of the bind()/unbind() API must call this function to initialize all
  * features required for SVA.
@@ -313,13 +314,20 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
  * description. Setting @max_pasid to a non-zero value smaller than this limit
  * overrides it.
  *
+ * If the driver intends to share process address spaces, it should pass a valid
+ * @mm_exit handler. Otherwise @mm_exit can be NULL. After @mm_exit returns, the
+ * device must not issue any more transactions with the PASID given as argument.
+ * The handler gets an opaque pointer corresponding to the drvdata passed as
+ * argument of bind().
+ *
  * The device should not be performing any DMA while this function is running,
  * otherwise the behavior is undefined.
  *
  * Return 0 if initialization succeeded, or an error.
  */
 int iommu_sva_device_init(struct device *dev, unsigned long features,
-			  unsigned int max_pasid)
+			  unsigned int max_pasid,
+			  iommu_mm_exit_handler_t mm_exit)
 {
 	int ret;
 	struct iommu_sva_param *param;
@@ -337,6 +345,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
 
 	param->features		= features;
 	param->max_pasid	= max_pasid;
+	param->mm_exit		= mm_exit;
 	INIT_LIST_HEAD(&param->mm_list);
 
 	/*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d5f21719a5a0..439c8fffd836 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -61,6 +61,7 @@ struct iommu_fault_event;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
+typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -231,6 +232,7 @@ struct iommu_sva_param {
 	unsigned int min_pasid;
 	unsigned int max_pasid;
 	struct list_head mm_list;
+	iommu_mm_exit_handler_t mm_exit;
 };
 
 /**
@@ -980,17 +982,20 @@ static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
 
 #ifdef CONFIG_IOMMU_SVA
 extern int iommu_sva_device_init(struct device *dev, unsigned long features,
-				 unsigned int max_pasid);
+				 unsigned int max_pasid,
+				 iommu_mm_exit_handler_t mm_exit);
 extern int iommu_sva_device_shutdown(struct device *dev);
 extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 				   int *pasid, unsigned long flags,
 				   void *drvdata);
 extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void __iommu_sva_unbind_dev_all(struct device *dev);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
-					unsigned int max_pasid)
+					unsigned int max_pasid,
+					iommu_mm_exit_handler_t mm_exit)
 {
 	return -ENODEV;
 }
-- 
2.17.0


* [PATCH v2 05/40] iommu/sva: Track mm changes with an MMU notifier
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC

When creating an io_mm structure, register an MMU notifier that informs
us when the virtual address space changes and disappears.

Add a new operation to the IOMMU driver, mm_invalidate, called when a
range of addresses is unmapped to let the IOMMU driver send ATC
invalidations. mm_invalidate cannot sleep.

Adding the notifier complicates io_mm release. In one case, device
drivers free the io_mm explicitly by calling unbind() (or by detaching
the device from its domain). In the other case, the process could crash
before unbind(), in which case the release notifier has to do all the
work.

Allowing the device driver's mm_exit() handler to sleep adds another
complication, but it will greatly simplify things for users. For example,
VFIO can take the IOMMU mutex and remove any trace of io_mm, instead of
introducing complex synchronization to delicately handle this race. But
relaxing the user side does force unbind() to sleep and wait for all
pending mm_exit() calls to finish.
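
In an IOMMU driver, the new operation might look like the sketch below.
The mm_invalidate prototype isn't visible in the hunks quoted here, so
take the signature as an assumption, and my_iommu_atc_invalidate is an
invented helper (the real ATC hook-up for SMMUv3 comes later in the
series):

/* Hypothetical: send an ATC invalidation for this PASID and range */
static void my_iommu_atc_invalidate(struct device *dev, int pasid,
				    unsigned long iova, size_t size);

static void my_iommu_mm_invalidate(struct iommu_domain *domain,
				   struct device *dev, struct io_mm *io_mm,
				   unsigned long iova, size_t size)
{
	/*
	 * Called from the invalidate_range MMU notifier, so this must not
	 * sleep. Remove stale ATC entries for the unmapped range.
	 */
	my_iommu_atc_invalidate(dev, io_mm->pasid, iova, size);
}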

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2:
* Unbind() waits for mm_exit to finish
* mm_exit can sleep
---
 drivers/iommu/Kconfig     |   1 +
 drivers/iommu/iommu-sva.c | 248 +++++++++++++++++++++++++++++++++++---
 include/linux/iommu.h     |  10 ++
 3 files changed, 244 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index cca8e06903c7..38434899e283 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -77,6 +77,7 @@ config IOMMU_DMA
 config IOMMU_SVA
 	bool
 	select IOMMU_API
+	select MMU_NOTIFIER
 
 config FSL_PAMU
 	bool "Freescale IOMMU support"
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 0700893c679d..e9afae2537a2 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,6 +7,7 @@
 
 #include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/mmu_notifier.h>
 #include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
@@ -106,6 +107,9 @@ struct iommu_bond {
 	struct list_head	mm_head;
 	struct list_head	dev_head;
 	struct list_head	domain_head;
+	refcount_t		refs;
+	struct wait_queue_head	mm_exit_wq;
+	bool			mm_exit_active;
 
 	void			*drvdata;
 };
@@ -124,6 +128,8 @@ static DEFINE_IDR(iommu_pasid_idr);
  */
 static DEFINE_SPINLOCK(iommu_sva_lock);
 
+static struct mmu_notifier_ops iommu_mmu_notifier;
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	    struct mm_struct *mm, unsigned long flags)
@@ -151,6 +157,7 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 
 	io_mm->flags		= flags;
 	io_mm->mm		= mm;
+	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
 	INIT_LIST_HEAD(&io_mm->devices);
 
@@ -167,8 +174,29 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 		goto err_free_mm;
 	}
 
-	/* TODO: keep track of mm. For the moment, abort. */
-	ret = -ENOSYS;
+	ret = mmu_notifier_register(&io_mm->notifier, mm);
+	if (ret)
+		goto err_free_pasid;
+
+	/*
+	 * Now that the MMU notifier is valid, we can allow users to grab this
+	 * io_mm by setting a valid refcount. Before that it was accessible in
+	 * the IDR but invalid.
+	 *
+	 * The following barrier ensures that users, who obtain the io_mm with
+	 * kref_get_unless_zero, don't read uninitialized fields in the
+	 * structure.
+	 */
+	smp_wmb();
+	kref_init(&io_mm->kref);
+
+	return io_mm;
+
+err_free_pasid:
+	/*
+	 * Even if the io_mm is accessible from the IDR at this point, kref is
+	 * 0 so no user could get a reference to it. Free it manually.
+	 */
 	spin_lock(&iommu_sva_lock);
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 	spin_unlock(&iommu_sva_lock);
@@ -180,9 +208,13 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	return ERR_PTR(ret);
 }
 
-static void io_mm_free(struct io_mm *io_mm)
+static void io_mm_free(struct rcu_head *rcu)
 {
-	struct mm_struct *mm = io_mm->mm;
+	struct io_mm *io_mm;
+	struct mm_struct *mm;
+
+	io_mm = container_of(rcu, struct io_mm, rcu);
+	mm = io_mm->mm;
 
 	io_mm->release(io_mm);
 	mmdrop(mm);
@@ -197,7 +229,22 @@ static void io_mm_release(struct kref *kref)
 
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 
-	io_mm_free(io_mm);
+	/*
+	 * If we're being released from mm exit, the notifier callback ->release
+	 * has already been called. Otherwise we don't need ->release, the io_mm
+	 * isn't attached to anything anymore. Hence no_release.
+	 */
+	mmu_notifier_unregister_no_release(&io_mm->notifier, io_mm->mm);
+
+	/*
+	 * We can't free the structure here, because if mm exits during
+	 * unbind(), then ->release might be attempting to grab the io_mm
+	 * concurrently. And in the other case, if ->release is calling
+	 * io_mm_release, then __mmu_notifier_release expects to still have a
+	 * valid mn when returning. So free the structure when it's safe, after
+	 * the RCU grace period elapsed.
+	 */
+	mmu_notifier_call_srcu(&io_mm->rcu, io_mm_free);
 }
 
 /*
@@ -206,8 +253,14 @@ static void io_mm_release(struct kref *kref)
  */
 static int io_mm_get_locked(struct io_mm *io_mm)
 {
-	if (io_mm)
-		return kref_get_unless_zero(&io_mm->kref);
+	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
+		/*
+		 * kref_get_unless_zero doesn't provide ordering for reads. This
+		 * barrier pairs with the one in io_mm_alloc.
+		 */
+		smp_rmb();
+		return 1;
+	}
 
 	return 0;
 }
@@ -233,7 +286,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	struct iommu_bond *bond, *tmp;
 	struct iommu_sva_param *param = dev->iommu_param->sva_param;
 
-	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
+	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
+	    !domain->ops->mm_invalidate)
 		return -ENODEV;
 
 	if (pasid > param->max_pasid || pasid < param->min_pasid)
@@ -247,6 +301,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	bond->io_mm		= io_mm;
 	bond->dev		= dev;
 	bond->drvdata		= drvdata;
+	refcount_set(&bond->refs, 1);
+	init_waitqueue_head(&bond->mm_exit_wq);
 
 	spin_lock(&iommu_sva_lock);
 	/*
@@ -275,12 +331,37 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	return 0;
 }
 
-static void io_mm_detach_locked(struct iommu_bond *bond)
+static void io_mm_detach_locked(struct iommu_bond *bond, bool wait)
 {
 	struct iommu_bond *tmp;
 	bool detach_domain = true;
 	struct iommu_domain *domain = bond->domain;
 
+	if (wait) {
+		bool do_detach = true;
+		/*
+		 * If we're called from unbind() then we're deleting the bond
+		 * no matter what. Tell the mm_exit thread that we're cleaning
+		 * up, and wait until it finishes using the bond.
+		 *
+		 * refs is guaranteed to be one or more, otherwise it would
+		 * already have been removed from the list. Check if someone is
+		 * already waiting, in which case we wait but do not free.
+		 */
+		if (refcount_read(&bond->refs) > 1)
+			do_detach = false;
+
+		refcount_inc(&bond->refs);
+		wait_event_lock_irq(bond->mm_exit_wq, !bond->mm_exit_active,
+				    iommu_sva_lock);
+		if (!do_detach)
+			return;
+
+	} else if (!refcount_dec_and_test(&bond->refs)) {
+		/* unbind() is waiting to free the bond */
+		return;
+	}
+
 	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
 		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
 			detach_domain = false;
@@ -298,6 +379,129 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
 	kfree(bond);
 }
 
+static int iommu_signal_mm_exit(struct iommu_bond *bond)
+{
+	struct device *dev = bond->dev;
+	struct io_mm *io_mm = bond->io_mm;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	/*
+	 * We can't hold the device's param_lock. If we did and the device
+	 * driver used a global lock around io_mm, we would risk getting the
+	 * following deadlock:
+	 *
+	 *   exit_mm()                 |  Shutdown SVA
+	 *    mutex_lock(param->lock)  |   mutex_lock(glob lock)
+	 *     param->mm_exit()        |    sva_device_shutdown()
+	 *      mutex_lock(glob lock)  |     mutex_lock(param->lock)
+	 *
+	 * Fortunately unbind() waits for us to finish, and sva_device_shutdown
+	 * requires that any bond is removed, so we can safely access mm_exit
+	 * and drvdata without taking any lock.
+	 */
+	if (!param || !param->mm_exit)
+		return 0;
+
+	return param->mm_exit(dev, io_mm->pasid, bond->drvdata);
+}
+
+/* Called when the mm exits. Can race with unbind(). */
+static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct iommu_bond *bond, *next;
+	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
+
+	/*
+	 * If the mm is exiting then devices are still bound to the io_mm.
+	 * A few things need to be done before it is safe to release:
+	 *
+	 * - As the mmu notifier doesn't hold any reference to the io_mm when
+	 *   calling ->release(), try to take a reference.
+	 * - Tell the device driver to stop using this PASID.
+	 * - Clear the PASID table and invalidate TLBs.
+	 * - Drop all references to this io_mm by freeing the bonds.
+	 */
+	spin_lock(&iommu_sva_lock);
+	if (!io_mm_get_locked(io_mm)) {
+		/* Someone's already taking care of it. */
+		spin_unlock(&iommu_sva_lock);
+		return;
+	}
+
+	list_for_each_entry_safe(bond, next, &io_mm->devices, mm_head) {
+		/*
+		 * Release the lock to let the handler sleep. We need to be
+		 * careful about concurrent modifications to the list and to the
+		 * bond. Tell unbind() not to free the bond until we're done.
+		 */
+		bond->mm_exit_active = true;
+		spin_unlock(&iommu_sva_lock);
+
+		if (iommu_signal_mm_exit(bond))
+			dev_WARN(bond->dev, "possible leak of PASID %u",
+				 io_mm->pasid);
+
+		spin_lock(&iommu_sva_lock);
+		next = list_next_entry(bond, mm_head);
+
+		/* If someone is waiting, let them delete the bond now */
+		bond->mm_exit_active = false;
+		wake_up_all(&bond->mm_exit_wq);
+
+		/* Otherwise, do it ourselves */
+		io_mm_detach_locked(bond, false);
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	/*
+	 * We're now reasonably certain that no more faults are being handled for
+	 * this io_mm, since we just flushed them all out of the fault queue.
+	 * Release the last reference to free the io_mm.
+	 */
+	io_mm_put(io_mm);
+}
+
+static void iommu_notifier_invalidate_range(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	struct iommu_bond *bond;
+	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		struct iommu_domain *domain = bond->domain;
+
+		domain->ops->mm_invalidate(domain, bond->dev, io_mm, start,
+					   end - start);
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int iommu_notifier_clear_flush_young(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	iommu_notifier_invalidate_range(mn, mm, start, end);
+	return 0;
+}
+
+static void iommu_notifier_change_pte(struct mmu_notifier *mn,
+				      struct mm_struct *mm,
+				      unsigned long address, pte_t pte)
+{
+	iommu_notifier_invalidate_range(mn, mm, address, address + PAGE_SIZE);
+}
+
+static struct mmu_notifier_ops iommu_mmu_notifier = {
+	.release		= iommu_notifier_release,
+	.clear_flush_young	= iommu_notifier_clear_flush_young,
+	.change_pte		= iommu_notifier_change_pte,
+	.invalidate_range	= iommu_notifier_invalidate_range,
+};
+
 /**
  * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -320,6 +524,12 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
  * The handler gets an opaque pointer corresponding to the drvdata passed as
  * argument of bind().
  *
+ * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
+ * @mm_exit, because they might lead to deadlocks if they are also held when
+ * dropping references to the mm. Consider the following call chain:
+ *   mutex_lock(A); mmput(mm) -> exit_mm() -> @mm_exit() -> mutex_lock(A)
+ * Using mmput_async() prevents this scenario.
+ *
  * The device should not be performing any DMA while this function is running,
  * otherwise the behavior is undefined.
  *
@@ -484,15 +694,16 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 	if (!param || WARN_ON(!domain))
 		return -EINVAL;
 
-	spin_lock(&iommu_sva_lock);
+	/* spin_lock_irq matches the one in wait_event_lock_irq */
+	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
 		if (bond->io_mm->pasid == pasid) {
-			io_mm_detach_locked(bond);
+			io_mm_detach_locked(bond, true);
 			ret = 0;
 			break;
 		}
 	}
-	spin_unlock(&iommu_sva_lock);
+	spin_unlock_irq(&iommu_sva_lock);
 
 	return ret;
 }
@@ -503,18 +714,25 @@ EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
  * @dev: the device
  *
  * When detaching @device from a domain, IOMMU drivers should use this helper.
+ * This function may sleep while waiting for bonds to be released.
  */
 void __iommu_sva_unbind_dev_all(struct device *dev)
 {
 	struct iommu_sva_param *param;
 	struct iommu_bond *bond, *next;
 
+	/*
+	 * io_mm_detach_locked might wait, so we shouldn't call it with the dev
+	 * param lock held. It's fine to read sva_param outside the lock because
+	 * it can only be freed by iommu_sva_device_shutdown when there are no
+	 * more bonds in the list.
+	 */
 	param = dev->iommu_param->sva_param;
 	if (param) {
-		spin_lock(&iommu_sva_lock);
+		spin_lock_irq(&iommu_sva_lock);
 		list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
-			io_mm_detach_locked(bond);
-		spin_unlock(&iommu_sva_lock);
+			io_mm_detach_locked(bond, true);
+		spin_unlock_irq(&iommu_sva_lock);
 	}
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 439c8fffd836..caa6f79785b9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -24,6 +24,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <linux/mmu_notifier.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
 
@@ -111,10 +112,15 @@ struct io_mm {
 	unsigned long		flags;
 	struct list_head	devices;
 	struct kref		kref;
+#if defined(CONFIG_MMU_NOTIFIER)
+	struct mmu_notifier	notifier;
+#endif
 	struct mm_struct	*mm;
 
 	/* Release callback for this mm */
 	void (*release)(struct io_mm *io_mm);
+	/* For postponed release */
+	struct rcu_head		rcu;
 };
 
 enum iommu_cap {
@@ -249,6 +255,7 @@ struct iommu_sva_param {
  * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
  * @mm_detach: detach io_mm from a device. Remove PASID entry and
  *             flush associated TLB entries.
+ * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -298,6 +305,9 @@ struct iommu_ops {
 			 struct io_mm *io_mm, bool attach_domain);
 	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
 			  struct io_mm *io_mm, bool detach_domain);
+	void (*mm_invalidate)(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, unsigned long vaddr,
+			      size_t size);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 06/40] iommu/sva: Search mm by PASID
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 05/40] iommu/sva: Track mm changes with an MMU notifier Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 07/40] iommu: Add a page fault handler Jean-Philippe Brucker
                     ` (33 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The fault handler will need to find an mm given its PASID. This is the
reason we have an IDR for storing address spaces, so hook it up.
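
In sketch form, a consumer (such as the fault handler added later in this
series) uses it like this:

	struct mm_struct *mm = iommu_sva_find(pasid);

	if (!mm)
		return -ESRCH;

	down_read(&mm->mmap_sem);
	/* ... handle the fault against this address space ... */
	up_read(&mm->mmap_sem);
	mmput(mm);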

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/iommu-sva.c | 26 ++++++++++++++++++++++++++
 include/linux/iommu.h     |  6 ++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index e9afae2537a2..5abe0f0b445c 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -736,3 +736,29 @@ void __iommu_sva_unbind_dev_all(struct device *dev)
 	}
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
+
+/**
+ * iommu_sva_find() - Find mm associated to the given PASID
+ * @pasid: Process Address Space ID assigned to the mm
+ *
+ * Returns the mm corresponding to this PASID, or NULL if not found. A reference
+ * to the mm is taken, and must be released with mmput().
+ */
+struct mm_struct *iommu_sva_find(int pasid)
+{
+	struct io_mm *io_mm;
+	struct mm_struct *mm = NULL;
+
+	spin_lock(&iommu_sva_lock);
+	io_mm = idr_find(&iommu_pasid_idr, pasid);
+	if (io_mm && io_mm_get_locked(io_mm)) {
+		if (mmget_not_zero(io_mm->mm))
+			mm = io_mm->mm;
+
+		io_mm_put_locked(io_mm);
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	return mm;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_find);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index caa6f79785b9..faf3390ce89d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1001,6 +1001,7 @@ extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void __iommu_sva_unbind_dev_all(struct device *dev);
 
+extern struct mm_struct *iommu_sva_find(int pasid);
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -1030,6 +1031,11 @@ static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
 static inline void __iommu_sva_unbind_dev_all(struct device *dev)
 {
 }
+
+static inline struct mm_struct *iommu_sva_find(int pasid)
+{
+	return NULL;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 07/40] iommu: Add a page fault handler
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 06/40] iommu/sva: Search mm by PASID Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-8-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 08/40] iommu/iopf: Handle mm faults Jean-Philippe Brucker
                     ` (32 subsequent siblings)
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Some systems allow devices to handle I/O Page Faults in the core mm. For
example, systems implementing the PCI PRI extension or the Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
recently added to the IOMMU core for SVA virtualisation. Add a page fault
handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them to
IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.

When it receives a fault event, typically in an IRQ handler, the IOMMU
driver reports the fault using iommu_report_device_fault(), which calls
the registered handler. The page fault handler then calls the mm fault
handler, and reports either success or failure with iommu_page_response().
When the handler succeeds, the IOMMU retries the access.

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().
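
In sketch form, an IOMMU driver would use the new interface along these
lines, where my_smmu, my_flush_pri_queue() and my_drain_hw_queue() are
hypothetical names; the iopf_* calls are the ones introduced here:

	static int my_flush_pri_queue(void *cookie, struct device *dev)
	{
		struct my_smmu *smmu = cookie;

		/*
		 * Drain the hardware queue, reporting each pending fault
		 * with iommu_report_device_fault(), before returning.
		 */
		return my_drain_hw_queue(smmu, dev);
	}

	smmu->iopf_queue = iopf_queue_alloc(dev_name(smmu->dev),
					    my_flush_pri_queue, smmu);

	/* Later, for each IOPF-capable endpoint */
	ret = iopf_queue_add_device(smmu->iopf_queue, dev);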

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: multiple queues registered by IOMMU driver
---
 drivers/iommu/Kconfig      |   4 +
 drivers/iommu/Makefile     |   1 +
 drivers/iommu/io-pgfault.c | 363 +++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h      |  58 ++++++
 4 files changed, 426 insertions(+)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 38434899e283..09f13a7c4b60 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -79,6 +79,10 @@ config IOMMU_SVA
 	select IOMMU_API
 	select MMU_NOTIFIER
 
+config IOMMU_PAGE_FAULT
+	bool
+	select IOMMU_API
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1dbcc89ebe4c..4b744e399a1b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 000000000000..321c00dd3a3d
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+/**
+ * struct iopf_queue - IO Page Fault queue
+ * @wq: the fault workqueue
+ * @flush: low-level flush callback
+ * @flush_arg: flush() argument
+ * @refs: references to this structure taken by producers
+ */
+struct iopf_queue {
+	struct workqueue_struct		*wq;
+	iopf_queue_flush_t		flush;
+	void				*flush_arg;
+	refcount_t			refs;
+};
+
+/**
+ * struct iopf_device_param - IO Page Fault data attached to a device
+ * @queue: IOPF queue
+ * @partial: faults that are part of a Page Request Group for which the last
+ *           request hasn't been submitted yet.
+ */
+struct iopf_device_param {
+	struct iopf_queue		*queue;
+	struct list_head		partial;
+};
+
+struct iopf_context {
+	struct device			*dev;
+	struct iommu_fault_event	evt;
+	struct list_head		head;
+};
+
+struct iopf_group {
+	struct iopf_context		last_fault;
+	struct list_head		faults;
+	struct work_struct		work;
+};
+
+static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
+			 enum page_response_code status)
+{
+	struct page_response_msg resp = {
+		.addr			= evt->addr,
+		.pasid			= evt->pasid,
+		.pasid_present		= evt->pasid_valid,
+		.page_req_group_id	= evt->page_req_group_id,
+		.private_data		= evt->iommu_private,
+		.resp_code		= status,
+	};
+
+	return iommu_page_response(dev, &resp);
+}
+
+static enum page_response_code
+iopf_handle_single(struct iopf_context *fault)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iopf_handle_group(struct work_struct *work)
+{
+	struct iopf_group *group;
+	struct iopf_context *fault, *next;
+	enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
+
+	group = container_of(work, struct iopf_group, work);
+
+	list_for_each_entry_safe(fault, next, &group->faults, head) {
+		struct iommu_fault_event *evt = &fault->evt;
+		/*
+		 * Errors are sticky: don't handle subsequent faults in the
+		 * group if there is an error.
+		 */
+		if (status == IOMMU_PAGE_RESP_SUCCESS)
+			status = iopf_handle_single(fault);
+
+		if (!evt->last_req)
+			kfree(fault);
+	}
+
+	iopf_complete(group->last_fault.dev, &group->last_fault.evt, status);
+	kfree(group);
+}
+
+/**
+ * iommu_queue_iopf - IO Page Fault handler
+ * @evt: fault event
+ * @cookie: struct device, passed to iommu_register_device_fault_handler.
+ *
+ * Add a fault to the device workqueue, to be handled by mm.
+ */
+int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
+{
+	struct iopf_group *group;
+	struct iopf_context *fault, *next;
+	struct iopf_device_param *iopf_param;
+
+	struct device *dev = cookie;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (WARN_ON(!mutex_is_locked(&param->lock)))
+		return -EINVAL;
+
+	if (evt->type != IOMMU_FAULT_PAGE_REQ)
+		/* Not a recoverable page fault */
+		return IOMMU_PAGE_RESP_CONTINUE;
+
+	/*
+	 * As long as we're holding param->lock, the queue can't be unlinked
+	 * from the device and therefore cannot disappear.
+	 */
+	iopf_param = param->iopf_param;
+	if (!iopf_param)
+		return -ENODEV;
+
+	if (!evt->last_req) {
+		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
+		if (!fault)
+			return -ENOMEM;
+
+		fault->evt = *evt;
+		fault->dev = dev;
+
+		/* Non-last request of a group. Postpone until the last one */
+		list_add(&fault->head, &iopf_param->partial);
+
+		return IOMMU_PAGE_RESP_HANDLED;
+	}
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return -ENOMEM;
+
+	group->last_fault.evt = *evt;
+	group->last_fault.dev = dev;
+	INIT_LIST_HEAD(&group->faults);
+	list_add(&group->last_fault.head, &group->faults);
+	INIT_WORK(&group->work, iopf_handle_group);
+
+	/* See if we have partial faults for this group */
+	list_for_each_entry_safe(fault, next, &iopf_param->partial, head) {
+		if (fault->evt.page_req_group_id == evt->page_req_group_id)
+			/* Insert *before* the last fault */
+			list_move(&fault->head, &group->faults);
+	}
+
+	queue_work(iopf_param->queue->wq, &group->work);
+
+	/* Postpone the fault completion */
+	return IOMMU_PAGE_RESP_HANDLED;
+}
+EXPORT_SYMBOL_GPL(iommu_queue_iopf);
+
+/**
+ * iopf_queue_flush_dev - Ensure that all queued faults have been processed
+ * @dev: the endpoint whose faults need to be flushed.
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults for this PASID have been handled, and won't hit the address
+ * space of the next process that uses this PASID.
+ *
+ * Return 0 on success.
+ */
+int iopf_queue_flush_dev(struct device *dev)
+{
+	int ret = 0;
+	struct iopf_queue *queue;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (!param)
+		return -ENODEV;
+
+	/*
+	 * It is incredibly easy to find ourselves in a deadlock situation if
+	 * we're not careful, because we're taking the opposite path as
+	 * iommu_queue_iopf:
+	 *
+	 *   iopf_queue_flush_dev()   |  PRI queue handler
+	 *    lock(mutex)             |   iommu_queue_iopf()
+	 *     queue->flush()         |    lock(mutex)
+	 *      wait PRI queue empty  |
+	 *
+	 * So we can't hold the device param lock while flushing. We don't have
+	 * to, because the queue or the device won't disappear until all flushes
+	 * are finished.
+	 */
+	mutex_lock(&param->lock);
+	if (param->iopf_param) {
+		queue = param->iopf_param->queue;
+	} else {
+		ret = -ENODEV;
+	}
+	mutex_unlock(&param->lock);
+	if (ret)
+		return ret;
+
+	queue->flush(queue->flush_arg, dev);
+
+	/*
+	 * No need to clear the partial list. All PRGs containing the PASID that
+	 * needs to be decommissioned are whole (the device driver made sure of
+	 * it before this function was called). They have been submitted to the
+	 * queue by the above flush().
+	 */
+	flush_workqueue(queue->wq);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
+
+/**
+ * iopf_queue_add_device - Add producer to the fault queue
+ * @queue: IOPF queue
+ * @dev: device to add
+ */
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
+{
+	int ret = -EINVAL;
+	struct iopf_device_param *iopf_param;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (!param)
+		return -ENODEV;
+
+	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
+	if (!iopf_param)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&iopf_param->partial);
+	iopf_param->queue = queue;
+
+	mutex_lock(&param->lock);
+	if (!param->iopf_param) {
+		refcount_inc(&queue->refs);
+		param->iopf_param = iopf_param;
+		ret = 0;
+	}
+	mutex_unlock(&param->lock);
+
+	if (ret)
+		kfree(iopf_param);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_add_device);
+
+/**
+ * iopf_queue_remove_device - Remove producer from fault queue
+ * @dev: device to remove
+ *
+ * The caller must ensure that no more faults are reported for this device, and
+ * that no more flushes are scheduled for this device.
+ *
+ * Note: safe to call unconditionally on a cleanup path, even if the device
+ * isn't registered to any IOPF queue.
+ *
+ * Return 0 if the device was attached to the IOPF queue
+ */
+int iopf_queue_remove_device(struct device *dev)
+{
+	struct iopf_context *fault, *next;
+	struct iopf_device_param *iopf_param;
+	struct iommu_param *param = dev->iommu_param;
+
+	if (!param)
+		return -EINVAL;
+
+	mutex_lock(&param->lock);
+	iopf_param = param->iopf_param;
+	if (iopf_param) {
+		refcount_dec(&iopf_param->queue->refs);
+		param->iopf_param = NULL;
+	}
+	mutex_unlock(&param->lock);
+	if (!iopf_param)
+		return -EINVAL;
+
+	list_for_each_entry_safe(fault, next, &iopf_param->partial, head)
+		kfree(fault);
+
+	/*
+	 * No more flush is scheduled, and the caller removed all bonds from
+	 * this device. unbind() waited until any concurrent mm_exit() finished,
+	 * therefore there is no flush() running anymore and we can free the
+	 * param.
+	 */
+	kfree(iopf_param);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
+
+/**
+ * iopf_queue_alloc - Allocate and initialize a fault queue
+ * @name: a unique string identifying the queue (for workqueue)
+ * @flush: a callback that flushes the low-level queue
+ * @cookie: driver-private data passed to the flush callback
+ *
+ * The callback is called before the workqueue is flushed. The IOMMU driver must
+ * commit all faults that are pending in its low-level queues at the time of the
+ * call, into the IOPF queue (with iommu_report_device_fault). The callback
+ * takes a device pointer as argument, hinting at which endpoint caused the
+ * flush. When the device is NULL, all faults should be committed.
+ */
+struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+	struct iopf_queue *queue;
+
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
+	if (!queue)
+		return NULL;
+
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
+	if (!queue->wq) {
+		kfree(queue);
+		return NULL;
+	}
+
+	queue->flush = flush;
+	queue->flush_arg = cookie;
+	refcount_set(&queue->refs, 1);
+
+	return queue;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_alloc);
+
+/**
+ * iopf_queue_free - Free IOPF queue
+ * @queue: queue to free
+ *
+ * Counterpart to iopf_queue_alloc(). Caller must make sure that all producers
+ * have been removed.
+ */
+void iopf_queue_free(struct iopf_queue *queue)
+{
+
+	/* Caller should have removed all producers first */
+	if (WARN_ON(!refcount_dec_and_test(&queue->refs)))
+		return;
+
+	destroy_workqueue(queue->wq);
+	kfree(queue);
+}
+EXPORT_SYMBOL_GPL(iopf_queue_free);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index faf3390ce89d..fad3a60e1c14 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -461,11 +461,20 @@ struct iommu_fault_param {
 	void *data;
 };
 
+/**
+ * iopf_queue_flush_t - Flush low-level page fault queue
+ *
+ * Report all faults currently pending in the low-level page fault queue
+ */
+struct iopf_queue;
+typedef int (*iopf_queue_flush_t)(void *cookie, struct device *dev);
+
 /**
  * struct iommu_param - collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
  * @sva_param: SVA parameters
+ * @iopf_param: I/O Page Fault queue and data
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  *	struct iommu_group	*iommu_group;
@@ -475,6 +484,7 @@ struct iommu_param {
 	struct mutex lock;
 	struct iommu_fault_param *fault_param;
 	struct iommu_sva_param *sva_param;
+	struct iopf_device_param *iopf_param;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -874,6 +884,12 @@ static inline int iommu_report_device_fault(struct device *dev, struct iommu_fau
 	return 0;
 }
 
+static inline int iommu_page_response(struct device *dev,
+				      struct page_response_msg *msg)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_group_id(struct iommu_group *group)
 {
 	return -ENODEV;
@@ -1038,4 +1054,46 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
 }
 #endif /* CONFIG_IOMMU_SVA */
 
+#ifdef CONFIG_IOMMU_PAGE_FAULT
+extern int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie);
+
+extern int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+extern int iopf_queue_remove_device(struct device *dev);
+extern int iopf_queue_flush_dev(struct device *dev);
+extern struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie);
+extern void iopf_queue_free(struct iopf_queue *queue);
+#else /* CONFIG_IOMMU_PAGE_FAULT */
+static inline int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_add_device(struct iopf_queue *queue,
+					struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_remove_device(struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+	return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+#endif /* CONFIG_IOMMU_PAGE_FAULT */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 08/40] iommu/iopf: Handle mm faults
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 07/40] iommu: Add a page fault handler Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 09/40] iommu/sva: Register page fault handler Jean-Philippe Brucker
                     ` (31 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

When a recoverable page fault is handled by the fault workqueue, find the
associated mm and call handle_mm_fault.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: let IOMMU drivers deal with Stop PASID
---
 drivers/iommu/io-pgfault.c | 86 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 321c00dd3a3d..dd2639e5c03b 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -7,6 +7,7 @@
 
 #include <linux/iommu.h>
 #include <linux/list.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -65,8 +66,65 @@ static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
 static enum page_response_code
 iopf_handle_single(struct iopf_context *fault)
 {
-	/* TODO */
-	return -ENODEV;
+	int ret;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	unsigned int access_flags = 0;
+	unsigned int fault_flags = FAULT_FLAG_REMOTE;
+	struct iommu_fault_event *evt = &fault->evt;
+	enum page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+	if (!evt->pasid_valid)
+		return status;
+
+	mm = iommu_sva_find(evt->pasid);
+	if (!mm)
+		return status;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, evt->addr);
+	if (!vma)
+		/* Unmapped area */
+		goto out_put_mm;
+
+	if (evt->prot & IOMMU_FAULT_READ)
+		access_flags |= VM_READ;
+
+	if (evt->prot & IOMMU_FAULT_WRITE) {
+		access_flags |= VM_WRITE;
+		fault_flags |= FAULT_FLAG_WRITE;
+	}
+
+	if (evt->prot & IOMMU_FAULT_EXEC) {
+		access_flags |= VM_EXEC;
+		fault_flags |= FAULT_FLAG_INSTRUCTION;
+	}
+
+	if (!(evt->prot & IOMMU_FAULT_PRIV))
+		fault_flags |= FAULT_FLAG_USER;
+
+	if (access_flags & ~vma->vm_flags)
+		/* Access fault */
+		goto out_put_mm;
+
+	ret = handle_mm_fault(vma, evt->addr, fault_flags);
+	status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+		IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+	up_read(&mm->mmap_sem);
+
+	/*
+	 * If the process exits while we're handling the fault on its mm, we
+	 * can't do mmput(). exit_mmap() would release the MMU notifier, calling
+	 * iommu_notifier_release(), which has to flush the fault queue that
+	 * we're executing on... So mmput_async() moves the release of the mm to
+	 * another thread, if we're the last user.
+	 */
+	mmput_async(mm);
+
+	return status;
 }
 
 static void iopf_handle_group(struct work_struct *work)
@@ -100,6 +158,30 @@ static void iopf_handle_group(struct work_struct *work)
  * @cookie: struct device, passed to iommu_register_device_fault_handler.
  *
  * Add a fault to the device workqueue, to be handled by mm.
+ *
+ * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard
+ * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't
+ * expect a response. It may be generated when disabling a PASID (issuing a
+ * PASID stop request) by some PCI devices.
+ *
+ * The PASID stop request is triggered by the mm_exit() callback. When the
+ * callback returns from the device driver, no page request is generated for
+ * this PASID anymore and outstanding ones have been pushed to the IOMMU (as per
+ * PCIe 4.0r1.0 - 6.20.1 and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some
+ * PCI devices will wait for all outstanding page requests to come back with a
+ * response before completing the PASID stop request. Others do not wait for
+ * page responses, and instead issue this Stop Marker that tells us when the
+ * PASID can be reallocated.
+ *
+ * It is safe to discard the Stop Marker because it is an optimization.
+ * a. Page requests, which are posted requests, have been flushed to the IOMMU
+ *    when mm_exit() returns,
+ * b. We flush all fault queues after mm_exit() returns and before freeing the
+ *    PASID.
+ *
+ * So even though the Stop Marker might be issued by the device *after* the stop
+ * request completes, outstanding faults will have been dealt with by the time
+ * we free the PASID.
  */
 int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
 {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 09/40] iommu/sva: Register page fault handler
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 08/40] iommu/iopf: Handle mm faults Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 10/40] mm: export symbol mm_access Jean-Philippe Brucker
                     ` (30 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Let users call iommu_sva_device_init() with the IOMMU_SVA_FEAT_IOPF flag,
which enables the I/O Page Fault queue. The IOMMU driver checks if the
device supports a form of page fault, in which case it adds the device to
a fault queue. If the device doesn't support page faults, the IOMMU driver
aborts iommu_sva_device_init().

The fault queue must be flushed before any io_mm is freed, to make sure
that its PASID isn't used in any fault queue and can be reallocated.
Add iopf_queue_flush_dev() calls in a few strategic locations.
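
With this in place, a device driver enables SVA with page faults and binds
an address space roughly as follows (my_mm_exit and drvdata are
placeholders; the calls and signatures are those used elsewhere in this
series):

	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0, my_mm_exit);
	if (ret)
		return ret;

	ret = __iommu_sva_bind_device(dev, current->mm, &pasid,
				      IOMMU_SVA_FEAT_IOPF, drvdata);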

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: new
---
 drivers/iommu/iommu-sva.c | 36 ++++++++++++++++++++++++++++++++----
 drivers/iommu/iommu.c     |  6 +++---
 include/linux/iommu.h     |  2 ++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 5abe0f0b445c..e98b994c15f1 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -441,6 +441,8 @@ static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm
 			dev_WARN(bond->dev, "possible leak of PASID %u",
 				 io_mm->pasid);
 
+		iopf_queue_flush_dev(bond->dev);
+
 		spin_lock(&iommu_sva_lock);
 		next = list_next_entry(bond, mm_head);
 
@@ -518,6 +520,9 @@ static struct mmu_notifier_ops iommu_mmu_notifier = {
  * description. Setting @max_pasid to a non-zero value smaller than this limit
  * overrides it.
  *
+ * If the device should support recoverable I/O Page Faults (e.g. PCI PRI), the
+ * IOMMU_SVA_FEAT_IOPF feature must be requested.
+ *
  * If the driver intends to share process address spaces, it should pass a valid
  * @mm_exit handler. Otherwise @mm_exit can be NULL. After @mm_exit returns, the
  * device must not issue any more transaction with the PASID given as argument.
@@ -546,12 +551,21 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
 	if (!domain || !domain->ops->sva_device_init)
 		return -ENODEV;
 
-	if (features)
+	if (features & ~IOMMU_SVA_FEAT_IOPF)
 		return -EINVAL;
 
+	if (features & IOMMU_SVA_FEAT_IOPF) {
+		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
+							  dev);
+		if (ret)
+			return ret;
+	}
+
 	param = kzalloc(sizeof(*param), GFP_KERNEL);
-	if (!param)
-		return -ENOMEM;
+	if (!param) {
+		ret = -ENOMEM;
+		goto err_remove_handler;
+	}
 
 	param->features		= features;
 	param->max_pasid	= max_pasid;
@@ -584,6 +598,9 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
 err_free_param:
 	kfree(param);
 
+err_remove_handler:
+	iommu_unregister_device_fault_handler(dev);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_device_init);
@@ -593,7 +610,8 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_init);
  * @dev: the device
  *
  * Disable SVA. Device driver should ensure that the device isn't performing any
- * DMA while this function is running.
+ * DMA while this function is running. In addition all faults should have been
+ * flushed to the IOMMU.
  */
 int iommu_sva_device_shutdown(struct device *dev)
 {
@@ -617,6 +635,8 @@ int iommu_sva_device_shutdown(struct device *dev)
 
 	kfree(param);
 
+	iommu_unregister_device_fault_handler(dev);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
@@ -694,6 +714,12 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
 	if (!param || WARN_ON(!domain))
 		return -EINVAL;
 
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iopf_queue_flush_dev(dev);
+
 	/* spin_lock_irq matches the one in wait_event_lock_irq */
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
@@ -721,6 +747,8 @@ void __iommu_sva_unbind_dev_all(struct device *dev)
 	struct iommu_sva_param *param;
 	struct iommu_bond *bond, *next;
 
+	iopf_queue_flush_dev(dev);
+
 	/*
 	 * io_mm_detach_locked might wait, so we shouldn't call it with the dev
 	 * param lock held. It's fine to read sva_param outside the lock because
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 333801e1519c..13f705df0725 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2278,9 +2278,9 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
  * iommu_sva_device_init() must be called first, to initialize the required SVA
  * features. @flags is a subset of these features.
  *
- * The caller must pin down using get_user_pages*() all mappings shared with the
- * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
- * (e.g. copy-on-write).
+ * If IOMMU_SVA_FEAT_IOPF isn't requested, the caller must pin down using
+ * get_user_pages*() all mappings shared with the device. mlock() isn't
+ * sufficient, as it doesn't prevent minor page faults (e.g. copy-on-write).
  *
  * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
  * is returned.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fad3a60e1c14..933100678f64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,8 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
+#define IOMMU_SVA_FEAT_IOPF		(1 << 0)
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 10/40] mm: export symbol mm_access
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 09/40] iommu/sva: Register page fault handler Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 11/40] mm: export symbol find_get_task_by_vpid Jean-Philippe Brucker
                     ` (29 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, felix.kuehling-5C7GfCeVMHo,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Some devices can access process address spaces directly. When creating such
a bond, to check that the process controlling the device is allowed to
access the target address space, the device driver uses mm_access(). Since
the drivers (in this case VFIO) can be built as a module, export the
mm_access symbol.
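
The expected pattern, sketched below, is to pass a ptrace access mode and
release the reference with mmput(); the mode shown is an assumption for
illustration:

	struct mm_struct *mm;

	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
	if (IS_ERR_OR_NULL(mm))
		return mm ? PTR_ERR(mm) : -ESRCH;

	/* ... create the bond with the device here ... */

	mmput(mm);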

Cc: felix.kuehling-5C7GfCeVMHo@public.gmane.org
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
This patch was already sent last year for AMD KFD. I'm resending it for
VFIO, trying to address Andrew Morton's request to comment the exported
function: http://lkml.iu.edu/hypermail/linux/kernel/1705.2/06774.html
---
 kernel/fork.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index a5d21c42acfc..1062f7450e97 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1098,6 +1098,19 @@ struct mm_struct *get_task_mm(struct task_struct *task)
 }
 EXPORT_SYMBOL_GPL(get_task_mm);
 
+/**
+ * mm_access - check access permission to a task and acquire a reference to
+ * its mm.
+ * @task: target task
+ * @mode: selects type of access and caller credentials
+ *
+ * Return the task's mm on success, or %NULL if it cannot be accessed.
+ *
+ * Check if the caller is allowed to read or write the target task's pages.
+ * @mode describes the access mode and credentials using ptrace access flags.
+ * See ptrace_may_access() for more details. On success, a reference to the mm
+ * is taken.
+ */
 struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 {
 	struct mm_struct *mm;
@@ -1117,6 +1130,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 
 	return mm;
 }
+EXPORT_SYMBOL_GPL(mm_access);
 
 static void complete_vfork_done(struct task_struct *tsk)
 {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 11/40] mm: export symbol find_get_task_by_vpid
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 10/40] mm: export symbol mm_access Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 12/40] mm: export symbol mmput_async Jean-Philippe Brucker
                     ` (28 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Userspace drivers implemented with VFIO might want to bind sub-processes
to their devices. In a VFIO ioctl, they provide a pid that is used to find
a task and its mm. Since VFIO can be built as a module, export the
find_get_task_by_vpid symbol.
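
In the ioctl path this pairs with mm_access() from the previous patch,
roughly as in this sketch:

	struct task_struct *task;
	struct mm_struct *mm;

	task = find_get_task_by_vpid(pid);
	if (!task)
		return -ESRCH;

	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
	put_task_struct(task);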

Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
Here I didn't add a comment because neighbouring functions that get
exported don't have a comment either, and the name seems fairly clear.
---
 kernel/pid.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/pid.c b/kernel/pid.c
index 157fe4b19971..0b6d3201c42d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -355,6 +355,7 @@ struct task_struct *find_get_task_by_vpid(pid_t nr)
 
 	return task;
 }
+EXPORT_SYMBOL_GPL(find_get_task_by_vpid);
 
 struct pid *get_task_pid(struct task_struct *task, enum pid_type type)
 {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 12/40] mm: export symbol mmput_async
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 11/40] mm: export symbol find_get_task_by_vpid Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing Jean-Philippe Brucker
                     ` (27 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

In some cases releasing an mm bound to a device might invoke an exit
handler that takes a lock already held by the function calling mmput().
This is the case for VFIO, which needs to call mmput_async to avoid a
deadlock. Other drivers using SVA might follow. Since they can be built as
modules, export the mmput_async symbol.
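
The resulting pattern, which the VFIO patch below uses when replaying
binds, looks like this sketch:

	mutex_lock(&iommu->lock);
	/*
	 * A plain mmput() here could drop the last reference and run the
	 * mm_exit() handler, which takes iommu->lock and would deadlock.
	 * Defer the release to another thread instead.
	 */
	mmput_async(vfio_mm->mm);
	mutex_unlock(&iommu->lock);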

Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 kernel/fork.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index 1062f7450e97..bf05d188c8de 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -999,6 +999,7 @@ void mmput_async(struct mm_struct *mm)
 		schedule_work(&mm->async_put_work);
 	}
 }
+EXPORT_SYMBOL_GPL(mmput_async);
 #endif
 
 /**
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 12/40] mm: export symbol mmput_async Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-14-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 14/40] dt-bindings: document stall and PASID properties for IOMMU masters Jean-Philippe Brucker
                     ` (26 subsequent siblings)
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
bond between a container and a process address space, identified by a
Process Address Space ID (PASID). Devices in the container append this
PASID to DMA transactions in order to access the process' address space.
The process page tables are shared with the IOMMU, and mechanisms such as
PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
bond created with VFIO_IOMMU_BIND_PROCESS.
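
For orientation, userspace usage would look roughly like the sketch below.
The exact layout of the argument structure lives in the vfio.h hunk, so
the struct and field names used here are illustrative assumptions:

	struct vfio_iommu_type1_bind_process bind = {
		.argsz	= sizeof(bind),		/* assumed field */
		.pid	= target_pid,		/* assumed field */
	};

	if (ioctl(container_fd, VFIO_IOMMU_BIND_PROCESS, &bind))
		err(1, "bind");

	/* Program bind.pasid (assumed field) into the device, then DMA */

	ioctl(container_fd, VFIO_IOMMU_UNBIND_PROCESS, &bind);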

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2:
* Simplify mm_exit
* Can be built as module again
---
 drivers/vfio/vfio_iommu_type1.c | 449 ++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       |  76 ++++++
 2 files changed, 504 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5c212bf29640..2902774062b8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,6 +30,7 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/ptrace.h>
 #include <linux/rbtree.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
@@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct list_head	mm_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
@@ -90,6 +92,14 @@ struct vfio_dma {
 struct vfio_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
+	bool			sva_enabled;
+};
+
+struct vfio_mm {
+#define VFIO_PASID_INVALID	(-1)
+	int			pasid;
+	struct mm_struct	*mm;
+	struct list_head	next;
 };
 
 /*
@@ -1238,6 +1248,164 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 	return 0;
 }
 
+static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
+{
+	struct vfio_mm *vfio_mm;
+	struct vfio_iommu *iommu = data;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		if (vfio_mm->pasid == pasid) {
+			list_del(&vfio_mm->next);
+			kfree(vfio_mm);
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+
+	return 0;
+}
+
+static int vfio_iommu_sva_init(struct device *dev, void *data)
+{
+	return iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0,
+				     vfio_iommu_mm_exit);
+}
+
+static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
+{
+	iommu_sva_device_shutdown(dev);
+
+	return 0;
+}
+
+struct vfio_iommu_sva_bind_data {
+	struct vfio_mm *vfio_mm;
+	struct vfio_iommu *iommu;
+	int count;
+};
+
+static int vfio_iommu_sva_bind_dev(struct device *dev, void *data)
+{
+	struct vfio_iommu_sva_bind_data *bind_data = data;
+
+	/* Multi-device groups aren't supported for SVA */
+	if (bind_data->count++)
+		return -EINVAL;
+
+	return __iommu_sva_bind_device(dev, bind_data->vfio_mm->mm,
+				       &bind_data->vfio_mm->pasid,
+				       IOMMU_SVA_FEAT_IOPF, bind_data->iommu);
+}
+
+static int vfio_iommu_sva_unbind_dev(struct device *dev, void *data)
+{
+	struct vfio_mm *vfio_mm = data;
+
+	return __iommu_sva_unbind_device(dev, vfio_mm->pasid);
+}
+
+static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
+				 struct vfio_group *group,
+				 struct vfio_mm *vfio_mm)
+{
+	int ret;
+	bool enabled_sva = false;
+	struct vfio_iommu_sva_bind_data data = {
+		.vfio_mm	= vfio_mm,
+		.iommu		= iommu,
+		.count		= 0,
+	};
+
+	if (!group->sva_enabled) {
+		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
+					       vfio_iommu_sva_init);
+		if (ret)
+			return ret;
+
+		group->sva_enabled = enabled_sva = true;
+	}
+
+	ret = iommu_group_for_each_dev(group->iommu_group, &data,
+				       vfio_iommu_sva_bind_dev);
+	if (ret && data.count > 1)
+		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
+					 vfio_iommu_sva_unbind_dev);
+	if (ret && enabled_sva) {
+		iommu_group_for_each_dev(group->iommu_group, NULL,
+					 vfio_iommu_sva_shutdown);
+		group->sva_enabled = false;
+	}
+
+	return ret;
+}
+
+static void vfio_iommu_unbind_group(struct vfio_group *group,
+				    struct vfio_mm *vfio_mm)
+{
+	iommu_group_for_each_dev(group->iommu_group, vfio_mm,
+				 vfio_iommu_sva_unbind_dev);
+}
+
+static void vfio_iommu_unbind(struct vfio_iommu *iommu,
+			      struct vfio_mm *vfio_mm)
+{
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+
+	list_for_each_entry(domain, &iommu->domain_list, next)
+		list_for_each_entry(group, &domain->group_list, next)
+			vfio_iommu_unbind_group(group, vfio_mm);
+}
+
+static int vfio_iommu_replay_bind(struct vfio_iommu *iommu,
+				  struct vfio_group *group)
+{
+	int ret = 0;
+	struct vfio_mm *vfio_mm;
+
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		/*
+		 * Ensure that mm doesn't exit while we're binding it to the new
+		 * group. It may already have exited, and the mm_exit notifier
+		 * might be waiting on the IOMMU mutex to remove this vfio_mm
+		 * from the list.
+		 */
+		if (!mmget_not_zero(vfio_mm->mm))
+			continue;
+		ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
+		/*
+		 * Use async to avoid triggering an mm_exit callback right away,
+		 * which would block on the mutex that we're holding.
+		 */
+		mmput_async(vfio_mm->mm);
+
+		if (ret)
+			goto out_unbind;
+	}
+
+	return 0;
+
+out_unbind:
+	list_for_each_entry_continue_reverse(vfio_mm, &iommu->mm_list, next)
+		vfio_iommu_unbind_group(group, vfio_mm);
+
+	return ret;
+}
+
+static void vfio_iommu_free_all_mm(struct vfio_iommu *iommu)
+{
+	struct vfio_mm *vfio_mm, *next;
+
+	/*
+	 * No need for unbind() here. Since all groups are detached from this
+	 * iommu, bonds have been removed.
+	 */
+	list_for_each_entry_safe(vfio_mm, next, &iommu->mm_list, next)
+		kfree(vfio_mm);
+	INIT_LIST_HEAD(&iommu->mm_list);
+}
+
 /*
  * We change our unmap behavior slightly depending on whether the IOMMU
  * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
@@ -1313,6 +1481,44 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
 	return ret;
 }
 
+static int vfio_iommu_try_attach_group(struct vfio_iommu *iommu,
+				       struct vfio_group *group,
+				       struct vfio_domain *cur_domain,
+				       struct vfio_domain *new_domain)
+{
+	struct iommu_group *iommu_group = group->iommu_group;
+
+	/*
+	 * Try to match an existing compatible domain.  We don't want to
+	 * preclude an IOMMU driver supporting multiple bus_types and being
+	 * able to include different bus_types in the same IOMMU domain, so
+	 * we test whether the domains use the same iommu_ops rather than
+	 * testing if they're on the same bus_type.
+	 */
+	if (new_domain->domain->ops != cur_domain->domain->ops ||
+	    new_domain->prot != cur_domain->prot)
+		return 1;
+
+	iommu_detach_group(cur_domain->domain, iommu_group);
+
+	if (iommu_attach_group(new_domain->domain, iommu_group))
+		goto out_reattach;
+
+	if (vfio_iommu_replay_bind(iommu, group))
+		goto out_detach;
+
+	return 0;
+
+out_detach:
+	iommu_detach_group(new_domain->domain, iommu_group);
+
+out_reattach:
+	if (iommu_attach_group(cur_domain->domain, iommu_group))
+		return -EINVAL;
+
+	return 1;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -1410,28 +1616,16 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
 		domain->prot |= IOMMU_CACHE;
 
-	/*
-	 * Try to match an existing compatible domain.  We don't want to
-	 * preclude an IOMMU driver supporting multiple bus_types and being
-	 * able to include different bus_types in the same IOMMU domain, so
-	 * we test whether the domains use the same iommu_ops rather than
-	 * testing if they're on the same bus_type.
-	 */
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		if (d->domain->ops == domain->domain->ops &&
-		    d->prot == domain->prot) {
-			iommu_detach_group(domain->domain, iommu_group);
-			if (!iommu_attach_group(d->domain, iommu_group)) {
-				list_add(&group->next, &d->group_list);
-				iommu_domain_free(domain->domain);
-				kfree(domain);
-				mutex_unlock(&iommu->lock);
-				return 0;
-			}
-
-			ret = iommu_attach_group(domain->domain, iommu_group);
-			if (ret)
-				goto out_domain;
+		ret = vfio_iommu_try_attach_group(iommu, group, domain, d);
+		if (ret < 0) {
+			goto out_domain;
+		} else if (!ret) {
+			list_add(&group->next, &d->group_list);
+			iommu_domain_free(domain->domain);
+			kfree(domain);
+			mutex_unlock(&iommu->lock);
+			return 0;
 		}
 	}
 
@@ -1442,6 +1636,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_detach;
 
+	ret = vfio_iommu_replay_bind(iommu, group);
+	if (ret)
+		goto out_detach;
+
 	if (resv_msi) {
 		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
 		if (ret)
@@ -1547,6 +1745,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 			continue;
 
 		iommu_detach_group(domain->domain, iommu_group);
+		if (group->sva_enabled) {
+			iommu_group_for_each_dev(iommu_group, NULL,
+						 vfio_iommu_sva_shutdown);
+			group->sva_enabled = false;
+		}
 		list_del(&group->next);
 		kfree(group);
 		/*
@@ -1562,6 +1765,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 					vfio_iommu_unmap_unpin_all(iommu);
 				else
 					vfio_iommu_unmap_unpin_reaccount(iommu);
+				vfio_iommu_free_all_mm(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -1596,6 +1800,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	}
 
 	INIT_LIST_HEAD(&iommu->domain_list);
+	INIT_LIST_HEAD(&iommu->mm_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
@@ -1630,6 +1835,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
 		kfree(iommu->external_domain);
 	}
 
+	vfio_iommu_free_all_mm(iommu);
 	vfio_iommu_unmap_unpin_all(iommu);
 
 	list_for_each_entry_safe(domain, domain_tmp,
@@ -1658,6 +1864,169 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static struct mm_struct *vfio_iommu_get_mm_by_vpid(pid_t vpid)
+{
+	struct mm_struct *mm;
+	struct task_struct *task;
+
+	task = find_get_task_by_vpid(vpid);
+	if (!task)
+		return ERR_PTR(-ESRCH);
+
+	/* Ensure that current has RW access to the mm */
+	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+	put_task_struct(task);
+
+	if (!mm)
+		return ERR_PTR(-ESRCH);
+
+	return mm;
+}
+
+static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
+					  void __user *arg,
+					  struct vfio_iommu_type1_bind *bind)
+{
+	struct vfio_iommu_type1_bind_process params;
+	struct vfio_domain *domain;
+	struct vfio_group *group;
+	struct vfio_mm *vfio_mm;
+	struct mm_struct *mm;
+	unsigned long minsz;
+	int ret = 0;
+
+	minsz = sizeof(*bind) + sizeof(params);
+	if (bind->argsz < minsz)
+		return -EINVAL;
+
+	arg += sizeof(*bind);
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~VFIO_IOMMU_BIND_PID)
+		return -EINVAL;
+
+	if (params.flags & VFIO_IOMMU_BIND_PID) {
+		mm = vfio_iommu_get_mm_by_vpid(params.pid);
+		if (IS_ERR(mm))
+			return PTR_ERR(mm);
+	} else {
+		mm = get_task_mm(current);
+		if (!mm)
+			return -EINVAL;
+	}
+
+	mutex_lock(&iommu->lock);
+	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		if (vfio_mm->mm == mm) {
+			params.pasid = vfio_mm->pasid;
+			ret = copy_to_user(arg, &params, sizeof(params)) ?
+				-EFAULT : 0;
+			goto out_unlock;
+		}
+	}
+
+	vfio_mm = kzalloc(sizeof(*vfio_mm), GFP_KERNEL);
+	if (!vfio_mm) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	vfio_mm->mm = mm;
+	vfio_mm->pasid = VFIO_PASID_INVALID;
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		list_for_each_entry(group, &domain->group_list, next) {
+			ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
+			if (ret)
+				goto out_unbind;
+		}
+	}
+
+	list_add(&vfio_mm->next, &iommu->mm_list);
+
+	params.pasid = vfio_mm->pasid;
+	ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
+	if (ret)
+		goto out_delete;
+
+	mutex_unlock(&iommu->lock);
+	mmput(mm);
+	return 0;
+
+out_delete:
+	list_del(&vfio_mm->next);
+
+out_unbind:
+	/* Undo all binds that already succeeded */
+	vfio_iommu_unbind(iommu, vfio_mm);
+	kfree(vfio_mm);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	mmput(mm);
+	return ret;
+}
+
+static long vfio_iommu_type1_unbind_process(struct vfio_iommu *iommu,
+					    void __user *arg,
+					    struct vfio_iommu_type1_bind *bind)
+{
+	int ret = -EINVAL;
+	unsigned long minsz;
+	struct mm_struct *mm;
+	struct vfio_mm *vfio_mm;
+	struct vfio_iommu_type1_bind_process params;
+
+	minsz = sizeof(*bind) + sizeof(params);
+	if (bind->argsz < minsz)
+		return -EINVAL;
+
+	arg += sizeof(*bind);
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~VFIO_IOMMU_BIND_PID)
+		return -EINVAL;
+
+	/*
+	 * We can't simply call unbind with the PASID, because the process might
+	 * have died and the PASID might have been reallocated to another
+	 * process. Instead we need to fetch that process mm by PID again to
+	 * make sure we remove the right vfio_mm.
+	 */
+	if (params.flags & VFIO_IOMMU_BIND_PID) {
+		mm = vfio_iommu_get_mm_by_vpid(params.pid);
+		if (IS_ERR(mm))
+			return PTR_ERR(mm);
+	} else {
+		mm = get_task_mm(current);
+		if (!mm)
+			return -EINVAL;
+	}
+
+	ret = -ESRCH;
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		if (vfio_mm->mm == mm) {
+			vfio_iommu_unbind(iommu, vfio_mm);
+			list_del(&vfio_mm->next);
+			kfree(vfio_mm);
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+	mmput(mm);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1728,6 +2097,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+
+	} else if (cmd == VFIO_IOMMU_BIND) {
+		struct vfio_iommu_type1_bind bind;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		switch (bind.flags) {
+		case VFIO_IOMMU_BIND_PROCESS:
+			return vfio_iommu_type1_bind_process(iommu,
+					(void __user *)arg, &bind);
+		default:
+			return -EINVAL;
+		}
+
+	} else if (cmd == VFIO_IOMMU_UNBIND) {
+		struct vfio_iommu_type1_bind bind;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		switch (bind.flags) {
+		case VFIO_IOMMU_BIND_PROCESS:
+			return vfio_iommu_type1_unbind_process(iommu,
+					(void __user *)arg, &bind);
+		default:
+			return -EINVAL;
+		}
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1aa7b82e8169..dc07752c8fe8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -665,6 +665,82 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/*
+ * VFIO_IOMMU_BIND_PROCESS
+ *
+ * Allocate a PASID for a process address space, and use it to attach this
+ * process to all devices in the container. Devices can then tag their DMA
+ * traffic with the returned @pasid to perform transactions on the associated
+ * virtual address space. Mapping and unmapping buffers is performed by standard
+ * functions such as mmap and malloc.
+ *
+ * If the VFIO_IOMMU_BIND_PID flag is set, @pid contains the pid of a foreign
+ * process to bind. Otherwise the current task is bound. Given that the caller
+ * owns the device, setting this flag grants the caller read and write
+ * permissions on the entire address space of the foreign process described by
+ * @pid. Therefore, permission to perform the bind operation on a foreign
+ * process is governed by the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS
+ * check. See ptrace(2) for more information.
+ *
+ * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
+ * ID is unique to a process and can be used on all devices in the container.
+ *
+ * On fork, the child inherits the device fd and can use the bonds set up by
+ * its parent. Consequently, the child has R/W access to the address spaces
+ * bound by its parent. After an exec, the device fd is closed and the child
+ * doesn't have access to the address space anymore.
+ *
+ * To remove a bond between process and container, the VFIO_IOMMU_UNBIND ioctl
+ * is issued with the same parameters. If a pid was specified in
+ * VFIO_IOMMU_BIND, it should also be present for VFIO_IOMMU_UNBIND. Otherwise
+ * the current task is unbound from the container.
+ */
+struct vfio_iommu_type1_bind_process {
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PID		(1 << 0)
+	__u32	pasid;
+	__s32	pid;
+};
+
+/*
+ * The only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which
+ * takes a struct vfio_iommu_type1_bind_process in @data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
+	__u8	data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_type1_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_type1_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 14/40] dt-bindings: document stall and PASID properties for IOMMU masters
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (12 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 15/40] iommu/of: Add stall and pasid properties to iommu_fwspec Jean-Philippe Brucker
                     ` (25 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: mark.rutland-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	robh-DgEjT+Ai2ygdnm+yROfE0A, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

On ARM systems, some platform devices behind an IOMMU may support stall
and PASID features. Stall is the ability to recover from page faults, and
PASID offers multiple process address spaces to the device. Together they
allow a device to do paging. Let the firmware tell us when a device
supports stall and PASID.
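
A (hypothetical) master node using both properties could look like this;
the compatible string and IDs are made up for illustration:

    master@0 {
            compatible = "vendor,sva-capable-master";
            iommus = <&smmu 0x100>;
            /* The device can wait for stalled transactions */
            dma-can-stall;
            /* The device can tag its DMA with 8-bit PASIDs */
            pasid-num-bits = <8>;
    };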

Cc: robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: mark.rutland-5wv7dgnIgG8@public.gmane.org
Reviewed-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 .../devicetree/bindings/iommu/iommu.txt       | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
index 5a8b4624defc..ec3ea3a97b77 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -86,6 +86,30 @@ have a means to turn off translation. But it is invalid in such cases to
 disable the IOMMU's device tree node in the first place because it would
 prevent any driver from properly setting up the translations.
 
+Optional properties:
+--------------------
+- dma-can-stall: When present, the master can wait for a transaction to
+  complete for an indefinite amount of time. Upon a translation fault some
+  IOMMUs, instead of aborting the translation immediately, may first
+  notify the driver and keep the transaction in flight. This allows the OS
+  to inspect the fault and, for example, make physical pages resident
+  before updating the mappings and completing the transaction. Such an
+  IOMMU accepts a limited number of simultaneous stalled transactions
+  before having to either put back-pressure on the master or abort new
+  faulting transactions.
+
+  Firmware has to opt in to stalling, because most buses and masters don't
+  support it. In particular it isn't compatible with PCI, where
+  transactions have to complete within a time limit. More generally it
+  won't work in systems and masters that haven't been designed for
+  stalling. For example, in order to handle a stalled transaction the OS
+  may attempt to retrieve pages from secondary storage in a stalled
+  domain, leading to a deadlock.
+
+- pasid-num-bits: The number of PASID bits implemented by the master, for
+  tagging DMA transactions with an address space identifier. Defaults to
+  0, which means that the device only has one address space.
+
 
 Notes:
 ======
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 15/40] iommu/of: Add stall and pasid properties to iommu_fwspec
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (13 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 14/40] dt-bindings: document stall and PASID properties for IOMMU masters Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
                     ` (24 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Add stall and pasid properties to iommu_fwspec, and fill them when the
dma-can-stall and pasid-num-bits properties are present in the device tree.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/of_iommu.c | 12 ++++++++++++
 include/linux/iommu.h    |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5c36a8b7656a..9b6b55f3cccd 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -204,6 +204,18 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 			if (err)
 				break;
 		}
+
+		fwspec = dev->iommu_fwspec;
+		if (!err && fwspec) {
+			const __be32 *prop;
+
+			if (of_get_property(master_np, "dma-can-stall", NULL))
+				fwspec->can_stall = true;
+
+			prop = of_get_property(master_np, "pasid-num-bits", NULL);
+			if (prop)
+				fwspec->num_pasid_bits = be32_to_cpu(*prop);
+		}
 	}
 
 	/*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 933100678f64..bcce44455117 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -656,6 +656,8 @@ struct iommu_fwspec {
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
 	unsigned int		num_ids;
+	unsigned int		num_pasid_bits;
+	bool			can_stall;
 	u32			ids[1];
 };
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (14 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 15/40] iommu/of: Add stall and pasid properties to iommu_fwspec Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-17-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices Jean-Philippe Brucker
                     ` (23 subsequent siblings)
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

To enable address space sharing with the IOMMU, introduce mm_context_get()
and mm_context_put(), which pin down a context and ensure that it will keep
its ASID after a rollover.

Pinning is necessary because a device constantly needs a valid ASID,
unlike tasks that only require one when running. Without pinning, we would
need to notify the IOMMU when we're about to use a new ASID for a task,
and it would get complicated when a new task is assigned a shared ASID.
Consider the following scenario with no ASID pinned:

1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
   We would now have to immediately generate a new ASID for t1, notify
   the IOMMU, and finally enable task tn. We are holding the lock during
   all that time, since we can't afford having another CPU trigger a
   rollover. The IOMMU issues invalidation commands that can take tens of
   milliseconds.

It gets needlessly complicated. All we wanted to do was schedule task tn,
which has no business with the IOMMU. By letting the IOMMU pin tasks when
needed, we avoid stalling the slow path, and let the pinning fail when
we're out of shareable ASIDs.

After a rollover, the allocator expects at least one ASID to be available
in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS -
1) is the maximum number of ASIDs that can be shared with the IOMMU.
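
A rough sketch of the intended use from an IOMMU driver (the caller is
hypothetical; per the code below, mm_context_get() returns 0 when no
shareable ASID is left):

    unsigned long asid;

    /* Pin the ASID before sharing the mm with the device */
    asid = mm_context_get(mm);
    if (!asid)
            return -ENOSPC;

    /* ... install the ASID in the device's context descriptor ... */

    /* Once the bond is removed, release the pin */
    mm_context_put(mm);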

Cc: catalin.marinas-5wv7dgnIgG8@public.gmane.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: TLC found a bug in my code :) It was a bit silly.

When updating an mm's context after a rollover, we check if the ASID is
pinned. If it is, then we can reuse it. But even then we do need to update
the generation in the reserved_asid map. V1 didn't do that, so what
happened was:

1. A task t1 is running with ASID (gen=1, asid=1) on CPU1 all along.
2. CPU2 triggers a rollover, but since t1 is running it keeps its ASID.
3. ASID 1 is pinned. t1 is scheduled on CPU2. The ASID allocator sees the
   ASID pinned, and skips the update of reserved_asids. t1 now has ASID
   (2, 1)
4. ASID 1 is unpinned. Another rollover. t1 is scheduled on CPU2. Since it
   is still running on CPU1, the allocator should reuse its ASID, but as
   it looks for ASID (2, 1) in reserved_asids, it finds (1, 1), and
   concludes that the task needs a new ASID. Whoops.

The fix is simple: check and update reserved_asids *before* checking for
pinned ASIDs. The bug was found this afternoon (after a 4h run), and there
probably will be more. I restarted the validation but it might take a
while or never finish -- I had to stop the penultimate run after 2 weeks,
the parameters were too large. The last successful run was with only two
generations and took 4:30 hours (on 4 Xeon E5-2660v4). This bug was found
with 3 generations and a single pinned task.

You can find the asidalloc changes for kernel-tla here, temporarily:
http://jpbrucker.net/git/kernel-tla/commit/?id=b70361
http://jpbrucker.net/git/kernel-tla/commit/?id=f5413d
---
 arch/arm64/include/asm/mmu.h         |  1 +
 arch/arm64/include/asm/mmu_context.h | 11 +++-
 arch/arm64/mm/context.c              | 92 ++++++++++++++++++++++++++--
 3 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index dd320df0d026..dcf30e43af5e 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -27,6 +27,7 @@
 
 typedef struct {
 	atomic64_t	id;
+	unsigned long	pinned;
 	void		*vdso;
 	unsigned long	flags;
 } mm_context_t;
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 39ec0b8a689e..0eb3f8cc3c9b 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -168,7 +168,13 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp)
 #define destroy_context(mm)		do { } while(0)
 void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
+static inline int
+init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	atomic64_set(&mm->context.id, 0);
+	mm->context.pinned = 0;
+	return 0;
+}
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
@@ -241,6 +247,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
+unsigned long mm_context_get(struct mm_struct *mm);
+void mm_context_put(struct mm_struct *mm);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* !__ASM_MMU_CONTEXT_H */
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 301417ae2ba8..e605adbad92c 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -37,6 +37,10 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 static cpumask_t tlb_flush_pending;
 
+static unsigned long max_pinned_asids;
+static unsigned long nr_pinned_asids;
+static unsigned long *pinned_asid_map;
+
 #define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
 #define ASID_FIRST_VERSION	(1UL << asid_bits)
 
@@ -88,13 +92,16 @@ void verify_cpu_asid_bits(void)
 	}
 }
 
+#define asid_gen_match(asid) \
+	(!(((asid) ^ atomic64_read(&asid_generation)) >> asid_bits))
+
 static void flush_context(unsigned int cpu)
 {
 	int i;
 	u64 asid;
 
 	/* Update the list of reserved ASIDs and the ASID bitmap. */
-	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+	bitmap_copy(asid_map, pinned_asid_map, NUM_USER_ASIDS);
 
 	for_each_possible_cpu(i) {
 		asid = atomic64_xchg_relaxed(&per_cpu(active_asids, i), 0);
@@ -158,6 +165,14 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 		if (check_update_reserved_asid(asid, newasid))
 			return newasid;
 
+		/*
+		 * If it is pinned, we can keep using it. Note that reserved
+		 * takes priority, because even if it is also pinned, we need
+		 * to update the generation in reserved_asids.
+		 */
+		if (mm->context.pinned)
+			return newasid;
+
 		/*
 		 * We had a valid ASID in a previous life, so try to re-use
 		 * it if possible.
@@ -213,8 +228,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	 *   because atomic RmWs are totally ordered for a given location.
 	 */
 	old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
-	if (old_active_asid &&
-	    !((asid ^ atomic64_read(&asid_generation)) >> asid_bits) &&
+	if (old_active_asid && asid_gen_match(asid) &&
 	    atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
 				     old_active_asid, asid))
 		goto switch_mm_fastpath;
@@ -222,7 +236,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
 	/* Check that our ASID belongs to the current generation. */
 	asid = atomic64_read(&mm->context.id);
-	if ((asid ^ atomic64_read(&asid_generation)) >> asid_bits) {
+	if (!asid_gen_match(asid)) {
 		asid = new_context(mm, cpu);
 		atomic64_set(&mm->context.id, asid);
 	}
@@ -245,6 +259,63 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
+unsigned long mm_context_get(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	asid = atomic64_read(&mm->context.id);
+
+	if (mm->context.pinned) {
+		mm->context.pinned++;
+		asid &= ~ASID_MASK;
+		goto out_unlock;
+	}
+
+	if (nr_pinned_asids >= max_pinned_asids) {
+		asid = 0;
+		goto out_unlock;
+	}
+
+	if (!asid_gen_match(asid)) {
+		/*
+		 * We went through one or more rollovers since that ASID was
+		 * used. Ensure that it is still valid, or generate a new one.
+		 * The cpu argument isn't used by new_context.
+		 */
+		asid = new_context(mm, 0);
+		atomic64_set(&mm->context.id, asid);
+	}
+
+	asid &= ~ASID_MASK;
+
+	nr_pinned_asids++;
+	__set_bit(asid2idx(asid), pinned_asid_map);
+	mm->context.pinned++;
+
+out_unlock:
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+
+	return asid;
+}
+
+void mm_context_put(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid = atomic64_read(&mm->context.id) & ~ASID_MASK;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	if (--mm->context.pinned == 0) {
+		__clear_bit(asid2idx(asid), pinned_asid_map);
+		nr_pinned_asids--;
+	}
+
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+}
+
 /* Errata workaround post TTBRx_EL1 update. */
 asmlinkage void post_ttbr_update_workaround(void)
 {
@@ -269,6 +340,19 @@ static int asids_init(void)
 		panic("Failed to allocate bitmap for %lu ASIDs\n",
 		      NUM_USER_ASIDS);
 
+	pinned_asid_map = kzalloc(BITS_TO_LONGS(NUM_USER_ASIDS)
+				  * sizeof(*pinned_asid_map), GFP_KERNEL);
+	if (!pinned_asid_map)
+		panic("Failed to allocate pinned bitmap\n");
+
+	/*
+	 * We assume that an ASID is always available after a rollover. This
+	 * means that even if all CPUs have a reserved ASID, there still is at
+	 * least one slot available in the asid map.
+	 */
+	max_pinned_asids = NUM_USER_ASIDS - num_possible_cpus() - 2;
+	nr_pinned_asids = 0;
+
 	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
 	return 0;
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (15 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-18-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 18/40] iommu/io-pgtable-arm: Factor out ARM LPAE register defines Jean-Philippe Brucker
                     ` (22 subsequent siblings)
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

When removing a mapping from a domain, we need to send an invalidation to
all devices that might have stored it in their Address Translation Cache
(ATC). In addition, when updating the context descriptor of a live domain,
we'll need to send invalidations for all devices attached to it.

Maintain a list of devices in each domain, protected by a spinlock. It is
updated every time we attach or detach devices to and from domains.

It needs to be a spinlock because we'll invalidate ATC entries from
within hardirq-safe contexts, but it may be possible to relax the read
side with RCU later.
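
For instance, an ATC invalidation path later in the series would walk the
list along these lines (sketch only; arm_smmu_atc_inv_master() is a
placeholder name and cmd is assumed to be a prepared invalidation command):

    unsigned long flags;
    struct arm_smmu_master_data *master;

    spin_lock_irqsave(&smmu_domain->devices_lock, flags);
    list_for_each_entry(master, &smmu_domain->devices, list)
            arm_smmu_atc_inv_master(master, &cmd); /* placeholder helper */
    spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);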

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1d647104bccc..c892f012fb43 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -595,6 +595,11 @@ struct arm_smmu_device {
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
 	struct arm_smmu_strtab_ent	ste;
+
+	struct arm_smmu_domain		*domain;
+	struct list_head		list; /* domain->devices */
+
+	struct device			*dev;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -618,6 +623,9 @@ struct arm_smmu_domain {
 	};
 
 	struct iommu_domain		domain;
+
+	struct list_head		devices;
+	spinlock_t			devices_lock;
 };
 
 struct arm_smmu_option_prop {
@@ -1470,6 +1478,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	}
 
 	mutex_init(&smmu_domain->init_mutex);
+	INIT_LIST_HEAD(&smmu_domain->devices);
+	spin_lock_init(&smmu_domain->devices_lock);
+
 	return &smmu_domain->domain;
 }
 
@@ -1685,7 +1696,17 @@ static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
 
 static void arm_smmu_detach_dev(struct device *dev)
 {
+	unsigned long flags;
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+	struct arm_smmu_domain *smmu_domain = master->domain;
+
+	if (smmu_domain) {
+		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+		list_del(&master->list);
+		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+		master->domain = NULL;
+	}
 
 	master->ste.assigned = false;
 	arm_smmu_install_ste_for_dev(dev->iommu_fwspec);
@@ -1694,6 +1715,7 @@ static void arm_smmu_detach_dev(struct device *dev)
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
+	unsigned long flags;
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_master_data *master;
@@ -1729,6 +1751,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	}
 
 	ste->assigned = true;
+	master->domain = smmu_domain;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_add(&master->list, &smmu_domain->devices);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_BYPASS) {
 		ste->s1_cfg = NULL;
@@ -1847,6 +1874,7 @@ static int arm_smmu_add_device(struct device *dev)
 			return -ENOMEM;
 
 		master->smmu = smmu;
+		master->dev = dev;
 		fwspec->iommu_priv = master;
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 18/40] iommu/io-pgtable-arm: Factor out ARM LPAE register defines
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (16 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 19/40] iommu: Add generic PASID table library Jean-Philippe Brucker
                     ` (21 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

For SVA, we'll need to extract CPU page table information and mirror it in
the substream setup. Move relevant defines to a common header.

Fix TCR_SZ_MASK while we're at it: T0SZ is a 6-bit field, so the mask
should be 0x3f, not 0xf.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 MAINTAINERS                    |  3 +-
 drivers/iommu/io-pgtable-arm.c | 49 +-----------------------------
 drivers/iommu/io-pgtable-arm.h | 54 ++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 50 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index df6e9bb2559a..9b996a94e460 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1114,8 +1114,7 @@ L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
 S:	Maintained
 F:	drivers/iommu/arm-smmu.c
 F:	drivers/iommu/arm-smmu-v3.c
-F:	drivers/iommu/io-pgtable-arm.c
-F:	drivers/iommu/io-pgtable-arm-v7s.c
+F:	drivers/iommu/io-pgtable-arm*
 
 ARM SUB-ARCHITECTURES
 L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 39c2a056da21..fe851eae9057 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -32,6 +32,7 @@
 #include <asm/barrier.h>
 
 #include "io-pgtable.h"
+#include "io-pgtable-arm.h"
 
 #define ARM_LPAE_MAX_ADDR_BITS		52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
@@ -121,54 +122,6 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_32_LPAE_TCR_EAE		(1 << 31)
-#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
-
-#define ARM_LPAE_TCR_EPD1		(1 << 23)
-
-#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
-#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
-#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
-
-#define ARM_LPAE_TCR_SH0_SHIFT		12
-#define ARM_LPAE_TCR_SH0_MASK		0x3
-#define ARM_LPAE_TCR_SH_NS		0
-#define ARM_LPAE_TCR_SH_OS		2
-#define ARM_LPAE_TCR_SH_IS		3
-
-#define ARM_LPAE_TCR_ORGN0_SHIFT	10
-#define ARM_LPAE_TCR_IRGN0_SHIFT	8
-#define ARM_LPAE_TCR_RGN_MASK		0x3
-#define ARM_LPAE_TCR_RGN_NC		0
-#define ARM_LPAE_TCR_RGN_WBWA		1
-#define ARM_LPAE_TCR_RGN_WT		2
-#define ARM_LPAE_TCR_RGN_WB		3
-
-#define ARM_LPAE_TCR_SL0_SHIFT		6
-#define ARM_LPAE_TCR_SL0_MASK		0x3
-
-#define ARM_LPAE_TCR_T0SZ_SHIFT		0
-#define ARM_LPAE_TCR_SZ_MASK		0xf
-
-#define ARM_LPAE_TCR_PS_SHIFT		16
-#define ARM_LPAE_TCR_PS_MASK		0x7
-
-#define ARM_LPAE_TCR_IPS_SHIFT		32
-#define ARM_LPAE_TCR_IPS_MASK		0x7
-
-#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
-#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
-
-#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
-#define ARM_LPAE_MAIR_ATTR_MASK		0xff
-#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
-#define ARM_LPAE_MAIR_ATTR_NC		0x44
-#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
 #define ARM_LPAE_MAIR_ATTR_IDX_NC	0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
new file mode 100644
index 000000000000..e35ba4666214
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __IO_PGTABLE_ARM_H
+#define __IO_PGTABLE_ARM_H
+
+#define ARM_32_LPAE_TCR_EAE		(1 << 31)
+#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
+
+#define ARM_LPAE_TCR_EPD1		(1 << 23)
+
+#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
+#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
+#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
+
+#define ARM_LPAE_TCR_SH0_SHIFT		12
+#define ARM_LPAE_TCR_SH0_MASK		0x3
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_ORGN0_SHIFT	10
+#define ARM_LPAE_TCR_IRGN0_SHIFT	8
+#define ARM_LPAE_TCR_RGN_MASK		0x3
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_SL0_SHIFT		6
+#define ARM_LPAE_TCR_SL0_MASK		0x3
+
+#define ARM_LPAE_TCR_T0SZ_SHIFT		0
+#define ARM_LPAE_TCR_SZ_MASK		0x3f
+
+#define ARM_LPAE_TCR_PS_SHIFT		16
+#define ARM_LPAE_TCR_PS_MASK		0x7
+
+#define ARM_LPAE_TCR_IPS_SHIFT		32
+#define ARM_LPAE_TCR_IPS_MASK		0x7
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
+
+#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
+#define ARM_LPAE_MAIR_ATTR_MASK		0xff
+#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+
+#endif /* __IO_PGTABLE_ARM_H */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 19/40] iommu: Add generic PASID table library
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (17 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 18/40] iommu/io-pgtable-arm: Factor out ARM LPAE register defines Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 20/40] iommu/arm-smmu-v3: Move context descriptor code Jean-Philippe Brucker
                     ` (20 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Add a small API within the IOMMU subsystem to handle different formats of
PASID tables. It uses the same principle as io-pgtable:

* The IOMMU driver registers a PASID table with some invalidation
  callbacks.
* The pasid-table lib allocates a set of tables of the right format, and
  returns an iommu_pasid_table_ops structure.
* The IOMMU driver allocates entries and writes them using the provided
  ops.
* The pasid-table lib calls the IOMMU driver back for invalidation when
  necessary.
* The IOMMU driver unregisters the ops which frees the tables when
  finished.

An example user will be Arm SMMU in a subsequent patch. Other IOMMU
drivers (e.g. paravirtualized ones) will be able to use the same PASID
table code.
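
In code, the flow above boils down to roughly the following driver-side
sequence (a sketch using the API introduced by this patch; fmt, cfg,
cookie, mm and pasid are assumed to be set up by the caller, and error
handling is elided):

    struct iommu_pasid_table_ops *ops;
    struct iommu_pasid_entry *entry;

    ops = iommu_alloc_pasid_ops(fmt, &cfg, cookie);

    /* Install an mm (SVA) at a given PASID */
    entry = ops->alloc_shared_entry(ops, mm);
    ops->set_entry(ops, pasid, entry);

    /* ... invalidations are driven through the cfg.sync callbacks ... */

    ops->clear_entry(ops, pasid, entry);
    iommu_free_pasid_entry(entry);
    iommu_free_pasid_ops(ops);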

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: remove free_entry from the ops. The table driver now registers a
standalone release callback to each entry, because it may be freed after
the tables.
---
 drivers/iommu/Kconfig             |   7 ++
 drivers/iommu/Makefile            |   1 +
 drivers/iommu/iommu-pasid-table.c |  51 +++++++++++
 drivers/iommu/iommu-pasid-table.h | 146 ++++++++++++++++++++++++++++++
 4 files changed, 205 insertions(+)
 create mode 100644 drivers/iommu/iommu-pasid-table.c
 create mode 100644 drivers/iommu/iommu-pasid-table.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 09f13a7c4b60..fae34d6a522d 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -60,6 +60,13 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
 endmenu
 
+menu "Generic PASID table support"
+
+config IOMMU_PASID_TABLE
+	bool
+
+endmenu
+
 config IOMMU_IOVA
 	tristate
 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4b744e399a1b..8e335a7f10aa 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid-table.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/iommu-pasid-table.c b/drivers/iommu/iommu-pasid-table.c
new file mode 100644
index 000000000000..ed62591dcc26
--- /dev/null
+++ b/drivers/iommu/iommu-pasid-table.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include <linux/kernel.h>
+
+#include "iommu-pasid-table.h"
+
+static const struct iommu_pasid_init_fns *
+pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct iommu_pasid_table *table;
+	const struct iommu_pasid_init_fns *fns;
+
+	if (fmt >= PASID_TABLE_NUM_FMTS)
+		return NULL;
+
+	fns = pasid_table_init_fns[fmt];
+	if (!fns)
+		return NULL;
+
+	table = fns->alloc(cfg, cookie);
+	if (!table)
+		return NULL;
+
+	table->fmt = fmt;
+	table->cookie = cookie;
+	table->cfg = *cfg;
+
+	return &table->ops;
+}
+
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops)
+{
+	struct iommu_pasid_table *table;
+
+	if (!ops)
+		return;
+
+	table = container_of(ops, struct iommu_pasid_table, ops);
+	iommu_pasid_flush_all(table);
+	pasid_table_init_fns[table->fmt]->free(table);
+}
diff --git a/drivers/iommu/iommu-pasid-table.h b/drivers/iommu/iommu-pasid-table.h
new file mode 100644
index 000000000000..d5bd098fef19
--- /dev/null
+++ b/drivers/iommu/iommu-pasid-table.h
@@ -0,0 +1,146 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+#ifndef __IOMMU_PASID_TABLE_H
+#define __IOMMU_PASID_TABLE_H
+
+#include <linux/bug.h>
+#include <linux/types.h>
+#include "io-pgtable.h"
+
+struct mm_struct;
+
+enum iommu_pasid_table_fmt {
+	PASID_TABLE_NUM_FMTS,
+};
+
+/**
+ * iommu_pasid_entry - Entry of a PASID table
+ *
+ * @tag: architecture-specific data needed to uniquely identify the entry. Most
+ * notably used for TLB invalidation
+ * @release: function that frees the entry and its content. PASID entries may be
+ * freed well after the PASID table ops are released, and may be shared between
+ * different PASID tables, so the release method has to be standalone.
+ */
+struct iommu_pasid_entry {
+	u64 tag;
+	void (*release)(struct iommu_pasid_entry *);
+};
+
+/**
+ * iommu_pasid_table_ops - Operations on a PASID table
+ *
+ * @alloc_shared_entry: allocate an entry for sharing an mm (SVA). Returns the
+ * pointer to a new entry or an error.
+ * @alloc_priv_entry: allocate an entry for map/unmap operations. Returns the
+ * pointer to a new entry or an error.
+ * @set_entry: write PASID table entry
+ * @clear_entry: clear PASID table entry
+ */
+struct iommu_pasid_table_ops {
+	struct iommu_pasid_entry *
+	(*alloc_shared_entry)(struct iommu_pasid_table_ops *ops,
+			      struct mm_struct *mm);
+	struct iommu_pasid_entry *
+	(*alloc_priv_entry)(struct iommu_pasid_table_ops *ops,
+			    enum io_pgtable_fmt fmt,
+			    struct io_pgtable_cfg *cfg);
+	int (*set_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			 struct iommu_pasid_entry *entry);
+	void (*clear_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			    struct iommu_pasid_entry *entry);
+};
+
+/**
+ * iommu_pasid_sync_ops - Callbacks into the IOMMU driver
+ *
+ * @cfg_flush: flush cached configuration for one entry. For a multi-level PASID
+ * table, 'leaf' tells whether to only flush cached leaf entries or intermediate
+ * levels as well.
+ * @cfg_flush_all: flush cached configuration for all entries of the PASID table
+ * @tlb_flush: flush TLB entries for one entry
+ */
+struct iommu_pasid_sync_ops {
+	void (*cfg_flush)(void *cookie, int pasid, bool leaf);
+	void (*cfg_flush_all)(void *cookie);
+	void (*tlb_flush)(void *cookie, int pasid,
+			  struct iommu_pasid_entry *entry);
+};
+
+/**
+ * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
+ *
+ * @iommu_dev: device performing the DMA table walks
+ * @order: number of PASID bits, set by the IOMMU driver
+ * @sync: invalidation callbacks for this set of tables
+ *
+ * @base: DMA address of the allocated table, set by the allocator.
+ */
+struct iommu_pasid_table_cfg {
+	struct device				*iommu_dev;
+	size_t					order;
+	const struct iommu_pasid_sync_ops	*sync;
+	dma_addr_t				base;
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg,
+		      void *cookie);
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops);
+
+static inline void iommu_free_pasid_entry(struct iommu_pasid_entry *entry)
+{
+	if (WARN_ON(!entry->release))
+		return;
+	entry->release(entry);
+}
+
+/**
+ * struct iommu_pasid_table - describes a set of PASID tables
+ *
+ * @fmt: The PASID table format.
+ * @cookie: An opaque token provided by the IOMMU driver and passed back to any
+ * callback routine.
+ * @cfg: A copy of the PASID table configuration.
+ * @ops: The PASID table operations in use for this set of page tables.
+ */
+struct iommu_pasid_table {
+	enum iommu_pasid_table_fmt	fmt;
+	void				*cookie;
+	struct iommu_pasid_table_cfg	cfg;
+	struct iommu_pasid_table_ops	ops;
+};
+
+#define iommu_pasid_table_ops_to_table(ops) \
+	container_of((ops), struct iommu_pasid_table, ops)
+
+struct iommu_pasid_init_fns {
+	struct iommu_pasid_table *(*alloc)(struct iommu_pasid_table_cfg *cfg,
+					   void *cookie);
+	void (*free)(struct iommu_pasid_table *table);
+};
+
+static inline void iommu_pasid_flush_all(struct iommu_pasid_table *table)
+{
+	table->cfg.sync->cfg_flush_all(table->cookie);
+}
+
+static inline void iommu_pasid_flush(struct iommu_pasid_table *table,
+					 int pasid, bool leaf)
+{
+	table->cfg.sync->cfg_flush(table->cookie, pasid, leaf);
+}
+
+static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
+					  int pasid,
+					  struct iommu_pasid_entry *entry)
+{
+	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
+}
+
+#endif /* __IOMMU_PASID_TABLE_H */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 20/40] iommu/arm-smmu-v3: Move context descriptor code
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (18 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 19/40] iommu: Add generic PASID table library Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 21/40] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
                     ` (19 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

In preparation for substream ID support, move the context descriptor code
into a separate module. At the moment it only manages context descriptor
zero, which is used for non-PASID translations.

One important behavior change is the ASID allocator, which is now global
instead of per-SMMU. If we end up needing per-SMMU ASIDs after all, it
would be relatively simple to move back to a per-SMMU allocator instead
of a global one. Sharing ASIDs will require an IDR, so implement the
ASID allocator with an IDA instead of porting the bitmap, to ease the
transition.
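
The global allocator itself reduces to an IDA, roughly as follows (sketch
of the direction taken below; asid_bits is assumed to come from the SMMU's
ID registers):

    static DEFINE_IDA(asid_ida);

    int asid;

    /* Allocate a free ASID in [0, 1 << asid_bits) */
    asid = ida_simple_get(&asid_ida, 0, 1 << asid_bits, GFP_KERNEL);
    if (asid < 0)
            return asid;

    /* ... use the ASID in a context descriptor ... */

    ida_simple_remove(&asid_ida, asid);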

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: try to simplify
---
 MAINTAINERS                         |   3 +-
 drivers/iommu/Kconfig               |  11 ++
 drivers/iommu/Makefile              |   1 +
 drivers/iommu/arm-smmu-v3-context.c | 257 ++++++++++++++++++++++++++++
 drivers/iommu/arm-smmu-v3.c         | 191 +++++++--------------
 drivers/iommu/iommu-pasid-table.c   |   1 +
 drivers/iommu/iommu-pasid-table.h   |  20 +++
 7 files changed, 355 insertions(+), 129 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3-context.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9b996a94e460..c08c0c71a568 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1112,8 +1112,7 @@ M:	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
 R:	Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
 L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
 S:	Maintained
-F:	drivers/iommu/arm-smmu.c
-F:	drivers/iommu/arm-smmu-v3.c
+F:	drivers/iommu/arm-smmu*
 F:	drivers/iommu/io-pgtable-arm*
 
 ARM SUB-ARCHITECTURES
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index fae34d6a522d..11c8492b3763 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -65,6 +65,16 @@ menu "Generic PASID table support"
 config IOMMU_PASID_TABLE
 	bool
 
+config ARM_SMMU_V3_CONTEXT
+	bool "ARM SMMU v3 Context Descriptor tables"
+	select IOMMU_PASID_TABLE
+	depends on ARM64
+	help
+	  Enable support for ARM SMMU v3 Context Descriptor tables, used for
+	  DMA and PASID support.
+
+	  If unsure, say N here.
+
 endmenu
 
 config IOMMU_IOVA
@@ -334,6 +344,7 @@ config ARM_SMMU_V3
 	depends on ARM64
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
+	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
 	help
 	  Support for implementations of the ARM System MMU architecture
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8e335a7f10aa..244ad7913a81 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid-table.o
+obj-$(CONFIG_ARM_SMMU_V3_CONTEXT) += arm-smmu-v3-context.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
new file mode 100644
index 000000000000..15d3d02c59b2
--- /dev/null
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Context descriptor table driver for SMMUv3
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "iommu-pasid-table.h"
+
+#define CTXDESC_CD_DWORDS		8
+#define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
+#define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
+#define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
+#define ARM64_TCR_TG0			GENMASK_ULL(15, 14)
+#define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
+#define ARM64_TCR_IRGN0			GENMASK_ULL(9, 8)
+#define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
+#define ARM64_TCR_ORGN0			GENMASK_ULL(11, 10)
+#define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
+#define ARM64_TCR_SH0			GENMASK_ULL(13, 12)
+#define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
+#define ARM64_TCR_EPD0			(1ULL << 7)
+#define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
+#define ARM64_TCR_EPD1			(1ULL << 23)
+
+#define CTXDESC_CD_0_ENDI		(1UL << 15)
+#define CTXDESC_CD_0_V			(1UL << 31)
+
+#define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
+#define ARM64_TCR_IPS			GENMASK_ULL(34, 32)
+#define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
+#define ARM64_TCR_TBI0			(1ULL << 37)
+
+#define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_S			(1UL << 44)
+#define CTXDESC_CD_0_R			(1UL << 45)
+#define CTXDESC_CD_0_A			(1UL << 46)
+#define CTXDESC_CD_0_ASET		(1UL << 47)
+#define CTXDESC_CD_0_ASID		GENMASK_ULL(63, 48)
+
+#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
+
+/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
+#define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
+					FIELD_GET(ARM64_TCR_##fld, tcr))
+
+struct arm_smmu_cd {
+	struct iommu_pasid_entry	entry;
+
+	u64				ttbr;
+	u64				tcr;
+	u64				mair;
+};
+
+#define pasid_entry_to_cd(entry) \
+	container_of((entry), struct arm_smmu_cd, entry)
+
+struct arm_smmu_cd_tables {
+	struct iommu_pasid_table	pasid;
+
+	void				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
+#define pasid_to_cd_tables(pasid_table) \
+	container_of((pasid_table), struct arm_smmu_cd_tables, pasid)
+
+#define pasid_ops_to_tables(ops) \
+	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
+
+static DEFINE_IDA(asid_ida);
+
+static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
+{
+	u64 val = 0;
+
+	/* Repack the TCR. Just care about TTBR0 for now */
+	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
+	val |= ARM_SMMU_TCR2CD(tcr, TG0);
+	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
+	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
+	val |= ARM_SMMU_TCR2CD(tcr, SH0);
+	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
+	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
+	val |= ARM_SMMU_TCR2CD(tcr, IPS);
+	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
+
+	return val;
+}
+
+static void arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl,
+				    struct arm_smmu_cd *cd)
+{
+	u64 val;
+	__le64 *cdptr = tbl->ptr;
+	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
+
+	/*
+	 * We don't need to issue any invalidation here, as we'll invalidate
+	 * the STE when installing the new entry anyway.
+	 */
+	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+#ifdef __BIG_ENDIAN
+	      CTXDESC_CD_0_ENDI |
+#endif
+	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
+	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag) |
+	      CTXDESC_CD_0_V;
+
+	if (cfg->stall)
+		val |= CTXDESC_CD_0_S;
+
+	cdptr[0] = cpu_to_le64(val);
+
+	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK;
+	cdptr[1] = cpu_to_le64(val);
+
+	cdptr[3] = cpu_to_le64(cd->mair);
+}
+
+static void arm_smmu_free_cd(struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
+
+	ida_simple_remove(&asid_ida, (u16)entry->tag);
+	kfree(cd);
+}
+
+static struct iommu_pasid_entry *
+arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static struct iommu_pasid_entry *
+arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
+		       enum io_pgtable_fmt fmt,
+		       struct io_pgtable_cfg *cfg)
+{
+	int ret;
+	int asid;
+	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd)
+		return ERR_PTR(-ENOMEM);
+
+	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
+			      GFP_KERNEL);
+	if (asid < 0) {
+		kfree(cd);
+		return ERR_PTR(asid);
+	}
+
+	cd->entry.tag = asid;
+	cd->entry.release = arm_smmu_free_cd;
+
+	switch (fmt) {
+	case ARM_64_LPAE_S1:
+		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
+		cd->tcr		= cfg->arm_lpae_s1_cfg.tcr;
+		cd->mair	= cfg->arm_lpae_s1_cfg.mair[0];
+		break;
+	default:
+		pr_err("Unsupported pgtable format 0x%x\n", fmt);
+		ret = -EINVAL;
+		goto err_free_cd;
+	}
+
+	return &cd->entry;
+
+err_free_cd:
+	arm_smmu_free_cd(&cd->entry);
+
+	return ERR_PTR(ret);
+}
+
+static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
+			   struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
+
+	arm_smmu_write_ctx_desc(tbl, cd);
+	return 0;
+}
+
+static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
+			      struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+
+	arm_smmu_write_ctx_desc(tbl, NULL);
+}
+
+static struct iommu_pasid_table *
+arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct arm_smmu_cd_tables *tbl;
+	struct device *dev = cfg->iommu_dev;
+
+	if (cfg->order) {
+		/* TODO: support SSID */
+		return NULL;
+	}
+
+	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
+	if (!tbl)
+		return NULL;
+
+	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
+				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
+	if (!tbl->ptr) {
+		dev_warn(dev, "failed to allocate context descriptor\n");
+		goto err_free_tbl;
+	}
+
+	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
+		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
+		.alloc_shared_entry	= arm_smmu_alloc_shared_cd,
+		.set_entry		= arm_smmu_set_cd,
+		.clear_entry		= arm_smmu_clear_cd,
+	};
+	cfg->base = tbl->ptr_dma;
+
+	return &tbl->pasid;
+
+err_free_tbl:
+	devm_kfree(dev, tbl);
+
+	return NULL;
+}
+
+static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
+{
+	struct iommu_pasid_table_cfg *cfg = &pasid_table->cfg;
+	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
+
+	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
+			   tbl->ptr, tbl->ptr_dma);
+	devm_kfree(dev, tbl);
+}
+
+struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns = {
+	.alloc	= arm_smmu_alloc_cd_tables,
+	.free	= arm_smmu_free_cd_tables,
+};
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c892f012fb43..68764a200e44 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -42,6 +42,7 @@
 #include <linux/amba/bus.h>
 
 #include "io-pgtable.h"
+#include "iommu-pasid-table.h"
 
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -258,44 +259,6 @@
 
 #define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
 
-/* Context descriptor (stage-1 only) */
-#define CTXDESC_CD_DWORDS		8
-#define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
-#define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
-#define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
-#define ARM64_TCR_TG0			GENMASK_ULL(15, 14)
-#define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
-#define ARM64_TCR_IRGN0			GENMASK_ULL(9, 8)
-#define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
-#define ARM64_TCR_ORGN0			GENMASK_ULL(11, 10)
-#define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
-#define ARM64_TCR_SH0			GENMASK_ULL(13, 12)
-#define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
-#define ARM64_TCR_EPD0			(1ULL << 7)
-#define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
-#define ARM64_TCR_EPD1			(1ULL << 23)
-
-#define CTXDESC_CD_0_ENDI		(1UL << 15)
-#define CTXDESC_CD_0_V			(1UL << 31)
-
-#define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
-#define ARM64_TCR_IPS			GENMASK_ULL(34, 32)
-#define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
-#define ARM64_TCR_TBI0			(1ULL << 37)
-
-#define CTXDESC_CD_0_AA64		(1UL << 41)
-#define CTXDESC_CD_0_S			(1UL << 44)
-#define CTXDESC_CD_0_R			(1UL << 45)
-#define CTXDESC_CD_0_A			(1UL << 46)
-#define CTXDESC_CD_0_ASET		(1UL << 47)
-#define CTXDESC_CD_0_ASID		GENMASK_ULL(63, 48)
-
-#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
-
-/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
-#define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
-					FIELD_GET(ARM64_TCR_##fld, tcr))
-
 /* Command queue */
 #define CMDQ_ENT_DWORDS			2
 #define CMDQ_MAX_SZ_SHIFT		8
@@ -494,15 +457,9 @@ struct arm_smmu_strtab_l1_desc {
 };
 
 struct arm_smmu_s1_cfg {
-	__le64				*cdptr;
-	dma_addr_t			cdptr_dma;
-
-	struct arm_smmu_ctx_desc {
-		u16	asid;
-		u64	ttbr;
-		u64	tcr;
-		u64	mair;
-	}				cd;
+	struct iommu_pasid_table_cfg	tables;
+	struct iommu_pasid_table_ops	*ops;
+	struct iommu_pasid_entry	*cd0; /* Default context */
 };
 
 struct arm_smmu_s2_cfg {
@@ -572,9 +529,7 @@ struct arm_smmu_device {
 	unsigned long			oas; /* PA */
 	unsigned long			pgsize_bitmap;
 
-#define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -999,54 +954,6 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
 }
 
-/* Context descriptor manipulation functions */
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
-{
-	u64 val = 0;
-
-	/* Repack the TCR. Just care about TTBR0 for now */
-	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
-	val |= ARM_SMMU_TCR2CD(tcr, TG0);
-	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, SH0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
-	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
-
-	return val;
-}
-
-static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
-				    struct arm_smmu_s1_cfg *cfg)
-{
-	u64 val;
-
-	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
-	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
-#ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
-#endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
-	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cfg->cd.asid) |
-	      CTXDESC_CD_0_V;
-
-	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
-		val |= CTXDESC_CD_0_S;
-
-	cfg->cdptr[0] = cpu_to_le64(val);
-
-	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
-	cfg->cdptr[1] = cpu_to_le64(val);
-
-	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair);
-}
-
 /* Stream table manipulation functions */
 static void
 arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
@@ -1155,7 +1062,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK) |
 			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
 	}
 
@@ -1396,8 +1303,10 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	struct arm_smmu_cmdq_ent cmd;
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		if (unlikely(!smmu_domain->s1_cfg.cd0))
+			return;
 		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 		cmd.tlbi.vmid	= 0;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
@@ -1421,8 +1330,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	};
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		if (unlikely(!smmu_domain->s1_cfg.cd0))
+			return;
 		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
@@ -1440,6 +1351,26 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
 	.tlb_sync	= arm_smmu_tlb_sync,
 };
 
+/* PASID TABLE API */
+static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
+{
+}
+
+static void arm_smmu_sync_cd_all(void *cookie)
+{
+}
+
+static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
+				  struct iommu_pasid_entry *entry)
+{
+}
+
+static struct iommu_pasid_sync_ops arm_smmu_ctx_sync = {
+	.cfg_flush	= arm_smmu_sync_cd,
+	.cfg_flush_all	= arm_smmu_sync_cd_all,
+	.tlb_flush	= arm_smmu_tlb_inv_ssid,
+};
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -1512,15 +1443,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 
 	/* Free the CD and ASID, if we allocated them */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+		struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
 
-		if (cfg->cdptr) {
-			dmam_free_coherent(smmu_domain->smmu->dev,
-					   CTXDESC_CD_DWORDS << 3,
-					   cfg->cdptr,
-					   cfg->cdptr_dma);
-
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
+		if (ops) {
+			iommu_free_pasid_entry(smmu_domain->s1_cfg.cd0);
+			iommu_free_pasid_ops(ops);
 		}
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -1535,31 +1462,42 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
-	int asid;
+	struct iommu_pasid_entry *entry;
+	struct iommu_pasid_table_ops *ops;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	struct iommu_pasid_table_cfg pasid_cfg = {
+		.iommu_dev		= smmu->dev,
+		.sync			= &arm_smmu_ctx_sync,
+		.arm_smmu = {
+			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
+			.asid_bits	= smmu->asid_bits,
+		},
+	};
+
+	ops = iommu_alloc_pasid_ops(PASID_TABLE_ARM_SMMU_V3, &pasid_cfg,
+				    smmu_domain);
+	if (!ops)
+		return -ENOMEM;
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
+	/* Create default entry */
+	entry = ops->alloc_priv_entry(ops, ARM_64_LPAE_S1, pgtbl_cfg);
+	if (IS_ERR(entry)) {
+		iommu_free_pasid_ops(ops);
+		return PTR_ERR(entry);
+	}
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr) {
-		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
-		ret = -ENOMEM;
-		goto out_free_asid;
+	ret = ops->set_entry(ops, 0, entry);
+	if (ret) {
+		iommu_free_pasid_entry(entry);
+		iommu_free_pasid_ops(ops);
+		return ret;
 	}
 
-	cfg->cd.asid	= (u16)asid;
-	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
-	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
-	return 0;
+	cfg->tables	= pasid_cfg;
+	cfg->ops	= ops;
+	cfg->cd0	= entry;
 
-out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
 	return ret;
 }
 
@@ -1763,7 +1701,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		ste->s1_cfg = &smmu_domain->s1_cfg;
 		ste->s2_cfg = NULL;
-		arm_smmu_write_ctx_desc(smmu, ste->s1_cfg);
 	} else {
 		ste->s1_cfg = NULL;
 		ste->s2_cfg = &smmu_domain->s2_cfg;
diff --git a/drivers/iommu/iommu-pasid-table.c b/drivers/iommu/iommu-pasid-table.c
index ed62591dcc26..2b6a8a585771 100644
--- a/drivers/iommu/iommu-pasid-table.c
+++ b/drivers/iommu/iommu-pasid-table.c
@@ -11,6 +11,7 @@
 
 static const struct iommu_pasid_init_fns *
 pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+	[PASID_TABLE_ARM_SMMU_V3] = &arm_smmu_v3_pasid_init_fns,
 };
 
 struct iommu_pasid_table_ops *
diff --git a/drivers/iommu/iommu-pasid-table.h b/drivers/iommu/iommu-pasid-table.h
index d5bd098fef19..f52a15f60e81 100644
--- a/drivers/iommu/iommu-pasid-table.h
+++ b/drivers/iommu/iommu-pasid-table.h
@@ -14,6 +14,7 @@
 struct mm_struct;
 
 enum iommu_pasid_table_fmt {
+	PASID_TABLE_ARM_SMMU_V3,
 	PASID_TABLE_NUM_FMTS,
 };
 
@@ -71,6 +72,18 @@ struct iommu_pasid_sync_ops {
 			  struct iommu_pasid_entry *entry);
 };
 
+/**
+ * arm_smmu_context_cfg - PASID table configuration for ARM SMMU v3
+ *
+ * SMMU properties:
+ * @stall: devices attached to the domain are allowed to stall.
+ * @asid_bits: number of ASID bits supported by the SMMU
+ */
+struct arm_smmu_context_cfg {
+	u8				stall:1;
+	u8				asid_bits;
+};
+
 /**
  * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
  *
@@ -85,6 +98,11 @@ struct iommu_pasid_table_cfg {
 	size_t					order;
 	const struct iommu_pasid_sync_ops	*sync;
 	dma_addr_t				base;
+
+	/* Low-level data specific to the IOMMU */
+	union {
+		struct arm_smmu_context_cfg	arm_smmu;
+	};
 };
 
 struct iommu_pasid_table_ops *
@@ -143,4 +161,6 @@ static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
 	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
 }
 
+extern struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns;
+
 #endif /* __IOMMU_PASID_TABLE_H */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 21/40] iommu/arm-smmu-v3: Add support for Substream IDs
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (19 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 20/40] iommu/arm-smmu-v3: Move context descriptor code Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-22-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 22/40] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
                     ` (18 subsequent siblings)
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
address space to each device. SMMUv3 allows multiple address spaces to be
associated with a device. In addition to the Stream ID (SID), which
identifies a device, we can now use Substream IDs (SSIDs) to identify an
address space. In PCIe lingo, the SID is called Requester ID (RID) and
the SSID is called Process Address-Space ID (PASID).

Prepare the driver for SSID support by making STEs point to tables of
context descriptors (previously a single static context descriptor). The
SMMU now performs a complete stage-1 walk like this:

      Stream tables          Ctx. tables          Page tables
        +--------+   ,------->+-------+   ,------->+-------+
        :        :   |        :       :   |        :       :
        +--------+   |        +-------+   |        +-------+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
        +--------+            +-------+            +-------+
        :        :            :       :            :       :
        +--------+            +-------+            +-------+

We only implement one level of context descriptor table for now, but as
with stream and page tables, an SSID can be split to index multiple
levels of tables.

In all stream table entries, we set S1DSS=SSID0 mode, so that translations
without an SSID use context descriptor 0.
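
For illustration only (not in the patch), locating a descriptor in a
linear CD table is a fixed-offset computation. A minimal sketch, assuming
the 8-dword (64-byte) descriptors used here, with a hypothetical ex_
helper:

/* Each context descriptor occupies 8 64-bit words (64 bytes) */
static __le64 *ex_get_cd_linear(__le64 *cd_table, u32 ssid)
{
	return cd_table + ssid * 8 /* CTXDESC_CD_DWORDS */;
}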

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: use GENMASK throughout SMMU patches
---
 drivers/iommu/arm-smmu-v3-context.c | 141 +++++++++++++++++++++-------
 drivers/iommu/arm-smmu-v3.c         |  82 +++++++++++++++-
 drivers/iommu/iommu-pasid-table.h   |   7 ++
 3 files changed, 190 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 15d3d02c59b2..0969a3626110 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -62,11 +62,14 @@ struct arm_smmu_cd {
 #define pasid_entry_to_cd(entry) \
 	container_of((entry), struct arm_smmu_cd, entry)
 
+struct arm_smmu_cd_table {
+	__le64				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-
-	void				*ptr;
-	dma_addr_t			ptr_dma;
+	struct arm_smmu_cd_table	table;
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -77,6 +80,36 @@ struct arm_smmu_cd_tables {
 
 static DEFINE_IDA(asid_ida);
 
+static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	desc->ptr = dmam_alloc_coherent(dev, size, &desc->ptr_dma,
+					GFP_ATOMIC | __GFP_ZERO);
+	if (!desc->ptr) {
+		dev_warn(dev, "failed to allocate context descriptor table\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct device *dev,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
+}
+
+static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
+{
+	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -95,34 +128,74 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	return val;
 }
 
-static void arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl,
-				    struct arm_smmu_cd *cd)
+static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				   struct arm_smmu_cd *cd)
 {
 	u64 val;
-	__le64 *cdptr = tbl->ptr;
+	bool cd_live;
+	__le64 *cdptr = arm_smmu_get_cd_ptr(tbl, ssid);
 	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
 
 	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
+	 * This function handles the following cases:
+	 *
+	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
+	 * (2) Install a secondary CD, for SID+SSID traffic, followed by an
+	 *     invalidation.
+	 * (3) Update ASID of primary CD. This is allowed by atomically writing
+	 *     the first 64 bits of the CD, followed by invalidation of the old
+	 *     entry and mappings.
+	 * (4) Remove a secondary CD and invalidate it.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+
+	if (!cdptr)
+		return -ENOMEM;
+
+	val = le64_to_cpu(cdptr[0]);
+	cd_live = !!(val & CTXDESC_CD_0_V);
+
+	if (!cd) { /* (4) */
+		cdptr[0] = 0;
+	} else if (cd_live) { /* (3) */
+		val &= ~CTXDESC_CD_0_ASID;
+		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag);
+
+		cdptr[0] = cpu_to_le64(val);
+		/*
+		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
+		 * this substream's traffic
+		 */
+	} else { /* (1) and (2) */
+		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
+		cdptr[2] = 0;
+		cdptr[3] = cpu_to_le64(cd->mair);
+
+		/*
+		 * STE is live, and the SMMU might fetch this CD at any
+		 * time. Ensure it observes the rest of the CD before we
+		 * enable it.
+		 */
+		iommu_pasid_flush(&tbl->pasid, ssid, true);
+
+
+		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
+		      CTXDESC_CD_0_ENDI |
 #endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
-	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag) |
-	      CTXDESC_CD_0_V;
+		      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
+		      CTXDESC_CD_0_AA64 |
+		      FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag) |
+		      CTXDESC_CD_0_V;
 
-	if (cfg->stall)
-		val |= CTXDESC_CD_0_S;
+		if (cfg->stall)
+			val |= CTXDESC_CD_0_S;
 
-	cdptr[0] = cpu_to_le64(val);
+		cdptr[0] = cpu_to_le64(val);
+	}
 
-	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK;
-	cdptr[1] = cpu_to_le64(val);
+	iommu_pasid_flush(&tbl->pasid, ssid, true);
 
-	cdptr[3] = cpu_to_le64(cd->mair);
+	return 0;
 }
 
 static void arm_smmu_free_cd(struct iommu_pasid_entry *entry)
@@ -190,8 +263,10 @@ static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
 	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
-	arm_smmu_write_ctx_desc(tbl, cd);
-	return 0;
+	if (WARN_ON(pasid >= (1 << tbl->pasid.cfg.order)))
+		return -EINVAL;
+
+	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
 }
 
 static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
@@ -199,30 +274,26 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 {
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
 
-	arm_smmu_write_ctx_desc(tbl, NULL);
+	if (WARN_ON(pasid >= (1 << tbl->pasid.cfg.order)))
+		return;
+
+	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
 }
 
 static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
+	int ret;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
 
-	if (cfg->order) {
-		/* TODO: support SSID */
-		return NULL;
-	}
-
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
-				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
-	if (!tbl->ptr) {
-		dev_warn(dev, "failed to allocate context descriptor\n");
+	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (ret)
 		goto err_free_tbl;
-	}
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -230,7 +301,8 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.set_entry		= arm_smmu_set_cd,
 		.clear_entry		= arm_smmu_clear_cd,
 	};
-	cfg->base = tbl->ptr_dma;
+	cfg->base			= tbl->table.ptr_dma;
+	cfg->arm_smmu.s1fmt		= ARM_SMMU_S1FMT_LINEAR;
 
 	return &tbl->pasid;
 
@@ -246,8 +318,7 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
-			   tbl->ptr, tbl->ptr_dma);
+	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
 	devm_kfree(dev, tbl);
 }
 
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 68764a200e44..16b08f2fb8ac 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -224,10 +224,14 @@
 #define STRTAB_STE_0_CFG_S2_TRANS	6
 
 #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
-#define STRTAB_STE_0_S1FMT_LINEAR	0
 #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
 
+#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
+#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
+#define STRTAB_STE_1_S1DSS_BYPASS	0x1
+#define STRTAB_STE_1_S1DSS_SSID0	0x2
+
 #define STRTAB_STE_1_S1C_CACHE_NC	0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
 #define STRTAB_STE_1_S1C_CACHE_WT	2UL
@@ -275,6 +279,7 @@
 #define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
 #define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
 
+#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
 #define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
 #define CMDQ_CFGI_1_LEAF		(1UL << 0)
 #define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
@@ -381,8 +386,11 @@ struct arm_smmu_cmdq_ent {
 
 		#define CMDQ_OP_CFGI_STE	0x3
 		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
 		struct {
 			u32			sid;
+			u32			ssid;
 			union {
 				bool		leaf;
 				u8		span;
@@ -555,6 +563,7 @@ struct arm_smmu_master_data {
 	struct list_head		list; /* domain->devices */
 
 	struct device			*dev;
+	size_t				ssid_bits;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -753,10 +762,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
 		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
 		break;
+	case CMDQ_OP_CFGI_CD:
+		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
+		/* Fallthrough */
 	case CMDQ_OP_CFGI_STE:
 		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
 		break;
+	case CMDQ_OP_CFGI_CD_ALL:
+		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
+		break;
 	case CMDQ_OP_CFGI_ALL:
 		/* Cover the entire SID range */
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
@@ -1048,8 +1063,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 	}
 
 	if (ste->s1_cfg) {
+		struct iommu_pasid_table_cfg *cfg = &ste->s1_cfg->tables;
+
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
+			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
@@ -1063,7 +1081,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK) |
-			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
+			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
+			FIELD_PREP(STRTAB_STE_0_S1CDMAX, cfg->order) |
+			FIELD_PREP(STRTAB_STE_0_S1FMT, cfg->arm_smmu.s1fmt);
 	}
 
 	if (ste->s2_cfg) {
@@ -1352,17 +1372,62 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
 };
 
 /* PASID TABLE API */
+static void __arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
+			       struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t i;
+	unsigned long flags;
+	struct arm_smmu_master_data *master;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list) {
+		struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+		for (i = 0; i < fwspec->num_ids; i++) {
+			cmd->cfgi.sid = fwspec->ids[i];
+			arm_smmu_cmdq_issue_cmd(smmu, cmd);
+		}
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	__arm_smmu_tlb_sync(smmu);
+}
+
 static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
 {
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD,
+		.cfgi	= {
+			.ssid	= ssid,
+			.leaf	= leaf,
+		},
+	};
+
+	__arm_smmu_sync_cd(cookie, &cmd);
 }
 
 static void arm_smmu_sync_cd_all(void *cookie)
 {
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD_ALL,
+	};
+
+	__arm_smmu_sync_cd(cookie, &cmd);
 }
 
 static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
 				  struct iommu_pasid_entry *entry)
 {
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode		= CMDQ_OP_TLBI_NH_ASID,
+		.tlbi.asid	= entry->tag,
+	};
+
+	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	__arm_smmu_tlb_sync(smmu);
 }
 
 static struct iommu_pasid_sync_ops arm_smmu_ctx_sync = {
@@ -1459,6 +1524,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 }
 
 static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master_data *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
@@ -1468,6 +1534,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 	struct iommu_pasid_table_cfg pasid_cfg = {
 		.iommu_dev		= smmu->dev,
+		.order			= master->ssid_bits,
 		.sync			= &arm_smmu_ctx_sync,
 		.arm_smmu = {
 			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
@@ -1502,6 +1569,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 }
 
 static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master_data *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int vmid;
@@ -1518,7 +1586,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
-static int arm_smmu_domain_finalise(struct iommu_domain *domain)
+static int arm_smmu_domain_finalise(struct iommu_domain *domain,
+				    struct arm_smmu_master_data *master)
 {
 	int ret;
 	unsigned long ias, oas;
@@ -1526,6 +1595,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	struct io_pgtable_cfg pgtbl_cfg;
 	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_domain *,
+				 struct arm_smmu_master_data *,
 				 struct io_pgtable_cfg *);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -1579,7 +1649,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
 	domain->geometry.force_aperture = true;
 
-	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
+	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
 	if (ret < 0) {
 		free_io_pgtable_ops(pgtbl_ops);
 		return ret;
@@ -1674,7 +1744,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	if (!smmu_domain->smmu) {
 		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain);
+		ret = arm_smmu_domain_finalise(domain, master);
 		if (ret) {
 			smmu_domain->smmu = NULL;
 			goto out_unlock;
@@ -1830,6 +1900,8 @@ static int arm_smmu_add_device(struct device *dev)
 		}
 	}
 
+	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		iommu_group_put(group);
diff --git a/drivers/iommu/iommu-pasid-table.h b/drivers/iommu/iommu-pasid-table.h
index f52a15f60e81..b84709e297bc 100644
--- a/drivers/iommu/iommu-pasid-table.h
+++ b/drivers/iommu/iommu-pasid-table.h
@@ -78,10 +78,17 @@ struct iommu_pasid_sync_ops {
  * SMMU properties:
  * @stall: devices attached to the domain are allowed to stall.
  * @asid_bits: number of ASID bits supported by the SMMU
+ *
+ * @s1fmt: PASID table format, chosen by the allocator.
  */
 struct arm_smmu_context_cfg {
 	u8				stall:1;
 	u8				asid_bits;
+
+#define ARM_SMMU_S1FMT_LINEAR		0x0
+#define ARM_SMMU_S1FMT_4K_L2		0x1
+#define ARM_SMMU_S1FMT_64K_L2		0x2
+	u8				s1fmt;
 };
 
 /**
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 22/40] iommu/arm-smmu-v3: Add second level of context descriptor table
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (20 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 21/40] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 23/40] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
                     ` (17 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The SMMU can support up to 20 bits of SSID. Add a second level of context
descriptor tables to accommodate this. Devices that support more than 1024
SSIDs now have a table of 1024 L1 entries (8kB), each pointing to a table
of 1024 context descriptors (64kB) that is allocated on demand.
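
To make the indexing concrete, here is a minimal sketch of the two-level
lookup with the 10-bit split used in this patch (illustrative only; the
ex_ names are hypothetical and L1 bounds checks are omitted):

#define EX_SPLIT	10	/* SSID[9:0] indexes the leaf table */
#define EX_L2_ENTRIES	(1 << EX_SPLIT)

static __le64 *ex_get_cd_2lvl(__le64 **leaf_tables, u32 ssid)
{
	__le64 *leaf = leaf_tables[ssid >> EX_SPLIT];

	if (!leaf)	/* leaf tables are allocated on demand */
		return NULL;

	return leaf + (ssid & (EX_L2_ENTRIES - 1)) * 8 /* CD dwords */;
}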

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 137 ++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 0969a3626110..d68da99aa472 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -14,6 +14,18 @@
 
 #include "iommu-pasid-table.h"
 
+/*
+ * Linear: when at most 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *	 1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
+
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
 #define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
@@ -69,7 +81,17 @@ struct arm_smmu_cd_table {
 
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-	struct arm_smmu_cd_table	table;
+	bool				linear;
+	union {
+		struct arm_smmu_cd_table table;
+		struct {
+			__le64		*ptr;
+			dma_addr_t	ptr_dma;
+			size_t		num_entries;
+
+			struct arm_smmu_cd_table *tables;
+		} l1;
+	};
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -105,9 +127,44 @@ static void arm_smmu_free_cd_leaf_table(struct device *dev,
 	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
 }
 
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *desc)
+{
+	u64 val = (desc->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
+		CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
 static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
 {
-	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+	unsigned long idx;
+	struct arm_smmu_cd_table *l1_desc;
+	struct iommu_pasid_table_cfg *cfg = &tbl->pasid.cfg;
+
+	if (tbl->linear)
+		return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+
+	idx = ssid >> CTXDESC_SPLIT;
+	if (idx >= tbl->l1.num_entries)
+		return NULL;
+
+	l1_desc = &tbl->l1.tables[idx];
+	if (!l1_desc->ptr) {
+		__le64 *l1ptr = tbl->l1.ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+		if (arm_smmu_alloc_cd_leaf_table(cfg->iommu_dev, l1_desc,
+						 CTXDESC_NUM_L2_ENTRIES))
+			return NULL;
+
+		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+		/* An invalid L1 entry is allowed to be cached */
+		iommu_pasid_flush(&tbl->pasid, idx << CTXDESC_SPLIT, false);
+	}
+
+	idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+
+	return l1_desc->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
@@ -284,16 +341,51 @@ static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
 	int ret;
+	size_t size = 0;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_table *leaf_table;
+	size_t num_contexts, num_leaf_entries;
 
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	num_contexts = 1 << cfg->order;
+	if (num_contexts <= CTXDESC_NUM_L2_ENTRIES) {
+		/* Fits in a single table */
+		tbl->linear = true;
+		num_leaf_entries = num_contexts;
+		leaf_table = &tbl->table;
+	} else {
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		tbl->l1.num_entries = num_contexts / CTXDESC_NUM_L2_ENTRIES;
+
+		tbl->l1.tables = devm_kzalloc(dev,
+					      sizeof(struct arm_smmu_cd_table) *
+					      tbl->l1.num_entries, GFP_KERNEL);
+		if (!tbl->l1.tables)
+			goto err_free_tbl;
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		tbl->l1.ptr = dmam_alloc_coherent(dev, size, &tbl->l1.ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!tbl->l1.ptr) {
+			dev_warn(dev, "failed to allocate L1 context table\n");
+			devm_kfree(dev, tbl->l1.tables);
+			goto err_free_tbl;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+		leaf_table = tbl->l1.tables;
+	}
+
+	ret = arm_smmu_alloc_cd_leaf_table(dev, leaf_table, num_leaf_entries);
 	if (ret)
-		goto err_free_tbl;
+		goto err_free_l1;
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -301,11 +393,23 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.set_entry		= arm_smmu_set_cd,
 		.clear_entry		= arm_smmu_clear_cd,
 	};
-	cfg->base			= tbl->table.ptr_dma;
-	cfg->arm_smmu.s1fmt		= ARM_SMMU_S1FMT_LINEAR;
+
+	if (tbl->linear) {
+		cfg->base		= leaf_table->ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	} else {
+		cfg->base		= tbl->l1.ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_64K_L2;
+		arm_smmu_write_cd_l1_desc(tbl->l1.ptr, leaf_table);
+	}
 
 	return &tbl->pasid;
 
+err_free_l1:
+	if (!tbl->linear) {
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
 err_free_tbl:
 	devm_kfree(dev, tbl);
 
@@ -318,7 +422,26 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (tbl->linear) {
+		arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	} else {
+		size_t i, size;
+
+		for (i = 0; i < tbl->l1.num_entries; i++) {
+			struct arm_smmu_cd_table *table = &tbl->l1.tables[i];
+
+			if (!table->ptr)
+				continue;
+
+			arm_smmu_free_cd_leaf_table(dev, table,
+						    CTXDESC_NUM_L2_ENTRIES);
+		}
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
+
 	devm_kfree(dev, tbl);
 }
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 23/40] iommu/arm-smmu-v3: Share process page tables
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (21 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 22/40] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 24/40] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
                     ` (16 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

With Shared Virtual Addressing (SVA), we need to mirror CPU TTBR, TCR,
MAIR and ASIDs in SMMU contexts. Each SMMU has a single ASID space split
into two sets, shared and private. Shared ASIDs correspond to those
obtained from the arch ASID allocator, and private ASIDs are used for
"classic" map/unmap DMA.

Replace the ASID IDA with an IDR, which lets us keep information about
each context. Initialize shared contexts with info obtained from the mm.
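
As a side note (sketch only, not part of the patch; the ex_ helper is
hypothetical), the IDR makes it possible to tell who owns an ASID, which
the IDA could not:

static DEFINE_IDR(ex_asid_idr);

/* Caller must hold the ASID lock */
static bool ex_asid_is_shared(u16 asid)
{
	struct arm_smmu_cd *cd = idr_find(&ex_asid_idr, asid);

	/* A shared context mirrors a process mm; a private one has no mm */
	return cd && cd->mm;
}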

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 182 ++++++++++++++++++++++++++--
 1 file changed, 172 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index d68da99aa472..352cba3c1a62 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -10,8 +10,10 @@
 #include <linux/dma-mapping.h>
 #include <linux/idr.h>
 #include <linux/kernel.h>
+#include <linux/mmu_context.h>
 #include <linux/slab.h>
 
+#include "io-pgtable-arm.h"
 #include "iommu-pasid-table.h"
 
 /*
@@ -69,6 +71,9 @@ struct arm_smmu_cd {
 	u64				ttbr;
 	u64				tcr;
 	u64				mair;
+
+	refcount_t			refs;
+	struct mm_struct		*mm;
 };
 
 #define pasid_entry_to_cd(entry) \
@@ -100,7 +105,8 @@ struct arm_smmu_cd_tables {
 #define pasid_ops_to_tables(ops) \
 	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
 
-static DEFINE_IDA(asid_ida);
+static DEFINE_SPINLOCK(asid_lock);
+static DEFINE_IDR(asid_idr);
 
 static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
 					struct arm_smmu_cd_table *desc,
@@ -239,7 +245,8 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 #ifdef __BIG_ENDIAN
 		      CTXDESC_CD_0_ENDI |
 #endif
-		      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
+		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
+		      (cd->mm ? 0 : CTXDESC_CD_0_ASET) |
 		      CTXDESC_CD_0_AA64 |
 		      FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag) |
 		      CTXDESC_CD_0_V;
@@ -255,18 +262,161 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 	return 0;
 }
 
+static bool arm_smmu_free_asid(struct arm_smmu_cd *cd)
+{
+	bool free;
+	struct arm_smmu_cd *old_cd;
+
+	spin_lock(&asid_lock);
+	free = refcount_dec_and_test(&cd->refs);
+	if (free) {
+		old_cd = idr_remove(&asid_idr, (u16)cd->entry.tag);
+		WARN_ON(old_cd != cd);
+	}
+	spin_unlock(&asid_lock);
+
+	return free;
+}
+
 static void arm_smmu_free_cd(struct iommu_pasid_entry *entry)
 {
 	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
-	ida_simple_remove(&asid_ida, (u16)entry->tag);
+	if (!arm_smmu_free_asid(cd))
+		return;
+
+	if (cd->mm) {
+		/* Unpin ASID */
+		mm_context_put(cd->mm);
+	}
+
 	kfree(cd);
 }
 
+static struct arm_smmu_cd *arm_smmu_alloc_cd(struct arm_smmu_cd_tables *tbl)
+{
+	struct arm_smmu_cd *cd;
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd)
+		return NULL;
+
+	cd->entry.release = arm_smmu_free_cd;
+	refcount_set(&cd->refs, 1);
+
+	return cd;
+}
+
+static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
+{
+	struct arm_smmu_cd *cd;
+
+	cd = idr_find(&asid_idr, asid);
+	if (!cd)
+		return NULL;
+
+	if (cd->mm) {
+		/*
+		 * It's pretty common to find a stale CD when doing unbind-bind,
+		 * given that the release happens after a RCU grace period.
+		 * Simply reuse it.
+		 */
+		refcount_inc(&cd->refs);
+		return cd;
+	}
+
+	/*
+	 * Ouch, ASID is already in use for a private cd.
+	 * TODO: seize it, for the common good.
+	 */
+	return ERR_PTR(-EEXIST);
+}
+
 static struct iommu_pasid_entry *
 arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
 {
-	return ERR_PTR(-ENODEV);
+	u16 asid;
+	u64 tcr, par, reg;
+	int ret = -ENOMEM;
+	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd *old_cd = NULL;
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+
+	asid = mm_context_get(mm);
+	if (!asid)
+		return ERR_PTR(-ESRCH);
+
+	cd = arm_smmu_alloc_cd(tbl);
+	if (!cd)
+		goto err_put_context;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&asid_lock);
+	old_cd = arm_smmu_share_asid(asid);
+	if (!old_cd)
+		ret = idr_alloc(&asid_idr, cd, asid, asid + 1, GFP_ATOMIC);
+	spin_unlock(&asid_lock);
+	idr_preload_end();
+
+	if (!IS_ERR_OR_NULL(old_cd)) {
+		if (WARN_ON(old_cd->mm != mm)) {
+			ret = -EINVAL;
+			goto err_free_cd;
+		}
+		kfree(cd);
+		mm_context_put(mm);
+		return &old_cd->entry;
+	} else if (old_cd) {
+		ret = PTR_ERR(old_cd);
+		goto err_free_cd;
+	}
+
+	tcr = TCR_T0SZ(VA_BITS) | TCR_IRGN0_WBWA | TCR_ORGN0_WBWA |
+		TCR_SH0_INNER | ARM_LPAE_TCR_EPD1;
+
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		tcr |= TCR_TG0_4K;
+		break;
+	case SZ_16K:
+		tcr |= TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		tcr |= TCR_TG0_64K;
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+		goto err_free_asid;
+	}
+
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
+
+	cd->ttbr	= virt_to_phys(mm->pgd);
+	cd->tcr		= tcr;
+	/*
+	 * MAIR value is pretty much constant and global, so we can just get it
+	 * from the current CPU register
+	 */
+	cd->mair	= read_sysreg(mair_el1);
+
+	cd->mm		= mm;
+	cd->entry.tag	= asid;
+
+	return &cd->entry;
+
+err_free_asid:
+	arm_smmu_free_asid(cd);
+
+err_free_cd:
+	kfree(cd);
+
+err_put_context:
+	mm_context_put(mm);
+
+	return ERR_PTR(ret);
 }
 
 static struct iommu_pasid_entry *
@@ -280,20 +430,23 @@ arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
 	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
 
-	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	cd = arm_smmu_alloc_cd(tbl);
 	if (!cd)
 		return ERR_PTR(-ENOMEM);
 
-	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
-			      GFP_KERNEL);
+	idr_preload(GFP_KERNEL);
+	spin_lock(&asid_lock);
+	asid = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << ctx_cfg->asid_bits,
+				GFP_ATOMIC);
+	cd->entry.tag = asid;
+	spin_unlock(&asid_lock);
+	idr_preload_end();
+
 	if (asid < 0) {
 		kfree(cd);
 		return ERR_PTR(asid);
 	}
 
-	cd->entry.tag = asid;
-	cd->entry.release = arm_smmu_free_cd;
-
 	switch (fmt) {
 	case ARM_64_LPAE_S1:
 		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
@@ -330,11 +483,20 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 			      struct iommu_pasid_entry *entry)
 {
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return;
 
 	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
+
+	/*
+	 * The ASID allocator won't broadcast the final TLB invalidations for
+	 * this ASID, so we need to do it manually. For private contexts,
+	 * freeing io-pgtable ops performs the invalidation.
+	 */
+	if (cd->mm)
+		iommu_pasid_flush_tlbs(&tbl->pasid, pasid, entry);
 }
 
 static struct iommu_pasid_table *
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 24/40] iommu/arm-smmu-v3: Seize private ASID
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (22 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 23/40] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 25/40] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
                     ` (15 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The SMMU has a single ASID space, the union of shared and private ASID
sets. This means that the context table module competes with the arch
allocator for ASIDs. Shared ASIDs are those of Linux processes, allocated
by the arch, and participate in broadcast TLB maintenance. Private ASIDs
are allocated by the SMMU driver and used for "classic" map/unmap DMA.
They require explicit TLB invalidations.

When we pin down an mm_context and get an ASID that is already in use by
the SMMU, it belongs to a private context. We used to simply abort the
bind, but this is unfair to users, who would find themselves unable to
bind a few seemingly random processes. Instead, try to allocate a new
private ASID for the context in use, and make the old ASID shared.

Introduce a new lock to prevent races when rewriting context descriptors.
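
For reference, the seize sequence can be sketched as below. This is
illustrative only, with hypothetical ex_ names, simplified locking and no
error handling:

static DEFINE_IDR(ex_asid_idr);

/* Called with the ASID lock held; @asid is wanted by a shared context */
static int ex_seize_asid(struct arm_smmu_cd *victim, u16 asid,
			 unsigned int asid_bits)
{
	int new_asid;

	/* Pick a fresh private ASID for the victim context */
	new_asid = idr_alloc_cyclic(&ex_asid_idr, victim, 0,
				    1 << asid_bits, GFP_ATOMIC);
	if (new_asid < 0)
		return new_asid;

	victim->entry.tag = new_asid;

	/*
	 * Rewrite the victim's CD with the new ASID (the first 64 bits
	 * of a CD can be updated atomically), then invalidate the TLB
	 * entries tagged with the old ASID. Until that invalidation
	 * completes, both ASIDs may tag the victim's traffic.
	 */

	/* The old ASID is now free for the shared (mm) context */
	idr_remove(&ex_asid_idr, asid);

	return 0;
}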

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 90 ++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 352cba3c1a62..0e12f6804e16 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -65,6 +65,8 @@
 #define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
 					FIELD_GET(ARM64_TCR_##fld, tcr))
 
+#define ARM_SMMU_NO_PASID		(-1)
+
 struct arm_smmu_cd {
 	struct iommu_pasid_entry	entry;
 
@@ -72,8 +74,14 @@ struct arm_smmu_cd {
 	u64				tcr;
 	u64				mair;
 
+	int				pasid;
+
+	/* 'refs' tracks alloc/free */
 	refcount_t			refs;
+	/* 'users' tracks attach/detach, and is only used for sanity checking */
+	unsigned int			users;
 	struct mm_struct		*mm;
+	struct arm_smmu_cd_tables	*tbl;
 };
 
 #define pasid_entry_to_cd(entry) \
@@ -105,6 +113,7 @@ struct arm_smmu_cd_tables {
 #define pasid_ops_to_tables(ops) \
 	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
 
+static DEFINE_SPINLOCK(contexts_lock);
 static DEFINE_SPINLOCK(asid_lock);
 static DEFINE_IDR(asid_idr);
 
@@ -191,8 +200,8 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	return val;
 }
 
-static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
-				   struct arm_smmu_cd *cd)
+static int __arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				     struct arm_smmu_cd *cd)
 {
 	u64 val;
 	bool cd_live;
@@ -262,6 +271,18 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 	return 0;
 }
 
+static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				   struct arm_smmu_cd *cd)
+{
+	int ret;
+
+	spin_lock(&contexts_lock);
+	ret = __arm_smmu_write_ctx_desc(tbl, ssid, cd);
+	spin_unlock(&contexts_lock);
+
+	return ret;
+}
+
 static bool arm_smmu_free_asid(struct arm_smmu_cd *cd)
 {
 	bool free;
@@ -301,15 +322,26 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd(struct arm_smmu_cd_tables *tbl)
 	if (!cd)
 		return NULL;
 
-	cd->entry.release = arm_smmu_free_cd;
+	cd->pasid		= ARM_SMMU_NO_PASID;
+	cd->tbl			= tbl;
+	cd->entry.release	= arm_smmu_free_cd;
 	refcount_set(&cd->refs, 1);
 
 	return cd;
 }
 
+/*
+ * Try to reserve this ASID in the SMMU. If it is in use, try to steal it from
+ * the private entry. Careful here, we may be modifying the context tables of
+ * another SMMU!
+ */
 static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
 {
+	int ret;
 	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd_tables *tbl;
+	struct arm_smmu_context_cfg *cfg;
+	struct iommu_pasid_entry old_entry;
 
 	cd = idr_find(&asid_idr, asid);
 	if (!cd)
@@ -319,17 +351,47 @@ static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
 		/*
 		 * It's pretty common to find a stale CD when doing unbind-bind,
 		 * given that the release happens after an RCU grace period.
-		 * Simply reuse it.
+		 * Simply reuse it, but check that it isn't active, because it's
+		 * going to be assigned a different PASID.
 		 */
+		if (WARN_ON(cd->users))
+			return ERR_PTR(-EINVAL);
+
 		refcount_inc(&cd->refs);
 		return cd;
 	}
 
+	tbl = cd->tbl;
+	cfg = &tbl->pasid.cfg.arm_smmu;
+
+	ret = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << cfg->asid_bits,
+			       GFP_ATOMIC);
+	if (ret < 0)
+		return ERR_PTR(-ENOSPC);
+
+	/* Save the previous ASID */
+	old_entry = cd->entry;
+
 	/*
-	 * Ouch, ASID is already in use for a private cd.
-	 * TODO: seize it, for the common good.
+	 * Race with unmap; TLB invalidations will start targeting the new ASID,
+	 * which isn't assigned yet. We'll do an invalidate-all on the old ASID
+	 * later, so it doesn't matter.
 	 */
-	return ERR_PTR(-EEXIST);
+	cd->entry.tag = ret;
+
+	/*
+	 * Update ASID and invalidate CD in all associated masters. There will
+	 * be some overlap between use of both ASIDs, until we invalidate the
+	 * TLB.
+	 */
+	arm_smmu_write_ctx_desc(tbl, cd->pasid, cd);
+
+	/* Invalidate TLB entries previously associated with that context */
+	iommu_pasid_flush_tlbs(&tbl->pasid, cd->pasid, &old_entry);
+
+	idr_remove(&asid_idr, asid);
+
+	return NULL;
 }
 
 static struct iommu_pasid_entry *
@@ -476,6 +538,15 @@ static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return -EINVAL;
 
+	if (WARN_ON(cd->pasid != ARM_SMMU_NO_PASID && cd->pasid != pasid))
+		return -EEXIST;
+
+	/*
+	 * There is a single cd structure for each address space, multiple
+	 * devices may use the same in different tables.
+	 */
+	cd->users++;
+	cd->pasid = pasid;
 	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
 }
 
@@ -488,6 +559,11 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return;
 
+	WARN_ON(cd->pasid != pasid);
+
+	if (!(--cd->users))
+		cd->pasid = ARM_SMMU_NO_PASID;
+
 	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
 
 	/*
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 25/40] iommu/arm-smmu-v3: Add support for VHE
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (23 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 24/40] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 26/40] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
                     ` (14 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The ARMv8.1 extensions added Virtualization Host Extensions (VHE), which
allow running a host kernel at EL2. With normal DMA, device and CPU
address spaces are dissociated and do not need to implement the same
capabilities, so VHE hasn't been used in the SMMU until now.

With shared address spaces however, ASIDs are shared between MMU and SMMU,
and broadcast TLB invalidations issued by a CPU are taken into account by
the SMMU. TLB entries on both sides need to have the same exception level
in order to be cleared with a single invalidation.

When the CPU is using VHE, enable VHE in the SMMU for all STEs. Normal DMA
mappings will need to use TLBI_EL2 commands instead of TLBI_NH, but
shouldn't be otherwise affected by this change.
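
The opcode selection that the diff repeats in three places could be
factored into a helper; a minimal sketch, using the feature bit added
below (this helper is not part of the patch):

    static u8 arm_smmu_tlbi_asid_opcode(struct arm_smmu_device *smmu)
    {
            /*
             * With E2H enabled, the SMMU's TLB entries are tagged for the
             * EL2-E2H world and must be invalidated with EL2 commands.
             */
            return smmu->features & ARM_SMMU_FEAT_E2H ?
                   CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
    }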

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 16b08f2fb8ac..280a5d9be839 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -24,6 +24,7 @@
 #include <linux/acpi_iort.h>
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
+#include <linux/cpufeature.h>
 #include <linux/delay.h>
 #include <linux/dma-iommu.h>
 #include <linux/err.h>
@@ -400,6 +401,8 @@ struct arm_smmu_cmdq_ent {
 		#define CMDQ_OP_TLBI_NH_ASID	0x11
 		#define CMDQ_OP_TLBI_NH_VA	0x12
 		#define CMDQ_OP_TLBI_EL2_ALL	0x20
+		#define CMDQ_OP_TLBI_EL2_ASID	0x21
+		#define CMDQ_OP_TLBI_EL2_VA	0x22
 		#define CMDQ_OP_TLBI_S12_VMALL	0x28
 		#define CMDQ_OP_TLBI_S2_IPA	0x2a
 		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
@@ -519,6 +522,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_HYP		(1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_VAX		(1 << 14)
+#define ARM_SMMU_FEAT_E2H		(1 << 15)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -777,6 +781,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
 		break;
 	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_EL2_VA:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
@@ -792,6 +797,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_S12_VMALL:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
 		break;
+	case CMDQ_OP_TLBI_EL2_ASID:
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
 		cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid);
@@ -1064,6 +1072,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 
 	if (ste->s1_cfg) {
 		struct iommu_pasid_table_cfg *cfg = &ste->s1_cfg->tables;
+		int strw = smmu->features & ARM_SMMU_FEAT_E2H ?
+			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
 
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
@@ -1074,7 +1084,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 #ifdef CONFIG_PCI_ATS
 			 FIELD_PREP(STRTAB_STE_1_EATS, STRTAB_STE_1_EATS_TRANS) |
 #endif
-			 FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
+			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
@@ -1325,7 +1335,8 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		if (unlikely(!smmu_domain->s1_cfg.cd0))
 			return;
-		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 		cmd.tlbi.vmid	= 0;
 	} else {
@@ -1352,7 +1363,8 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		if (unlikely(!smmu_domain->s1_cfg.cd0))
 			return;
-		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
@@ -1422,7 +1434,8 @@ static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd = {
-		.opcode		= CMDQ_OP_TLBI_NH_ASID,
+		.opcode		= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID,
 		.tlbi.asid	= entry->tag,
 	};
 
@@ -2446,7 +2459,11 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID | CR2_E2H;
+	reg = CR2_PTM | CR2_RECINVSID;
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H)
+		reg |= CR2_E2H;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -2594,8 +2611,11 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR0_MSI)
 		smmu->features |= ARM_SMMU_FEAT_MSI;
 
-	if (reg & IDR0_HYP)
+	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
+		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+			smmu->features |= ARM_SMMU_FEAT_E2H;
+	}
 
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 26/40] iommu/arm-smmu-v3: Enable broadcast TLB maintenance
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (24 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 25/40] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 27/40] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
                     ` (13 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The SMMUv3 can handle invalidation targeted at TLB entries with shared
ASIDs. If the implementation supports broadcast TLB maintenance, enable it
and keep track of it in a feature bit. The SMMU will then be affected by
inner-shareable TLB invalidations from other agents.

A major side-effect of this change is that stage-2 translation contexts
are now affected by all invalidations by VMID. Since VMIDs are all shared
and the stage-2 page tables are not shared between CPU and SMMU, the only
ways to prevent over-invalidation are to either disable BTM or allocate
different VMIDs. This patch does not address the problem.
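
The detection logic added to arm_smmu_device_hw_probe() boils down to the
following predicate (an illustrative sketch, not the patch code):

    /*
     * BTM is only usable when the SMMU's TLB entries live in the same
     * "world" as the CPU's: if the CPU uses VHE, the SMMU must support
     * E2H as well, otherwise it would miss the broadcast invalidations.
     */
    static bool arm_smmu_can_use_btm(u32 idr0, bool cpu_uses_vhe)
    {
            return (idr0 & IDR0_BTM) && (!cpu_uses_vhe || (idr0 & IDR0_HYP));
    }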

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 280a5d9be839..073cba33ae6c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -64,6 +64,7 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
 #define IDR0_TTF_AARCH64		2
@@ -523,6 +524,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_VAX		(1 << 14)
 #define ARM_SMMU_FEAT_E2H		(1 << 15)
+#define ARM_SMMU_FEAT_BTM		(1 << 16)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2459,11 +2461,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID;
+	reg = CR2_RECINVSID;
 
 	if (smmu->features & ARM_SMMU_FEAT_E2H)
 		reg |= CR2_E2H;
 
+	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
+		reg |= CR2_PTM;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -2564,6 +2569,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
 	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
 
 	/* IDR0 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
@@ -2613,10 +2619,19 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+		if (vhe)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	/*
+	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
+	 * will create TLB entries for NH-EL1 world and will miss the
+	 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
+	 * BTM in that case.
+	 */
+	if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
+		smmu->features |= ARM_SMMU_FEAT_BTM;
+
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 27/40] iommu/arm-smmu-v3: Add SVA feature checking
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (25 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 26/40] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 28/40] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
                     ` (12 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Aggregate all sanity checks for sharing CPU page tables with the SMMU
under a single ARM_SMMU_FEAT_SVA bit. For PCIe SVA, users also need to
check FEAT_ATS and FEAT_PRI. For platform SVA, they will most likely have
to check FEAT_STALLS.
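
A prospective user of the new bit might check features along these lines
(hypothetical helper; FEAT_ATS, FEAT_PRI and FEAT_STALLS are existing
driver feature bits):

    static bool arm_smmu_master_may_use_sva(struct arm_smmu_device *smmu,
                                            bool is_pcie)
    {
            if (!(smmu->features & ARM_SMMU_FEAT_SVA))
                    return false;

            if (is_pcie)    /* PCIe SVA recovers faults with ATS + PRI */
                    return (smmu->features & ARM_SMMU_FEAT_ATS) &&
                           (smmu->features & ARM_SMMU_FEAT_PRI);

            /* Platform devices recover faults by stalling transactions */
            return smmu->features & ARM_SMMU_FEAT_STALLS;
    }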

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: Add 52-bit PA cap and debug message
---
 drivers/iommu/arm-smmu-v3.c | 72 +++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 073cba33ae6c..2716e4a4d3f7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -525,6 +525,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_VAX		(1 << 14)
 #define ARM_SMMU_FEAT_E2H		(1 << 15)
 #define ARM_SMMU_FEAT_BTM		(1 << 16)
+#define ARM_SMMU_FEAT_SVA		(1 << 17)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2565,6 +2566,74 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static bool arm_smmu_supports_sva(struct arm_smmu_device *smmu)
+{
+	unsigned long reg, fld;
+	unsigned long oas;
+	unsigned long asid_bits;
+
+	u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
+
+	if ((smmu->features & feat_mask) != feat_mask)
+		return false;
+
+	if (!(smmu->pgsize_bitmap & PAGE_SIZE))
+		return false;
+
+	/*
+	 * Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
+	 * not even pretending to support AArch32 here.
+	 */
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	switch (fld) {
+	case 0x0:
+		oas = 32;
+		break;
+	case 0x1:
+		oas = 36;
+		break;
+	case 0x2:
+		oas = 40;
+		break;
+	case 0x3:
+		oas = 42;
+		break;
+	case 0x4:
+		oas = 44;
+		break;
+	case 0x5:
+		oas = 48;
+		break;
+	case 0x6:
+		oas = 52;
+		break;
+	default:
+		return false;
+	}
+
+	/* abort if MMU outputs addresses greater than what we support. */
+	if (smmu->oas < oas)
+		return false;
+
+	/* We can support bigger ASIDs than the CPU, but not smaller */
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_ASID_SHIFT);
+	asid_bits = fld ? 16 : 8;
+	if (smmu->asid_bits < asid_bits)
+		return false;
+
+	/*
+	 * See max_pinned_asids in arch/arm64/mm/context.c. The following is
+	 * generally the maximum number of bindable processes.
+	 */
+	if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
+		asid_bits--;
+	dev_dbg(smmu->dev, "%d shared contexts\n", (1 << asid_bits) -
+		num_possible_cpus() - 2);
+
+	return true;
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -2766,6 +2835,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	if (arm_smmu_supports_sva(smmu))
+		smmu->features |= ARM_SMMU_FEAT_SVA;
+
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
 	return 0;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 28/40] iommu/arm-smmu-v3: Implement mm operations
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (26 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 27/40] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 29/40] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
                     ` (11 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Hook mm operations to support PASID and page table sharing with the
SMMUv3 (a call-order sketch follows the list):

* mm_alloc allocates a context descriptor.
* mm_free releases the context descriptor.
* mm_attach checks device capabilities and writes the context descriptor.
* mm_detach clears the context descriptor and sends required
  invalidations.
* mm_invalidate sends required invalidations.
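
A minimal sketch of the order in which the SVA core is expected to invoke
these operations (an assumption based on the list above, not a verbatim
copy of the core code):

    /*
     *   io_mm = mm_alloc(domain, mm, flags);    // first bind of 'mm'
     *   mm_attach(domain, dev, io_mm, attach_domain);
     *      ... device does DMA on the process address space ...
     *   mm_invalidate(domain, dev, io_mm, iova, size);
     *                                           // from the MMU notifier
     *   mm_detach(domain, dev, io_mm, detach_domain);   // unbind
     *   mm_free(io_mm);                         // last reference dropped
     */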

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 126 ++++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 11c8492b3763..70900670a9fa 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -343,6 +343,7 @@ config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
 	select IOMMU_API
+	select IOMMU_SVA
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2716e4a4d3f7..c2c96025ac3b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -31,6 +31,7 @@
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
+#include <linux/mmu_context.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/of.h>
@@ -39,6 +40,7 @@
 #include <linux/of_platform.h>
 #include <linux/pci.h>
 #include <linux/platform_device.h>
+#include <linux/sched/mm.h>
 
 #include <linux/amba/bus.h>
 
@@ -599,6 +601,11 @@ struct arm_smmu_domain {
 	spinlock_t			devices_lock;
 };
 
+struct arm_smmu_mm {
+	struct io_mm			io_mm;
+	struct iommu_pasid_entry	*cd;
+};
+
 struct arm_smmu_option_prop {
 	u32 opt;
 	const char *prop;
@@ -625,6 +632,11 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 	return container_of(dom, struct arm_smmu_domain, domain);
 }
 
+static struct arm_smmu_mm *to_smmu_mm(struct io_mm *io_mm)
+{
+	return container_of(io_mm, struct arm_smmu_mm, io_mm);
+}
+
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -1725,6 +1737,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (smmu_domain) {
+		__iommu_sva_unbind_dev_all(dev);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -1842,6 +1856,111 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	return ops->iova_to_phys(ops, iova);
 }
 
+static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
+{
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	/* SSID support is mandatory for the moment */
+	if (!master->ssid_bits)
+		return -EINVAL;
+
+	if (param->features)
+		return -EINVAL;
+
+	if (!param->max_pasid)
+		param->max_pasid = 0xfffffU;
+
+	/* SSID support in the SMMU requires at least one SSID bit */
+	param->min_pasid = max(param->min_pasid, 1U);
+	param->max_pasid = min(param->max_pasid, (1U << master->ssid_bits) - 1);
+
+	return 0;
+}
+
+static void arm_smmu_sva_shutdown(struct device *dev,
+				  struct iommu_sva_param *param)
+{
+}
+
+static struct io_mm *arm_smmu_mm_alloc(struct iommu_domain *domain,
+				       struct mm_struct *mm,
+				       unsigned long flags)
+{
+	struct arm_smmu_mm *smmu_mm;
+	struct iommu_pasid_entry *cd;
+	struct iommu_pasid_table_ops *ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return NULL;
+
+	smmu_mm = kzalloc(sizeof(*smmu_mm), GFP_KERNEL);
+	if (!smmu_mm)
+		return NULL;
+
+	ops = smmu_domain->s1_cfg.ops;
+	cd = ops->alloc_shared_entry(ops, mm);
+	if (IS_ERR(cd)) {
+		kfree(smmu_mm);
+		return ERR_CAST(cd);
+	}
+
+	smmu_mm->cd = cd;
+	return &smmu_mm->io_mm;
+}
+
+static void arm_smmu_mm_free(struct io_mm *io_mm)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+
+	iommu_free_pasid_entry(smmu_mm->cd);
+	kfree(smmu_mm);
+}
+
+static int arm_smmu_mm_attach(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, bool attach_domain)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
+		return -ENODEV;
+
+	if (!attach_domain)
+		return 0;
+
+	return ops->set_entry(ops, io_mm->pasid, smmu_mm->cd);
+}
+
+static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
+			       struct io_mm *io_mm, bool detach_domain)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+
+	if (detach_domain)
+		ops->clear_entry(ops, io_mm->pasid, smmu_mm->cd);
+
+	/* TODO: Invalidate ATC. */
+	/* TODO: Invalidate all mappings if last and not DVM. */
+}
+
+static void arm_smmu_mm_invalidate(struct iommu_domain *domain,
+				   struct device *dev, struct io_mm *io_mm,
+				   unsigned long iova, size_t size)
+{
+	/*
+	 * TODO: Invalidate ATC.
+	 * TODO: Invalidate mapping if not DVM
+	 */
+}
+
 static struct platform_driver arm_smmu_driver;
 
 static int arm_smmu_match_node(struct device *dev, void *data)
@@ -2048,6 +2167,13 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.domain_free		= arm_smmu_domain_free,
 	.attach_dev		= arm_smmu_attach_dev,
+	.sva_device_init	= arm_smmu_sva_init,
+	.sva_device_shutdown	= arm_smmu_sva_shutdown,
+	.mm_alloc		= arm_smmu_mm_alloc,
+	.mm_free		= arm_smmu_mm_free,
+	.mm_attach		= arm_smmu_mm_attach,
+	.mm_detach		= arm_smmu_mm_detach,
+	.mm_invalidate		= arm_smmu_mm_invalidate,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 29/40] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (27 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 28/40] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 30/40] iommu/arm-smmu-v3: Register I/O Page Fault queue Jean-Philippe Brucker
                     ` (10 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

If the SMMU supports it and the kernel was built with HTTU support, enable
hardware update of access and dirty flags. This is essential for shared
page tables, to reduce the number of access faults on the fault queue.

We can enable HTTU even if CPUs don't support it, because the kernel
always checks for the HW dirty bit and updates the PTE flags atomically.
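
The CPU and context descriptor bit positions differ, hence the remapping.
For instance, ARM_SMMU_TCR2CD(tcr, HA) in the diff expands to:

    /* Copy TCR_EL1.HA (bit 39) into the CD's TCR.HA field (bit 43) */
    val |= FIELD_PREP(CTXDESC_CD_0_TCR_HA,
                      FIELD_GET(ARM64_TCR_HA, tcr));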

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 16 ++++++++++++++--
 drivers/iommu/arm-smmu-v3.c         | 12 ++++++++++++
 drivers/iommu/iommu-pasid-table.h   |  4 ++++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 0e12f6804e16..bdc9bfd1f35d 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -52,6 +52,11 @@
 #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
 #define ARM64_TCR_TBI0			(1ULL << 37)
 
+#define CTXDESC_CD_0_TCR_HA		(1UL << 43)
+#define ARM64_TCR_HA			(1ULL << 39)
+#define CTXDESC_CD_0_TCR_HD		(1UL << 42)
+#define ARM64_TCR_HD			(1ULL << 40)
+
 #define CTXDESC_CD_0_AA64		(1UL << 41)
 #define CTXDESC_CD_0_S			(1UL << 44)
 #define CTXDESC_CD_0_R			(1UL << 45)
@@ -182,7 +187,7 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
 	return l1_desc->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
+static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_context_cfg *cfg, u64 tcr)
 {
 	u64 val = 0;
 
@@ -197,6 +202,12 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
 	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
+	if (cfg->hw_access)
+		val |= ARM_SMMU_TCR2CD(tcr, HA);
+
+	if (cfg->hw_dirty)
+		val |= ARM_SMMU_TCR2CD(tcr, HD);
+
 	return val;
 }
 
@@ -250,7 +261,7 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 		iommu_pasid_flush(&tbl->pasid, ssid, true);
 
 
-		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+		val = arm_smmu_cpu_tcr_to_cd(cfg, cd->tcr) |
 #ifdef __BIG_ENDIAN
 		      CTXDESC_CD_0_ENDI |
 #endif
@@ -455,6 +466,7 @@ arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm
 	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
 	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
+	tcr |= TCR_HA | TCR_HD;
 
 	cd->ttbr	= virt_to_phys(mm->pgd);
 	cd->tcr		= tcr;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c2c96025ac3b..7c839d305d97 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -66,6 +66,8 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_HD				(1 << 7)
+#define IDR0_HA				(1 << 6)
 #define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
@@ -528,6 +530,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H		(1 << 15)
 #define ARM_SMMU_FEAT_BTM		(1 << 16)
 #define ARM_SMMU_FEAT_SVA		(1 << 17)
+#define ARM_SMMU_FEAT_HA		(1 << 18)
+#define ARM_SMMU_FEAT_HD		(1 << 19)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -1567,6 +1571,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 		.arm_smmu = {
 			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
 			.asid_bits	= smmu->asid_bits,
+			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
+			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
 		},
 	};
 
@@ -2818,6 +2824,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	if (reg & (IDR0_HA | IDR0_HD)) {
+		smmu->features |= ARM_SMMU_FEAT_HA;
+		if (reg & IDR0_HD)
+			smmu->features |= ARM_SMMU_FEAT_HD;
+	}
+
 	/*
 	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
 	 * will create TLB entries for NH-EL1 world and will miss the
diff --git a/drivers/iommu/iommu-pasid-table.h b/drivers/iommu/iommu-pasid-table.h
index b84709e297bc..a7243579a4cb 100644
--- a/drivers/iommu/iommu-pasid-table.h
+++ b/drivers/iommu/iommu-pasid-table.h
@@ -78,12 +78,16 @@ struct iommu_pasid_sync_ops {
  * SMMU properties:
  * @stall: devices attached to the domain are allowed to stall.
  * @asid_bits: number of ASID bits supported by the SMMU
+ * @hw_dirty: hardware may update dirty flag
+ * @hw_access: hardware may update access flag
  *
  * @s1fmt: PASID table format, chosen by the allocator.
  */
 struct arm_smmu_context_cfg {
 	u8				stall:1;
 	u8				asid_bits;
+	u8				hw_dirty:1;
+	u8				hw_access:1;
 
 #define ARM_SMMU_S1FMT_LINEAR		0x0
 #define ARM_SMMU_S1FMT_4K_L2		0x1
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 30/40] iommu/arm-smmu-v3: Register I/O Page Fault queue
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (28 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 29/40] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 31/40] iommu/arm-smmu-v3: Improve add_device error handling Jean-Philippe Brucker
                     ` (9 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

When using PRI or Stall, the PRI or event handler enqueues faults into the
core fault queue. Register it based on the SMMU features.

When the core stops using a PASID, it notifies the SMMU to flush all
instances of this PASID from the PRI queue. Add a way to flush the PRI
and event queues. The PRI and event threads now take a spinlock while
processing the queue, and the flush handler takes this lock to inspect
the queue state. We avoid livelock, where the SMMU adds faults to the
queue faster than we can consume them, by incrementing a 'batch' number
on every cycle, so the flush handler only has to wait for a complete
cycle (two batch increments).
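
Condensed, the pattern looks like this (a sketch distilled from the diff
below; the producer is the SMMU hardware):

    /* Consumer thread, holding q->wq.lock while scanning the queue: */
    while (!queue_remove_raw(q, evt)) {
            handle(evt);
            if (++num_handled == queue_size) {
                    q->batch++;             /* one full cycle elapsed */
                    wake_up_all_locked(&q->wq);
                    num_handled = 0;
            }
    }
    q->batch++;                             /* queue drained */
    wake_up_all_locked(&q->wq);

    /*
     * Flusher: waiting for two increments guarantees that a full cycle
     * elapsed after the flush started, so every fault queued before the
     * flush has been consumed, even if new faults keep arriving.
     */
    batch = q->batch;
    wait_event_interruptible_locked(q->wq,
            queue_empty(q) || q->batch >= batch + 2);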

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: Use an iopf_queue for each SMMU
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 111 +++++++++++++++++++++++++++++++++++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 70900670a9fa..41db49795c90 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -344,6 +344,7 @@ config ARM_SMMU_V3
 	depends on ARM64
 	select IOMMU_API
 	select IOMMU_SVA
+	select IOMMU_PAGE_FAULT
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7c839d305d97..5d57f41f79b4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -448,6 +448,10 @@ struct arm_smmu_queue {
 
 	u32 __iomem			*prod_reg;
 	u32 __iomem			*cons_reg;
+
+	/* Event and PRI */
+	u64				batch;
+	wait_queue_head_t		wq;
 };
 
 struct arm_smmu_cmdq {
@@ -565,6 +569,8 @@ struct arm_smmu_device {
 
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
+
+	struct iopf_queue		*iopf_queue;
 };
 
 /* SMMU private data for each master */
@@ -577,6 +583,7 @@ struct arm_smmu_master_data {
 
 	struct device			*dev;
 	size_t				ssid_bits;
+	bool				can_fault;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -1183,14 +1190,23 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
 	int i;
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[EVTQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
 
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_all_locked(&q->wq);
+				num_handled = 0;
+			}
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1208,6 +1224,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_all_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
@@ -1251,13 +1272,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 {
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->priq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[PRIQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
-		while (!queue_remove_raw(q, evt))
+		while (!queue_remove_raw(q, evt)) {
+			spin_unlock(&q->wq.lock);
 			arm_smmu_handle_ppr(smmu, evt);
+			spin_lock(&q->wq.lock);
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_all_locked(&q->wq);
+				num_handled = 0;
+			}
+		}
 
 		if (queue_sync_prod(q) == -EOVERFLOW)
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
@@ -1265,9 +1297,60 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_all_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
+/*
+ * arm_smmu_flush_queue - wait until all events/PPRs currently in the queue have
+ * been consumed.
+ *
+ * Wait until the queue thread finished a batch, or until the queue is empty.
+ * Note that we don't handle overflows on q->batch. If it occurs, just wait for
+ * the queue to be empty.
+ */
+static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
+				struct arm_smmu_queue *q, const char *name)
+{
+	int ret;
+	u64 batch;
+
+	spin_lock(&q->wq.lock);
+	if (queue_sync_prod(q) == -EOVERFLOW)
+		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
+
+	batch = q->batch;
+	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
+					      q->batch >= batch + 2);
+	spin_unlock(&q->wq.lock);
+
+	return ret;
+}
+
+static int arm_smmu_flush_queues(void *cookie, struct device *dev)
+{
+	struct arm_smmu_master_data *master;
+	struct arm_smmu_device *smmu = cookie;
+
+	if (dev) {
+		master = dev->iommu_fwspec->iommu_priv;
+		/* TODO: add support for PRI and Stall */
+		return 0;
+	}
+
+	/* No target device, flush all queues. */
+	if (smmu->features & ARM_SMMU_FEAT_STALLS)
+		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+	if (smmu->features & ARM_SMMU_FEAT_PRI)
+		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
+
+	return 0;
+}
+
 static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
 
 static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
@@ -1864,15 +1947,24 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 
 static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
 {
+	int ret;
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
 
 	/* SSID support is mandatory for the moment */
 	if (!master->ssid_bits)
 		return -EINVAL;
 
-	if (param->features)
+	if (param->features & ~IOMMU_SVA_FEAT_IOPF)
 		return -EINVAL;
 
+	if (param->features & IOMMU_SVA_FEAT_IOPF) {
+		if (!master->can_fault)
+			return -EINVAL;
+		ret = iopf_queue_add_device(master->smmu->iopf_queue, dev);
+		if (ret)
+			return ret;
+	}
+
 	if (!param->max_pasid)
 		param->max_pasid = 0xfffffU;
 
@@ -1886,6 +1978,7 @@ static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
 static void arm_smmu_sva_shutdown(struct device *dev,
 				  struct iommu_sva_param *param)
 {
+	iopf_queue_remove_device(dev);
 }
 
 static struct io_mm *arm_smmu_mm_alloc(struct iommu_domain *domain,
@@ -2063,6 +2156,7 @@ static void arm_smmu_remove_device(struct device *dev)
 
 	master = fwspec->iommu_priv;
 	smmu = master->smmu;
+	iopf_queue_remove_device(dev);
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	iommu_group_remove_device(dev);
@@ -2222,6 +2316,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->max_n_shift);
 
 	q->prod = q->cons = 0;
+
+	init_waitqueue_head(&q->wq);
+	q->batch = 0;
+
 	return 0;
 }
 
@@ -3128,6 +3226,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
+		smmu->iopf_queue = iopf_queue_alloc(dev_name(dev),
+						    arm_smmu_flush_queues,
+						    smmu);
+		if (!smmu->iopf_queue)
+			return -ENOMEM;
+	}
+
 	/* And we're up. Go go go! */
 	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
 				     "smmu3.%pa", &ioaddr);
@@ -3170,6 +3276,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 {
 	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
+	iopf_queue_free(smmu->iopf_queue);
 	arm_smmu_device_disable(smmu);
 
 	return 0;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 31/40] iommu/arm-smmu-v3: Improve add_device error handling
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (29 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 30/40] iommu/arm-smmu-v3: Register I/O Page Fault queue Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 32/40] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
                     ` (8 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

As we add new features, add_device becomes more likely to fail; let it
clean up after itself. The iommu_bus_init function does call
remove_device on error, but other sites (e.g. of_iommu) do not.

Don't free level-2 stream tables, because we'd have to track whether we
allocated each of them or whether they are used by other endpoints. It's
not worth the hassle, since they are managed resources.
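
The resulting unwind structure (condensed from the diff):

    /*
     *   SID range check / l2 strtab init  -- error -> err_free_master
     *   iommu_device_link()               -- error -> err_free_master
     *   iommu_group_get_for_dev()         -- error -> err_unlink
     *
     * err_unlink:      iommu_device_unlink(&smmu->iommu, dev);
     * err_free_master: kfree(master);
     *                  fwspec->iommu_priv = NULL;
     */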

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: new
---
 drivers/iommu/arm-smmu-v3.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 5d57f41f79b4..d5f3875abfb9 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2123,26 +2123,43 @@ static int arm_smmu_add_device(struct device *dev)
 	for (i = 0; i < fwspec->num_ids; i++) {
 		u32 sid = fwspec->ids[i];
 
-		if (!arm_smmu_sid_in_range(smmu, sid))
-			return -ERANGE;
+		if (!arm_smmu_sid_in_range(smmu, sid)) {
+			ret = -ERANGE;
+			goto err_free_master;
+		}
 
 		/* Ensure l2 strtab is initialised */
 		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
 			ret = arm_smmu_init_l2_strtab(smmu, sid);
 			if (ret)
-				return ret;
+				goto err_free_master;
 		}
 	}
 
 	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
+	ret = iommu_device_link(&smmu->iommu, dev);
+	if (ret)
+		goto err_free_master;
+
 	group = iommu_group_get_for_dev(dev);
-	if (!IS_ERR(group)) {
-		iommu_group_put(group);
-		iommu_device_link(&smmu->iommu, dev);
+	if (IS_ERR(group)) {
+		ret = PTR_ERR(group);
+		goto err_unlink;
 	}
 
-	return PTR_ERR_OR_ZERO(group);
+	iommu_group_put(group);
+
+	return 0;
+
+err_unlink:
+	iommu_device_unlink(&smmu->iommu, dev);
+
+err_free_master:
+	kfree(master);
+	fwspec->iommu_priv = NULL;
+
+	return ret;
 }
 
 static void arm_smmu_remove_device(struct device *dev)
@@ -2155,9 +2172,12 @@ static void arm_smmu_remove_device(struct device *dev)
 		return;
 
 	master = fwspec->iommu_priv;
+	if (!master)
+		return;
+
 	smmu = master->smmu;
 	iopf_queue_remove_device(dev);
-	if (master && master->ste.assigned)
+	if (master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 32/40] iommu/arm-smmu-v3: Maintain a SID->device structure
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (30 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 31/40] iommu/arm-smmu-v3: Improve add_device error handling Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 33/40] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
                     ` (7 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

When handling faults from the event or PRI queue, we need to find the
struct device associated with a SID. Add an rb-tree to keep track of SIDs.
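
The lookup stays unused (__maybe_unused) until the next patch resolves
SIDs from fault records; a minimal usage sketch:

    /* Translate the SID found in a fault record back to a device */
    master = arm_smmu_find_master(smmu, sid);
    if (!master)
            return -EINVAL; /* unknown StreamID, drop the fault */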

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 114 +++++++++++++++++++++++++++++++++++-
 1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d5f3875abfb9..0f2d8aa0deee 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -570,9 +570,18 @@ struct arm_smmu_device {
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
 
+	struct rb_root			streams;
+	struct mutex			streams_mutex;
+
 	struct iopf_queue		*iopf_queue;
 };
 
+struct arm_smmu_stream {
+	u32				id;
+	struct arm_smmu_master_data	*master;
+	struct rb_node			node;
+};
+
 /* SMMU private data for each master */
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
@@ -580,6 +589,7 @@ struct arm_smmu_master_data {
 
 	struct arm_smmu_domain		*domain;
 	struct list_head		list; /* domain->devices */
+	struct arm_smmu_stream		*streams;
 
 	struct device			*dev;
 	size_t				ssid_bits;
@@ -1186,6 +1196,32 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+__maybe_unused
+static struct arm_smmu_master_data *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+	struct rb_node *node;
+	struct arm_smmu_stream *stream;
+	struct arm_smmu_master_data *master = NULL;
+
+	mutex_lock(&smmu->streams_mutex);
+	node = smmu->streams.rb_node;
+	while (node) {
+		stream = rb_entry(node, struct arm_smmu_stream, node);
+		if (stream->id < sid) {
+			node = node->rb_right;
+		} else if (stream->id > sid) {
+			node = node->rb_left;
+		} else {
+			master = stream->master;
+			break;
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2086,6 +2122,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+				  struct arm_smmu_master_data *master)
+{
+	int i;
+	int ret = 0;
+	struct arm_smmu_stream *new_stream, *cur_stream;
+	struct rb_node **new_node, *parent_node = NULL;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	master->streams = kcalloc(fwspec->num_ids,
+				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
+	if (!master->streams)
+		return -ENOMEM;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids && !ret; i++) {
+		new_stream = &master->streams[i];
+		new_stream->id = fwspec->ids[i];
+		new_stream->master = master;
+
+		new_node = &(smmu->streams.rb_node);
+		while (*new_node) {
+			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+					      node);
+			parent_node = *new_node;
+			if (cur_stream->id > new_stream->id) {
+				new_node = &((*new_node)->rb_left);
+			} else if (cur_stream->id < new_stream->id) {
+				new_node = &((*new_node)->rb_right);
+			} else {
+				dev_warn(master->dev,
+					 "stream %u already in tree\n",
+					 cur_stream->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
+		if (!ret) {
+			rb_link_node(&new_stream->node, parent_node, new_node);
+			rb_insert_color(&new_stream->node, &smmu->streams);
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
+				   struct arm_smmu_master_data *master)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!master->streams)
+		return;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++)
+		rb_erase(&master->streams[i].node, &smmu->streams);
+	mutex_unlock(&smmu->streams_mutex);
+
+	kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static int arm_smmu_add_device(struct device *dev)
@@ -2142,16 +2243,23 @@ static int arm_smmu_add_device(struct device *dev)
 	if (ret)
 		goto err_free_master;
 
+	ret = arm_smmu_insert_master(smmu, master);
+	if (ret)
+		goto err_unlink;
+
 	group = iommu_group_get_for_dev(dev);
 	if (IS_ERR(group)) {
 		ret = PTR_ERR(group);
-		goto err_unlink;
+		goto err_remove_master;
 	}
 
 	iommu_group_put(group);
 
 	return 0;
 
+err_remove_master:
+	arm_smmu_remove_master(smmu, master);
+
 err_unlink:
 	iommu_device_unlink(&smmu->iommu, dev);
 
@@ -2180,6 +2288,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	if (master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	iommu_group_remove_device(dev);
+	arm_smmu_remove_master(smmu, master);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
 	iommu_fwspec_free(dev);
@@ -2483,6 +2592,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 	int ret;
 
 	atomic_set(&smmu->sync_nr, 0);
+	mutex_init(&smmu->streams_mutex);
+	smmu->streams = RB_ROOT;
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [PATCH v2 33/40] iommu/arm-smmu-v3: Add stall support for platform devices
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (31 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 32/40] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 34/40] ACPI/IORT: Check ATS capability in root complex nodes Jean-Philippe Brucker
                     ` (6 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCI PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked and
the OS is given a chance to fix the page tables and retry the transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler. If
the fault is recoverable, it will call us back to terminate or continue
the stall.
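
End to end, the recoverable-fault path added here works as follows
(sketch):

    /*
     *  device stalls a faulting transaction
     *   -> SMMU pushes an event with EVTQ_1_STALL, a stall tag (STAG)
     *      and, if valid, an SSID
     *   -> arm_smmu_handle_evt() fills an iommu_fault_event
     *   -> iommu_report_device_fault() hands it to the fault handlers
     *   -> once the page tables are fixed, arm_smmu_page_response()
     *      issues CMD_RESUME {SID, STAG, RETRY or ABORT}
     *  device retries or terminates the transaction
     */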

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 178 +++++++++++++++++++++++++++++++++++-
 1 file changed, 173 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 0f2d8aa0deee..8a6a799ba04a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -301,6 +301,11 @@
 #define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
 
+#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
+#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << 12)
+#define CMDQ_RESUME_0_ACTION_ABORT	(1UL << 13)
+#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)
+
 #define CMDQ_SYNC_0_CS			GENMASK_ULL(13, 12)
 #define CMDQ_SYNC_0_CS_NONE		0
 #define CMDQ_SYNC_0_CS_IRQ		1
@@ -316,6 +321,25 @@
 
 #define EVTQ_0_ID			GENMASK_ULL(7, 0)
 
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
+#define EVTQ_0_SID			GENMASK_ULL(63, 32)
+#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PRIV			(1UL << 33)
+#define EVTQ_1_EXEC			(1UL << 34)
+#define EVTQ_1_READ			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS			GENMASK_ULL(41, 40)
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)
+#define EVTQ_3_IPA			GENMASK_ULL(51, 12)
+
 /* PRI queue */
 #define PRIQ_ENT_DWORDS			2
 #define PRIQ_MAX_SZ_SHIFT		8
@@ -426,6 +450,13 @@ struct arm_smmu_cmdq_ent {
 			enum pri_resp		resp;
 		} pri;
 
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			enum page_response_code	resp;
+		} resume;
+
 		#define CMDQ_OP_CMD_SYNC	0x46
 		struct {
 			u32			msidata;
@@ -499,6 +530,8 @@ struct arm_smmu_strtab_ent {
 	bool				assigned;
 	struct arm_smmu_s1_cfg		*s1_cfg;
 	struct arm_smmu_s2_cfg		*s2_cfg;
+
+	bool				can_stall;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -851,6 +884,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		}
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
 		break;
+	case CMDQ_OP_RESUME:
+		cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid);
+		cmd[1] |= FIELD_PREP(CMDQ_RESUME_1_STAG, ent->resume.stag);
+		switch (ent->resume.resp) {
+		case IOMMU_PAGE_RESP_INVALID:
+		case IOMMU_PAGE_RESP_FAILURE:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
+			break;
+		case IOMMU_PAGE_RESP_SUCCESS:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
 	case CMDQ_OP_CMD_SYNC:
 		if (ent->sync.msiaddr)
 			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
@@ -1013,6 +1061,34 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
 }
 
+static int arm_smmu_page_response(struct device *dev,
+				  struct page_response_msg *resp)
+{
+	int sid = dev->iommu_fwspec->ids[0];
+	struct arm_smmu_cmdq_ent cmd = {0};
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (master->ste.can_stall) {
+		cmd.opcode		= CMDQ_OP_RESUME;
+		cmd.resume.sid		= sid;
+		cmd.resume.stag		= resp->page_req_group_id;
+		cmd.resume.resp		= resp->resp_code;
+	} else {
+		/* TODO: put PRI response here */
+		return -ENODEV;
+	}
+
+	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
+	/*
+	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
+	 * RESUME consumption guarantees that the stalled transaction will be
+	 * terminated... at some point in the future. PRI_RESP is fire and
+	 * forget.
+	 */
+
+	return 0;
+}
+
 /* Stream table manipulation functions */
 static void
 arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
@@ -1123,7 +1199,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
+		   !ste->can_stall)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK) |
@@ -1196,7 +1273,6 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
-__maybe_unused
 static struct arm_smmu_master_data *
 arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 {
@@ -1222,10 +1298,86 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 	return master;
 }
 
+static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
+{
+	int ret;
+	struct arm_smmu_master_data *master;
+	u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
+	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
+
+	struct iommu_fault_event fault = {
+		.page_req_group_id	= FIELD_GET(EVTQ_1_STAG, evt[1]),
+		.addr			= FIELD_GET(EVTQ_2_ADDR, evt[2]),
+		.last_req		= true,
+	};
+
+	switch (type) {
+	case EVT_ID_TRANSLATION_FAULT:
+	case EVT_ID_ADDR_SIZE_FAULT:
+	case EVT_ID_ACCESS_FAULT:
+		fault.reason = IOMMU_FAULT_REASON_PTE_FETCH;
+		break;
+	case EVT_ID_PERMISSION_FAULT:
+		fault.reason = IOMMU_FAULT_REASON_PERMISSION;
+		break;
+	default:
+		/* TODO: report other unrecoverable faults. */
+		return -EFAULT;
+	}
+
+	/* Stage-2 is always pinned at the moment */
+	if (evt[1] & EVTQ_1_S2)
+		return -EFAULT;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (!master)
+		return -EINVAL;
+
+	/*
+	 * The domain is valid until the fault returns, because detach() flushes
+	 * the fault queue.
+	 */
+	if (evt[1] & EVTQ_1_STALL)
+		fault.type = IOMMU_FAULT_PAGE_REQ;
+	else
+		fault.type = IOMMU_FAULT_DMA_UNRECOV;
+
+	if (evt[1] & EVTQ_1_READ)
+		fault.prot |= IOMMU_FAULT_READ;
+	else
+		fault.prot |= IOMMU_FAULT_WRITE;
+
+	if (evt[1] & EVTQ_1_EXEC)
+		fault.prot |= IOMMU_FAULT_EXEC;
+
+	if (evt[1] & EVTQ_1_PRIV)
+		fault.prot |= IOMMU_FAULT_PRIV;
+
+	if (evt[0] & EVTQ_0_SSV) {
+		fault.pasid_valid = true;
+		fault.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
+	}
+
+	ret = iommu_report_device_fault(master->dev, &fault);
+	if (ret && fault.type == IOMMU_FAULT_PAGE_REQ) {
+		/* Nobody cared, abort the access */
+		struct page_response_msg resp = {
+			.addr			= fault.addr,
+			.pasid			= fault.pasid,
+			.pasid_present		= fault.pasid_valid,
+			.page_req_group_id	= fault.page_req_group_id,
+			.resp_code		= IOMMU_PAGE_RESP_FAILURE,
+		};
+		arm_smmu_page_response(master->dev, &resp);
+	}
+
+	return ret;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
-	int i;
+	int i, ret;
 	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
@@ -1237,12 +1389,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
 
+			spin_unlock(&q->wq.lock);
+			ret = arm_smmu_handle_evt(smmu, evt);
+			spin_lock(&q->wq.lock);
+
 			if (++num_handled == queue_size) {
 				q->batch++;
 				wake_up_all_locked(&q->wq);
 				num_handled = 0;
 			}
 
+			if (!ret)
+				continue;
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1374,7 +1533,9 @@ static int arm_smmu_flush_queues(void *cookie, struct device *dev)
 
 	if (dev) {
 		master = dev->iommu_fwspec->iommu_priv;
-		/* TODO: add support for PRI and Stall */
+		if (master->ste.can_stall)
+			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+		/* TODO: add support for PRI */
 		return 0;
 	}
 
@@ -1688,7 +1849,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 		.order			= master->ssid_bits,
 		.sync			= &arm_smmu_ctx_sync,
 		.arm_smmu = {
-			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
+			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
+					  master->ste.can_stall,
 			.asid_bits	= smmu->asid_bits,
 			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
 			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
@@ -2239,6 +2401,11 @@ static int arm_smmu_add_device(struct device *dev)
 
 	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
+	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
+		master->can_fault = true;
+		master->ste.can_stall = true;
+	}
+
 	ret = iommu_device_link(&smmu->iommu, dev);
 	if (ret)
 		goto err_free_master;
@@ -2403,6 +2570,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.mm_attach		= arm_smmu_mm_attach,
 	.mm_detach		= arm_smmu_mm_detach,
 	.mm_invalidate		= arm_smmu_mm_invalidate,
+	.page_response		= arm_smmu_page_response,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.17.0


* [PATCH v2 34/40] ACPI/IORT: Check ATS capability in root complex nodes
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (32 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 33/40] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 35/40] iommu/arm-smmu-v3: Add support for PCI ATS Jean-Philippe Brucker
                     ` (5 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	sudeep.holla-5wv7dgnIgG8, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo

The root complex node in IORT has a bit telling whether or not it supports
ATS. Store this bit in the IOMMU fwspec when setting up a device, so that
it can be accessed later by an IOMMU driver.

Use the negative version (NO_ATS) for the moment, because it's not clear
if or how the bit needs to be integrated into other firmware descriptions.
The SMMU has a feature bit telling whether it supports ATS, which might be
sufficient in most systems for deciding whether we should enable the ATS
capability in endpoints.
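
For reference, a minimal sketch of how an IOMMU driver might consume this
flag when deciding whether to enable ATS (the helper name is made up;
patch 35 does this for real in arm_smmu_enable_ats()):

	/* Hypothetical helper: may we try to enable ATS for this device? */
	static bool driver_may_enable_ats(struct device *dev)
	{
		struct iommu_fwspec *fwspec = dev->iommu_fwspec;

		/* Firmware disabled ATS in the root complex */
		if (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS)
			return false;

		return dev_is_pci(dev);
	}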

Cc: lorenzo.pieralisi-5wv7dgnIgG8@public.gmane.org
Cc: hanjun.guo-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
Cc: sudeep.holla-5wv7dgnIgG8@public.gmane.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/acpi/arm64/iort.c | 11 +++++++++++
 include/linux/iommu.h     |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 7a3a541046ed..4f4907e58cc8 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1004,6 +1004,14 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
 	dev_dbg(dev, "dma_pfn_offset(%#08llx)\n", offset);
 }
 
+static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
+{
+	struct acpi_iort_root_complex *pci_rc;
+
+	pci_rc = (struct acpi_iort_root_complex *)node->node_data;
+	return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
+}
+
 /**
  * iort_iommu_configure - Set-up IOMMU configuration for a device.
  *
@@ -1039,6 +1047,9 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
 		info.node = node;
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     iort_pci_iommu_init, &info);
+
+		if (!err && !iort_pci_rc_supports_ats(node))
+			dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_NO_ATS;
 	} else {
 		int i = 0;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index bcce44455117..d7e2f54086e4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -655,12 +655,16 @@ struct iommu_fwspec {
 	const struct iommu_ops	*ops;
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
+	u32			flags;
 	unsigned int		num_ids;
 	unsigned int		num_pasid_bits;
 	bool			can_stall;
 	u32			ids[1];
 };
 
+/* Firmware disabled ATS in the root complex */
+#define IOMMU_FWSPEC_PCI_NO_ATS			(1 << 0)
+
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops);
 void iommu_fwspec_free(struct device *dev);
-- 
2.17.0


* [PATCH v2 35/40] iommu/arm-smmu-v3: Add support for PCI ATS
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (33 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 34/40] ACPI/IORT: Check ATS capability in root complex nodes Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-36-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 36/40] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
                     ` (4 subsequent siblings)
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

PCIe devices can implement their own TLB, called the Address Translation
Cache (ATC). Enable the Address Translation Service (ATS) for devices that
support it, and send them invalidation requests whenever we invalidate the
IOTLBs.

  Range calculation
  -----------------

The invalidation packet itself is a bit awkward: the range must be
naturally aligned, which means that the start address is a multiple of the
range size. In addition, the size must be a power-of-two number of 4KB
pages. We have a few options for enforcing this constraint:

(1) Find the smallest naturally aligned region that covers the requested
    range. This is simple to compute and only takes one ATC_INV, but it
    will spill onto lots of neighbouring ATC entries.

(2) Align the start address to the region size (rounded up to a power of
    two), and send a second invalidation for the next range of the same
    size. Still not great, but it reduces spilling.

(3) Cover the range exactly with the smallest number of naturally aligned
    regions. This would be interesting to implement but, as with (2),
    requires multiple ATC_INV commands.

As I suspect ATC invalidation packets will be a very scarce resource, I'll
go with option (1) for now and only send one big invalidation; a
standalone sketch of the span computation follows below. We can move to
(2), which is both easier to read and more gentle with the ATC, once we've
observed on real systems that we can send multiple smaller Invalidation
Requests for roughly the same price as a single big one.

Note that with io-pgtable, the unmap function is called for each page, so
this doesn't matter. The problem shows up when sharing page tables with
the MMU.
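
For illustration, here is a standalone sketch of the option (1) span
computation in plain C (not kernel code; assumes a 64-bit unsigned long),
mirroring the logic of arm_smmu_atc_inv_to_cmd() in the diff below:

	#include <stdio.h>

	/* Smallest naturally aligned span of 4KB pages covering a range */
	static void atc_inv_range(unsigned long iova, unsigned long size)
	{
		unsigned long start = iova >> 12;
		unsigned long end = (iova + size - 1) >> 12;
		/* Most significant differing bit gives the required span */
		int log2_span = (start == end) ? 0 :
				64 - __builtin_clzl(start ^ end);

		start &= ~((1UL << log2_span) - 1);
		printf("invalidate %lu pages from page %#lx\n",
		       1UL << log2_span, start);
	}

	int main(void)
	{
		atc_inv_range(0x8000, 0x4000);	/* pages [8; 11]: exact */
		atc_inv_range(0x7000, 0x4000);	/* pages [7; 10]: [0; 15] */
		return 0;
	}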

  Timeout
  -------

ATC invalidation is allowed to take up to 90 seconds, according to the
PCIe spec, so it is possible to hit the SMMU command queue timeout during
normal operations.

Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
Others might let CMD_SYNC complete and have an asynchronous IMPDEF
mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
could retry sending all ATC_INV since the last successful CMD_SYNC. When a
CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
commands since the last successful CMD_SYNC.

We cannot afford to wait 90 seconds in iommu_unmap, let alone in MMU
notifiers. So if this timeout becomes a problem, we'd have to introduce a
cleverer system, like keeping hold of mappings and invalidating them in
the background. Implementing safe delayed invalidations is a very complex
problem and deserves a series of its own. We'll assess whether more work
is needed to properly handle ATC invalidation timeouts once this code runs
on real hardware.

  Misc
  ----

I didn't put ATC and TLB invalidations in the same functions for three
reasons:

* TLB invalidation by range is batched and committed with a single sync.
  Batching ATC invalidation is inconvenient: endpoints limit the number of
  in-flight invalidations, so we'd have to count the number of
  invalidations queued and send a sync periodically. In addition, I
  suspect we always need a sync between TLB and ATC invalidation for the
  same page.

* Doing ATC invalidation outside tlb_inv_range also allows us to send
  fewer requests, since TLB invalidations are done per page or block,
  while ATC invalidations target IOVA ranges.

* TLB invalidation by context is performed when freeing the domain, at
  which point there isn't any device attached anymore.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2: display an error if ATS is supported but cannot be enabled
---
 drivers/iommu/arm-smmu-v3.c | 225 +++++++++++++++++++++++++++++++++++-
 1 file changed, 219 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8a6a799ba04a..7034b0bdcbdf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -39,6 +39,7 @@
 #include <linux/of_iommu.h>
 #include <linux/of_platform.h>
 #include <linux/pci.h>
+#include <linux/pci-ats.h>
 #include <linux/platform_device.h>
 #include <linux/sched/mm.h>
 
@@ -103,6 +104,7 @@
 #define IDR5_VAX_52_BIT			1
 
 #define ARM_SMMU_CR0			0x20
+#define CR0_ATSCHK			(1 << 4)
 #define CR0_CMDQEN			(1 << 3)
 #define CR0_EVTQEN			(1 << 2)
 #define CR0_PRIQEN			(1 << 1)
@@ -277,6 +279,7 @@
 #define CMDQ_ERR_CERROR_NONE_IDX	0
 #define CMDQ_ERR_CERROR_ILL_IDX		1
 #define CMDQ_ERR_CERROR_ABT_IDX		2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
 
 #define CMDQ_0_OP			GENMASK_ULL(7, 0)
 #define CMDQ_0_SSV			(1UL << 11)
@@ -296,6 +299,12 @@
 #define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
 #define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
 
+#define CMDQ_ATC_0_SSID			GENMASK_ULL(31, 12)
+#define CMDQ_ATC_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
+#define CMDQ_ATC_1_SIZE			GENMASK_ULL(5, 0)
+#define CMDQ_ATC_1_ADDR_MASK		GENMASK_ULL(63, 12)
+
 #define CMDQ_PRI_0_SSID			GENMASK_ULL(31, 12)
 #define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
 #define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
@@ -369,6 +378,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
+static bool disable_ats_check;
+module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_ats_check,
+	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
+
 enum pri_resp {
 	PRI_RESP_DENY = 0,
 	PRI_RESP_FAIL = 1,
@@ -442,6 +456,16 @@ struct arm_smmu_cmdq_ent {
 			u64			addr;
 		} tlbi;
 
+		#define CMDQ_OP_ATC_INV		0x40
+		#define ATC_INV_SIZE_ALL	52
+		struct {
+			u32			sid;
+			u32			ssid;
+			u64			addr;
+			u8			size;
+			bool			global;
+		} atc;
+
 		#define CMDQ_OP_PRI_RESP	0x41
 		struct {
 			u32			sid;
@@ -869,6 +893,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_EL2_ASID:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
 		break;
+	case CMDQ_OP_ATC_INV:
+		cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
+		cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global);
+		cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SSID, ent->atc.ssid);
+		cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, ent->atc.sid);
+		cmd[1] |= FIELD_PREP(CMDQ_ATC_1_SIZE, ent->atc.size);
+		cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
 		cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid);
@@ -922,6 +954,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
 		[CMDQ_ERR_CERROR_ILL_IDX]	= "Illegal command",
 		[CMDQ_ERR_CERROR_ABT_IDX]	= "Abort on command fetch",
+		[CMDQ_ERR_CERROR_ATC_INV_IDX]	= "ATC invalidate timeout",
 	};
 
 	int i;
@@ -941,6 +974,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		dev_err(smmu->dev, "retrying command fetch\n");
 	case CMDQ_ERR_CERROR_NONE_IDX:
 		return;
+	case CMDQ_ERR_CERROR_ATC_INV_IDX:
+		/*
+		 * ATC Invalidation Completion timeout. CONS is still pointing
+		 * at the CMD_SYNC. Attempt to complete other pending commands
+		 * by repeating the CMD_SYNC, though we might well end up back
+		 * here since the ATC invalidation may still be pending.
+		 */
+		return;
 	case CMDQ_ERR_CERROR_ILL_IDX:
 		/* Fallthrough */
 	default:
@@ -1193,9 +1234,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
-#ifdef CONFIG_PCI_ATS
-			 FIELD_PREP(STRTAB_STE_1_EATS, STRTAB_STE_1_EATS_TRANS) |
-#endif
 			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
@@ -1225,6 +1263,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
 	}
 
+	if (IS_ENABLED(CONFIG_PCI_ATS))
+		dst[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
+						 STRTAB_STE_1_EATS_TRANS));
+
 	arm_smmu_sync_ste_for_sid(smmu, sid);
 	dst[0] = cpu_to_le64(val);
 	arm_smmu_sync_ste_for_sid(smmu, sid);
@@ -1613,6 +1655,104 @@ static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
 	return IRQ_WAKE_THREAD;
 }
 
+/* ATS invalidation */
+static bool arm_smmu_master_has_ats(struct arm_smmu_master_data *master)
+{
+	return dev_is_pci(master->dev) && to_pci_dev(master->dev)->ats_enabled;
+}
+
+static void
+arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
+			struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t log2_span;
+	size_t span_mask;
+	/* ATC invalidates are always on 4096 bytes pages */
+	size_t inval_grain_shift = 12;
+	unsigned long page_start, page_end;
+
+	*cmd = (struct arm_smmu_cmdq_ent) {
+		.opcode			= CMDQ_OP_ATC_INV,
+		.substream_valid	= !!ssid,
+		.atc.ssid		= ssid,
+	};
+
+	if (!size) {
+		cmd->atc.size = ATC_INV_SIZE_ALL;
+		return;
+	}
+
+	page_start	= iova >> inval_grain_shift;
+	page_end	= (iova + size - 1) >> inval_grain_shift;
+
+	/*
+	 * Find the smallest power of two that covers the range. Most
+	 * significant differing bit between start and end address indicates the
+	 * required span, ie. fls(start ^ end). For example:
+	 *
+	 * We want to invalidate pages [8; 11]. This is already the ideal range:
+	 *		x = 0b1000 ^ 0b1011 = 0b11
+	 *		span = 1 << fls(x) = 4
+	 *
+	 * To invalidate pages [7; 10], we need to invalidate [0; 15]:
+	 *		x = 0b0111 ^ 0b1010 = 0b1101
+	 *		span = 1 << fls(x) = 16
+	 */
+	log2_span	= fls_long(page_start ^ page_end);
+	span_mask	= (1ULL << log2_span) - 1;
+
+	page_start	&= ~span_mask;
+
+	cmd->atc.addr	= page_start << inval_grain_shift;
+	cmd->atc.size	= log2_span;
+}
+
+static int arm_smmu_atc_inv_master(struct arm_smmu_master_data *master,
+				   struct arm_smmu_cmdq_ent *cmd)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!arm_smmu_master_has_ats(master))
+		return 0;
+
+	for (i = 0; i < fwspec->num_ids; i++) {
+		cmd->atc.sid = fwspec->ids[i];
+		arm_smmu_cmdq_issue_cmd(master->smmu, cmd);
+	}
+
+	arm_smmu_cmdq_issue_sync(master->smmu);
+
+	return 0;
+}
+
+static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
+				       int ssid)
+{
+	struct arm_smmu_cmdq_ent cmd;
+
+	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
+	return arm_smmu_atc_inv_master(master, &cmd);
+}
+
+static size_t
+arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+			unsigned long iova, size_t size)
+{
+	unsigned long flags;
+	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_master_data *master;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list)
+		arm_smmu_atc_inv_master(master, &cmd);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	return size;
+}
+
 /* IO_PGTABLE API */
 static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
 {
@@ -2026,6 +2166,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 	if (smmu_domain) {
 		__iommu_sva_unbind_dev_all(dev);
 
+		arm_smmu_atc_inv_master_all(master, 0);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -2113,12 +2255,19 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 static size_t
 arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	int ret;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	ret = ops->unmap(ops, iova, size);
+
+	if (ret && smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)
+		ret = arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
+
+	return ret;
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -2284,6 +2433,54 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
+{
+	size_t stu;
+	int ret, pos;
+	struct pci_dev *pdev;
+	struct arm_smmu_device *smmu = master->smmu;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
+	    (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return -ENOSYS;
+
+	/* Smallest Translation Unit: log2 of the smallest supported granule */
+	stu = __ffs(smmu->pgsize_bitmap);
+
+	ret = pci_enable_ats(pdev, stu);
+	if (ret) {
+		dev_err(&pdev->dev, "could not enable ATS: %d\n", ret);
+		return ret;
+	}
+
+	dev_dbg(&pdev->dev, "enabled ATS (STU=%zu, QDEP=%d)\n", stu,
+		pci_ats_queue_depth(pdev));
+
+	return 0;
+}
+
+static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->ats_enabled)
+		return;
+
+	pci_disable_ats(pdev);
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -2406,9 +2603,11 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	arm_smmu_enable_ats(master);
+
 	ret = iommu_device_link(&smmu->iommu, dev);
 	if (ret)
-		goto err_free_master;
+		goto err_disable_ats;
 
 	ret = arm_smmu_insert_master(smmu, master);
 	if (ret)
@@ -2430,6 +2629,9 @@ static int arm_smmu_add_device(struct device *dev)
 err_unlink:
 	iommu_device_unlink(&smmu->iommu, dev);
 
+err_disable_ats:
+	arm_smmu_disable_ats(master);
+
 err_free_master:
 	kfree(master);
 	fwspec->iommu_priv = NULL;
@@ -2457,6 +2659,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	iommu_group_remove_device(dev);
 	arm_smmu_remove_master(smmu, master);
 	iommu_device_unlink(&smmu->iommu, dev);
+	arm_smmu_disable_ats(master);
 	kfree(master);
 	iommu_fwspec_free(dev);
 }
@@ -3069,6 +3272,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		}
 	}
 
+	if (smmu->features & ARM_SMMU_FEAT_ATS && !disable_ats_check) {
+		enables |= CR0_ATSCHK;
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to enable ATS check\n");
+			return ret;
+		}
+	}
+
 	ret = arm_smmu_setup_irqs(smmu);
 	if (ret) {
 		dev_err(smmu->dev, "failed to setup irqs\n");
-- 
2.17.0


* [PATCH v2 36/40] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (34 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 35/40] iommu/arm-smmu-v3: Add support for PCI ATS Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 37/40] iommu/arm-smmu-v3: Disable tagged pointers Jean-Philippe Brucker
                     ` (3 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The core calls us when an mm is modified. Perform the required ATC
invalidations.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7034b0bdcbdf..6cb69ace371b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1735,6 +1735,15 @@ static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
 	return arm_smmu_atc_inv_master(master, &cmd);
 }
 
+static int arm_smmu_atc_inv_master_range(struct arm_smmu_master_data *master,
+					 int ssid, unsigned long iova, size_t size)
+{
+	struct arm_smmu_cmdq_ent cmd;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+	return arm_smmu_atc_inv_master(master, &cmd);
+}
+
 static size_t
 arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 			unsigned long iova, size_t size)
@@ -2389,11 +2398,12 @@ static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
 	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
 
 	if (detach_domain)
 		ops->clear_entry(ops, io_mm->pasid, smmu_mm->cd);
 
-	/* TODO: Invalidate ATC. */
+	arm_smmu_atc_inv_master_all(master, io_mm->pasid);
 	/* TODO: Invalidate all mappings if last and not DVM. */
 }
 
@@ -2401,8 +2411,10 @@ static void arm_smmu_mm_invalidate(struct iommu_domain *domain,
 				   struct device *dev, struct io_mm *io_mm,
 				   unsigned long iova, size_t size)
 {
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	arm_smmu_atc_inv_master_range(master, io_mm->pasid, iova, size);
 	/*
-	 * TODO: Invalidate ATC.
 	 * TODO: Invalidate mapping if not DVM
 	 */
 }
-- 
2.17.0


* [PATCH v2 37/40] iommu/arm-smmu-v3: Disable tagged pointers
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (35 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 36/40] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 38/40] PCI: Make "PRG Response PASID Required" handling common Jean-Philippe Brucker
                     ` (2 subsequent siblings)
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The ARM architecture has a "Top Byte Ignore" (TBI) option that makes the
MMU mask out bits [63:56] of an address, allowing a userspace application
to store data in its pointers. This option is incompatible with PCI ATS.

If TBI is enabled in the SMMU and userspace triggers DMA transactions on
tagged pointers, the endpoint might create ATC entries for addresses that
include a tag. Software would then have to send ATC invalidation packets
for each of the 255 possible aliases of an address, or just wipe the whole
address space. Neither is a viable option, so disable TBI.

The impact of this change is unclear, since there are very few users of
tagged pointers, much less SVA. But the requirement introduced by this
patch doesn't seem excessive: a userspace application using both tagged
pointers and SVA should now sanitize addresses (clear the tag) before
using them for device DMA.
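
To make "sanitize addresses" concrete, a hypothetical userspace helper
that clears the top-byte tag (bits [63:56]) before a pointer is handed to
a device could look like:

	#include <stdint.h>

	/* Clear the TBI tag so the device only sees untagged addresses */
	static inline void *untag_pointer(void *ptr)
	{
		return (void *)((uintptr_t)ptr & ~(0xffULL << 56));
	}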

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index bdc9bfd1f35d..22e7b80a7682 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -200,7 +200,6 @@ static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_context_cfg *cfg, u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
 	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
 	if (cfg->hw_access)
 		val |= ARM_SMMU_TCR2CD(tcr, HA);
-- 
2.17.0


* [PATCH v2 38/40] PCI: Make "PRG Response PASID Required" handling common
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (36 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 37/40] iommu/arm-smmu-v3: Disable tagged pointers Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 39/40] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
  2018-05-11 19:06   ` [PATCH v2 40/40] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

The PASID ECN to the PCIe spec added a bit in the PRI status register that
allows a Function to declare whether a PRG Response should contain the
PASID prefix or not.

Move the helper that accesses it from amd_iommu into the PCI subsystem,
renaming it to be consistent with the current PCI Express specification
(PRPR - PRG Response PASID Required).
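
As a sketch of the intended use (variable names assumed; patch 39 wires
this up for SMMUv3), an IOMMU driver should only add the PASID prefix to a
PRG Response when the Function declares that it requires it:

	bool needs_prefix = pasid_present && pci_prg_resp_requires_prefix(pdev);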

Cc: bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org
Acked-by: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/amd_iommu.c     | 19 +------------------
 drivers/pci/ats.c             | 17 +++++++++++++++++
 include/linux/pci-ats.h       |  8 ++++++++
 include/uapi/linux/pci_regs.h |  1 +
 4 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8fb8c737fffe..ac0d97fa514f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2043,23 +2043,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 	return ret;
 }
 
-/* FIXME: Move this to PCI code */
-#define PCI_PRI_TLP_OFF		(1 << 15)
-
-static bool pci_pri_tlp_required(struct pci_dev *pdev)
-{
-	u16 status;
-	int pos;
-
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
-		return false;
-
-	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
-
-	return (status & PCI_PRI_TLP_OFF) ? true : false;
-}
-
 /*
  * If a device is not yet associated with a domain, this function
  * assigns it visible for the hardware
@@ -2088,7 +2071,7 @@ static int attach_device(struct device *dev,
 
 			dev_data->ats.enabled = true;
 			dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
-			dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
+			dev_data->pri_tlp     = pci_prg_resp_requires_prefix(pdev);
 		}
 	} else if (amd_iommu_iotlb_sup &&
 		   pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 89305b569d3d..dc743e2fae7d 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -388,3 +388,20 @@ int pci_max_pasids(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_max_pasids);
 #endif /* CONFIG_PCI_PASID */
+
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	u16 status;
+	int pos;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return false;
+
+	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
+
+	return !!(status & PCI_PRI_STATUS_PRPR);
+}
+EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix);
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 7c4b8e27268c..1825ca2c9bf4 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -68,5 +68,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
 
 #endif /* CONFIG_PCI_PASID */
 
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev);
+#else
+static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
 
 #endif /* LINUX_PCI_ATS_H*/
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 103ba797a8f3..f9a11a13131c 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -871,6 +871,7 @@
 #define  PCI_PRI_STATUS_RF	0x001	/* Response Failure */
 #define  PCI_PRI_STATUS_UPRGI	0x002	/* Unexpected PRG index */
 #define  PCI_PRI_STATUS_STOPPED	0x100	/* PRI Stopped */
+#define  PCI_PRI_STATUS_PRPR	0x8000	/* PRG Response requires PASID prefix */
 #define PCI_PRI_MAX_REQ		0x08	/* PRI max reqs supported */
 #define PCI_PRI_ALLOC_REQ	0x0c	/* PRI max reqs allowed */
 #define PCI_EXT_CAP_PRI_SIZEOF	16
-- 
2.17.0


* [PATCH v2 39/40] iommu/arm-smmu-v3: Add support for PRI
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (37 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 38/40] PCI: Make "PRG Response PASID Required" handling common Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
       [not found]     ` <20180511190641.23008-40-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-11 19:06   ` [PATCH v2 40/40] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
  39 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

For PCI devices that support it, enable the PRI capability and handle PRI
Page Requests with the generic fault handler. PRI is enabled on demand by
iommu_sva_device_init().
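
For context, a hypothetical device-driver call that would trigger this
on-demand enable (iommu_sva_device_init() and IOMMU_SVA_FEAT_IOPF are
defined earlier in the series; the error handling is only a sketch):

	int ret;

	/* Request SVA with I/O page faults; the SMMU driver then enables
	 * PRI lazily from its sva_device_init() hook. */
	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0);
	if (ret)
		dev_err(dev, "cannot enable SVA with IOPF: %d\n", ret);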

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

---
v1->v2:
* Terminate the page request and disable PRI if no handler is registered
* Enable and disable PRI in sva_device_init/shutdown, instead of
  add/remove_device
---
 drivers/iommu/arm-smmu-v3.c | 192 +++++++++++++++++++++++++++---------
 1 file changed, 145 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 6cb69ace371b..0edbb8d19579 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -248,6 +248,7 @@
 #define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
 #define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
 
+#define STRTAB_STE_1_PPAR		(1UL << 18)
 #define STRTAB_STE_1_S1STALLD		(1UL << 27)
 
 #define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
@@ -309,6 +310,9 @@
 #define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
 #define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
+#define CMDQ_PRI_1_RESP_FAILURE		FIELD_PREP(CMDQ_PRI_1_RESP, 0UL)
+#define CMDQ_PRI_1_RESP_INVALID		FIELD_PREP(CMDQ_PRI_1_RESP, 1UL)
+#define CMDQ_PRI_1_RESP_SUCCESS		FIELD_PREP(CMDQ_PRI_1_RESP, 2UL)
 
 #define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
 #define CMDQ_RESUME_0_ACTION_RETRY	(1UL << 12)
@@ -383,12 +387,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_ats_check,
 	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
 
-enum pri_resp {
-	PRI_RESP_DENY = 0,
-	PRI_RESP_FAIL = 1,
-	PRI_RESP_SUCC = 2,
-};
-
 enum arm_smmu_msi_index {
 	EVTQ_MSI_INDEX,
 	GERROR_MSI_INDEX,
@@ -471,7 +469,7 @@ struct arm_smmu_cmdq_ent {
 			u32			sid;
 			u32			ssid;
 			u16			grpid;
-			enum pri_resp		resp;
+			enum page_response_code	resp;
 		} pri;
 
 		#define CMDQ_OP_RESUME		0x44
@@ -556,6 +554,7 @@ struct arm_smmu_strtab_ent {
 	struct arm_smmu_s2_cfg		*s2_cfg;
 
 	bool				can_stall;
+	bool				prg_resp_needs_ssid;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -907,14 +906,18 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SID, ent->pri.sid);
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_GRPID, ent->pri.grpid);
 		switch (ent->pri.resp) {
-		case PRI_RESP_DENY:
-		case PRI_RESP_FAIL:
-		case PRI_RESP_SUCC:
+		case IOMMU_PAGE_RESP_FAILURE:
+			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
+			break;
+		case IOMMU_PAGE_RESP_INVALID:
+			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
+			break;
+		case IOMMU_PAGE_RESP_SUCCESS:
+			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
 			break;
 		default:
 			return -EINVAL;
 		}
-		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
 		break;
 	case CMDQ_OP_RESUME:
 		cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid);
@@ -1114,8 +1117,15 @@ static int arm_smmu_page_response(struct device *dev,
 		cmd.resume.sid		= sid;
 		cmd.resume.stag		= resp->page_req_group_id;
 		cmd.resume.resp		= resp->resp_code;
+	} else if (master->can_fault) {
+		cmd.opcode		= CMDQ_OP_PRI_RESP;
+		cmd.substream_valid	= resp->pasid_present &&
+					  master->ste.prg_resp_needs_ssid;
+		cmd.pri.sid		= sid;
+		cmd.pri.ssid		= resp->pasid;
+		cmd.pri.grpid		= resp->page_req_group_id;
+		cmd.pri.resp		= resp->resp_code;
 	} else {
-		/* TODO: put PRI response here */
 		return -ENODEV;
 	}
 
@@ -1236,6 +1246,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
 			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
+		if (ste->prg_resp_needs_ssid)
+			dst[1] |= STRTAB_STE_1_PPAR;
+
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
 		   !ste->can_stall)
@@ -1471,39 +1484,54 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 {
-	u32 sid, ssid;
-	u16 grpid;
-	bool ssv, last;
-
-	sid = FIELD_GET(PRIQ_0_SID, evt[0]);
-	ssv = FIELD_GET(PRIQ_0_SSID_V, evt[0]);
-	ssid = ssv ? FIELD_GET(PRIQ_0_SSID, evt[0]) : 0;
-	last = FIELD_GET(PRIQ_0_PRG_LAST, evt[0]);
-	grpid = FIELD_GET(PRIQ_1_PRG_IDX, evt[1]);
-
-	dev_info(smmu->dev, "unexpected PRI request received:\n");
-	dev_info(smmu->dev,
-		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
-		 sid, ssid, grpid, last ? "L" : "",
-		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
-		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
-		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
-		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
-		 evt[1] & PRIQ_1_ADDR_MASK);
-
-	if (last) {
-		struct arm_smmu_cmdq_ent cmd = {
-			.opcode			= CMDQ_OP_PRI_RESP,
-			.substream_valid	= ssv,
-			.pri			= {
-				.sid	= sid,
-				.ssid	= ssid,
-				.grpid	= grpid,
-				.resp	= PRI_RESP_DENY,
-			},
+	u32 sid = FIELD_GET(PRIQ_0_SID, evt[0]);
+
+	struct arm_smmu_master_data *master;
+	struct iommu_fault_event fault = {
+		.type			= IOMMU_FAULT_PAGE_REQ,
+		.last_req		= FIELD_GET(PRIQ_0_PRG_LAST, evt[0]),
+		.pasid_valid		= FIELD_GET(PRIQ_0_SSID_V, evt[0]),
+		.pasid			= FIELD_GET(PRIQ_0_SSID, evt[0]),
+		.page_req_group_id	= FIELD_GET(PRIQ_1_PRG_IDX, evt[1]),
+		.addr			= evt[1] & PRIQ_1_ADDR_MASK,
+	};
+
+	if (evt[0] & PRIQ_0_PERM_READ)
+		fault.prot |= IOMMU_FAULT_READ;
+	if (evt[0] & PRIQ_0_PERM_WRITE)
+		fault.prot |= IOMMU_FAULT_WRITE;
+	if (evt[0] & PRIQ_0_PERM_EXEC)
+		fault.prot |= IOMMU_FAULT_EXEC;
+	if (evt[0] & PRIQ_0_PERM_PRIV)
+		fault.prot |= IOMMU_FAULT_PRIV;
+
+	/* Discard Stop PASID marker, it isn't used */
+	if (!(fault.prot & (IOMMU_FAULT_READ|IOMMU_FAULT_WRITE)) &&
+	    fault.last_req)
+		return;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (WARN_ON(!master))
+		return;
+
+	if (iommu_report_device_fault(master->dev, &fault)) {
+		/*
+		 * No handler registered, so subsequent faults won't produce
+		 * better results. Try to disable PRI.
+		 */
+		struct page_response_msg page_response = {
+			.addr			= fault.addr,
+			.pasid			= fault.pasid,
+			.pasid_present		= fault.pasid_valid,
+			.page_req_group_id	= fault.page_req_group_id,
+			.resp_code		= IOMMU_PAGE_RESP_FAILURE,
 		};
 
-		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		dev_warn(master->dev,
+			 "PPR 0x%x:0x%llx 0x%x: nobody cared, disabling PRI\n",
+			 fault.pasid_valid ? fault.pasid : 0, fault.addr,
+			 fault.prot);
+		arm_smmu_page_response(master->dev, &page_response);
 	}
 }
 
@@ -1529,6 +1557,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 		}
 
 		if (queue_sync_prod(q) == -EOVERFLOW)
+			/*
+			 * TODO: flush pending faults, since the SMMU might have
+			 * auto-responded to the Last request of a pending
+			 * group
+			 */
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
 	} while (!queue_empty(q));
 
@@ -1577,7 +1610,8 @@ static int arm_smmu_flush_queues(void *cookie, struct device *dev)
 		master = dev->iommu_fwspec->iommu_priv;
 		if (master->ste.can_stall)
 			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
-		/* TODO: add support for PRI */
+		else if (master->can_fault)
+			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
 		return 0;
 	}
 
@@ -2301,6 +2335,59 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	return ops->iova_to_phys(ops, iova);
 }
 
+static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
+{
+	int ret, pos;
+	struct pci_dev *pdev;
+	/*
+	 * TODO: find a good inflight PPR number. We should divide the PRI queue
+	 * by the number of PRI-capable devices, but it's impossible to know
+	 * about current and future (hotplugged) devices. So we're at risk of
+	 * dropping PPRs (and leaking pending requests in the FQ).
+	 */
+	size_t max_inflight_pprs = 16;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	ret = pci_reset_pri(pdev);
+	if (ret)
+		return ret;
+
+	ret = pci_enable_pri(pdev, max_inflight_pprs);
+	if (ret) {
+		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
+		return ret;
+	}
+
+	master->can_fault = true;
+	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
+
+	dev_dbg(master->dev, "enabled PRI\n");
+
+	return 0;
+}
+
+static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pri_enabled)
+		return;
+
+	pci_disable_pri(pdev);
+	dev_dbg(master->dev, "disabled PRI\n");
+	master->can_fault = false;
+}
+
 static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
 {
 	int ret;
@@ -2314,11 +2401,15 @@ static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
 		return -EINVAL;
 
 	if (param->features & IOMMU_SVA_FEAT_IOPF) {
-		if (!master->can_fault)
-			return -EINVAL;
+		arm_smmu_enable_pri(master);
+		if (!master->can_fault) {
+			ret = -ENODEV;
+			goto err_disable_pri;
+		}
+
 		ret = iopf_queue_add_device(master->smmu->iopf_queue, dev);
 		if (ret)
-			return ret;
+			goto err_disable_pri;
 	}
 
 	if (!param->max_pasid)
@@ -2329,11 +2420,17 @@ static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
 	param->max_pasid = min(param->max_pasid, (1U << master->ssid_bits) - 1);
 
 	return 0;
+
+err_disable_pri:
+	arm_smmu_disable_pri(master);
+
+	return ret;
 }
 
 static void arm_smmu_sva_shutdown(struct device *dev,
 				  struct iommu_sva_param *param)
 {
+	arm_smmu_disable_pri(dev->iommu_fwspec->iommu_priv);
 	iopf_queue_remove_device(dev);
 }
 
@@ -2671,6 +2768,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	iommu_group_remove_device(dev);
 	arm_smmu_remove_master(smmu, master);
 	iommu_device_unlink(&smmu->iommu, dev);
+	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
 	kfree(master);
 	iommu_fwspec_free(dev);
-- 
2.17.0


* [PATCH v2 40/40] iommu/arm-smmu-v3: Add support for PCI PASID
       [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
                     ` (38 preceding siblings ...)
  2018-05-11 19:06   ` [PATCH v2 39/40] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
@ 2018-05-11 19:06   ` Jean-Philippe Brucker
  39 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Enable PASID for PCI devices that support it. Unlike PRI, PASID can't be
enabled lazily in iommu_sva_device_init(), because it has to be enabled
before ATS, and because we have to allocate substream tables early.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 54 +++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 0edbb8d19579..ac6e69f25893 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2542,6 +2542,52 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_enable_pasid(struct arm_smmu_master_data *master)
+{
+	int ret;
+	int features;
+	u8 pasid_bits;
+	int num_pasids;
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	features = pci_pasid_features(pdev);
+	if (features < 0)
+		return -ENOSYS;
+
+	num_pasids = pci_max_pasids(pdev);
+	if (num_pasids <= 0)
+		return -ENOSYS;
+
+	pasid_bits = min_t(u8, ilog2(num_pasids), master->smmu->ssid_bits);
+
+	dev_dbg(&pdev->dev, "device supports %#x PASID bits [%s%s]\n", pasid_bits,
+		(features & PCI_PASID_CAP_EXEC) ? "x" : "",
+		(features & PCI_PASID_CAP_PRIV) ? "p" : "");
+
+	ret = pci_enable_pasid(pdev, features);
+	return ret ? ret : pasid_bits;
+}
+
+static void arm_smmu_disable_pasid(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pasid_enabled)
+		return;
+
+	pci_disable_pasid(pdev);
+}
+
 static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
 {
 	size_t stu;
@@ -2712,6 +2758,11 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	/* PASID must be enabled before ATS */
+	ret = arm_smmu_enable_pasid(master);
+	if (ret > 0)
+		master->ssid_bits = ret;
+
 	arm_smmu_enable_ats(master);
 
 	ret = iommu_device_link(&smmu->iommu, dev);
@@ -2740,6 +2791,7 @@ static int arm_smmu_add_device(struct device *dev)
 
 err_disable_ats:
 	arm_smmu_disable_ats(master);
+	arm_smmu_disable_pasid(master);
 
 err_free_master:
 	kfree(master);
@@ -2769,7 +2821,9 @@ static void arm_smmu_remove_device(struct device *dev)
 	arm_smmu_remove_master(smmu, master);
 	iommu_device_unlink(&smmu->iommu, dev);
 	arm_smmu_disable_pri(master);
+	/* PASID must be disabled after ATS */
 	arm_smmu_disable_ats(master);
+	arm_smmu_disable_pasid(master);
 	kfree(master);
 	iommu_fwspec_free(dev);
 }
-- 
2.17.0


* Re: [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices
       [not found]     ` <20180511190641.23008-17-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-15 14:16       ` Catalin Marinas
       [not found]         ` <20180515141658.vivrgcyww2pxumye-+1aNUgJU5qkijLcmloz0ER/iLCjYCKR+VpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Catalin Marinas @ 2018-05-15 14:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

Hi Jean-Philippe,

On Fri, May 11, 2018 at 08:06:17PM +0100, Jean-Philippe Brucker wrote:
> +unsigned long mm_context_get(struct mm_struct *mm)
> +{
> +	unsigned long flags;
> +	u64 asid;
> +
> +	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> +
> +	asid = atomic64_read(&mm->context.id);
> +
> +	if (mm->context.pinned) {
> +		mm->context.pinned++;
> +		asid &= ~ASID_MASK;
> +		goto out_unlock;
> +	}
> +
> +	if (nr_pinned_asids >= max_pinned_asids) {
> +		asid = 0;
> +		goto out_unlock;
> +	}
> +
> +	if (!asid_gen_match(asid)) {
> +		/*
> +		 * We went through one or more rollover since that ASID was
> +		 * used. Ensure that it is still valid, or generate a new one.
> +		 * The cpu argument isn't used by new_context.
> +		 */
> +		asid = new_context(mm, 0);
> +		atomic64_set(&mm->context.id, asid);
> +	}
> +
> +	asid &= ~ASID_MASK;
> +
> +	nr_pinned_asids++;
> +	__set_bit(asid2idx(asid), pinned_asid_map);
> +	mm->context.pinned++;
> +
> +out_unlock:
> +	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> +
> +	return asid;
> +}

With CONFIG_UNMAP_KERNEL_AT_EL0 (a.k.a. KPTI), the hardware ASID has bit
0 set automatically when entering user space (and cleared when getting
back to the kernel). If the returned asid value here is going to be used
as is in the calling code, you should probably set bit 0 when KPTI is
enabled.
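
Something like the following at the end of mm_context_get(), for example
(only a sketch, assuming the arm64 arm64_kernel_unmapped_at_el0() helper):

	/* The hardware ASID has bit 0 set while running in user space */
	if (asid && arm64_kernel_unmapped_at_el0())
		asid |= 1;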

-- 
Catalin


* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]     ` <20180511190641.23008-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-16 20:41       ` Jacob Pan
  2018-05-17 10:02         ` Jean-Philippe Brucker
  2018-09-05 11:29       ` Auger Eric
  1 sibling, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-05-16 20:41 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:02 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Shared Virtual Addressing (SVA) provides a way for device drivers to
> bind process address spaces to devices. This requires the IOMMU to
> support page table format and features compatible with the CPUs, and
> usually requires the system to support I/O Page Faults (IOPF) and
> Process Address Space ID (PASID). When all of these are available,
> DMA can access virtual addresses of a process. A PASID is allocated
> for each process, and the device driver programs it into the device
> in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce
> two IOMMU operations, sva_device_init() and sva_device_shutdown(),
> that prepare the IOMMU driver for SVA. For example allocate PASID
> tables and fault queues. Subsequent patches will implement the bind()
> and unbind() operations.
> 
> Support for I/O Page Faults will be added in a later patch using a new
> feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
> down all shared mappings. Other feature bits that may be added in the
> future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
> spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
> space to a process.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2:
> * Add sva_param structure to iommu_param
> * CONFIG option is only selectable by IOMMU drivers
> ---
>  drivers/iommu/Kconfig     |   4 ++
>  drivers/iommu/Makefile    |   1 +
>  drivers/iommu/iommu-sva.c | 110
> ++++++++++++++++++++++++++++++++++++++ include/linux/iommu.h     |
> 32 +++++++++++ 4 files changed, 147 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 7564237f788d..cca8e06903c7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,10 @@ config IOMMU_DMA
>  	select IOMMU_IOVA
>  	select NEED_SG_DMA_LENGTH
>  
> +config IOMMU_SVA
> +	bool
> +	select IOMMU_API
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..1dbcc89ebe4c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index 000000000000..8b4afb7c63ae
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage PASIDs and bind process address spaces to devices.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/slab.h>
> +
> +/**
> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to initialize all
> + * features required for SVA.
> + *
> + * The device must support multiple address spaces (e.g. PCI PASID). By default
> + * the PASID allocated during bind() is limited by the IOMMU capacity, and by
> + * the device PASID width defined in the PCI capability or in the firmware
> + * description. Setting @max_pasid to a non-zero value smaller than this limit
> + * overrides it.
> + *
Seems the min_pasid never gets used. Do you really need it?
 
> + * The device should not be performing any DMA while this function is running,
> + * otherwise the behavior is undefined.
> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_device_init(struct device *dev, unsigned long features,
> +			  unsigned int max_pasid)
> +{
> +	int ret;
> +	struct iommu_sva_param *param;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain || !domain->ops->sva_device_init)
> +		return -ENODEV;
> +
> +	if (features)
> +		return -EINVAL;
Should it be !features?

> +
> +	param = kzalloc(sizeof(*param), GFP_KERNEL);
> +	if (!param)
> +		return -ENOMEM;
> +
> +	param->features		= features;
> +	param->max_pasid	= max_pasid;
> +
> +	/*
> +	 * IOMMU driver updates the limits depending on the IOMMU and device
> +	 * capabilities.
> +	 */
> +	ret = domain->ops->sva_device_init(dev, param);
> +	if (ret)
> +		goto err_free_param;
> +
> +	mutex_lock(&dev->iommu_param->lock);
> +	if (dev->iommu_param->sva_param)
> +		ret = -EEXIST;
> +	else
> +		dev->iommu_param->sva_param = param;
> +	mutex_unlock(&dev->iommu_param->lock);
> +	if (ret)
> +		goto err_device_shutdown;
> +
> +	return 0;
> +
> +err_device_shutdown:
> +	if (domain->ops->sva_device_shutdown)
> +		domain->ops->sva_device_shutdown(dev, param);
> +
> +err_free_param:
> +	kfree(param);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_device_init);
> +
> +/**
> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
> + * @dev: the device
> + *
> + * Disable SVA. Device driver should ensure that the device isn't performing any
> + * DMA while this function is running.
> + */
> +int iommu_sva_device_shutdown(struct device *dev)
> +{
> +	struct iommu_sva_param *param;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain)
> +		return -ENODEV;
> +
> +	mutex_lock(&dev->iommu_param->lock);
> +	param = dev->iommu_param->sva_param;
> +	dev->iommu_param->sva_param = NULL;
> +	mutex_unlock(&dev->iommu_param->lock);
> +	if (!param)
> +		return -ENODEV;
> +
> +	if (domain->ops->sva_device_shutdown)
> +		domain->ops->sva_device_shutdown(dev, param);
Seems like a slight mismatch here: do you need to pass the param? I don't
think there is anything else model-specific that the IOMMU driver needs
to do with the param.
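
i.e. the op could arguably be reduced to (sketch):

	void (*sva_device_shutdown)(struct device *dev);
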
> +
> +	kfree(param);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0933f726d2e6..2efe7738bedb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -212,6 +212,12 @@ struct page_response_msg {
>  	u64 private_data;
>  };
>  
> +struct iommu_sva_param {
> +	unsigned long features;
> +	unsigned int min_pasid;
> +	unsigned int max_pasid;
> +};
> +
>  /**
>   * struct iommu_ops - iommu ops and capabilities
>   * @capable: check capability
> @@ -219,6 +225,8 @@ struct page_response_msg {
>   * @domain_free: free iommu domain
>   * @attach_dev: attach device to an iommu domain
>   * @detach_dev: detach device from an iommu domain
> + * @sva_device_init: initialize Shared Virtual Addressing for a device
> + * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
>   * @map: map a physically contiguous memory region to an iommu domain
> + * @unmap: unmap a physically contiguous memory region from an iommu domain
> + * @map_sg: map a scatter-gather list of physically contiguous memory chunks
> @@ -256,6 +264,10 @@ struct iommu_ops {
>  
>  	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>  	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> +	int (*sva_device_init)(struct device *dev,
> +			       struct iommu_sva_param *param);
> +	void (*sva_device_shutdown)(struct device *dev,
> +				    struct iommu_sva_param *param);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> @@ -413,6 +425,7 @@ struct iommu_fault_param {
>   * struct iommu_param - collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
> + * @sva_param: SVA parameters
>   *
>   * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>   *	struct iommu_group	*iommu_group;
> @@ -421,6 +434,7 @@ struct iommu_fault_param {
>  struct iommu_param {
>  	struct mutex lock;
>  	struct iommu_fault_param *fault_param;
> +	struct iommu_sva_param *sva_param;
>  };
>  
>  int  iommu_device_register(struct iommu_device *iommu);
> @@ -920,4 +934,22 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
>  #endif /* CONFIG_IOMMU_API */
>  
> +#ifdef CONFIG_IOMMU_SVA
> +extern int iommu_sva_device_init(struct device *dev, unsigned long features,
> +				 unsigned int max_pasid);
> +extern int iommu_sva_device_shutdown(struct device *dev);
> +#else /* CONFIG_IOMMU_SVA */
> +static inline int iommu_sva_device_init(struct device *dev,
> +					unsigned long features,
> +					unsigned int max_pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_device_shutdown(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA */
> +
>  #endif /* __LINUX_IOMMU_H */

[Jacob Pan]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]     ` <20180511190641.23008-4-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-16 23:31       ` Jacob Pan
  2018-05-17 10:02         ` Jean-Philippe Brucker
  2018-05-17 14:25       ` Jonathan Cameron
  2018-09-05 12:14       ` Auger Eric
  2 siblings, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-05-16 23:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:04 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Allocate IOMMU mm structures and bind them to devices. Four operations
> are added to IOMMU drivers:
> 
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
> 
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
> 
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
> 
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Others can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
> 
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
> 
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
> 
> mm_attach and mm_detach operations are serialized with a spinlock. When
> trying to optimize this code, we should at least prevent concurrent
> attach()/detach() on the same domain (so multi-level PASID table code can
> allocate tables lazily). mm_alloc() can sleep, but mm_free must not
> (because we'll have to call it from call_srcu later on).
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
> 
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2: sanity-check of flags
> ---
>  drivers/iommu/iommu-sva.c | 380 +++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   1 +
>  include/linux/iommu.h     |  28 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8d98f9c09864..6ac679c48f3c 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -5,8 +5,298 @@
>   * Copyright (C) 2018 ARM Ltd.
>   */
>  
> +#include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/sched/mm.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
I am a little confused about the domain vs. PASID relationship. If each
domain represents an address space, should there be a domain for each
PASID?

> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> + * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
> + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
> + * the device table and PASID 0 would be available to the allocator.
> + */
> +
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm, unsigned long flags)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm, flags);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->flags		= flags;
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
> +			  param->max_pasid + 1, GFP_ATOMIC);
> +	io_mm->pasid = pasid;
> +	spin_unlock(&iommu_sva_lock);
> +	idr_preload_end();
> +
> +	if (pasid < 0) {
> +		ret = pasid;
> +		goto err_free_mm;
> +	}
> +
> +	/* TODO: keep track of mm. For the moment, abort. */
> +	ret = -ENOSYS;
> +	spin_lock(&iommu_sva_lock);
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	spin_unlock(&iommu_sva_lock);
> +
> +err_free_mm:
> +	domain->ops->mm_free(io_mm);
> +	mmdrop(mm);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void io_mm_free(struct io_mm *io_mm)
> +{
> +	struct mm_struct *mm = io_mm->mm;
> +
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +}
> +
> +static void io_mm_release(struct kref *kref)
> +{
> +	struct io_mm *io_mm;
> +
> +	io_mm = container_of(kref, struct io_mm, kref);
> +	WARN_ON(!list_empty(&io_mm->devices));
> +
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +
> +	io_mm_free(io_mm);
> +}
> +
> +/*
> + * Returns non-zero if a reference to the io_mm was successfully taken.
> + * Returns zero if the io_mm is being freed and should not be used.
> + */
> +static int io_mm_get_locked(struct io_mm *io_mm)
> +{
> +	if (io_mm)
> +		return kref_get_unless_zero(&io_mm->kref);
> +
> +	return 0;
> +}
> +
> +static void io_mm_put_locked(struct io_mm *io_mm)
> +{
> +	kref_put(&io_mm->kref, io_mm_release);
> +}
> +
> +static void io_mm_put(struct io_mm *io_mm)
> +{
> +	spin_lock(&iommu_sva_lock);
> +	io_mm_put_locked(io_mm);
> +	spin_unlock(&iommu_sva_lock);
> +}
> +
> +static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
> +			struct io_mm *io_mm, void *drvdata)
> +{
> +	int ret;
> +	bool attach_domain = true;
> +	int pasid = io_mm->pasid;
> +	struct iommu_bond *bond, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +		return -ENODEV;
> +
> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> +		return -ERANGE;
> +
> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond->domain		= domain;
> +	bond->io_mm		= io_mm;
> +	bond->dev		= dev;
> +	bond->drvdata		= drvdata;
> +
> +	spin_lock(&iommu_sva_lock);
> +	/*
> +	 * Check if this io_mm is already bound to the domain. In which case the
> +	 * IOMMU driver doesn't have to install the PASID table entry.
> +	 */
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == io_mm) {
> +			attach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
> +	if (ret) {
> +		kfree(bond);
> +		spin_unlock(&iommu_sva_lock);
> +		return ret;
> +	}
> +
> +	list_add(&bond->mm_head, &io_mm->devices);
> +	list_add(&bond->domain_head, &domain->mm_list);
> +	list_add(&bond->dev_head, &param->mm_list);
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return 0;
> +}
> +
> +static void io_mm_detach_locked(struct iommu_bond *bond)
> +{
> +	struct iommu_bond *tmp;
> +	bool detach_domain = true;
> +	struct iommu_domain *domain = bond->domain;
> +
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
> +			detach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
> +
> +	list_del(&bond->mm_head);
> +	list_del(&bond->domain_head);
> +	list_del(&bond->dev_head);
> +	io_mm_put_locked(bond->io_mm);
> +
> +	kfree(bond);
> +}
>  
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
> @@ -47,6 +337,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>  
>  	param->features		= features;
>  	param->max_pasid	= max_pasid;
> +	INIT_LIST_HEAD(&param->mm_list);
>  
>  	/*
>  	 * IOMMU driver updates the limits depending on the IOMMU and device
> @@ -114,13 +405,87 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
>  int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>  			    int *pasid, unsigned long flags, void *drvdata)
>  {
> -	return -ENOSYS; /* TODO */
> +	int i, ret = 0;
> +	struct io_mm *io_mm = NULL;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	/*
> +	 * The device driver does not call sva_device_init/shutdown and
> +	 * bind/unbind concurrently, so no need to take the param lock.
> +	 */
> +	if (WARN_ON_ONCE(!param) || (flags & ~param->features))
> +		return -EINVAL;
> +
> +	/* If an io_mm already exists, use it */
> +	spin_lock(&iommu_sva_lock);
> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
> +		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
> +			/* ... Unless it's already bound to this device */
> +			list_for_each_entry(tmp, &io_mm->devices, mm_head) {
> +				if (tmp->dev == dev) {
> +					bond = tmp;
> +					io_mm_put_locked(io_mm);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	if (bond)
> +		return -EEXIST;
> +
> +	/* Require identical features within an io_mm for now */
> +	if (io_mm && (flags != io_mm->flags)) {
> +		io_mm_put(io_mm);
> +		return -EDOM;
> +	}
> +
> +	if (!io_mm) {
> +		io_mm = io_mm_alloc(domain, dev, mm, flags);
> +		if (IS_ERR(io_mm))
> +			return PTR_ERR(io_mm);
> +	}
> +
> +	ret = io_mm_attach(domain, dev, io_mm, drvdata);
> +	if (ret)
> +		io_mm_put(io_mm);
> +	else
> +		*pasid = io_mm->pasid;
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
>  
>  int __iommu_sva_unbind_device(struct device *dev, int pasid)
>  {
> -	return -ENOSYS; /* TODO */
> +	int ret = -ESRCH;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!param || WARN_ON(!domain))
> +		return -EINVAL;
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry(bond, &param->mm_list, dev_head) {
> +		if (bond->io_mm->pasid == pasid) {
> +			io_mm_detach_locked(bond);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>  
> @@ -132,6 +497,15 @@ EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>   */
>  void __iommu_sva_unbind_dev_all(struct device *dev)
>  {
> -	/* TODO */
> +	struct iommu_sva_param *param;
> +	struct iommu_bond *bond, *next;
> +
> +	param = dev->iommu_param->sva_param;
> +	if (param) {
> +		spin_lock(&iommu_sva_lock);
> +		list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
> +			io_mm_detach_locked(bond);
> +		spin_unlock(&iommu_sva_lock);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index bd2819deae5b..333801e1519c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1463,6 +1463,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>  	domain->type = type;
>  	/* Assume all sizes by default; the driver may override this later */
>  	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
> +	INIT_LIST_HEAD(&domain->mm_list);
>  
>  	return domain;
>  }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index da59c20c4f12..d5f21719a5a0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -100,6 +100,20 @@ struct iommu_domain {
>  	void *handler_token;
>  	struct iommu_domain_geometry geometry;
>  	void *iova_cookie;
> +
> +	struct list_head mm_list;
> +};
> +
> +struct io_mm {
> +	int			pasid;
> +	/* IOMMU_SVA_FEAT_* */
> +	unsigned long		flags;
> +	struct list_head	devices;
> +	struct kref		kref;
> +	struct mm_struct	*mm;
> +
> +	/* Release callback for this mm */
> +	void (*release)(struct io_mm *io_mm);
>  };
>  
>  enum iommu_cap {
> @@ -216,6 +230,7 @@ struct iommu_sva_param {
>  	unsigned long features;
>  	unsigned int min_pasid;
>  	unsigned int max_pasid;
> +	struct list_head mm_list;
>  };
>  
>  /**
> @@ -227,6 +242,11 @@ struct iommu_sva_param {
>   * @detach_dev: detach device from an iommu domain
>   * @sva_device_init: initialize Shared Virtual Addressing for a device
>   * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
> + * @mm_alloc: allocate io_mm
> + * @mm_free: free io_mm
> + * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
> + * @mm_detach: detach io_mm from a device. Remove PASID entry and
> + *             flush associated TLB entries.
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @unmap: unmap a physically contiguous memory region from an iommu domain
>   * @map_sg: map a scatter-gather list of physically contiguous memory chunks
> @@ -268,6 +288,14 @@ struct iommu_ops {
>  			       struct iommu_sva_param *param);
>  	void (*sva_device_shutdown)(struct device *dev,
>  				    struct iommu_sva_param *param);
> +	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
> +				  struct mm_struct *mm,
> +				  unsigned long flags);
> +	void (*mm_free)(struct io_mm *io_mm);
> +	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
> +			 struct io_mm *io_mm, bool attach_domain);
> +	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
> +			  struct io_mm *io_mm, bool detach_domain);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,

[Jacob Pan]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices
       [not found]         ` <20180515141658.vivrgcyww2pxumye-+1aNUgJU5qkijLcmloz0ER/iLCjYCKR+VpNB7YpNyf8@public.gmane.org>
@ 2018-05-17 10:01           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-17 10:01 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 15/05/18 15:16, Catalin Marinas wrote:
> Hi Jean-Philippe,
> 
> On Fri, May 11, 2018 at 08:06:17PM +0100, Jean-Philippe Brucker wrote:
>> +unsigned long mm_context_get(struct mm_struct *mm)
>> +{
>> +	unsigned long flags;
>> +	u64 asid;
>> +
>> +	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
>> +
>> +	asid = atomic64_read(&mm->context.id);
>> +
>> +	if (mm->context.pinned) {
>> +		mm->context.pinned++;
>> +		asid &= ~ASID_MASK;
>> +		goto out_unlock;
>> +	}
>> +
>> +	if (nr_pinned_asids >= max_pinned_asids) {
>> +		asid = 0;
>> +		goto out_unlock;
>> +	}
>> +
>> +	if (!asid_gen_match(asid)) {
>> +		/*
>> +		 * We went through one or more rollover since that ASID was
>> +		 * used. Ensure that it is still valid, or generate a new one.
>> +		 * The cpu argument isn't used by new_context.
>> +		 */
>> +		asid = new_context(mm, 0);
>> +		atomic64_set(&mm->context.id, asid);
>> +	}
>> +
>> +	asid &= ~ASID_MASK;
>> +
>> +	nr_pinned_asids++;
>> +	__set_bit(asid2idx(asid), pinned_asid_map);
>> +	mm->context.pinned++;
>> +
>> +out_unlock:
>> +	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
>> +
>> +	return asid;
>> +}
> 
> With CONFIG_UNMAP_KERNEL_AT_EL0 (a.k.a. KPTI), the hardware ASID has bit
> 0 set automatically when entering user space (and cleared when getting
> back to the kernel). If the returned asid value here is going to be used
> as is in the calling code, you should probably set bit 0 when KPTI is
> enabled.
> 

Oh right, I'll change this

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
  2018-05-16 20:41       ` Jacob Pan
@ 2018-05-17 10:02         ` Jean-Philippe Brucker
  2018-05-17 17:00           ` Jacob Pan
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-17 10:02 UTC (permalink / raw)
  To: Jacob Pan
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

Hi Jacob,

Thanks for reviewing this

On 16/05/18 21:41, Jacob Pan wrote:
>> + * The device must support multiple address spaces (e.g. PCI PASID). By default
>> + * the PASID allocated during bind() is limited by the IOMMU capacity, and by
>> + * the device PASID width defined in the PCI capability or in the firmware
>> + * description. Setting @max_pasid to a non-zero value smaller than this limit
>> + * overrides it.
>> + *
> seems the min_pasid never gets used. do you really need it?

Yes, the SMMU sets it to 1 in patch 28/40, because it needs to reserve
PASID 0
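
Concretely, the SMMU driver's sva_device_init() does something along
these lines (a sketch; see patch 28/40 for the real context):

	param->min_pasid = 1;	/* entry 0 is kept for non-PASID traffic */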

>> + * The device should not be performing any DMA while this function is running,
>> + * otherwise the behavior is undefined.
>> + *
>> + * Return 0 if initialization succeeded, or an error.
>> + */
>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>> +			  unsigned int max_pasid)
>> +{
>> +	int ret;
>> +	struct iommu_sva_param *param;
>> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +
>> +	if (!domain || !domain->ops->sva_device_init)
>> +		return -ENODEV;
>> +
>> +	if (features)
>> +		return -EINVAL;
> should it be !features?

This checks if the user sets any unsupported bit in features. No feature
is supported right now, but patch 09/40 adds IOMMU_SVA_FEAT_IOPF, and
changes this line to "features & ~IOMMU_SVA_FEAT_IOPF".
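
So after patch 09/40 the check reads like this (sketch; the feature bit
value here is illustrative, not necessarily the one from the series):

	#define IOMMU_SVA_FEAT_IOPF	(1UL << 0)

	/* Reject any feature bit we don't know about */
	if (features & ~IOMMU_SVA_FEAT_IOPF)
		return -EINVAL;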

>> +	mutex_lock(&dev->iommu_param->lock);
>> +	param = dev->iommu_param->sva_param;
>> +	dev->iommu_param->sva_param = NULL;
>> +	mutex_unlock(&dev->iommu_param->lock);
>> +	if (!param)
>> +		return -ENODEV;
>> +
>> +	if (domain->ops->sva_device_shutdown)
>> +		domain->ops->sva_device_shutdown(dev, param);
> seems a little mismatch here, do you need pass the param. I don't think
> there is anything else model specific iommu driver need to do for the
> param.

SMMU doesn't use it, but maybe it would remind other IOMMU drivers which
features were enabled, so they don't have to keep track of that
themselves? I can remove it if it isn't useful.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-16 23:31       ` Jacob Pan
@ 2018-05-17 10:02         ` Jean-Philippe Brucker
       [not found]           ` <de478769-9f7a-d40b-a55e-e2c63ad883e8-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-17 10:02 UTC (permalink / raw)
  To: Jacob Pan
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On 17/05/18 00:31, Jacob Pan wrote:
> On Fri, 11 May 2018 20:06:04 +0100
> I am a little confused about the domain vs. PASID relationship. If each
> domain represents an address space, should there be a domain for each
> PASID?

I don't think there is a formal definition, but from previous discussion
the consensus seems to be: domains are a collection of devices that have
the same virtual address spaces (one or many).

Keeping that definition makes things easier, in my opinion. Some time
ago, I did try to represent PASIDs using "subdomains" (introducing a
hierarchy of struct iommu_domain), but it required invasive changes in
the IOMMU subsystem and probably all over the tree.

You do need some kind of "root domain" for each device, so that
"iommu_get_domain_for_dev()" still makes sense. That root domain doesn't
have a single address space but a collection of subdomains. If you need
this anyway, representing a PASID with an iommu_domain doesn't seem
preferable to using a different structure (io_mm), because they don't
have anything in common.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices
       [not found]     ` <20180511190641.23008-3-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-17 13:10       ` Jonathan Cameron
       [not found]         ` <20180517141058.00001c76-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2018-09-05 11:29       ` Auger Eric
  1 sibling, 1 reply; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-17 13:10 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:03 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Add bind() and unbind() operations to the IOMMU API. Bind() returns a
> PASID that drivers can program in hardware, to let their devices access an
> mm. This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.
> 
> IOMMU groups with more than one device aren't supported for SVA at the
> moment. There may be P2P traffic between devices within a group, which
> cannot be seen by an IOMMU (note that supporting PASID doesn't add any
> form of isolation with regard to P2P). Supporting groups would require
> calling bind() for all bound processes every time a device is added to a
> group, to perform sanity checks (e.g. ensure that new devices support
> PASIDs at least as big as those already allocated in the group).

Is it worth adding an explicit comment on this reasoning (or a minimal subset
of it) at the check for the number of devices in the group?
It's well laid out here, but might not be so obvious if someone is reading
the code in the future.
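
Something like this at the check would do, say (a sketch, with the
wording lifted from the commit message above):

	/*
	 * SVA is restricted to singular groups: P2P traffic between devices
	 * in the group may bypass the IOMMU, and each device hot-added to
	 * the group would require new bind() calls and sanity checks.
	 */
	if (iommu_group_device_count(group) != 1)
		goto out_unlock;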

> It also
> means making sure that reserved ranges (IOMMU_RESV_*) of all devices are
> carved out of processes. This is already tricky with single devices, but
> becomes very difficult with groups. Since SVA-capable devices are expected
> to be cleanly isolated, and since we don't have any way to test groups or
> hot-plug, we only allow singular groups for now.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

Otherwise, looks good to me.

> 
> ---
> v1->v2: remove iommu_sva_bind/unbind_group
> ---
>  drivers/iommu/iommu-sva.c | 27 +++++++++++++
>  drivers/iommu/iommu.c     | 83 +++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h     | 37 +++++++++++++++++
>  3 files changed, 147 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8b4afb7c63ae..8d98f9c09864 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -93,6 +93,8 @@ int iommu_sva_device_shutdown(struct device *dev)
>  	if (!domain)
>  		return -ENODEV;
>  
> +	__iommu_sva_unbind_dev_all(dev);
> +
>  	mutex_lock(&dev->iommu_param->lock);
>  	param = dev->iommu_param->sva_param;
>  	dev->iommu_param->sva_param = NULL;
> @@ -108,3 +110,28 @@ int iommu_sva_device_shutdown(struct device *dev)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> +
> +int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +			    int *pasid, unsigned long flags, void *drvdata)
> +{
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
> +
> +int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
> +
> +/**
> + * __iommu_sva_unbind_dev_all() - Detach all address spaces from this device
> + * @dev: the device
> + *
> + * When detaching @device from a domain, IOMMU drivers should use this helper.
> + */
> +void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +	/* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 9e28d88c8074..bd2819deae5b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2261,3 +2261,86 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access the mm
> + * using the returned PASID. If unbind() isn't called first, a subsequent bind()
> + * for the same device and mm fails with -EEXIST.
> + *
> + * iommu_sva_device_init() must be called first, to initialize the required SVA
> + * features. @flags is a subset of these features.
> + *
> + * The caller must pin down using get_user_pages*() all mappings shared with the
> + * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
> + * (e.g. copy-on-write).
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
> +			  unsigned long flags, void *drvdata)
> +{
> +	int ret = -EINVAL;
> +	struct iommu_group *group;
> +
> +	if (!pasid)
> +		return -EINVAL;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	/* Ensure device count and domain don't change while we're binding */
> +	mutex_lock(&group->mutex);
> +	if (iommu_group_device_count(group) != 1)
> +		goto out_unlock;
> +
> +	ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);
> +
> +out_unlock:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
> + * @dev: the device
> + * @pasid: the pasid returned by bind()
> + *
> + * Remove bond between device and address space identified by @pasid. Users
> + * should not call unbind() if the corresponding mm exited (as the PASID might
> + * have been reallocated for another process).
> + *
> + * The device must not be issuing any more transaction for this PASID. All
> + * outstanding page requests for this PASID must have been flushed to the IOMMU.
> + *
> + * Returns 0 on success, or an error value
> + */
> +int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	int ret = -EINVAL;
> +	struct iommu_group *group;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	mutex_lock(&group->mutex);
> +	ret = __iommu_sva_unbind_device(dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2efe7738bedb..da59c20c4f12 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -613,6 +613,10 @@ void iommu_fwspec_free(struct device *dev);
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>  
> +extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +				int *pasid, unsigned long flags, void *drvdata);
> +extern int iommu_sva_unbind_device(struct device *dev, int pasid);
> +
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -932,12 +936,29 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
>  	return -EINVAL;
>  }
>  
> +static inline int iommu_sva_bind_device(struct device *dev,
> +					struct mm_struct *mm, int *pasid,
> +					unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_SVA
>  extern int iommu_sva_device_init(struct device *dev, unsigned long features,
>  				 unsigned int max_pasid);
>  extern int iommu_sva_device_shutdown(struct device *dev);
> +extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +				   int *pasid, unsigned long flags,
> +				   void *drvdata);
> +extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
> +extern void __iommu_sva_unbind_dev_all(struct device *dev);
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>  					unsigned long features,
> @@ -950,6 +971,22 @@ static inline int iommu_sva_device_shutdown(struct device *dev)
>  {
>  	return -ENODEV;
>  }
> +
> +static inline int __iommu_sva_bind_device(struct device *dev,
> +					  struct mm_struct *mm, int *pasid,
> +					  unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +}
>  #endif /* CONFIG_IOMMU_SVA */
>  
>  #endif /* __LINUX_IOMMU_H */

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]     ` <20180511190641.23008-4-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-16 23:31       ` Jacob Pan
@ 2018-05-17 14:25       ` Jonathan Cameron
       [not found]         ` <20180517150516.000067ca-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2018-09-05 12:14       ` Auger Eric
  2 siblings, 1 reply; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-17 14:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:04 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Allocate IOMMU mm structures and bind them to devices. Four operations
> are added to IOMMU drivers:
> 
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
> 
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
> 
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
> 
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Others can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
> 
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
> 
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
> 
> mm_attach and mm_detach operations are serialized with a spinlock. When
> trying to optimize this code, we should at least prevent concurrent
> attach()/detach() on the same domain (so multi-level PASID table code can
> allocate tables lazily). mm_alloc() can sleep, but mm_free must not
> (because we'll have to call it from call_srcu later on).
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
> 
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

A few minor bits and bobs inline.  Looks good in general + nice diags!

Thanks,

Jonathan

> 
> ---
> v1->v2: sanity-check of flags
> ---
>  drivers/iommu/iommu-sva.c | 380 +++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   1 +
>  include/linux/iommu.h     |  28 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8d98f9c09864..6ac679c48f3c 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -5,8 +5,298 @@
>   * Copyright (C) 2018 ARM Ltd.
>   */
>  
> +#include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/sched/mm.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> + * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
> + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
> + * the device table and PASID 0 would be available to the allocator.
> + */
> +
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm, unsigned long flags)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm, flags);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->flags		= flags;
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
> +			  param->max_pasid + 1, GFP_ATOMIC);

I'd expect the IDR cleanup to be in io_mm_free(), as that would 'match'
io_mm_alloc(), but it's in io_mm_release() just before the io_mm_free()
call. Perhaps move it, or am I missing something?

Hmm. This is reworked in patch 5 to use call_srcu() to do the free. I
suppose we may be burning an IDR entry if we take a while to get round
to the free. If so, a comment to explain this would be great.
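
Something like (sketch):

	/*
	 * The PASID stays in the IDR until io_mm_release(). With call_srcu()
	 * the entry may outlive the last io_mm_put() by a grace period.
	 */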

> +	io_mm->pasid = pasid;
> +	spin_unlock(&iommu_sva_lock);
> +	idr_preload_end();
> +
> +	if (pasid < 0) {
> +		ret = pasid;
> +		goto err_free_mm;
> +	}
> +
> +	/* TODO: keep track of mm. For the moment, abort. */

From later patches, I can now see why we didn't init the kref here, but
perhaps a comment would make that clear, rather than people having to
check it is used correctly throughout? Actually, just grab the comment
from patch 5 and put it in this one and that will do the job nicely.

> +	ret = -ENOSYS;
> +	spin_lock(&iommu_sva_lock);
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	spin_unlock(&iommu_sva_lock);
> +
> +err_free_mm:
> +	domain->ops->mm_free(io_mm);

Really minor, but you now have io_mm->release set, so to keep this
obviously the same as the io_mm_free() path, perhaps call that rather
than mm_free() directly.
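
i.e. (sketch):

	io_mm->release(io_mm);
	mmdrop(mm);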

> +	mmdrop(mm);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void io_mm_free(struct io_mm *io_mm)
> +{
> +	struct mm_struct *mm = io_mm->mm;
> +
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +}
> +
> +static void io_mm_release(struct kref *kref)
> +{
> +	struct io_mm *io_mm;
> +
> +	io_mm = container_of(kref, struct io_mm, kref);
> +	WARN_ON(!list_empty(&io_mm->devices));
> +
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +
> +	io_mm_free(io_mm);
> +}
> +
> +/*
> + * Returns non-zero if a reference to the io_mm was successfully taken.
> + * Returns zero if the io_mm is being freed and should not be used.
> + */
> +static int io_mm_get_locked(struct io_mm *io_mm)
> +{
> +	if (io_mm)
> +		return kref_get_unless_zero(&io_mm->kref);
> +
> +	return 0;
> +}
> +
> +static void io_mm_put_locked(struct io_mm *io_mm)
> +{
> +	kref_put(&io_mm->kref, io_mm_release);
> +}
> +
> +static void io_mm_put(struct io_mm *io_mm)
> +{
> +	spin_lock(&iommu_sva_lock);
> +	io_mm_put_locked(io_mm);
> +	spin_unlock(&iommu_sva_lock);
> +}
> +
> +static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
> +			struct io_mm *io_mm, void *drvdata)
> +{
> +	int ret;
> +	bool attach_domain = true;
> +	int pasid = io_mm->pasid;
> +	struct iommu_bond *bond, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +		return -ENODEV;
> +
> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> +		return -ERANGE;
> +
> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond->domain		= domain;
> +	bond->io_mm		= io_mm;
> +	bond->dev		= dev;
> +	bond->drvdata		= drvdata;
> +
> +	spin_lock(&iommu_sva_lock);
> +	/*
> +	 * Check if this io_mm is already bound to the domain. In which case the
> +	 * IOMMU driver doesn't have to install the PASID table entry.
> +	 */
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == io_mm) {
> +			attach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
> +	if (ret) {
> +		kfree(bond);
> +		spin_unlock(&iommu_sva_lock);
> +		return ret;
> +	}
> +
> +	list_add(&bond->mm_head, &io_mm->devices);
> +	list_add(&bond->domain_head, &domain->mm_list);
> +	list_add(&bond->dev_head, &param->mm_list);
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return 0;
> +}
> +
> +static void io_mm_detach_locked(struct iommu_bond *bond)
> +{
> +	struct iommu_bond *tmp;
> +	bool detach_domain = true;
> +	struct iommu_domain *domain = bond->domain;
> +
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
> +			detach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
> +

I can't see an immediate reason for the order here to differ from the reverse
of the attach above. So I think you should be detaching after the list_del
calls. If there is a reason, can we have a comment so I don't ask on v10?
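
i.e. the mirror image of io_mm_attach() (untested):

	list_del(&bond->mm_head);
	list_del(&bond->domain_head);
	list_del(&bond->dev_head);

	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);

	io_mm_put_locked(bond->io_mm);
	kfree(bond);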

> +	list_del(&bond->mm_head);
> +	list_del(&bond->domain_head);
> +	list_del(&bond->dev_head);
> +	io_mm_put_locked(bond->io_mm);
> +
> +	kfree(bond);
> +}
>  
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
> @@ -47,6 +337,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>  
>  	param->features		= features;
>  	param->max_pasid	= max_pasid;
> +	INIT_LIST_HEAD(&param->mm_list);
>  
>  	/*
>  	 * IOMMU driver updates the limits depending on the IOMMU and device
> @@ -114,13 +405,87 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
>  int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>  			    int *pasid, unsigned long flags, void *drvdata)
>  {
> -	return -ENOSYS; /* TODO */
> +	int i, ret = 0;
> +	struct io_mm *io_mm = NULL;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	/*
> +	 * The device driver does not call sva_device_init/shutdown and
> +	 * bind/unbind concurrently, so no need to take the param lock.
> +	 */
> +	if (WARN_ON_ONCE(!param) || (flags & ~param->features))
> +		return -EINVAL;
> +
> +	/* If an io_mm already exists, use it */
> +	spin_lock(&iommu_sva_lock);
> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
> +		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
> +			/* ... Unless it's already bound to this device */
> +			list_for_each_entry(tmp, &io_mm->devices, mm_head) {
> +				if (tmp->dev == dev) {
> +					bond = tmp;

Using bond for this is clear in a sense, but can we not just use ret
so it is obvious here that we are going to return -EEXIST?
At first glance I thought you were going to carry on with this bond,
and couldn't work out why it would ever make sense to have two bonds
between a device and an io_mm (which it doesn't!).
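
Something along these lines, perhaps (untested; assumes ret is initialized
to 0 at the top of the function):

	/* ... Unless it's already bound to this device */
	list_for_each_entry(tmp, &io_mm->devices, mm_head) {
		if (tmp->dev == dev) {
			ret = -EEXIST;
			io_mm_put_locked(io_mm);
			break;
		}
	}

then after dropping the lock:

	spin_unlock(&iommu_sva_lock);

	if (ret)
		return ret;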

> +					io_mm_put_locked(io_mm);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	if (bond)
> +		return -EEXIST;
> +
> +	/* Require identical features within an io_mm for now */
> +	if (io_mm && (flags != io_mm->flags)) {
> +		io_mm_put(io_mm);
> +		return -EDOM;
> +	}
> +
> +	if (!io_mm) {
> +		io_mm = io_mm_alloc(domain, dev, mm, flags);
> +		if (IS_ERR(io_mm))
> +			return PTR_ERR(io_mm);
> +	}
> +
> +	ret = io_mm_attach(domain, dev, io_mm, drvdata);
> +	if (ret)
> +		io_mm_put(io_mm);
> +	else
> +		*pasid = io_mm->pasid;
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
>  
>  int __iommu_sva_unbind_device(struct device *dev, int pasid)
>  {
> -	return -ENOSYS; /* TODO */
> +	int ret = -ESRCH;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!param || WARN_ON(!domain))
> +		return -EINVAL;
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry(bond, &param->mm_list, dev_head) {
> +		if (bond->io_mm->pasid == pasid) {
> +			io_mm_detach_locked(bond);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>  
> @@ -132,6 +497,15 @@ EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>   */
>  void __iommu_sva_unbind_dev_all(struct device *dev)
>  {
> -	/* TODO */
> +	struct iommu_sva_param *param;
> +	struct iommu_bond *bond, *next;
> +
> +	param = dev->iommu_param->sva_param;
> +	if (param) {
> +		spin_lock(&iommu_sva_lock);
> +		list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
> +			io_mm_detach_locked(bond);
> +		spin_unlock(&iommu_sva_lock);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index bd2819deae5b..333801e1519c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1463,6 +1463,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>  	domain->type = type;
>  	/* Assume all sizes by default; the driver may override this later */
>  	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
> +	INIT_LIST_HEAD(&domain->mm_list);
>  
>  	return domain;
>  }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index da59c20c4f12..d5f21719a5a0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -100,6 +100,20 @@ struct iommu_domain {
>  	void *handler_token;
>  	struct iommu_domain_geometry geometry;
>  	void *iova_cookie;
> +
> +	struct list_head mm_list;
> +};
> +
> +struct io_mm {
> +	int			pasid;
> +	/* IOMMU_SVA_FEAT_* */
> +	unsigned long		flags;
> +	struct list_head	devices;
> +	struct kref		kref;
> +	struct mm_struct	*mm;
> +
> +	/* Release callback for this mm */
> +	void (*release)(struct io_mm *io_mm);
>  };
>  
>  enum iommu_cap {
> @@ -216,6 +230,7 @@ struct iommu_sva_param {
>  	unsigned long features;
>  	unsigned int min_pasid;
>  	unsigned int max_pasid;
> +	struct list_head mm_list;
>  };
>  
>  /**
> @@ -227,6 +242,11 @@ struct iommu_sva_param {
>   * @detach_dev: detach device from an iommu domain
>   * @sva_device_init: initialize Shared Virtual Addressing for a device
>   * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
> + * @mm_alloc: allocate io_mm
> + * @mm_free: free io_mm
> + * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
> + * @mm_detach: detach io_mm from a device. Remove PASID entry and
> + *             flush associated TLB entries.
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @unmap: unmap a physically contiguous memory region from an iommu domain
>   * @map_sg: map a scatter-gather list of physically contiguous memory chunks
> @@ -268,6 +288,14 @@ struct iommu_ops {
>  			       struct iommu_sva_param *param);
>  	void (*sva_device_shutdown)(struct device *dev,
>  				    struct iommu_sva_param *param);
> +	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
> +				  struct mm_struct *mm,
> +				  unsigned long flags);
> +	void (*mm_free)(struct io_mm *io_mm);
> +	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
> +			 struct io_mm *io_mm, bool attach_domain);
> +	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
> +			  struct io_mm *io_mm, bool detach_domain);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 05/40] iommu/sva: Track mm changes with an MMU notifier
       [not found]     ` <20180511190641.23008-6-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-17 14:25       ` Jonathan Cameron
       [not found]         ` <20180517152514.00004247-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-17 14:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:06 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> When creating an io_mm structure, register an MMU notifier that informs
> us when the virtual address space changes and disappears.
> 
> Add a new operation to the IOMMU driver, mm_invalidate, called when a
> range of addresses is unmapped to let the IOMMU driver send ATC
> invalidations. mm_invalidate cannot sleep.
> 
> Adding the notifier complicates io_mm release. In one case device
> drivers free the io_mm explicitly by calling unbind (or detaching the
> device from its domain). In the other case the process could crash
> before unbind, in which case the release notifier has to do all the
> work.
> 
> Allowing the device driver's mm_exit() handler to sleep adds another
> complication, but it will greatly simplify things for users. For example
> VFIO can take the IOMMU mutex and remove any trace of io_mm, instead of
> introducing complex synchronization to delicately handle this race. But
> relaxing the user side does force unbind() to sleep and wait for all
> pending mm_exit() calls to finish.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2:
> * Unbind() waits for mm_exit to finish
> * mm_exit can sleep
> ---
>  drivers/iommu/Kconfig     |   1 +
>  drivers/iommu/iommu-sva.c | 248 +++++++++++++++++++++++++++++++++++---
>  include/linux/iommu.h     |  10 ++
>  3 files changed, 244 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index cca8e06903c7..38434899e283 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -77,6 +77,7 @@ config IOMMU_DMA
>  config IOMMU_SVA
>  	bool
>  	select IOMMU_API
> +	select MMU_NOTIFIER
>  
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 0700893c679d..e9afae2537a2 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -7,6 +7,7 @@
>  
>  #include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/mmu_notifier.h>
>  #include <linux/sched/mm.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> @@ -106,6 +107,9 @@ struct iommu_bond {
>  	struct list_head	mm_head;
>  	struct list_head	dev_head;
>  	struct list_head	domain_head;
> +	refcount_t		refs;
> +	struct wait_queue_head	mm_exit_wq;
> +	bool			mm_exit_active;
>  
>  	void			*drvdata;
>  };
> @@ -124,6 +128,8 @@ static DEFINE_IDR(iommu_pasid_idr);
>   */
>  static DEFINE_SPINLOCK(iommu_sva_lock);
>  
> +static struct mmu_notifier_ops iommu_mmu_notifier;
> +
>  static struct io_mm *
>  io_mm_alloc(struct iommu_domain *domain, struct device *dev,
>  	    struct mm_struct *mm, unsigned long flags)
> @@ -151,6 +157,7 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
>  
>  	io_mm->flags		= flags;
>  	io_mm->mm		= mm;
> +	io_mm->notifier.ops	= &iommu_mmu_notifier;
>  	io_mm->release		= domain->ops->mm_free;
>  	INIT_LIST_HEAD(&io_mm->devices);
>  
> @@ -167,8 +174,29 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
>  		goto err_free_mm;
>  	}
>  
> -	/* TODO: keep track of mm. For the moment, abort. */
> -	ret = -ENOSYS;
> +	ret = mmu_notifier_register(&io_mm->notifier, mm);
> +	if (ret)
> +		goto err_free_pasid;
> +
> +	/*
> +	 * Now that the MMU notifier is valid, we can allow users to grab this
> +	 * io_mm by setting a valid refcount. Before that it was accessible in
> +	 * the IDR but invalid.
> +	 *
> +	 * The following barrier ensures that users, who obtain the io_mm with
> +	 * kref_get_unless_zero, don't read uninitialized fields in the
> +	 * structure.
> +	 */
> +	smp_wmb();
> +	kref_init(&io_mm->kref);
> +
> +	return io_mm;
> +
> +err_free_pasid:
> +	/*
> +	 * Even if the io_mm is accessible from the IDR at this point, kref is
> +	 * 0 so no user could get a reference to it. Free it manually.
> +	 */
>  	spin_lock(&iommu_sva_lock);
>  	idr_remove(&iommu_pasid_idr, io_mm->pasid);
>  	spin_unlock(&iommu_sva_lock);
> @@ -180,9 +208,13 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
>  	return ERR_PTR(ret);
>  }
>  
> -static void io_mm_free(struct io_mm *io_mm)
> +static void io_mm_free(struct rcu_head *rcu)
>  {
> -	struct mm_struct *mm = io_mm->mm;
> +	struct io_mm *io_mm;
> +	struct mm_struct *mm;
> +
> +	io_mm = container_of(rcu, struct io_mm, rcu);
> +	mm = io_mm->mm;
>  
>  	io_mm->release(io_mm);
>  	mmdrop(mm);
> @@ -197,7 +229,22 @@ static void io_mm_release(struct kref *kref)
>  
>  	idr_remove(&iommu_pasid_idr, io_mm->pasid);
>  
> -	io_mm_free(io_mm);
> +	/*
> +	 * If we're being released from mm exit, the notifier callback ->release
> +	 * has already been called. Otherwise we don't need ->release, the io_mm
> +	 * isn't attached to anything anymore. Hence no_release.
> +	 */
> +	mmu_notifier_unregister_no_release(&io_mm->notifier, io_mm->mm);
> +
> +	/*
> +	 * We can't free the structure here, because if mm exits during
> +	 * unbind(), then ->release might be attempting to grab the io_mm
> +	 * concurrently. And in the other case, if ->release is calling
> +	 * io_mm_release, then __mmu_notifier_release expects to still have a
> +	 * valid mn when returning. So free the structure when it's safe, after
> +	 * the RCU grace period elapsed.
> +	 */
> +	mmu_notifier_call_srcu(&io_mm->rcu, io_mm_free);
>  }
>  
>  /*
> @@ -206,8 +253,14 @@ static void io_mm_release(struct kref *kref)
>   */
>  static int io_mm_get_locked(struct io_mm *io_mm)
>  {
> -	if (io_mm)
> -		return kref_get_unless_zero(&io_mm->kref);
> +	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
> +		/*
> +		 * kref_get_unless_zero doesn't provide ordering for reads. This
> +		 * barrier pairs with the one in io_mm_alloc.
> +		 */
> +		smp_rmb();
> +		return 1;
> +	}
>  
>  	return 0;
>  }
> @@ -233,7 +286,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
>  	struct iommu_bond *bond, *tmp;
>  	struct iommu_sva_param *param = dev->iommu_param->sva_param;
>  
> -	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
> +	    !domain->ops->mm_invalidate)
>  		return -ENODEV;
>  
>  	if (pasid > param->max_pasid || pasid < param->min_pasid)
> @@ -247,6 +301,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
>  	bond->io_mm		= io_mm;
>  	bond->dev		= dev;
>  	bond->drvdata		= drvdata;
> +	refcount_set(&bond->refs, 1);
> +	init_waitqueue_head(&bond->mm_exit_wq);
>  
>  	spin_lock(&iommu_sva_lock);
>  	/*
> @@ -275,12 +331,37 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
>  	return 0;
>  }
>  
> -static void io_mm_detach_locked(struct iommu_bond *bond)
> +static void io_mm_detach_locked(struct iommu_bond *bond, bool wait)
>  {
>  	struct iommu_bond *tmp;
>  	bool detach_domain = true;
>  	struct iommu_domain *domain = bond->domain;
>  
> +	if (wait) {
> +		bool do_detach = true;
> +		/*
> +		 * If we're unbind() then we're deleting the bond no matter
> +		 * what. Tell the mm_exit thread that we're cleaning up, and
> +		 * wait until it finishes using the bond.
> +		 *
> +		 * refs is guaranteed to be one or more, otherwise it would
> +		 * already have been removed from the list. Check is someone is

Check if someone...

> +		 * already waiting, in which case we wait but do not free.
> +		 */
> +		if (refcount_read(&bond->refs) > 1)
> +			do_detach = false;
> +
> +		refcount_inc(&bond->refs);
> +		wait_event_lock_irq(bond->mm_exit_wq, !bond->mm_exit_active,
> +				    iommu_sva_lock);
> +		if (!do_detach)
> +			return;
> +
> +	} else if (!refcount_dec_and_test(&bond->refs)) {
> +		/* unbind() is waiting to free the bond */
> +		return;
> +	}
> +
>  	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
>  		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
>  			detach_domain = false;
> @@ -298,6 +379,129 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
>  	kfree(bond);
>  }
>  
> +static int iommu_signal_mm_exit(struct iommu_bond *bond)
> +{
> +	struct device *dev = bond->dev;
> +	struct io_mm *io_mm = bond->io_mm;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	/*
> +	 * We can't hold the device's param_lock. If we did and the device
> +	 * driver used a global lock around io_mm, we would risk getting the
> +	 * following deadlock:
> +	 *
> +	 *   exit_mm()                 |  Shutdown SVA
> +	 *    mutex_lock(param->lock)  |   mutex_lock(glob lock)
> +	 *     param->mm_exit()        |    sva_device_shutdown()
> +	 *      mutex_lock(glob lock)  |     mutex_lock(param->lock)
> +	 *
> +	 * Fortunately unbind() waits for us to finish, and sva_device_shutdown
> +	 * requires that any bond is removed, so we can safely access mm_exit
> +	 * and drvdata without taking any lock.
> +	 */
> +	if (!param || !param->mm_exit)
> +		return 0;
> +
> +	return param->mm_exit(dev, io_mm->pasid, bond->drvdata);
> +}
> +
> +/* Called when the mm exits. Can race with unbind(). */
> +static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +	struct iommu_bond *bond, *next;
> +	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
> +
> +	/*
> +	 * If the mm is exiting then devices are still bound to the io_mm.
> +	 * A few things need to be done before it is safe to release:
> +	 *
> +	 * - As the mmu notifier doesn't hold any reference to the io_mm when
> +	 *   calling ->release(), try to take a reference.
> +	 * - Tell the device driver to stop using this PASID.
> +	 * - Clear the PASID table and invalidate TLBs.
> +	 * - Drop all references to this io_mm by freeing the bonds.
> +	 */
> +	spin_lock(&iommu_sva_lock);
> +	if (!io_mm_get_locked(io_mm)) {
> +		/* Someone's already taking care of it. */
> +		spin_unlock(&iommu_sva_lock);
> +		return;
> +	}
> +
> +	list_for_each_entry_safe(bond, next, &io_mm->devices, mm_head) {
> +		/*
> +		 * Release the lock to let the handler sleep. We need to be
> +		 * careful about concurrent modifications to the list and to the
> +		 * bond. Tell unbind() not to free the bond until we're done.
> +		 */
> +		bond->mm_exit_active = true;
> +		spin_unlock(&iommu_sva_lock);
> +
> +		if (iommu_signal_mm_exit(bond))
> +			dev_WARN(bond->dev, "possible leak of PASID %u",
> +				 io_mm->pasid);
> +
> +		spin_lock(&iommu_sva_lock);
> +		next = list_next_entry(bond, mm_head);
> +
> +		/* If someone is waiting, let them delete the bond now */
> +		bond->mm_exit_active = false;
> +		wake_up_all(&bond->mm_exit_wq);
> +
> +		/* Otherwise, do it ourselves */
> +		io_mm_detach_locked(bond, false);
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	/*
> +	 * We're now reasonably certain that no more fault is being handled for
> +	 * this io_mm, since we just flushed them all out of the fault queue.
> +	 * Release the last reference to free the io_mm.
> +	 */
> +	io_mm_put(io_mm);
> +}
> +
> +static void iommu_notifier_invalidate_range(struct mmu_notifier *mn,
> +					    struct mm_struct *mm,
> +					    unsigned long start,
> +					    unsigned long end)
> +{
> +	struct iommu_bond *bond;
> +	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry(bond, &io_mm->devices, mm_head) {
> +		struct iommu_domain *domain = bond->domain;
> +
> +		domain->ops->mm_invalidate(domain, bond->dev, io_mm, start,
> +					   end - start);
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +}
> +
> +static int iommu_notifier_clear_flush_young(struct mmu_notifier *mn,
> +					    struct mm_struct *mm,
> +					    unsigned long start,
> +					    unsigned long end)
> +{
> +	iommu_notifier_invalidate_range(mn, mm, start, end);
> +	return 0;
> +}
> +
> +static void iommu_notifier_change_pte(struct mmu_notifier *mn,
> +				      struct mm_struct *mm,
> +				      unsigned long address, pte_t pte)
> +{
> +	iommu_notifier_invalidate_range(mn, mm, address, address + PAGE_SIZE);
> +}
> +
> +static struct mmu_notifier_ops iommu_mmu_notifier = {
> +	.release		= iommu_notifier_release,
> +	.clear_flush_young	= iommu_notifier_clear_flush_young,
> +	.change_pte		= iommu_notifier_change_pte,
> +	.invalidate_range	= iommu_notifier_invalidate_range,
> +};
> +
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
>   * @dev: the device
> @@ -320,6 +524,12 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
>   * The handler gets an opaque pointer corresponding to the drvdata passed as
>   * argument of bind().
>   *
> + * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
> + * @mm_exit, because they might lead to deadlocks if they are also held when
> + * dropping references to the mm. Consider the following call chain:
> + *   mutex_lock(A); mmput(mm) -> exit_mm() -> @mm_exit() -> mutex_lock(A)
> + * Using mmput_async() prevents this scenario.
> + *
>   * The device should not be performing any DMA while this function is running,
>   * otherwise the behavior is undefined.
>   *
> @@ -484,15 +694,16 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
>  	if (!param || WARN_ON(!domain))
>  		return -EINVAL;
>  
> -	spin_lock(&iommu_sva_lock);
> +	/* spin_lock_irq matches the one in wait_event_lock_irq */
> +	spin_lock_irq(&iommu_sva_lock);
>  	list_for_each_entry(bond, &param->mm_list, dev_head) {
>  		if (bond->io_mm->pasid == pasid) {
> -			io_mm_detach_locked(bond);
> +			io_mm_detach_locked(bond, true);
>  			ret = 0;
>  			break;
>  		}
>  	}
> -	spin_unlock(&iommu_sva_lock);
> +	spin_unlock_irq(&iommu_sva_lock);
>  
>  	return ret;
>  }
> @@ -503,18 +714,25 @@ EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>   * @dev: the device
>   *
>   * When detaching @device from a domain, IOMMU drivers should use this helper.
> + * This function may sleep while waiting for bonds to be released.
>   */
>  void __iommu_sva_unbind_dev_all(struct device *dev)
>  {
>  	struct iommu_sva_param *param;
>  	struct iommu_bond *bond, *next;
>  
> +	/*
> +	 * io_mm_detach_locked might wait, so we shouldn't call it with the dev
> +	 * param lock held. It's fine to read sva_param outside the lock because
> +	 * it can only be freed by iommu_sva_device_shutdown when there are no
> +	 * more bonds in the list.
> +	 */
>  	param = dev->iommu_param->sva_param;
>  	if (param) {
> -		spin_lock(&iommu_sva_lock);
> +		spin_lock_irq(&iommu_sva_lock);
>  		list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
> -			io_mm_detach_locked(bond);
> -		spin_unlock(&iommu_sva_lock);
> +			io_mm_detach_locked(bond, true);
> +		spin_unlock_irq(&iommu_sva_lock);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 439c8fffd836..caa6f79785b9 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -24,6 +24,7 @@
>  #include <linux/types.h>
>  #include <linux/errno.h>
>  #include <linux/err.h>
> +#include <linux/mmu_notifier.h>
>  #include <linux/of.h>
>  #include <uapi/linux/iommu.h>
>  
> @@ -111,10 +112,15 @@ struct io_mm {
>  	unsigned long		flags;
>  	struct list_head	devices;
>  	struct kref		kref;
> +#if defined(CONFIG_MMU_NOTIFIER)
> +	struct mmu_notifier	notifier;
> +#endif
>  	struct mm_struct	*mm;
>  
>  	/* Release callback for this mm */
>  	void (*release)(struct io_mm *io_mm);
> +	/* For postponed release */
> +	struct rcu_head		rcu;
>  };
>  
>  enum iommu_cap {
> @@ -249,6 +255,7 @@ struct iommu_sva_param {
>   * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
>   * @mm_detach: detach io_mm from a device. Remove PASID entry and
>   *             flush associated TLB entries.
> + * @mm_invalidate: Invalidate a range of mappings for an mm
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @unmap: unmap a physically contiguous memory region from an iommu domain
>   * @map_sg: map a scatter-gather list of physically contiguous memory chunks
> @@ -298,6 +305,9 @@ struct iommu_ops {
>  			 struct io_mm *io_mm, bool attach_domain);
>  	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
>  			  struct io_mm *io_mm, bool detach_domain);
> +	void (*mm_invalidate)(struct iommu_domain *domain, struct device *dev,
> +			      struct io_mm *io_mm, unsigned long vaddr,
> +			      size_t size);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 07/40] iommu: Add a page fault handler
       [not found]     ` <20180511190641.23008-8-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-17 15:25       ` Jonathan Cameron
       [not found]         ` <20180517162555.00002bd3-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2018-05-18 18:04       ` Jacob Pan
  1 sibling, 1 reply; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-17 15:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:08 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Some systems allow devices to handle I/O Page Faults in the core mm. For
> example systems implementing the PCI PRI extension or Arm SMMU stall
> model. Infrastructure for reporting these recoverable page faults was
> recently added to the IOMMU core for SVA virtualisation. Add a page fault
> handler for host SVA.
> 
> IOMMU driver can now instantiate several fault workqueues and link them to
> IOPF-capable devices. Drivers can choose between a single global
> workqueue, one per IOMMU device, one per low-level fault queue, one per
> domain, etc.
> 
> When it receives a fault event, supposedly in an IRQ handler, the IOMMU
> driver reports the fault using iommu_report_device_fault(), which calls
> the registered handler. The page fault handler then calls the mm fault
> handler, and reports either success or failure with iommu_page_response().
> When the handler succeeded, the IOMMU retries the access.
> 
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to care
> about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

Hi Jean-Philippe,

One question below on how we would end up with partial faults left when
doing the queue remove. The code looks fine, but I'm not seeing how that
would happen without buggy hardware... and presumably we have to rely on
the hardware timing out on that request, or it's dead anyway...

Jonathan

> 
> ---
> v1->v2: multiple queues registered by IOMMU driver
> ---
>  drivers/iommu/Kconfig      |   4 +
>  drivers/iommu/Makefile     |   1 +
>  drivers/iommu/io-pgfault.c | 363 +++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h      |  58 ++++++
>  4 files changed, 426 insertions(+)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 38434899e283..09f13a7c4b60 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -79,6 +79,10 @@ config IOMMU_SVA
>  	select IOMMU_API
>  	select MMU_NOTIFIER
>  
> +config IOMMU_PAGE_FAULT
> +	bool
> +	select IOMMU_API
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1dbcc89ebe4c..4b744e399a1b 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> +obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> new file mode 100644
> index 000000000000..321c00dd3a3d
> --- /dev/null
> +++ b/drivers/iommu/io-pgfault.c
> @@ -0,0 +1,363 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Handle device page faults
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +/**
> + * struct iopf_queue - IO Page Fault queue
> + * @wq: the fault workqueue
> + * @flush: low-level flush callback
> + * @flush_arg: flush() argument
> + * @refs: references to this structure taken by producers
> + */
> +struct iopf_queue {
> +	struct workqueue_struct		*wq;
> +	iopf_queue_flush_t		flush;
> +	void				*flush_arg;
> +	refcount_t			refs;
> +};
> +
> +/**
> + * struct iopf_device_param - IO Page Fault data attached to a device
> + * @queue: IOPF queue
> + * @partial: faults that are part of a Page Request Group for which the last
> + *           request hasn't been submitted yet.
> + */
> +struct iopf_device_param {
> +	struct iopf_queue		*queue;
> +	struct list_head		partial;
> +};
> +
> +struct iopf_context {
> +	struct device			*dev;
> +	struct iommu_fault_event	evt;
> +	struct list_head		head;
> +};
> +
> +struct iopf_group {
> +	struct iopf_context		last_fault;
> +	struct list_head		faults;
> +	struct work_struct		work;
> +};
> +
> +static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
> +			 enum page_response_code status)
> +{
> +	struct page_response_msg resp = {
> +		.addr			= evt->addr,
> +		.pasid			= evt->pasid,
> +		.pasid_present		= evt->pasid_valid,
> +		.page_req_group_id	= evt->page_req_group_id,
> +		.private_data		= evt->iommu_private,
> +		.resp_code		= status,
> +	};
> +
> +	return iommu_page_response(dev, &resp);
> +}
> +
> +static enum page_response_code
> +iopf_handle_single(struct iopf_context *fault)
> +{
> +	/* TODO */
> +	return -ENODEV;
> +}
> +
> +static void iopf_handle_group(struct work_struct *work)
> +{
> +	struct iopf_group *group;
> +	struct iopf_context *fault, *next;
> +	enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
> +
> +	group = container_of(work, struct iopf_group, work);
> +
> +	list_for_each_entry_safe(fault, next, &group->faults, head) {
> +		struct iommu_fault_event *evt = &fault->evt;
> +		/*
> +		 * Errors are sticky: don't handle subsequent faults in the
> +		 * group if there is an error.
> +		 */
> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
> +			status = iopf_handle_single(fault);
> +
> +		if (!evt->last_req)
> +			kfree(fault);
> +	}
> +
> +	iopf_complete(group->last_fault.dev, &group->last_fault.evt, status);
> +	kfree(group);
> +}
> +
> +/**
> + * iommu_queue_iopf - IO Page Fault handler
> + * @evt: fault event
> + * @cookie: struct device, passed to iommu_register_device_fault_handler.
> + *
> + * Add a fault to the device workqueue, to be handled by mm.
> + */
> +int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
> +{
> +	struct iopf_group *group;
> +	struct iopf_context *fault, *next;
> +	struct iopf_device_param *iopf_param;
> +
> +	struct device *dev = cookie;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (WARN_ON(!mutex_is_locked(&param->lock)))
> +		return -EINVAL;
> +
> +	if (evt->type != IOMMU_FAULT_PAGE_REQ)
> +		/* Not a recoverable page fault */
> +		return IOMMU_PAGE_RESP_CONTINUE;
> +
> +	/*
> +	 * As long as we're holding param->lock, the queue can't be unlinked
> +	 * from the device and therefore cannot disappear.
> +	 */
> +	iopf_param = param->iopf_param;
> +	if (!iopf_param)
> +		return -ENODEV;
> +
> +	if (!evt->last_req) {
> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> +		if (!fault)
> +			return -ENOMEM;
> +
> +		fault->evt = *evt;
> +		fault->dev = dev;
> +
> +		/* Non-last request of a group. Postpone until the last one */
> +		list_add(&fault->head, &iopf_param->partial);
> +
> +		return IOMMU_PAGE_RESP_HANDLED;
> +	}
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return -ENOMEM;
> +
> +	group->last_fault.evt = *evt;
> +	group->last_fault.dev = dev;
> +	INIT_LIST_HEAD(&group->faults);
> +	list_add(&group->last_fault.head, &group->faults);
> +	INIT_WORK(&group->work, iopf_handle_group);
> +
> +	/* See if we have partial faults for this group */
> +	list_for_each_entry_safe(fault, next, &iopf_param->partial, head) {
> +		if (fault->evt.page_req_group_id == evt->page_req_group_id)
> +			/* Insert *before* the last fault */
> +			list_move(&fault->head, &group->faults);
> +	}
> +
> +	queue_work(iopf_param->queue->wq, &group->work);
> +
> +	/* Postpone the fault completion */
> +	return IOMMU_PAGE_RESP_HANDLED;
> +}
> +EXPORT_SYMBOL_GPL(iommu_queue_iopf);
> +
> +/**
> + * iopf_queue_flush_dev - Ensure that all queued faults have been processed
> + * @dev: the endpoint whose faults need to be flushed.
> + *
> + * Users must call this function when releasing a PASID, to ensure that all
> + * pending faults for this PASID have been handled, and won't hit the address
> + * space of the next process that uses this PASID.
> + *
> + * Return 0 on success.
> + */
> +int iopf_queue_flush_dev(struct device *dev)
> +{
> +	int ret = 0;
> +	struct iopf_queue *queue;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (!param)
> +		return -ENODEV;
> +
> +	/*
> +	 * It is incredibly easy to find ourselves in a deadlock situation if
> +	 * we're not careful, because we're taking the opposite path as
> +	 * iommu_queue_iopf:
> +	 *
> +	 *   iopf_queue_flush_dev()   |  PRI queue handler
> +	 *    lock(mutex)             |   iommu_queue_iopf()
> +	 *     queue->flush()         |    lock(mutex)
> +	 *      wait PRI queue empty  |
> +	 *
> +	 * So we can't hold the device param lock while flushing. We don't have
> +	 * to, because the queue or the device won't disappear until all flush
> +	 * are finished.
> +	 */
> +	mutex_lock(&param->lock);
> +	if (param->iopf_param) {
> +		queue = param->iopf_param->queue;
> +	} else {
> +		ret = -ENODEV;
> +	}
> +	mutex_unlock(&param->lock);
> +	if (ret)
> +		return ret;
> +
> +	queue->flush(queue->flush_arg, dev);
> +
> +	/*
> +	 * No need to clear the partial list. All PRGs containing the PASID that
> +	 * needs to be decommissioned are whole (the device driver made sure of
> +	 * it before this function was called). They have been submitted to the
> +	 * queue by the above flush().
> +	 */
> +	flush_workqueue(queue->wq);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
> +
> +/**
> + * iopf_queue_add_device - Add producer to the fault queue
> + * @queue: IOPF queue
> + * @dev: device to add
> + */
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
> +{
> +	int ret = -EINVAL;
> +	struct iopf_device_param *iopf_param;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (!param)
> +		return -ENODEV;
> +
> +	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
> +	if (!iopf_param)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&iopf_param->partial);
> +	iopf_param->queue = queue;
> +
> +	mutex_lock(&param->lock);
> +	if (!param->iopf_param) {
> +		refcount_inc(&queue->refs);
> +		param->iopf_param = iopf_param;
> +		ret = 0;
> +	}
> +	mutex_unlock(&param->lock);
> +
> +	if (ret)
> +		kfree(iopf_param);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_add_device);
> +
> +/**
> + * iopf_queue_remove_device - Remove producer from fault queue
> + * @dev: device to remove
> + *
> + * Caller makes sure that no more fault is reported for this device, and no more
> + * flush is scheduled for this device.
> + *
> + * Note: safe to call unconditionally on a cleanup path, even if the device
> + * isn't registered to any IOPF queue.
> + *
> + * Return 0 if the device was attached to the IOPF queue
> + */
> +int iopf_queue_remove_device(struct device *dev)
> +{
> +	struct iopf_context *fault, *next;
> +	struct iopf_device_param *iopf_param;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (!param)
> +		return -EINVAL;
> +
> +	mutex_lock(&param->lock);
> +	iopf_param = param->iopf_param;
> +	if (iopf_param) {
> +		refcount_dec(&iopf_param->queue->refs);
> +		param->iopf_param = NULL;
> +	}
> +	mutex_unlock(&param->lock);
> +	if (!iopf_param)
> +		return -EINVAL;
> +
> +	list_for_each_entry_safe(fault, next, &iopf_param->partial, head)
> +		kfree(fault);
> +

Why would we end up here with partials still in the list?  Buggy hardware?

> +	/*
> +	 * No more flush is scheduled, and the caller removed all bonds from
> +	 * this device. unbind() waited until any concurrent mm_exit() finished,
> +	 * therefore there is no flush() running anymore and we can free the
> +	 * param.
> +	 */
> +	kfree(iopf_param);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
> +
> +/**
> + * iopf_queue_alloc - Allocate and initialize a fault queue
> + * @name: a unique string identifying the queue (for workqueue)
> + * @flush: a callback that flushes the low-level queue
> + * @cookie: driver-private data passed to the flush callback
> + *
> + * The callback is called before the workqueue is flushed. The IOMMU driver must
> + * commit all faults that are pending in its low-level queues at the time of the
> + * call, into the IOPF queue (with iommu_report_device_fault). The callback
> + * takes a device pointer as argument, hinting what endpoint is causing the
> + * flush. When the device is NULL, all faults should be committed.
> + */
> +struct iopf_queue *
> +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
> +{
> +	struct iopf_queue *queue;
> +
> +	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> +	if (!queue)
> +		return NULL;
> +
> +	/*
> +	 * The WQ is unordered because the low-level handler enqueues faults by
> +	 * group. PRI requests within a group have to be ordered, but once
> +	 * that's dealt with, the high-level function can handle groups out of
> +	 * order.
> +	 */
> +	queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
> +	if (!queue->wq) {
> +		kfree(queue);
> +		return NULL;
> +	}
> +
> +	queue->flush = flush;
> +	queue->flush_arg = cookie;
> +	refcount_set(&queue->refs, 1);
> +
> +	return queue;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_alloc);
> +
> +/**
> + * iopf_queue_free - Free IOPF queue
> + * @queue: queue to free
> + *
> + * Counterpart to iopf_queue_alloc(). Caller must make sure that all producers
> + * have been removed.
> + */
> +void iopf_queue_free(struct iopf_queue *queue)
> +{
> +

Nitpick: the blank line here doesn't add anything.

> +	/* Caller should have removed all producers first */
> +	if (WARN_ON(!refcount_dec_and_test(&queue->refs)))
> +		return;
> +
> +	destroy_workqueue(queue->wq);
> +	kfree(queue);
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_free);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index faf3390ce89d..fad3a60e1c14 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -461,11 +461,20 @@ struct iommu_fault_param {
>  	void *data;
>  };
>  
> +/**
> + * iopf_queue_flush_t - Flush low-level page fault queue
> + *
> + * Report all faults currently pending in the low-level page fault queue
> + */
> +struct iopf_queue;
> +typedef int (*iopf_queue_flush_t)(void *cookie, struct device *dev);
> +
>  /**
>   * struct iommu_param - collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
>   * @sva_param: SVA parameters
> + * @iopf_param: I/O Page Fault queue and data
>   *
>   * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>   *	struct iommu_group	*iommu_group;
> @@ -475,6 +484,7 @@ struct iommu_param {
>  	struct mutex lock;
>  	struct iommu_fault_param *fault_param;
>  	struct iommu_sva_param *sva_param;
> +	struct iopf_device_param *iopf_param;
>  };
>  
>  int  iommu_device_register(struct iommu_device *iommu);
> @@ -874,6 +884,12 @@ static inline int iommu_report_device_fault(struct device *dev, struct iommu_fau
>  	return 0;
>  }
>  
> +static inline int iommu_page_response(struct device *dev,
> +				      struct page_response_msg *msg)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline int iommu_group_id(struct iommu_group *group)
>  {
>  	return -ENODEV;
> @@ -1038,4 +1054,46 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
>  }
>  #endif /* CONFIG_IOMMU_SVA */
>  
> +#ifdef CONFIG_IOMMU_PAGE_FAULT
> +extern int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie);
> +
> +extern int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
> +extern int iopf_queue_remove_device(struct device *dev);
> +extern int iopf_queue_flush_dev(struct device *dev);
> +extern struct iopf_queue *
> +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie);
> +extern void iopf_queue_free(struct iopf_queue *queue);
> +#else /* CONFIG_IOMMU_PAGE_FAULT */
> +static inline int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iopf_queue_add_device(struct iopf_queue *queue,
> +					struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iopf_queue_remove_device(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iopf_queue_flush_dev(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline struct iopf_queue *
> +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
> +{
> +	return NULL;
> +}
> +
> +static inline void iopf_queue_free(struct iopf_queue *queue)
> +{
> +}
> +#endif /* CONFIG_IOMMU_PAGE_FAULT */
> +
>  #endif /* __LINUX_IOMMU_H */

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]     ` <20180511190641.23008-14-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-17 15:58       ` Jonathan Cameron
       [not found]         ` <20180517165845.00000cc9-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2018-05-23  9:38       ` Xu Zaibo
  2018-08-27  8:06       ` Xu Zaibo
  2 siblings, 1 reply; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-17 15:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:14 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> Process Address Space ID (PASID). Devices in the container append this
> PASID to DMA transactions in order to access the process' address space.
> The process page tables are shared with the IOMMU, and mechanisms such as
> PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
> bond created with VFIO_IOMMU_BIND_PROCESS.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

A couple of small comments inline.

Jonathan

> 
> ---
> v1->v2:
> * Simplify mm_exit
> * Can be built as module again
> ---
>  drivers/vfio/vfio_iommu_type1.c | 449 ++++++++++++++++++++++++++++++--
>  include/uapi/linux/vfio.h       |  76 ++++++
>  2 files changed, 504 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5c212bf29640..2902774062b8 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -30,6 +30,7 @@
>  #include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/ptrace.h>
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> @@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
>  
>  struct vfio_iommu {
>  	struct list_head	domain_list;
> +	struct list_head	mm_list;
>  	struct vfio_domain	*external_domain; /* domain for external user */
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
> @@ -90,6 +92,14 @@ struct vfio_dma {
>  struct vfio_group {
>  	struct iommu_group	*iommu_group;
>  	struct list_head	next;
> +	bool			sva_enabled;
> +};
> +
> +struct vfio_mm {
> +#define VFIO_PASID_INVALID	(-1)
> +	int			pasid;
> +	struct mm_struct	*mm;
> +	struct list_head	next;
>  };
>  
>  /*
> @@ -1238,6 +1248,164 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  	return 0;
>  }
>  
> +static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
> +{
> +	struct vfio_mm *vfio_mm;
> +	struct vfio_iommu *iommu = data;
> +
> +	mutex_lock(&iommu->lock);
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->pasid == pasid) {
> +			list_del(&vfio_mm->next);
> +			kfree(vfio_mm);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&iommu->lock);
> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_sva_init(struct device *dev, void *data)
> +{
> +	return iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0,
> +				     vfio_iommu_mm_exit);
> +}
> +
> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
> +{
> +	iommu_sva_device_shutdown(dev);
> +
> +	return 0;
> +}
> +
> +struct vfio_iommu_sva_bind_data {
> +	struct vfio_mm *vfio_mm;
> +	struct vfio_iommu *iommu;
> +	int count;
> +};
> +
> +static int vfio_iommu_sva_bind_dev(struct device *dev, void *data)
> +{
> +	struct vfio_iommu_sva_bind_data *bind_data = data;
> +
> +	/* Multi-device groups aren't supported for SVA */
> +	if (bind_data->count++)
> +		return -EINVAL;
> +
> +	return __iommu_sva_bind_device(dev, bind_data->vfio_mm->mm,
> +				       &bind_data->vfio_mm->pasid,
> +				       IOMMU_SVA_FEAT_IOPF, bind_data->iommu);
> +}
> +
> +static int vfio_iommu_sva_unbind_dev(struct device *dev, void *data)
> +{
> +	struct vfio_mm *vfio_mm = data;
> +
> +	return __iommu_sva_unbind_device(dev, vfio_mm->pasid);
> +}
> +
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	bool enabled_sva = false;
> +	struct vfio_iommu_sva_bind_data data = {
> +		.vfio_mm	= vfio_mm,
> +		.iommu		= iommu,
> +		.count		= 0,
> +	};
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);
> +		if (ret)
> +			return ret;
> +
> +		group->sva_enabled = enabled_sva = true;
> +	}
> +
> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
> +				       vfio_iommu_sva_bind_dev);
> +	if (ret && data.count > 1)

Are we safe to run this even if data.count == 1?  I assume that at
that point we would always not have an iommu domain associated with the
device, so the initial check would error out.

It would just be nice to get rid of the special casing in this error path,
as then we could do it all under if (ret) and make it visually clearer that
these are different aspects of the same error path.
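
i.e. something like this (untested; assumes vfio_iommu_sva_unbind_dev()
fails harmlessly on a device that never completed a bind, which the
data.count check in vfio_iommu_sva_bind_dev() should guarantee):

	if (ret) {
		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
					 vfio_iommu_sva_unbind_dev);
		if (enabled_sva) {
			iommu_group_for_each_dev(group->iommu_group, NULL,
						 vfio_iommu_sva_shutdown);
			group->sva_enabled = false;
		}
	}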


> +		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
> +					 vfio_iommu_sva_unbind_dev);
> +	if (ret && enabled_sva) {
> +		iommu_group_for_each_dev(group->iommu_group, NULL,
> +					 vfio_iommu_sva_shutdown);
> +		group->sva_enabled = false;
> +	}
> +
> +	return ret;
> +}
> +
> +static void vfio_iommu_unbind_group(struct vfio_group *group,
> +				    struct vfio_mm *vfio_mm)
> +{
> +	iommu_group_for_each_dev(group->iommu_group, vfio_mm,
> +				 vfio_iommu_sva_unbind_dev);
> +}
> +
> +static void vfio_iommu_unbind(struct vfio_iommu *iommu,
> +			      struct vfio_mm *vfio_mm)
> +{
> +	struct vfio_group *group;
> +	struct vfio_domain *domain;
> +
> +	list_for_each_entry(domain, &iommu->domain_list, next)
> +		list_for_each_entry(group, &domain->group_list, next)
> +			vfio_iommu_unbind_group(group, vfio_mm);
> +}
> +
> +static int vfio_iommu_replay_bind(struct vfio_iommu *iommu,
> +				  struct vfio_group *group)
> +{
> +	int ret = 0;
> +	struct vfio_mm *vfio_mm;
> +
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		/*
> +		 * Ensure that mm doesn't exit while we're binding it to the new
> +		 * group. It may already have exited, and the mm_exit notifier
> +		 * might be waiting on the IOMMU mutex to remove this vfio_mm
> +		 * from the list.
> +		 */
> +		if (!mmget_not_zero(vfio_mm->mm))
> +			continue;
> +		ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
> +		/*
> +		 * Use async to avoid triggering an mm_exit callback right away,
> +		 * which would block on the mutex that we're holding.
> +		 */
> +		mmput_async(vfio_mm->mm);
> +
> +		if (ret)
> +			goto out_unbind;
> +	}
> +
> +	return 0;
> +
> +out_unbind:
> +	list_for_each_entry_continue_reverse(vfio_mm, &iommu->mm_list, next)
> +		vfio_iommu_unbind_group(group, vfio_mm);
> +
> +	return ret;
> +}
> +
> +static void vfio_iommu_free_all_mm(struct vfio_iommu *iommu)
> +{
> +	struct vfio_mm *vfio_mm, *next;
> +
> +	/*
> +	 * No need for unbind() here. Since all groups are detached from this
> +	 * iommu, bonds have been removed.
> +	 */
> +	list_for_each_entry_safe(vfio_mm, next, &iommu->mm_list, next)
> +		kfree(vfio_mm);
> +	INIT_LIST_HEAD(&iommu->mm_list);
> +}
> +
>  /*
>   * We change our unmap behavior slightly depending on whether the IOMMU
>   * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
> @@ -1313,6 +1481,44 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
>  	return ret;
>  }
>  
> +static int vfio_iommu_try_attach_group(struct vfio_iommu *iommu,
> +				       struct vfio_group *group,
> +				       struct vfio_domain *cur_domain,
> +				       struct vfio_domain *new_domain)
> +{
> +	struct iommu_group *iommu_group = group->iommu_group;
> +
> +	/*
> +	 * Try to match an existing compatible domain.  We don't want to
> +	 * preclude an IOMMU driver supporting multiple bus_types and being
> +	 * able to include different bus_types in the same IOMMU domain, so
> +	 * we test whether the domains use the same iommu_ops rather than
> +	 * testing if they're on the same bus_type.
> +	 */
> +	if (new_domain->domain->ops != cur_domain->domain->ops ||
> +	    new_domain->prot != cur_domain->prot)
> +		return 1;
> +
> +	iommu_detach_group(cur_domain->domain, iommu_group);
> +
> +	if (iommu_attach_group(new_domain->domain, iommu_group))
> +		goto out_reattach;
> +
> +	if (vfio_iommu_replay_bind(iommu, group))
> +		goto out_detach;
> +
> +	return 0;
> +
> +out_detach:
> +	iommu_detach_group(new_domain->domain, iommu_group);
> +
> +out_reattach:
> +	if (iommu_attach_group(cur_domain->domain, iommu_group))
> +		return -EINVAL;
> +
> +	return 1;
> +}
> +
>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>  					 struct iommu_group *iommu_group)
>  {
> @@ -1410,28 +1616,16 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
>  		domain->prot |= IOMMU_CACHE;
>  
> -	/*
> -	 * Try to match an existing compatible domain.  We don't want to
> -	 * preclude an IOMMU driver supporting multiple bus_types and being
> -	 * able to include different bus_types in the same IOMMU domain, so
> -	 * we test whether the domains use the same iommu_ops rather than
> -	 * testing if they're on the same bus_type.
> -	 */
>  	list_for_each_entry(d, &iommu->domain_list, next) {
> -		if (d->domain->ops == domain->domain->ops &&
> -		    d->prot == domain->prot) {
> -			iommu_detach_group(domain->domain, iommu_group);
> -			if (!iommu_attach_group(d->domain, iommu_group)) {
> -				list_add(&group->next, &d->group_list);
> -				iommu_domain_free(domain->domain);
> -				kfree(domain);
> -				mutex_unlock(&iommu->lock);
> -				return 0;
> -			}
> -
> -			ret = iommu_attach_group(domain->domain, iommu_group);
> -			if (ret)
> -				goto out_domain;
> +		ret = vfio_iommu_try_attach_group(iommu, group, domain, d);
> +		if (ret < 0) {
> +			goto out_domain;
> +		} else if (!ret) {
> +			list_add(&group->next, &d->group_list);
> +			iommu_domain_free(domain->domain);
> +			kfree(domain);
> +			mutex_unlock(&iommu->lock);
> +			return 0;
>  		}
>  	}
>  
> @@ -1442,6 +1636,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (ret)
>  		goto out_detach;
>  
> +	ret = vfio_iommu_replay_bind(iommu, group);
> +	if (ret)
> +		goto out_detach;
> +
>  	if (resv_msi) {
>  		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
>  		if (ret)
> @@ -1547,6 +1745,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  			continue;
>  
>  		iommu_detach_group(domain->domain, iommu_group);
> +		if (group->sva_enabled) {
> +			iommu_group_for_each_dev(iommu_group, NULL,
> +						 vfio_iommu_sva_shutdown);
> +			group->sva_enabled = false;
> +		}
>  		list_del(&group->next);
>  		kfree(group);
>  		/*
> @@ -1562,6 +1765,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  					vfio_iommu_unmap_unpin_all(iommu);
>  				else
>  					vfio_iommu_unmap_unpin_reaccount(iommu);
> +				vfio_iommu_free_all_mm(iommu);
>  			}
>  			iommu_domain_free(domain->domain);
>  			list_del(&domain->next);
> @@ -1596,6 +1800,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	}
>  
>  	INIT_LIST_HEAD(&iommu->domain_list);
> +	INIT_LIST_HEAD(&iommu->mm_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> @@ -1630,6 +1835,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
>  		kfree(iommu->external_domain);
>  	}
>  
> +	vfio_iommu_free_all_mm(iommu);
>  	vfio_iommu_unmap_unpin_all(iommu);
>  
>  	list_for_each_entry_safe(domain, domain_tmp,
> @@ -1658,6 +1864,169 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static struct mm_struct *vfio_iommu_get_mm_by_vpid(pid_t vpid)
> +{
> +	struct mm_struct *mm;
> +	struct task_struct *task;
> +
> +	task = find_get_task_by_vpid(vpid);
> +	if (!task)
> +		return ERR_PTR(-ESRCH);
> +
> +	/* Ensure that current has RW access on the mm */
> +	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
> +	put_task_struct(task);
> +
> +	if (!mm)
> +		return ERR_PTR(-ESRCH);
> +
> +	return mm;
> +}
> +
> +static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
> +					  void __user *arg,
> +					  struct vfio_iommu_type1_bind *bind)
> +{
> +	struct vfio_iommu_type1_bind_process params;
> +	struct vfio_domain *domain;
> +	struct vfio_group *group;
> +	struct vfio_mm *vfio_mm;
> +	struct mm_struct *mm;
> +	unsigned long minsz;
> +	int ret = 0;
> +
> +	minsz = sizeof(*bind) + sizeof(params);
> +	if (bind->argsz < minsz)
> +		return -EINVAL;
> +
> +	arg += sizeof(*bind);
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
> +		return -EINVAL;
> +
> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
> +		if (IS_ERR(mm))
> +			return PTR_ERR(mm);
> +	} else {
> +		mm = get_task_mm(current);
> +		if (!mm)
> +			return -EINVAL;
> +	}
> +
> +	mutex_lock(&iommu->lock);
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->mm == mm) {
> +			params.pasid = vfio_mm->pasid;
> +			ret = copy_to_user(arg, &params, sizeof(params)) ?
> +				-EFAULT : 0;
> +			goto out_unlock;
> +		}
> +	}
> +
> +	vfio_mm = kzalloc(sizeof(*vfio_mm), GFP_KERNEL);
> +	if (!vfio_mm) {
> +		ret = -ENOMEM;
> +		goto out_unlock;
> +	}
> +
> +	vfio_mm->mm = mm;
> +	vfio_mm->pasid = VFIO_PASID_INVALID;
> +
> +	list_for_each_entry(domain, &iommu->domain_list, next) {
> +		list_for_each_entry(group, &domain->group_list, next) {
> +			ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
> +			if (ret)
> +				goto out_unbind;
> +		}
> +	}
> +
> +	list_add(&vfio_mm->next, &iommu->mm_list);
> +
> +	params.pasid = vfio_mm->pasid;
> +	ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_delete;
> +
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +	return 0;
> +
> +out_delete:
> +	list_del(&vfio_mm->next);
> +
> +out_unbind:
> +	/* Undo all binds that already succeeded */
> +	vfio_iommu_unbind(iommu, vfio_mm);
> +	kfree(vfio_mm);
> +
> +out_unlock:
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +	return ret;
> +}
> +
> +static long vfio_iommu_type1_unbind_process(struct vfio_iommu *iommu,
> +					    void __user *arg,
> +					    struct vfio_iommu_type1_bind *bind)
> +{
> +	int ret = -EINVAL;
> +	unsigned long minsz;
> +	struct mm_struct *mm;
> +	struct vfio_mm *vfio_mm;
> +	struct vfio_iommu_type1_bind_process params;
> +
> +	minsz = sizeof(*bind) + sizeof(params);
> +	if (bind->argsz < minsz)
> +		return -EINVAL;
> +
> +	arg += sizeof(*bind);
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
> +		return -EINVAL;
> +
> +	/*
> +	 * We can't simply call unbind with the PASID, because the process might
> +	 * have died and the PASID might have been reallocated to another
> +	 * process. Instead we need to fetch that process mm by PID again to
> +	 * make sure we remove the right vfio_mm.
> +	 */
> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
> +		if (IS_ERR(mm))
> +			return PTR_ERR(mm);
> +	} else {
> +		mm = get_task_mm(current);
> +		if (!mm)
> +			return -EINVAL;
> +	}
> +
> +	ret = -ESRCH;
> +	mutex_lock(&iommu->lock);
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->mm == mm) {
> +			vfio_iommu_unbind(iommu, vfio_mm);
> +			list_del(&vfio_mm->next);
> +			kfree(vfio_mm);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1728,6 +2097,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +
> +	} else if (cmd == VFIO_IOMMU_BIND) {
> +		struct vfio_iommu_type1_bind bind;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
> +
> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (bind.argsz < minsz)
> +			return -EINVAL;
> +
> +		switch (bind.flags) {
> +		case VFIO_IOMMU_BIND_PROCESS:
> +			return vfio_iommu_type1_bind_process(iommu, (void *)arg,
> +							     &bind);

Can we combine these two blocks, given that only this case statement is different?
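
Something like this, perhaps (untested sketch, just to illustrate, reusing the
names from this patch):

	} else if (cmd == VFIO_IOMMU_BIND || cmd == VFIO_IOMMU_UNBIND) {
		struct vfio_iommu_type1_bind bind;

		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);

		if (copy_from_user(&bind, (void __user *)arg, minsz))
			return -EFAULT;

		if (bind.argsz < minsz)
			return -EINVAL;

		switch (bind.flags) {
		case VFIO_IOMMU_BIND_PROCESS:
			if (cmd == VFIO_IOMMU_BIND)
				return vfio_iommu_type1_bind_process(iommu,
						(void *)arg, &bind);
			return vfio_iommu_type1_unbind_process(iommu,
					(void *)arg, &bind);
		default:
			return -EINVAL;
		}
	}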

> +		default:
> +			return -EINVAL;
> +		}
> +
> +	} else if (cmd == VFIO_IOMMU_UNBIND) {
> +		struct vfio_iommu_type1_bind bind;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
> +
> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (bind.argsz < minsz)
> +			return -EINVAL;
> +
> +		switch (bind.flags) {
> +		case VFIO_IOMMU_BIND_PROCESS:
> +			return vfio_iommu_type1_unbind_process(iommu, (void *)arg,
> +							       &bind);
> +		default:
> +			return -EINVAL;
> +		}
>  	}
>  
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 1aa7b82e8169..dc07752c8fe8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -665,6 +665,82 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/*
> + * VFIO_IOMMU_BIND_PROCESS
> + *
> + * Allocate a PASID for a process address space, and use it to attach this
> + * process to all devices in the container. Devices can then tag their DMA
> + * traffic with the returned @pasid to perform transactions on the associated
> + * virtual address space. Mapping and unmapping buffers is performed by standard
> + * functions such as mmap and malloc.
> + *
> + * If flag is VFIO_IOMMU_BIND_PID, @pid contains the pid of a foreign process to
> + * bind. Otherwise the current task is bound. Given that the caller owns the
> + * device, setting this flag grants the caller read and write permissions on the
> + * entire address space of foreign process described by @pid. Therefore,
> + * permission to perform the bind operation on a foreign process is governed by
> + * the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See man ptrace(2)
> + * for more information.
> + *
> + * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
> + * ID is unique to a process and can be used on all devices in the container.
> + *
> + * On fork, the child inherits the device fd and can use the bonds setup by its
> + * parent. Consequently, the child has R/W access on the address spaces bound by
> + * its parent. After an execv, the device fd is closed and the child doesn't
> + * have access to the address space anymore.
> + *
> + * To remove a bond between process and container, VFIO_IOMMU_UNBIND ioctl is
> + * issued with the same parameters. If a pid was specified in VFIO_IOMMU_BIND,
> + * it should also be present for VFIO_IOMMU_UNBIND. Otherwise unbind the current
> + * task from the container.
> + */
> +struct vfio_iommu_type1_bind_process {
> +	__u32	flags;
> +#define VFIO_IOMMU_BIND_PID		(1 << 0)
> +	__u32	pasid;
> +	__s32	pid;
> +};
> +
> +/*
> + * Only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
> + * vfio_iommu_type1_bind_process in data.
> + */
> +struct vfio_iommu_type1_bind {
> +	__u32	argsz;
> +	__u32	flags;
> +#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
> +	__u8	data[];
> +};
> +
> +/*
> + * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_bind)
> + *
> + * Manage address spaces of devices in this container. Initially a TYPE1
> + * container can only have one address space, managed with
> + * VFIO_IOMMU_MAP/UNMAP_DMA.
> + *
> + * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
> + * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
> + * tables, and BIND manages the stage-1 (guest) page tables. Other types of
> + * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
> + * non-PASID traffic and BIND controls PASID traffic. But this depends on the
> + * underlying IOMMU architecture and isn't guaranteed.
> + *
> + * Availability of this feature depends on the device, its bus, the underlying
> + * IOMMU and the CPU architecture.
> + *
> + * returns: 0 on success, -errno on failure.
> + */
> +#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +/*
> + * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_bind)
> + *
> + * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
> + */
> +#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*


* Re: [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices
       [not found]     ` <20180511190641.23008-18-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-17 16:07       ` Jonathan Cameron
       [not found]         ` <20180517170748.00004927-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2018-09-10 15:16       ` Auger Eric
  1 sibling, 1 reply; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-17 16:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:18 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> When removing a mapping from a domain, we need to send an invalidation to
> all devices that might have stored it in their Address Translation Cache
> (ATC). In addition when updating the context descriptor of a live domain,
> we'll need to send invalidations for all devices attached to it.
> 
> Maintain a list of devices in each domain, protected by a spinlock. It is
> updated every time we attach or detach devices to and from domains.
> 
> It needs to be a spinlock because we'll invalidate ATC entries from
> within hardirq-safe contexts, but it may be possible to relax the read
> side with RCU later.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>

Trivial naming suggestion...

> ---
>  drivers/iommu/arm-smmu-v3.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 1d647104bccc..c892f012fb43 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -595,6 +595,11 @@ struct arm_smmu_device {
>  struct arm_smmu_master_data {
>  	struct arm_smmu_device		*smmu;
>  	struct arm_smmu_strtab_ent	ste;
> +
> +	struct arm_smmu_domain		*domain;
> +	struct list_head		list; /* domain->devices */

More meaningful name perhaps to avoid the need for the comment?
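
Maybe something like:

	struct list_head		domain_head;

so the name itself says which list the entry lives on.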

> +
> +	struct device			*dev;
>  };
>  
>  /* SMMU private data for an IOMMU domain */
> @@ -618,6 +623,9 @@ struct arm_smmu_domain {
>  	};
>  
>  	struct iommu_domain		domain;
> +
> +	struct list_head		devices;
> +	spinlock_t			devices_lock;
>  };
>  
>  struct arm_smmu_option_prop {
> @@ -1470,6 +1478,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>  	}
>  
>  	mutex_init(&smmu_domain->init_mutex);
> +	INIT_LIST_HEAD(&smmu_domain->devices);
> +	spin_lock_init(&smmu_domain->devices_lock);
> +
>  	return &smmu_domain->domain;
>  }
>  
> @@ -1685,7 +1696,17 @@ static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
>  
>  static void arm_smmu_detach_dev(struct device *dev)
>  {
> +	unsigned long flags;
>  	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
> +	struct arm_smmu_domain *smmu_domain = master->domain;
> +
> +	if (smmu_domain) {
> +		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +		list_del(&master->list);
> +		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +		master->domain = NULL;
> +	}
>  
>  	master->ste.assigned = false;
>  	arm_smmu_install_ste_for_dev(dev->iommu_fwspec);
> @@ -1694,6 +1715,7 @@ static void arm_smmu_detach_dev(struct device *dev)
>  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>  	int ret = 0;
> +	unsigned long flags;
>  	struct arm_smmu_device *smmu;
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_master_data *master;
> @@ -1729,6 +1751,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	}
>  
>  	ste->assigned = true;
> +	master->domain = smmu_domain;
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_add(&master->list, &smmu_domain->devices);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_BYPASS) {
>  		ste->s1_cfg = NULL;
> @@ -1847,6 +1874,7 @@ static int arm_smmu_add_device(struct device *dev)
>  			return -ENOMEM;
>  
>  		master->smmu = smmu;
> +		master->dev = dev;
>  		fwspec->iommu_priv = master;
>  	}
>  


* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
  2018-05-17 10:02         ` Jean-Philippe Brucker
@ 2018-05-17 17:00           ` Jacob Pan
  0 siblings, 0 replies; 123+ messages in thread
From: Jacob Pan @ 2018-05-17 17:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: xieyisheng1, liubo95, kvm, linux-pci, xuzaibo, jonathan.cameron,
	Will Deacon, okaya, linux-mm, yi.l.liu, ashok.raj, tn, joro,
	robdclark, bharatku, linux-acpi, liudongdong3, rfranz,
	devicetree@vger.kernel.org

On Thu, 17 May 2018 11:02:02 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Hi Jacob,
> 
> Thanks for reviewing this
> 
> On 16/05/18 21:41, Jacob Pan wrote:
>  [...]  
> > seems the min_pasid never gets used. do you really need it?  
> 
> Yes, the SMMU sets it to 1 in patch 28/40, because it needs to reserve
> PASID 0
> 
>  [...]  
> > should it be !features?  
> 
> This checks if the user sets any unsupported bit in features. No
> feature is supported right now, but patch 09/40 adds
> IOMMU_SVA_FEAT_IOPF, and changes this line to "features &
> ~IOMMU_SVA_FEAT_IOPF"
> 
> >> +	mutex_lock(&dev->iommu_param->lock);
> >> +	param = dev->iommu_param->sva_param;
> >> +	dev->iommu_param->sva_param = NULL;
> >> +	mutex_unlock(&dev->iommu_param->lock);
> >> +	if (!param)
> >> +		return -ENODEV;
> >> +
> >> +	if (domain->ops->sva_device_shutdown)
> >> +		domain->ops->sva_device_shutdown(dev, param);  
> > seems a little mismatch here, do you need pass the param. I don't
> > think there is anything else model specific iommu driver need to do
> > for the param.  
> 
> SMMU doesn't use it, but maybe it would remind other IOMMU driver
> which features were enabled, so they don't have to keep track of that
> themselves? I can remove it if it isn't useful
> 
If there is a use case, I guess the IOMMU driver can always retrieve the
param from the struct device.
> Thanks,
> Jean

[Jacob Pan]


* Re: [PATCH v2 07/40] iommu: Add a page fault handler
       [not found]     ` <20180511190641.23008-8-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-17 15:25       ` Jonathan Cameron
@ 2018-05-18 18:04       ` Jacob Pan
  2018-05-21 14:49         ` Jean-Philippe Brucker
  1 sibling, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-05-18 18:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Fri, 11 May 2018 20:06:08 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Some systems allow devices to handle I/O Page Faults in the core mm.
> For example systems implementing the PCI PRI extension or Arm SMMU
> stall model. Infrastructure for reporting these recoverable page
> faults was recently added to the IOMMU core for SVA virtualisation.
> Add a page fault handler for host SVA.
> 
> IOMMU driver can now instantiate several fault workqueues and link
> them to IOPF-capable devices. Drivers can choose between a single
> global workqueue, one per IOMMU device, one per low-level fault
> queue, one per domain, etc.
> 
> When it receives a fault event, supposedly in an IRQ handler, the
> IOMMU driver reports the fault using iommu_report_device_fault(),
> which calls the registered handler. The page fault handler then calls
> the mm fault handler, and reports either success or failure with
> iommu_page_response(). When the handler succeeded, the IOMMU retries
> the access.
> 
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to
> care about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2: multiple queues registered by IOMMU driver
> ---
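
Just to check my understanding, the expected usage from an IOMMU driver is
roughly the following (sketch only; my_iopf_flush and my_cookie are made-up
names, and error handling is omitted):

	static int my_iopf_flush(void *cookie, struct device *dev)
	{
		/* Commit pending low-level (e.g. PRI) faults into the IOPF
		 * queue with iommu_report_device_fault(), then return. */
		return 0;
	}

	/* At probe time, allocate one queue, for example per IOMMU device */
	queue = iopf_queue_alloc("myiommu", my_iopf_flush, my_cookie);

	/* When enabling IOPF for a device */
	iopf_queue_add_device(queue, dev);
	iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
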
>  drivers/iommu/Kconfig      |   4 +
>  drivers/iommu/Makefile     |   1 +
>  drivers/iommu/io-pgfault.c | 363 +++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h      |  58 ++++++
>  4 files changed, 426 insertions(+)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 38434899e283..09f13a7c4b60 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -79,6 +79,10 @@ config IOMMU_SVA
>  	select IOMMU_API
>  	select MMU_NOTIFIER
>  
> +config IOMMU_PAGE_FAULT
> +	bool
> +	select IOMMU_API
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1dbcc89ebe4c..4b744e399a1b 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> +obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> new file mode 100644
> index 000000000000..321c00dd3a3d
> --- /dev/null
> +++ b/drivers/iommu/io-pgfault.c
> @@ -0,0 +1,363 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Handle device page faults
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +/**
> + * struct iopf_queue - IO Page Fault queue
> + * @wq: the fault workqueue
> + * @flush: low-level flush callback
> + * @flush_arg: flush() argument
> + * @refs: references to this structure taken by producers
> + */
> +struct iopf_queue {
> +	struct workqueue_struct		*wq;
> +	iopf_queue_flush_t		flush;
> +	void				*flush_arg;
> +	refcount_t			refs;
> +};
> +
> +/**
> + * struct iopf_device_param - IO Page Fault data attached to a device
> + * @queue: IOPF queue
> + * @partial: faults that are part of a Page Request Group for which
> the last
> + *           request hasn't been submitted yet.
> + */
> +struct iopf_device_param {
> +	struct iopf_queue		*queue;
> +	struct list_head		partial;
> +};
> +
> +struct iopf_context {
> +	struct device			*dev;
> +	struct iommu_fault_event	evt;
> +	struct list_head		head;
> +};
> +
> +struct iopf_group {
> +	struct iopf_context		last_fault;
> +	struct list_head		faults;
> +	struct work_struct		work;
> +};
> +
All the page requests in a group should belong to the same device, so
perhaps the struct device tracking should be in iopf_group instead of
iopf_context?
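
I.e. something like (sketch):

	struct iopf_group {
		struct device			*dev;
		struct iopf_context		last_fault;
		struct list_head		faults;
		struct work_struct		work;
	};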

> +static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
> +			 enum page_response_code status)
> +{
> +	struct page_response_msg resp = {
> +		.addr			= evt->addr,
> +		.pasid			= evt->pasid,
> +		.pasid_present		= evt->pasid_valid,
> +		.page_req_group_id	= evt->page_req_group_id,
> +		.private_data		= evt->iommu_private,
> +		.resp_code		= status,
> +	};
> +
> +	return iommu_page_response(dev, &resp);
> +}
> +
> +static enum page_response_code
> +iopf_handle_single(struct iopf_context *fault)
> +{
> +	/* TODO */
> +	return -ENODEV;
> +}
> +
> +static void iopf_handle_group(struct work_struct *work)
> +{
> +	struct iopf_group *group;
> +	struct iopf_context *fault, *next;
> +	enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
> +
> +	group = container_of(work, struct iopf_group, work);
> +
> +	list_for_each_entry_safe(fault, next, &group->faults, head) {
> +		struct iommu_fault_event *evt = &fault->evt;
> +		/*
> +		 * Errors are sticky: don't handle subsequent faults in the
> +		 * group if there is an error.
> +		 */
There might be a performance benefit for certain devices in continuing in
spite of an error, in that the ATS retry may fault less. Perhaps a policy
decision for a later enhancement.
> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
> +			status = iopf_handle_single(fault);
> +
> +		if (!evt->last_req)
> +			kfree(fault);
> +	}
> +
> +	iopf_complete(group->last_fault.dev, &group->last_fault.evt, status);
> +	kfree(group);
> +}
> +
> +/**
> + * iommu_queue_iopf - IO Page Fault handler
> + * @evt: fault event
> + * @cookie: struct device, passed to iommu_register_device_fault_handler.
> + *
> + * Add a fault to the device workqueue, to be handled by mm.
> + */
> +int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
> +{
> +	struct iopf_group *group;
> +	struct iopf_context *fault, *next;
> +	struct iopf_device_param *iopf_param;
> +
> +	struct device *dev = cookie;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (WARN_ON(!mutex_is_locked(&param->lock)))
> +		return -EINVAL;
> +
> +	if (evt->type != IOMMU_FAULT_PAGE_REQ)
> +		/* Not a recoverable page fault */
> +		return IOMMU_PAGE_RESP_CONTINUE;
> +
> +	/*
> +	 * As long as we're holding param->lock, the queue can't be unlinked
> +	 * from the device and therefore cannot disappear.
> +	 */
> +	iopf_param = param->iopf_param;
> +	if (!iopf_param)
> +		return -ENODEV;
> +
> +	if (!evt->last_req) {
> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> +		if (!fault)
> +			return -ENOMEM;
> +
> +		fault->evt = *evt;
> +		fault->dev = dev;
> +
> +		/* Non-last request of a group. Postpone until the last one */
> +		list_add(&fault->head, &iopf_param->partial);
> +
> +		return IOMMU_PAGE_RESP_HANDLED;
> +	}
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return -ENOMEM;
> +
> +	group->last_fault.evt = *evt;
> +	group->last_fault.dev = dev;
> +	INIT_LIST_HEAD(&group->faults);
> +	list_add(&group->last_fault.head, &group->faults);
> +	INIT_WORK(&group->work, iopf_handle_group);
> +
> +	/* See if we have partial faults for this group */
> +	list_for_each_entry_safe(fault, next, &iopf_param->partial, head) {
> +		if (fault->evt.page_req_group_id == evt->page_req_group_id)
If all 9 bits of the group index are used, there can be lots of PRGs. As a
future enhancement, maybe we can have a per-group partial list ready to go
when the LPIG comes in? It would mean less searching.
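
For instance (hypothetical sketch, not part of this patch):

	struct iopf_prg {
		u32			index;		/* PRG index */
		struct list_head	faults;		/* partial faults */
		struct list_head	head;		/* in iopf_param->prgs */
	};

allocated on the first fault of a group, so the LPIG only has to look up its
PRG instead of walking all partial faults.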
> +			/* Insert *before* the last fault */
> +			list_move(&fault->head, &group->faults);
> +	}
> +
If you have already sorted the group list with the last fault at the end, why
do you need a separate entry to track the last fault?
> +	queue_work(iopf_param->queue->wq, &group->work);
> +
> +	/* Postpone the fault completion */
> +	return IOMMU_PAGE_RESP_HANDLED;
> +}
> +EXPORT_SYMBOL_GPL(iommu_queue_iopf);
> +
> +/**
> + * iopf_queue_flush_dev - Ensure that all queued faults have been processed
> + * @dev: the endpoint whose faults need to be flushed.
> + *
> + * Users must call this function when releasing a PASID, to ensure that all
> + * pending faults for this PASID have been handled, and won't hit the address
> + * space of the next process that uses this PASID.
> + *
> + * Return 0 on success.
> + */
> +int iopf_queue_flush_dev(struct device *dev)
> +{
> +	int ret = 0;
> +	struct iopf_queue *queue;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (!param)
> +		return -ENODEV;
> +
> +	/*
> +	 * It is incredibly easy to find ourselves in a deadlock situation if
> +	 * we're not careful, because we're taking the opposite path as
> +	 * iommu_queue_iopf:
> +	 *
> +	 *   iopf_queue_flush_dev()   |  PRI queue handler
> +	 *    lock(mutex)             |   iommu_queue_iopf()
> +	 *     queue->flush()         |    lock(mutex)
> +	 *      wait PRI queue empty  |
> +	 *
> +	 * So we can't hold the device param lock while flushing. We don't have
> +	 * to, because the queue or the device won't disappear until all
> +	 * flushes are finished.
> +	 */
> +	mutex_lock(&param->lock);
> +	if (param->iopf_param) {
> +		queue = param->iopf_param->queue;
> +	} else {
> +		ret = -ENODEV;
> +	}
> +	mutex_unlock(&param->lock);
> +	if (ret)
> +		return ret;
> +
> +	queue->flush(queue->flush_arg, dev);
> +
> +	/*
> +	 * No need to clear the partial list. All PRGs containing the PASID that
> +	 * needs to be decommissioned are whole (the device driver made sure of
> +	 * it before this function was called). They have been submitted to the
> +	 * queue by the above flush().
> +	 */
So you are saying the device driver needs to make sure an LPIG PRQ is
submitted in the flush call above, such that the partial list is cleared?
What if there are device failures where the device needs to reset (either at
the whole function level or the PASID level): should there still be a need to
clear the partial list for all associated PASIDs/groups?
> +	flush_workqueue(queue->wq);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
> +
> +/**
> + * iopf_queue_add_device - Add producer to the fault queue
> + * @queue: IOPF queue
> + * @dev: device to add
> + */
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
> +{
> +	int ret = -EINVAL;
> +	struct iopf_device_param *iopf_param;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (!param)
> +		return -ENODEV;
> +
> +	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
> +	if (!iopf_param)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&iopf_param->partial);
> +	iopf_param->queue = queue;
> +
> +	mutex_lock(&param->lock);
> +	if (!param->iopf_param) {
> +		refcount_inc(&queue->refs);
> +		param->iopf_param = iopf_param;
> +		ret = 0;
> +	}
> +	mutex_unlock(&param->lock);
> +
> +	if (ret)
> +		kfree(iopf_param);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_add_device);
> +
> +/**
> + * iopf_queue_remove_device - Remove producer from fault queue
> + * @dev: device to remove
> + *
> + * Caller makes sure that no more fault is reported for this device, and no more
> + * flush is scheduled for this device.
> + *
> + * Note: safe to call unconditionally on a cleanup path, even if the device
> + * isn't registered to any IOPF queue.
> + *
> + * Return 0 if the device was attached to the IOPF queue
> + */
> +int iopf_queue_remove_device(struct device *dev)
> +{
> +	struct iopf_context *fault, *next;
> +	struct iopf_device_param *iopf_param;
> +	struct iommu_param *param = dev->iommu_param;
> +
> +	if (!param)
> +		return -EINVAL;
> +
> +	mutex_lock(&param->lock);
> +	iopf_param = param->iopf_param;
> +	if (iopf_param) {
> +		refcount_dec(&iopf_param->queue->refs);
> +		param->iopf_param = NULL;
> +	}
> +	mutex_unlock(&param->lock);
> +	if (!iopf_param)
> +		return -EINVAL;
> +
> +	list_for_each_entry_safe(fault, next, &iopf_param->partial, head)
> +		kfree(fault);
> +
> +	/*
> +	 * No more flush is scheduled, and the caller removed all bonds from
> +	 * this device. unbind() waited until any concurrent mm_exit() finished,
> +	 * therefore there is no flush() running anymore and we can free the
> +	 * param.
> +	 */
> +	kfree(iopf_param);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
> +
> +/**
> + * iopf_queue_alloc - Allocate and initialize a fault queue
> + * @name: a unique string identifying the queue (for workqueue)
> + * @flush: a callback that flushes the low-level queue
> + * @cookie: driver-private data passed to the flush callback
> + *
> + * The callback is called before the workqueue is flushed. The IOMMU driver must
> + * commit all faults that are pending in its low-level queues at the time of the
> + * call, into the IOPF queue (with iommu_report_device_fault). The callback
> + * takes a device pointer as argument, hinting what endpoint is causing the
> + * flush. When the device is NULL, all faults should be committed.
> + */
> +struct iopf_queue *
> +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
> +{
> +	struct iopf_queue *queue;
> +
> +	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> +	if (!queue)
> +		return NULL;
> +
> +	/*
> +	 * The WQ is unordered because the low-level handler enqueues faults by
> +	 * group. PRI requests within a group have to be ordered, but once
> +	 * that's dealt with, the high-level function can handle groups out of
> +	 * order.
> +	 */
> +	queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
> +	if (!queue->wq) {
> +		kfree(queue);
> +		return NULL;
> +	}
> +
> +	queue->flush = flush;
> +	queue->flush_arg = cookie;
> +	refcount_set(&queue->refs, 1);
> +
> +	return queue;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_alloc);
> +
> +/**
> + * iopf_queue_free - Free IOPF queue
> + * @queue: queue to free
> + *
> + * Counterpart to iopf_queue_alloc(). Caller must make sure that all producers
> + * have been removed.
> + */
> +void iopf_queue_free(struct iopf_queue *queue)
> +{
> +
> +	/* Caller should have removed all producers first */
> +	if (WARN_ON(!refcount_dec_and_test(&queue->refs)))
> +		return;
> +
> +	destroy_workqueue(queue->wq);
> +	kfree(queue);
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_free);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index faf3390ce89d..fad3a60e1c14 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -461,11 +461,20 @@ struct iommu_fault_param {
>  	void *data;
>  };
>  
> +/**
> + * iopf_queue_flush_t - Flush low-level page fault queue
> + *
> + * Report all faults currently pending in the low-level page fault queue
> + */
> +struct iopf_queue;
> +typedef int (*iopf_queue_flush_t)(void *cookie, struct device *dev);
> +
>  /**
>   * struct iommu_param - collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
>   * @sva_param: SVA parameters
> + * @iopf_param: I/O Page Fault queue and data
>   *
> + * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>   *	struct iommu_group	*iommu_group;
> @@ -475,6 +484,7 @@ struct iommu_param {
>  	struct mutex lock;
>  	struct iommu_fault_param *fault_param;
>  	struct iommu_sva_param *sva_param;
> +	struct iopf_device_param *iopf_param;
>  };
>  
>  int  iommu_device_register(struct iommu_device *iommu);
> @@ -874,6 +884,12 @@ static inline int iommu_report_device_fault(struct device *dev, struct iommu_fau
>  	return 0;
>  }
>  
> +static inline int iommu_page_response(struct device *dev,
> +				      struct page_response_msg *msg)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline int iommu_group_id(struct iommu_group *group)
>  {
>  	return -ENODEV;
> @@ -1038,4 +1054,46 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
>  }
>  #endif /* CONFIG_IOMMU_SVA */
>  
> +#ifdef CONFIG_IOMMU_PAGE_FAULT
> +extern int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie);
> +
> +extern int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
> +extern int iopf_queue_remove_device(struct device *dev);
> +extern int iopf_queue_flush_dev(struct device *dev);
> +extern struct iopf_queue *
> +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie);
> +extern void iopf_queue_free(struct iopf_queue *queue);
> +#else /* CONFIG_IOMMU_PAGE_FAULT */
> +static inline int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iopf_queue_add_device(struct iopf_queue *queue,
> +					struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iopf_queue_remove_device(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iopf_queue_flush_dev(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline struct iopf_queue *
> +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
> +{
> +	return NULL;
> +}
> +
> +static inline void iopf_queue_free(struct iopf_queue *queue)
> +{
> +}
> +#endif /* CONFIG_IOMMU_PAGE_FAULT */
> +
>  #endif /* __LINUX_IOMMU_H */


* Re: [PATCH v2 35/40] iommu/arm-smmu-v3: Add support for PCI ATS
       [not found]     ` <20180511190641.23008-36-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-19 17:25       ` Sinan Kaya
       [not found]         ` <922474e8-0aa5-e022-0502-f1e51b0d4859-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Sinan Kaya @ 2018-05-19 17:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

On 5/11/2018 3:06 PM, Jean-Philippe Brucker wrote:
> PCIe devices can implement their own TLB, named Address Translation Cache
> (ATC). Enable Address Translation Service (ATS) for devices that support
> it and send them invalidation requests whenever we invalidate the IOTLBs.
> 
>   Range calculation
>   -----------------
> 
> The invalidation packet itself is a bit awkward: range must be naturally
> aligned, which means that the start address is a multiple of the range
> size. In addition, the size must be a power of two number of 4k pages. We
> have a few options to enforce this constraint:
> 
> (1) Find the smallest naturally aligned region that covers the requested
>     range. This is simple to compute and only takes one ATC_INV, but it
>     will spill on lots of neighbouring ATC entries.
> 
> (2) Align the start address to the region size (rounded up to a power of
>     two), and send a second invalidation for the next range of the same
>     size. Still not great, but reduces spilling.
> 
> (3) Cover the range exactly with the smallest number of naturally aligned
>     regions. This would be interesting to implement but as for (2),
>     requires multiple ATC_INV.
> 
> As I suspect ATC invalidation packets will be a very scarce resource, I'll
> go with option (1) for now, and only send one big invalidation. We can
> move to (2), which is both easier to read and more gentle with the ATC,
> once we've observed on real systems that we can send multiple smaller
> Invalidation Requests for roughly the same price as a single big one.
> 
> Note that with io-pgtable, the unmap function is called for each page, so
> this doesn't matter. The problem shows up when sharing page tables with
> the MMU.
> 
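If I read option (1) right, the computation boils down to something like this
(illustrative sketch only, 'inv' being a placeholder for the ATC invalidation
command fields, not necessarily the exact code in this patch):

	/* Smallest naturally aligned region covering [iova, iova + size) */
	unsigned long start = iova >> 12, end = (iova + size - 1) >> 12;
	unsigned long log2_span = fls_long(start ^ end);
	unsigned long span_mask = (1UL << log2_span) - 1;

	inv.addr = (start & ~span_mask) << 12;	/* aligned start address */
	inv.log2_size = log2_span;		/* power-of-two number of 4k pages */
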
>   Timeout
>   -------
> 
> ATC invalidation is allowed to take up to 90 seconds, according to the
> PCIe spec, so it is possible to hit the SMMU command queue timeout during
> normal operations.
> 
> Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
> fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
> Others might let CMD_SYNC complete and have an asynchronous IMPDEF
> mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
> could retry sending all ATC_INV since last successful CMD_SYNC. When a
> CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
> commands since last successful CMD_SYNC.
> 
> We cannot afford to wait 90 seconds in iommu_unmap, let alone MMU
> notifiers. So we'd have to introduce a more clever system if this timeout
> becomes a problem, like keeping hold of mappings and invalidating in the
> background. Implementing safe delayed invalidations is a very complex
> problem and deserves a series of its own. We'll assess whether more work
> is needed to properly handle ATC invalidation timeouts once this code runs
> on real hardware.
> 
>   Misc
>   ----
> 
> I didn't put ATC and TLB invalidations in the same functions for three
> reasons:
> 
> * TLB invalidation by range is batched and committed with a single sync.
>   Batching ATC invalidation is inconvenient, endpoints limit the number of
>   inflight invalidations. We'd have to count the number of invalidations
>   queued and send a sync periodically. In addition, I suspect we always
>   need a sync between TLB and ATC invalidation for the same page.
> 
> * Doing ATC invalidation outside tlb_inv_range also allows to send less
>   requests, since TLB invalidations are done per page or block, while ATC
>   invalidations target IOVA ranges.
> 
> * TLB invalidation by context is performed when freeing the domain, at
>   which point there isn't any device attached anymore.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>


Nothing specific about this patch, just a general observation. Last time I
looked at the code, it seemed to require both ATS and PRI support from the
hardware.

I think you can assume that for the ATS 1.1 specification, but the ATS 1.0
specification allows a system to have ATS+PASID without PRI.

QDF2400 is ATS 1.0 compatible as an example. 

Is this an assumption / my misinterpretation?


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices
       [not found]         ` <20180517141058.00001c76-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-21 14:43           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:43 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/05/18 14:10, Jonathan Cameron wrote:
> On Fri, 11 May 2018 20:06:03 +0100
> Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> 
>> Add bind() and unbind() operations to the IOMMU API. Bind() returns a
>> PASID that drivers can program in hardware, to let their devices access an
>> mm. This patch only adds skeletons for the device driver API, most of the
>> implementation is still missing.
>>
>> IOMMU groups with more than one device aren't supported for SVA at the
>> moment. There may be P2P traffic between devices within a group, which
>> cannot be seen by an IOMMU (note that supporting PASID doesn't add any
>> form of isolation with regard to P2P). Supporting groups would require
>> calling bind() for all bound processes every time a device is added to a
>> group, to perform sanity checks (e.g. ensure that new devices support
>> PASIDs at least as big as those already allocated in the group).
> 
> Is it worth adding an explicit comment on this reasoning (or a minimal subset
> of it) at the check for the number of devices in the group?
> It's well laid out here, but might not be so obvious if someone is reading
> the code in the future.

Sure, I'll add something
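
Perhaps something along these lines (sketch, reusing the commit message
wording):

	/*
	 * We don't support multiple devices in an IOMMU group for SVA: there
	 * may be P2P traffic between devices within the group that the IOMMU
	 * can't see, and adding a device to the group would require calling
	 * bind() again for all bound processes to perform sanity checks
	 * (e.g. that the new device supports PASIDs at least as big as those
	 * already allocated in the group).
	 */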

Thanks,
Jean


* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]         ` <20180517150516.000067ca-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-21 14:44           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/05/18 15:25, Jonathan Cameron wrote:
>> +static struct io_mm *
>> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
>> +	    struct mm_struct *mm, unsigned long flags)
>> +{
>> +	int ret;
>> +	int pasid;
>> +	struct io_mm *io_mm;
>> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
>> +
>> +	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	io_mm = domain->ops->mm_alloc(domain, mm, flags);
>> +	if (IS_ERR(io_mm))
>> +		return io_mm;
>> +	if (!io_mm)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	/*
>> +	 * The mm must not be freed until after the driver frees the io_mm
>> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
>> +	 * valid mm struct.)
>> +	 */
>> +	mmgrab(mm);
>> +
>> +	io_mm->flags		= flags;
>> +	io_mm->mm		= mm;
>> +	io_mm->release		= domain->ops->mm_free;
>> +	INIT_LIST_HEAD(&io_mm->devices);
>> +
>> +	idr_preload(GFP_KERNEL);
>> +	spin_lock(&iommu_sva_lock);
>> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
>> +			  param->max_pasid + 1, GFP_ATOMIC);
> 
> I'd expect the IDR cleanup to be in io_mm_free as that would 'match'
> against io_mm_alloc but it's in io_mm_release just before the io_mm_free
> call, perhaps move it or am I missing something?
> 
> Hmm. This is reworked in patch 5 to use call_rcu to do the free.  I suppose
> we may be burning an idr entry if we take a while to get round to the
> free...  If so a comment to explain this would be great.

Ok, I'll see if I can come up with some comments for both patch 3 and 5.

>> +	io_mm->pasid = pasid;
>> +	spin_unlock(&iommu_sva_lock);
>> +	idr_preload_end();
>> +
>> +	if (pasid < 0) {
>> +		ret = pasid;
>> +		goto err_free_mm;
>> +	}
>> +
>> +	/* TODO: keep track of mm. For the moment, abort. */
> 
> From later patches, I can now see why we didn't init the kref
> here, but perhaps a comment would make that clear rather than
> people checking it is correctly used throughout?  Actually just grab
> the comment from patch 5 and put it in this one and that will
> do the job nicely.

Ok

>> +	ret = -ENOSYS;
>> +	spin_lock(&iommu_sva_lock);
>> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
>> +	spin_unlock(&iommu_sva_lock);
>> +
>> +err_free_mm:
>> +	domain->ops->mm_free(io_mm);
> 
> Really minor, but you now have io_mm->release set so to keep
> this obviously the same as the io_mm_free path, perhaps call
> that rather than mm_free directly.

Yes, makes sense
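
i.e. the error path would become (sketch):

err_free_mm:
	io_mm->release(io_mm);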

>> +static void io_mm_detach_locked(struct iommu_bond *bond)
>> +{
>> +	struct iommu_bond *tmp;
>> +	bool detach_domain = true;
>> +	struct iommu_domain *domain = bond->domain;
>> +
>> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
>> +		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
>> +			detach_domain = false;
>> +			break;
>> +		}
>> +	}
>> +
>> +	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
>> +
> 
> I can't see an immediate reason to have a different order in her to the reverse of
> the attach above.   So I think you should be detaching after the list_del calls.
> If there is a reason, can we have a comment so I don't ask on v10.

I don't see a reason either right now, I'll see if it can be moved
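
Perhaps something like (sketch):

	list_del(&bond->mm_head);
	list_del(&bond->domain_head);
	list_del(&bond->dev_head);

	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);

	io_mm_put_locked(bond->io_mm);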

> 
>> +	list_del(&bond->mm_head);
>> +	list_del(&bond->domain_head);
>> +	list_del(&bond->dev_head);
>> +	io_mm_put_locked(bond->io_mm);


>> +	/* If an io_mm already exists, use it */
>> +	spin_lock(&iommu_sva_lock);
>> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
>> +		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
>> +			/* ... Unless it's already bound to this device */
>> +			list_for_each_entry(tmp, &io_mm->devices, mm_head) {
>> +				if (tmp->dev == dev) {
>> +					bond = tmp;
> 
> Using bond for this is clear in a sense, but can we not just use ret
> so it is obvious here that we are going to return -EEXIST?
> At first glance I thought you were going to carry on with this bond
> and couldn't work out why it would ever make sense to have two bonds
> between a device an an io_mm (which it doesn't!)

Yes, using ret is nicer
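
Something like this (sketch):

	list_for_each_entry(tmp, &io_mm->devices, mm_head) {
		if (tmp->dev == dev) {
			ret = -EEXIST;
			break;
		}
	}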

Thanks,
Jean


* Re: [PATCH v2 05/40] iommu/sva: Track mm changes with an MMU notifier
       [not found]         ` <20180517152514.00004247-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-21 14:44           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/05/18 15:25, Jonathan Cameron wrote:
>> +		 * already have been removed from the list. Check is someone is
> 
> Check if someone...

Ok

Thanks,
Jean


* Re: [PATCH v2 07/40] iommu: Add a page fault handler
       [not found]         ` <20180517162555.00002bd3-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-21 14:48           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/05/18 16:25, Jonathan Cameron wrote:
> On Fri, 11 May 2018 20:06:08 +0100
> Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> 
>> Some systems allow devices to handle I/O Page Faults in the core mm. For
>> example systems implementing the PCI PRI extension or Arm SMMU stall
>> model. Infrastructure for reporting these recoverable page faults was
>> recently added to the IOMMU core for SVA virtualisation. Add a page fault
>> handler for host SVA.
>>
>> IOMMU driver can now instantiate several fault workqueues and link them to
>> IOPF-capable devices. Drivers can choose between a single global
>> workqueue, one per IOMMU device, one per low-level fault queue, one per
>> domain, etc.
>>
>> When it receives a fault event, supposedly in an IRQ handler, the IOMMU
>> driver reports the fault using iommu_report_device_fault(), which calls
>> the registered handler. The page fault handler then calls the mm fault
>> handler, and reports either success or failure with iommu_page_response().
>> When the handler succeeded, the IOMMU retries the access.
>>
>> The iopf_param pointer could be embedded into iommu_fault_param. But
>> putting iopf_param into the iommu_param structure allows us not to care
>> about ordering between calls to iopf_queue_add_device() and
>> iommu_register_device_fault_handler().
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> Hi Jean-Phillipe,
> 
> One question below on how we would end up with partial faults left when
> doing the queue remove. Code looks fine, but I'm not seeing how that
> would happen without buggy hardware... + we presumably have to rely on
> the hardware timing out on that request or it's dead anyway...

>> +/**
>> + * iopf_queue_remove_device - Remove producer from fault queue
>> + * @dev: device to remove
>> + *
>> + * Caller makes sure that no more fault is reported for this device, and no more
>> + * flush is scheduled for this device.
>> + *
>> + * Note: safe to call unconditionally on a cleanup path, even if the device
>> + * isn't registered to any IOPF queue.
>> + *
>> + * Return 0 if the device was attached to the IOPF queue
>> + */
>> +int iopf_queue_remove_device(struct device *dev)
>> +{
>> +	struct iopf_context *fault, *next;
>> +	struct iopf_device_param *iopf_param;
>> +	struct iommu_param *param = dev->iommu_param;
>> +
>> +	if (!param)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&param->lock);
>> +	iopf_param = param->iopf_param;
>> +	if (iopf_param) {
>> +		refcount_dec(&iopf_param->queue->refs);
>> +		param->iopf_param = NULL;
>> +	}
>> +	mutex_unlock(&param->lock);
>> +	if (!iopf_param)
>> +		return -EINVAL;
>> +
>> +	list_for_each_entry_safe(fault, next, &iopf_param->partial, head)
>> +		kfree(fault);
>> +
> 
> Why would we end up here with partials still in the list?  Buggy hardware?

Buggy hardware is one possibility. There also is the nasty case of PRI
queue overflow. If the PRI queue is full, then the SMMU discards new
Page Requests from the device. It may discard a 'last' PR of a group
that is already in iopf_param->partial. This group will never be freed
until this cleanup.

The spec dismisses PRIq overflows because the OS is supposed to allocate
PRI credits to devices such that they can't send more than the PRI queue
size. This isn't possible in Linux because we don't know exactly how
many PRI-capable devices will share a queue (the upper limit is 2**32,
and devices may be hotplugged well after we allocated the PRI queue). So
PRIq overflow is possible.

Ideally when detecting a PRIq overflow we need to immediately remove all
partial faults of all devices sharing this queue. Since I can't easily
test that case (my device doesn't do PRGs) and it's complicated code, I
left it as TODO in patch 39, and this is our only chance to free
orphaned page requests.

>> +void iopf_queue_free(struct iopf_queue *queue)
>> +{
>> +
> 
> Nitpick : Blank line here doesn't add anything.

Ok

Thanks,
Jean


* Re: [PATCH v2 07/40] iommu: Add a page fault handler
  2018-05-18 18:04       ` Jacob Pan
@ 2018-05-21 14:49         ` Jean-Philippe Brucker
       [not found]           ` <8a640794-a6f3-fa01-82a9-06479a6f779a-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:49 UTC (permalink / raw)
  To: Jacob Pan
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 18/05/18 19:04, Jacob Pan wrote:
>> +struct iopf_context {
>> +	struct device			*dev;
>> +	struct iommu_fault_event	evt;
>> +	struct list_head		head;
>> +};
>> +
>> +struct iopf_group {
>> +	struct iopf_context		last_fault;
>> +	struct list_head		faults;
>> +	struct work_struct		work;
>> +};
>> +
> All the page requests in the group should belong to the same device,
> perhaps struct device tracking should be in iopf_group instead of
> iopf_context?

Right, this is a leftover from when we kept a global list of partial
faults. Since the list is now per-device, I can move the dev pointer (I
think I should also rename iopf_context to iopf_fault, if that's all right).

>> +static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
>> +			 enum page_response_code status)
>> +{
>> +	struct page_response_msg resp = {
>> +		.addr			= evt->addr,
>> +		.pasid			= evt->pasid,
>> +		.pasid_present		= evt->pasid_valid,
>> +		.page_req_group_id	= evt->page_req_group_id,
>> +		.private_data		= evt->iommu_private,
>> +		.resp_code		= status,
>> +	};
>> +
>> +	return iommu_page_response(dev, &resp);
>> +}
>> +
>> +static enum page_response_code
>> +iopf_handle_single(struct iopf_context *fault)
>> +{
>> +	/* TODO */
>> +	return -ENODEV;
>> +}
>> +
>> +static void iopf_handle_group(struct work_struct *work)
>> +{
>> +	struct iopf_group *group;
>> +	struct iopf_context *fault, *next;
>> +	enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
>> +
>> +	group = container_of(work, struct iopf_group, work);
>> +
>> +	list_for_each_entry_safe(fault, next, &group->faults, head) {
>> +		struct iommu_fault_event *evt = &fault->evt;
>> +		/*
>> +		 * Errors are sticky: don't handle subsequent faults in the
>> +		 * group if there is an error.
>> +		 */
> There might be performance benefit for certain device to continue in
> spite of error in that the ATS retry may have less fault. Perhaps a
> policy decision for later enhancement.

Yes, I think we should leave it to future work. My current reasoning is
that subsequent requests are more likely to fail as well, and reporting
the error is more urgent, but we need real workloads to confirm this.

>> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
>> +			status = iopf_handle_single(fault);
>> +
>> +		if (!evt->last_req)
>> +			kfree(fault);
>> +	}
>> +
>> +	iopf_complete(group->last_fault.dev, &group->last_fault.evt,
>> status);
>> +	kfree(group);
>> +}
>> +
>> +/**
>> + * iommu_queue_iopf - IO Page Fault handler
>> + * @evt: fault event
>> + * @cookie: struct device, passed to
>> iommu_register_device_fault_handler.
>> + *
>> + * Add a fault to the device workqueue, to be handled by mm.
>> + */
>> +int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
>> +{
>> +	struct iopf_group *group;
>> +	struct iopf_context *fault, *next;
>> +	struct iopf_device_param *iopf_param;
>> +
>> +	struct device *dev = cookie;
>> +	struct iommu_param *param = dev->iommu_param;
>> +
>> +	if (WARN_ON(!mutex_is_locked(&param->lock)))
>> +		return -EINVAL;
>> +
>> +	if (evt->type != IOMMU_FAULT_PAGE_REQ)
>> +		/* Not a recoverable page fault */
>> +		return IOMMU_PAGE_RESP_CONTINUE;
>> +
>> +	/*
>> +	 * As long as we're holding param->lock, the queue can't be
>> unlinked
>> +	 * from the device and therefore cannot disappear.
>> +	 */
>> +	iopf_param = param->iopf_param;
>> +	if (!iopf_param)
>> +		return -ENODEV;
>> +
>> +	if (!evt->last_req) {
>> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
>> +		if (!fault)
>> +			return -ENOMEM;
>> +
>> +		fault->evt = *evt;
>> +		fault->dev = dev;
>> +
>> +		/* Non-last request of a group. Postpone until the
>> last one */
>> +		list_add(&fault->head, &iopf_param->partial);
>> +
>> +		return IOMMU_PAGE_RESP_HANDLED;
>> +	}
>> +
>> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
>> +	if (!group)
>> +		return -ENOMEM;
>> +
>> +	group->last_fault.evt = *evt;
>> +	group->last_fault.dev = dev;
>> +	INIT_LIST_HEAD(&group->faults);
>> +	list_add(&group->last_fault.head, &group->faults);
>> +	INIT_WORK(&group->work, iopf_handle_group);
>> +
>> +	/* See if we have partial faults for this group */
>> +	list_for_each_entry_safe(fault, next, &iopf_param->partial,
>> head) {
>> +		if (fault->evt.page_req_group_id ==
>> evt->page_req_group_id)
> If all 9 bits of the group index are used, there can be lots of PRGs.
> For future enhancement, maybe we can have a per-group partial list
> ready to go when the LPIG comes in? It will mean less searching.

Yes, allocating the PRG from the start could be more efficient. I think
it's slightly more complicated code, so I'd rather see performance
numbers before implementing it.
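
If we did implement it, I imagine something like this (hypothetical,
names invented):

/* One bucket per PRG, allocated when its first page request arrives */
struct iopf_prg {
	u32			prg_id;		/* 9-bit PRG index */
	struct list_head	faults;		/* partial faults of this PRG */
	struct list_head	head;		/* in a per-device PRG list */
};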

>> +			/* Insert *before* the last fault */
>> +			list_move(&fault->head, &group->faults);
>> +	}
>> +
> If you already sorted the group list with last fault at the end, why do
> you need a separate entry to track the last fault?

Not sure I understand your question, sorry. Do you mean why the
iopf_group.last_fault? Just to avoid one more kzalloc.

>> +
>> +	queue->flush(queue->flush_arg, dev);
>> +
>> +	/*
>> +	 * No need to clear the partial list. All PRGs containing
>> the PASID that
>> +	 * needs to be decommissioned are whole (the device driver
>> made sure of
>> +	 * it before this function was called). They have been
>> submitted to the
>> +	 * queue by the above flush().
>> +	 */
> So you are saying device driver need to make sure LPIG PRQ is submitted
> in the flush call above such that partial list is cleared?

Not exactly, it's the IOMMU driver that makes sure all LPIGs in its
queues are submitted by the above flush call. In more detail, the flow is:

* Either the device driver calls unbind()/sva_device_shutdown(), or the
process exits.
* If the device driver initiated it, then it already told the device to
stop using the PASID. Otherwise we use the mm_exit() callback to tell
the device driver to stop using the PASID.
* In either case, when receiving a stop request from the driver, the
device sends the LPIGs to the IOMMU queue.
* Then, the flush call above ensures that the IOMMU reports the LPIGs
with iommu_report_device_fault.
* While submitting all LPIGs for this PASID to the work queue,
iommu_queue_iopf also picks up all partial faults, so the partial list
is clean.

Maybe I should improve this comment?
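
Something like this, perhaps:

	/*
	 * PASID decommission flow:
	 * - the device driver calls unbind()/sva_device_shutdown(), or
	 *   the process exits and mm_exit() tells the driver to stop
	 *   using the PASID
	 * - on the stop request, the device transmits the LPIGs for any
	 *   outstanding PRGs
	 * - queue->flush() drains the PRI queue, so the IOMMU reports
	 *   those LPIGs with iommu_report_device_fault()
	 * - while submitting them, iommu_queue_iopf() moves the matching
	 *   partial faults into their groups, leaving the partial list
	 *   clean
	 */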

> what if
> there are device failures where device needs to reset (either whole
> function level or PASID level), should there still be a need to clear
> the partial list for all associated PASID/group?

I guess the simplest way out, when resetting the device, is for the
device driver to call iommu_sva_device_shutdown(). Both the IOMMU
driver's sva_device_shutdown() and remove_device() ops should call
iopf_queue_remove_device(), which clears the partial list.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices
       [not found]         ` <20180517170748.00004927-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-21 14:49           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:49 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/05/18 17:07, Jonathan Cameron wrote:
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -595,6 +595,11 @@ struct arm_smmu_device {
>>  struct arm_smmu_master_data {
>>  	struct arm_smmu_device		*smmu;
>>  	struct arm_smmu_strtab_ent	ste;
>> +
>> +	struct arm_smmu_domain		*domain;
>> +	struct list_head		list; /* domain->devices */
> 
> More meaningful name perhaps to avoid the need for the comment?

Sure

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]         ` <20180517165845.00000cc9-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-21 14:51           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:51 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/05/18 16:58, Jonathan Cameron wrote:
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> +				 struct vfio_group *group,
>> +				 struct vfio_mm *vfio_mm)
>> +{
>> +	int ret;
>> +	bool enabled_sva = false;
>> +	struct vfio_iommu_sva_bind_data data = {
>> +		.vfio_mm	= vfio_mm,
>> +		.iommu		= iommu,
>> +		.count		= 0,
>> +	};
>> +
>> +	if (!group->sva_enabled) {
>> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +					       vfio_iommu_sva_init);
>> +		if (ret)
>> +			return ret;
>> +
>> +		group->sva_enabled = enabled_sva = true;
>> +	}
>> +
>> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
>> +				       vfio_iommu_sva_bind_dev);
>> +	if (ret && data.count > 1)
> 
> Are we safe to run this even if data.count == 1?  I assume that at
> that point we would always not have an iommu domain associated with the
> device so the initial check would error out.

If data.count == 1, then the first bind didn't succeed. But it's not
necessarily a missing domain; failure to bind may come from various
places. If this vfio_mm was already bound to another device then it
contains a valid PASID and it's safe to call unbind(). Otherwise we call
it with PASID -1 (VFIO_INVALID_PASID) and that's a bit of a grey area.
-1 is currently invalid everywhere, but in the future someone might
implement 32 bits of PASIDs, in which case a bond between this dev and
PASID -1 might exist. But I think it's safe for now, and whoever
redefines VFIO_INVALID_PASID when such an implementation appears will
also fix this case.
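
Whoever does fix it would probably just add a guard like this
(hypothetical, assuming vfio_mm keeps the PASID):

	/* Skip unbind for a vfio_mm that never got a valid PASID */
	if (vfio_mm->pasid == VFIO_INVALID_PASID)
		return 0;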

> It would just be nice to get rid of the special casing in this error
> path, as then we could do it all under if (ret) and make it visually
> clearer that these are different aspects of the same error path.

[...]
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  				   unsigned int cmd, unsigned long arg)
>>  {
>> @@ -1728,6 +2097,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  
>>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>>  			-EFAULT : 0;
>> +
>> +	} else if (cmd == VFIO_IOMMU_BIND) {
>> +		struct vfio_iommu_type1_bind bind;
>> +
>> +		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
>> +
>> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
>> +			return -EFAULT;
>> +
>> +		if (bind.argsz < minsz)
>> +			return -EINVAL;
>> +
>> +		switch (bind.flags) {
>> +		case VFIO_IOMMU_BIND_PROCESS:
>> +			return vfio_iommu_type1_bind_process(iommu, (void *)arg,
>> +							     &bind);
> 
> Can we combine these two blocks given it is only this case statement that is different?

That would be nicer, though I don't yet know what vSVA (being worked on
by Yi Liu, on Cc) will need, as it will add statements to these switches.
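
For reference, the combined version would look something like this
(untested):

	} else if (cmd == VFIO_IOMMU_BIND || cmd == VFIO_IOMMU_UNBIND) {
		struct vfio_iommu_type1_bind bind;

		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);

		if (copy_from_user(&bind, (void __user *)arg, minsz))
			return -EFAULT;

		if (bind.argsz < minsz)
			return -EINVAL;

		switch (bind.flags) {
		case VFIO_IOMMU_BIND_PROCESS:
			if (cmd == VFIO_IOMMU_BIND)
				return vfio_iommu_type1_bind_process(iommu,
						(void *)arg, &bind);
			return vfio_iommu_type1_unbind_process(iommu,
					(void *)arg, &bind);
		default:
			return -EINVAL;
		}
	}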

Thanks,
Jean

> 
>> +		default:
>> +			return -EINVAL;
>> +		}
>> +
>> +	} else if (cmd == VFIO_IOMMU_UNBIND) {
>> +		struct vfio_iommu_type1_bind bind;
>> +
>> +		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
>> +
>> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
>> +			return -EFAULT;
>> +
>> +		if (bind.argsz < minsz)
>> +			return -EINVAL;
>> +
>> +		switch (bind.flags) {
>> +		case VFIO_IOMMU_BIND_PROCESS:
>> +			return vfio_iommu_type1_unbind_process(iommu, (void *)arg,
>> +							       &bind);
>> +		default:
>> +			return -EINVAL;
>> +		}
>>  	}

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 35/40] iommu/arm-smmu-v3: Add support for PCI ATS
       [not found]         ` <922474e8-0aa5-e022-0502-f1e51b0d4859-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-05-21 14:52           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-21 14:52 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

Hi Sinan,

On 19/05/18 18:25, Sinan Kaya wrote:
> Nothing specific about this patch, just a general observation. Last time I
> looked at the code, it seemed to require both ATS and PRI support from a given
> device.
> 
> I think you can assume that for the ATS 1.1 specification, but the ATS 1.0
> specification allows a system to have ATS+PASID without PRI. 

As far as I know, the latest ATS spec also states that "device that
supports ATS need not support PRI". I'm referring to the version
integrated into PCIe v4.0r1.0, which I think corresponds to ATS 1.1.

> QDF2400 is ATS 1.0 compatible as an example. 
> 
> Is this an assumption / my misinterpretation?

In this series you can enable ATS and PASID without PRI. The SMMU
enables ATS and PASID in add_device() if supported. Then PRI is only
enabled if users request IOMMU_SVA_FEAT_IOPF in sva_init_device(). If
the device driver pins all DMA memory, it can use PASID without PRI.
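
In code, the difference is only the feature bit passed at init time
(sketch; the trailing arguments of iommu_sva_device_init() are elided
here):

	/* Pinned-memory driver: PASID without PRI, no I/O page faults */
	ret = iommu_sva_device_init(dev, 0, ...);

	/* Faulting driver: also requests I/O page faults, enabling PRI */
	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, ...);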

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]           ` <de478769-9f7a-d40b-a55e-e2c63ad883e8-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-22 16:43             ` Jacob Pan
  2018-05-24 11:44               ` Jean-Philippe Brucker
  0 siblings, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-05-22 16:43 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On Thu, 17 May 2018 11:02:42 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> On 17/05/18 00:31, Jacob Pan wrote:
> > On Fri, 11 May 2018 20:06:04 +0100
> > I am a little confused about domain vs. pasid relationship. If
> > each domain represents an address space, should there be a domain for
> > each pasid?  
> 
> I don't think there is a formal definition, but from previous
> discussion the consensus seems to be: domains are a collection of
> devices that have the same virtual address spaces (one or many).
> 
> Keeping that definition makes things easier, in my opinion. Some time
> ago, I did try to represent PASIDs using "subdomains" (introducing a
> hierarchy of struct iommu_domain), but it required invasive changes in
> the IOMMU subsystem and probably all over the tree.
> 
> You do need some kind of "root domain" for each device, so that
> "iommu_get_domain_for_dev()" still makes sense. That root domain
> doesn't have a single address space but a collection of subdomains.
> If you need this anyway, representing a PASID with an iommu_domain
> doesn't seem preferable than using a different structure (io_mm),
> because they don't have anything in common.
> 
My main concern is the PASID table storage. If the PASID table storage
is tied to the domain, it is OK to scale up, i.e. multiple devices in a
domain share a single PASID table. But to scale down, e.g. to further
partition a device with VFIO mdev for assignment, each mdev may get its
own domain via VFIO, and there is no IOMMU storage for a PASID table at
the mdev device level. Perhaps we do need a root domain or some
parent-child relationship to locate the PASID table.

> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 07/40] iommu: Add a page fault handler
       [not found]           ` <8a640794-a6f3-fa01-82a9-06479a6f779a-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-22 23:35             ` Jacob Pan
  2018-05-24 11:44               ` Jean-Philippe Brucker
  0 siblings, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-05-22 23:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 21 May 2018 15:49:40 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> On 18/05/18 19:04, Jacob Pan wrote:
> >> +struct iopf_context {
> >> +	struct device			*dev;
> >> +	struct iommu_fault_event	evt;
> >> +	struct list_head		head;
> >> +};
> >> +
> >> +struct iopf_group {
> >> +	struct iopf_context		last_fault;
> >> +	struct list_head		faults;
> >> +	struct work_struct		work;
> >> +};
> >> +  
> > All the page requests in the group should belong to the same device,
> > perhaps struct device tracking should be in iopf_group instead of
> > iopf_context?  
> 
> Right, this is a leftover from when we kept a global list of partial
> faults. Since the list is now per-device, I can move the dev pointer
> (I think I should also rename iopf_context to iopf_fault, if that's
> all right)
> 
> >> +static int iopf_complete(struct device *dev, struct
> >> iommu_fault_event *evt,
> >> +			 enum page_response_code status)
> >> +{
> >> +	struct page_response_msg resp = {
> >> +		.addr			= evt->addr,
> >> +		.pasid			= evt->pasid,
> >> +		.pasid_present		= evt->pasid_valid,
> >> +		.page_req_group_id	=
> >> evt->page_req_group_id,
> >> +		.private_data		= evt->iommu_private,
> >> +		.resp_code		= status,
> >> +	};
> >> +
> >> +	return iommu_page_response(dev, &resp);
> >> +}
> >> +
> >> +static enum page_response_code
> >> +iopf_handle_single(struct iopf_context *fault)
> >> +{
> >> +	/* TODO */
> >> +	return -ENODEV;
> >> +}
> >> +
> >> +static void iopf_handle_group(struct work_struct *work)
> >> +{
> >> +	struct iopf_group *group;
> >> +	struct iopf_context *fault, *next;
> >> +	enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
> >> +
> >> +	group = container_of(work, struct iopf_group, work);
> >> +
> >> +	list_for_each_entry_safe(fault, next, &group->faults,
> >> head) {
> >> +		struct iommu_fault_event *evt = &fault->evt;
> >> +		/*
> >> +		 * Errors are sticky: don't handle subsequent
> >> faults in the
> >> +		 * group if there is an error.
> >> +		 */  
> > There might be performance benefit for certain device to continue in
> > spite of error in that the ATS retry may have less fault. Perhaps a
> > policy decision for later enhancement.  
> 
> Yes I think we should leave it to future work. My current reasoning is
> that subsequent requests are more likely to fail as well and reporting
> the error is more urgent, but we need real workloads to confirm this.
> 
> >> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
> >> +			status = iopf_handle_single(fault);
> >> +
> >> +		if (!evt->last_req)
> >> +			kfree(fault);
> >> +	}
> >> +
> >> +	iopf_complete(group->last_fault.dev,
> >> &group->last_fault.evt, status);
> >> +	kfree(group);
> >> +}
> >> +
> >> +/**
> >> + * iommu_queue_iopf - IO Page Fault handler
> >> + * @evt: fault event
> >> + * @cookie: struct device, passed to
> >> iommu_register_device_fault_handler.
> >> + *
> >> + * Add a fault to the device workqueue, to be handled by mm.
> >> + */
> >> +int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
> >> +{
> >> +	struct iopf_group *group;
> >> +	struct iopf_context *fault, *next;
> >> +	struct iopf_device_param *iopf_param;
> >> +
> >> +	struct device *dev = cookie;
> >> +	struct iommu_param *param = dev->iommu_param;
> >> +
> >> +	if (WARN_ON(!mutex_is_locked(&param->lock)))
> >> +		return -EINVAL;
> >> +
> >> +	if (evt->type != IOMMU_FAULT_PAGE_REQ)
> >> +		/* Not a recoverable page fault */
> >> +		return IOMMU_PAGE_RESP_CONTINUE;
> >> +
> >> +	/*
> >> +	 * As long as we're holding param->lock, the queue can't
> >> be unlinked
> >> +	 * from the device and therefore cannot disappear.
> >> +	 */
> >> +	iopf_param = param->iopf_param;
> >> +	if (!iopf_param)
> >> +		return -ENODEV;
> >> +
> >> +	if (!evt->last_req) {
> >> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> >> +		if (!fault)
> >> +			return -ENOMEM;
> >> +
> >> +		fault->evt = *evt;
> >> +		fault->dev = dev;
> >> +
> >> +		/* Non-last request of a group. Postpone until the
> >> last one */
> >> +		list_add(&fault->head, &iopf_param->partial);
> >> +
> >> +		return IOMMU_PAGE_RESP_HANDLED;
> >> +	}
> >> +
> >> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> >> +	if (!group)
> >> +		return -ENOMEM;
> >> +
> >> +	group->last_fault.evt = *evt;
> >> +	group->last_fault.dev = dev;
> >> +	INIT_LIST_HEAD(&group->faults);
> >> +	list_add(&group->last_fault.head, &group->faults);
> >> +	INIT_WORK(&group->work, iopf_handle_group);
> >> +
> >> +	/* See if we have partial faults for this group */
> >> +	list_for_each_entry_safe(fault, next,
> >> &iopf_param->partial, head) {
> >> +		if (fault->evt.page_req_group_id ==
> >> evt->page_req_group_id)  
> > If all 9 bits of the group index are used, there can be lots of PRGs.
> > For future enhancement, maybe we can have a per-group partial list
> > ready to go when the LPIG comes in? It will mean less searching.
> 
> Yes, allocating the PRG from the start could be more efficient. I
> think it's slightly more complicated code so I'd rather see
> performance numbers before implementing it
> 
> >> +			/* Insert *before* the last fault */
> >> +			list_move(&fault->head, &group->faults);
> >> +	}
> >> +  
> > If you already sorted the group list with last fault at the end,
> > why do you need a separate entry to track the last fault?  
> 
> Not sure I understand your question, sorry. Do you mean why the
> iopf_group.last_fault? Just to avoid one more kzalloc.
> 
Kind of :) What I thought was: why can't the last_fault naturally be
the last entry in your group fault list? Then there is no need for
special treatment in terms of allocating the last fault. Just my
preference.
> >> +
> >> +	queue->flush(queue->flush_arg, dev);
> >> +
> >> +	/*
> >> +	 * No need to clear the partial list. All PRGs containing
> >> the PASID that
> >> +	 * needs to be decommissioned are whole (the device driver
> >> made sure of
> >> +	 * it before this function was called). They have been
> >> submitted to the
> >> +	 * queue by the above flush().
> >> +	 */  
> > So you are saying device driver need to make sure LPIG PRQ is
> > submitted in the flush call above such that partial list is
> > cleared?  
> 
> Not exactly, it's the IOMMU driver that makes sure all LPIGs in its
> queues are submitted by the above flush call. In more detail, the
> flow is:
> 
> * Either device driver calls unbind()/sva_device_shutdown(), or the
> process exits.
> * If the device driver called, then it already told the device to stop
> using the PASID. Otherwise we use the mm_exit() callback to tell the
> device driver to stop using the PASID.
> * In either case, when receiving a stop request from the driver, the
> device sends the LPIGs to the IOMMU queue.
> * Then, the flush call above ensures that the IOMMU reports the LPIG
> with iommu_report_device_fault.
> * While submitting all LPIGs for this PASID to the work queue,
> iommu_queue_iopf also picks up all partial faults, so the partial
> list is clean.
> 
> Maybe I should improve this comment?
> 
Thanks for explaining. LPIG submission is done by the device
asynchronously w.r.t. the driver stopping/decommissioning the PASID. So
if we were to use this flow on VT-d, which does not stall the page
request queue, should we use the IOMMU-model-specific flush() callback
to ensure the LPIG is received? There could be a queue-full condition
and retry. I am just trying to understand how and where we can make
sure the LPIG is received and all groups are whole.

> > what if
> > there are device failures where device needs to reset (either whole
> > function level or PASID level), should there still be a need to
> > clear the partial list for all associated PASID/group?  
> 
> I guess the simplest way out, when resetting the device, is for the
> device driver to call iommu_sva_device_shutdown(). Both the IOMMU
> driver's sva_device_shutdown() and remove_device() ops should call
> iopf_queue_remove_device(), which clears the partial list.
> 
yes. I think that should work for functional reset.
> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]     ` <20180511190641.23008-14-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-17 15:58       ` Jonathan Cameron
@ 2018-05-23  9:38       ` Xu Zaibo
       [not found]         ` <5B0536A3.1000304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2018-08-27  8:06       ` Xu Zaibo
  2 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-05-23  9:38 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: will.deacon-5wv7dgnIgG8, okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	christian.koenig-5C7GfCeVMHo

Hi,

On 2018/5/12 3:06, Jean-Philippe Brucker wrote:
> Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> Process Address Space ID (PASID). Devices in the container append this
> PASID to DMA transactions in order to access the process' address space.
> The process page tables are shared with the IOMMU, and mechanisms such as
> PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
> bond created with VFIO_IOMMU_BIND_PROCESS.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
>
>
>
>
>
>
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	bool enabled_sva = false;
> +	struct vfio_iommu_sva_bind_data data = {
> +		.vfio_mm	= vfio_mm,
> +		.iommu		= iommu,
> +		.count		= 0,
> +	};
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);
Do we need to do *sva_init here, or do anything to avoid repeated
initialization? If another process has already done the initialization
on this device, I think the current process will get an EEXIST.

Thanks.
> +		if (ret)
> +			return ret;
> +
> +		group->sva_enabled = enabled_sva = true;
> +	}
> +
> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
> +				       vfio_iommu_sva_bind_dev);
> +	if (ret && data.count > 1)
> +		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
> +					 vfio_iommu_sva_unbind_dev);
> +	if (ret && enabled_sva) {
> +		iommu_group_for_each_dev(group->iommu_group, NULL,
> +					 vfio_iommu_sva_shutdown);
> +		group->sva_enabled = false;
> +	}
> +
> +	return ret;
> +}
>
>
>   
> @@ -1442,6 +1636,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>   	if (ret)
>   		goto out_detach;
>   
> +	ret = vfio_iommu_replay_bind(iommu, group);
> +	if (ret)
> +		goto out_detach;
> +
>   	if (resv_msi) {
>   		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
>   		if (ret)
> @@ -1547,6 +1745,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>   			continue;
>   
>   		iommu_detach_group(domain->domain, iommu_group);
> +		if (group->sva_enabled) {
> +			iommu_group_for_each_dev(iommu_group, NULL,
> +						 vfio_iommu_sva_shutdown);
> +			group->sva_enabled = false;
> +		}
Why shut down SVA here? If another process is still working on the
device, couldn't this cause a crash?

Thanks.
>   		list_del(&group->next);
>   		kfree(group);
>   		/*
> @@ -1562,6 +1765,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-22 16:43             ` Jacob Pan
@ 2018-05-24 11:44               ` Jean-Philippe Brucker
  2018-05-24 11:50                 ` Ilias Apalodimas
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-24 11:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	christian.koenig-5C7GfCeVMHo

On 22/05/18 17:43, Jacob Pan wrote:
> On Thu, 17 May 2018 11:02:42 +0100
> Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> 
>> On 17/05/18 00:31, Jacob Pan wrote:
>>> On Fri, 11 May 2018 20:06:04 +0100
>>> I am a little confused about domain vs. pasid relationship. If
>>> each domain represents an address space, should there be a domain for
>>> each pasid?  
>>
>> I don't think there is a formal definition, but from previous
>> discussion the consensus seems to be: domains are a collection of
>> devices that have the same virtual address spaces (one or many).
>>
>> Keeping that definition makes things easier, in my opinion. Some time
>> ago, I did try to represent PASIDs using "subdomains" (introducing a
>> hierarchy of struct iommu_domain), but it required invasive changes in
>> the IOMMU subsystem and probably all over the tree.
>>
>> You do need some kind of "root domain" for each device, so that
>> "iommu_get_domain_for_dev()" still makes sense. That root domain
>> doesn't have a single address space but a collection of subdomains.
>> If you need this anyway, representing a PASID with an iommu_domain
>> doesn't seem preferable than using a different structure (io_mm),
>> because they don't have anything in common.
>>
> My main concern is the PASID table storage. If the PASID table storage
> is tied to the domain, it is OK to scale up, i.e. multiple devices in a
> domain share a single PASID table. But to scale down, e.g. to further
> partition a device with VFIO mdev for assignment, each mdev may get its
> own domain via VFIO, and there is no IOMMU storage for a PASID table at
> the mdev device level. Perhaps we do need a root domain or some
> parent-child relationship to locate the PASID table.

Interesting, I hadn't thought about this use-case before. At first I
thought you were talking about mdev devices assigned to VMs, but I think
you're referring to mdevs assigned to userspace drivers instead? Out of
curiosity, is it only theoretical or does someone actually need this?

I don't think mdev for VM assignment are compatible with PASID, at least
not when the IOMMU is involved. I usually ignore mdev in my reasoning
because, as far as I know, currently they only affect devices that have
their own MMU, and IOMMU domains don't come into play. However, if a
device was backed by the IOMMU, and the device driver wanted to
partition it into mdevs, then users would be tempted to assign mdev1 to
VM1 and mdev2 to VM2.

It doesn't work with PASID, because the PASID spaces of VM1 and VM2 will
conflict. If both VM1 and VM2 allocate PASID #1, then the host has to
somehow arbitrate device accesses, for example scheduling first mdev1
then mdev2. That's possible if the device driver is in charge of the
MMU, but not really compatible with the IOMMU.

So in the IOMMU subsystem, for assigning devices to VMs the only
model that makes sense is SR-IOV, where each VF/mdev has its own RID and
its own PASID table. In that case you'd get one IOMMU domain per VF.


But considering userspace drivers in the host alone, it might make sense
to partition a device into mdevs and assign them to multiple processes.
Interestingly this scheme still doesn't work with the classic MAP/UNMAP
ioctl: since there is a single device context entry for all mdevs, the
mediating driver would have to catch all MAP/UNMAP ioctls and reject
those with IOVAs that overlap those of another mdev. It doesn't seem
viable. But when using PASID then each mdev has its own address space,
and since PASIDs are allocated by the kernel there is no such conflict.

Anyway, I think this use-case can work with the current structures, if
the mediating driver does the bind() instead of VFIO. That's necessary
because you can't let userspace program the PASID into the device, or
it would be able to access address spaces owned by other mdevs. So the
mediating driver does the bind(), and keeps internal structures to
associate the process with the given mdev.
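
Roughly like this (hypothetical mediating driver, assuming the
bind(dev, mm, &pasid, flags, drvdata) entry point from this series):

struct my_mdev {
	struct device	*parent;	/* IOMMU-backed parent device */
	int		pasid;		/* allocated by the kernel */
};

/* Called when a process attaches; userspace never sees raw PASIDs */
static int my_mdev_attach(struct my_mdev *mdev, struct mm_struct *mm)
{
	/* flags and drvdata left empty in this sketch */
	return iommu_sva_bind_device(mdev->parent, mm, &mdev->pasid,
				     0, NULL);
}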

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 07/40] iommu: Add a page fault handler
  2018-05-22 23:35             ` Jacob Pan
@ 2018-05-24 11:44               ` Jean-Philippe Brucker
       [not found]                 ` <bdf9f221-ab97-2168-d072-b7f6a0dba840-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-24 11:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 23/05/18 00:35, Jacob Pan wrote:
>>>> +			/* Insert *before* the last fault */
>>>> +			list_move(&fault->head, &group->faults);
>>>> +	}
>>>> +  
>>> If you already sorted the group list with last fault at the end,
>>> why do you need a separate entry to track the last fault?  
>>
>> Not sure I understand your question, sorry. Do you mean why the
>> iopf_group.last_fault? Just to avoid one more kzalloc.
>>
> Kind of :) What I thought was: why can't the last_fault naturally be
> the last entry in your group fault list? Then there is no need for
> special treatment in terms of allocating the last fault. Just my
> preference.

But we need a kzalloc for the last fault anyway, so I thought I'd just
piggy-back on the group allocation. And if I don't add the last fault at
the end of group->faults, then it's iopf_handle_group that requires
special treatment. I'm still not sure I understood your question though,
could you send me a patch that does it?

>>>> +
>>>> +	queue->flush(queue->flush_arg, dev);
>>>> +
>>>> +	/*
>>>> +	 * No need to clear the partial list. All PRGs containing
>>>> the PASID that
>>>> +	 * needs to be decommissioned are whole (the device driver
>>>> made sure of
>>>> +	 * it before this function was called). They have been
>>>> submitted to the
>>>> +	 * queue by the above flush().
>>>> +	 */  
>>> So you are saying device driver need to make sure LPIG PRQ is
>>> submitted in the flush call above such that partial list is
>>> cleared?  
>>
>> Not exactly, it's the IOMMU driver that makes sure all LPIGs in its
>> queues are submitted by the above flush call. In more detail, the
>> flow is:
>>
>> * Either device driver calls unbind()/sva_device_shutdown(), or the
>> process exits.
>> * If the device driver called, then it already told the device to stop
>> using the PASID. Otherwise we use the mm_exit() callback to tell the
>> device driver to stop using the PASID.
>> * In either case, when receiving a stop request from the driver, the
>> device sends the LPIGs to the IOMMU queue.
>> * Then, the flush call above ensures that the IOMMU reports the LPIG
>> with iommu_report_device_fault.
>> * While submitting all LPIGs for this PASID to the work queue,
>> iommu_queue_iopf also picks up all partial faults, so the partial
>> list is clean.
>>
>> Maybe I should improve this comment?
>>
> Thanks for explaining. LPIG submission is done by the device
> asynchronously w.r.t. the driver stopping/decommissioning the PASID.

Hmm, it should really be synchronous, otherwise there is no way to know
when the PASID can be decommissioned. We need a guarantee such as the
one in 6.20.1 of the PCIe spec, "Managing PASID TLP Prefix Usage":

"When the stop request mechanism indicates completion, the Function has:
* Completed all Non-Posted Requests associated with this PASID.
* Flushed to the host all Posted Requests addressing host memory in all
TCs that were used by the PASID."

That's in combination with "The function shall [...] finish transmitting
any multi-page Page Request Messages for this PASID (i.e. send the Page
Request Message with the L bit Set)." from the ATS spec.

(If I remember correctly a PRI Page Request is a Posted Request.) Only
after this stop request completes can the driver call unbind(), or
return from exit_mm(). Then we know that if there was a LPIG for that
PASID, it is in the IOMMU's PRI queue (or already completed) once we
call flush().

> so if we were to use this
> flow on vt-d, which does not stall page request queue, then we should
> use the iommu model specific flush() callback to ensure LPIG is
> received? There could be queue full condition and retry. I am just
> trying to understand how and where we can make sure LPIG is
> received and all groups are whole.

For SMMU in patch 30, the flush() callback waits until the PRI queue is
empty or, when the PRI thread is running in parallel, until the thread
has done a full circle (handled as many faults as the queue size). It's
really unpleasant to implement because the flush() callback takes a lock
to inspect the hardware state, but I don't think we have a choice.
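
Roughly (hypothetical structure, not the actual SMMU code; the PRI
thread increments nr_handled and wakes 'wait' after each fault):

struct priq {
	spinlock_t		lock;
	wait_queue_head_t	wait;
	u64			nr_handled;	/* total faults handled */
	u64			size;		/* queue capacity */
};

static void priq_flush(struct priq *q)
{
	u64 batch;

	spin_lock(&q->lock);		/* inspect state under the lock */
	batch = q->nr_handled;
	spin_unlock(&q->lock);

	/* Wait for a full circle: q->size more faults handled */
	wait_event(q->wait, READ_ONCE(q->nr_handled) >= batch + q->size);
}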

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]         ` <5B0536A3.1000304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-24 11:44           ` Jean-Philippe Brucker
       [not found]             ` <cd13f60d-b282-3804-4ca7-2d34476c597f-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-24 11:44 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu,
	christian.koenig-5C7GfCeVMHo

Hi,

On 23/05/18 10:38, Xu Zaibo wrote:
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> +                 struct vfio_group *group,
>> +                 struct vfio_mm *vfio_mm)
>> +{
>> +    int ret;
>> +    bool enabled_sva = false;
>> +    struct vfio_iommu_sva_bind_data data = {
>> +        .vfio_mm    = vfio_mm,
>> +        .iommu        = iommu,
>> +        .count        = 0,
>> +    };
>> +
>> +    if (!group->sva_enabled) {
>> +        ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +                           vfio_iommu_sva_init);
> Do we need to do *sva_init here, or do anything to avoid repeated
> initialization? If another process has already done the initialization
> on this device, I think the current process will get an EEXIST.

Right, sva_init() must be called once for any device that intends to use
bind(). For the second process though, group->sva_enabled will be true
so we won't call sva_init() again, only bind().

Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-24 11:44               ` Jean-Philippe Brucker
@ 2018-05-24 11:50                 ` Ilias Apalodimas
  2018-05-24 15:04                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 123+ messages in thread
From: Ilias Apalodimas @ 2018-05-24 11:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: xieyisheng1, kvm, linux-pci, xuzaibo, jonathan.cameron,
	Will Deacon, okaya, linux-mm, yi.l.liu, ashok.raj, tn, joro,
	robdclark, bharatku, linux-acpi, liudongdong3, rfranz,
	devicetree, kevin.tian@intel.com

> Interesting, I hadn't thought about this use-case before. At first I
> thought you were talking about mdev devices assigned to VMs, but I think
> you're referring to mdevs assigned to userspace drivers instead? Out of
> curiosity, is it only theoretical or does someone actually need this?

There have been some non-upstreamed efforts to use mdev to produce userspace
drivers. Huawei is using it in what they call "wrapdrive" for crypto devices and
we did a proof of concept for ethernet interfaces. At the time we chose not to
involve the IOMMU for the reason you mentioned, but having it there would be
good.

Thanks
Ilias

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]             ` <cd13f60d-b282-3804-4ca7-2d34476c597f-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-24 12:35               ` Xu Zaibo
       [not found]                 ` <5B06B17C.1090809-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-05-24 12:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu,
	christian.koenig-5C7GfCeVMHo





On 2018/5/24 19:44, Jean-Philippe Brucker wrote:
> Hi,
>
> On 23/05/18 10:38, Xu Zaibo wrote:
>>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>>> +                 struct vfio_group *group,
>>> +                 struct vfio_mm *vfio_mm)
>>> +{
>>> +    int ret;
>>> +    bool enabled_sva = false;
>>> +    struct vfio_iommu_sva_bind_data data = {
>>> +        .vfio_mm    = vfio_mm,
>>> +        .iommu        = iommu,
>>> +        .count        = 0,
>>> +    };
>>> +
>>> +    if (!group->sva_enabled) {
>>> +        ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>>> +                           vfio_iommu_sva_init);
>> Do we need to do *sva_init here, or do anything to avoid repeated
>> initialization? If another process has already done the initialization
>> on this device, I think the current process will get an EEXIST.
> Right, sva_init() must be called once for any device that intends to use
> bind(). For the second process though, group->sva_enabled will be true
> so we won't call sva_init() again, only bind().
>
Well, since I create mediated devices based on one parent device to
support multiple processes (a new process creates a new 'vfio_group'
for the corresponding mediated device, so 'sva_enabled' cannot work
any more), and since *sva_init and *sva_shutdown basically operate on
the parent device, I only need SVA initialization and shutdown on the
parent device once. So I changed the two as follows:

@@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
      if (features & ~IOMMU_SVA_FEAT_IOPF)
         return -EINVAL;

+    /* If already exists, do nothing  */
+    mutex_lock(&dev->iommu_param->lock);
+    if (dev->iommu_param->sva_param) {
+        mutex_unlock(&dev->iommu_param->lock);
+        return 0;
+    }
+    mutex_unlock(&dev->iommu_param->lock);

     if (features & IOMMU_SVA_FEAT_IOPF) {
         ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,


@@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
     if (!domain)
         return -ENODEV;

+    /* If any other process is working on the device, shut down does nothing. */
+    mutex_lock(&dev->iommu_param->lock);
+    if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
+        mutex_unlock(&dev->iommu_param->lock);
+        return 0;
+    }
+    mutex_unlock(&dev->iommu_param->lock);
+
     __iommu_sva_unbind_dev_all(dev);

     mutex_lock(&dev->iommu_param->lock);

I added the above two checks in both *sva_init and *sva_shutdown; it
is working now, but I don't know whether it will cause any new
problems. What's more, I doubt whether VFIO is the right place to
check for the repeated operation; maybe it is better to check in the
IOMMU layer. :)

Thanks
Zaibo
>
> .
>





^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                 ` <5B06B17C.1090809-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-24 15:04                   ` Jean-Philippe Brucker
       [not found]                     ` <205c1729-8026-3efe-c363-d37d7150d622-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-24 15:04 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	christian.koenig-5C7GfCeVMHo

On 24/05/18 13:35, Xu Zaibo wrote:
>> Right, sva_init() must be called once for any device that intends to use
>> bind(). For the second process though, group->sva_enabled will be true
>> so we won't call sva_init() again, only bind().
> 
> Well, since I create mediated devices based on one parent device to support multiple
> processes (a new process creates a new 'vfio_group' for the corresponding mediated device,
> so 'sva_enabled' cannot work any more), and since *sva_init and *sva_shutdown basically
> operate on the parent device, I only need SVA initialization and shutdown on the
> parent device once. So I changed the two as follows:
> 
> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>       if (features & ~IOMMU_SVA_FEAT_IOPF)
>          return -EINVAL;
> 
> +    /* If already exists, do nothing  */
> +    mutex_lock(&dev->iommu_param->lock);
> +    if (dev->iommu_param->sva_param) {
> +        mutex_unlock(&dev->iommu_param->lock);
> +        return 0;
> +    }
> +    mutex_unlock(&dev->iommu_param->lock);
> 
>      if (features & IOMMU_SVA_FEAT_IOPF) {
>          ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
> 
> 
> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>      if (!domain)
>          return -ENODEV;
> 
> +    /* If any other process is working on the device, shut down does nothing. */
> +    mutex_lock(&dev->iommu_param->lock);
> +    if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
> +        mutex_unlock(&dev->iommu_param->lock);
> +        return 0;
> +    }
> +    mutex_unlock(&dev->iommu_param->lock);

I don't think iommu-sva.c is the best place for this, it's probably
better to implement an intermediate layer (the mediating driver), that
calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
vfio-pci would still call these functions itself, but for mdev the
mediating driver keeps a refcount of groups, and calls device_shutdown()
only when freeing the last mdev.

A device driver (non-mdev in this example) expects to be able to free
all its resources after sva_device_shutdown() returns. Imagine the
mm_list isn't empty (mm_exit() is running late), and instead of waiting
in unbind_dev_all() below, we return 0 immediately. Then the calling
driver frees its resources, and the mm_exit callback along with the
private data passed to bind() disappear. If an mm_exit() is still
running in parallel, it will try to access freed data and corrupt
memory. So in this function, if mm_list isn't empty, the only thing we
can do is wait.
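
For example (hypothetical mediating-driver code; the trailing
iommu_sva_device_init() arguments are placeholders):

static DEFINE_MUTEX(parent_sva_lock);
static int parent_sva_users;

static int mdev_parent_sva_get(struct device *parent)
{
	int ret = 0;

	mutex_lock(&parent_sva_lock);
	if (!parent_sva_users)
		ret = iommu_sva_device_init(parent, IOMMU_SVA_FEAT_IOPF,
					    0, NULL);
	if (!ret)
		parent_sva_users++;
	mutex_unlock(&parent_sva_lock);
	return ret;
}

static void mdev_parent_sva_put(struct device *parent)
{
	mutex_lock(&parent_sva_lock);
	if (!--parent_sva_users)
		iommu_sva_device_shutdown(parent);
	mutex_unlock(&parent_sva_lock);
}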

Thanks,
Jean

> +
>      __iommu_sva_unbind_dev_all(dev);
> 
>      mutex_lock(&dev->iommu_param->lock);
> 
> I add the above two checkings in both *sva_init and *sva_shutdown, it is working now,
> but i don't know if it will cause any new problems. What's more, i doubt if it is
> reasonable to check this to avoid repeating operation in VFIO, maybe it is better to check
> in IOMMU. :)

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-24 11:50                 ` Ilias Apalodimas
@ 2018-05-24 15:04                   ` Jean-Philippe Brucker
       [not found]                     ` <19e82a74-429a-3f86-119e-32b12082d0ff-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-24 15:04 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

On 24/05/18 12:50, Ilias Apalodimas wrote:
>> Interesting, I hadn't thought about this use-case before. At first I
>> thought you were talking about mdev devices assigned to VMs, but I think
>> you're referring to mdevs assigned to userspace drivers instead? Out of
>> curiosity, is it only theoretical or does someone actually need this?
> 
> There have been some non-upstreamed efforts to use mdev to produce userspace
> drivers. Huawei is using it in what they call "wrapdrive" for crypto devices and
> we did a proof of concept for ethernet interfaces. At the time we chose not to
> involve the IOMMU for the reason you mentioned, but having it there would be
> good.

I'm guessing there were good reasons to do it that way but I wonder, is
it not simpler to just have the kernel driver create a /dev/foo, with a
standard ioctl/mmap/poll interface? Here VFIO adds a layer of
indirection, and since the mediating driver has to implement these
operations already, what is gained?

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                     ` <205c1729-8026-3efe-c363-d37d7150d622-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-25  2:39                       ` Xu Zaibo
  2018-05-25  9:47                         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-05-25  2:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	christian.koenig-5C7GfCeVMHo

Hi,

On 2018/5/24 23:04, Jean-Philippe Brucker wrote:
> On 24/05/18 13:35, Xu Zaibo wrote:
>>> Right, sva_init() must be called once for any device that intends to use
>>> bind(). For the second process though, group->sva_enabled will be true
>>> so we won't call sva_init() again, only bind().
>> Well, since I create mediated devices based on one parent device to support multiple
>> processes (a new process creates a new 'vfio_group' for the corresponding mediated device,
>> so 'sva_enabled' cannot work any more), and since *sva_init and *sva_shutdown basically
>> operate on the parent device, I only need SVA initialization and shutdown on the
>> parent device once. So I changed the two as follows:
>>
>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>>        if (features & ~IOMMU_SVA_FEAT_IOPF)
>>           return -EINVAL;
>>
>> +    /* If already exists, do nothing  */
>> +    mutex_lock(&dev->iommu_param->lock);
>> +    if (dev->iommu_param->sva_param) {
>> +        mutex_unlock(&dev->iommu_param->lock);
>> +        return 0;
>> +    }
>> +    mutex_unlock(&dev->iommu_param->lock);
>>
>>       if (features & IOMMU_SVA_FEAT_IOPF) {
>>           ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
>>
>>
>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>>       if (!domain)
>>           return -ENODEV;
>>
>> +    /* If any other process is working on the device, shut down does nothing. */
>> +    mutex_lock(&dev->iommu_param->lock);
>> +    if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
>> +        mutex_unlock(&dev->iommu_param->lock);
>> +        return 0;
>> +    }
>> +    mutex_unlock(&dev->iommu_param->lock);
> I don't think iommu-sva.c is the best place for this, it's probably
> better to implement an intermediate layer (the mediating driver), that
> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
> vfio-pci would still call these functions itself, but for mdev the
> mediating driver keeps a refcount of groups, and calls device_shutdown()
> only when freeing the last mdev.
>
> A device driver (non mdev in this example) expects to be able to free
> all its resources after sva_device_shutdown() returns. Imagine the
> mm_list isn't empty (mm_exit() is running late), and instead of waiting
> in unbind_dev_all() below, we return 0 immediately. Then the calling
> driver frees its resources, and the mm_exit callback along with private
> data passed to bind() disappear. If a mm_exit() is still running in
> parallel, then it will try to access freed data and corrupt memory. So
> in this function if mm_list isn't empty, the only thing we can do is wait.
>
I still don't understand why we should 'unbind_dev_all'; would it be
possible to do an 'unbind_dev_pasid' instead? Then we could do other
things instead of waiting, which the user may not like. :)

Thanks
Zaibo
>
>> +
>>       __iommu_sva_unbind_dev_all(dev);
>>
>>       mutex_lock(&dev->iommu_param->lock);
>>
>> I add the above two checkings in both *sva_init and *sva_shutdown, it is working now,
>> but i don't know if it will cause any new problems. What's more, i doubt if it is
>> reasonable to check this to avoid repeating operation in VFIO, maybe it is better to check
>> in IOMMU. :)
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]                     ` <19e82a74-429a-3f86-119e-32b12082d0ff-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-25  6:33                       ` Ilias Apalodimas
  2018-05-25  8:39                         ` Jonathan Cameron
  0 siblings, 1 reply; 123+ messages in thread
From: Ilias Apalodimas @ 2018-05-25  6:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

On Thu, May 24, 2018 at 04:04:39PM +0100, Jean-Philippe Brucker wrote:
> On 24/05/18 12:50, Ilias Apalodimas wrote:
> >> Interesting, I hadn't thought about this use-case before. At first I
> >> thought you were talking about mdev devices assigned to VMs, but I think
> >> you're referring to mdevs assigned to userspace drivers instead? Out of
> >> curiosity, is it only theoretical or does someone actually need this?
> > 
> > There has been some non upstreamed efforts to have mdev and produce userspace
> > drivers. Huawei is using it on what they call "wrapdrive" for crypto devices and
> > we did a proof of concept for ethernet interfaces. At the time we choose not to
> > involve the IOMMU for the reason you mentioned, but having it there would be
> > good.
> 
> I'm guessing there were good reasons to do it that way but I wonder, is
> it not simpler to just have the kernel driver create a /dev/foo, with a
> standard ioctl/mmap/poll interface? Here VFIO adds a layer of
> indirection, and since the mediating driver has to implement these
> operations already, what is gained?
The best reason I can come up with is "common code". You already have one API
doing that for you, so why replicate it in a /dev file?
The mdev approach still needs extensions to support what we tried to do (i.e.
the mdev bus might need to have access to iommu_ops), but as far as I understand
it's a possible case.
> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-25  6:33                       ` Ilias Apalodimas
@ 2018-05-25  8:39                         ` Jonathan Cameron
       [not found]                           ` <20180525093959.000040a7-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                                             ` (3 more replies)
  0 siblings, 4 replies; 123+ messages in thread
From: Jonathan Cameron @ 2018-05-25  8:39 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	kenneth-lee-2012-H32Fclmsjq1BDgjK7y7TUQ,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

+CC Kenneth Lee

On Fri, 25 May 2018 09:33:11 +0300
Ilias Apalodimas <ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:

> On Thu, May 24, 2018 at 04:04:39PM +0100, Jean-Philippe Brucker wrote:
> > On 24/05/18 12:50, Ilias Apalodimas wrote:  
> > >> Interesting, I hadn't thought about this use-case before. At first I
> > >> thought you were talking about mdev devices assigned to VMs, but I think
> > >> you're referring to mdevs assigned to userspace drivers instead? Out of
> > >> curiosity, is it only theoretical or does someone actually need this?  
> > > 
> > > There has been some non upstreamed efforts to have mdev and produce userspace
> > > drivers. Huawei is using it on what they call "wrapdrive" for crypto devices and
> > > we did a proof of concept for ethernet interfaces. At the time we choose not to
> > > involve the IOMMU for the reason you mentioned, but having it there would be
> > > good.  
> > 
> > I'm guessing there were good reasons to do it that way but I wonder, is
> > it not simpler to just have the kernel driver create a /dev/foo, with a
> > standard ioctl/mmap/poll interface? Here VFIO adds a layer of
> > indirection, and since the mediating driver has to implement these
> > operations already, what is gained?  
> The best reason I can come up with is "common code". You already have one API
> doing that for you, so why replicate it in a /dev file?
> The mdev approach still needs extensions to support what we tried to do (i.e.
> the mdev bus might need to have access to iommu_ops), but as far as I understand
> it's a possible case.
> > 
> > Thanks,
> > Jean  

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
  2018-05-25  2:39                       ` Xu Zaibo
@ 2018-05-25  9:47                         ` Jean-Philippe Brucker
       [not found]                           ` <fea420ff-e738-e2ed-ab1e-a3f4dde4b6ef-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-25  9:47 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm, linux-mm
  Cc: bharatku, ashok.raj, dwmw2, ilias.apalodimas, Will Deacon, okaya,
	rgummal, rfranz, liguozhu, christian.koenig

On 25/05/18 03:39, Xu Zaibo wrote:
> Hi,
> 
> On 2018/5/24 23:04, Jean-Philippe Brucker wrote:
>> On 24/05/18 13:35, Xu Zaibo wrote:
>>>> Right, sva_init() must be called once for any device that intends to use
>>>> bind(). For the second process though, group->sva_enabled will be true
>>>> so we won't call sva_init() again, only bind().
>>> Well, while I create mediated devices based on one parent device to support multiple
>>> processes(a new process will create a new 'vfio_group' for the corresponding mediated device,
>>> and 'sva_enabled' cannot work any more), in fact, *sva_init and *sva_shutdown are basically
>>> working on parent device, so, as a result, I just only need sva initiation and shutdown on the
>>> parent device only once. So I change the two as following:
>>>
>>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>>>        if (features & ~IOMMU_SVA_FEAT_IOPF)
>>>           return -EINVAL;
>>>
>>> +    /* If already exists, do nothing  */
>>> +    mutex_lock(&dev->iommu_param->lock);
>>> +    if (dev->iommu_param->sva_param) {
>>> +        mutex_unlock(&dev->iommu_param->lock);
>>> +        return 0;
>>> +    }
>>> +    mutex_unlock(&dev->iommu_param->lock);
>>>
>>>       if (features & IOMMU_SVA_FEAT_IOPF) {
>>>           ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
>>>
>>>
>>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>>>       if (!domain)
>>>           return -ENODEV;
>>>
>>> +    /* If any other process is working on the device, shut down does nothing. */
>>> +    mutex_lock(&dev->iommu_param->lock);
>>> +    if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
>>> +        mutex_unlock(&dev->iommu_param->lock);
>>> +        return 0;
>>> +    }
>>> +    mutex_unlock(&dev->iommu_param->lock);
>> I don't think iommu-sva.c is the best place for this, it's probably
>> better to implement an intermediate layer (the mediating driver), that
>> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
>> vfio-pci would still call these functions itself, but for mdev the
>> mediating driver keeps a refcount of groups, and calls device_shutdown()
>> only when freeing the last mdev.
>>
>> A device driver (non mdev in this example) expects to be able to free
>> all its resources after sva_device_shutdown() returns. Imagine the
>> mm_list isn't empty (mm_exit() is running late), and instead of waiting
>> in unbind_dev_all() below, we return 0 immediately. Then the calling
>> driver frees its resources, and the mm_exit callback along with private
>> data passed to bind() disappear. If a mm_exit() is still running in
>> parallel, then it will try to access freed data and corrupt memory. So
>> in this function if mm_list isn't empty, the only thing we can do is wait.
>>
> I still don't understand why we should 'unbind_dev_all'. Is it possible
> to do an 'unbind_dev_pasid'?

Not in sva_device_shutdown(): it needs to clean up everything. For
example you want to physically unplug the device, or assign it to a VM.
To prevent any leak, sva_device_shutdown() needs to remove all bonds. In
theory there shouldn't be any, since either the driver did unbind_dev()
or all processes exited. This is a safety net.
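As a rough sketch of that safety net (based on the __iommu_sva_unbind_dev_all()
call visible in the quoted diff; the bond structure and the per-bond unbind
helper shown here are assumptions, not the exact code from patch 03):

#include <linux/list.h>

/* Sketch only: sever every bond still attached to the device. */
static void __iommu_sva_unbind_dev_all(struct device *dev)
{
	struct iommu_sva_param *param = dev->iommu_param->sva_param;
	struct iommu_bond *bond, *next;	/* iommu_bond layout assumed */

	list_for_each_entry_safe(bond, next, &param->mm_list, mm_head)
		__iommu_sva_unbind(dev, bond);	/* helper name assumed */
}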

> Then we could do other things instead of waiting, which users may not like. :)

They may not like it, but it's for their own good :) At the moment we're
waiting until:

* All mm_exit() callbacks for this device have finished. If we don't wait,
  then the caller will free the private data passed to bind() and the
  mm_exit() callback while they are still being used.

* All page requests targeting this device have been dealt with. If we
  don't wait, then some requests that are lingering in the IOMMU PRI queue
  may hit the next contexts bound to this device, possibly in a
  different VM. It may not be too risky (though probably exploitable in
  some way), but it is incredibly messy.

All of this is bounded in time, and should normally be over pretty fast
unless the device driver's mm_exit() does something strange. If the
driver did the right thing, there shouldn't be any wait here (although
there may be one in unbind_dev() for the same reasons - to prevent use
after free).

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [PATCH v2 39/40] iommu/arm-smmu-v3: Add support for PRI
       [not found]     ` <20180511190641.23008-40-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-25 14:08       ` Bharat Kumar Gogada
       [not found]         ` <BLUPR0201MB150513BBAA161355DE9B3A48A5690-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Bharat Kumar Gogada @ 2018-05-25 14:08 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ravikiran Gummaluri,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

> 
> For PCI devices that support it, enable the PRI capability and handle PRI Page
> Requests with the generic fault handler. It is enabled on demand by
> iommu_sva_device_init().
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2:
> * Terminate the page request and disable PRI if no handler is registered
> * Enable and disable PRI in sva_device_init/shutdown, instead of
>   add/remove_device
> ---
>  drivers/iommu/arm-smmu-v3.c | 192 +++++++++++++++++++++++++++---------
>  1 file changed, 145 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 6cb69ace371b..0edbb8d19579 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -248,6 +248,7 @@
>  #define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
>  #define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
> 
> +#define STRTAB_STE_1_PPAR		(1UL << 18)
>  #define STRTAB_STE_1_S1STALLD		(1UL << 27)
> 
>  #define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
> @@ -309,6 +310,9 @@
>  #define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
>  #define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
>  #define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
> +#define CMDQ_PRI_1_RESP_FAILURE	FIELD_PREP(CMDQ_PRI_1_RESP, 0UL)
> +#define CMDQ_PRI_1_RESP_INVALID	FIELD_PREP(CMDQ_PRI_1_RESP, 1UL)
> +#define CMDQ_PRI_1_RESP_SUCCESS	FIELD_PREP(CMDQ_PRI_1_RESP, 2UL)
> 
>  #define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
>  #define CMDQ_RESUME_0_ACTION_RETRY	(1UL << 12)
> @@ -383,12 +387,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_ats_check,
>  	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
> 
> -enum pri_resp {
> -	PRI_RESP_DENY = 0,
> -	PRI_RESP_FAIL = 1,
> -	PRI_RESP_SUCC = 2,
> -};
> -
>  enum arm_smmu_msi_index {
>  	EVTQ_MSI_INDEX,
>  	GERROR_MSI_INDEX,
> @@ -471,7 +469,7 @@ struct arm_smmu_cmdq_ent {
>  			u32			sid;
>  			u32			ssid;
>  			u16			grpid;
> -			enum pri_resp		resp;
> +			enum page_response_code	resp;
>  		} pri;
> 
>  		#define CMDQ_OP_RESUME		0x44
> @@ -556,6 +554,7 @@ struct arm_smmu_strtab_ent {
>  	struct arm_smmu_s2_cfg		*s2_cfg;
> 
>  	bool				can_stall;
> +	bool				prg_resp_needs_ssid;
>  };
> 
>  struct arm_smmu_strtab_cfg {
> @@ -907,14 +906,18 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SID, ent->pri.sid);
>  		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_GRPID, ent->pri.grpid);
>  		switch (ent->pri.resp) {
> -		case PRI_RESP_DENY:
> -		case PRI_RESP_FAIL:
> -		case PRI_RESP_SUCC:
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
> +			break;
> +		case IOMMU_PAGE_RESP_INVALID:
> +			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
> +			break;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
>  			break;
>  		default:
>  			return -EINVAL;
>  		}
> -		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
>  		break;
>  	case CMDQ_OP_RESUME:
>  		cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid);
> @@ -1114,8 +1117,15 @@ static int arm_smmu_page_response(struct device *dev,
>  		cmd.resume.sid		= sid;
>  		cmd.resume.stag		= resp->page_req_group_id;
>  		cmd.resume.resp		= resp->resp_code;
> +	} else if (master->can_fault) {
> +		cmd.opcode		= CMDQ_OP_PRI_RESP;
> +		cmd.substream_valid	= resp->pasid_present &&
> +					  master->ste.prg_resp_needs_ssid;
> +		cmd.pri.sid		= sid;
> +		cmd.pri.ssid		= resp->pasid;
> +		cmd.pri.grpid		= resp->page_req_group_id;
> +		cmd.pri.resp		= resp->resp_code;
>  	} else {
> -		/* TODO: put PRI response here */
>  		return -ENODEV;
>  	}
> 
> @@ -1236,6 +1246,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
>  			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
> 
> +		if (ste->prg_resp_needs_ssid)
> +			dst[1] |= STRTAB_STE_1_PPAR;
> +
>  		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
>  		   !ste->can_stall)
> @@ -1471,39 +1484,54 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
> 
>  static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  {
> -	u32 sid, ssid;
> -	u16 grpid;
> -	bool ssv, last;
> -
> -	sid = FIELD_GET(PRIQ_0_SID, evt[0]);
> -	ssv = FIELD_GET(PRIQ_0_SSID_V, evt[0]);
> -	ssid = ssv ? FIELD_GET(PRIQ_0_SSID, evt[0]) : 0;
> -	last = FIELD_GET(PRIQ_0_PRG_LAST, evt[0]);
> -	grpid = FIELD_GET(PRIQ_1_PRG_IDX, evt[1]);
> -
> -	dev_info(smmu->dev, "unexpected PRI request received:\n");
> -	dev_info(smmu->dev,
> -		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access
> at iova 0x%016llx\n",
> -		 sid, ssid, grpid, last ? "L" : "",
> -		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
> -		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
> -		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
> -		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
> -		 evt[1] & PRIQ_1_ADDR_MASK);
> -
> -	if (last) {
> -		struct arm_smmu_cmdq_ent cmd = {
> -			.opcode			= CMDQ_OP_PRI_RESP,
> -			.substream_valid	= ssv,
> -			.pri			= {
> -				.sid	= sid,
> -				.ssid	= ssid,
> -				.grpid	= grpid,
> -				.resp	= PRI_RESP_DENY,
> -			},
> +	u32 sid = FIELD_GET(PRIQ_0_SID, evt[0]);
> +
> +	struct arm_smmu_master_data *master;
> +	struct iommu_fault_event fault = {
> +		.type			= IOMMU_FAULT_PAGE_REQ,
> +		.last_req		= FIELD_GET(PRIQ_0_PRG_LAST, evt[0]),
> +		.pasid_valid		= FIELD_GET(PRIQ_0_SSID_V, evt[0]),
> +		.pasid			= FIELD_GET(PRIQ_0_SSID, evt[0]),
> +		.page_req_group_id	= FIELD_GET(PRIQ_1_PRG_IDX, evt[1]),
> +		.addr			= evt[1] & PRIQ_1_ADDR_MASK,
> +	};
> +
> +	if (evt[0] & PRIQ_0_PERM_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	if (evt[0] & PRIQ_0_PERM_WRITE)
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +	if (evt[0] & PRIQ_0_PERM_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +	if (evt[0] & PRIQ_0_PERM_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	/* Discard Stop PASID marker, it isn't used */
> +	if (!(fault.prot & (IOMMU_FAULT_READ|IOMMU_FAULT_WRITE)) &&
> +	    fault.last_req)
> +		return;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (WARN_ON(!master))
> +		return;
> +
> +	if (iommu_report_device_fault(master->dev, &fault)) {
> +		/*
> +		 * No handler registered, so subsequent faults won't produce
> +		 * better results. Try to disable PRI.
> +		 */
> +		struct page_response_msg page_response = {
> +			.addr			= fault.addr,
> +			.pasid			= fault.pasid,
> +			.pasid_present		= fault.pasid_valid,
> +			.page_req_group_id	= fault.page_req_group_id,
> +			.resp_code		= IOMMU_PAGE_RESP_FAILURE,
>  		};
> 
> -		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +		dev_warn(master->dev,
> +			 "PPR 0x%x:0x%llx 0x%x: nobody cared, disabling
> PRI\n",
> +			 fault.pasid_valid ? fault.pasid : 0, fault.addr,
> +			 fault.prot);
> +		arm_smmu_page_response(master->dev, &page_response);
>  	}
>  }
> 
> @@ -1529,6 +1557,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  		}
> 
>  		if (queue_sync_prod(q) == -EOVERFLOW)
> +			/*
> +			 * TODO: flush pending faults, since the SMMU might have
> +			 * auto-responded to the Last request of a pending
> +			 * group
> +			 */
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
>  	} while (!queue_empty(q));
> 
> @@ -1577,7 +1610,8 @@ static int arm_smmu_flush_queues(void *cookie, struct device *dev)
>  		master = dev->iommu_fwspec->iommu_priv;
>  		if (master->ste.can_stall)
>  			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> -		/* TODO: add support for PRI */
> +		else if (master->can_fault)
> +			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
>  		return 0;
>  	}
> 
> @@ -2301,6 +2335,59 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
>  	return ops->iova_to_phys(ops, iova);
>  }
> 
> +static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
> +{
> +	int ret, pos;
> +	struct pci_dev *pdev;
> +	/*
> +	 * TODO: find a good inflight PPR number. We should divide the PRI queue
> +	 * by the number of PRI-capable devices, but it's impossible to know
> +	 * about current and future (hotplugged) devices. So we're at risk of
> +	 * dropping PPRs (and leaking pending requests in the FQ).
> +	 */
> +	size_t max_inflight_pprs = 16;
> +	struct arm_smmu_device *smmu = master->smmu;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	ret = pci_reset_pri(pdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_enable_pri(pdev, max_inflight_pprs);
> +	if (ret) {
> +		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	master->can_fault = true;
> +	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);

Any reason why this is not cleared in arm_smmu_disable_pri()?

> +
> +	dev_dbg(master->dev, "enabled PRI\n");
> +
> +	return 0;
> +}
> +
> +static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pri_enabled)
> +		return;
> +
> +	pci_disable_pri(pdev);
> +	dev_dbg(master->dev, "disabled PRI\n");
> +	master->can_fault = false;
> +}
> +
>  static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
>  {
>  	int ret;
> @@ -2314,11 +2401,15 @@ static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
>  		return -EINVAL;
> 
>  	if (param->features & IOMMU_SVA_FEAT_IOPF) {
> -		if (!master->can_fault)
> -			return -EINVAL;
> +		arm_smmu_enable_pri(master);
> +		if (!master->can_fault) {
> +			ret = -ENODEV;
> +			goto err_disable_pri;
> +		}
> +
>  		ret = iopf_queue_add_device(master->smmu->iopf_queue, dev);
>  		if (ret)
> -			return ret;
> +			goto err_disable_pri;
>  	}
> 
>  	if (!param->max_pasid)
> @@ -2329,11 +2420,17 @@ static int arm_smmu_sva_init(struct device *dev, struct iommu_sva_param *param)
>  	param->max_pasid = min(param->max_pasid, (1U << master->ssid_bits) - 1);
> 
>  	return 0;
> +
> +err_disable_pri:
> +	arm_smmu_disable_pri(master);
> +
> +	return ret;
>  }
> 
>  static void arm_smmu_sva_shutdown(struct device *dev,
>  				  struct iommu_sva_param *param)
>  {
> +	arm_smmu_disable_pri(dev->iommu_fwspec->iommu_priv);
>  	iopf_queue_remove_device(dev);
>  }
> 
> @@ -2671,6 +2768,7 @@ static void arm_smmu_remove_device(struct device *dev)
>  	iommu_group_remove_device(dev);
>  	arm_smmu_remove_master(smmu, master);
>  	iommu_device_unlink(&smmu->iommu, dev);
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  	kfree(master);
>  	iommu_fwspec_free(dev);
> --
> 2.17.0

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 07/40] iommu: Add a page fault handler
       [not found]                 ` <bdf9f221-ab97-2168-d072-b7f6a0dba840-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-26  0:35                   ` Jacob Pan
  2018-05-29 10:00                     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-05-26  0:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, 24 May 2018 12:44:38 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> On 23/05/18 00:35, Jacob Pan wrote:
> >>>> +			/* Insert *before* the last fault */
> >>>> +			list_move(&fault->head, &group->faults);
> >>>> +	}
> >>>> +    
> >>> If you already sorted the group list with last fault at the end,
> >>> why do you need a separate entry to track the last fault?    
> >>
> >> Not sure I understand your question, sorry. Do you mean why the
> >> iopf_group.last_fault? Just to avoid one more kzalloc.
> >>  
> > kind of :) what i thought was that why can't the last_fault
> > naturally be the last entry in your group fault list? then there is
> > no need for special treatment in terms of allocation of the last
> > fault. just my preference.  
> 
> But we need a kzalloc for the last fault anyway, so I thought I'd just
> piggy-back on the group allocation. And if I don't add the last fault
> at the end of group->faults, then it's iopf_handle_group that requires
> special treatment. I'm still not sure I understood your question
> though, could you send me a patch that does it?
> 
> >>>> +
> >>>> +	queue->flush(queue->flush_arg, dev);
> >>>> +
> >>>> +	/*
> >>>> +	 * No need to clear the partial list. All PRGs
> >>>> containing the PASID that
> >>>> +	 * needs to be decommissioned are whole (the device
> >>>> driver made sure of
> >>>> +	 * it before this function was called). They have been
> >>>> submitted to the
> >>>> +	 * queue by the above flush().
> >>>> +	 */    
> >>> So you are saying device driver need to make sure LPIG PRQ is
> >>> submitted in the flush call above such that partial list is
> >>> cleared?    
> >>
> >> Not exactly, it's the IOMMU driver that makes sure all LPIG in its
> >> queues are submitted by the above flush call. In more details the
> >> flow is:
> >>
> >> * Either device driver calls unbind()/sva_device_shutdown(), or the
> >> process exits.
> >> * If the device driver called, then it already told the device to
> >> stop using the PASID. Otherwise we use the mm_exit() callback to
> >> tell the device driver to stop using the PASID.
Sorry, I still need more clarification. For a PASID termination
initiated by VFIO unbind, I don't see the device driver given a chance to
stop the PASID. It seems we just call __iommu_sva_unbind_device(), which
already assumes the device has stopped issuing DMA with the PASID.
So is the VFIO unbind caller responsible for invoking a driver callback
to stop DMA on a given PASID?

> >> * In either case, when receiving a stop request from the driver,
> >> the device sends the LPIGs to the IOMMU queue.
> >> * Then, the flush call above ensures that the IOMMU reports the
> >> LPIG with iommu_report_device_fault.
> >> * While submitting all LPIGs for this PASID to the work queue,
> >> ipof_queue_fault also picked up all partial faults, so the partial
> >> list is clean.
> >>
> >> Maybe I should improve this comment?
> >>  
> > thanks for explaining. LPIG submission is done by device
> > asynchronously w.r.t. driver stopping/decommission PASID.  
> 
> Hmm, it should really be synchronous, otherwise there is no way to
> know when the PASID can be decommissioned. We need a guarantee such
> as the one in 6.20.1 of the PCIe spec, "Managing PASID TLP Prefix
> Usage":
> 
> "When the stop request mechanism indicates completion, the Function
> has:
> * Completed all Non-Posted Requests associated with this PASID.
> * Flushed to the host all Posted Requests addressing host memory in
> all TCs that were used by the PASID."
> 
> That's in combination with "The function shall [...] finish
> transmitting any multi-page Page Request Messages for this PASID
> (i.e. send the Page Request Message with the L bit Set)." from the
> ATS spec.
> 
I am not contesting the device side; what I meant was that from the
host IOMMU driver's perspective, the LPIG is received via the IOMMU host
queue and is therefore asynchronous. Not sure about ARM, but on VT-d the
LPIG submission could hit a queue-full condition, in which case, per the
VT-d spec, the IOMMU generates a successful auto-response to the device.
At this point, assuming we already stopped the given PASID on the device,
there might not be another LPIG sent for the device. Therefore you could
have a partial list. I think we can just drop the requests in the partial
list for that PASID until the PASID gets re-allocated.
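Expressed as code, the idea could look like this (iopf_context, the partial
list and the surrounding locking are assumptions based on the descriptions in
this thread, not patch 07's exact API):

#include <linux/list.h>
#include <linux/slab.h>

/* Sketch: drop stale partial faults for a PASID whose LPIG may have
 * been auto-answered by the IOMMU on a queue-full condition. */
static void iopf_clear_partial(struct iopf_queue *queue, u32 pasid)
{
	struct iopf_context *fault, *next;

	list_for_each_entry_safe(fault, next, &queue->partial_list, head) {
		if (fault->evt.pasid == pasid) {
			list_del(&fault->head);
			kfree(fault);
		}
	}
}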


> (If I remember correctly a PRI Page Request is a Posted Request.) Only
> after this stop request completes can the driver call unbind(), or
> return from exit_mm(). Then we know that if there was a LPIG for that
> PASID, it is in the IOMMU's PRI queue (or already completed) once we
> call flush().
> 
agreed.
> > so if we were to use this
> > flow on vt-d, which does not stall page request queue, then we
> > should use the iommu model specific flush() callback to ensure LPIG
> > is received? There could be queue full condition and retry. I am
> > just trying to understand how and where we can make sure LPIG is
> > received and all groups are whole.  
> 
> For SMMU in patch 30, the flush() callback waits until the PRI queue
> is empty or, when the PRI thread is running in parallel, until the
> thread has done a full circle (handled as many faults as the queue
> size). It's really unpleasant to implement because the flush()
> callback takes a lock to inspect the hardware state, but I don't
> think we have a choice.
> 
Yes, VT-d has a similar situation with its page request queue. One option
is to track the queue head (software-updated) to make sure one complete
cycle has passed when the queue tail (hardware-updated) crosses it. Or we
can (as suggested by Ashok Raj) take a snapshot of the entire queue and
process it (dropping PRQs that belong to the terminated PASID) without
holding the queue.
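Either way, the "full circle" variant could be sketched as follows (every
name here is illustrative; neither VT-d nor SMMU code is quoted):

#include <linux/wait.h>
#include <linux/atomic.h>

/* Sketch: wait until the page-request handler has either drained the
 * hardware queue or consumed one full queue's worth of entries since
 * flush was called, so everything queued before us has been seen. */
static void prq_flush(struct prq_queue *q)
{
	u64 start = atomic64_read(&q->handled);

	wait_event(q->flush_wq,
		   prq_queue_empty(q) ||
		   atomic64_read(&q->handled) - start >= q->nr_entries);
}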

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-25  8:39                         ` Jonathan Cameron
       [not found]                           ` <20180525093959.000040a7-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-26  2:24                           ` Kenneth Lee
       [not found]                           ` <20180526022445.GA6069@kllp05>
  2018-06-11 16:32                           ` Kenneth Lee
  3 siblings, 0 replies; 123+ messages in thread
From: Kenneth Lee @ 2018-05-26  2:24 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: xieyisheng1, liubo95, kvm, linux-pci, xuzaibo, Will Deacon,
	okaya, linux-mm, liguozhu, yi.l.liu, ashok.raj,
	Jean-Philippe Brucker, tn, joro, iommu, bharatku, linux-acpi,
	liudongdong3, rfranz, devic

On Fri, May 25, 2018 at 09:39:59AM +0100, Jonathan Cameron wrote:
> 
> +CC Kenneth Lee
> 
> On Fri, 25 May 2018 09:33:11 +0300
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> 
> > On Thu, May 24, 2018 at 04:04:39PM +0100, Jean-Philippe Brucker wrote:
> > > On 24/05/18 12:50, Ilias Apalodimas wrote:  
> > > >> Interesting, I hadn't thought about this use-case before. At first I
> > > >> thought you were talking about mdev devices assigned to VMs, but I think
> > > >> you're referring to mdevs assigned to userspace drivers instead? Out of
> > > >> curiosity, is it only theoretical or does someone actually need this?  
> > > > 
> > > > There has been some non upstreamed efforts to have mdev and produce userspace
> > > > drivers. Huawei is using it on what they call "wrapdrive" for crypto devices and
> > > > we did a proof of concept for ethernet interfaces. At the time we choose not to
> > > > involve the IOMMU for the reason you mentioned, but having it there would be
> > > > good.  
> > > 
> > > I'm guessing there were good reasons to do it that way but I wonder, is
> > > it not simpler to just have the kernel driver create a /dev/foo, with a
> > > standard ioctl/mmap/poll interface? Here VFIO adds a layer of
> > > indirection, and since the mediating driver has to implement these
> > > operations already, what is gained?  
> > The best reason I can come up with is "common code". You already have one API
> > doing that for you, so why replicate it in a /dev file?
> > The mdev approach still needs extensions to support what we tried to do (i.e.
> > the mdev bus might need to have access to iommu_ops), but as far as I understand
> > it's a possible case.

Hi Jean, please allow me to share my understanding here:
https://zhuanlan.zhihu.com/p/35489035

The reason we do not use the /dev/foo scheme is that the devices to be
shared are programmable accelerators. We cannot fix up the kernel driver for them.
> > > 
> > > Thanks,
> > > Jean  
> 
> 

-- 
			-Kenneth Lee (Hisilicon)

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]                           ` <20180525093959.000040a7-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-26  2:24                             ` Kenneth Lee
  0 siblings, 0 replies; 123+ messages in thread
From: Kenneth Lee @ 2018-05-26  2:24 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Ilias Apalodimas, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo

On Fri, May 25, 2018 at 09:39:59AM +0100, Jonathan Cameron wrote:
> 
> +CC Kenneth Lee
> 
> On Fri, 25 May 2018 09:33:11 +0300
> Ilias Apalodimas <ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> 
> > On Thu, May 24, 2018 at 04:04:39PM +0100, Jean-Philippe Brucker wrote:
> > > On 24/05/18 12:50, Ilias Apalodimas wrote:  
> > > >> Interesting, I hadn't thought about this use-case before. At first I
> > > >> thought you were talking about mdev devices assigned to VMs, but I think
> > > >> you're referring to mdevs assigned to userspace drivers instead? Out of
> > > >> curiosity, is it only theoretical or does someone actually need this?  
> > > > 
> > > > There has been some non upstreamed efforts to have mdev and produce userspace
> > > > drivers. Huawei is using it on what they call "wrapdrive" for crypto devices and
> > > > we did a proof of concept for ethernet interfaces. At the time we choose not to
> > > > involve the IOMMU for the reason you mentioned, but having it there would be
> > > > good.  
> > > 
> > > I'm guessing there were good reasons to do it that way but I wonder, is
> > > it not simpler to just have the kernel driver create a /dev/foo, with a
> > > standard ioctl/mmap/poll interface? Here VFIO adds a layer of
> > > indirection, and since the mediating driver has to implement these
> > > operations already, what is gained?  
> > The best reason I can come up with is "common code". You already have one API
> > doing that for you, so why replicate it in a /dev file?
> > The mdev approach still needs extensions to support what we tried to do (i.e.
> > the mdev bus might need to have access to iommu_ops), but as far as I understand
> > it's a possible case.

Hi Jean, please allow me to share my understanding here:
https://zhuanlan.zhihu.com/p/35489035

The reason we do not use the /dev/foo scheme is that the devices to be
shared are programmable accelerators. We cannot fix up the kernel driver for them.
> > > 
> > > Thanks,
> > > Jean  
> 
> 

-- 
			-Kenneth Lee (Hisilicon)

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                           ` <fea420ff-e738-e2ed-ab1e-a3f4dde4b6ef-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-26  3:53                             ` Xu Zaibo
       [not found]                               ` <5B08DA21.3070507-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-05-26  3:53 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	christian.koenig-5C7GfCeVMHo



Hi,

On 2018/5/25 17:47, Jean-Philippe Brucker wrote:
> On 25/05/18 03:39, Xu Zaibo wrote:
>> Hi,
>>
>> On 2018/5/24 23:04, Jean-Philippe Brucker wrote:
>>> On 24/05/18 13:35, Xu Zaibo wrote:
>>>>> Right, sva_init() must be called once for any device that intends to use
>>>>> bind(). For the second process though, group->sva_enabled will be true
>>>>> so we won't call sva_init() again, only bind().
>>>> Well, while I create mediated devices based on one parent device to support multiple
>>>> processes(a new process will create a new 'vfio_group' for the corresponding mediated device,
>>>> and 'sva_enabled' cannot work any more), in fact, *sva_init and *sva_shutdown are basically
>>>> working on parent device, so, as a result, I just only need sva initiation and shutdown on the
>>>> parent device only once. So I change the two as following:
>>>>
>>>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>>>>         if (features & ~IOMMU_SVA_FEAT_IOPF)
>>>>            return -EINVAL;
>>>>
>>>> +    /* If already exists, do nothing  */
>>>> +    mutex_lock(&dev->iommu_param->lock);
>>>> +    if (dev->iommu_param->sva_param) {
>>>> +        mutex_unlock(&dev->iommu_param->lock);
>>>> +        return 0;
>>>> +    }
>>>> +    mutex_unlock(&dev->iommu_param->lock);
>>>>
>>>>        if (features & IOMMU_SVA_FEAT_IOPF) {
>>>>            ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
>>>>
>>>>
>>>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>>>>        if (!domain)
>>>>            return -ENODEV;
>>>>
>>>> +    /* If any other process is working on the device, shut down does nothing. */
>>>> +    mutex_lock(&dev->iommu_param->lock);
>>>> +    if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
>>>> +        mutex_unlock(&dev->iommu_param->lock);
>>>> +        return 0;
>>>> +    }
>>>> +    mutex_unlock(&dev->iommu_param->lock);
>>> I don't think iommu-sva.c is the best place for this, it's probably
>>> better to implement an intermediate layer (the mediating driver), that
>>> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
>>> vfio-pci would still call these functions itself, but for mdev the
>>> mediating driver keeps a refcount of groups, and calls device_shutdown()
>>> only when freeing the last mdev.
>>>
>>> A device driver (non mdev in this example) expects to be able to free
>>> all its resources after sva_device_shutdown() returns. Imagine the
>>> mm_list isn't empty (mm_exit() is running late), and instead of waiting
>>> in unbind_dev_all() below, we return 0 immediately. Then the calling
>>> driver frees its resources, and the mm_exit callback along with private
>>> data passed to bind() disappear. If a mm_exit() is still running in
>>> parallel, then it will try to access freed data and corrupt memory. So
>>> in this function if mm_list isn't empty, the only thing we can do is wait.
>>>
>> I still don't understand why we should 'unbind_dev_all'. Is it possible
>> to do an 'unbind_dev_pasid'?
> Not in sva_device_shutdown(): it needs to clean up everything. For
> example you want to physically unplug the device, or assign it to a VM.
> To prevent any leak, sva_device_shutdown() needs to remove all bonds. In
> theory there shouldn't be any, since either the driver did unbind_dev()
> or all processes exited. This is a safety net.
>
>> Then we could do other things instead of waiting, which users may not like. :)
> They may not like it, but it's for their own good :) At the moment we're
> waiting until:
>
> * All mm_exit() callbacks for this device have finished. If we don't wait,
>    then the caller will free the private data passed to bind() and the
>    mm_exit() callback while they are still being used.
>
> * All page requests targeting this device have been dealt with. If we
>    don't wait, then some requests that are lingering in the IOMMU PRI queue
>    may hit the next contexts bound to this device, possibly in a
>    different VM. It may not be too risky (though probably exploitable in
>    some way), but it is incredibly messy.
>
> All of this is bounded in time, and should normally be over pretty fast
> unless the device driver's mm_exit() does something strange. If the
> driver did the right thing, there shouldn't be any wait here (although
> there may be one in unbind_dev() for the same reasons - to prevent use
> after free).
>
I guess there may be some misunderstandings :).

From the current patches, 'iommu_sva_device_shutdown' is called by 'vfio_iommu_sva_shutdown', which
is mainly used by 'vfio_iommu_type1_detach_group'; that is finally called by a process's release of the
vfio facility automatically, or manually by 'VFIO_GROUP_UNSET_CONTAINER' in the process.

So, just imagine that two processes are working on the device with the IOPF
feature, and the two do the following to enable SVA:

    1. open("/dev/vfio/vfio");

    2. open the group of the device by calling open("/dev/vfio/x"), but now,
       I think the second process will fail to open the group because current
       VFIO thinks that one group can only be used by one process/VM;
       at present, I use mediated devices to create more groups on the
       parent device to support multiple processes;

    3. VFIO_GROUP_SET_CONTAINER;

    4. VFIO_SET_IOMMU;

    5. VFIO_IOMMU_BIND;

    6. Do some work with the hardware working unit, filled by PASID, on the
       device;

    7. VFIO_IOMMU_UNBIND;

    8. VFIO_GROUP_UNSET_CONTAINER; --- here, we have to sleep to wait for the
       other process to finish the work of step 6;

    9. close(group); close(container);
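In userspace C, steps 1-5 boil down to roughly this (VFIO_IOMMU_BIND comes
from patch 13 of this series; the bind struct layout shown is an assumption):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/x", O_RDWR);	/* one group per (mediated) device */

	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);	/* step 3 */
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);	/* step 4 */

	struct vfio_iommu_type1_bind bind = { .argsz = sizeof(bind) };	/* layout assumed */
	ioctl(container, VFIO_IOMMU_BIND, &bind);	/* step 5 */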


So, my idea is: is it possible to just release the params or facilities that
belong only to the current process when that process shuts down the device,
and let the last process release them all? Then, as in step 8 above, we
wouldn't need to wait, or maybe wait for a very long time while the other
process is doing lots of work on the device.

Thanks
Zaibo
>





^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 07/40] iommu: Add a page fault handler
  2018-05-26  0:35                   ` Jacob Pan
@ 2018-05-29 10:00                     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-29 10:00 UTC (permalink / raw)
  To: Jacob Pan
  Cc: ashok.raj-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 26/05/18 01:35, Jacob Pan wrote:
>>>> Not exactly, it's the IOMMU driver that makes sure all LPIG in its
>>>> queues are submitted by the above flush call. In more details the
>>>> flow is:
>>>>
>>>> * Either device driver calls unbind()/sva_device_shutdown(), or the
>>>> process exits.
>>>> * If the device driver called, then it already told the device to
>>>> stop using the PASID. Otherwise we use the mm_exit() callback to
>>>> tell the device driver to stop using the PASID.
> Sorry, I still need more clarification. For a PASID termination
> initiated by VFIO unbind, I don't see the device driver given a chance to
> stop the PASID. It seems we just call __iommu_sva_unbind_device(), which
> already assumes the device has stopped issuing DMA with the PASID.
> So is the VFIO unbind caller responsible for invoking a driver callback
> to stop DMA on a given PASID?

Yes, but I don't know how to implement this. Since PCI doesn't formalize
the PASID stop mechanism and the device doesn't have a kernel driver,
VFIO would need help from the userspace driver for stopping PASID
(notify the userspace driver when another process exits).


>>>> * In either case, when receiving a stop request from the driver,
>>>> the device sends the LPIGs to the IOMMU queue.
>>>> * Then, the flush call above ensures that the IOMMU reports the
>>>> LPIG with iommu_report_device_fault.
>>>> * While submitting all LPIGs for this PASID to the work queue,
>>>> ipof_queue_fault also picked up all partial faults, so the partial
>>>> list is clean.
>>>>
>>>> Maybe I should improve this comment?
>>>>  
>>> thanks for explaining. LPIG submission is done by device
>>> asynchronously w.r.t. driver stopping/decommission PASID.  
>>
>> Hmm, it should really be synchronous, otherwise there is no way to
>> know when the PASID can be decommissioned. We need a guarantee such
>> as the one in 6.20.1 of the PCIe spec, "Managing PASID TLP Prefix
>> Usage":
>>
>> "When the stop request mechanism indicates completion, the Function
>> has:
>> * Completed all Non-Posted Requests associated with this PASID.
>> * Flushed to the host all Posted Requests addressing host memory in
>> all TCs that were used by the PASID."
>>
>> That's in combination with "The function shall [...] finish
>> transmitting any multi-page Page Request Messages for this PASID
>> (i.e. send the Page Request Message with the L bit Set)." from the
>> ATS spec.
>>
> I am not contesting the device side; what I meant was that from the
> host IOMMU driver's perspective, the LPIG is received via the IOMMU host
> queue and is therefore asynchronous. Not sure about ARM, but on VT-d the
> LPIG submission could hit a queue-full condition, in which case, per the
> VT-d spec, the IOMMU generates a successful auto-response to the device.
> At this point, assuming we already stopped the given PASID on the device,
> there might not be another LPIG sent for the device. Therefore you could
> have a partial list. I think we can just drop the requests in the partial
> list for that PASID until the PASID gets re-allocated.

Indeed, I'll add this in the next version. For a complete solution to the
queue-full condition (which seems to behave the same way on ARM), I was
thinking the IOMMU driver should also have a method for removing all
partial faults when detecting a queue overflow. Since it doesn't know
which PRGs did receive an auto-response, all it can do is remove all
partial faults, for all devices using this queue. But freeing the stuck
partial faults in flush() and remove_device() should be good enough.
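The overflow variant of that cleanup might look like this (a sketch;
iopf_context, queue->partial_list and the lock are assumed names, not
patch 07's exact API):

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/mutex.h>

/* Sketch: on queue overflow we cannot tell which PRGs were
 * auto-answered, so discard every partial fault on this queue,
 * for all devices using it. */
static void iopf_discard_all_partial(struct iopf_queue *queue)
{
	struct iopf_context *fault, *next;

	mutex_lock(&queue->lock);
	list_for_each_entry_safe(fault, next, &queue->partial_list, head) {
		list_del(&fault->head);
		kfree(fault);
	}
	mutex_unlock(&queue->lock);
}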

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 39/40] iommu/arm-smmu-v3: Add support for PRI
       [not found]         ` <BLUPR0201MB150513BBAA161355DE9B3A48A5690-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-05-29 10:27           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-29 10:27 UTC (permalink / raw)
  To: Bharat Kumar Gogada,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ravikiran Gummaluri,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

On 25/05/18 15:08, Bharat Kumar Gogada wrote:
>> +	master->can_fault = true;
>> +	master->ste.prg_resp_needs_ssid =
>> pci_prg_resp_requires_prefix(pdev);
> 
> Any reason why this is not cleared in arm_smmu_disable_pri ?

Actually, setting it here is wrong. Since we now call enable_pri()
lazily, prg_resp_needs_ssid isn't initialized when writing the STE. That
bit is read by the SMMU when the PRIQ is full and it needs to
auto-respond. Fortunately the PRI doesn't need to be enabled in order to
read this bit, so we can move pci_prg_resp_requires_prefix() to
add_device() and clear the bit in remove_device(). Thanks for catching this.
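In other words, something like this (fragments only; the exact spots inside
arm_smmu_add_device() and arm_smmu_remove_device() are assumed):

	/* in arm_smmu_add_device(): read the PRG Response PASID Required
	 * bit once at probe time, so the STE is correct even before PRI
	 * is enabled */
	if (dev_is_pci(dev))
		master->ste.prg_resp_needs_ssid =
			pci_prg_resp_requires_prefix(to_pci_dev(dev));

	/* in arm_smmu_remove_device(): */
	master->ste.prg_resp_needs_ssid = false;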

Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                               ` <5B08DA21.3070507-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-05-29 11:55                                 ` Jean-Philippe Brucker
       [not found]                                   ` <99ff4f89-86ef-a251-894c-8aa8a47d2a69-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-05-29 11:55 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	christian.koenig-5C7GfCeVMHo

(If possible, please reply in plain text to the list. Reading this in a
text-only reader is confusing, because all quotes have the same level)

On 26/05/18 04:53, Xu Zaibo wrote:
> I guess there may be some misunderstandings :).
> 
> From the current patches, 'iommu_sva_device_shutdown' is called by 'vfio_iommu_sva_shutdown', which
> is mainly used by 'vfio_iommu_type1_detach_group'; that is finally called by a process's release of the vfio facility
> automatically, or manually by 'VFIO_GROUP_UNSET_CONTAINER' in the process.
> 
> So, just imagine that two processes are working on the device with the IOPF feature, and the two do the following to enable SVA:
> 
>     1. open("/dev/vfio/vfio");
> 
>     2. open the group of the device by calling open("/dev/vfio/x"), but now,
>      I think the second process will fail to open the group because current VFIO thinks that one group can only be used by one process/VM;
>      at present, I use mediated devices to create more groups on the parent device to support multiple processes;
> 
>     3. VFIO_GROUP_SET_CONTAINER;
> 
>     4. VFIO_SET_IOMMU;
> 
>     5. VFIO_IOMMU_BIND;

I have a concern regarding your driver. With mdev you can't allow
processes to program the PASID themselves, since the parent device has a
single PASID table. You lose all isolation since processes could write
any value in the PASID field and access address spaces of other
processes bound to this parent device (even if the BIND call was for
other mdevs).

The wrapdrive driver has to mediate calls to bind(), and either program the
PASID into the device itself, or at least check that, when receiving a
SET_PASID ioctl from a process, the given PASID was actually allocated
to the process.
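A hypothetical version of that check in the mediating driver (every name
below is illustrative; none of it comes from the series):

#include <linux/list.h>
#include <linux/mutex.h>

/* Sketch: only accept a PASID that a BIND on this mdev produced. */
static int mdev_check_set_pasid(struct my_mdev *mdev, u32 pasid)
{
	struct my_mdev_pasid *p;
	int ret = -EACCES;

	mutex_lock(&mdev->pasid_lock);
	list_for_each_entry(p, &mdev->pasids, head) {
		if (p->pasid == pasid) {
			ret = 0;	/* allowed; program it into the HW */
			break;
		}
	}
	mutex_unlock(&mdev->pasid_lock);
	return ret;
}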

>     6. Do some work with the hardware working unit, filled by PASID, on the device;
> 
>     7. VFIO_IOMMU_UNBIND;
> 
>     8. VFIO_GROUP_UNSET_CONTAINER; --- here, we have to sleep to wait for the other process to finish the work of step 6;
> 
>     9. close(group); close(container);
> 
> 
> So, my idea is: each process releases only the params or facilities that belong to it when it shuts down the device,
> and the last process releases them all. Then, as in step 8 above, we
> wouldn't need to wait, possibly for a very long time if the other process is doing lots of work on the device.
Given that you need to notify the mediating driver of IOMMU_BIND calls
as explained above, you could do something similar for shutdown: from
the mdev driver, call iommu_sva_shutdown_device() only for the last mdev.
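
For example, with a refcount in the parent device (again just a sketch,
the names are made up since this depends on your driver structure):

	/* Called when one mdev is shut down */
	static void my_mdev_shutdown(struct my_mdev_state *st)
	{
		struct my_parent_dev *parent = st->parent;

		/* Release the state belonging to this process only... */
		my_release_process_state(st);

		/* ...and tear down SVA when the last mdev goes away */
		if (refcount_dec_and_test(&parent->mdev_count))
			iommu_sva_shutdown_device(parent->dev);
	}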

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                                   ` <99ff4f89-86ef-a251-894c-8aa8a47d2a69-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-29 12:24                                     ` Xu Zaibo
  0 siblings, 0 replies; 123+ messages in thread
From: Xu Zaibo @ 2018-05-29 12:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: ashok.raj-ral2JQCrhuEAvxtiuMwx3w, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	christian.koenig-5C7GfCeVMHo

Hi,

On 2018/5/29 19:55, Jean-Philippe Brucker wrote:
> (If possible, please reply in plain text to the list. Reading this in a
> text-only reader is confusing, because all quotes have the same level)
Sorry about that, I have reset Thunderbird, :) thanks.
> On 26/05/18 04:53, Xu Zaibo wrote:
>> I guess there may be some misunderstandings :).
>>
>>  From the current patches, 'iommu_sva_device_shutdown' is called by 'vfio_iommu_sva_shutdown', which
>> is mainly used by 'vfio_iommu_type1_detach_group'; that is finally called automatically on the processes'
>> release of the vfio facility, or manually through 'VFIO_GROUP_UNSET_CONTAINER' in the processes.
>>
>> So, just imagine that two processes are working on the device with the IOPF feature, and the two do the following to enable SVA:
>>
>>      1.open("/dev/vfio/vfio") ;
>>
>>     2.open the group of the device by calling open("/dev/vfio/x"); but now,
>>       I think the second process will fail to open the group because current VFIO thinks that one group can only be used by one process/VM.
>>       At present, I use a mediated device to create more groups on the parent device to support multiple processes;
>>
>>      3.VFIO_GROUP_SET_CONTAINER;
>>
>>      4.VFIO_SET_IOMMU;
>>
>>      5.VFIO_IOMMU_BIND;
> I have a concern regarding your driver. With mdev you can't allow
> processes to program the PASID themselves, since the parent device has a
> single PASID table. You lose all isolation since processes could write
> any value in the PASID field and access address spaces of other
> processes bound to this parent device (even if the BIND call was for
> other mdevs).
Yes, if wrapdrive does nothing about this PASID setting procedure in
kernel space, I think this security risk definitely exists.
> The wrap driver has to mediate calls to bind(), and either program the
> PASID into the device itself, or at least check that, when receiving a
> SET_PASID ioctl from a process, the given PASID was actually allocated
> to the process.
Yes, good advice, thanks.
>
>>      6.Do some work with the hardware working unit filled by the PASID on the device;
>>
>>     7.VFIO_IOMMU_UNBIND;
>>
>>      8.VFIO_GROUP_UNSET_CONTAINER; ---here, we have to sleep and wait for the other process to finish the work of step 6;
>>
>>      9. close(group); close(container);
>>
>>
>> So, my idea is: each process releases only the params or facilities that belong to it when it shuts down the device,
>> and the last process releases them all. Then, as in step 8 above, we
>> wouldn't need to wait, possibly for a very long time if the other process is doing lots of work on the device.
> Given that you need to notify the mediating driver of IOMMU_BIND calls
> as explained above, you could do something similar for shutdown: from
> the mdev driver, call iommu_sva_shutdown_device() only for the last mdev.
Currently, I have added an API in wrapdrive to check whether it is the
last mdev; when VFIO shuts down the device, it calls this API to do the
check first.

Thanks
Zaibo


^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [PATCH v2 21/40] iommu/arm-smmu-v3: Add support for Substream IDs
       [not found]     ` <20180511190641.23008-22-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-05-31 11:01       ` Bharat Kumar Gogada
       [not found]         ` <BLUPR0201MB1505AA55707BE2E13392FFAFA5630-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Bharat Kumar Gogada @ 2018-05-31 11:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ravikiran Gummaluri,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

> 
> At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
> address space to each device. SMMUv3 allows associating multiple address
> spaces with each device. In addition to the Stream ID (SID), which identifies
> a device, we can now have Substream IDs (SSID) identifying an address space.
> In PCIe lingo, the SID is called Requester ID (RID) and the SSID is called
> Process Address-Space ID (PASID).
> 
> Prepare the driver for SSID support, by adding context descriptor tables in
> STEs (previously a single static context descriptor). A complete
> stage-1 walk is now performed like this by the SMMU:
> 
>       Stream tables          Ctx. tables          Page tables
>         +--------+   ,------->+-------+   ,------->+-------+
>         :        :   |        :       :   |        :       :
>         +--------+   |        +-------+   |        +-------+
>    SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
>         +--------+            +-------+            +-------+
>         :        :            :       :            :       :
>         +--------+            +-------+            +-------+
> 
> We only implement one level of context descriptor table for now, but as with
> stream and page tables, an SSID can be split to target multiple levels of
> tables.
> 
> In all stream table entries, we set S1DSS=SSID0 mode, making translations
> without an SSID use context descriptor 0.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2: use GENMASK throughout SMMU patches
> ---
>  drivers/iommu/arm-smmu-v3-context.c | 141 +++++++++++++++++++++-------
>  drivers/iommu/arm-smmu-v3.c         |  82 +++++++++++++++-
>  drivers/iommu/iommu-pasid-table.h   |   7 ++
>  3 files changed, 190 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
> index 15d3d02c59b2..0969a3626110 100644
> --- a/drivers/iommu/arm-smmu-v3-context.c
> +++ b/drivers/iommu/arm-smmu-v3-context.c
> @@ -62,11 +62,14 @@ struct arm_smmu_cd {
>  #define pasid_entry_to_cd(entry) \
>  	container_of((entry), struct arm_smmu_cd, entry)
> 
> +struct arm_smmu_cd_table {
> +	__le64				*ptr;
> +	dma_addr_t			ptr_dma;
> +};
> +
>  struct arm_smmu_cd_tables {
>  	struct iommu_pasid_table	pasid;
> -
> -	void				*ptr;
> -	dma_addr_t			ptr_dma;
> +	struct arm_smmu_cd_table	table;
>  };
> 
>  #define pasid_to_cd_tables(pasid_table) \
> @@ -77,6 +80,36 @@ struct arm_smmu_cd_tables {
> 
>  static DEFINE_IDA(asid_ida);
> 
> +static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
> +					struct arm_smmu_cd_table *desc,
> +					size_t num_entries)
> +{
> +	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +	desc->ptr = dmam_alloc_coherent(dev, size, &desc->ptr_dma,
> +					GFP_ATOMIC | __GFP_ZERO);
> +	if (!desc->ptr) {
> +		dev_warn(dev, "failed to allocate context descriptor table\n");
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static void arm_smmu_free_cd_leaf_table(struct device *dev,
> +					struct arm_smmu_cd_table *desc,
> +					size_t num_entries)
> +{
> +	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
> +{
> +	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
> +}
> +
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  {
>  	u64 val = 0;
> @@ -95,34 +128,74 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  	return val;
>  }
> 
> -static void arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl,
> -				    struct arm_smmu_cd *cd)
> +static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
> +				   struct arm_smmu_cd *cd)
>  {
>  	u64 val;
> -	__u64 *cdptr = tbl->ptr;
> +	bool cd_live;
> +	__le64 *cdptr = arm_smmu_get_cd_ptr(tbl, ssid);
>  	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
> 
>  	/*
> -	 * We don't need to issue any invalidation here, as we'll invalidate
> -	 * the STE when installing the new entry anyway.
> +	 * This function handles the following cases:
> +	 *
> +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
> +	 * (2) Install a secondary CD, for SID+SSID traffic, followed by an
> +	 *     invalidation.
> +	 * (3) Update ASID of primary CD. This is allowed by atomically writing
> +	 *     the first 64 bits of the CD, followed by invalidation of the old
> +	 *     entry and mappings.
> +	 * (4) Remove a secondary CD and invalidate it.
>  	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
> +
> +	if (!cdptr)
> +		return -ENOMEM;
> +
> +	val = le64_to_cpu(cdptr[0]);
> +	cd_live = !!(val & CTXDESC_CD_0_V);
> +
> +	if (!cd) { /* (4) */
> +		cdptr[0] = 0;
> +	} else if (cd_live) { /* (3) */
> +		val &= ~CTXDESC_CD_0_ASID;
> +		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag);
> +
> +		cdptr[0] = cpu_to_le64(val);
> +		/*
> +		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
> +		 * this substream's traffic
> +		 */
> +	} else { /* (1) and (2) */
> +		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
> +		cdptr[2] = 0;
> +		cdptr[3] = cpu_to_le64(cd->mair);
> +
> +		/*
> +		 * STE is live, and the SMMU might fetch this CD at any
> +		 * time. Ensure it observes the rest of the CD before we
> +		 * enable it.
> +		 */
> +		iommu_pasid_flush(&tbl->pasid, ssid, true);
> +
> +
> +		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
>  #ifdef __BIG_ENDIAN
> -	      CTXDESC_CD_0_ENDI |
> +		      CTXDESC_CD_0_ENDI |
>  #endif
> -	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
> -	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag) |
> -	      CTXDESC_CD_0_V;
> +		      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
> +		      CTXDESC_CD_0_AA64 |
> +		      FIELD_PREP(CTXDESC_CD_0_ASID, cd->entry.tag) |
> +		      CTXDESC_CD_0_V;
> 
> -	if (cfg->stall)
> -		val |= CTXDESC_CD_0_S;
> +		if (cfg->stall)
> +			val |= CTXDESC_CD_0_S;
> 
> -	cdptr[0] = cpu_to_le64(val);
> +		cdptr[0] = cpu_to_le64(val);
> +	}
> 
> -	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK;
> -	cdptr[1] = cpu_to_le64(val);
> +	iommu_pasid_flush(&tbl->pasid, ssid, true);
> 
> -	cdptr[3] = cpu_to_le64(cd->mair);
> +	return 0;
>  }
> 
>  static void arm_smmu_free_cd(struct iommu_pasid_entry *entry)
> @@ -190,8 +263,10 @@ static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
>  	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
>  	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
> 
> -	arm_smmu_write_ctx_desc(tbl, cd);
> -	return 0;
> +	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
> +		return -EINVAL;
> +
> +	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
>  }
> 
>  static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
> @@ -199,30 +274,26 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
>  {
>  	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
> 
> -	arm_smmu_write_ctx_desc(tbl, NULL);
> +	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
> +		return;
> +
> +	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
>  }
> 
>  static struct iommu_pasid_table *
>  arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
>  {
> +	int ret;
>  	struct arm_smmu_cd_tables *tbl;
>  	struct device *dev = cfg->iommu_dev;
> 
> -	if (cfg->order) {
> -		/* TODO: support SSID */
> -		return NULL;
> -	}
> -
>  	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
>  	if (!tbl)
>  		return NULL;
> 
> -	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
> -				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
> -	if (!tbl->ptr) {
> -		dev_warn(dev, "failed to allocate context descriptor\n");
> +	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
> +	if (ret)
>  		goto err_free_tbl;
> -	}
> 
>  	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
>  		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
> @@ -230,7 +301,8 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
>  		.set_entry		= arm_smmu_set_cd,
>  		.clear_entry		= arm_smmu_clear_cd,
>  	};
> -	cfg->base = tbl->ptr_dma;
> +	cfg->base			= tbl->table.ptr_dma;
> +	cfg->arm_smmu.s1fmt		= ARM_SMMU_S1FMT_LINEAR;
> 
>  	return &tbl->pasid;
> 
> @@ -246,8 +318,7 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
>  	struct device *dev = cfg->iommu_dev;
>  	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
> 
> -	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
> -			   tbl->ptr, tbl->ptr_dma);
> +	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
>  	devm_kfree(dev, tbl);
>  }
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 68764a200e44..16b08f2fb8ac 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -224,10 +224,14 @@
>  #define STRTAB_STE_0_CFG_S2_TRANS	6
> 
>  #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
> -#define STRTAB_STE_0_S1FMT_LINEAR	0
>  #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
> 
> +#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
> +#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
> +#define STRTAB_STE_1_S1DSS_BYPASS	0x1
> +#define STRTAB_STE_1_S1DSS_SSID0	0x2
> +
>  #define STRTAB_STE_1_S1C_CACHE_NC	0UL
>  #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
>  #define STRTAB_STE_1_S1C_CACHE_WT	2UL
> @@ -275,6 +279,7 @@
>  #define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
>  #define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
> 
> +#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
>  #define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
>  #define CMDQ_CFGI_1_LEAF		(1UL << 0)
>  #define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
> @@ -381,8 +386,11 @@ struct arm_smmu_cmdq_ent {
> 
>  		#define CMDQ_OP_CFGI_STE	0x3
>  		#define CMDQ_OP_CFGI_ALL	0x4
> +		#define CMDQ_OP_CFGI_CD		0x5
> +		#define CMDQ_OP_CFGI_CD_ALL	0x6
>  		struct {
>  			u32			sid;
> +			u32			ssid;
>  			union {
>  				bool		leaf;
>  				u8		span;
> @@ -555,6 +563,7 @@ struct arm_smmu_master_data {
>  	struct list_head		list; /* domain->devices */
> 
>  	struct device			*dev;
> +	size_t				ssid_bits;
>  };
> 
>  /* SMMU private data for an IOMMU domain */
> @@ -753,10 +762,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
>  		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
>  		break;
> +	case CMDQ_OP_CFGI_CD:
> +		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
> +		/* Fallthrough */
>  	case CMDQ_OP_CFGI_STE:
>  		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
>  		break;
> +	case CMDQ_OP_CFGI_CD_ALL:
> +		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
> +		break;
>  	case CMDQ_OP_CFGI_ALL:
>  		/* Cover the entire SID range */
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> @@ -1048,8 +1063,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  	}
> 
>  	if (ste->s1_cfg) {
> +		struct iommu_pasid_table_cfg *cfg = &ste->s1_cfg->tables;
> +
>  		BUG_ON(ste_live);
>  		dst[1] = cpu_to_le64(
> +			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
> 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
> 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
> 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
> @@ -1063,7 +1081,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
> 
> 		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK) |
> -			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
> +			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
> +			FIELD_PREP(STRTAB_STE_0_S1CDMAX, cfg->order) |
> +			FIELD_PREP(STRTAB_STE_0_S1FMT, cfg->arm_smmu.s1fmt);
>  	}
> 
>  	if (ste->s2_cfg) {
> @@ -1352,17 +1372,62 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
>  };
> 
>  /* PASID TABLE API */
> +static void __arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
> +			       struct arm_smmu_cmdq_ent *cmd)
> +{
> +	size_t i;
> +	unsigned long flags;
> +	struct arm_smmu_master_data *master;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, list) {
> +		struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +		for (i = 0; i < fwspec->num_ids; i++) {
> +			cmd->cfgi.sid = fwspec->ids[i];
> +			arm_smmu_cmdq_issue_cmd(smmu, cmd);
> +		}
> +	}
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	__arm_smmu_tlb_sync(smmu);
> +}
> +
>  static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
>  {
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode	= CMDQ_OP_CFGI_CD_ALL,

Hi Jean, shouldn't the opcode here be CMDQ_OP_CFGI_CD (0x5)?

> +		.cfgi	= {
> +			.ssid	= ssid,
> +			.leaf	= leaf,
> +		},
> +	};
> +
> +	__arm_smmu_sync_cd(cookie, &cmd);
>  }
> 

Regards,
Bharat

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 21/40] iommu/arm-smmu-v3: Add support for Substream IDs
       [not found]         ` <BLUPR0201MB1505AA55707BE2E13392FFAFA5630-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-06-01 10:46           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-06-01 10:46 UTC (permalink / raw)
  To: Bharat Kumar Gogada,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ravikiran Gummaluri,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, christian.koenig-5C7GfCeVMHo

On 31/05/18 12:01, Bharat Kumar Gogada wrote:
>>  static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
>>  {
>> +	struct arm_smmu_cmdq_ent cmd = {
>> +		.opcode	= CMDQ_OP_CFGI_CD_ALL,
> 
> Hi Jean, shouldn't the opcode here be CMDQ_OP_CFGI_CD (0x5)?

Woops, nice catch!

I pushed fixes for all comments so far to branch sva/current
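
(For reference, the fix here is presumably a one-liner in
arm_smmu_sync_cd():

	-		.opcode	= CMDQ_OP_CFGI_CD_ALL,
	+		.opcode	= CMDQ_OP_CFGI_CD,
)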

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]                           ` <20180526022445.GA6069@kllp05>
  2018-06-11 16:10                             ` Kenneth Lee
@ 2018-06-11 16:10                             ` Kenneth Lee
  1 sibling, 0 replies; 123+ messages in thread
From: Kenneth Lee @ 2018-06-11 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: xieyisheng1, liubo95, kvm, linux-pci, xuzaibo, Will Deacon,
	okaya, linux-mm, liguozhu, yi.l.liu, ashok.raj,
	Jean-Philippe Brucker, tn, joro, iommu, bharatku, linux-acpi,
	liudongdong3, rfranz, devic

On Sat, May 26, 2018 at 10:24:45AM +0800, Kenneth Lee wrote:
> Date: Sat, 26 May 2018 10:24:45 +0800
> From: Kenneth Lee <Kenneth-Lee-2012@foxmail.com>
> To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>, Jean-Philippe Brucker
>  <jean-philippe.brucker@arm.com>, "xieyisheng1@huawei.com"
>  <xieyisheng1@huawei.com>, "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
>  "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
>  "xuzaibo@huawei.com" <xuzaibo@huawei.com>, Will Deacon
>  <Will.Deacon@arm.com>, "okaya@codeaurora.org" <okaya@codeaurora.org>,
>  "linux-mm@kvack.org" <linux-mm@kvack.org>, "yi.l.liu@intel.com"
>  <yi.l.liu@intel.com>, "ashok.raj@intel.com" <ashok.raj@intel.com>,
>  "tn@semihalf.com" <tn@semihalf.com>, "joro@8bytes.org" <joro@8bytes.org>,
>  "robdclark@gmail.com" <robdclark@gmail.com>, "bharatku@xilinx.com"
>  <bharatku@xilinx.com>, "linux-acpi@vger.kernel.org"
>  <linux-acpi@vger.kernel.org>, "liudongdong3@huawei.com"
>  <liudongdong3@huawei.com>, "rfranz@cavium.com" <rfranz@cavium.com>,
>  "devicetree@vger.kernel.org" <devicetree@vger.kernel.org>,
>  "kevin.tian@intel.com" <kevin.tian@intel.com>, Jacob Pan
>  <jacob.jun.pan@linux.intel.com>, "alex.williamson@redhat.com"
>  <alex.williamson@redhat.com>, "rgummal@xilinx.com" <rgummal@xilinx.com>,
>  "thunder.leizhen@huawei.com" <thunder.leizhen@huawei.com>,
>  "linux-arm-kernel@lists.infradead.org"
>  <linux-arm-kernel@lists.infradead.org>, "shunyong.yang@hxt-semitech.com"
>  <shunyong.yang@hxt-semitech.com>, "dwmw2@infradead.org"
>  <dwmw2@infradead.org>, "liubo95@huawei.com" <liubo95@huawei.com>,
>  "jcrouse@codeaurora.org" <jcrouse@codeaurora.org>,
>  "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>,
>  Robin Murphy <Robin.Murphy@arm.com>, "christian.koenig@amd.com"
>  <christian.koenig@amd.com>, "nwatters@codeaurora.org"
>  <nwatters@codeaurora.org>, "baolu.lu@linux.intel.com"
>  <baolu.lu@linux.intel.com>, liguozhu@hisilicon.com
> Subject: Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
> Message-ID: <20180526022445.GA6069@kllp05>
> 
> On Fri, May 25, 2018 at 09:39:59AM +0100, Jonathan Cameron wrote:
> > Date: Fri, 25 May 2018 09:39:59 +0100
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > To: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > CC: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
> >  "xieyisheng1@huawei.com" <xieyisheng1@huawei.com>, "kvm@vger.kernel.org"
> >  <kvm@vger.kernel.org>, "linux-pci@vger.kernel.org"
> >  <linux-pci@vger.kernel.org>, "xuzaibo@huawei.com" <xuzaibo@huawei.com>,
> >  Will Deacon <Will.Deacon@arm.com>, "okaya@codeaurora.org"
> >  <okaya@codeaurora.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>,
> >  "yi.l.liu@intel.com" <yi.l.liu@intel.com>, "ashok.raj@intel.com"
> >  <ashok.raj@intel.com>, "tn@semihalf.com" <tn@semihalf.com>,
> >  "joro@8bytes.org" <joro@8bytes.org>, "robdclark@gmail.com"
> >  <robdclark@gmail.com>, "bharatku@xilinx.com" <bharatku@xilinx.com>,
> >  "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
> >  "liudongdong3@huawei.com" <liudongdong3@huawei.com>, "rfranz@cavium.com"
> >  <rfranz@cavium.com>, "devicetree@vger.kernel.org"
> >  <devicetree@vger.kernel.org>, "kevin.tian@intel.com"
> >  <kevin.tian@intel.com>, Jacob Pan <jacob.jun.pan@linux.intel.com>,
> >  "alex.williamson@redhat.com" <alex.williamson@redhat.com>,
> >  "rgummal@xilinx.com" <rgummal@xilinx.com>, "thunder.leizhen@huawei.com"
> >  <thunder.leizhen@huawei.com>, "linux-arm-kernel@lists.infradead.org"
> >  <linux-arm-kernel@lists.infradead.org>, "shunyong.yang@hxt-semitech.com"
> >  <shunyong.yang@hxt-semitech.com>, "dwmw2@infradead.org"
> >  <dwmw2@infradead.org>, "liubo95@huawei.com" <liubo95@huawei.com>,
> >  "jcrouse@codeaurora.org" <jcrouse@codeaurora.org>,
> >  "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>,
> >  Robin Murphy <Robin.Murphy@arm.com>, "christian.koenig@amd.com"
> >  <christian.koenig@amd.com>, "nwatters@codeaurora.org"
> >  <nwatters@codeaurora.org>, "baolu.lu@linux.intel.com"
> >  <baolu.lu@linux.intel.com>, liguozhu@hisilicon.com,
> >  kenneth-lee-2012@foxmail.com
> > Subject: Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
> > Message-ID: <20180525093959.000040a7@huawei.com>
> > 
> > +CC Kenneth Lee
> > 
> > On Fri, 25 May 2018 09:33:11 +0300
> > Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> > 
> > > On Thu, May 24, 2018 at 04:04:39PM +0100, Jean-Philippe Brucker wrote:
> > > > On 24/05/18 12:50, Ilias Apalodimas wrote:  
> > > > >> Interesting, I hadn't thought about this use-case before. At first I
> > > > >> thought you were talking about mdev devices assigned to VMs, but I think
> > > > >> you're referring to mdevs assigned to userspace drivers instead? Out of
> > > > >> curiosity, is it only theoretical or does someone actually need this?  
> > > > > 
> > > > > There have been some non-upstreamed efforts to have mdev and produce userspace
> > > > > drivers. Huawei is using it in what they call "wrapdrive" for crypto devices, and
> > > > > we did a proof of concept for ethernet interfaces. At the time we chose not to
> > > > > involve the IOMMU for the reason you mentioned, but having it there would be
> > > > > good.
> > > > 
> > > > I'm guessing there were good reasons to do it that way but I wonder, is
> > > > it not simpler to just have the kernel driver create a /dev/foo, with a
> > > > standard ioctl/mmap/poll interface? Here VFIO adds a layer of
> > > > indirection, and since the mediating driver has to implement these
> > > > operations already, what is gained?  
> > > The best reason I can come up with is "common code". You already have one API
> > > doing that for you, so why replicate it in a /dev file?
> > > The mdev approach still needs extensions to support what we tried to do (i.e.
> > > the mdev bus might need to have access to iommu_ops), but as far as I understand it's
> > > a possible case.
> 
> Hi Jean, please allow me to share my understanding here:
> https://zhuanlan.zhihu.com/p/35489035
> 
> The reason we do not use the /dev/foo scheme is that the devices to be
> shared are programmable accelerators. We cannot fix up the kernel driver for them.
> > > > 
> > > > Thanks,
> > > > Jean  
> > 
> > 
> 
> -- 
> 			-Kenneth Lee (Hisilicon)

I just found this mail was missed in the mailing list. I tried it once
again.

-- 
			-Kenneth Lee (Hisilicon)

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-05-25  8:39                         ` Jonathan Cameron
                                             ` (2 preceding siblings ...)
       [not found]                           ` <20180526022445.GA6069@kllp05>
@ 2018-06-11 16:32                           ` Kenneth Lee
  3 siblings, 0 replies; 123+ messages in thread
From: Kenneth Lee @ 2018-06-11 16:32 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: xieyisheng1, liubo95, kvm, linux-pci, xuzaibo, Will Deacon,
	okaya, linux-mm, yi.l.liu, ashok.raj, Jean-Philippe Brucker, tn,
	joro, iommu, bharatku, linux-acpi, liudongdong3, rfranz

On Fri, May 25, 2018 at 09:39:59AM +0100, Jonathan Cameron wrote:
> Date: Fri, 25 May 2018 09:39:59 +0100
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> To: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> CC: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
>  "xieyisheng1@huawei.com" <xieyisheng1@huawei.com>, "kvm@vger.kernel.org"
>  <kvm@vger.kernel.org>, "linux-pci@vger.kernel.org"
>  <linux-pci@vger.kernel.org>, "xuzaibo@huawei.com" <xuzaibo@huawei.com>,
>  Will Deacon <Will.Deacon@arm.com>, "okaya@codeaurora.org"
>  <okaya@codeaurora.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>,
>  "yi.l.liu@intel.com" <yi.l.liu@intel.com>, "ashok.raj@intel.com"
>  <ashok.raj@intel.com>, "tn@semihalf.com" <tn@semihalf.com>,
>  "joro@8bytes.org" <joro@8bytes.org>, "robdclark@gmail.com"
>  <robdclark@gmail.com>, "bharatku@xilinx.com" <bharatku@xilinx.com>,
>  "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
>  "liudongdong3@huawei.com" <liudongdong3@huawei.com>, "rfranz@cavium.com"
>  <rfranz@cavium.com>, "devicetree@vger.kernel.org"
>  <devicetree@vger.kernel.org>, "kevin.tian@intel.com"
>  <kevin.tian@intel.com>, Jacob Pan <jacob.jun.pan@linux.intel.com>,
>  "alex.williamson@redhat.com" <alex.williamson@redhat.com>,
>  "rgummal@xilinx.com" <rgummal@xilinx.com>, "thunder.leizhen@huawei.com"
>  <thunder.leizhen@huawei.com>, "linux-arm-kernel@lists.infradead.org"
>  <linux-arm-kernel@lists.infradead.org>, "shunyong.yang@hxt-semitech.com"
>  <shunyong.yang@hxt-semitech.com>, "dwmw2@infradead.org"
>  <dwmw2@infradead.org>, "liubo95@huawei.com" <liubo95@huawei.com>,
>  "jcrouse@codeaurora.org" <jcrouse@codeaurora.org>,
>  "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>,
>  Robin Murphy <Robin.Murphy@arm.com>, "christian.koenig@amd.com"
>  <christian.koenig@amd.com>, "nwatters@codeaurora.org"
>  <nwatters@codeaurora.org>, "baolu.lu@linux.intel.com"
>  <baolu.lu@linux.intel.com>, liguozhu@hisilicon.com,
>  kenneth-lee-2012@foxmail.com
> Subject: Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
> Message-ID: <20180525093959.000040a7@huawei.com>
> Organization: Huawei
> X-Mailer: Claws Mail 3.15.0 (GTK+ 2.24.31; x86_64-w64-mingw32)
> 
> +CC Kenneth Lee
> 
> On Fri, 25 May 2018 09:33:11 +0300
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> 
> > On Thu, May 24, 2018 at 04:04:39PM +0100, Jean-Philippe Brucker wrote:
> > > On 24/05/18 12:50, Ilias Apalodimas wrote:  
> > > >> Interesting, I hadn't thought about this use-case before. At first I
> > > >> thought you were talking about mdev devices assigned to VMs, but I think
> > > >> you're referring to mdevs assigned to userspace drivers instead? Out of
> > > >> curiosity, is it only theoretical or does someone actually need this?  
> > > > 
> > > > There have been some non-upstreamed efforts to have mdev and produce userspace
> > > > drivers. Huawei is using it in what they call "wrapdrive" for crypto devices, and
> > > > we did a proof of concept for ethernet interfaces. At the time we chose not to
> > > > involve the IOMMU for the reason you mentioned, but having it there would be
> > > > good.
> > > 
> > > I'm guessing there were good reasons to do it that way but I wonder, is
> > > it not simpler to just have the kernel driver create a /dev/foo, with a
> > > standard ioctl/mmap/poll interface? Here VFIO adds a layer of
> > > indirection, and since the mediating driver has to implement these
> > > operations already, what is gained?  
> > The best reason I can come up with is "common code". You already have one API
> > doing that for you, so why replicate it in a /dev file?
> > The mdev approach still needs extensions to support what we tried to do (i.e.
> > the mdev bus might need to have access to iommu_ops), but as far as I understand it's
> > a possible case.

Hi Jean, please allow me to share my understanding here:
https://zhuanlan.zhihu.com/p/35489035

The reason we do not use the /dev/foo scheme is that the devices to be
shared are programmable accelerators. We cannot fix up the kernel driver for
them.

> > > 
> > > Thanks,
> > > Jean  
> 
> 

(P.S. I sent this mail on May 26 from my public email account, but it
seems the email server was blocked. I resent it from my company account
after my colleague told me just now. Sorry for the inconvenience.)

-- 
			-Kenneth(Hisilicon)

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]     ` <20180511190641.23008-14-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-17 15:58       ` Jonathan Cameron
  2018-05-23  9:38       ` Xu Zaibo
@ 2018-08-27  8:06       ` Xu Zaibo
       [not found]         ` <5B83B11E.7010807-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-08-27  8:06 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu, fanghao11,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean,

On 2018/5/12 3:06, Jean-Philippe Brucker wrote:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 1aa7b82e8169..dc07752c8fe8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -665,6 +665,82 @@ struct vfio_iommu_type1_dma_unmap {
>   #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>   #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>   
> +/*
> + * VFIO_IOMMU_BIND_PROCESS
> + *
> + * Allocate a PASID for a process address space, and use it to attach this
> + * process to all devices in the container. Devices can then tag their DMA
> + * traffic with the returned @pasid to perform transactions on the associated
> + * virtual address space. Mapping and unmapping buffers is performed by standard
> + * functions such as mmap and malloc.
> + *
> + * If flag is VFIO_IOMMU_BIND_PID, @pid contains the pid of a foreign process to
> + * bind. Otherwise the current task is bound. Given that the caller owns the
> + * device, setting this flag grants the caller read and write permissions on the
> + * entire address space of foreign process described by @pid. Therefore,
> + * permission to perform the bind operation on a foreign process is governed by
> + * the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See man ptrace(2)
> + * for more information.
> + *
> + * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
> + * ID is unique to a process and can be used on all devices in the container.
> + *
> + * On fork, the child inherits the device fd and can use the bonds setup by its
> + * parent. Consequently, the child has R/W access on the address spaces bound by
> + * its parent. After an execv, the device fd is closed and the child doesn't
> + * have access to the address space anymore.
> + *
> + * To remove a bond between process and container, VFIO_IOMMU_UNBIND ioctl is
> + * issued with the same parameters. If a pid was specified in VFIO_IOMMU_BIND,
> + * it should also be present for VFIO_IOMMU_UNBIND. Otherwise unbind the current
> + * task from the container.
> + */
> +struct vfio_iommu_type1_bind_process {
> +	__u32	flags;
> +#define VFIO_IOMMU_BIND_PID		(1 << 0)
> +	__u32	pasid;
As I am doing some work on the SVA patch set, I have been wondering why
user space needs this pasid. Would it be more reasonable to set the
pasid into all devices under the VFIO container through a callback
function from 'vfio_devices' while the 'VFIO_IOMMU_BIND_PROCESS' command
is executed in kernel land? I am not sure, because no suitable callback
exists in 'vfio_device' at present.

Thanks,
Zaibo
> +	__s32	pid;
> +};
> +
> +/*
> + * Only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
> + * vfio_iommu_type1_bind_process in data.
> + */
> +struct vfio_iommu_type1_bind {
> +	__u32	argsz;
> +	__u32	flags;
> +#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
> +	__u8	data[];
> +};
> +
> +/*
> + * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_bind)
> + *
> + * Manage address spaces of devices in this container. Initially a TYPE1
> + * container can only have one address space, managed with
> + * VFIO_IOMMU_MAP/UNMAP_DMA.
> + *
> + * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
> + * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
> + * tables, and BIND manages the stage-1 (guest) page tables. Other types of
> + * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
> + * non-PASID traffic and BIND controls PASID traffic. But this depends on the
> + * underlying IOMMU architecture and isn't guaranteed.
> + *
> + * Availability of this feature depends on the device, its bus, the underlying
> + * IOMMU and the CPU architecture.
> + *
> + * returns: 0 on success, -errno on failure.
> + */
> +#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +/*
> + * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_bind)
> + *
> + * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
> + */
> +#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
> +
>   /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>   
>   /*

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]         ` <5B83B11E.7010807-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-08-31 13:34           ` Jean-Philippe Brucker
       [not found]             ` <1d5b6529-4e5a-723c-3f1b-dd5a9adb490c-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-08-31 13:34 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu, fanghao11,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Zaibo,

On 27/08/18 09:06, Xu Zaibo wrote:
>> +struct vfio_iommu_type1_bind_process {
>> +    __u32    flags;
>> +#define VFIO_IOMMU_BIND_PID        (1 << 0)
>> +    __u32    pasid;
> As I am doing some work on the SVA patch set, I have been wondering why
> user space needs this pasid. Would it be more reasonable to set the
> pasid into all devices under the VFIO container through a callback
> function from 'vfio_devices' while the 'VFIO_IOMMU_BIND_PROCESS' command
> is executed in kernel land? I am not sure, because no suitable callback
> exists in 'vfio_device' at present.

When using vfio-pci, the kernel doesn't know how to program the PASID
into the device because the only kernel driver for the device is the
generic vfio-pci module. The PCI specification doesn't describe a way of
setting up the PASID, it's vendor-specific. Only the userspace
application owning the device (and calling VFIO_IOMMU_BIND) knows how to
do it, so we return the allocated PASID.

Note that unlike vfio-mdev where applications share slices of a
function, with vfio-pci one application owns the whole function so it's
safe to let userspace set the PASID in hardware. With vfio-mdev it's the
kernel driver that should be in charge of setting the PASID as you
described, and we wouldn't have a reason to return the PASID in the
vfio_iommu_type1_bind_process structure.
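
To make the flow concrete, a vfio-pci application would do roughly the
following (a sketch against the uapi in this patch; container_fd and mmio
(a uint8_t * from mmap'ing a BAR) are assumed to come from the usual VFIO
setup, and the register offset 0x40 is imaginary since PASID programming
is vendor-specific):

	struct vfio_iommu_type1_bind_process bind_proc = { .flags = 0 };
	struct vfio_iommu_type1_bind *bind;

	bind = calloc(1, sizeof(*bind) + sizeof(bind_proc));
	bind->argsz = sizeof(*bind) + sizeof(bind_proc);
	bind->flags = VFIO_IOMMU_BIND_PROCESS;
	memcpy(bind->data, &bind_proc, sizeof(bind_proc));

	if (!ioctl(container_fd, VFIO_IOMMU_BIND, bind)) {
		memcpy(&bind_proc, bind->data, sizeof(bind_proc));
		/* Vendor-specific: tell the device which PASID to put in
		 * its DMA transactions */
		*(volatile uint32_t *)(mmio + 0x40) = bind_proc.pasid;
	}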

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]             ` <1d5b6529-4e5a-723c-3f1b-dd5a9adb490c-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-01  2:23               ` Xu Zaibo
       [not found]                 ` <5B89F818.7060300-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-09-01  2:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu, fanghao11,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean,

On 2018/8/31 21:34, Jean-Philippe Brucker wrote:
> On 27/08/18 09:06, Xu Zaibo wrote:
>>> +struct vfio_iommu_type1_bind_process {
>>> +    __u32    flags;
>>> +#define VFIO_IOMMU_BIND_PID        (1 << 0)
>>> +    __u32    pasid;
>> As I am doing some work on the SVA patch set, I have been wondering why
>> user space needs this pasid. Would it be more reasonable to set the
>> pasid into all devices under the VFIO container through a callback
>> function from 'vfio_devices' while the 'VFIO_IOMMU_BIND_PROCESS' command
>> is executed in kernel land? I am not sure, because no suitable callback
>> exists in 'vfio_device' at present.
> When using vfio-pci, the kernel doesn't know how to program the PASID
> into the device because the only kernel driver for the device is the
> generic vfio-pci module. The PCI specification doesn't describe a way of
> setting up the PASID, it's vendor-specific. Only the userspace
> application owning the device (and calling VFIO_IOMMU_BIND) knows how to
> do it, so we return the allocated PASID.
>
> Note that unlike vfio-mdev where applications share slices of a
> function, with vfio-pci one application owns the whole function so it's
> safe to let userspace set the PASID in hardware. With vfio-mdev it's the
> kernel driver that should be in charge of setting the PASID as you
> described, and we wouldn't have a reason to return the PASID in the
> vfio_iommu_type1_bind_process structure.
Understood. But I still have the following confusion:

As one application takes the whole function while using VFIO-PCI, why do
the application and the function need to enable the PASID capability?
(Since just one I/O page table is enough for them.)

Maybe I misunderstood; I hope you can help me clear it up. Thank you very much.

Thanks,
Zaibo


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                 ` <5B89F818.7060300-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-09-03 10:34                   ` Jean-Philippe Brucker
       [not found]                     ` <3a961aff-e830-64bb-b6a9-14e08de1abf5-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-03 10:34 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	fanghao11, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, Robin Murphy,
	christian.koenig-5C7GfCeVMHo

On 01/09/18 03:23, Xu Zaibo wrote:
> As one application takes the whole function while using VFIO-PCI, why do
> the application and the function need to enable the PASID capability?
> (Since just one I/O page table is enough for them.)

At the moment the series doesn't provide support for SVA without PASID
(on the I/O page fault path, 08/40). In addition the BIND ioctl could be
used by the owner application to bind other processes (slaves) and
perform sub-assignment. But that feature is incomplete because we don't
send stop_pasid notification to the owner when a slave dies.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                     ` <3a961aff-e830-64bb-b6a9-14e08de1abf5-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-04  2:12                       ` Xu Zaibo
       [not found]                         ` <5B8DEA15.7020404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-09-04  2:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	fanghao11, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, Robin Murphy,
	christian.koenig-5C7GfCeVMHo



On 2018/9/3 18:34, Jean-Philippe Brucker wrote:
> On 01/09/18 03:23, Xu Zaibo wrote:
>> As one application takes a whole function when using VFIO-PCI, why do
>> the application and the function need to enable the PASID capability?
>> (Since just one I/O page table is enough for them.)
> At the moment the series doesn't provide support for SVA without PASID
> (on the I/O page fault path, 08/40). In addition the BIND ioctl could be
> used by the owner application to bind other processes (slaves) and
> perform sub-assignment. But that feature is incomplete because we don't
> send stop_pasid notification to the owner when a slave dies.
>
So, could I understand it like this?

     1. Once the series is complete, a VFIO-PCI device can be held by
         only one process through the bind IOCTL command, without a
         PASID (that is, without the PASID being exposed to user space).

     2. To support multiple processes on a VFIO-PCI device with the SVA
         series, a primary process with multiple secondary processes
         must be deployed, just like DPDK (https://www.dpdk.org/).
         And the PASID still has to be exposed to user land.


Thanks,
Zaibo

.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                         ` <5B8DEA15.7020404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-09-04 10:57                           ` Jean-Philippe Brucker
       [not found]                             ` <bc27f902-4d12-21b7-b9e9-18bcae170503-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-04 10:57 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	fanghao11, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, Robin Murphy,
	christian.koenig-5C7GfCeVMHo

On 04/09/2018 03:12, Xu Zaibo wrote:
> On 2018/9/3 18:34, Jean-Philippe Brucker wrote:
>> On 01/09/18 03:23, Xu Zaibo wrote:
>>> As one application takes a whole function when using VFIO-PCI, why do
>>> the application and the function need to enable the PASID capability?
>>> (Since just one I/O page table is enough for them.)
>> At the moment the series doesn't provide support for SVA without PASID
>> (on the I/O page fault path, 08/40). In addition the BIND ioctl could be
>> used by the owner application to bind other processes (slaves) and
>> perform sub-assignment. But that feature is incomplete because we don't
>> send stop_pasid notification to the owner when a slave dies.
>>
> So, could I understand it like this?
> 
>      1. Once the series is complete, a VFIO-PCI device can be held by
>          only one process through the bind IOCTL command, without a
>          PASID (that is, without the PASID being exposed to user space).

It could, but isn't supported at the moment. In addition to adding
support in the I/O page fault code, we'd also need to update the VFIO
API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
The case you describe isn't compatible with MAP/UNMAP, since the process
manages the shared address space with mmap or malloc. We'd probably need
to introduce a new VFIO IOMMU type, in which case the bind could be
performed implicitly when the process does VFIO_SET_IOMMU. Then the
process wouldn't need to send an additional BIND IOCTL.
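To illustrate the difference, here is a rough userspace sketch; the
MAP_DMA structure below is the existing VFIO UAPI, nothing from this
series:

	/* Classic VFIO_TYPE1: every DMA buffer is mapped explicitly */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)buf,
		.iova  = iova,
		.size  = len,
	};
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);

	/* With SVA there is nothing to MAP/UNMAP: the process gets memory
	 * from malloc() or mmap() and hands virtual addresses directly to
	 * the device, hence the need for a new IOMMU type. */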

>      2. To support multiple processes on a VFIO-PCI device with the SVA
>          series, a primary process with multiple secondary processes
>          must be deployed, just like DPDK (https://www.dpdk.org/).
>          And the PASID still has to be exposed to user land.

Right. A third case, also implemented by this patch (and complete), is
the primary process simply doing a BIND for itself, and using the
returned PASID to share its own address space with the device.
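A sketch of that third case, using the structure quoted at the top of
this thread; the ioctl request name is simplified here, and the device
register is made up, since programming the PASID into the device is
vendor-specific:

	struct vfio_iommu_type1_bind_process bind = {
		.flags = 0,	/* no VFIO_IOMMU_BIND_PID: bind the caller */
	};

	/* Bind the calling process; the kernel fills in bind.pasid */
	if (ioctl(container_fd, VFIO_IOMMU_BIND_PROCESS, &bind) == -1)
		err(1, "bind");

	/* Tell the device to use this PASID. MY_PASID_REG is imaginary;
	 * the PCI spec leaves this step entirely vendor-specific. */
	*(volatile uint32_t *)(dev_regs + MY_PASID_REG) = bind.pasid;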

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                             ` <bc27f902-4d12-21b7-b9e9-18bcae170503-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-05  3:15                               ` Xu Zaibo
       [not found]                                 ` <5B8F4A59.20004-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Xu Zaibo @ 2018-09-05  3:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, liguozhu-C8/M+/jPZTeaMJb+Lgu22Q,
	fanghao11, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, Robin Murphy,
	christian.koenig-5C7GfCeVMHo

Hi,

On 2018/9/4 18:57, Jean-Philippe Brucker wrote:
> On 04/09/2018 03:12, Xu Zaibo wrote:
>> On 2018/9/3 18:34, Jean-Philippe Brucker wrote:
>>> On 01/09/18 03:23, Xu Zaibo wrote:
>>>> As one application takes a whole function when using VFIO-PCI, why do
>>>> the application and the function need to enable the PASID capability?
>>>> (Since just one I/O page table is enough for them.)
>>> At the moment the series doesn't provide support for SVA without PASID
>>> (on the I/O page fault path, 08/40). In addition the BIND ioctl could be
>>> used by the owner application to bind other processes (slaves) and
>>> perform sub-assignment. But that feature is incomplete because we don't
>>> send stop_pasid notification to the owner when a slave dies.
>>>
>> So, could I understand it like this?
>>
>>       1. Once the series is complete, a VFIO-PCI device can be held by
>>           only one process through the bind IOCTL command, without a
>>           PASID (that is, without the PASID being exposed to user space).
> It could, but isn't supported at the moment. In addition to adding
> support in the I/O page fault code, we'd also need to update the VFIO
> API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
> The case you describe isn't compatible with MAP/UNMAP, since the process
> manages the shared address space with mmap or malloc. We'd probably need
> to introduce a new VFIO IOMMU type, in which case the bind could be
> performed implicitly when the process does VFIO_SET_IOMMU. Then the
> process wouldn't need to send an additional BIND IOCTL.
OK, got it. This is the legacy mode, so all the VFIO APIs are kept
unchanged?
>>       2. To support multiple processes on a VFIO-PCI device with the SVA
>>           series, a primary process with multiple secondary processes
>>           must be deployed, just like DPDK (https://www.dpdk.org/).
>>           And the PASID still has to be exposed to user land.
> Right. A third case, also implemented by this patch (and complete), is
> the primary process simply doing a BIND for itself, and using the
> returned PASID to share its own address space with the device.
>
OK. But I am worried that the solution of one primary process with
several secondary ones is a little bit limited. Maybe users don't want
to depend on the primary process. :)



Thanks,
Zaibo

.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                                 ` <5B8F4A59.20004-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2018-09-05 11:02                                   ` Jean-Philippe Brucker
       [not found]                                     ` <b51107b8-a525-13ce-f4c3-d423b8502c27-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-05 11:02 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: Will Deacon, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, fanghao11,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, Robin Murphy,
	christian.koenig-5C7GfCeVMHo

On 05/09/2018 04:15, Xu Zaibo wrote:
>>>       1. Once the series is complete, a VFIO-PCI device can be held by
>>>           only one process through the bind IOCTL command, without a
>>>           PASID (that is, without the PASID being exposed to user space).
>> It could, but isn't supported at the moment. In addition to adding
>> support in the I/O page fault code, we'd also need to update the VFIO
>> API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
>> The case you describe isn't compatible with MAP/UNMAP, since the process
>> manages the shared address space with mmap or malloc. We'd probably need
>> to introduce a new VFIO IOMMU type, in which case the bind could be
>> performed implicitly when the process does VFIO_SET_IOMMU. Then the
>> process wouldn't need to send an additional BIND IOCTL.
> OK, got it. This is the legacy mode, so all the VFIO APIs are kept
> unchanged?

Yes, existing VFIO semantics are preserved.

>>>       2. To support multiple processes on a VFIO-PCI device with the SVA
>>>           series, a primary process with multiple secondary processes
>>>           must be deployed, just like DPDK (https://www.dpdk.org/).
>>>           And the PASID still has to be exposed to user land.
>> Right. A third case, also implemented by this patch (and complete), is
>> the primary process simply doing a BIND for itself, and using the
>> returned PASID to share its own address space with the device.
>>
> OK. But I am worried that the solution of one primary process with
> several secondary ones is a little bit limited. Maybe users don't want
> to depend on the primary process. :)

I don't see a better way for vfio-pci, though. But more importantly, I
don't know of any users :) While the feature is great for testing new
hardware, and I've been using it for all kinds of stress testing, I
haven't received feedback from possible users in production settings
(DPDK etc) and can't speculate about what they'd prefer.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]     ` <20180511190641.23008-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-16 20:41       ` Jacob Pan
@ 2018-09-05 11:29       ` Auger Eric
       [not found]         ` <bf42affd-e9d0-e4fc-6d28-f3c3f7795348-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 123+ messages in thread
From: Auger Eric @ 2018-09-05 11:29 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean-Philippe,
On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Shared Virtual Addressing (SVA) provides a way for device drivers to bind
> process address spaces to devices. This requires the IOMMU to support page
> table format and features compatible with the CPUs, and usually requires
> the system to support I/O Page Faults (IOPF) and Process Address Space ID
> (PASID). When all of these are available, DMA can access virtual addresses
> of a process. A PASID is allocated for each process, and the device driver
> programs it into the device in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce two
> IOMMU operations, sva_device_init() and sva_device_shutdown(), that
> prepare the IOMMU driver for SVA. For example allocate PASID tables and
> fault queues. Subsequent patches will implement the bind() and unbind()
> operations.
> 
> Support for I/O Page Faults will be added in a later patch using a new
> feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
> down all shared mappings. Other feature bits that may be added in the
> future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
> spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
> space to a process.
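A rough driver-side usage sketch of this API, based on the signatures
added below; the call sites and the PASID width are invented:

	/* probe(), before any bind(): no feature bits exist yet, so
	 * features must be 0 (IOMMU_SVA_FEAT_IOPF comes in a later patch) */
	ret = iommu_sva_device_init(dev, 0, 0x100);
	if (ret)
		return ret;

	/* remove(), once the device has stopped issuing DMA */
	iommu_sva_device_shutdown(dev);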
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2:
> * Add sva_param structure to iommu_param
> * CONFIG option is only selectable by IOMMU drivers
> ---
>  drivers/iommu/Kconfig     |   4 ++
>  drivers/iommu/Makefile    |   1 +
>  drivers/iommu/iommu-sva.c | 110 ++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h     |  32 +++++++++++
>  4 files changed, 147 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 7564237f788d..cca8e06903c7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,10 @@ config IOMMU_DMA
>  	select IOMMU_IOVA
>  	select NEED_SG_DMA_LENGTH
>  
> +config IOMMU_SVA
> +	bool
> +	select IOMMU_API
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..1dbcc89ebe4c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index 000000000000..8b4afb7c63ae
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage PASIDs and bind process address spaces to devices.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/slab.h>
> +
> +/**
> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to initialize all
> + * features required for SVA.
> + *
> + * The device must support multiple address spaces (e.g. PCI PASID). By default
> + * the PASID allocated during bind() is limited by the IOMMU capacity, and by
> + * the device PASID width defined in the PCI capability or in the firmware
> + * description. Setting @max_pasid to a non-zero value smaller than this limit
> + * overrides it.
> + *
> + * The device should not be performing any DMA while this function is running,
> + * otherwise the behavior is undefined.
> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_device_init(struct device *dev, unsigned long features,
> +			  unsigned int max_pasid)
what about min_pasid?
> +{
> +	int ret;
> +	struct iommu_sva_param *param;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain || !domain->ops->sva_device_init)
> +		return -ENODEV;
> +
> +	if (features)
> +		return -EINVAL;
> +
> +	param = kzalloc(sizeof(*param), GFP_KERNEL);
> +	if (!param)
> +		return -ENOMEM;
> +
> +	param->features		= features;
> +	param->max_pasid	= max_pasid;
> +
> +	/*
> +	 * IOMMU driver updates the limits depending on the IOMMU and device
> +	 * capabilities.
> +	 */
> +	ret = domain->ops->sva_device_init(dev, param);
> +	if (ret)
> +		goto err_free_param;
So you are likely to call sva_device_init even if it was already called
(i.e. dev->iommu_param->sva_param is already set). Can't you test whether
dev->iommu_param->sva_param is already set first?
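For instance (a sketch of that suggestion, not code from the patch; the
existing locked test further down would still be needed to close the
race with a concurrent caller):

	mutex_lock(&dev->iommu_param->lock);
	if (dev->iommu_param->sva_param) {
		mutex_unlock(&dev->iommu_param->lock);
		return -EEXIST;
	}
	mutex_unlock(&dev->iommu_param->lock);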
> +
> +	mutex_lock(&dev->iommu_param->lock);
> +	if (dev->iommu_param->sva_param)
> +		ret = -EEXIST;
> +	else
> +		dev->iommu_param->sva_param = param;
> +	mutex_unlock(&dev->iommu_param->lock);
> +	if (ret)
> +		goto err_device_shutdown;
> +
> +	return 0;
> +
> +err_device_shutdown:
> +	if (domain->ops->sva_device_shutdown)
> +		domain->ops->sva_device_shutdown(dev, param);
> +
> +err_free_param:
> +	kfree(param);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_device_init);
> +
> +/**
> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
> + * @dev: the device
> + *
> + * Disable SVA. Device driver should ensure that the device isn't performing any
> + * DMA while this function is running.
> + */
> +int iommu_sva_device_shutdown(struct device *dev)
Not sure the returned value is required for a shutdown operation.
> +{
> +	struct iommu_sva_param *param;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain)
> +		return -ENODEV;
> +
> +	mutex_lock(&dev->iommu_param->lock);
> +	param = dev->iommu_param->sva_param;
> +	dev->iommu_param->sva_param = NULL;
> +	mutex_unlock(&dev->iommu_param->lock);
> +	if (!param)
> +		return -ENODEV;
> +
> +	if (domain->ops->sva_device_shutdown)
> +		domain->ops->sva_device_shutdown(dev, param);
> +
> +	kfree(param);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0933f726d2e6..2efe7738bedb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -212,6 +212,12 @@ struct page_response_msg {
>  	u64 private_data;
>  };
>  
> +struct iommu_sva_param {
What are the feature values?
> +	unsigned long features;
> +	unsigned int min_pasid;
> +	unsigned int max_pasid;
> +};
> +
>  /**
>   * struct iommu_ops - iommu ops and capabilities
>   * @capable: check capability
> @@ -219,6 +225,8 @@ struct page_response_msg {
>   * @domain_free: free iommu domain
>   * @attach_dev: attach device to an iommu domain
>   * @detach_dev: detach device from an iommu domain
> + * @sva_device_init: initialize Shared Virtual Adressing for a device
Addressing
> + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device
Addressing
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @unmap: unmap a physically contiguous memory region from an iommu domain
>   * @map_sg: map a scatter-gather list of physically contiguous memory chunks
> @@ -256,6 +264,10 @@ struct iommu_ops {
>  
>  	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>  	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> +	int (*sva_device_init)(struct device *dev,
> +			       struct iommu_sva_param *param);
> +	void (*sva_device_shutdown)(struct device *dev,
> +				    struct iommu_sva_param *param);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> @@ -413,6 +425,7 @@ struct iommu_fault_param {
>   * struct iommu_param - collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
> + * @sva_param: SVA parameters
>   *
>   * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>   *	struct iommu_group	*iommu_group;
> @@ -421,6 +434,7 @@ struct iommu_fault_param {
>  struct iommu_param {
>  	struct mutex lock;
>  	struct iommu_fault_param *fault_param;
> +	struct iommu_sva_param *sva_param;
>  };
>  
>  int  iommu_device_register(struct iommu_device *iommu);
> @@ -920,4 +934,22 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
>  
>  #endif /* CONFIG_IOMMU_API */
>  
> +#ifdef CONFIG_IOMMU_SVA
> +extern int iommu_sva_device_init(struct device *dev, unsigned long features,
> +				 unsigned int max_pasid);
> +extern int iommu_sva_device_shutdown(struct device *dev);
> +#else /* CONFIG_IOMMU_SVA */
> +static inline int iommu_sva_device_init(struct device *dev,
> +					unsigned long features,
> +					unsigned int max_pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_device_shutdown(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA */
> +
>  #endif /* __LINUX_IOMMU_H */
> 

Thanks

Eric

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices
       [not found]     ` <20180511190641.23008-3-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-17 13:10       ` Jonathan Cameron
@ 2018-09-05 11:29       ` Auger Eric
       [not found]         ` <471873d4-a1a6-1a3a-cf17-1e686b4ade96-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 123+ messages in thread
From: Auger Eric @ 2018-09-05 11:29 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Add bind() and unbind() operations to the IOMMU API. Bind() returns a
> PASID that drivers can program in hardware, to let their devices access an
> mm. This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.
> 
> IOMMU groups with more than one device aren't supported for SVA at the
> moment. There may be P2P traffic between devices within a group, which
> cannot be seen by an IOMMU (note that supporting PASID doesn't add any
> form of isolation with regard to P2P). Supporting groups would require
> calling bind() for all bound processes every time a device is added to a
> group, to perform sanity checks (e.g. ensure that new devices support
> PASIDs at least as big as those already allocated in the group). It also
> means making sure that reserved ranges (IOMMU_RESV_*) of all devices are
> carved out of processes. This is already tricky with single devices, but
> becomes very difficult with groups. Since SVA-capable devices are expected
> to be cleanly isolated, and since we don't have any way to test groups or
> hot-plug, we only allow singular groups for now.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2: remove iommu_sva_bind/unbind_group
> ---
>  drivers/iommu/iommu-sva.c | 27 +++++++++++++
>  drivers/iommu/iommu.c     | 83 +++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h     | 37 +++++++++++++++++
>  3 files changed, 147 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8b4afb7c63ae..8d98f9c09864 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -93,6 +93,8 @@ int iommu_sva_device_shutdown(struct device *dev)
>  	if (!domain)
>  		return -ENODEV;
>  
> +	__iommu_sva_unbind_dev_all(dev);
> +
>  	mutex_lock(&dev->iommu_param->lock);
>  	param = dev->iommu_param->sva_param;
>  	dev->iommu_param->sva_param = NULL;
> @@ -108,3 +110,28 @@ int iommu_sva_device_shutdown(struct device *dev)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> +
> +int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +			    int *pasid, unsigned long flags, void *drvdata)
> +{
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
> +
> +int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
> +
> +/**
> + * __iommu_sva_unbind_dev_all() - Detach all address spaces from this device
> + * @dev: the device
> + *
> + * When detaching @device from a domain, IOMMU drivers should use this helper.
> + */
> +void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +	/* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 9e28d88c8074..bd2819deae5b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2261,3 +2261,86 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access the mm
> + * using the returned PASID. If unbind() isn't called first, a subsequent bind()
> + * for the same device and mm fails with -EEXIST.
> + *
> + * iommu_sva_device_init() must be called first, to initialize the required SVA
> + * features. @flags is a subset of these features.
@flags must be a subset of these features?
> + *
> + * The caller must pin down using get_user_pages*() all mappings shared with the
> + * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
> + * (e.g. copy-on-write).
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
> +			  unsigned long flags, void *drvdata)
> +{
> +	int ret = -EINVAL;
> +	struct iommu_group *group;
> +
> +	if (!pasid)
> +		return -EINVAL;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	/* Ensure device count and domain don't change while we're binding */
> +	mutex_lock(&group->mutex);
> +	if (iommu_group_device_count(group) != 1)
> +		goto out_unlock;
don't you want to check flags is a subset of
dev->iommu_param->sva_param->features?
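For instance (a sketch; patch 03 adds a very similar test in
__iommu_sva_bind_device):

	struct iommu_sva_param *param = dev->iommu_param->sva_param;

	if (!param || (flags & ~param->features))
		goto out_unlock;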
> +
> +	ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);
> +
> +out_unlock:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
> + * @dev: the device
> + * @pasid: the pasid returned by bind()
> + *
> + * Remove bond between device and address space identified by @pasid. Users
> + * should not call unbind() if the corresponding mm exited (as the PASID might
> + * have been reallocated for another process).
> + *
> + * The device must not be issuing any more transaction for this PASID. All
> + * outstanding page requests for this PASID must have been flushed to the IOMMU.
> + *
> + * Returns 0 on success, or an error value
> + */
> +int iommu_sva_unbind_device(struct device *dev, int pasid)
returned value needed?
> +{
> +	int ret = -EINVAL;
> +	struct iommu_group *group;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	mutex_lock(&group->mutex);
> +	ret = __iommu_sva_unbind_device(dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2efe7738bedb..da59c20c4f12 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -613,6 +613,10 @@ void iommu_fwspec_free(struct device *dev);
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>  
> +extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +				int *pasid, unsigned long flags, void *drvdata);
> +extern int iommu_sva_unbind_device(struct device *dev, int pasid);
> +
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -932,12 +936,29 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain,
>  	return -EINVAL;
>  }
>  
> +static inline int iommu_sva_bind_device(struct device *dev,
> +					struct mm_struct *mm, int *pasid,
> +					unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_SVA
>  extern int iommu_sva_device_init(struct device *dev, unsigned long features,
>  				 unsigned int max_pasid);
>  extern int iommu_sva_device_shutdown(struct device *dev);
> +extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> +				   int *pasid, unsigned long flags,
> +				   void *drvdata);
> +extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
> +extern void __iommu_sva_unbind_dev_all(struct device *dev);
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>  					unsigned long features,
> @@ -950,6 +971,22 @@ static inline int iommu_sva_device_shutdown(struct device *dev)
>  {
>  	return -ENODEV;
>  }
> +
> +static inline int __iommu_sva_bind_device(struct device *dev,
> +					  struct mm_struct *mm, int *pasid,
> +					  unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +}
>  #endif /* CONFIG_IOMMU_SVA */
>  
>  #endif /* __LINUX_IOMMU_H */
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]     ` <20180511190641.23008-4-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-16 23:31       ` Jacob Pan
  2018-05-17 14:25       ` Jonathan Cameron
@ 2018-09-05 12:14       ` Auger Eric
       [not found]         ` <d785ec89-6743-d0f1-1a7f-85599a033e5b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2 siblings, 1 reply; 123+ messages in thread
From: Auger Eric @ 2018-09-05 12:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Allocate IOMMU mm structures and binding them to devices. Four operations
s/binding/bind
> are added to IOMMU drivers:
> 
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
> 
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
> 
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
> 
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Other can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
> 
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
> 
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
> 
> mm_attach and mm_detach operations are serialized with a spinlock. When
> trying to optimize this code, we should at least prevent concurrent
> attach()/detach() on the same domain (so multi-level PASID table code can
> allocate tables lazily). mm_alloc() can sleep, but mm_free must not
> (because we'll have to call it from call_srcu later on).
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
> 
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2: sanity-check of flags
> ---
>  drivers/iommu/iommu-sva.c | 380 +++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   1 +
>  include/linux/iommu.h     |  28 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8d98f9c09864..6ac679c48f3c 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -5,8 +5,298 @@
>   * Copyright (C) 2018 ARM Ltd.
>   */
>  
> +#include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/sched/mm.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> + * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
> + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
> + * the device table and PASID 0 would be available to the allocator.
> + */
very nice explanation
> +
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm, unsigned long flags)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
don't you need to check param != NULL and flags are compatible with
those set at init?
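For instance (a sketch, mirroring the test that __iommu_sva_bind_device
performs later in this patch):

	if (!param || (flags & ~param->features))
		return ERR_PTR(-EINVAL);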
> +	if (!domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm, flags);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->flags		= flags;
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
> +			  param->max_pasid + 1, GFP_ATOMIC);
isn't it param->max_pasid - 1?
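For reference, idr_alloc()'s @end argument is exclusive (this comes from
the IDR kernel-doc, not from this patch):

	/* idr_alloc(idr, ptr, start, end, gfp) allocates in [start, end),
	 * so start = min_pasid, end = max_pasid + 1 makes both bounds
	 * inclusive. */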
> +	io_mm->pasid = pasid;
> +	spin_unlock(&iommu_sva_lock);
> +	idr_preload_end();
> +
> +	if (pasid < 0) {
> +		ret = pasid;
> +		goto err_free_mm;
> +	}
> +
> +	/* TODO: keep track of mm. For the moment, abort. */
I don't get where you return with success?
> +	ret = -ENOSYS;
> +	spin_lock(&iommu_sva_lock);
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +	spin_unlock(&iommu_sva_lock);
> +
> +err_free_mm:
> +	domain->ops->mm_free(io_mm);
> +	mmdrop(mm);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void io_mm_free(struct io_mm *io_mm)
> +{
> +	struct mm_struct *mm = io_mm->mm;
> +
> +	io_mm->release(io_mm);
> +	mmdrop(mm);
> +}
> +
> +static void io_mm_release(struct kref *kref)
> +{
> +	struct io_mm *io_mm;
> +
> +	io_mm = container_of(kref, struct io_mm, kref);
> +	WARN_ON(!list_empty(&io_mm->devices));
> +
> +	idr_remove(&iommu_pasid_idr, io_mm->pasid);
> +
> +	io_mm_free(io_mm);
> +}
> +
> +/*
> + * Returns non-zero if a reference to the io_mm was successfully taken.
> + * Returns zero if the io_mm is being freed and should not be used.
> + */
> +static int io_mm_get_locked(struct io_mm *io_mm)
> +{
> +	if (io_mm)
> +		return kref_get_unless_zero(&io_mm->kref);
> +
> +	return 0;
> +}
> +
> +static void io_mm_put_locked(struct io_mm *io_mm)
> +{
> +	kref_put(&io_mm->kref, io_mm_release);
> +}
> +
> +static void io_mm_put(struct io_mm *io_mm)
> +{
> +	spin_lock(&iommu_sva_lock);
> +	io_mm_put_locked(io_mm);
> +	spin_unlock(&iommu_sva_lock);
> +}
> +
> +static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
> +			struct io_mm *io_mm, void *drvdata)
> +{
> +	int ret;
> +	bool attach_domain = true;
> +	int pasid = io_mm->pasid;
> +	struct iommu_bond *bond, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> +		return -ENODEV;
don't you need to check param is not NULL?
> +
> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
pasid >= param->max_pasid ?
> +		return -ERANGE;
> +
> +	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond->domain		= domain;
> +	bond->io_mm		= io_mm;
> +	bond->dev		= dev;
> +	bond->drvdata		= drvdata;
> +
> +	spin_lock(&iommu_sva_lock);
> +	/*
> +	 * Check if this io_mm is already bound to the domain. In which case the
> +	 * IOMMU driver doesn't have to install the PASID table entry.
> +	 */
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == io_mm) {
> +			attach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
the fact that mm_attach()/mm_detach() must not sleep may be documented
in the API doc.
> +	if (ret) {
> +		kfree(bond);
> +		spin_unlock(&iommu_sva_lock);
> +		return ret;
nit: goto unlock simplification
> +	}
> +
> +	list_add(&bond->mm_head, &io_mm->devices);
> +	list_add(&bond->domain_head, &domain->mm_list);
> +	list_add(&bond->dev_head, &param->mm_list);
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return 0;
> +}
> +
> +static void io_mm_detach_locked(struct iommu_bond *bond)
> +{
> +	struct iommu_bond *tmp;
> +	bool detach_domain = true;
> +	struct iommu_domain *domain = bond->domain;
> +
> +	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
> +		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
> +			detach_domain = false;
> +			break;
> +		}
> +	}
> +
> +	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
> +
> +	list_del(&bond->mm_head);
> +	list_del(&bond->domain_head);
> +	list_del(&bond->dev_head);
> +	io_mm_put_locked(bond->io_mm);
> +
> +	kfree(bond);
> +}
>  
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
> @@ -47,6 +337,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>  
>  	param->features		= features;
>  	param->max_pasid	= max_pasid;
> +	INIT_LIST_HEAD(&param->mm_list);
>  
>  	/*
>  	 * IOMMU driver updates the limits depending on the IOMMU and device
> @@ -114,13 +405,87 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
>  int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>  			    int *pasid, unsigned long flags, void *drvdata)
>  {
> -	return -ENOSYS; /* TODO */
> +	int i, ret = 0;
> +	struct io_mm *io_mm = NULL;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL, *tmp;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	/*
> +	 * The device driver does not call sva_device_init/shutdown and
> +	 * bind/unbind concurrently, so no need to take the param lock.
> +	 */
what guarantees that?
> +	if (WARN_ON_ONCE(!param) || (flags & ~param->features))
> +		return -EINVAL;
> +
> +	/* If an io_mm already exists, use it */
> +	spin_lock(&iommu_sva_lock);
> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
> +		if (io_mm->mm == mm && io_mm_get_locked(io_mm)) {
> +			/* ... Unless it's already bound to this device */
> +			list_for_each_entry(tmp, &io_mm->devices, mm_head) {
> +				if (tmp->dev == dev) {
> +					bond = tmp;
> +					io_mm_put_locked(io_mm);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	if (bond)
> +		return -EEXIST;
> +
> +	/* Require identical features within an io_mm for now */
> +	if (io_mm && (flags != io_mm->flags)) {
> +		io_mm_put(io_mm);
> +		return -EDOM;
> +	}
> +
> +	if (!io_mm) {
> +		io_mm = io_mm_alloc(domain, dev, mm, flags);
> +		if (IS_ERR(io_mm))
> +			return PTR_ERR(io_mm);
> +	}
> +
> +	ret = io_mm_attach(domain, dev, io_mm, drvdata);
> +	if (ret)
> +		io_mm_put(io_mm);
don't you want to free the io_mm if it was just allocated?
> +	else
> +		*pasid = io_mm->pasid;
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
>  
>  int __iommu_sva_unbind_device(struct device *dev, int pasid)
>  {
> -	return -ENOSYS; /* TODO */
> +	int ret = -ESRCH;
> +	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL;
> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!param || WARN_ON(!domain))
> +		return -EINVAL;
> +
> +	spin_lock(&iommu_sva_lock);
> +	list_for_each_entry(bond, &param->mm_list, dev_head) {
> +		if (bond->io_mm->pasid == pasid) {
> +			io_mm_detach_locked(bond);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>  
> @@ -132,6 +497,15 @@ EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
>   */
>  void __iommu_sva_unbind_dev_all(struct device *dev)
>  {
> -	/* TODO */
> +	struct iommu_sva_param *param;
> +	struct iommu_bond *bond, *next;
> +
> +	param = dev->iommu_param->sva_param;
> +	if (param) {
> +		spin_lock(&iommu_sva_lock);
> +		list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
> +			io_mm_detach_locked(bond);
> +		spin_unlock(&iommu_sva_lock);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index bd2819deae5b..333801e1519c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1463,6 +1463,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>  	domain->type = type;
>  	/* Assume all sizes by default; the driver may override this later */
>  	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
> +	INIT_LIST_HEAD(&domain->mm_list);
>  
>  	return domain;
>  }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index da59c20c4f12..d5f21719a5a0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -100,6 +100,20 @@ struct iommu_domain {
>  	void *handler_token;
>  	struct iommu_domain_geometry geometry;
>  	void *iova_cookie;
> +
> +	struct list_head mm_list;
> +};
> +
> +struct io_mm {
> +	int			pasid;
> +	/* IOMMU_SVA_FEAT_* */
> +	unsigned long		flags;
> +	struct list_head	devices;
> +	struct kref		kref;
> +	struct mm_struct	*mm;
> +
> +	/* Release callback for this mm */
> +	void (*release)(struct io_mm *io_mm);
>  };
>  
>  enum iommu_cap {
> @@ -216,6 +230,7 @@ struct iommu_sva_param {
>  	unsigned long features;
>  	unsigned int min_pasid;
>  	unsigned int max_pasid;
> +	struct list_head mm_list;
>  };
>  
>  /**
> @@ -227,6 +242,11 @@ struct iommu_sva_param {
>   * @detach_dev: detach device from an iommu domain
>   * @sva_device_init: initialize Shared Virtual Adressing for a device
>   * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device
> + * @mm_alloc: allocate io_mm
> + * @mm_free: free io_mm
> + * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
> + * @mm_detach: detach io_mm from a device. Remove PASID entry and
> + *             flush associated TLB entries.
if necessary too?
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @unmap: unmap a physically contiguous memory region from an iommu domain
>   * @map_sg: map a scatter-gather list of physically contiguous memory chunks
> @@ -268,6 +288,14 @@ struct iommu_ops {
>  			       struct iommu_sva_param *param);
>  	void (*sva_device_shutdown)(struct device *dev,
>  				    struct iommu_sva_param *param);
> +	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
> +				  struct mm_struct *mm,
> +				  unsigned long flags);
> +	void (*mm_free)(struct io_mm *io_mm);
> +	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
> +			 struct io_mm *io_mm, bool attach_domain);
> +	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
> +			  struct io_mm *io_mm, bool detach_domain);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 04/40] iommu/sva: Add a mm_exit callback for device drivers
       [not found]     ` <20180511190641.23008-5-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-05 13:23       ` Auger Eric
       [not found]         ` <d1dc28c4-7742-9c41-3f91-3fbcb8b13c1c-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Auger Eric @ 2018-09-05 13:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> When an mm exits, devices that were bound to it must stop performing DMA
> on its PASID. Let device drivers register a callback to be notified on mm
> exit. Add the callback to the sva_param structure attached to struct
> device.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> 
> ---
> v1->v2: use iommu_sva_device_init instead of a separate function
> ---
>  drivers/iommu/iommu-sva.c | 11 ++++++++++-
>  include/linux/iommu.h     |  9 +++++++--
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 6ac679c48f3c..0700893c679d 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -303,6 +303,7 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
>   * @dev: the device
>   * @features: bitmask of features that need to be initialized
>   * @max_pasid: max PASID value supported by the device
> + * @mm_exit: callback to notify the device driver of an mm exiting
>   *
>   * Users of the bind()/unbind() API must call this function to initialize all
>   * features required for SVA.
> @@ -313,13 +314,20 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
>   * description. Setting @max_pasid to a non-zero value smaller than this limit
>   * overrides it.
>   *
> + * If the driver intends to share process address spaces, it should pass a valid
> + * @mm_exit handler. Otherwise @mm_exit can be NULL.
I don't get the case where mm_exit is allowed to be NULL.
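For context, a non-NULL handler might look roughly like this; every name
below is invented:

	static int my_mm_exit(struct device *dev, int pasid, void *drvdata)
	{
		struct my_ctx *ctx = drvdata;	/* drvdata passed to bind() */

		/* Stop the device from issuing transactions with this
		 * PASID and drain outstanding work; may sleep in v2 */
		my_dev_drain_pasid(ctx, pasid);
		return 0;
	}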

Thanks

Eric
> + * After @mm_exit returns, the
> + * device must not issue any more transaction with the PASID given as argument.
> + * The handler gets an opaque pointer corresponding to the drvdata passed as
> + * argument of bind().
> + *
>   * The device should not be performing any DMA while this function is running,
>   * otherwise the behavior is undefined.
>   *
>   * Return 0 if initialization succeeded, or an error.
>   */
>  int iommu_sva_device_init(struct device *dev, unsigned long features,
> -			  unsigned int max_pasid)
> +			  unsigned int max_pasid,
> +			  iommu_mm_exit_handler_t mm_exit)
>  {
>  	int ret;
>  	struct iommu_sva_param *param;
> @@ -337,6 +345,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>  
>  	param->features		= features;
>  	param->max_pasid	= max_pasid;
> +	param->mm_exit		= mm_exit;
>  	INIT_LIST_HEAD(&param->mm_list);
>  
>  	/*
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index d5f21719a5a0..439c8fffd836 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -61,6 +61,7 @@ struct iommu_fault_event;
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>  			struct device *, unsigned long, int, void *);
>  typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
> +typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
>  
>  struct iommu_domain_geometry {
>  	dma_addr_t aperture_start; /* First address that can be mapped    */
> @@ -231,6 +232,7 @@ struct iommu_sva_param {
>  	unsigned int min_pasid;
>  	unsigned int max_pasid;
>  	struct list_head mm_list;
> +	iommu_mm_exit_handler_t mm_exit;
>  };
>  
>  /**
> @@ -980,17 +982,20 @@ static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
>  
>  #ifdef CONFIG_IOMMU_SVA
>  extern int iommu_sva_device_init(struct device *dev, unsigned long features,
> -				 unsigned int max_pasid);
> +				 unsigned int max_pasid,
> +				 iommu_mm_exit_handler_t mm_exit);
>  extern int iommu_sva_device_shutdown(struct device *dev);
>  extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>  				   int *pasid, unsigned long flags,
>  				   void *drvdata);
>  extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
>  extern void __iommu_sva_unbind_dev_all(struct device *dev);
> +
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>  					unsigned long features,
> -					unsigned int max_pasid)
> +					unsigned int max_pasid,
> +					iommu_mm_exit_handler_t mm_exit)
>  {
>  	return -ENODEV;
>  }
> 
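
For illustration, here is a minimal sketch of how a device driver might use
the interface above. The mydrv_* names are hypothetical; only
iommu_sva_device_init() and the iommu_mm_exit_handler_t signature come from
the patch, and passing max_pasid == 0 relies on the documented behaviour
that only a non-zero value overrides the IOMMU limit.

static int mydrv_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	struct mydrv_ctx *ctx = drvdata;

	/* Stop issuing transactions with this PASID and flush any
	 * outstanding work before returning. */
	mydrv_stop_queues(ctx, pasid);
	return 0;
}

static int mydrv_enable_sva(struct device *dev)
{
	return iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0,
				     mydrv_mm_exit);
}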

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]         ` <d785ec89-6743-d0f1-1a7f-85599a033e5b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-09-05 18:18           ` Jacob Pan
  2018-09-06 17:40             ` Jean-Philippe Brucker
  2018-09-06 11:10           ` Jean-Philippe Brucker
  1 sibling, 1 reply; 123+ messages in thread
From: Jacob Pan @ 2018-09-05 18:18 UTC (permalink / raw)
  To: Auger Eric
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, Jean-Philippe Brucker,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, liubo95-hv44wF8Li93QT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	robin.murphy-5wv7dgnIgG8, christian.koenig-5C7GfCeVMHo

On Wed, 5 Sep 2018 14:14:12 +0200
Auger Eric <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> > + *
> > + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> > + * non-PASID translations. In this case PASID 0 is reserved and entry 0
> > + * points to the io_pgtable base. On Intel IOMMU, the io_pgtable base
> > + * would be held in the device table and PASID 0 would be available to
> > + * the allocator.
> > + */
> very nice explanation
With the new VT-d 3.0 spec, the 2nd-level IO page table base is no longer
held in the device context table. Instead it is held in the PASID table
entry pointed to by the RID_PASID field in the device context entry. If
RID_PASID = 0, then it is the same as on the Arm and AMD IOMMUs.
You can refer to chapter 3.4.3 of the VT-d spec.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
       [not found]                                     ` <b51107b8-a525-13ce-f4c3-d423b8502c27-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-06  7:26                                       ` Xu Zaibo
  0 siblings, 0 replies; 123+ messages in thread
From: Xu Zaibo @ 2018-09-06  7:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: Will Deacon, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	liguozhu-C8/M+/jPZTeaMJb+Lgu22Q, fanghao11,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, 米米,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, Robin Murphy,
	christian.koenig-5C7GfCeVMHo


On 2018/9/5 19:02, Jean-Philippe Brucker wrote:
> On 05/09/2018 04:15, Xu Zaibo wrote:
>>>>        1. Once the series is finished, a VFIO-PCI device can be held
>>>> by only one process
>>>>            through the binding IOCTL command without a PASID (without
>>>> the PASID being exposed to user space).
>>> It could, but isn't supported at the moment. In addition to adding
>>> support in the I/O page fault code, we'd also need to update the VFIO
>>> API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
>>> The case you describe isn't compatible with MAP/UNMAP, since the process
>>> manages the shared address space with mmap or malloc. We'd probably need
>>> to introduce a new VFIO IOMMU type, in which case the bind could be
>>> performed implicitly when the process does VFIO_SET_IOMMU. Then the
>>> process wouldn't need to send an additional BIND IOCTL.
>> ok. got it.  This is the legacy mode, so all the VFIO APIs are kept
>> unchanged?
> Yes, existing VFIO semantics are preserved
>
>>>>        2. When using a VFIO-PCI device to support multiple processes
>>>> with the SVA series, a primary
>>>>            process with multiple secondary processes must be deployed,
>>>> just like DPDK (https://www.dpdk.org/).
>>>>            And the PASID still has to be exposed to user land.
>>> Right. A third case, also implemented by this patch (and complete), is
>>> the primary process simply doing a BIND for itself, and using the
>>> returned PASID to share its own address space with the device.
>>>
>> ok. But I am worried that the solution of one primary process with
>> several secondary ones is a little bit limited. Maybe users don't want
>> to depend on the primary process. :)
> I don't see a better way for vfio-pci, though. But more importantly, I
> don't know of any users :) While the feature is great for testing new
> hardware, and I've been using it for all kinds of stress testing, I
> haven't received feedback from possible users in production settings
> (DPDK etc) and can't speculate about what they'd prefer.
>
At present, there seems to be no other way that is compatible with the
current logic.
Thank you.

Thanks,
Zaibo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]         ` <bf42affd-e9d0-e4fc-6d28-f3c3f7795348-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-09-06 11:09           ` Jean-Philippe Brucker
       [not found]             ` <03d31ba5-1eda-ea86-8c0c-91d14c86fe83-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-06 11:09 UTC (permalink / raw)
  To: Auger Eric, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Eric,

Thanks for reviewing

On 05/09/2018 12:29, Auger Eric wrote:
>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>> +			  unsigned int max_pasid)
> what about min_pasid?

No one asked for it... The max_pasid parameter is here for drivers that
have vendor-specific PASID size limits, such as AMD KFD (see
kfd_iommu_device_init and
https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
the PASID size will only depend on the PCI PASID capability and the
IOMMU limits, both known by the IOMMU driver, so device drivers won't
have to set max_pasid.

IOMMU drivers need to set min_pasid in the sva_device_init callback
because it may be either 1 (e.g. Arm, where PASID #0 is reserved) or 0
(Intel VT-d rev2), but at the moment I can't see a reason for device
drivers to override min_pasid.
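
As a concrete sketch of that override (the 16-bit limit is a made-up
example, and this uses the three-argument signature from this patch,
before the mm_exit parameter added in patch 04):

	/* Device cannot generate more than 16 PASID bits, which may be
	 * lower than the IOMMU limit. max_pasid is inclusive. */
	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF,
				    (1 << 16) - 1);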

>> +	/*
>> +	 * IOMMU driver updates the limits depending on the IOMMU and device
>> +	 * capabilities.
>> +	 */
>> +	ret = domain->ops->sva_device_init(dev, param);
>> +	if (ret)
>> +		goto err_free_param;
> So you are likely to call sva_device_init even if it was already called
> (ie. dev->iommu_param->sva_param is already set). Can't you test whether
> dev->iommu_param->sva_param is already set first?

Right, that's probably better

>> +/**
>> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
>> + * @dev: the device
>> + *
>> + * Disable SVA. Device driver should ensure that the device isn't performing any
>> + * DMA while this function is running.
>> + */
>> +int iommu_sva_device_shutdown(struct device *dev)
> Not sure the returned value is required for a shutdown operation.

I don't know either. I like them because they help me debug, but are
otherwise rather useless if we don't describe precise semantics. The
caller cannot do anything with it. Given that the corresponding IOMMU op
is already void, I can change this function to void as well

>> +struct iommu_sva_param {
> What are the feature values?

At the moment only IOMMU_SVA_FEAT_IOPF, introduced by patch 09

>> +	unsigned long features;
>> +	unsigned int min_pasid;
>> +	unsigned int max_pasid;
>> +};
>> +
>>  /**
>>   * struct iommu_ops - iommu ops and capabilities
>>   * @capable: check capability
>> @@ -219,6 +225,8 @@ struct page_response_msg {
>>   * @domain_free: free iommu domain
>>   * @attach_dev: attach device to an iommu domain
>>   * @detach_dev: detach device from an iommu domain
>> + * @sva_device_init: initialize Shared Virtual Adressing for a device
> Addressing
>> + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device
> Addressing

Nice catch

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices
       [not found]         ` <471873d4-a1a6-1a3a-cf17-1e686b4ade96-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-09-06 11:09           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-06 11:09 UTC (permalink / raw)
  To: Auger Eric, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

On 05/09/2018 12:29, Auger Eric wrote:
>> +/**
>> + * iommu_sva_bind_device() - Bind a process address space to a device
>> + * @dev: the device
>> + * @mm: the mm to bind, caller must hold a reference to it
>> + * @pasid: valid address where the PASID will be stored
>> + * @flags: bond properties
>> + * @drvdata: private data passed to the mm exit handler
>> + *
>> + * Create a bond between device and task, allowing the device to access the mm
>> + * using the returned PASID. If unbind() isn't called first, a subsequent bind()
>> + * for the same device and mm fails with -EEXIST.
>> + *
>> + * iommu_sva_device_init() must be called first, to initialize the required SVA
>> + * features. @flags is a subset of these features.
> @flags must be a subset of these features?

Ok

> don't you want to check flags is a subset of
> dev->iommu_param->sva_param->features?

Yes, that will be in next version

>> +/**
>> + * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
>> + * @dev: the device
>> + * @pasid: the pasid returned by bind()
>> + *
>> + * Remove bond between device and address space identified by @pasid. Users
>> + * should not call unbind() if the corresponding mm exited (as the PASID might
>> + * have been reallocated for another process).
>> + *
>> + * The device must not be issuing any more transaction for this PASID. All
>> + * outstanding page requests for this PASID must have been flushed to the IOMMU.
>> + *
>> + * Returns 0 on success, or an error value
>> + */
>> +int iommu_sva_unbind_device(struct device *dev, int pasid)
> returned value needed?

I'd rather keep this one, since it already pointed out some of my bugs
during regression testing.
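
Pieced together from the comments quoted above, the intended call sequence
looks roughly like this (a sketch only, with flags and error handling
mostly elided):

	int pasid, ret;

	ret = iommu_sva_bind_device(dev, current->mm, &pasid, 0, drvdata);
	if (ret)
		return ret;

	/* ... program the device to tag its DMA with "pasid" ... */

	/* Quiesce the device and flush its page requests, then: */
	iommu_sva_unbind_device(dev, pasid);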

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
       [not found]         ` <d785ec89-6743-d0f1-1a7f-85599a033e5b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2018-09-05 18:18           ` Jacob Pan
@ 2018-09-06 11:10           ` Jean-Philippe Brucker
  1 sibling, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-06 11:10 UTC (permalink / raw)
  To: Auger Eric, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

On 05/09/2018 13:14, Auger Eric wrote:
>> +static struct io_mm *
>> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
>> +	    struct mm_struct *mm, unsigned long flags)
>> +{
>> +	int ret;
>> +	int pasid;
>> +	struct io_mm *io_mm;
>> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
>> +
> don't you need to check param != NULL and flags are compatible with
> those set at init?

It would be redundant; the parameters are already checked by bind().
Following your comment below I think this function should also be called
under iommu_param->lock.

>> +	idr_preload(GFP_KERNEL);
>> +	spin_lock(&iommu_sva_lock);
>> +	pasid = idr_alloc(&iommu_pasid_idr, io_mm, param->min_pasid,
>> +			  param->max_pasid + 1, GFP_ATOMIC);
> isn't it param->max_pasid - 1?

max_pasid is the last allocatable PASID, and the 'end' parameter of
idr_alloc is exclusive, so this needs to be max_pasid + 1.
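
As a worked example with made-up values: with min_pasid == 1 and
max_pasid == 0xffff (both inclusive), the call becomes

	pasid = idr_alloc(&iommu_pasid_idr, io_mm, 1, 0xffff + 1,
			  GFP_ATOMIC);

which may return any PASID in [1, 0xffff], or a negative errno.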

>> +static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
>> +			struct io_mm *io_mm, void *drvdata)
>> +{
>> +	int ret;
>> +	bool attach_domain = true;
>> +	int pasid = io_mm->pasid;
>> +	struct iommu_bond *bond, *tmp;
>> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
>> +
>> +	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
>> +		return -ENODEV;
> don't you need to check param is not NULL?

As mm_alloc, this is called by bind() which already performs argument checks

>> +
>> +	if (pasid > param->max_pasid || pasid < param->min_pasid)
> pasid >= param->max_pasid ?

max_pasid is inclusive

>> +	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
> the fact the mm_attach/detach() must not sleep may be documented in the
> API doc.

Ok

>>  int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>>  			    int *pasid, unsigned long flags, void *drvdata)
>>  {
>> -	return -ENOSYS; /* TODO */
>> +	int i, ret = 0;
>> +	struct io_mm *io_mm = NULL;
>> +	struct iommu_domain *domain;
>> +	struct iommu_bond *bond = NULL, *tmp;
>> +	struct iommu_sva_param *param = dev->iommu_param->sva_param;
>> +
>> +	domain = iommu_get_domain_for_dev(dev);
>> +	if (!domain)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * The device driver does not call sva_device_init/shutdown and
>> +	 * bind/unbind concurrently, so no need to take the param lock.
>> +	 */
> what does guarantee that?

The doc for iommu_sva_bind_device mandates that iommu_sva_device_init()
is called before bind(), but nothing is said about unbind and shutdown.
I think that was just based on the assumption that the device driver
doesn't have any reason to call unbind and shutdown concurrently, but
taking the lock here and in unbind is probably safer.

>> +	ret = io_mm_attach(domain, dev, io_mm, drvdata);
>> +	if (ret)
>> +		io_mm_put(io_mm);
> dont't you want to free the io_mm if just allocated?

We do: if the io_mm has just been allocated, it has a single reference
so io_mm_put frees it.
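
In other words, the error path under discussion is (sketch, using the
signatures quoted above):

	io_mm = io_mm_alloc(domain, dev, mm, flags);	/* refcount == 1 */
	ret = io_mm_attach(domain, dev, io_mm, drvdata);
	if (ret)
		io_mm_put(io_mm);	/* drops the only reference: freed */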

>> + * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
>> + * @mm_detach: detach io_mm from a device. Remove PASID entry and
>> + *             flush associated TLB entries.
> if necessary too?

Right

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 04/40] iommu/sva: Add a mm_exit callback for device drivers
       [not found]         ` <d1dc28c4-7742-9c41-3f91-3fbcb8b13c1c-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-09-06 11:10           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-06 11:10 UTC (permalink / raw)
  To: Auger Eric, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

On 05/09/2018 14:23, Auger Eric wrote:
>> + * If the driver intends to share process address spaces, it should pass a valid
>> + * @mm_exit handler. Otherwise @mm_exit can be NULL.
> I don't get case where mm_exit is allowed to be NULL.

Right, this comment is a bit premature. Next version adds a "private
PASID" patch to allocate private address spaces per PASID (modifiable
with map/unmap). That mode doesn't require mm_exit, and I can move the
comment there

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]             ` <03d31ba5-1eda-ea86-8c0c-91d14c86fe83-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-06 11:12               ` Christian König
       [not found]                 ` <ed39159c-087e-7e56-7d29-d1de9fa1677f-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Christian König @ 2018-09-06 11:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8

Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker:
> Hi Eric,
>
> Thanks for reviewing
>
> On 05/09/2018 12:29, Auger Eric wrote:
>>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>>> +			  unsigned int max_pasid)
>> what about min_pasid?
> No one asked for it... The max_pasid parameter is here for drivers that
> have vendor-specific PASID size limits, such as AMD KFD (see
> kfd_iommu_device_init and
> https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
> the PASID size will only depend on the PCI PASID capability and the
> IOMMU limits, both known by the IOMMU driver, so device drivers won't
> have to set max_pasid.
>
> IOMMU drivers need to set min_pasid in the sva_device_init callback
> because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
> (Intel Vt-d rev2), but at the moment I can't see a reason for device
> drivers to override min_pasid

Sorry to ruin your day, but if I'm not completely mistaken PASID zero is 
reserved in the AMD KFD as well.

Regards,
Christian.

>
>>> +	/*
>>> +	 * IOMMU driver updates the limits depending on the IOMMU and device
>>> +	 * capabilities.
>>> +	 */
>>> +	ret = domain->ops->sva_device_init(dev, param);
>>> +	if (ret)
>>> +		goto err_free_param;
>> So you are likely to call sva_device_init even if it was already called
>> (ie. dev->iommu_param->sva_param is already set). Can't you test whether
>> dev->iommu_param->sva_param is already set first?
> Right, that's probably better
>
>>> +/**
>>> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
>>> + * @dev: the device
>>> + *
>>> + * Disable SVA. Device driver should ensure that the device isn't performing any
>>> + * DMA while this function is running.
>>> + */
>>> +int iommu_sva_device_shutdown(struct device *dev)
>> Not sure the returned value is required for a shutdown operation.
> I don't know either. I like them because they help me debug, but are
> otherwise rather useless if we don't describe precise semantics. The
> caller cannot do anything with it. Given that the corresponding IOMMU op
> is already void, I can change this function to void as well
>
>>> +struct iommu_sva_param {
>> What are the feature values?
> At the moment only IOMMU_SVA_FEAT_IOPF, introduced by patch 09
>
>>> +	unsigned long features;
>>> +	unsigned int min_pasid;
>>> +	unsigned int max_pasid;
>>> +};
>>> +
>>>   /**
>>>    * struct iommu_ops - iommu ops and capabilities
>>>    * @capable: check capability
>>> @@ -219,6 +225,8 @@ struct page_response_msg {
>>>    * @domain_free: free iommu domain
>>>    * @attach_dev: attach device to an iommu domain
>>>    * @detach_dev: detach device from an iommu domain
>>> + * @sva_device_init: initialize Shared Virtual Adressing for a device
>> Addressing
>>> + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device
>> Addressing
> Nice catch
>
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                 ` <ed39159c-087e-7e56-7d29-d1de9fa1677f-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-06 12:45                   ` Jean-Philippe Brucker
       [not found]                     ` <f0b317d5-e2e9-5478-952c-05e8b97bd68b-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-06 12:45 UTC (permalink / raw)
  To: Christian König, Auger Eric,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8

On 06/09/2018 12:12, Christian König wrote:
> Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker:
>> Hi Eric,
>>
>> Thanks for reviewing
>>
>> On 05/09/2018 12:29, Auger Eric wrote:
>>>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>>>> +              unsigned int max_pasid)
>>> what about min_pasid?
>> No one asked for it... The max_pasid parameter is here for drivers that
>> have vendor-specific PASID size limits, such as AMD KFD (see
>> kfd_iommu_device_init and
>> https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
>> the PASID size will only depend on the PCI PASID capability and the
>> IOMMU limits, both known by the IOMMU driver, so device drivers won't
>> have to set max_pasid.
>>
>> IOMMU drivers need to set min_pasid in the sva_device_init callback
>> because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
>> (Intel Vt-d rev2), but at the moment I can't see a reason for device
>> drivers to override min_pasid
> 
> Sorry to ruin your day, but if I'm not completely mistaken PASID zero is
> reserved in the AMD KFD as well.

Heh, fair enough. I'll add the min_pasid parameter

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces
  2018-09-05 18:18           ` Jacob Pan
@ 2018-09-06 17:40             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-06 17:40 UTC (permalink / raw)
  To: Jacob Pan, Auger Eric
  Cc: xieyisheng1, ilias.apalodimas, kvm, linux-pci, xuzaibo,
	jonathan.cameron, Will Deacon, okaya, linux-mm, yi.l.liu,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	liudongdong3, rfranz

On 05/09/2018 19:18, Jacob Pan wrote:
> On Wed, 5 Sep 2018 14:14:12 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>>> + *
>>> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
>>> + * non-PASID translations. In this case PASID 0 is reserved and entry 0
>>> + * points to the io_pgtable base. On Intel IOMMU, the io_pgtable base
>>> + * would be held in the device table and PASID 0 would be available to
>>> + * the allocator.
>>> + */
>> very nice explanation
> With the new VT-d 3.0 spec, the 2nd-level IO page table base is no longer
> held in the device context table. Instead it is held in the PASID table
> entry pointed to by the RID_PASID field in the device context entry. If
> RID_PASID = 0, then it is the same as on the Arm and AMD IOMMUs.
> You can refer to chapter 3.4.3 of the VT-d spec.

I could simplify that paragraph by removing the specific implementations:

"In some IOMMUs, entry 0 of the PASID table can be used to hold
non-PASID translations. In this case PASID 0 is reserved and entry 0
points to the io_pgtable base. In other IOMMUs the io_pgtable base is
held in the device table and PASID 0 is available to the allocator."

I guess in Linux there isn't any reason to set RID_PASID to a non-zero
value? Otherwise the iommu-sva allocator will need minor changes.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                     ` <f0b317d5-e2e9-5478-952c-05e8b97bd68b-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-07  8:55                       ` Christian König
       [not found]                         ` <2fd4a0a1-1a35-bf82-df84-b995cce011d9-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Christian König @ 2018-09-07  8:55 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8

Am 06.09.2018 um 14:45 schrieb Jean-Philippe Brucker:
> On 06/09/2018 12:12, Christian König wrote:
>> Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker:
>>> Hi Eric,
>>>
>>> Thanks for reviewing
>>>
>>> On 05/09/2018 12:29, Auger Eric wrote:
>>>>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>>>>> +              unsigned int max_pasid)
>>>> what about min_pasid?
>>> No one asked for it... The max_pasid parameter is here for drivers that
>>> have vendor-specific PASID size limits, such as AMD KFD (see
>>> kfd_iommu_device_init and
>>> https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
>>> the PASID size will only depend on the PCI PASID capability and the
>>> IOMMU limits, both known by the IOMMU driver, so device drivers won't
>>> have to set max_pasid.
>>>
>>> IOMMU drivers need to set min_pasid in the sva_device_init callback
>>> because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
>>> (Intel Vt-d rev2), but at the moment I can't see a reason for device
>>> drivers to override min_pasid
>> Sorry to ruin your day, but if I'm not completely mistaken PASID zero is
>> reserved in the AMD KFD as well.
> Heh, fair enough. I'll add the min_pasid parameter

I will take this as an opportunity to summarize some of the requirements 
we have for PASID management from the amdgpu driver point of view:

1. We need to be able to allocate PASIDs between 1 and some maximum. Zero
is reserved as far as I know, but we don't necessarily need a minimum.

2. We need to be able to allocate PASIDs without a process address space
backing them. E.g. our hardware uses PASIDs even without Shared Virtual
Addressing enabled, to distinguish clients from each other.
         It would be a pity if we still needed separate PASID handling
because the system-wide allocator is only available when the IOMMU is
turned on.

3. Even after destruction of a process address space we need some grace 
period before a PASID is reused because it can be that the specific 
PASID is still in some hardware queues etc...
         At bare minimum all device drivers using process binding need 
to explicitly note to the core when they are done with a PASID.

4. It would be nice to be able to set a "void *" for each PASID/device
combination while binding to a process, which could then be queried
later on based on the PASID.
         E.g. when you have a per-PASID/device structure around anyway,
just add an extra field.

5. It would be nice to be able to allocate multiple PASIDs for the same
process address space.
         E.g. some teams at AMD want to use a separate GPU address space
for their userspace client library. I'm still trying to avoid that, but
it is perfectly possible that we are going to need it.
         In addition, it is sometimes quite useful for debugging to
isolate where exactly an incorrect access (segfault) is coming from.

Let me know if there are some problems with that, especially I want to 
know if there is pushback on #5 so that I can forward that :)

Thanks,
Christian.

>
> Thanks,
> Jean


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                         ` <2fd4a0a1-1a35-bf82-df84-b995cce011d9-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-07 15:45                           ` Jean-Philippe Brucker
       [not found]                             ` <65e7accd-4446-19f5-c667-c6407e89cfa6-5wv7dgnIgG8@public.gmane.org>
  2018-09-13  7:26                           ` Tian, Kevin
  1 sibling, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-07 15:45 UTC (permalink / raw)
  To: Christian König, Auger Eric,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Robin Murphy

On 07/09/2018 09:55, Christian König wrote:
> I will take this as an opportunity to summarize some of the requirements 
> we have for PASID management from the amdgpu driver point of view:

That's incredibly useful, thanks :)

> 1. We need to be able to allocate PASID between 1 and some maximum. Zero 
> is reserved as far as I know, but we don't necessary need a minimum.

Should be fine. The PASID range is restricted by the PCI PASID
capability, firmware description (for non-PCI devices), the IOMMU
capacity, and what the device driver passes to iommu_sva_device_init.
Not all IOMMUs reserve PASID 0 (AMD IOMMU without GIoSup doesn't, if I'm
not mistaken), so the KFD driver will need to pass min_pasid=1 to make
sure that 0 isn't allocated.

> 2. We need to be able to allocate PASIDs without a process address space 
> backing it. E.g. our hardware uses PASIDs even without Shared Virtual 
> Addressing enabled to distinct clients from each other.
>          Would be a pity if we need to still have a separate PASID 
> handling because the system wide is only available when IOMMU is turned on.

I'm still not sure about this one. From my point of view we shouldn't
add helpers for devices without an IOMMU to the IOMMU subsystem.
iommu-sva expects everywhere that the device has an iommu_domain, it's
the first thing we check on entry. Bypassing all of this would call
idr_alloc() directly, and wouldn't have any code in common with the
current iommu-sva. So it seems like you need a layer on top of iommu-sva
calling idr_alloc() when an IOMMU isn't present, but I don't think it
should be in drivers/iommu/

> 3. Even after destruction of a process address space we need some grace 
> period before a PASID is reused because it can be that the specific 
> PASID is still in some hardware queues etc...
>          At bare minimum all device drivers using process binding need 
> to explicitly note to the core when they are done with a PASID.

Right, much of the horribleness in iommu-sva deals with this:

The process dies, iommu-sva is notified and calls the mm_exit() function
passed by the device driver to iommu_sva_device_init(). In mm_exit() the
device driver needs to clear any reference to the PASID in hardware and
in its own structures. When the device driver returns from mm_exit(), it
effectively tells the core that it has finished using the PASID, and
iommu-sva can reuse the PASID for another process. mm_exit() is allowed
to block, so the device driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process exits,
it just calls unbind().
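
Summarizing that life cycle (illustrative only):

	/*
	 * process exits
	 *   -> iommu-sva is notified
	 *      -> driver's mm_exit(dev, pasid, drvdata)   [may block]
	 *         clear the PASID from hardware and from driver
	 *         structures, flush the queues
	 *      <- returning tells the core the PASID is no longer used
	 *   -> iommu-sva may reuse the PASID for another process
	 *
	 * If the driver is done before the process exits, it simply
	 * calls unbind() instead.
	 */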

> 4. It would be nice to have to be able to set a "void *" for each 
> PASID/device combination while binding to a process which then can be 
> queried later on based on the PASID.
>          E.g. when you have a per PASID/device structure around anyway, 
> just add an extra field.

iommu_sva_bind_device() takes a "drvdata" pointer that is stored
internally for the PASID/device combination (iommu_bond). It is passed
to mm_exit(), but I haven't added anything for the device driver to
query it back.

> 5. It would be nice to have to allocate multiple PASIDs for the same 
> process address space.
>          E.g. some teams at AMD want to use a separate GPU address space 
> for their userspace client library. I'm still trying to avoid that, but 
> it is perfectly possible that we are going to need that.

Two PASIDs pointing to the same process pgd? At first glance it seems
feasible, maybe with a flag passed to bind() and a few changes to
internal structures. It will duplicate ATC invalidation commands for
each process address space change (munmap etc) so you might take a
performance hit.

Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar to
what you describe, but I don't plan to support it in this series (the
io_mm model is already pretty complicated). I think it can be added
without too much effort in a future series, though with a different flag
name since we'd like to use "private PASID" for something else
(https://www.spinics.net/lists/dri-devel/msg177007.html).

Thanks,
Jean

>          Additional to that it is sometimes quite useful for debugging 
> to isolate where exactly an incorrect access (segfault) is coming from.
> 
> Let me know if there are some problems with that, especially I want to 
> know if there is pushback on #5 so that I can forward that :)
> 
> Thanks,
> Christian.
> 
>>
>> Thanks,
>> Jean
> 


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                             ` <65e7accd-4446-19f5-c667-c6407e89cfa6-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-07 18:02                               ` Christian König
       [not found]                                 ` <5bbc0332-b94b-75cc-ca42-a9b196811daf-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Christian König @ 2018-09-07 18:02 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	Michal Hocko, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Robin Murphy

Am 07.09.2018 um 17:45 schrieb Jean-Philippe Brucker:
> On 07/09/2018 09:55, Christian König wrote:
>> I will take this as an opportunity to summarize some of the requirements
>> we have for PASID management from the amdgpu driver point of view:
> That's incredibly useful, thanks :)
>
>> 1. We need to be able to allocate PASID between 1 and some maximum. Zero
>> is reserved as far as I know, but we don't necessary need a minimum.
> Should be fine. The PASID range is restricted by the PCI PASID
> capability, firmware description (for non-PCI devices), the IOMMU
> capacity, and what the device driver passes to iommu_sva_device_init.
> Not all IOMMUs reserve PASID 0 (AMD IOMMU without GIoSup doesn't, if I'm
> not mistaken), so the KFD driver will need to pass min_pasid=1 to make
> sure that 0 isn't allocated.
>
>> 2. We need to be able to allocate PASIDs without a process address space
>> backing it. E.g. our hardware uses PASIDs even without Shared Virtual
>> Addressing enabled to distinct clients from each other.
>>           Would be a pity if we need to still have a separate PASID
>> handling because the system wide is only available when IOMMU is turned on.
> I'm still not sure about this one. From my point of view we shouldn't
> add to the IOMMU subsystem helpers for devices without an IOMMU.

I agree on that.

> iommu-sva expects everywhere that the device has an iommu_domain, it's
> the first thing we check on entry. Bypassing all of this would call
> idr_alloc() directly, and wouldn't have any code in common with the
> current iommu-sva. So it seems like you need a layer on top of iommu-sva
> calling idr_alloc() when an IOMMU isn't present, but I don't think it
> should be in drivers/iommu/

In this case I question if the PASID handling should be under 
drivers/iommu at all.

See, I can have a mix of VM contexts which are bound to processes (a
few) and VM contexts which are standalone and don't care about a process
address space. But for each VM context I need a distinct PASID for the
hardware to work.

I can live with using a simple ida to allocate them if the IOMMU is
completely disabled, but when the IOMMU is enabled I certainly need a
way to reserve a PASID without an associated process.
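
A minimal sketch of that fallback, assuming the driver keeps its own
allocator when no IOMMU is present (all names hypothetical):

#include <linux/idr.h>

static DEFINE_IDA(mydrv_pasid_ida);

/* Allocate a PASID in [1, max_pasid]; PASID 0 stays reserved */
static int mydrv_alloc_pasid(unsigned int max_pasid)
{
	return ida_simple_get(&mydrv_pasid_ida, 1, max_pasid + 1,
			      GFP_KERNEL);
}

static void mydrv_free_pasid(int pasid)
{
	ida_simple_remove(&mydrv_pasid_ida, pasid);
}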

>> 3. Even after destruction of a process address space we need some grace
>> period before a PASID is reused because it can be that the specific
>> PASID is still in some hardware queues etc...
>>           At bare minimum all device drivers using process binding need
>> to explicitly note to the core when they are done with a PASID.
> Right, much of the horribleness in iommu-sva deals with this:
>
> The process dies, iommu-sva is notified and calls the mm_exit() function
> passed by the device driver to iommu_sva_device_init(). In mm_exit() the
> device driver needs to clear any reference to the PASID in hardware and
> in its own structures. When the device driver returns from mm_exit(), it
> effectively tells the core that it has finished using the PASID, and
> iommu-sva can reuse the PASID for another process. mm_exit() is allowed
> to block, so the device driver has time to clean up and flush the queues.
>
> If the device driver finishes using the PASID before the process exits,
> it just calls unbind().

That's exactly what Michal Hocko is probably not going to like at all.

Can we have a different approach where each driver is informed by the 
mm_exit(), but needs to explicitly call unbind() before a PASID is reused?

During that teardown transition it would be ideal if the PASID pointed
to a dummy root page directory containing only invalid entries.

>
>> 4. It would be nice to have to be able to set a "void *" for each
>> PASID/device combination while binding to a process which then can be
>> queried later on based on the PASID.
>>           E.g. when you have a per PASID/device structure around anyway,
>> just add an extra field.
> iommu_sva_bind_device() takes a "drvdata" pointer that is stored
> internally for the PASID/device combination (iommu_bond). It is passed
> to mm_exit(), but I haven't added anything for the device driver to
> query it back.

Nice! Looks like all we need additionally is a function to retrieve that 
based on the PASID.
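
No such function exists in this series; purely as a hypothetical sketch,
it could take the shape:

	/* Hypothetical: return the drvdata passed to bind() for this
	 * device/PASID combination, or NULL if the PASID isn't bound. */
	void *iommu_sva_get_drvdata(struct device *dev, int pasid);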

>> 5. It would be nice to have to allocate multiple PASIDs for the same
>> process address space.
>>           E.g. some teams at AMD want to use a separate GPU address space
>> for their userspace client library. I'm still trying to avoid that, but
>> it is perfectly possible that we are going to need that.
> Two PASIDs pointing to the same process pgd? At first glance it seems
> feasible, maybe with a flag passed to bind() and a few changes to
> internal structures. It will duplicate ATC invalidation commands for
> each process address space change (munmap etc) so you might take a
> performance hit.
>
> Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar to
> what you describe, but I don't plan to support it in this series (the
> io_mm model is already pretty complicated). I think it can be added
> without too much effort in a future series, though with a different flag
> name since we'd like to use "private PASID" for something else
> (https://www.spinics.net/lists/dri-devel/msg177007.html).

To be honest I hoped that you would say: no, never! So that I would have
a good argument to push back on such requirements :)

But if it's doable it would be at least nice to have for debugging.

Thanks a lot for working on that,
Christian.

>
> Thanks,
> Jean
>
>>           Additional to that it is sometimes quite useful for debugging
>> to isolate where exactly an incorrect access (segfault) is coming from.
>>
>> Let me know if there are some problems with that, especially I want to
>> know if there is pushback on #5 so that I can forward that :)
>>
>> Thanks,
>> Christian.
>>
>>> Thanks,
>>> Jean


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                                 ` <5bbc0332-b94b-75cc-ca42-a9b196811daf-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-07 21:25                                   ` Jacob Pan
  2018-09-08  7:29                                     ` Christian König
  2018-09-13  7:15                                     ` Tian, Kevin
  0 siblings, 2 replies; 123+ messages in thread
From: Jacob Pan @ 2018-09-07 21:25 UTC (permalink / raw)
  To: Christian König
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Will Deacon, Michal Hocko, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, Jean-Philippe Brucker,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA@public.gmane.org

On Fri, 7 Sep 2018 20:02:54 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 07.09.2018 um 17:45 schrieb Jean-Philippe Brucker:
> > On 07/09/2018 09:55, Christian König wrote:  
> >> I will take this as an opportunity to summarize some of the
> >> requirements we have for PASID management from the amdgpu driver
> >> point of view:  
> > That's incredibly useful, thanks :)
> >  
> >> 1. We need to be able to allocate PASID between 1 and some
> >> maximum. Zero is reserved as far as I know, but we don't necessary
> >> need a minimum.  
>  [...]  
> >> 2. We need to be able to allocate PASIDs without a process address
> >> space backing it. E.g. our hardware uses PASIDs even without
> >> Shared Virtual Addressing enabled to distinct clients from each
> >> other. Would be a pity if we need to still have a separate PASID
> >> handling because the system wide is only available when IOMMU is
> >> turned on.  
>  [...]  
> 
> I agree on that.
> 
> > iommu-sva expects everywhere that the device has an iommu_domain,
> > it's the first thing we check on entry. Bypassing all of this would
> > call idr_alloc() directly, and wouldn't have any code in common
> > with the current iommu-sva. So it seems like you need a layer on
> > top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
> > but I don't think it should be in drivers/iommu/  
> 
> In this case I question if the PASID handling should be under 
> drivers/iommu at all.
> 
> See I can have a mix of VM context which are bound to processes (some 
> few) and VM contexts which are standalone and doesn't care for a
> process address space. But for each VM context I need a distinct
> PASID for the hardware to work.
> 
> I can live if we say if IOMMU is completely disabled we use a simple
> ida to allocate them, but when IOMMU is enabled I certainly need a
> way to reserve a PASID without an associated process.
> 
VT-d would also have such a requirement. There is a virtual command
register to allocate and free PASIDs for VM use. When that PASID
allocation request gets propagated to the host IOMMU driver, we need to
allocate a PASID without an mm.

If the PASID allocation is done via VFIO, can we have an FD to track the
PASID life cycle instead of mm_exit()? I.e. all FDs get closed before
mm_exit, I assume?

> >> 3. Even after destruction of a process address space we need some
> >> grace period before a PASID is reused because it can be that the
> >> specific PASID is still in some hardware queues etc...
> >>           At bare minimum all device drivers using process binding
> >> need to explicitly note to the core when they are done with a
> >> PASID.  
> > Right, much of the horribleness in iommu-sva deals with this:
> >
> > The process dies, iommu-sva is notified and calls the mm_exit()
> > function passed by the device driver to iommu_sva_device_init(). In
> > mm_exit() the device driver needs to clear any reference to the
> > PASID in hardware and in its own structures. When the device driver
> > returns from mm_exit(), it effectively tells the core that it has
> > finished using the PASID, and iommu-sva can reuse the PASID for
> > another process. mm_exit() is allowed to block, so the device
> > driver has time to clean up and flush the queues.
> >
> > If the device driver finishes using the PASID before the process
> > exits, it just calls unbind().  
> 
> Exactly that's what Michal Hocko is probably going to not like at all.
> 
> Can we have a different approach where each driver is informed by the 
> mm_exit(), but needs to explicitly call unbind() before a PASID is
> reused?
> 
> During that teardown transition it would be ideal if that PASID only 
> points to a dummy root page directory with only invalid entries.
> 
I guess this can be vendor specific. In VT-d I plan to mark the PASID
entry not-present and disable fault reporting while draining the
remaining activities.

> >  
> >> 4. It would be nice to have to be able to set a "void *" for each
> >> PASID/device combination while binding to a process which then can
> >> be queried later on based on the PASID.
> >>           E.g. when you have a per PASID/device structure around
> >> anyway, just add an extra field.  
> > iommu_sva_bind_device() takes a "drvdata" pointer that is stored
> > internally for the PASID/device combination (iommu_bond). It is
> > passed to mm_exit(), but I haven't added anything for the device
> > driver to query it back.  
> 
> Nice! Looks like all we need additionally is a function to retrieve
> that based on the PASID.
> 
> >> 5. It would be nice to have to allocate multiple PASIDs for the
> >> same process address space.
> >>           E.g. some teams at AMD want to use a separate GPU
> >> address space for their userspace client library. I'm still trying
> >> to avoid that, but it is perfectly possible that we are going to
> >> need that.  
> > Two PASIDs pointing to the same process pgd? At first glance it
> > seems feasible, maybe with a flag passed to bind() and a few
> > changes to internal structures. It will duplicate ATC invalidation
> > commands for each process address space change (munmap etc) so you
> > might take a performance hit.
> >
> > Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar
> > to what you describe, but I don't plan to support it in this series
> > (the io_mm model is already pretty complicated). I think it can be
> > added without too much effort in a future series, though with a
> > different flag name since we'd like to use "private PASID" for
> > something else
> > (https://www.spinics.net/lists/dri-devel/msg177007.html).  
> 
> To be honest I hoped that you would say: No never! So that I have a
> good argument to pushback on such requirements :)
> 
> But if it's doable it would be at least nice to have for debugging.
> 
> Thanks a lot for working on that,
> Christian.
> 
> >
> > Thanks,
> > Jean
> >  
> >>           Additional to that it is sometimes quite useful for
> >> debugging to isolate where exactly an incorrect access (segfault)
> >> is coming from.
> >>
> >> Let me know if there are some problems with that, especially I
> >> want to know if there is pushback on #5 so that I can forward
> >> that :)
> >>
> >> Thanks,
> >> Christian.
> >>  
> >>> Thanks,
> >>> Jean  
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
  2018-09-07 21:25                                   ` Jacob Pan
@ 2018-09-08  7:29                                     ` Christian König
       [not found]                                       ` <3e3a6797-a233-911b-3a7d-d9b549160960-5C7GfCeVMHo@public.gmane.org>
  2018-09-13  7:15                                     ` Tian, Kevin
  1 sibling, 1 reply; 123+ messages in thread
From: Christian König @ 2018-09-08  7:29 UTC (permalink / raw)
  To: Jacob Pan
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Will Deacon, Michal Hocko, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, Jean-Philippe Brucker,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA@public.gmane.org

Am 07.09.2018 um 23:25 schrieb Jacob Pan:
> On Fri, 7 Sep 2018 20:02:54 +0200
> Christian König <christian.koenig@amd.com> wrote:
>> [SNIP]
>>> iommu-sva expects everywhere that the device has an iommu_domain,
>>> it's the first thing we check on entry. Bypassing all of this would
>>> call idr_alloc() directly, and wouldn't have any code in common
>>> with the current iommu-sva. So it seems like you need a layer on
>>> top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
>>> but I don't think it should be in drivers/iommu/
>> In this case I question if the PASID handling should be under
>> drivers/iommu at all.
>>
>> See I can have a mix of VM context which are bound to processes (some
>> few) and VM contexts which are standalone and doesn't care for a
>> process address space. But for each VM context I need a distinct
>> PASID for the hardware to work.
>>
>> I can live if we say if IOMMU is completely disabled we use a simple
>> ida to allocate them, but when IOMMU is enabled I certainly need a
>> way to reserve a PASID without an associated process.
>>
> VT-d would also have such a requirement. There is a virtual command
> register to allocate and free PASIDs for VM use. When that PASID
> allocation request gets propagated to the host IOMMU driver, we need to
> allocate a PASID without an mm.
>
> If the PASID allocation is done via VFIO, can we have an FD to track the
> PASID life cycle instead of mm_exit()? I.e. all FDs get closed before
> mm_exit, I assume?

Yes, exactly. I just need a PASID which is never used by the OS for a 
process and we can easily give that back when the last FD reference is 
closed.
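
For illustration, a minimal sketch of such an FD-tied allocation
(ida-based; every name below is made up for this sketch and is not part
of this series):

#include <linux/fs.h>
#include <linux/idr.h>
#include <linux/slab.h>

/* Hypothetical: a bare PASID allocator that is not tied to any mm.
 * The PASID is released from the file's release handler, i.e. when
 * the last FD reference is dropped.
 */
static DEFINE_IDA(bare_pasid_ida);

struct vm_context {			/* hypothetical driver state */
	unsigned int pasid;
};

static int bare_pasid_alloc(unsigned int min, unsigned int max)
{
	/* Returns an unused ID in [min, max), or a negative errno. */
	return ida_simple_get(&bare_pasid_ida, min, max, GFP_KERNEL);
}

static void bare_pasid_free(unsigned int pasid)
{
	ida_simple_remove(&bare_pasid_ida, pasid);
}

static int vm_context_release(struct inode *inode, struct file *file)
{
	struct vm_context *ctx = file->private_data;

	bare_pasid_free(ctx->pasid);	/* last FD reference is gone */
	kfree(ctx);
	return 0;
}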

>>>> 3. Even after destruction of a process address space we need some
>>>> grace period before a PASID is reused because it can be that the
>>>> specific PASID is still in some hardware queues etc...
>>>>            At bare minimum all device drivers using process binding
>>>> need to explicitly note to the core when they are done with a
>>>> PASID.
>>> Right, much of the horribleness in iommu-sva deals with this:
>>>
>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>> function passed by the device driver to iommu_sva_device_init(). In
>>> mm_exit() the device driver needs to clear any reference to the
>>> PASID in hardware and in its own structures. When the device driver
>>> returns from mm_exit(), it effectively tells the core that it has
>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>> another process. mm_exit() is allowed to block, so the device
>>> driver has time to clean up and flush the queues.
>>>
>>> If the device driver finishes using the PASID before the process
>>> exits, it just calls unbind().
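
(As an aside, a rough sketch of that contract from the driver side; the
callback signature and all helper names here are assumed for
illustration, not taken from the series:)

static int my_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	struct my_device *mydev = drvdata;	/* hypothetical driver */

	/* Stop accepting new work for this PASID, then flush what is
	 * already in flight. Blocking is allowed here. Once we return,
	 * the core may reuse the PASID for another process.
	 */
	my_device_stop_queues(mydev, pasid);	/* hypothetical */
	my_device_flush_queues(mydev, pasid);	/* hypothetical */

	return 0;
}
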
>> Exactly, that's what Michal Hocko is probably not going to like at all.
>>
>> Can we have a different approach where each driver is informed by
>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>> reused?
>>
>> During that teardown transition it would be ideal if that PASID only
>> pointed to a dummy root page directory with only invalid entries.
>>
> I guess this can be vendor specific. In VT-d I plan to mark the PASID
> entry not-present and disable fault reporting while draining the
> remaining activities.

Sounds good to me.

The point is that, at least in the case where the process was killed by
the OOM killer, we should not block in mm_exit().

Instead, operations issued by the process to a device driver which uses
SVA need to be terminated as soon as possible to make sure that the OOM
killer can advance.

Thanks,
Christian.
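
A rough sketch of the teardown ordering discussed above (every helper
here is hypothetical; the real mechanism is vendor specific, e.g. VT-d
marking the PASID entry not-present):

/* Hypothetical sketch: quiesce the PASID, drain in-flight work, and
 * only then allow the PASID to be recycled for another process.
 */
static void pasid_teardown(struct device *dev, int pasid)
{
	/* 1. Stop translations: point the PASID at a dummy/invalid
	 *    context so new accesses fault instead of touching the mm.
	 */
	iommu_quiesce_pasid(dev, pasid);	/* hypothetical */

	/* 2. Suppress the resulting (non-recoverable) translation
	 *    faults while the device driver drains its queues.
	 */
	iommu_mute_faults(dev, pasid);		/* hypothetical */
	drain_device_queues(dev, pasid);	/* hypothetical */

	/* 3. Only now may the core reuse the PASID for another mm. */
	iommu_release_pasid(dev, pasid);	/* hypothetical */
}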

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices
       [not found]     ` <20180511190641.23008-18-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
  2018-05-17 16:07       ` Jonathan Cameron
@ 2018-09-10 15:16       ` Auger Eric
  1 sibling, 0 replies; 123+ messages in thread
From: Auger Eric @ 2018-09-10 15:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	liubo95-hv44wF8Li93QT0dZR+AlfA, robin.murphy-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> When removing a mapping from a domain, we need to send an invalidation to
> all devices that might have stored it in their Address Translation Cache
> (ATC). In addition when updating the context descriptor of a live domain,
> we'll need to send invalidations for all devices attached to it.
> 
> Maintain a list of devices in each domain, protected by a spinlock. It is
> updated every time we attach or detach devices to and from domains.
> 
> It needs to be a spinlock because we'll invalidate ATC entries from
> within hardirq-safe contexts, but it may be possible to relax the read
> side with RCU later.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu-v3.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 1d647104bccc..c892f012fb43 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -595,6 +595,11 @@ struct arm_smmu_device {
>  struct arm_smmu_master_data {
>  	struct arm_smmu_device		*smmu;
>  	struct arm_smmu_strtab_ent	ste;
> +
> +	struct arm_smmu_domain		*domain;
> +	struct list_head		list; /* domain->devices */
> +
> +	struct device			*dev;
This field addition and the associated assignment in arm_smmu_attach_dev()
are not really documented in the commit message.

>  };
>  
>  /* SMMU private data for an IOMMU domain */
> @@ -618,6 +623,9 @@ struct arm_smmu_domain {
>  	};
>  
>  	struct iommu_domain		domain;
> +
> +	struct list_head		devices;
> +	spinlock_t			devices_lock;
>  };
>  
>  struct arm_smmu_option_prop {
> @@ -1470,6 +1478,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>  	}
>  
>  	mutex_init(&smmu_domain->init_mutex);
> +	INIT_LIST_HEAD(&smmu_domain->devices);
> +	spin_lock_init(&smmu_domain->devices_lock);
> +
>  	return &smmu_domain->domain;
>  }
>  
> @@ -1685,7 +1696,17 @@ static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
>  
>  static void arm_smmu_detach_dev(struct device *dev)
>  {
> +	unsigned long flags;
>  	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
> +	struct arm_smmu_domain *smmu_domain = master->domain;
> +
> +	if (smmu_domain) {
> +		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +		list_del(&master->list);
> +		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +		master->domain = NULL;
> +	}
>  
>  	master->ste.assigned = false;
>  	arm_smmu_install_ste_for_dev(dev->iommu_fwspec);
> @@ -1694,6 +1715,7 @@ static void arm_smmu_detach_dev(struct device *dev)
>  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>  	int ret = 0;
> +	unsigned long flags;
>  	struct arm_smmu_device *smmu;
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_master_data *master;
> @@ -1729,6 +1751,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	}
>  
>  	ste->assigned = true;
> +	master->domain = smmu_domain;
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_add(&master->list, &smmu_domain->devices);
It is not totally obvious to me why master->domain = smmu_domain isn't
set within the lock as well, for consistency. Same when deleting the
node (see the sketch after the quoted patch below).

Thanks

Eric
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_BYPASS) {
>  		ste->s1_cfg = NULL;
> @@ -1847,6 +1874,7 @@ static int arm_smmu_add_device(struct device *dev)
>  			return -ENOMEM;
>  
>  		master->smmu = smmu;
> +		master->dev = dev;
>  		fwspec->iommu_priv = master;
>  	}
>  
> 
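
For instance (untested sketch, using the structures from this patch),
the attach path could do:

	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	master->domain = smmu_domain;
	list_add(&master->list, &smmu_domain->devices);
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

and arm_smmu_detach_dev() symmetrically:

	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_del(&master->list);
	master->domain = NULL;
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);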

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                                       ` <3e3a6797-a233-911b-3a7d-d9b549160960-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-12 12:40                                         ` Jean-Philippe Brucker
       [not found]                                           ` <9445a0be-fb5b-d195-4fdf-7ad6cb36ef4f-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 123+ messages in thread
From: Jean-Philippe Brucker @ 2018-09-12 12:40 UTC (permalink / raw)
  To: Christian König, Jacob Pan
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Will Deacon, Michal Hocko, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, liubo95-hv44wF8Li93QT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

On 08/09/2018 08:29, Christian König wrote:
> Yes, exactly. I just need a PASID which is never used by the OS for a 
> process and we can easily give that back when the last FD reference is 
> closed.

Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?

>>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>>> function passed by the device driver to iommu_sva_device_init(). In
>>>> mm_exit() the device driver needs to clear any reference to the
>>>> PASID in hardware and in its own structures. When the device driver
>>>> returns from mm_exit(), it effectively tells the core that it has
>>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>>> another process. mm_exit() is allowed to block, so the device
>>>> driver has time to clean up and flush the queues.
>>>>
>>>> If the device driver finishes using the PASID before the process
>>>> exits, it just calls unbind().
>>> Exactly, that's what Michal Hocko is probably not going to like at all.
>>>
>>> Can we have a different approach where each driver is informed by
>>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>>> reused?

It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.
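
As an illustration only (this enum and helper are invented for the
example, not code from this series), here is the extra state and the
filtering it would force on drivers that cannot quiesce translation
faults per PASID in hardware:

#include <linux/types.h>

enum pasid_state {
	PASID_ENABLED,
	PASID_DISABLED,
	PASID_DRAINING,		/* "disabled but DMA still running" */
};

static bool pasid_fault_should_report(enum pasid_state state)
{
	/* While draining, non-recoverable translation faults are
	 * expected and must be swallowed instead of being reported
	 * to the kernel.
	 */
	return state != PASID_DRAINING;
}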

>>> During that teardown transition it would be ideal if that PASID only
>>> pointed to a dummy root page directory with only invalid entries.
>>>
>> I guess this can be vendor specific. In VT-d I plan to mark the PASID
>> entry not-present and disable fault reporting while draining the
>> remaining activities.
>
> Sounds good to me.
> 
> The point is that, at least in the case where the process was killed by
> the OOM killer, we should not block in mm_exit().
>
> Instead, operations issued by the process to a device driver which uses
> SVA need to be terminated as soon as possible to make sure that the OOM
> killer can advance.

I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do...

In my tests I can't manage to disturb the OOM killer, but I could be
missing some special case. Even if the mm_exit() callback
(unrealistically) sleeps for 60 seconds, the OOM killer isn't affected
because oom_reap_task_mm() wipes the victim's address space in another
thread, either before or while the release notifier is running.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                                           ` <9445a0be-fb5b-d195-4fdf-7ad6cb36ef4f-5wv7dgnIgG8@public.gmane.org>
@ 2018-09-12 12:56                                             ` Christian König
  0 siblings, 0 replies; 123+ messages in thread
From: Christian König @ 2018-09-12 12:56 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Will Deacon, Michal Hocko, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, liubo95-hv44wF8Li93QT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

On 12.09.2018 at 14:40, Jean-Philippe Brucker wrote:
> On 08/09/2018 08:29, Christian König wrote:
>> Yes, exactly. I just need a PASID which is never used by the OS for a
>> process and we can easily give that back when the last FD reference is
>> closed.
> Alright, iommu-sva can get its PASID from this external allocator as
> well, as long as it has an interface similar to idr. Where would it go,
> drivers/base/, mm/, kernel/...?

Good question; my initial instinct was to put it under drivers/pci.

But AFAICS you are now supporting SVA implementations which are not
based on PCI.

So drivers/base sounds like a good place to me.

>
>>>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>>>> function passed by the device driver to iommu_sva_device_init(). In
>>>>> mm_exit() the device driver needs to clear any reference to the
>>>>> PASID in hardware and in its own structures. When the device driver
>>>>> returns from mm_exit(), it effectively tells the core that it has
>>>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>>>> another process. mm_exit() is allowed to block, so the device
>>>>> driver has time to clean up and flush the queues.
>>>>>
>>>>> If the device driver finishes using the PASID before the process
>>>>> exits, it just calls unbind().
>>>> Exactly, that's what Michal Hocko is probably not going to like at all.
>>>>
>>>> Can we have a different approach where each driver is informed by
>>>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>>>> reused?
> It's awful from the IOMMU driver perspective. In addition to "enabled"
> and "disabled" PASID states, you add "disabled but DMA still running
> normally". Between that new state and "disabled", the IOMMU will be
> flooded by translation faults (non-recoverable ones), which it needs to
> ignore instead of reporting to the kernel. Not all IOMMUs can deal with
> this in hardware (SMMU and VT-d can quiesce translation faults
> per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
> filter fault events themselves, depending on the PASID state.

Phew, yeah, that is probably true.

OK, let us skip that for a moment; we just need to invest more work in
killing DMA operations quickly when the process address space is torn
down.

>>>> During that teardown transition it would be ideal if that PASID only
>>>> pointed to a dummy root page directory with only invalid entries.
>>>>
>>> I guess this can be vendor specific. In VT-d I plan to mark the PASID
>>> entry not-present and disable fault reporting while draining the
>>> remaining activities.
>> Sounds good to me.
>>
>> The point is that, at least in the case where the process was killed by
>> the OOM killer, we should not block in mm_exit().
>>
>> Instead, operations issued by the process to a device driver which uses
>> SVA need to be terminated as soon as possible to make sure that the OOM
>> killer can advance.
> I don't see how we're preventing the OOM killer from advancing, so I'm
> looking for a stronger argument that justifies adding this complexity to
> IOMMU drivers. Time limit of the release MMU notifier, locking
> requirement, a concrete example where things break, even a comment
> somewhere in mm/ would do...
>
> In my tests I can't manage to disturb the OOM killer, but I could be
> missing some special case. Even if the mm_exit() callback
> (unrealistically) sleeps for 60 seconds,

Well, you are *COMPLETELY* underestimating this. A compute task with a
huge wave launch can take multiple minutes to tear down.

That's why I'm so concerned about this, but to be honest I think the
hardware just needs to become better, and we need to be able to block
dead tasks from spawning threads again.

Regards,
Christian.

>   the OOM killer isn't affected
> because oom_reap_task_mm() wipes the victim's address space in another
> thread, either before or while the release notifier is running.
>
> Thanks,
> Jean


^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
  2018-09-07 21:25                                   ` Jacob Pan
  2018-09-08  7:29                                     ` Christian König
@ 2018-09-13  7:15                                     ` Tian, Kevin
  1 sibling, 0 replies; 123+ messages in thread
From: Tian, Kevin @ 2018-09-13  7:15 UTC (permalink / raw)
  To: Jacob Pan, Christian König
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Will Deacon, Michal Hocko, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Raj, Ashok,
	Jean-Philippe Brucker, linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, liubo95-hv44wF8Li93QT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

> From: Jacob Pan [mailto:jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org]
> Sent: Saturday, September 8, 2018 5:25 AM
> > > iommu-sva expects everywhere that the device has an iommu_domain;
> > > it's the first thing we check on entry. Bypassing all of this would
> > > call idr_alloc() directly, and wouldn't have any code in common
> > > with the current iommu-sva. So it seems like you need a layer on
> > > top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
> > > but I don't think it should be in drivers/iommu/
> >
> > In this case I question if the PASID handling should be under
> > drivers/iommu at all.
> >
> > See, I can have a mix of VM contexts which are bound to processes (a
> > few) and VM contexts which are standalone and don't care about a
> > process address space. But for each VM context I need a distinct
> > PASID for the hardware to work.

I'm confused about VM context vs. process. Does VM refer to Virtual
Machine or something else? If so, I don't understand the binding part:
what is a VM context bound to, a (host?) process?

> >
> > I can live with it if we say that when the IOMMU is completely disabled
> > we use a simple ida to allocate them, but when the IOMMU is enabled I
> > certainly need a way to reserve a PASID without an associated process.
> >
> VT-d would also have such a requirement. There is a virtual command
> register for allocating and freeing PASIDs for VM use. When that PASID
> allocation request gets propagated to the host IOMMU driver, we need to
> allocate a PASID w/o an mm.
> 

VT-d is a bit different. On the host side, PASID allocation always
happens in Qemu's context, so those PASIDs are recorded with the Qemu
process, though the entries may point to guest page tables instead of
Qemu's host mm.
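
As a purely illustrative sketch of that flow (every name here is
invented; this is not the VT-d driver's actual API), the host-side
handler would allocate a bare PASID, with no mm attached, and record
it against the Qemu process:

/* Hypothetical: a guest write to the virtual command register traps
 * into the host, which allocates a system-wide PASID not backed by
 * any host mm.
 */
static int handle_guest_pasid_alloc(struct guest_pasid_ctx *ctx)
{
	int pasid;

	/* Allocate a system-wide PASID with no mm attached; the PASID
	 * entry will later be made to point at guest page tables.
	 */
	pasid = host_pasid_alloc_bare();		/* hypothetical */
	if (pasid < 0)
		return pasid;

	/* Record ownership against the Qemu process (current), so the
	 * PASID can be reclaimed when the VM's FDs go away.
	 */
	record_pasid_owner(ctx, current, pasid);	/* hypothetical */
	return pasid;
}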

Thanks
Kevin

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
       [not found]                         ` <2fd4a0a1-1a35-bf82-df84-b995cce011d9-5C7GfCeVMHo@public.gmane.org>
  2018-09-07 15:45                           ` Jean-Philippe Brucker
@ 2018-09-13  7:26                           ` Tian, Kevin
  1 sibling, 0 replies; 123+ messages in thread
From: Tian, Kevin @ 2018-09-13  7:26 UTC (permalink / raw)
  To: Christian König, Jean-Philippe Brucker, Auger Eric,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	liubo95-hv44wF8Li93QT0dZR+AlfA, Raj, Ashok,
	robin.murphy-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	rgummal-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

> From: Christian König
> Sent: Friday, September 7, 2018 4:56 PM
> 
> 5. It would be nice to be able to allocate multiple PASIDs for the same
> process address space.
>          E.g. some teams at AMD want to use a separate GPU address space
> for their userspace client library. I'm still trying to avoid that, but
> it is perfectly possible that we are going to need that.
>          In addition to that, it is sometimes quite useful for debugging
> to isolate where exactly an incorrect access (segfault) is coming from.
> 
> Let me know if there are any problems with that; especially, I want to
> know if there is pushback on #5 so that I can forward that :)
> 

We have a similar requirement, except that it is "multiple PASIDs for
the same process" instead of "for the same process address space".

Intel VT-d goes for a 'true' system-wide PASID allocation policy,
across both host processes and guest processes. As Jacob explains,
there will be a virtual command register on the virtual VT-d, through
which the guest IOMMU driver requests system-wide PASIDs allocated by
the host IOMMU driver.

With that design, Qemu represents all guest processes on the host
side, and will thus get "multiple PASIDs allocated for the same
process". However, instead of binding all PASIDs to the same host
address space of Qemu, each PASID entry points to a guest address
space if used by a guest process.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 123+ messages in thread

end of thread, other threads:[~2018-09-13  7:26 UTC | newest]

Thread overview: 123+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-11 19:06 [PATCH v2 00/40] Shared Virtual Addressing for the IOMMU Jean-Philippe Brucker
     [not found] ` <20180511190641.23008-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-11 19:06   ` [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-16 20:41       ` Jacob Pan
2018-05-17 10:02         ` Jean-Philippe Brucker
2018-05-17 17:00           ` Jacob Pan
2018-09-05 11:29       ` Auger Eric
     [not found]         ` <bf42affd-e9d0-e4fc-6d28-f3c3f7795348-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-09-06 11:09           ` Jean-Philippe Brucker
     [not found]             ` <03d31ba5-1eda-ea86-8c0c-91d14c86fe83-5wv7dgnIgG8@public.gmane.org>
2018-09-06 11:12               ` Christian König
     [not found]                 ` <ed39159c-087e-7e56-7d29-d1de9fa1677f-5C7GfCeVMHo@public.gmane.org>
2018-09-06 12:45                   ` Jean-Philippe Brucker
     [not found]                     ` <f0b317d5-e2e9-5478-952c-05e8b97bd68b-5wv7dgnIgG8@public.gmane.org>
2018-09-07  8:55                       ` Christian König
     [not found]                         ` <2fd4a0a1-1a35-bf82-df84-b995cce011d9-5C7GfCeVMHo@public.gmane.org>
2018-09-07 15:45                           ` Jean-Philippe Brucker
     [not found]                             ` <65e7accd-4446-19f5-c667-c6407e89cfa6-5wv7dgnIgG8@public.gmane.org>
2018-09-07 18:02                               ` Christian König
     [not found]                                 ` <5bbc0332-b94b-75cc-ca42-a9b196811daf-5C7GfCeVMHo@public.gmane.org>
2018-09-07 21:25                                   ` Jacob Pan
2018-09-08  7:29                                     ` Christian König
     [not found]                                       ` <3e3a6797-a233-911b-3a7d-d9b549160960-5C7GfCeVMHo@public.gmane.org>
2018-09-12 12:40                                         ` Jean-Philippe Brucker
     [not found]                                           ` <9445a0be-fb5b-d195-4fdf-7ad6cb36ef4f-5wv7dgnIgG8@public.gmane.org>
2018-09-12 12:56                                             ` Christian König
2018-09-13  7:15                                     ` Tian, Kevin
2018-09-13  7:26                           ` Tian, Kevin
2018-05-11 19:06   ` [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-3-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-17 13:10       ` Jonathan Cameron
     [not found]         ` <20180517141058.00001c76-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-21 14:43           ` Jean-Philippe Brucker
2018-09-05 11:29       ` Auger Eric
     [not found]         ` <471873d4-a1a6-1a3a-cf17-1e686b4ade96-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-09-06 11:09           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 03/40] iommu/sva: Manage process address spaces Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-4-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-16 23:31       ` Jacob Pan
2018-05-17 10:02         ` Jean-Philippe Brucker
     [not found]           ` <de478769-9f7a-d40b-a55e-e2c63ad883e8-5wv7dgnIgG8@public.gmane.org>
2018-05-22 16:43             ` Jacob Pan
2018-05-24 11:44               ` Jean-Philippe Brucker
2018-05-24 11:50                 ` Ilias Apalodimas
2018-05-24 15:04                   ` Jean-Philippe Brucker
     [not found]                     ` <19e82a74-429a-3f86-119e-32b12082d0ff-5wv7dgnIgG8@public.gmane.org>
2018-05-25  6:33                       ` Ilias Apalodimas
2018-05-25  8:39                         ` Jonathan Cameron
     [not found]                           ` <20180525093959.000040a7-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-26  2:24                             ` Kenneth Lee
2018-05-26  2:24                           ` Kenneth Lee
     [not found]                           ` <20180526022445.GA6069@kllp05>
2018-06-11 16:10                             ` Kenneth Lee
2018-06-11 16:10                             ` Kenneth Lee
2018-06-11 16:32                           ` Kenneth Lee
2018-05-17 14:25       ` Jonathan Cameron
     [not found]         ` <20180517150516.000067ca-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-21 14:44           ` Jean-Philippe Brucker
2018-09-05 12:14       ` Auger Eric
     [not found]         ` <d785ec89-6743-d0f1-1a7f-85599a033e5b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-09-05 18:18           ` Jacob Pan
2018-09-06 17:40             ` Jean-Philippe Brucker
2018-09-06 11:10           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 04/40] iommu/sva: Add a mm_exit callback for device drivers Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-5-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-09-05 13:23       ` Auger Eric
     [not found]         ` <d1dc28c4-7742-9c41-3f91-3fbcb8b13c1c-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-09-06 11:10           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 05/40] iommu/sva: Track mm changes with an MMU notifier Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-6-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-17 14:25       ` Jonathan Cameron
     [not found]         ` <20180517152514.00004247-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-21 14:44           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 06/40] iommu/sva: Search mm by PASID Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 07/40] iommu: Add a page fault handler Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-8-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-17 15:25       ` Jonathan Cameron
     [not found]         ` <20180517162555.00002bd3-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-21 14:48           ` Jean-Philippe Brucker
2018-05-18 18:04       ` Jacob Pan
2018-05-21 14:49         ` Jean-Philippe Brucker
     [not found]           ` <8a640794-a6f3-fa01-82a9-06479a6f779a-5wv7dgnIgG8@public.gmane.org>
2018-05-22 23:35             ` Jacob Pan
2018-05-24 11:44               ` Jean-Philippe Brucker
     [not found]                 ` <bdf9f221-ab97-2168-d072-b7f6a0dba840-5wv7dgnIgG8@public.gmane.org>
2018-05-26  0:35                   ` Jacob Pan
2018-05-29 10:00                     ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 08/40] iommu/iopf: Handle mm faults Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 09/40] iommu/sva: Register page fault handler Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 10/40] mm: export symbol mm_access Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 11/40] mm: export symbol find_get_task_by_vpid Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 12/40] mm: export symbol mmput_async Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-14-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-17 15:58       ` Jonathan Cameron
     [not found]         ` <20180517165845.00000cc9-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-21 14:51           ` Jean-Philippe Brucker
2018-05-23  9:38       ` Xu Zaibo
     [not found]         ` <5B0536A3.1000304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-24 11:44           ` Jean-Philippe Brucker
     [not found]             ` <cd13f60d-b282-3804-4ca7-2d34476c597f-5wv7dgnIgG8@public.gmane.org>
2018-05-24 12:35               ` Xu Zaibo
     [not found]                 ` <5B06B17C.1090809-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-24 15:04                   ` Jean-Philippe Brucker
     [not found]                     ` <205c1729-8026-3efe-c363-d37d7150d622-5wv7dgnIgG8@public.gmane.org>
2018-05-25  2:39                       ` Xu Zaibo
2018-05-25  9:47                         ` Jean-Philippe Brucker
     [not found]                           ` <fea420ff-e738-e2ed-ab1e-a3f4dde4b6ef-5wv7dgnIgG8@public.gmane.org>
2018-05-26  3:53                             ` Xu Zaibo
     [not found]                               ` <5B08DA21.3070507-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-29 11:55                                 ` Jean-Philippe Brucker
     [not found]                                   ` <99ff4f89-86ef-a251-894c-8aa8a47d2a69-5wv7dgnIgG8@public.gmane.org>
2018-05-29 12:24                                     ` Xu Zaibo
2018-08-27  8:06       ` Xu Zaibo
     [not found]         ` <5B83B11E.7010807-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-08-31 13:34           ` Jean-Philippe Brucker
     [not found]             ` <1d5b6529-4e5a-723c-3f1b-dd5a9adb490c-5wv7dgnIgG8@public.gmane.org>
2018-09-01  2:23               ` Xu Zaibo
     [not found]                 ` <5B89F818.7060300-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-09-03 10:34                   ` Jean-Philippe Brucker
     [not found]                     ` <3a961aff-e830-64bb-b6a9-14e08de1abf5-5wv7dgnIgG8@public.gmane.org>
2018-09-04  2:12                       ` Xu Zaibo
     [not found]                         ` <5B8DEA15.7020404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-09-04 10:57                           ` Jean-Philippe Brucker
     [not found]                             ` <bc27f902-4d12-21b7-b9e9-18bcae170503-5wv7dgnIgG8@public.gmane.org>
2018-09-05  3:15                               ` Xu Zaibo
     [not found]                                 ` <5B8F4A59.20004-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-09-05 11:02                                   ` Jean-Philippe Brucker
     [not found]                                     ` <b51107b8-a525-13ce-f4c3-d423b8502c27-5wv7dgnIgG8@public.gmane.org>
2018-09-06  7:26                                       ` Xu Zaibo
2018-05-11 19:06   ` [PATCH v2 14/40] dt-bindings: document stall and PASID properties for IOMMU masters Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 15/40] iommu/of: Add stall and pasid properties to iommu_fwspec Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-17-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-15 14:16       ` Catalin Marinas
     [not found]         ` <20180515141658.vivrgcyww2pxumye-+1aNUgJU5qkijLcmloz0ER/iLCjYCKR+VpNB7YpNyf8@public.gmane.org>
2018-05-17 10:01           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 17/40] iommu/arm-smmu-v3: Link domains and devices Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-18-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-17 16:07       ` Jonathan Cameron
     [not found]         ` <20180517170748.00004927-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-05-21 14:49           ` Jean-Philippe Brucker
2018-09-10 15:16       ` Auger Eric
2018-05-11 19:06   ` [PATCH v2 18/40] iommu/io-pgtable-arm: Factor out ARM LPAE register defines Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 19/40] iommu: Add generic PASID table library Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 20/40] iommu/arm-smmu-v3: Move context descriptor code Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 21/40] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-22-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-31 11:01       ` Bharat Kumar Gogada
     [not found]         ` <BLUPR0201MB1505AA55707BE2E13392FFAFA5630-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2018-06-01 10:46           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 22/40] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 23/40] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 24/40] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 25/40] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 26/40] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 27/40] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 28/40] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 29/40] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 30/40] iommu/arm-smmu-v3: Register I/O Page Fault queue Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 31/40] iommu/arm-smmu-v3: Improve add_device error handling Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 32/40] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 33/40] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 34/40] ACPI/IORT: Check ATS capability in root complex nodes Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 35/40] iommu/arm-smmu-v3: Add support for PCI ATS Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-36-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-19 17:25       ` Sinan Kaya
     [not found]         ` <922474e8-0aa5-e022-0502-f1e51b0d4859-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-05-21 14:52           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 36/40] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 37/40] iommu/arm-smmu-v3: Disable tagged pointers Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 38/40] PCI: Make "PRG Response PASID Required" handling common Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 39/40] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
     [not found]     ` <20180511190641.23008-40-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-05-25 14:08       ` Bharat Kumar Gogada
     [not found]         ` <BLUPR0201MB150513BBAA161355DE9B3A48A5690-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2018-05-29 10:27           ` Jean-Philippe Brucker
2018-05-11 19:06   ` [PATCH v2 40/40] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).